Reading the wsgiref Source Code



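Introduction

The Web Server Gateway Interface (WSGI) is a standard interface between web server software and web applications written in Python. wsgiref is a reference implementation of the WSGI specification defined in PEP 333 (updated for Python 3 by PEP 3333), and it can be used to add WSGI support to a web server or framework. wsgiref provides the following:

  • Utilities for manipulating WSGI environment variables
  • Handling of response headers
  • Base classes for implementing WSGI servers
  • A simple HTTP server
  • A validation tool that checks whether WSGI servers and applications conform to the WSGI specification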
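To make the first two items concrete, here is a minimal sketch that uses wsgiref.util and wsgiref.headers on their own, without any server. The path, query string, and file name are made-up values for illustration only.

```python
from wsgiref.headers import Headers
from wsgiref.util import request_uri, setup_testing_defaults

# Build a minimal, fake WSGI environ for experimentation.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/hello"          # hypothetical path
environ["QUERY_STRING"] = "name=wsgi"    # hypothetical query string

# Reconstruct the full request URI from the environ.
uri = request_uri(environ)

# Manage response headers through a dict-like, case-insensitive wrapper.
headers = Headers([("Content-Type", "text/plain")])
headers.add_header("Content-Disposition", "attachment", filename="hello.txt")
headers["Content-Length"] = "12"

print(uri)
print(headers.items())
```

Headers behaves like a mapping with case-insensitive keys, so headers["content-length"] and headers["Content-Length"] refer to the same entry.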

A simple example

    from wsgiref.simple_server import make_server


    def hello_world_app(environ, start_response):
        """Every WSGI application must provide an application object:
        a callable that accepts the environ and start_response arguments.
        """
        status = "200 OK"
        headers = [('Content-Type', 'text/plain; charset=utf-8')]
        start_response(status, headers)
        return [b'Hello, World']


    httpd = make_server('', 8000, hello_world_app)
    print('Serving on port 8000 ...')

    # Serve until the process is killed
    httpd.serve_forever()

Run the code above and access it with curl -i localhost:8000. The result looks like this:

    $ python test_wsgiref.py
    Serving on port 8000 ...
    127.0.0.1 - - [20/Oct/2018 15:10:13] "GET / HTTP/1.1" 200 12

    $ curl -i localhost:8000
    HTTP/1.0 200 OK
    Date: Sat, 20 Oct 2018 07:10:13 GMT
    Server: WSGIServer/0.1 Python/2.7.10
    Content-Type: text/plain; charset=utf-8
    Content-Length: 12

    Hello, World%
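Because a WSGI application is just a callable, it can also be exercised without any server at all. The sketch below calls the application directly; fake_start_response and the one-key environ are stand-ins invented for this illustration, not part of wsgiref.

```python
def hello_world_app(environ, start_response):
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    return [b"Hello, World"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # A real server would buffer the status line and headers here;
    # this stand-in only records them.
    captured["status"] = status
    captured["headers"] = headers


# The return value is an iterable of bytes chunks that make up the body.
body = b"".join(hello_world_app({"REQUEST_METHOD": "GET"}, fake_start_response))
print(captured["status"], body)
```

This is essentially what the server side does on every request: build an environ, supply a start_response callable, and iterate over the returned body.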

wsgiref source layout

The wsgiref source can be found in the cpython project on GitHub. Its layout is as follows:

    .
    ├── __init__.py
    ├── handlers.py
    ├── headers.py
    ├── simple_server.py
    ├── util.py
    └── validate.py

  • util: miscellaneous useful functions and wrappers
  • headers: manages response headers
  • handlers: base classes for server/gateway implementations (the core processing logic)
  • simple_server: a simple BaseHTTPServer that supports WSGI
  • validate: a validation wrapper that sits between an app and a server to detect errors in either
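The validate module is the only one of these that the rest of this post does not return to, so here is a small sketch of how it might be used. good_app, bad_app, and run_checked are hypothetical names for this illustration; validator and setup_testing_defaults are the real wsgiref functions.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def good_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok"]


def bad_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return ["not bytes"]  # body chunks must be bytes, so this violates the spec


def start_response(status, headers, exc_info=None):
    return None


def run_checked(app):
    """Run `app` once behind the validator and return the joined body."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)
    result = validator(app)(environ, start_response)
    try:
        return b"".join(result)
    finally:
        result.close()  # the validator expects the iterable to be closed


good_body = run_checked(good_app)
try:
    run_checked(bad_app)
    bad_app_rejected = False
except AssertionError as exc:
    bad_app_rejected = True
    print("validator complained:", exc)
```

The validator reports violations by raising AssertionError, which makes it convenient to wrap an application with it during development or in tests.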

The simple_server module

The simple example above used the make_server function from the simple_server module to start a WSGI server, so we take it as the entry point and look at how make_server is implemented:

    def make_server(
        host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
    ):
        """Create a new WSGI server listening on `host` and `port` for `app`"""
        # Instantiate WSGIServer
        # WSGIServer -> HTTPServer.__init__
        #            -> TCPServer.__init__
        #               -> TCPServer.server_bind
        #                  -> TCPServer.socket.bind (bind the socket to the listening address)
        #               -> TCPServer.server_activate
        #                  -> TCPServer.socket.listen (start listening for TCP connections)
        server = server_class((host, port), handler_class)
        server.set_app(app)
        return server

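As the code shows, make_server creates a server that listens on the given port of the given host. Once the server is running, each client request passes through WSGIServer and WSGIRequestHandler, the processed request is handed to the app, and the app returns the result.
Although the function is only a few lines long, it tells us what a WSGI server needs in order to start:

  • host and port: the address and port to listen on
  • server_class: listens on the port and accepts requests
  • handler_class: handles each request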
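Since server_class and handler_class are ordinary parameters, either can be swapped out. The following is a minimal sketch under a few assumptions: QuietHandler is a hypothetical subclass that silences the access log, port 0 asks the operating system for any free port, and the server handles exactly one request in a background thread.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Hypothetical handler subclass that suppresses the per-request log line."""

    def log_message(self, format, *args):
        pass


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello, World"]


# Port 0 lets the OS choose a free port; the real one is in httpd.server_port.
with make_server("127.0.0.1", 0, app, handler_class=QuietHandler) as httpd:
    worker = threading.Thread(target=httpd.handle_request)  # serve one request
    worker.start()

    conn = http.client.HTTPConnection("127.0.0.1", httpd.server_port, timeout=5)
    conn.request("GET", "/")
    resp = conn.getresponse()
    status, body = resp.status, resp.read()
    conn.close()
    worker.join()

print(status, body)
```

handle_request is used here instead of serve_forever only so that the example terminates on its own.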

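As shown above, the default server_class used to create the server instance is WSGIServer. WSGIServer is a subclass of HTTPServer, HTTPServer is a subclass of TCPServer, and the base class of TCPServer is BaseServer. Instantiating WSGIServer therefore walks up the inheritance chain, and it is ultimately TCPServer that binds the socket (bind) and starts listening (listen).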
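One way to check which class in the chain actually supplies a given method is to walk the method resolution order. defining_class below is a hypothetical helper written for this post, and the results depend on the Python version in use.

```python
from wsgiref.simple_server import WSGIServer


def defining_class(cls, name):
    """Return the name of the first class in cls's MRO that defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass.__name__
    return None


for method in ("__init__", "server_bind", "server_activate", "serve_forever"):
    print(method, "->", defining_class(WSGIServer, method))
```

On Python 3 this reports that server_bind is overridden by WSGIServer itself, while __init__ and server_activate come from TCPServer and serve_forever comes from BaseServer.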

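What WSGIServer and WSGIRequestHandler do

As mentioned above, after the WSGI server receives a client request, the request is processed by WSGIServer and WSGIRequestHandler. What do these two classes mainly do? The figure below shows their roles:

[Figure: simple_server module processing flow]

WSGIServer mainly wraps the socket connection: it accepts incoming connections and hands each one to WSGIRequestHandler, which parses the HTTP request and processes it. Let's look inside WSGIServer to see exactly what the class does: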

    class WSGIServer(HTTPServer):

        """BaseHTTPServer that implements the Python WSGI protocol"""

        application = None

        def server_bind(self):
            """Override server_bind to store the server name."""
            HTTPServer.server_bind(self)
            self.setup_environ()

        def setup_environ(self):
            # Set up base environment
            env = self.base_environ = {}
            env['SERVER_NAME'] = self.server_name
            env['GATEWAY_INTERFACE'] = 'CGI/1.1'
            env['SERVER_PORT'] = str(self.server_port)
            env['REMOTE_HOST']=''
            env['CONTENT_LENGTH']=''
            env['SCRIPT_NAME'] = ''

        def get_app(self):
            return self.application

        def set_app(self,application):
            self.application = application

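The code above shows that WSGIServer inherits from HTTPServer and adds a few things required by the WSGI specification:

  • It overrides server_bind so that, after binding, it initializes the base environ variables
  • It provides get_app and set_app to get or set the WSGI application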
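Both points can be observed on a live instance. This is a small sketch that binds to an OS-chosen port on the loopback address (an assumption made so that it does not collide with anything already running) and inspects the result without serving any request.

```python
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello, World"]


# Constructing the server already calls server_bind, which fills base_environ.
server = WSGIServer(("127.0.0.1", 0), WSGIRequestHandler)
try:
    server.set_app(app)
    base = dict(server.base_environ)
    same_app = server.get_app() is app
    port = server.server_port
finally:
    server.server_close()

print(sorted(base))
print(same_app)
```

The keys printed are exactly the ones assigned in setup_environ, and SERVER_PORT holds the real port as a string.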

The inheritance chain of WSGIServer is shown below:

    +------------+
    | BaseServer |
    +------------+
          |
          v
    +------------+        +------------------+
    | TCPServer  |------->| UnixStreamServer |
    +------------+        +------------------+
          |
          v
    +------------+
    | HTTPServer |
    +------------+
          |
          v
    +------------+
    | WSGIServer |
    +------------+

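The chain shows that WSGIServer inherits from HTTPServer, HTTPServer inherits from TCPServer, and TCPServer inherits from BaseServer. HTTPServer lives in server.py of the http package (http.server), while TCPServer and BaseServer come from the socketserver module.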
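The same chain, together with the module each class lives in, can be printed from the interpreter. This is a quick check on Python 3 rather than anything taken from the wsgiref source.

```python
from wsgiref.simple_server import WSGIServer

# Pair every class in the method resolution order with its defining module.
chain = [(cls.__module__, cls.__name__) for cls in WSGIServer.__mro__]

for module, name in chain:
    print("%s.%s" % (module, name))
```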

Next, let's look at the implementation of WSGIRequestHandler:

    class WSGIRequestHandler(BaseHTTPRequestHandler):

        server_version = "WSGIServer/" + __version__

        def get_environ(self):
            env = self.server.base_environ.copy()
            env['SERVER_PROTOCOL'] = self.request_version
            env['SERVER_SOFTWARE'] = self.server_version
            env['REQUEST_METHOD'] = self.command
            if '?' in self.path:
                path,query = self.path.split('?',1)
            else:
                path,query = self.path,''

            env['PATH_INFO'] = urllib.parse.unquote(path, 'iso-8859-1')
            env['QUERY_STRING'] = query

            host = self.address_string()
            if host != self.client_address[0]:
                env['REMOTE_HOST'] = host
            env['REMOTE_ADDR'] = self.client_address[0]

            if self.headers.get('content-type') is None:
                env['CONTENT_TYPE'] = self.headers.get_content_type()
            else:
                env['CONTENT_TYPE'] = self.headers['content-type']

            length = self.headers.get('content-length')
            if length:
                env['CONTENT_LENGTH'] = length

            for k, v in self.headers.items():
                k=k.replace('-','_').upper(); v=v.strip()
                if k in env:
                    continue                    # skip content length, type,etc.
                if 'HTTP_'+k in env:
                    env['HTTP_'+k] += ','+v     # comma-separate multiple headers
                else:
                    env['HTTP_'+k] = v
            return env

        def get_stderr(self):
            return sys.stderr

        def handle(self):
            """Handle a single HTTP request"""

            # Read the request line sent by the client
            self.raw_requestline = self.rfile.readline(65537)
            # If the request URI is too long, respond with a 414 error
            if len(self.raw_requestline) > 65536:
                self.requestline = ''
                self.request_version = ''
                self.command = ''
                self.send_error(414)
                return

            # Parse the client's request line and request headers
            if not self.parse_request(): # An error code has been sent, just exit
                return

            # Use ServerHandler to invoke the WSGI application
            handler = ServerHandler(
                self.rfile, self.wfile, self.get_stderr(), self.get_environ()
            )
            handler.request_handler = self      # backpointer for logging
            handler.run(self.server.get_app())

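As the code shows, WSGIRequestHandler inherits from BaseHTTPRequestHandler, whose main job is to handle client HTTP requests; WSGIRequestHandler adds the WSGI-specific parts on top of it. The class provides the following methods:

  • get_environ: builds the environ dictionary by copying the server's base_environ and adding request-specific variables
  • handle: handles one HTTP request; it passes the prepared environ to ServerHandler and calls its run method to run the WSGI application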
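The header loop at the end of get_environ is the least obvious part, so here it is isolated as a standalone sketch. headers_to_environ is a hypothetical function that mirrors that loop on a plain list of (name, value) pairs; it is a simplification, not the wsgiref API.

```python
def headers_to_environ(header_items, env=None):
    """Map HTTP request headers to CGI-style HTTP_* environ keys."""
    env = dict(env or {})
    for key, value in header_items:
        key = key.replace("-", "_").upper()
        value = value.strip()
        if key in env:
            continue  # already stored without a prefix, e.g. CONTENT_TYPE
        if "HTTP_" + key in env:
            env["HTTP_" + key] += "," + value  # repeated header: join with commas
        else:
            env["HTTP_" + key] = value
    return env


env = headers_to_environ(
    [
        ("Host", "localhost:8000"),
        ("Accept", "text/html"),
        ("Accept", "application/json"),
        ("Content-Type", "text/plain"),
    ],
    {"CONTENT_TYPE": "text/plain"},
)
print(env)
```

Content-Type and Content-Length are skipped by this loop because get_environ has already stored them under CONTENT_TYPE and CONTENT_LENGTH, which is why they never appear with an HTTP_ prefix.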

The inheritance chain of WSGIRequestHandler is shown below:

    +------------------------+
    |   BaseRequestHandler   |
    +------------------------+
                |
                v
    +------------------------+
    |  StreamRequestHandler  |
    +------------------------+
                |
                v
    +------------------------+
    | BaseHTTPRequestHandler |
    +------------------------+
                |
                v
    +------------------------+
    |   WSGIRequestHandler   |
    +------------------------+

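What ServerHandler does

ServerHandler takes four arguments: the read end of the socket (self.rfile), the write end (self.wfile), an error output stream (self.get_stderr()), and a dictionary of request information (self.get_environ()). get_environ is the method described above that builds the environ variables; it returns a dictionary that combines the server's base environment variables with the variables of the current request.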
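Because the handler only needs file-like streams and an environ dictionary, it can be driven without a socket. The sketch below uses SimpleHandler, the parent class of ServerHandler, with in-memory buffers standing in for rfile and wfile; that substitution is an assumption made purely for illustration.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello, World"]


environ = {}
setup_testing_defaults(environ)  # fake request data instead of get_environ()

fake_rfile = io.BytesIO()        # stands in for the socket's read end
fake_wfile = io.BytesIO()        # stands in for the socket's write end
fake_stderr = io.StringIO()

handler = SimpleHandler(fake_rfile, fake_wfile, fake_stderr, environ)
handler.run(app)

raw_response = fake_wfile.getvalue()
print(raw_response.decode("iso-8859-1"))
```

Everything written to fake_wfile is the raw HTTP response: the status line, the headers (including a Content-Length the handler computes itself), a blank line, and the body.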

ServerHandler inherits from SimpleHandler, and SimpleHandler inherits from BaseHandler. Let's continue with the ServerHandler source:

    # The ServerHandler class
    class ServerHandler(SimpleHandler):

        server_software = software_version

        def close(self):
            try:
                self.request_handler.log_request(
                    self.status.split(' ',1)[0], self.bytes_sent
                )
            finally:
                SimpleHandler.close(self)


    # The SimpleHandler class
    class SimpleHandler(BaseHandler):
        """Handler that's just initialized with streams, environment, etc.

        This handler subclass is intended for synchronous HTTP/1.0 origin servers,
        and handles sending the entire response output, given the correct inputs.

        Usage::

            handler = SimpleHandler(
                inp,out,err,env, multithread=False, multiprocess=True
            )
            handler.run(app)"""

        def __init__(self,stdin,stdout,stderr,environ,
            multithread=True, multiprocess=False
        ):
            self.stdin = stdin
            self.stdout = stdout
            self.stderr = stderr
            self.base_env = environ
            self.wsgi_multithread = multithread
            self.wsgi_multiprocess = multiprocess

        def get_stdin(self):
            return self.stdin

        def get_stderr(self):
            return self.stderr

        def add_cgi_vars(self):
            self.environ.update(self.base_env)

        def _write(self,data):
            result = self.stdout.write(data)
            if result is None or result == len(data):
                return
            from warnings import warn
            warn("SimpleHandler.stdout.write() should not do partial writes",
                DeprecationWarning)
            while True:
                data = data[result:]
                if not data:
                    break
                result = self.stdout.write(data)

        def _flush(self):
            self.stdout.flush()
            self._flush = self.stdout.flush


    # Part of the BaseHandler class
    class BaseHandler:

        def run(self, application):
            """Invoke the application"""
            # Note to self: don't move the close()!  Asynchronous servers shouldn't
            # call close() from finish_response(), so if you close() anywhere but
            # the double-error branch here, you'll break asynchronous servers by
            # prematurely closing.  Async servers must return from 'run()' without
            # closing if there might still be output to iterate over.
            try:
                self.setup_environ()
                self.result = application(self.environ, self.start_response)
                self.finish_response()
            except:
                try:
                    self.handle_error()
                except:
                    # If we get an error handling an error, just give up already!
                    self.close()
                    raise   # ...and let the actual server figure it out.
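The except branch of run can be observed from the outside by running an application that raises. As a sketch, failing_app and the in-memory streams below are hypothetical stand-ins, and SimpleHandler is used in place of ServerHandler so that no real request handler is needed for logging.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def failing_app(environ, start_response):
    # Raise before start_response is called, so no headers have been sent yet.
    raise RuntimeError("boom")


environ = {}
setup_testing_defaults(environ)

out = io.BytesIO()
err = io.StringIO()

handler = SimpleHandler(io.BytesIO(), out, err, environ)
handler.run(failing_app)  # the exception is caught inside run()

error_response = out.getvalue()
error_log = err.getvalue()
print(error_response.split(b"\r\n", 1)[0])
```

In this case handle_error writes the traceback to the error stream and, because no headers were sent yet, replaces the response with a generic 500 page; the exception does not propagate to the caller.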

The inheritance chain of ServerHandler is shown below:

    +---------------+
    |  BaseHandler  |
    +---------------+
            |
            v
    +---------------+
    | SimpleHandler |
    +---------------+
            |
            v
    +---------------+
    | ServerHandler |
    +---------------+
