Reading the wsgiref Source Code



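Introduction

The Web Server Gateway Interface (WSGI) is the standard interface between web server software and web applications written in Python. wsgiref is a reference implementation of the WSGI specification defined in PEP 333, and can be used to add WSGI support to a web server or framework. wsgiref provides the following:

  • Utilities for manipulating WSGI environment variables
  • Handling of response headers
  • Base classes for implementing WSGI servers
  • A simple HTTP server
  • A validation tool that checks WSGI servers and applications for conformance to the WSGI specification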

A simple example

from wsgiref.simple_server import make_server

def hello_world_app(environ, start_response):
    """Every WSGI application must expose an application object: a
    callable that accepts the environ and start_response arguments.
    """
    status = "200 OK"
    headers = [('Content-Type', 'text/plain; charset=utf-8')]
    start_response(status, headers)
    return [b'Hello, World']

httpd = make_server('', 8000, hello_world_app)
print('Serving on port 8000 ...')
# Serve until the process is killed
httpd.serve_forever()


Run the code above and request it with curl -i localhost:8000. The result looks like this:

$ python test_wsgiref.py
Serving on port 8000 ...
127.0.0.1 - - [20/Oct/2018 15:10:13] "GET / HTTP/1.1" 200 12

$ curl -i localhost:8000
HTTP/1.0 200 OK
Date: Sat, 20 Oct 2018 07:10:13 GMT
Server: WSGIServer/0.1 Python/2.7.10
Content-Type: text/plain; charset=utf-8
Content-Length: 12

Hello, World%
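Because a WSGI application is just a callable, it can also be exercised without starting a server at all. The following is a minimal sketch, assuming Python 3; fake_start_response and captured are hypothetical names introduced only for this illustration, while setup_testing_defaults is the helper from wsgiref.util.

```python
from wsgiref.util import setup_testing_defaults


def hello_world_app(environ, start_response):
    start_response("200 OK", [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello, World']


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Hypothetical stand-in for the server's start_response callable:
    # it only records what the application passed in.
    captured['status'] = status
    captured['headers'] = headers


# Fill a fresh environ with the minimal keys a WSGI app may expect.
environ = {}
setup_testing_defaults(environ)

# Call the application directly and join the iterable it returns.
body = b''.join(hello_world_app(environ, fake_start_response))
print(captured['status'], body)
```

This is the same call that the server machinery eventually makes; everything else in wsgiref exists to build environ and to send the status, headers and body back over the socket.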

wsgiref source layout

The wsgiref source can be found in the cpython project on GitHub. Its layout is:

.
├── __init__.py
├── handlers.py
├── headers.py
├── simple_server.py
├── util.py
└── validate.py

* util -- miscellaneous useful functions and wrappers

* headers -- management of response headers

* handlers -- base classes for server/gateway implementations (the core processing logic)

* simple_server -- a simple BaseHTTPServer that supports WSGI

* validate -- a validation wrapper that sits between an app and a server to detect errors in either
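To give a feel for two of the smaller modules, here is a short sketch, assuming Python 3, that uses Headers from wsgiref.headers and two helpers from wsgiref.util. The header values and the file name report.txt are made up for the illustration.

```python
from wsgiref.headers import Headers
from wsgiref.util import request_uri, setup_testing_defaults

# headers: a mapping-like wrapper around a list of (name, value) tuples.
# Lookups are case-insensitive.
headers = Headers([('Content-Type', 'text/plain')])
headers.add_header('Content-Disposition', 'attachment', filename='report.txt')
print(headers['content-type'])
print(headers['Content-Disposition'])

# util: populate an environ with testing defaults, then rebuild the
# request URI from the values stored in it.
environ = {}
setup_testing_defaults(environ)
uri = request_uri(environ)
print(uri)
```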

The simple_server module

The simple example above used make_server from the simple_server module to start a WSGI server, so that is a natural entry point. Here is the source of make_server:

def make_server(
    host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
    """Create a new WSGI server listening on `host` and `port` for `app`"""
    # Instantiate WSGIServer. The constructor chain is:
    #  WSGIServer -> HTTPServer.__init__
    #             -> TCPServer.__init__
    #               -> TCPServer.server_bind
    #                 -> TCPServer.socket.bind (bind the socket to the listening address)
    #               -> TCPServer.server_activate
    #                 -> TCPServer.socket.listen (start listening for TCP connections)
    server = server_class((host, port), handler_class)
    server.set_app(app)
    return server

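As the code shows, make_server creates a server that listens on the given host and port. Once the server is running, each client request passes through WSGIServer and WSGIRequestHandler, which hand it to the application app; app returns the response.
The function is only a few lines long, but it shows what a WSGI server needs in order to start:

  • host, port: the address and port to listen on
  • server_class: listens on the port and accepts requests
  • handler_class: processes each request

The default server_class is WSGIServer. WSGIServer is a subclass of HTTPServer, HTTPServer is a subclass of TCPServer, and the base class of TCPServer is BaseServer. Instantiating WSGIServer therefore walks up the inheritance chain, and it is TCPServer that finally binds (bind) the socket and starts listening (listen).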
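The three ingredients listed above can be seen working together in a single round trip. This is a sketch under a few assumptions: Python 3.6 or later, port 0 so that the operating system picks a free port, and a hypothetical QuietHandler subclass that only silences request logging. It serves exactly one request with handle_request instead of serve_forever.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Hypothetical handler_class that suppresses the access log line."""

    def log_message(self, format, *args):
        pass


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello, World']


# host/port, server_class (default WSGIServer) and handler_class in action.
with make_server('127.0.0.1', 0, app, handler_class=QuietHandler) as httpd:
    port = httpd.server_port  # the port actually chosen by the OS

    # Serve a single request in the background, then stop.
    worker = threading.Thread(target=httpd.handle_request, daemon=True)
    worker.start()

    conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
    conn.request('GET', '/')
    response = conn.getresponse()
    status, body = response.status, response.read()
    conn.close()
    worker.join(timeout=5)

print(status, body)
```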

The roles of WSGIServer and WSGIRequestHandler

As mentioned above, a request received by the WSGI server is processed by WSGIServer and WSGIRequestHandler. What does each of them do?

[Figure: processing flow of the simple_server module]

WSGIServer mainly wraps the socket connection and hands each incoming request to WSGIRequestHandler, which parses the HTTP request. Let's look inside WSGIServer to see exactly what the class does:

class WSGIServer(HTTPServer):

    """BaseHTTPServer that implements the Python WSGI protocol"""

    application = None

    def server_bind(self):
        """Override server_bind to store the server name."""
        HTTPServer.server_bind(self)
        self.setup_environ()

    def setup_environ(self):
        # Set up base environment
        env = self.base_environ = {}
        env['SERVER_NAME'] = self.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        env['SERVER_PORT'] = str(self.server_port)
        env['REMOTE_HOST']=''
        env['CONTENT_LENGTH']=''
        env['SCRIPT_NAME'] = ''

    def get_app(self):
        return self.application

    def set_app(self,application):
        self.application = application

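As the code shows, WSGIServer inherits from HTTPServer and adds a few things required by the WSGI specification:

  • It overrides server_bind, which now also initializes the base environ dictionary (base_environ) through setup_environ
  • It provides get_app and set_app to get or set the WSGI application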
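Both additions can be observed on a live server object. The sketch below assumes Python 3.6 or later and binds to port 0 so that any free port is used; no request is served, the server is only created and inspected.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


with make_server('127.0.0.1', 0, app) as httpd:
    port = httpd.server_port
    # base_environ was filled in by setup_environ() during server_bind().
    base = dict(httpd.base_environ)
    # set_app() was called by make_server(); get_app() returns the same object.
    same_app = httpd.get_app() is app

for key in sorted(base):
    print(key, '=', repr(base[key]))
```

These six keys are the per-server part of environ; the per-request part is added later by WSGIRequestHandler.get_environ.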

Here is the inheritance chain of WSGIServer:

        +------------+
        | BaseServer |
        +------------+
              |
              v
        +------------+        +------------------+
        | TCPServer  |------->| UnixStreamServer |
        +------------+        +------------------+
              |
              v
        +------------+ 
        | HTTPServer |
        +------------+ 
              |
              v
        +------------+ 
        | WSGIServer |
        +------------+  

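As the chain shows, WSGIServer inherits from HTTPServer, HTTPServer inherits from TCPServer, and TCPServer inherits from BaseServer. WSGIServer itself is defined in wsgiref.simple_server, HTTPServer comes from server.py in the http package, and TCPServer and BaseServer come from the socketserver module.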
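The chain can be checked from the interpreter by printing the method resolution order. A small sketch, assuming Python 3; server_chain is just a name chosen for this example.

```python
from wsgiref.simple_server import WSGIServer

# Walk the method resolution order and record where each class lives.
server_chain = [cls.__name__ for cls in WSGIServer.__mro__]
for cls in WSGIServer.__mro__:
    print(cls.__module__, cls.__name__)
```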

Next, the implementation of the WSGIRequestHandler class:

class WSGIRequestHandler(BaseHTTPRequestHandler):

    server_version = "WSGIServer/" + __version__

    def get_environ(self):
        env = self.server.base_environ.copy()
        env['SERVER_PROTOCOL'] = self.request_version
        env['SERVER_SOFTWARE'] = self.server_version
        env['REQUEST_METHOD'] = self.command
        if '?' in self.path:
            path,query = self.path.split('?',1)
        else:
            path,query = self.path,''

        env['PATH_INFO'] = urllib.parse.unquote(path, 'iso-8859-1')
        env['QUERY_STRING'] = query

        host = self.address_string()
        if host != self.client_address[0]:
            env['REMOTE_HOST'] = host
        env['REMOTE_ADDR'] = self.client_address[0]

        if self.headers.get('content-type') is None:
            env['CONTENT_TYPE'] = self.headers.get_content_type()
        else:
            env['CONTENT_TYPE'] = self.headers['content-type']

        length = self.headers.get('content-length')
        if length:
            env['CONTENT_LENGTH'] = length

        for k, v in self.headers.items():
            k=k.replace('-','_').upper(); v=v.strip()
            if k in env:
                continue                    # skip content length, type,etc.
            if 'HTTP_'+k in env:
                env['HTTP_'+k] += ','+v     # comma-separate multiple headers
            else:
                env['HTTP_'+k] = v
        return env

    def get_stderr(self):
        return sys.stderr

    def handle(self):
        """Handle a single HTTP request"""
        # Read the request line sent by the client
        self.raw_requestline = self.rfile.readline(65537)
        # If the request URI is too long, respond with a 414 error
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(414)
            return
        # Parse the client's request line and request headers
        if not self.parse_request(): # An error code has been sent, just exit
            return

        # Invoke the WSGI application through ServerHandler
        handler = ServerHandler(
            self.rfile, self.wfile, self.get_stderr(), self.get_environ()
        )
        handler.request_handler = self      # backpointer for logging
        handler.run(self.server.get_app())

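As the code shows, WSGIRequestHandler inherits from BaseHTTPRequestHandler, whose main job is to handle client HTTP requests. WSGIRequestHandler adds the WSGI-specific pieces on top of it. The class provides these methods:

  • get_environ: builds the environ dictionary for the request, starting from a copy of the server's base_environ and adding request-specific variables
  • handle: handles a single HTTP request; it passes the assembled environ to ServerHandler and calls its run method to run the WSGI application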
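The last loop in get_environ is the least obvious part: it turns request headers into HTTP_* keys, skips headers already stored under their own name (such as CONTENT_TYPE and CONTENT_LENGTH), and joins repeated headers with commas. The following standalone sketch mirrors that loop; add_http_headers is a hypothetical helper written for this illustration, not a function in wsgiref, and the header values are made up.

```python
def add_http_headers(env, header_items):
    """Mirror of the header loop in WSGIRequestHandler.get_environ."""
    for name, value in header_items:
        key = name.replace('-', '_').upper()
        value = value.strip()
        if key in env:
            continue  # already present, e.g. CONTENT_TYPE / CONTENT_LENGTH
        if 'HTTP_' + key in env:
            env['HTTP_' + key] += ',' + value  # comma-join repeated headers
        else:
            env['HTTP_' + key] = value
    return env


env = {'CONTENT_TYPE': 'text/plain', 'CONTENT_LENGTH': '5'}
add_http_headers(env, [
    ('Content-Type', 'text/plain'),
    ('Accept', 'text/html'),
    ('Accept', 'application/json'),
    ('X-Request-Id', ' abc '),
])
print(env)
```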

Here is the inheritance chain of WSGIRequestHandler:

        +------------------------+
        |   BaseRequestHandler   |
        +------------------------+
                    |
                    v
        +------------------------+
        |  StreamRequestHandler  |
        +------------------------+
                    |
                    v
        +------------------------+ 
        | BaseHTTPRequestHandler |
        +------------------------+ 
                    |
                    v
        +------------------------+ 
        |    WSGIRequestHandler  |
        +------------------------+ 
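To see which class in this chain actually supplies a given method, one can walk the method resolution order. This is a sketch assuming Python 3; defining_class is a hypothetical helper introduced for the example.

```python
from wsgiref.simple_server import WSGIRequestHandler


def defining_class(cls, name):
    """Return the name of the first class in the MRO that defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass.__name__
    return None


origins = {
    name: defining_class(WSGIRequestHandler, name)
    for name in ('handle', 'get_environ', 'parse_request', 'setup')
}
for name, owner in origins.items():
    print(name, '<-', owner)
```

WSGIRequestHandler only overrides the WSGI-specific parts; request parsing is inherited from BaseHTTPRequestHandler.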

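The role of ServerHandler

ServerHandler takes as arguments the read end of the socket (self.rfile), the write end (self.wfile), the error stream (the return value of self.get_stderr()), and a dictionary holding the request information (the return value of self.get_environ()). get_environ is the environ-building method described above; it returns a dictionary that contains both the server-level environment variables and the variables of the current request.

ServerHandler inherits from SimpleHandler, and SimpleHandler inherits from BaseHandler. Here is the source of ServerHandler and its base classes:

# The ServerHandler class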
class ServerHandler(SimpleHandler):

    server_software = software_version

    def close(self):
        try:
            self.request_handler.log_request(
                self.status.split(' ',1)[0], self.bytes_sent
            )
        finally:
            SimpleHandler.close(self)

# The SimpleHandler class
class SimpleHandler(BaseHandler):
    """Handler that's just initialized with streams, environment, etc.

    This handler subclass is intended for synchronous HTTP/1.0 origin servers,
    and handles sending the entire response output, given the correct inputs.

    Usage::

        handler = SimpleHandler(
            inp,out,err,env, multithread=False, multiprocess=True
        )
        handler.run(app)"""

    def __init__(self,stdin,stdout,stderr,environ,
        multithread=True, multiprocess=False
    ):
        self.stdin = stdin
        self.stdout = stdout
        self.stderr = stderr
        self.base_env = environ
        self.wsgi_multithread = multithread
        self.wsgi_multiprocess = multiprocess

    def get_stdin(self):
        return self.stdin

    def get_stderr(self):
        return self.stderr

    def add_cgi_vars(self):
        self.environ.update(self.base_env)

    def _write(self,data):
        result = self.stdout.write(data)
        if result is None or result == len(data):
            return
        from warnings import warn
        warn("SimpleHandler.stdout.write() should not do partial writes",
            DeprecationWarning)
        while True:
            data = data[result:]
            if not data:
                break
            result = self.stdout.write(data)

    def _flush(self):
        self.stdout.flush()
        self._flush = self.stdout.flush

# Part of the BaseHandler class
class BaseHandler:
    def run(self, application):
        """Invoke the application"""
        # Note to self: don't move the close()!  Asynchronous servers shouldn't
        # call close() from finish_response(), so if you close() anywhere but
        # the double-error branch here, you'll break asynchronous servers by
        # prematurely closing.  Async servers must return from 'run()' without
        # closing if there might still be output to iterate over.
        try:
            self.setup_environ()
            self.result = application(self.environ, self.start_response)
            self.finish_response()
        except:
            try:
                self.handle_error()
            except:
                # If we get an error handling an error, just give up already!
                self.close()
                raise   # ...and let the actual server figure it out.
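The run() flow shown above can be tried without any socket by giving SimpleHandler in-memory streams in place of rfile and wfile. This is a minimal sketch, assuming Python 3; the environ comes from setup_testing_defaults rather than from a real request, and the stream variable names are chosen for the example.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def hello_world_app(environ, start_response):
    start_response("200 OK", [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello, World']


# Stand-ins for what WSGIRequestHandler.handle() would pass in.
environ = {}
setup_testing_defaults(environ)
stdin = io.BytesIO()     # plays the role of self.rfile
stdout = io.BytesIO()    # plays the role of self.wfile
stderr = io.StringIO()   # plays the role of self.get_stderr()

handler = SimpleHandler(stdin, stdout, stderr, environ)
handler.run(hello_world_app)  # setup_environ -> application -> finish_response

raw_response = stdout.getvalue()
print(raw_response.decode('iso-8859-1'))
```

The bytes written to stdout are the full HTTP response: status line, headers (including a Content-Length computed by the handler), a blank line and the body.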

Here is the inheritance chain of ServerHandler:

        +---------------+
        |  BaseHandler  |
        +---------------+
                |
                v
        +---------------+
        | SimpleHandler |
        +---------------+
                |
                v
        +---------------+
        | ServerHandler |
        +---------------+
