Re: Python 3.0 and WSGI 1.0.
2009-05-04 15:02:48 GMT
Hello everybody,

I just recently started looking at supporting Python 3 with one of my libraries (Werkzeug), mainly because the MoinMoin project, which uses the library in question, is considering using it.

Right now Werkzeug treats HTTP as Unicode aware, in the sense that everything that carries text data is encoded to and decoded from a known encoding. This is partially against the specification and not entirely correct, but it works best in modern browsers and is also what Django and Paste are doing. Basically, the incoming request data is decoded with .decode(encoding) (usually utf-8) before being passed to the user code, and unicode data is encoded back into the same encoding before it's sent to the server.

Now why is the current behavior of Python 3 a problem here? The encode/decode hack from above is obviously a solution for these kinds of applications, albeit not a good one. Interfaces like mod_wsgi already have the data as a bytestring and would decode it from latin1 just so that the application can encode it back and decode it as utf-8. Not only is this slow, it also means that the code does not survive a run through 2to3. Now you could argue that the libraries were wrong in the first place and should support unicode strings that were encoded from latin1 and decoded, but it seems that very few libraries support that.

Now which strings carry data that could contain non-ascii characters from a source with an unknown encoding? Right now these are the following:

* PATH_INFO
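
To make the latin1 round trip described above concrete, here is a minimal sketch, assuming Python 3. The helper names server_side_decode and app_side_recode are hypothetical and only illustrate the idea; they are not part of mod_wsgi, Werkzeug, or any WSGI specification.

```python
# Raw bytes as a server might receive them: "/café" encoded as UTF-8.
raw_path = b"/caf\xc3\xa9"


def server_side_decode(raw):
    """Hypothetical server step: expose the bytes as a native string.

    latin1 maps every byte 0-255 to the code point with the same number,
    so this decode never fails and loses no information.
    """
    return raw.decode("latin1")


def app_side_recode(value, encoding="utf-8"):
    """Hypothetical application step: undo the latin1 decode and then
    decode the original bytes with the encoding the application expects.
    """
    return value.encode("latin1").decode(encoding)


# What the application would find in the environ (mojibake at this point).
environ_value = server_side_decode(raw_path)

# What the application actually wants to work with.
path = app_side_recode(environ_value)
```

Every such value is decoded twice and encoded once, which is the overhead mentioned above. On Python 2 the same value would already be a bytestring, so the extra .encode("latin1") step has no natural counterpart there.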