1 Aug 2010 17:14
Re: Python 3 and encoding for online resources
Michiel de Hoon <mjldehoon <at> yahoo.com>
2010-08-01 15:14:23 GMT
According to this post:
http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3
we need only one parser, which always parses a byte stream.

Bio.Entrez uses File.UndoHandle, but only to look for potential errors in the first few lines when opening the Entrez URL. In my opinion we shouldn't be doing that anyway, since it's the parser's job to decide whether the input is well-formed. So I'd suggest not using File.UndoHandle at all, making sure our parser works with Python 3 byte streams, and asking users to open any downloaded Entrez XML files in binary mode.

Is there a Biopython version (in trunk or otherwise) that is ready for Python 3? If so, I can have a look at the parser to see if it handles byte streams correctly.

--Michiel.

--- On Tue, 7/27/10, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> From: Peter <biopython <at> maubp.freeserve.co.uk>
> Subject: [Biopython-dev] Python 3 and encoding for online resources
> To: "Biopython-Dev Mailing List" <biopython-dev <at> biopython.org>
> Date: Tuesday, July 27, 2010, 9:23 AM
>
> Hi all,
>
> One of the remaining (pure python) problems with Biopython
> under Python 3 relates to parsing online resources like the
> NCBI Entrez API or even Bio.ExPASy.get_sprot_raw().
> See for example test_SeqIO_online.py for a failure.
>
> In Python 2, urlopen from urllib or urllib2 would give a
> string handle. In Python 3, you get a bytes handle (not
> a unicode handle and choosing the encoding is tricky):
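
To illustrate the suggestion at the top of this message (a single parser that is always fed a byte stream), here is a minimal sketch using the standard library's expat bindings under Python 3. It is not the Bio.Entrez parser: the function name collect_ids and the sample XML are made up for illustration and are not real Entrez output. The point is only that the parser receives bytes and takes the encoding from the XML declaration, so the caller never has to pick one.

```python
import io
from xml.parsers import expat


def collect_ids(handle):
    """Return the text of every <Id> element read from a binary handle.

    Hypothetical helper for illustration; `handle` must be opened in
    binary mode (its read() must return bytes).
    """
    ids = []
    state = {"in_id": False, "text": []}

    def start(name, attrs):
        if name == "Id":
            state["in_id"] = True
            state["text"] = []

    def chars(data):
        # expat may deliver one text node in several chunks
        if state["in_id"]:
            state["text"].append(data)

    def end(name):
        if name == "Id":
            ids.append("".join(state["text"]))
            state["in_id"] = False

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    # ParseFile reads bytes from the handle; expat decodes them itself
    parser.ParseFile(handle)
    return ids


# Made-up sample document standing in for a downloaded XML file
sample = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b"<IdList><Id>123</Id><Id>456</Id></IdList>")
ids = collect_ids(io.BytesIO(sample))
```

With a file saved to disk, the same function would be called on a handle from open(filename, "rb"), which is the "binary mode" request to users mentioned above.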