Ben O'Loghlin | 1 Nov 2009 01:58

Re: [Biopython] Entrez.read return value is typed as a string??

Thanks Peter, another small step up the learning curve!

Ben

-----Original Message-----
From: p.j.a.cock <at> googlemail.com [mailto:p.j.a.cock <at> googlemail.com] On Behalf
Of Peter
Sent: Friday, 30 October 2009 2:37 AM
To: Ben O'Loghlin
Cc: Michiel de Hoon; biopython <at> biopython.org
Subject: Re: [Biopython] Entrez.read return value is typed as a string??

On Thu, Oct 29, 2009 at 2:59 PM, Ben O'Loghlin <bassbabyface <at> yahoo.com>
wrote:
> Thanks Michiel.
>
> What is the function of the 'u' in the string discussed below?
> That's the bit that's got me confused.
>
> Best regards,
> Ben
>
> p.s. assistance on this list is fast and useful. Nice!

Again, it's a bit of Python basics rather than anything Biopython
specific. The u is for unicode, thus "fred" gives a normal string
while u"fred" gives a unicode string. Unless you are messing
about with odd foreign characters (e.g. letters with accents) you
won't have to worry about this. Python 3 gets rid of the dichotomy
by using unicode for all strings.
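
A minimal sketch of the point above, assuming Python 3.3 or later (where
the u prefix is accepted again but makes no difference). The variable
names are only for illustration:

```python
# In Python 3 every str is Unicode, so the u prefix is redundant.
plain = "fred"
prefixed = u"fred"

same_type = type(plain) is type(prefixed)  # both are str
same_value = plain == prefixed

# Accented letters are where the old distinction used to matter.
name = "M\u00fcller"              # "Müller", written with an escape
encoded = name.encode("utf-8")    # bytes, roughly the old Python 2 str
char_count = len(name)            # counts characters
byte_count = len(encoded)         # counts bytes; the accented letter needs two
```

For plain ASCII text such as "fred" the two spellings behave identically,
which is why the u can usually be ignored.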

Kyle Ellrott | 2 Nov 2009 21:06

[Biopython] Using SeqLocation to extract subsequence

This should be a relatively simple question, but I didn't find any google
hits...
I'm parsing a genbank file of a chromosome, and I want to take the
FeatureLocation data from a SeqFeature and extract the referenced DNA.
Basically take a 'CDS' feature and get the gene DNA that coded it.  Is there
a function that I can pass the location data from a feature record and it
will extract the DNA, including doing segment joining and reverse
translation?

I could write this myself, but it seems like a better idea to use something
that has been well tested.

Kyle
_______________________________________________
Biopython mailing list  -  Biopython <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

Peter | 2 Nov 2009 21:24

Re: [Biopython] Using SeqLocation to extract subsequence

On Mon, Nov 2, 2009 at 8:06 PM, Kyle Ellrott <kellrott <at> gmail.com> wrote:
> This should be a relatively simple question, but I didn't find any google
> hits...
>
> I'm parsing a genbank file of a chromosome, and I want to take the
> FeatureLocation data from a SeqFeature and extract the referenced DNA.
> Basically take a 'CDS' feature and get the gene DNA that coded it.  Is there
> a function that I can pass the location data from a feature record and it
> will extract the DNA, including doing segment joining and reverse
> translation?
>
> I could write this myself, but it seems like a better idea to use something
> that has been well tested.

You missed this thread earlier this month:
http://lists.open-bio.org/pipermail/biopython/2009-October/005695.html

Are you on the dev mailing list? I was hoping to get a little discussion
going there, before moving over to the discussion list for more general
comment. The code mentioned there is the best tested bit of code I
can suggest for now:
http://lists.open-bio.org/pipermail/biopython-dev/2009-October/006922.html

Note there is no such thing as a SeqLocation object. There is a
FeatureLocation, but you need the strand information - hence my
code requires a SeqFeature object to fully describe the location.
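
To illustrate why the strand matters, here is a rough pure-Python sketch.
It is not the code from the linked thread and not a Biopython API: the
function names, the plain-string sequence, and the (start, end) tuples
are all assumptions made for the example. It assumes 0-based,
end-exclusive coordinates and a location of the complement(join(...))
kind, where the joined forward-strand pieces are reverse complemented as
a whole:

```python
def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(dna.upper()))


def extract_subsequence(parent, segments, strand):
    """Join (start, end) segments of parent, honouring the strand.

    Hypothetical helper: segments are 0-based, end-exclusive pairs on
    the forward strand; strand is +1 or -1.
    """
    # Join the pieces in ascending order along the parent sequence.
    joined = "".join(parent[start:end] for start, end in sorted(segments))
    # A minus-strand feature reads the other strand, right to left.
    if strand == -1:
        joined = reverse_complement(joined)
    return joined


forward = extract_subsequence("AAACCCGGGTTT", [(0, 3), (6, 9)], +1)
reverse = extract_subsequence("AAACCCGGGTTT", [(0, 3), (6, 9)], -1)
```

The same two segments give different DNA depending on the strand, which
is the information a bare list of positions does not carry.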

Peter


Kyle Ellrott | 2 Nov 2009 22:31

Re: [Biopython] Using SeqLocation to extract subsequence

>
> You missed this thread earlier this month:
> http://lists.open-bio.org/pipermail/biopython/2009-October/005695.html
>
> Are you on the dev mailing list? I was hoping to get a little discussion
> going there, before moving over to the discussion list for more general
> comment.

I didn't need to do it when the original discussion came through, so it got
'filtered' ;-)  I guess if multiple people are asking the same question
independently, it's probably a timely issue.

I'll probably go ahead and pull the SeqRecord fork into my git fork and
start playing around with it.

Kyle

Peter | 2 Nov 2009 23:30

Re: [Biopython] Using SeqLocation to extract subsequence

On Mon, Nov 2, 2009 at 9:31 PM, Kyle Ellrott <kellrott <at> gmail.com> wrote:
>>
>> You missed this thread earlier this month:
>> http://lists.open-bio.org/pipermail/biopython/2009-October/005695.html
>>
>> Are you on the dev mailing list? I was hoping to get a little discussion
>> going there, before moving over to the discussion list for more general
>> comment.
>
> I didn't need to do it when the original discussion came through, so it got
> 'filtered' ;-)  I guess if multiple people are asking the same question
> independently, it's probably a timely issue.
>
> I'll probably go ahead and pull the SeqRecord fork into my git fork and
> start playing around with it.

Cool - sorry if the previous email was brusque - I was in the middle
of dinner preparation and shouldn't have been checking emails.

If you just want to try the sequence extraction for a SeqFeature,
the code is on the trunk (as noted, as a function in a unit test).
My SeqRecord github branch is looking at other issues.

Peter


Peter | 3 Nov 2009 13:52

Re: [Biopython] Problems parsing with PSIBlastParser

On Fri, Oct 16, 2009 at 1:04 AM, Michiel de Hoon <mjldehoon <at> yahoo.com> wrote:
>
> Last time I checked (which was a few weeks ago), a multiple-query PSIBlast
> search gives a file consisting of concatenated XML files. The problem is in
> the design of Blast XML output. For a single-query PSIBlast, the fields under
> <BlastOutput_iterations> are used to store the output of the PSIBlast iterations.
> For multiple-query regular Blast, the same fields are used to store the search
> results of each query. With multiple-query PSIBlast, there is then no way to
> store the output in the current XML format. I've been meaning to write to NCBI
> about this, but I haven't gotten round to it yet. Will do so this weekend.
>
> --Michiel.

Did you get any reply?

Peter

Michiel de Hoon | 3 Nov 2009 13:56

Re: [Biopython] Problems parsing with PSIBlastParser

> Did you get any reply?
> 
Yes, but just that they'll look into it. Nothing concrete yet, but I guess changing the Blast XML output is
something that needs to be done very carefully, so it may take a while. Will keep you guys posted if I get a reply.

--Michiel.

--- On Tue, 11/3/09, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> From: Peter <biopython <at> maubp.freeserve.co.uk>
> Subject: Re: [Biopython] Problems parsing with PSIBlastParser
> To: "Michiel de Hoon" <mjldehoon <at> yahoo.com>
> Cc: "Biopython Mailing List" <biopython <at> lists.open-bio.org>
> Date: Tuesday, November 3, 2009, 7:52 AM
> On Fri, Oct 16, 2009 at 1:04 AM,
> Michiel de Hoon <mjldehoon <at> yahoo.com>
> wrote:
> >
> > Last time I checked (which was a few weeks ago), a
> multiple-query PSIBlast
> > search gives a file consisting of concatenated XML
> files. The problem is in
> > the design of Blast XML output. For a single-query
> PSIBlast, the fields under
> > <BlastOutput_iterations> are used to store the
> output of the PSIBlast iterations.
> > For multiple-query regular Blast, the same fields are
> used to store the search
> > results of each query. With multiple-query PSIBlast,
> there is then no way to

Peter | 3 Nov 2009 14:32

Re: [Biopython] Problems parsing with PSIBlastParser

On Tue, Nov 3, 2009 at 1:16 PM, Chris Fields <cjfields <at> illinois.edu> wrote:
>
> We had the same problem w/ the BioPerl XML parser and ended up preprocessing
> the data into separate XML files, carrying over the relevant information
> into each file (yes, there is a better way, but it essentially involves a
> redesign of the XML parser and related objects).
>
> BTW, the same thing happens if one runs multiple queries in the same file.
>  All individual report XML are in one single XML file, and information
> relevant to all reports is only found in the first report.  I think this
> has been known for a while.  I've repeatedly tried contacting NCBI but
> haven't had a response re: this problem.
>
> chris

Hi Chris,

Old versions of blastall (also) used to produce concatenated XML files for
multiple queries, but from about 2.2.14 they started (ab)using the iteration
fields originally for PSI-BLAST output to hold multiple queries (there was
some discussion of this on Biopython Bugs 1933 and 1970 - Biopython
*should* cope with either).

Apparently blastpgp was left producing concatenated XML files.
The upshot of this is multi-query BLASTP etc XML files look just like
single query multi-round PSI-BLAST XML files. This means having a
single BLAST XML parser that automatically treats the two differently
is tricky.

Does that fit with your experience?
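
For the concatenated case, the preprocessing Chris describes could be
sketched roughly like this. It is an illustration only, not what BioPerl
or Biopython actually do: the function name is made up, the input is a
toy stand-in rather than real BLAST output, and it assumes every
document in the file starts with its own <?xml ...?> declaration:

```python
import re
import xml.etree.ElementTree as ET


def split_concatenated_xml(text):
    """Split text holding several XML documents into one string each.

    Hypothetical helper; assumes each document begins with its own
    <?xml ...?> declaration.
    """
    starts = [match.start() for match in re.finditer(r"<\?xml\b", text)]
    if not starts:
        # No declaration found: treat non-empty input as one document.
        return [text] if text.strip() else []
    ends = starts[1:] + [len(text)]
    return [text[begin:end].strip() for begin, end in zip(starts, ends)]


# Toy input standing in for a concatenated report (not real BLAST XML).
combined = (
    '<?xml version="1.0"?>\n<report><query>q1</query></report>\n'
    '<?xml version="1.0"?>\n<report><query>q2</query></report>\n'
)
documents = split_concatenated_xml(combined)
# Each piece can now be parsed on its own.
queries = [ET.fromstring(doc).findtext("query") for doc in documents]
```

This only separates the documents. It does not carry header information
from the first report over to the later ones, and it does nothing to
tell multi-query output apart from multi-round PSI-BLAST output.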

Chris Fields | 3 Nov 2009 14:16

Re: [Biopython] Problems parsing with PSIBlastParser

Peter,

On Nov 3, 2009, at 6:52 AM, Peter wrote:

> On Fri, Oct 16, 2009 at 1:04 AM, Michiel de Hoon  
> <mjldehoon <at> yahoo.com> wrote:
>>
>> Last time I checked (which was a few weeks ago), a multiple-query  
>> PSIBlast
>> search gives a file consisting of concatenated XML files. The  
>> problem is in
>> the design of Blast XML output. For a single-query PSIBlast, the  
>> fields under
>> <BlastOutput_iterations> are used to store the output of the  
>> PSIBlast iterations.
>> For multiple-query regular Blast, the same fields are used to store  
>> the search
>> results of each query. With multiple-query PSIBlast, there is then  
>> no way to
>> store the output in the current XML format. I've been meaning to  
>> write to NCBI
>> about this, but I haven't gotten round to it yet. Will do so this  
>> weekend.
>>
>> --Michiel.
>
> Did you get any reply?
>
> Peter


Chris Fields | 3 Nov 2009 14:40

Re: [Biopython] Problems parsing with PSIBlastParser

On Nov 3, 2009, at 7:32 AM, Peter wrote:

> On Tue, Nov 3, 2009 at 1:16 PM, Chris Fields <cjfields <at> illinois.edu>  
> wrote:
>>
>> We had the same problem w/ the BioPerl XML parser and ended up  
>> preprocessing
>> the data into separate XML files, carrying over the relevant  
>> information
>> into each file (yes, there is a better way, but it essentially  
>> involves a
>> redesign of the XML parser and related objects).
>>
>> BTW, the same thing happens if one runs multiple queries in the  
>> same file.
>>  All individual report XML are in one single XML file, and  
>> information
>> relevant to all reports is only found in the first report.  I
>> think this
>> has been known for a while.  I've repeatedly tried contacting NCBI  
>> but
>> haven't had a response re: this problem.
>>
>> chris
>
> Hi Chris,
>
> Old versions of blastall (also) used to produce concatenated XML  
> files for
> multiple queries, but from about 2.2.14 they started (ab)using the  

