bugzilla-daemon | 1 Feb 2010 01:21

[Bug 3004] PSL alignment format parsing in Bio.AlignIO

http://bugzilla.open-bio.org/show_bug.cgi?id=3004

------- Comment #2 from forgetta <at> gmail.com  2010-01-31 19:21 EST -------
Now on github:

http://github.com/vforget/PyBLATPSL

Vince

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
bugzilla-daemon | 1 Feb 2010 12:17

[Bug 3004] PSL alignment format parsing

http://bugzilla.open-bio.org/show_bug.cgi?id=3004

biopython-bugzilla <at> maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla <at> maubp.freeserve.co.
                   |                            |uk
            Summary|PSL alignment format parsing|PSL alignment format parsing
                   |in Bio.AlignIO              |

------- Comment #3 from biopython-bugzilla <at> maubp.freeserve.co.uk  2010-02-01 06:17 EST -------
(In reply to comment #2)
> Now on github:
> 
> http://github.com/vforget/PyBLATPSL
> 
> Vince
> 

Thanks for the link.

I don't see how this connects to sequence alignments for Bio.AlignIO as
suggested in your original comment (bug title edited accordingly). I see
you are parsing tabular output into an object, with additional methods for
scores etc. This looks fairly useful, but is not appropriate for the
Bio.AlignIO module. Maybe it can go under a new namespace instead, maybe
Bio.BLAT?
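
To illustrate what "parsing tabular output into an object, with additional methods for scores" can look like, here is a minimal sketch. It is not the code from the PyBLATPSL repository: the names `PslHit`, `parse_psl_line` and `percent_identity` are hypothetical, the 21 columns are assumed to follow the UCSC PSL layout, and the identity figure is a plain ratio rather than the score BLAT itself reports.

```python
from collections import namedtuple

# Assumed column order: the 21 tab-separated fields of a PSL line.
PSL_FIELDS = [
    "matches", "mismatches", "rep_matches", "n_count",
    "q_num_insert", "q_base_insert", "t_num_insert", "t_base_insert",
    "strand", "q_name", "q_size", "q_start", "q_end",
    "t_name", "t_size", "t_start", "t_end",
    "block_count", "block_sizes", "q_starts", "t_starts",
]
TEXT_FIELDS = {"strand", "q_name", "t_name"}
LIST_FIELDS = {"block_sizes", "q_starts", "t_starts"}

# Hypothetical record type, one per PSL line.
PslHit = namedtuple("PslHit", PSL_FIELDS)


def parse_psl_line(line):
    """Convert one PSL data line (no header) into a PslHit."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(PSL_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(PSL_FIELDS), len(parts)))
    values = []
    for name, text in zip(PSL_FIELDS, parts):
        if name in TEXT_FIELDS:
            values.append(text)
        elif name in LIST_FIELDS:
            # Comma-separated lists carry a trailing comma, e.g. "100,"
            values.append([int(x) for x in text.split(",") if x])
        else:
            values.append(int(text))
    return PslHit(*values)


def percent_identity(hit):
    """Simple identity: matching bases over aligned (non-gap) bases."""
    aligned = hit.matches + hit.rep_matches + hit.mismatches
    if aligned == 0:
        return 0.0
    return 100.0 * (hit.matches + hit.rep_matches) / aligned


example = "\t".join([
    "90", "10", "0", "0", "0", "0", "0", "0", "+",
    "query1", "100", "0", "100",
    "chr1", "1000", "200", "300",
    "1", "100,", "0,", "200,",
])
hit = parse_psl_line(example)
```

A record like this describes alignment coordinates and block layout but holds no aligned sequence strings, which is the reason it does not map onto the alignment objects Bio.AlignIO works with.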

[...]

bugzilla-daemon | 1 Feb 2010 12:27

[Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?

http://bugzilla.open-bio.org/show_bug.cgi?id=3000

------- Comment #2 from biopython-bugzilla <at> maubp.freeserve.co.uk  2010-02-01 06:26 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > Still, I suspect this will
> > reformat the entry (currently I see trailing dot removed from KEYWORDS, no
> > REFERENCE, AUTHORS, TITLE, JOURNAL, PUBMED; and FEATURES.source being
> > re-ordered).
> 
> Yes, using Bio.SeqIO to read/write a GenBank record will give you (slightly)
> different output. We do not guarantee a 100% round trip (even on simpler
> formats like FASTA). Even little things like line wrapping would make this
> very difficult.
> 
> Regarding GenBank KEYWORDS, please file a bug.

Don't worry about reporting a bug for this, I've just fixed the missing period
for KEYWORDS:

http://github.com/biopython/biopython/commit/5a87b070fc1f4fb911d4cf8a2e53c330cd6bd83d

Peter

bugzilla-daemon | 1 Feb 2010 14:35

[Bug 2294] Writing GenBank files with Bio.SeqIO

http://bugzilla.open-bio.org/show_bug.cgi?id=2294

------- Comment #17 from biopython-bugzilla <at> maubp.freeserve.co.uk  2010-02-01 08:35 EST -------
(In reply to comment #16)
> 
> > * Writing references
> 
> Not done yet, but for my personal needs this is low priority.

Reference output in GenBank format from SeqIO just committed on github,
http://github.com/biopython/biopython/commit/42707bda738d0239a9ff85a39c39c89c8024549d

> > * Extending to cover writing EMBL files
> 
> Not done yet, but should be comparatively straightforward. Let's track this
> possible enhancement on a separate bug.

EMBL output in SeqIO was done a while ago and was included in Biopython 1.52
(although we don't yet write references in EMBL output).

Things still to do on GenBank output include better handling of the LOCUS
line, such as the data division. See also Bug 2578 for the molecule type.

bugzilla-daemon | 1 Feb 2010 15:43

[Bug 2294] Writing GenBank files with Bio.SeqIO

http://bugzilla.open-bio.org/show_bug.cgi?id=2294

------- Comment #18 from biopython-bugzilla <at> maubp.freeserve.co.uk  2010-02-01 09:43 EST -------
(In reply to comment #17)
> 
> EMBL output in SeqIO was done a while ago and was included in Biopython 1.52
> (although we don't yet write references in EMBL output).

References in EMBL output implemented now:
http://github.com/biopython/biopython/commit/370e02053a45aec6209bd826aebab7bfc29d7e84

Peter | 2 Feb 2010 19:37

Getting raw unparsed records with SeqIO?

Hi all,

Over on enhancement Bug 3000, Martin was asking about
getting raw unparsed strings for each record in a sequence file:
http://bugzilla.open-bio.org/show_bug.cgi?id=3000

This makes sense for sequential files like FASTA and GenBank,
but not for interlaced files like PHYLIP, and has less obvious
uses when there is any kind of header or footer (e.g. XML or
SFF files).

The particular example Martin gave was selecting a subset of
records in a large GenBank file (I've done this myself in the past).
While this can be done via Bio.SeqIO, the process of parsing
the data into a SeqRecord and saving it again is lossy, although
there is room for improvement. For this particular example, I
suggested Martin use the "old" iterator class in Bio.GenBank.

In general things like white space and wrapping mean that a
SeqIO parse/write cannot guarantee a 100% unaltered round
trip, and will also be slower than using the raw record as a string.

Martin suggested adding an optional argument to the parse
function. I'm not sure this is a good API choice, as it would
dramatically alter the return values. Perhaps we could have
a new iterator function in Bio.SeqIO for suitable sequential
files only which returns a series of strings, one for each
record, unmodified?
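
As a rough illustration of such an iterator for one sequential format, the sketch below yields each FASTA record as the exact text found in the file. The function name `iter_raw_fasta` is hypothetical and not part of Bio.SeqIO; the sketch assumes records start at lines beginning with ">" and silently skips anything before the first one.

```python
import io


def iter_raw_fasta(handle):
    """Yield each FASTA record from a text handle as an unmodified string."""
    lines = []
    for line in handle:
        if line.startswith(">") and lines:
            # A new header closes the record collected so far.
            yield "".join(lines)
            lines = []
        if lines or line.startswith(">"):
            # Ignore any text that appears before the first header.
            lines.append(line)
    if lines:
        yield "".join(lines)


data = ">a first\nACGT\nAC\n>b second\nGG\n"
records = list(iter_raw_fasta(io.StringIO(data)))
```

Because nothing is parsed, line wrapping and white space survive untouched, so writing the selected strings back out reproduces those records exactly.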

Either way I don't see how this would be used - surely
[...]

bugzilla-daemon | 2 Feb 2010 19:40

[Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?

http://bugzilla.open-bio.org/show_bug.cgi?id=3000

------- Comment #3 from biopython-bugzilla <at> maubp.freeserve.co.uk  2010-02-02 13:40 EST -------
Created an attachment (id=1436)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1436&action=view)
Adds a get_raw method to the dictionaries returned by Bio.SeqIO.index()

Outline implementation of an alternative proposal, allowing access to the
raw text for each record via the Bio.SeqIO.index() dictionary-like objects.
See discussion here:
http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007301.html
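
The idea can be sketched in plain Python as an index of byte offsets with a `get_raw` method. This is not the attached patch: the class name `RawFastaIndex` is hypothetical, only FASTA is handled, and the first word of each header line is assumed to be the record identifier.

```python
class RawFastaIndex:
    """Map record identifiers to byte ranges in a FASTA file (sketch only)."""

    def __init__(self, filename):
        self._filename = filename
        self._ranges = {}  # identifier -> (start offset, length in bytes)
        with open(filename, "rb") as handle:
            key = None
            start = 0
            offset = 0
            for line in handle:
                if line.startswith(b">"):
                    if key is not None:
                        self._ranges[key] = (start, offset - start)
                    fields = line[1:].split(None, 1)
                    key = fields[0].decode() if fields else ""
                    start = offset
                offset += len(line)
            if key is not None:
                self._ranges[key] = (start, offset - start)

    def keys(self):
        return list(self._ranges)

    def get_raw(self, key):
        """Return the record's bytes exactly as stored in the file."""
        start, length = self._ranges[key]
        with open(self._filename, "rb") as handle:
            handle.seek(start)
            return handle.read(length)
```

Opening the file in binary mode keeps the offsets exact regardless of line-ending style, and selecting a subset of records becomes a matter of writing `index.get_raw(key)` for each wanted key to an output file opened in binary mode.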

Kristian Rother | 3 Feb 2010 11:29

report: what happens on 'from Bio import PDB'?


Hi,

I'm currently checking what my application is using its memory for
(because it uses way too much for non-Biopython related things). However,
as soon as the simple command

from Bio import PDB

is executed, these are the objects that Python has in memory after running
the gc:

1 <class 'codecs.CodecInfo'>
1 <class 'ctypes.PyDLL'>
1 <class 'ctypes._endian._swapped_meta'>
1 <class 'numpy.core.numeric._unspecified'>
1 <class 'numpy.lib._datasource._FileOpeners'>
1 <class 'numpy.lib.index_tricks.CClass'>
1 <class 'numpy.lib.index_tricks.RClass'>
1 <class 'numpy.ma.core.MaskedArray'>
1 <class 'numpy.ma.core._maximum_operation'>
1 <class 'numpy.ma.core._minimum_operation'>
1 <class 'numpy.ma.extras.mr_class'>
1 <class 'random.Random'>
1 <class 'site._Helper'>
1 <class 'string._TemplateMetaclass'>
1 <class 'unittest.TestLoader'>
1 <type 'NoneType'>
1 <type 'NotImplementedType'>
1 <type '_ctypes.ArrayType'>
[...]
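
A per-type tally like the one above can be produced along the following lines. This is a sketch of one way to do it, not necessarily the script used for the report; the helper names `count_objects_by_type` and `growth_after_import` are hypothetical. Note that `gc.get_objects()` only returns objects tracked by the garbage collector, so simple values such as ints and strings are not counted.

```python
import gc
import importlib
from collections import Counter


def count_objects_by_type():
    """Tally gc-tracked objects by type after forcing a collection."""
    gc.collect()
    return Counter(str(type(obj)) for obj in gc.get_objects())


def growth_after_import(module_name):
    """Return the types whose object counts increased after an import."""
    before = count_objects_by_type()
    importlib.import_module(module_name)
    after = count_objects_by_type()
    # Counter subtraction keeps only strictly positive differences.
    return after - before


# A standard-library module is used here so the sketch runs anywhere.
growth = growth_after_import("colorsys")
totals = count_objects_by_type()
```

Calling `growth_after_import("Bio.PDB")` in a fresh interpreter with Biopython installed, then sorting the result by count, should show which types the import itself adds, as opposed to objects that were already present.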

Laura Padioleu | 3 Feb 2010 12:35

Multiple alignment - Clustalw etc...

On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
>
> Hi Folks,
>
> this is a demo that I use to create then align my fasta sequences
> using clustalw. Hope it helps.
> Here's the code:
>
>from Bio.PDB.Polypeptide import PPBuilder
>
>def clustal(list_struc):
>
>    # one entry per unordered pair of structures
>    hash_table = {}
>    for i in range(len(list_struc)):
>        for j in range(i + 1, len(list_struc)):
>            pair = (list_struc[i], list_struc[j])
>            hash_table[pair] = 0
>
>    for pair in hash_table.keys():
>        fasta_fic = open("fasta.fasta", 'w')
>        for ID in pair:
>            fasta_fic.write(">" + ID.get_id() + '\n')
>
>            # retrieve the amino acid sequences
>            for chain in ID.get_chains():
>                ppb = PPBuilder()
>
[...]

Brad Chapman | 3 Feb 2010 13:46

Re: Multiple alignment - Clustalw etc...

Hi Laura;

[clustalw example from Cymon]

> I'm using Python version 2.5 but I can't compile this code correctly.
> What version of Python and Biopython are you using?

We could help more with some additional information. Could you copy
and paste the error message you are seeing?

Brad
