bugzilla-daemon | 1 Aug 11:41 2008

[Bug 2561] New: SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

           Summary: SeqRecord format method to get a string in a given file
                    format
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev <at> biopython.org
        ReportedBy: biopython-bugzilla <at> maubp.freeserve.co.uk

If you have a SeqRecord, it is sometimes useful to be able to convert it
into a FASTA format string, or indeed any suitable file format.  Note that this
only makes sense for file formats which support a single record, such as
sequential formats like FASTA, GenBank, EMBL, SwissProt, ...

See http://portal.open-bio.org/pipermail/biopython-dev/2008-June/003793.html

PEP 3101 "Advanced String Formatting" describes a new __format__ method for
objects wishing to support the new python format() function in Python 2.6 and
3.0, see http://www.python.org/dev/peps/pep-3101/

In the short term we could expose this functionality as a method named 
tostring(), to_string(), to_format() or some other suitable suggestion.  Using
tostring() would be consistent with the Bio.Seq.Seq and Bio.Seq.MutableSeq
objects (although those do not take a format argument).
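
A rough sketch of the idea, not the eventual patch: the Record class below is a hypothetical stand-in for SeqRecord (no Biopython needed), to_format is just one of the candidate names listed above, and only a simplified FASTA layout is handled. It also shows how the same method could be reached through the PEP 3101 __format__ hook.

```python
class Record:
    """Hypothetical stand-in for SeqRecord, for illustration only."""

    def __init__(self, record_id, seq, description=""):
        self.id = record_id
        self.seq = seq
        self.description = description

    def to_format(self, format_name):
        """Return this single record as a string in the named file format."""
        if format_name == "fasta":
            title = " ".join(part for part in (self.id, self.description) if part)
            # Wrap the sequence at 60 characters per line (a common FASTA convention).
            lines = [self.seq[i:i + 60] for i in range(0, len(self.seq), 60)]
            return ">" + title + "\n" + "\n".join(lines) + "\n"
        raise ValueError("Unsupported format: %r" % format_name)

    def __str__(self):
        return "ID: %s\nSeq: %s" % (self.id, self.seq)

    def __format__(self, format_spec):
        # PEP 3101 hook: format(record, "fasta") ends up here.
        if not format_spec:
            # No format given: fall back to the str() behaviour.
            return str(self)
        return self.to_format(format_spec)


record = Record("demo1", "ACGTACGT", "made-up example")
fasta_text = format(record, "fasta")
print(fasta_text)
```

With a hook like this, "{:fasta}".format(record) would give the same text as record.to_format("fasta").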

bugzilla-daemon | 1 Aug 12:01 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #1 from biopython-bugzilla <at> maubp.freeserve.co.uk  2008-08-01 06:01 EST -------
We've had several people request this functionality, and I am keen to add
this.  I think the only issue is the naming of the function (and any default
behaviour - for example calling the __str__ method if given no format).

P.S. As an obvious extension of this idea, it would make sense to me to add a
similar method to the Alignment object using Bio.AlignIO internally.

--

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
bugzilla-daemon | 1 Aug 12:19 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #2 from mdehoon <at> ims.u-tokyo.ac.jp  2008-08-01 06:19 EST -------
I would be in favor of adding a .tostring(format) method to the SeqRecord
class.
If I am not mistaken, such a method would make SeqIO.write superfluous:

for record in records:
    handle.write(record.tostring(format))

does the same thing as

Bio.SeqIO.write(handle, records, format)

To keep the Biopython API clean, I would therefore suggest to add
record.tostring(format) and to remove SeqIO.write (after properly deprecating
it and having a bunch of releases with SeqIO.write deprecated).

bugzilla-daemon | 1 Aug 12:26 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #3 from biopython-bugzilla <at> maubp.freeserve.co.uk  2008-08-01 06:26 EST -------
(In reply to comment #2)
> I would be in favor of adding a .tostring(format) method to the SeqRecord
> class.

OK.

> If I am not mistaken, such a method would make SeqIO.write superfluous:
> 
> for record in records:
>     handle.write(record.tostring(format))
> 
> does the same thing as
> 
> Bio.SeqIO.write(handle, records, format)

This would do the same thing ONLY for sequential file formats (which admittedly
are the most commonly used ones).  It wouldn't work for anything more
structured with a file header/footer (e.g. any XML format, and most alignment
file formats).
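
To make the distinction concrete, here is a toy sketch. None of these functions are Biopython functions, and the XML-like layout is invented purely for illustration. For the FASTA-like format, joining the per-record strings gives the same text as writing the whole file; for the XML-like format, the per-record strings lack the single enclosing header and footer.

```python
def fasta_record(name, seq):
    return ">%s\n%s\n" % (name, seq)


def fasta_file(records):
    # Sequential format: the file is just the records one after another.
    return "".join(fasta_record(name, seq) for name, seq in records)


def xml_record(name, seq):
    return '  <record id="%s">%s</record>\n' % (name, seq)


def xml_file(records):
    # Structured format: one header and one footer around all the records.
    body = "".join(xml_record(name, seq) for name, seq in records)
    return "<records>\n" + body + "</records>\n"


records = [("alpha", "ACGT"), ("beta", "GGCC")]
per_record_fasta = "".join(fasta_record(n, s) for n, s in records)
per_record_xml = "".join(xml_record(n, s) for n, s in records)

# Concatenation matches the whole-file writer only for the sequential format.
print(per_record_fasta == fasta_file(records))
print(per_record_xml == xml_file(records))
```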

> To keep the Biopython API clean, I would therefore suggest to add
> record.tostring(format) and to remove SeqIO.write (after properly deprecating
> it and having a bunch of releases with SeqIO.write deprecated).

I don't think we can or should deprecate Bio.SeqIO.write() for the reason
above.

bugzilla-daemon | 1 Aug 13:38 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #4 from biopython-bugzilla <at> maubp.freeserve.co.uk  2008-08-01 07:38 EST -------
(In reply to comment #2)
> I would be in favor of adding a .tostring(format) method to the SeqRecord
> class.

Were you happy with making the format optional, and defaulting to the full
sequence as a plain string (as in comment 0 of this bug)?

bugzilla-daemon | 1 Aug 15:36 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #5 from mdehoon <at> ims.u-tokyo.ac.jp  2008-08-01 09:36 EST -------
> > If I am not mistaken, such a method would make SeqIO.write superfluous:
> > 
> > for record in records:
> >     handle.write(record.tostring(format))
> > 
> > does the same thing as
> > 
> > Bio.SeqIO.write(handle, records, format)
> 
> This would do the same thing ONLY for sequential file formats (which admittedly
> are the most commonly used ones).  It wouldn't work for anything more
> structured with a file header/footer (e.g. any XML format, and most alignment
> file formats).

I see. Then indeed we still need Bio.SeqIO.write.

bugzilla-daemon | 1 Aug 15:37 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #6 from mdehoon <at> ims.u-tokyo.ac.jp  2008-08-01 09:37 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > I would be in favor of adding a .tostring(format) method to the SeqRecord
> > class.
> 
> Were you happy with making the format optional, and defaulting to the full
> sequence as a plain string (as in comment 0 of this bug)?
> 
Yes that makes sense.

bugzilla-daemon | 1 Aug 15:41 2008

[Bug 2446] Comments in CT tags cause Bio.Sequencing.Ace.ACEParser to fail.

http://bugzilla.open-bio.org/show_bug.cgi?id=2446

------- Comment #5 from mdehoon <at> ims.u-tokyo.ac.jp  2008-08-01 09:41 EST -------
Some information about these comment blocks from the polyphred developers:

---------------
They are intentional, though I'm not sure they are limited to
Polyphred's tags.

The format that I have typically seen is more like this:

CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
First line.
Second line.
C}
}

Specifically, the CT block always seems to end with the regex '^}$' and
the COMMENT block always ends with '^C}$'. I assume the literal 'C' was
added on the assumption that non-COMMENT-aware parsers would always be
looking for the brace at the beginning of the line. It's not exactly a
C-like, flexible-whitespace format.
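
A minimal sketch of how a parser could apply those two end markers. The function name read_ct_block and the returned dictionary layout are invented for this example; this is not the Bio.Sequencing.Ace code.

```python
import re

CT_END = re.compile(r"^}$")        # closes the CT block
COMMENT_END = re.compile(r"^C}$")  # closes a nested COMMENT block


def read_ct_block(lines):
    """Parse one CT{...} block given as a list of lines.

    Returns a dict holding the tag lines and any COMMENT{...} text.
    """
    lines = [line.rstrip("\n") for line in lines]
    if not lines or lines[0] != "CT{":
        raise ValueError("Expected a line reading 'CT{'")
    tag_lines = []
    comment_lines = []
    in_comment = False
    for line in lines[1:]:
        if in_comment:
            # Inside COMMENT{...}: only 'C}' ends it, so a bare '}' is kept as text.
            if COMMENT_END.match(line):
                in_comment = False
            else:
                comment_lines.append(line)
        elif line == "COMMENT{":
            in_comment = True
        elif CT_END.match(line):
            return {"tag": tag_lines, "comment": comment_lines}
        else:
            tag_lines.append(line)
    raise ValueError("CT block was not closed by a line reading '}'")


example = """CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
First line.
Second line.
C}
}""".splitlines()

block = read_ct_block(example)
print(block["tag"])
print(block["comment"])
```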

In Consed (13.95 Beta; don't ask) adding a tag with a comment produces
this format in the ACE file. I don't know whether this has been changed
in later versions.

Admittedly, the latest Consed documentation does not mention this style.

bugzilla-daemon | 1 Aug 16:49 2008

[Bug 2561] SeqRecord format method to get a string in a given file format

http://bugzilla.open-bio.org/show_bug.cgi?id=2561

------- Comment #7 from biopython-bugzilla <at> maubp.freeserve.co.uk  2008-08-01 10:48 EST -------
Created an attachment (id=981)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=981&action=view)
Patch to Bio/SeqRecord.py and Bio/Align/Generic.py

This is a little different from the above suggestion:

(a) I am calling the method .to_format() rather than .tostring().

I think this makes it clearer that it is intended to give some kind of file
format, rather than being a variation on the str(...) functionality.  Also,
this name seems to match the planned Python 2.6/3.0 feature fairly well.

We've already labeled the Seq/MutableSeq .tostring() method as "old" and
suggest using str(my_seq) in the documentation instead.  Introducing a new
method for other objects called .tostring() could be seen as a step backwards.

(b) There is no default format.

While for the SeqRecord, using the raw sequence as a string makes a good
file-format-neutral choice, there is no obvious equivalent for the
Alignment object.  Defaulting to FASTA format in both cases might make sense.

On the other hand, the new format() functionality in python will default to
using the str() behaviour in the absence of a format:

http://www.python.org/dev/peps/pep-3101/
> For all built-in types, an empty format specification will produce
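
The fallback to str() when no format is given can be checked directly in Python 3. A small sketch, using a made-up Plain class that defines __str__ but no __format__ of its own:

```python
class Plain:
    """Made-up class with __str__ but no __format__ of its own."""

    def __str__(self):
        return "plain text"


obj = Plain()
# With an empty format specification, format() falls back to str().
empty_spec_result = format(obj, "")
print(empty_spec_result == str(obj))
print(format(42, "") == str(42))
```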

bugzilla-daemon | 1 Aug 18:09 2008

[Bug 2446] Comments in CT tags cause Bio.Sequencing.Ace.ACEParser to fail.

http://bugzilla.open-bio.org/show_bug.cgi?id=2446

mdehoon <at> ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from mdehoon <at> ims.u-tokyo.ac.jp  2008-08-01 12:09 EST -------
Fixed in CVS.
Please use Ace.read(handle) instead of Ace.ACEParser().parse(handle),
and Ace.parse(handle) instead of Ace.Iterator(handle, Ace.RecordParser()).
