Michiel De Hoon | 1 Aug 2007 03:50

Re: Improving the Alignment object

Peter wrote:
> I'm not sure if requests for part of a single row or column like
> [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning
> sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
> respectively?).

Jan wrote:
> For instance, the Alignment object should
> support changing characters in the alignment without needing to copy
> it (using aln[a,x] = "D"). Can that be done now, with Alignment being
> a list of SeqRecord objects whose sequences are implemented as immutable
> Seq objects?
>

If we allow
>>> aln[a,x] = "D"

then we should also allow
>>> aln[a,x:x+4] = "DEFG"
>>> aln[a:a+5,x] = "KLMNO"
and perhaps even
>>> aln[a:a+5,x:x+3] = ["KLMNO","PQRST","UVWXY"]

For consistency, I feel that then aln[a,x:y] and aln[a:b,x] should both
return a string.
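
A minimal sketch of how these assignments could behave, assuming the rows are kept as mutable lists of characters rather than immutable Seq objects. The class name MutableAlignment and its storage are hypothetical and are not the Biopython Alignment API. Block assignment is left out, because whether the assigned list runs by row or by column is a design decision still to be made.

```python
class MutableAlignment:
    """Hypothetical sketch: each row is stored as a list of single letters."""

    def __init__(self, rows):
        if len({len(row) for row in rows}) > 1:
            raise ValueError("all rows must have the same length")
        self._rows = [list(row) for row in rows]

    def __getitem__(self, key):
        row, col = key
        if isinstance(row, slice) and isinstance(col, slice):
            # A block is returned as a smaller alignment.
            return MutableAlignment(["".join(r[col]) for r in self._rows[row]])
        if isinstance(row, slice):
            # Part of one column, read top to bottom, as a plain string.
            return "".join(r[col] for r in self._rows[row])
        if isinstance(col, slice):
            # Part of one row, as a plain string.
            return "".join(self._rows[row][col])
        return self._rows[row][col]

    def __setitem__(self, key, value):
        row, col = key
        if isinstance(row, slice) and isinstance(col, slice):
            raise NotImplementedError("block assignment is left out of this sketch")
        if isinstance(row, slice):
            targets = self._rows[row]  # references to the stored row lists
            if len(targets) != len(value):
                raise ValueError("need one letter per selected row")
            for letters, letter in zip(targets, value):
                letters[col] = letter
        elif isinstance(col, slice):
            if len(self._rows[row][col]) != len(value):
                raise ValueError("need one letter per selected column")
            self._rows[row][col] = list(value)
        else:
            if len(value) != 1:
                raise ValueError("need exactly one letter")
            self._rows[row][col] = value

    def __str__(self):
        return "\n".join("".join(letters) for letters in self._rows)


aln = MutableAlignment(["ACGTAC", "ACGTAC", "ACGTAC"])
aln[0, 1] = "D"          # single letter
aln[1, 2:6] = "DEFG"     # part of one row
aln[0:3, 5] = "KLM"      # part of one column
row_part = aln[1, 2:6]   # a string
col_part = aln[0:3, 5]   # also a string
```

Nothing is copied on assignment here; the price is that the rows are no longer Seq objects.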

--Michiel

Michiel de Hoon
Center for Computational Biology and Bioinformatics

Peter | 8 Aug 2007 12:59

Re: Subversion Repository (moving from CVS to SVN)

Chris Lasher wrote:
>>> I'm obviously missing another target, and BOSC 2007 is fast
>>> approaching.
>>
>> Are you going to BOSC 2007 Chris?
> 
> I wish I were going to BOSC, but unfortunately, I will not go. 

While at BOSC 2007 I had a chance to chat to Jason Stajich from BioPerl 
and the Open Bioinformatics Foundation (OBF, the nice guys who look 
after our servers). The BioPerl project is looking at moving from CVS to 
SVN, and assuming that all goes smoothly, moving Biopython over as well 
should be simple enough.

Peter
Michiel De Hoon | 8 Aug 2007 04:57

Bio.Wise

Hi everybody,

Bio.Wise currently causes a deprecation warning when running the Biopython
tests (using Biopython from CVS).
This warning is caused by the deprecated Bio.SeqIO.FASTA:

# In Bio/Wise/__init__.py:
from Bio.SeqIO.FASTA import FastaReader, FastaWriter

FastaReader and FastaWriter are used as follows:

        for filename, input_file in zip(pair, input_files):
            input_file.close()
            FastaWriter(file(input_file.name, "w")).write(
                FastaReader(file(filename)).next())

To me, it looks like all this does is to read one Fasta record from filename,
and then store it in input_file.
I was wondering why we go through the Fasta reader/writer instead of
reading/writing the file contents directly, as in

        for filename, input_file in zip(pair, input_files):
            input_file.close()
            file(input_file.name, "w").write(file(filename).read())
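
One difference between the two loops is worth keeping in mind: the reader/writer version passes on only the first record, while the direct copy passes on the whole file. A minimal sketch, assuming an input that holds more than one record (the thread does not say whether that ever happens for Bio.Wise). The helper first_fasta_record is hypothetical and uses only the standard library, not Bio.SeqIO.

```python
def first_fasta_record(text):
    """Return the lines of the first FASTA record in text, header included."""
    record = []
    for line in text.splitlines():
        if line.startswith(">"):
            if record:
                break  # a second header marks the start of the next record
            record.append(line)
        elif record and line.strip():
            record.append(line)  # sequence line belonging to the first record
    return record


two_records = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"

# Direct copy: every record is passed on, with its original line breaks.
raw_copy = two_records

# Reader/writer style: only the first record is passed on.
first_only = "\n".join(first_fasta_record(two_records)) + "\n"
```

For single-record files the two results hold the same record; they can still differ in line wrapping if the writer reformats the sequence.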

On a related note, the input_file refers to a temporary file. To create this
temporary file, Bio.Wise prefers to use NamedTemporaryFile in the poly
module, instead of NamedTemporaryFile in the tempfile module:

try:

Michiel De Hoon | 9 Aug 2007 02:28

Re: Bio.Wise

Sebastian Bassi wrote:
> On 8/7/07, Michiel De Hoon <mdehoon <at> c2b2.columbia.edu> wrote:
> > I was wondering why we go through the Fasta reader/writer instead of
> > reading/writing the file contents directly, as in
> >         for filename, input_file in zip(pair, input_files):
> >             input_file.close()
> >             file(input_file.name, "w").write(file(filename).read())
> 
> The old Fasta writer used to write a 70-column formatted FASTA file.
> Your method (and, I think, the new SeqIO too) writes the FASTA data as
> one long line.

Peter, can we change the behavior of SeqIO.write so that it writes the fasta
data in some fixed column format? For comparison, Bioperl appears to use a
column width of 60 characters:

http://www.bioperl.org/wiki/FASTA_sequence_format

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
Peter | 9 Aug 2007 10:10

Re: Wrapping sequences in Fasta output

Michiel De Hoon wrote:
> Sebastian Bassi wrote:
>> On 8/7/07, Michiel De Hoon <mdehoon <at> c2b2.columbia.edu> wrote:
>>> I was wondering why we go through the Fasta reader/writer instead of
>>> reading/writing the file contents directly, as in
>>>         for filename, input_file in zip(pair, input_files):
>>>             input_file.close()
>>>             file(input_file.name, "w").write(file(filename).read())
>> The old Fasta writer used to write a 70-column formatted FASTA file.
>> Your method (and, I think, the new SeqIO too) writes the FASTA data as
>> one long line.

Maybe wise doesn't like its input as one long line?

> Peter, can we change the behavior of SeqIO.write so that it writes the fasta
> data in some fixed column format? For comparison, Bioperl appears to use a
> column width of 60 characters:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> --Michiel.

That would be easy, and might improve compatibility with some tools 
which recommend the lines be at most 80 letters long. 60 does seem to be 
considered a default.
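
A minimal sketch of fixed-width output, assuming a default of 60 letters per line that the caller can override or switch off. The function name format_fasta and its width parameter are hypothetical and are not the Bio.SeqIO interface.

```python
def format_fasta(title, sequence, width=60):
    """Return one FASTA record as text, wrapping the sequence at width letters.

    Passing width=None writes the whole sequence on a single line.
    """
    if width is None:
        width = max(len(sequence), 1)
    if width < 1:
        raise ValueError("width must be a positive integer or None")
    lines = [">" + title]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


wrapped = format_fasta("example", "A" * 130)                # lines of 60, 60, 10
unwrapped = format_fasta("example", "A" * 130, width=None)  # one long line
```

With width=None every record takes exactly two lines, which is the layout that makes the line number twice the record number.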

My personal preference is with no line breaks, partly because I tend to 
work more with domain sequences (usually less than 100 characters). This 
also means that when viewing a sequence in a text editor I can simply 
halve the line number to get the record number.

bugzilla-daemon | 9 Aug 2007 12:58

[Bug 2323] New functions: GCG Checksum and CRC64

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #26 from tiagoantao <at> gmail.com  2007-08-09 06:58 EST -------
Created an attachment (id=724)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=724&action=view)
Documentation for the GenePop parser

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
Sebastian Bassi | 9 Aug 2007 19:22

Re: Wrapping sequences in Fasta output

On 8/9/07, Peter <biopython-dev <at> maubp.freeserve.co.uk> wrote:
....
> Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files
> with a max sequence line length of 60.

Do you mean a default length of 60 that could be set to another length
if desired (as with the old Fasta writer)? That sounds good to me.

--

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318
bugzilla-daemon | 11 Aug 2007 07:35

[Bug 2323] New functions: GCG Checksum and CRC64

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #27 from mdehoon <at> ims.u-tokyo.ac.jp  2007-08-11 01:35 EST -------
I have committed the documentation for the GenePop parser to CVS.
Next time, please don't attach your patch to a bug report that is unrelated to
GenePop.

bugzilla-daemon | 11 Aug 2007 08:15

[Bug 2323] New functions: GCG Checksum and CRC64

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #28 from mdehoon <at> ims.u-tokyo.ac.jp  2007-08-11 02:15 EST -------
[Comment 22 from Peter]
> I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
> and noticed that while crc64, gcg and seguid will cope with both strings and
> Seq objects, crc32 will only cope with strings.
> 
> Any objections to me fixing this like so:

[Comment 24 from Michiel]
> A better solution would be for Seq to inherit from str, instead of Seq having
> str as a member. Then we don't have to modify crc32, and other code in
> Biopython will also become simpler.

[Comment 25 from Peter]
> Changing the Seq object to be a subclass of string might be nice... 
> More importantly, wouldn't this dramatic change break a lot of
> existing scripts? Probably something for the mailing list!

OK, so I have committed your solution from comment #22 to CVS.
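
For readers without the bug report to hand, here is a minimal sketch of the two designs discussed above. It is not the patch from comment #22, which is not reproduced in this message. MemberSeq and StrSeq are hypothetical stand-ins for a Seq that holds a string as a member and a Seq that inherits from str, and the crc32 helper assumes that str() returns the plain letters.

```python
import zlib


class MemberSeq:
    """Hypothetical Seq-like object that keeps its letters in a member."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


class StrSeq(str):
    """Hypothetical Seq-like object that inherits from str."""


def crc32(seq):
    """Return the CRC-32 of a plain string or a Seq-like object."""
    # str() gives the plain letters for all three kinds of input.
    return zlib.crc32(str(seq).encode("ascii"))


from_string = crc32("ACGT")
from_member = crc32(MemberSeq("ACGT"))
from_subclass = crc32(StrSeq("ACGT"))
```

All three calls give the same checksum. With the inheritance design, code that expects a string accepts the sequence unchanged; with the member design, each function has to convert explicitly.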

bugzilla-daemon | 11 Aug 2007 08:37

[Bug 2323] New functions: GCG Checksum and CRC64

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #29 from mdehoon <at> ims.u-tokyo.ac.jp  2007-08-11 02:37 EST -------
I have committed the unit test by Peter (from comment #23) to CVS, with some
slight modifications to remove the try/except at the end.

