Peter | 10 Dec 19:39 2005

Bio.Geo for NCBI's GEO microarray SOFT files

I've just been looking at the Bio.Geo module by Katharine Lindner, 
contributed back in 2002, which should parse the NCBI's Gene Expression 
Omnibus (GEO) microarray data files.

http://www.ncbi.nlm.nih.gov/geo/

Is anyone using Bio.Geo at the moment?

The NCBI seem to call these SOFT files (*.soft), and the format is 
documented here:

http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat

Apparently they began switching to a revised file format in 2005. New 
format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/

Old format files here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old/
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old_gz/

As far as I can tell, neither the "old" nor the "new" versions work in 
Bio.Geo, so there may have been another format change between 2002 and 2005.

In addition, the 2005 change introduces new marker lines before and after 
the actual data:

!dataset_table_begin
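
For anyone wanting to check their own files, the table markers can be handled with a few lines of plain Python. This is only a sketch, not Bio.Geo code: it assumes the closing marker is `!dataset_table_end`, that the rows between the markers are tab-separated with a header row first, and `read_soft_table` is a hypothetical helper name. The sample lines are made up for illustration.

```python
def read_soft_table(lines):
    """Return (header, rows) for the data table in a GEO SOFT dataset file.

    Assumes the table sits between '!dataset_table_begin' and
    '!dataset_table_end' (the closing marker is an assumption here)
    and that the first line of the table is a tab-separated header.
    """
    header = None
    rows = []
    in_table = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "!dataset_table_begin":
            in_table = True
        elif line == "!dataset_table_end":  # assumed closing marker
            break
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields
            else:
                rows.append(fields)
    return header, rows


# Made-up sample input, not taken from a real GEO file.
example = [
    "^DATASET = GDS0000\n",
    "!dataset_title = made-up example\n",
    "!dataset_table_begin\n",
    "ID_REF\tIDENTIFIER\tGSM1\tGSM2\n",
    "1\tgeneA\t0.5\t1.5\n",
    "2\tgeneB\t2.0\tnull\n",
    "!dataset_table_end\n",
]
header, rows = read_soft_table(example)
```

Everything outside the two markers is ignored, so the same loop can be run on files with or without the extra lines; a file without them simply yields no header and no rows.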

Ivan Rossi | 13 Dec 21:35 2005

tiny Align.AlignInfo patch


Dear BioPythoneers,
   I am submitting a tiny patch to the pos_specific_score_matrix method of 
Bio.Align.AlignInfo.

It allows the generation of PSSMs composed of the "alphabet+gap" symbols. 
I use it all the time to generate 21-symbol PSSMs for proteins, which we use 
as inputs for neural networks and HMMs.

The patch is not invasive at all and it preserves the default behavior of 
AlignInfo.pos_specific_score_matrix().
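
As a rough illustration of the idea (this is not the patch itself, and not the AlignInfo API), a pure-Python sketch of counting an "alphabet+gap" PSSM could look like the following. The function name `count_pssm`, its parameters, and the toy alignment are all hypothetical.

```python
def count_pssm(aligned_seqs, symbols, chars_to_ignore=()):
    """Count how often each symbol occurs in every alignment column.

    Returns one dict per column, keyed by the characters in `symbols`.
    Letters listed in `chars_to_ignore`, and letters that are not in
    `symbols`, are skipped.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        counts = dict.fromkeys(symbols, 0)
        for seq in aligned_seqs:
            letter = seq[pos]
            if letter in chars_to_ignore:
                continue
            if letter in counts:
                counts[letter] += 1
        pssm.append(counts)
    return pssm


alignment = ["AC-T", "ACGT", "A--T"]  # toy DNA alignment, made up
with_gaps = count_pssm(alignment, "ACGT-")
without_gaps = count_pssm(alignment, "ACGT", chars_to_ignore="-")
```

Adding the gap character to `symbols` gives one extra count per column; for proteins that would mean 20 amino acids plus the gap, hence 21 symbols.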

I hope it will be considered for inclusion in the CVS.

Ivan

--
  Ivan Rossi, Ph.D. - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
  BioDec s.r.l., Via Fanin 48, I-40127 Bologna (Italy)
  Phone: +39-051-4200321 - fax: +39-051-4200317 - web: www.biodec.com
*** AlignInfo.py.orig	Tue Dec 13 18:09:22 2005
--- AlignInfo.py	Tue Dec 13 18:18:40 2005
***************
*** 335,341 ****
  
  
      def pos_specific_score_matrix(self, axis_seq = None,
!                                   chars_to_ignore = []):
          """Create a position specific score matrix object for the alignment.

Michiel De Hoon | 13 Dec 21:43 2005

RE: tiny Align.AlignInfo patch

Hi Ivan,

Thanks for the patch. But could you submit it through Bugzilla? Patches
posted to mailing lists tend to get lost. (They shouldn't, but it happens a
lot in practice.)

Thanks again,

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-dev-bounces <at> portal.open-bio.org on behalf of Ivan Rossi
Sent: Tue 12/13/2005 3:35 PM
To: biopython-dev <at> biopython.org
Subject: [Biopython-dev] tiny Align.AlignInfo patch

Dear BioPythoneers,
   I am submitting a tiny patch to the pos_specific_score_matrix method of 
Bio.Align.AlignInfo.

It allows the generation of PSSMs composed of the "alphabet+gap" symbols.

I use it all the time to generate 21-symbol PSSMs for proteins, which we use 
as inputs for neural networks and HMMs.

Peter | 13 Dec 23:23 2005

Re: Updates to the tutorial for parsing GenBank files

Are there any others on the list interested in parsing GenBank files who 
wouldn't mind proofreading/commenting on this change to the 
Tutorial/Cookbook?

That is, changes to section 3.4 (GenBank) of this document:

http://www.biopython.org/docs/tutorial/Tutorial004.html#toc13
http://www.biopython.org/docs/tutorial/Tutorial.pdf

The patch is on the mailing list archive here:

http://www.biopython.org/pipermail/biopython-dev/2005-November/002193.html

Or I could log a bug & attach the patch to it.

Would I be better off asking on the Discussion List, rather than the 
Development List for this sort of question?

Bonus question: where could I find multi-record GenBank files?

Peter

On 10 Nov 2005, I wrote:
> There should be a patch attached for Biopython Doc/Tutorial.tex which 
> tries to clarify GenBank parsing.
> 
> Created on Windows using:-
> 
> diff cvs_Tutorial.tex new_Tutorial.tex -E -Naur > patch.txt
> 

Peter | 14 Dec 19:33 2005

Re: Updates to the tutorial for parsing GenBank files

Marc Colosimo wrote:
> The patch looks good to me, but I could have missed something there. I
> forgot about the Discussion List. I really should join that list.

Motion seconded - any developer want to accept this?

> Also, I probably will be filing a bug on Bio.Fasta documentation.
> There are two basic doc changes that should be made:
> 
> Under the doc for Fasta:
> RecordParser  Parses FASTA sequence data into a Record object <-  change 
> to a Fasta.Record object which is not the same as a Seq.Record

Sounds sensible

> Cookbooks:
> 
> Then maybe in the Cookbook, give an example of using
> Fasta.SequenceParser with title2ids. Without title2ids, you don't get
> name or id. You only get description, which is the title. Fasta.Record
> only has title, which maybe should be renamed to (deprecated in favour of)
> description to match the default behavior of SequenceParser.

I don't usually bother with the title2ids function either.

I agree it is odd that the attribute is .title or .description depending 
on the parser used (Fasta.RecordParser or Fasta.SequenceParser).

> It seems odd that the Fasta stuff is buried within Chapter 2 (2.4.3  
> Making it easier - plus it is missing "import string").

Colosimo, Marc E. | 14 Dec 20:01 2005

Re: Updates to the tutorial for parsing GenBank files


On 12/14/05 1:33 PM, "Peter" <biopython-dev <at> maubp.freeserve.co.uk> wrote:

> Marc Colosimo wrote:
> 
>> It seems odd that the Fasta stuff is buried within Chapter 2 (2.4.3
>> Making it easier - plus it is missing "import string").
> 
> Yes, but I think it would be better to avoid using the string module
> completely, and use the split method of the string object instead:
> 

I totally agree with you on this. I was just following the coding style used
in the cookbook and not my own.

> from Bio import Fasta
> 
> def parseTitle2Ids(title):
>       return title.split("|")[:3]
> 
> parser = Fasta.SequenceParser(title2ids = parseTitle2Ids)
> file = open("ls_orchid.fasta")
> iterator = Fasta.Iterator(file, parser)
> ...
> 
Marc

bugzilla-daemon | 15 Dec 20:51 2005

[Bug 1919] New: Transcribe DNA

http://bugzilla.open-bio.org/show_bug.cgi?id=1919

           Summary: Transcribe DNA
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev <at> biopython.org
        ReportedBy: tmagalhaes <at> dcc.fc.up.pt

I was reading some examples in the Biopython tutorial and cookbook and,
although I had already read them many times, for the first time I got confused.
Does transcribing the DNA sequence ATCG produce the RNA sequence AUCG or UAGC?
Biopython does the first, but until today I was completely sure that the
second was correct.
This is probably Tania's bug :) and not a Biopython bug, and this is probably
not the right place for this kind of question, but at this point I really
don't know how transcribe works. I'm confused because I found sites on the
internet that do it the way I thought it worked (or at least it seems to me
to be the same thing)...

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

bugzilla-daemon | 15 Dec 22:55 2005

[Bug 1919] Transcribe DNA

http://bugzilla.open-bio.org/show_bug.cgi?id=1919

------- Comment #1 from biopython-bugzilla <at> maubp.freeserve.co.uk  2005-12-15 16:55 -------
Transcription:
DNA {using A,T,C and G} --> mRNA {using A,U,C and G}

Translation:
mRNA {using A,U,C and G} --> Protein {Amino Acids}

Note that the Biopython Translation object can be used to go directly from DNA
{ATCG} to Protein {Amino Acids}, which may be helpful.

Are you asking about the effect of complementation that also happens as part of
the transcription in biology?  Because your example was just the four
nucleotides, I wasn't entirely clear on what you meant.
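
The two answers from the original report can be reproduced with plain string operations. This is a sketch, not Biopython's implementation; `transcribe` and `complement_rna` are hypothetical helper names, and the input is assumed to be upper-case DNA using only A, T, C and G.

```python
def transcribe(coding_dna):
    """Coding-strand convention: the mRNA reads like the DNA with T -> U."""
    return coding_dna.replace("T", "U")


def complement_rna(template_dna):
    """Pair each DNA base with its RNA partner, position by position.

    No reversal is applied, so the result is written in the same
    left-to-right order as the input.
    """
    pairs = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in template_dna)


print(transcribe("ATCG"))      # AUCG
print(complement_rna("ATCG"))  # UAGC
```

Which result is "correct" depends on whether ATCG is taken to be the coding strand (first function) or the template strand that the polymerase pairs against (second function).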

Marshall Hampton | 15 Dec 21:06 2005

Re: [BioPython] blastx works fine?


Hi,

I am a new user of biopython - I like it a lot, thanks for all
those contributions! - and I have been wondering about this too.  It would
help me a lot to automate some blastx searches.  What is the best way to
do this?

Thanks,
Marshall Hampton
Dept. Mathematics & Statistics
University of Minnesota, Duluth

Frank Kauff wrote:

>Hi all,
>
>qblast currently says it works only for blastp and blastn. Actually it
>seems to work fine with blastx as well - xml output parses well with
>NCBIXML. Or am I missing something?
>
>Frank
>
>
>--
>Frank Kauff
>Dept. of Biology
>Duke University

bugzilla-daemon | 18 Dec 21:45 2005

[Bug 1920] Bio.Geo does not support recent GEO files

http://bugzilla.open-bio.org/show_bug.cgi?id=1920

------- Comment #1 from biopython-bugzilla <at> maubp.freeserve.co.uk  2005-12-18 15:45 -------
Created an attachment (id=260)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=260&action=view)
Patch for Bio/Geo/*.py

Changes to the Martel format definition in Bio/Geo/geo_format.py

Changes to the Geo.Iterator in Bio/Geo/__init__.py
