Michael Thon | 1 Nov 2010 06:41

Re: [Biopython] getting the parent of a Clade


On Oct 31, 2010, at 8:23 PM, Eric Talevich wrote:

> 
> 
> On Sun, Oct 31, 2010 at 1:57 PM, Eric Talevich <eric.talevich <at> gmail.com> wrote:
> On Sun, Oct 31, 2010 at 12:03 PM, Michael Thon <mike.thon <at> gmail.com> wrote:
> I have a Clade object and I need to access its parent clade.  I thought that clade.root should do this but this
> seems to contain a reference to itself:
> 
> (Pdb) main_clade == main_clade.root
> True
> 
> Is there some other way?
> Thanks
> Mike
> 
> 
> Hi Mike,
> 
> You can do this, assuming you have the original tree object (call it "tree"):
> 
> parent = tree.get_path(main_clade)[-2]
> 
> This is an O(n) operation on the tree, so if you need to do it repeatedly on a large tree, it's faster to call
> tree.get_path(clade) once outside the loop and then reuse the resulting list.
> 
> Is the operation you're doing here part of something you'd like to see implemented as a tree method?
> 
> 
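Eric's one-liner takes the second-to-last entry of the path leading to the clade. To show why that lookup is O(n), here is a minimal sketch in plain Python of a path search from the root. It is not Biopython's implementation: the Node class is a hypothetical stand-in for a Clade (only the `clades` child list mirrors Bio.Phylo), and `path_to` and `parent_of` are illustrative names. In this sketch the path includes the root, so the parent of a top-level clade is the root itself.

```python
class Node:
    """Hypothetical stand-in for a clade: a name plus a list of child nodes."""

    def __init__(self, name, clades=()):
        self.name = name
        self.clades = list(clades)


def path_to(root, target):
    """Return the nodes from root down to target (inclusive), or None if absent."""
    if root is target:
        return [root]
    for child in root.clades:
        sub = path_to(child, target)
        if sub is not None:
            return [root] + sub
    return None


def parent_of(root, target):
    """Return target's parent, or None if target is the root or not in the tree."""
    path = path_to(root, target)
    if path is None or len(path) < 2:
        return None
    return path[-2]


# Small example tree: R has children X and C; X has children A and B.
a, b, c = Node("A"), Node("B"), Node("C")
x = Node("X", [a, b])
r = Node("R", [x, c])

print(parent_of(r, a).name)
print(parent_of(r, r))
```

Each call may walk the whole tree in the worst case, which is why repeating the lookup for every clade of a large tree gets expensive.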

Peter | 1 Nov 2010 12:34

Re: [Biopython] Entrez.efetch problem when querying pccompound database

On Mon, Nov 1, 2010 at 11:24 AM, saikari keitele <saikari78 <at> gmail.com> wrote:
> Many thanks for your reply.
> Does that mean that pccompound and pcassay databases can not be queried
> programmatically, they just have to be queried manually?
> Thanks again

Please ask the NCBI about this, and let us know what they say.

Thank you,

Peter
_______________________________________________
Biopython mailing list  -  Biopython <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

saikari keitele | 1 Nov 2010 12:24

Re: [Biopython] Entrez.efetch problem when querying pccompound database

Many thanks for your reply.
Does that mean that pccompound and pcassay databases can not be queried
programmatically, they just have to be queried manually?
Thanks again

On Fri, Oct 29, 2010 at 1:13 PM, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> On Fri, Oct 29, 2010 at 12:26 PM, saikari keitele <saikari78 <at> gmail.com>
> wrote:
> > Hi,
> >
> > I'm using BioPython to query the NCBI pccompound database.
> > I'm trying to retrieve the molecular weight of a compound given its
> > InChIKey.
> > Getting the ID of the compound with esearch works fine. For instance:
> >
> > Entrez.esearch(db="pccompound",
> > term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
> >
> > However, when I try to retrieve the record's content with efetch from the
> > ID returned by esearch, like this:
> >
> > Entrez.efetch(db="pccompound", id="2244")
> >
> > I get the following response:
> > ...
> > Error occurred: Report 'ASN1' not found in 'pccompound' presentation
> > ...
> >

Chris Fields | 1 Nov 2010 15:50

Re: [Biopython] Entrez.efetch problem when querying pccompound database

Try using esummary instead of efetch to get that information programmatically.  Some database
information can't be retrieved via efetch (I think pcassay/pccompound are two of those), but the summary
of the information for any database is retrievable.  

Using the BioPerl eutil interface, one does this to just dump the returned information.  One can also get at
the various bits of that data programmatically as well using generic constructs, but you have to know the
tag names for the data you are looking for.  There should be an analogous Biopython way to do this.

===================================

use Bio::DB::EUtilities;

my $term = '"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]';

my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -term  => $term,
                                     -email => 'cjfields <at> bioperl.org',
                                     -db    => 'pccompound',
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "Can't retrieve history data";

$eutil->set_parameters(-eutil     => 'esummary',
                         -history   => $hist);

$eutil->print_all;

===================================

chris

saikari keitele | 1 Nov 2010 16:53

Re: [Biopython] Entrez.efetch problem when querying pccompound database

Many thanks! By using esummary as you suggest I can retrieve all the
information from pccompound and pcassay with BioPython.
For instance, for retrieving the molecular weight of a compound given its
InChIKey :

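from Bio import Entrez

# Placeholder contact address (not in the original post); NCBI asks for one.
Entrez.email = "you@example.org"

# Look up the compound ID from its InChIKey.
handle = Entrez.esearch(db="pccompound",
                        term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
records = Entrez.read(handle)

# Fetch the summary for the first hit and read its molecular weight.
summary = Entrez.read(Entrez.esummary(db="pccompound",
                                      id=records["IdList"][0]))
molWeight = summary[0]['MolecularWeight']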

Thanks again.

On Mon, Nov 1, 2010 at 2:50 PM, Chris Fields <cjfields <at> illinois.edu> wrote:

> Try using esummary instead of efetch to get that information
> programmatically.  Some database information can't be retrieved via efetch
> (I think pcassay/pccompound are two of those), but the summary of the
> information for any database is retrievable.
>
> Using the BioPerl eutil interface, one does this to just dump the returned
> information.  One can also get at the various bits of that data
> programmatically as well using generic constructs, but you have to know the
> tag names for the data you are looking for.  There should be an analogous
> Biopython way to do this.
>
> ===================================
>

Eric Talevich | 2 Nov 2010 02:20

Re: [Biopython] getting the parent of a Clade

On Mon, Nov 1, 2010 at 1:41 AM, Michael Thon <mike.thon <at> gmail.com> wrote:

>
> On Oct 31, 2010, at 8:23 PM, Eric Talevich wrote:
> >
> > Is the operation you're doing here part of something you'd like to see
> implemented as a tree method?
> >
> >
> Maybe - it seems to me that if I can access children of a clade from the
> clade, then I should also be able to go the other way and access the parent.
>  I don't know how often people would need this functionality though.
>
> Does a Clade contain a reference to its tree? I have a recursive function
> that does some crunching on a Clade and then recursively processes the child
> clades.  I could pass in  the tree object as well, but I figure that a Clade
> must know about its tree so there should be some way to access it.
>

PyCogent does work that way, but Bio.Phylo's data structure is simpler -- a
Tree has a single root Clade (tree.root or tree.clade), and each Clade has a
plain Python list of child Clades (clade.clades), all the way down. It
doesn't track any references to the parent or the original tree, so the tree
can never have an inconsistent internal state... because there is no
internal state.

I haven't needed the parent references so far for the Tree/Clade methods or
my own scripts, surprisingly. Calling get_path once or twice has been
enough. (I could probably speed up common_ancestor by using the all_parents
dictionary approach in the cookbook, at the expense of memory.)
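The dictionary approach Eric mentions trades memory for speed: one pass over the tree records each clade's parent, and every later lookup is a dictionary access. The cookbook's own code is not reproduced here. What follows is a minimal sketch of the idea in plain Python, where Node is a hypothetical stand-in for a Clade (only the `clades` child list mirrors Bio.Phylo) and `all_parents` takes the root node directly.

```python
class Node:
    """Hypothetical stand-in for a clade: a name plus a list of child nodes."""

    def __init__(self, name, clades=()):
        self.name = name
        self.clades = list(clades)


def all_parents(root):
    """Map every non-root node to its parent in a single pass over the tree."""
    parents = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.clades:
            parents[child] = node   # nodes hash by identity, so this is safe
            stack.append(child)
    return parents


# Small example tree: R has children X and C; X has children A and B.
a, b, c = Node("A"), Node("B"), Node("C")
x = Node("X", [a, b])
r = Node("R", [x, c])

parents = all_parents(r)
print(parents[a].name)
print(r in parents)
```

The dictionary holds one entry per non-root node and goes stale if the tree is modified afterwards, so it would need rebuilding after any edit to the tree.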

Michael Thon | 2 Nov 2010 10:58

Re: [Biopython] getting the parent of a Clade

Hi Eric 
> 
> Do you or anyone else want to try plugging that all_parents function into your code to see if it helps
> significantly? If it does, I could add it as a Tree/Clade method in the next Biopython release.
> 

I can try it - I have a few 1000 trees to parse so any differences in performance should be more obvious.  

But first, I realized that I should have explained the problem I'm solving in more detail, to see if I'm
approaching it the right way.  I need to visit every node in the tree, and then compare the node to its parent
and do some calculations.  I'm doing this by writing a recursion that starts with tree.clade and then calls
itself twice with clade.clades[0] and clade.clades[1].  Then within the function I need to get the parent
clade and do the calculations.

def crunch_clade(tree, clade):
    compute_data(clade, get_parent(tree, clade))
    crunch_clade(tree, clade.clades[0])
    crunch_clade(tree, clade.clades[1])

Is there a better way to do it?  Like maybe starting with the terminal clades?

Mike
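One possible alternative to looking up the parent inside Mike's recursion is to hand the parent down as the recursion descends, since the caller already holds it. This is a minimal sketch under assumptions, not a reply from the thread: Node is a hypothetical stand-in for a Clade (only the `clades` child list mirrors Bio.Phylo), and `visit` stands in for Mike's compute_data. Looping over `clades` also gives the recursion a natural stopping point at terminal nodes and handles nodes with more or fewer than two children.

```python
class Node:
    """Hypothetical stand-in for a clade: a name plus a list of child nodes."""

    def __init__(self, name, clades=()):
        self.name = name
        self.clades = list(clades)


def crunch_clade(clade, visit, parent=None):
    """Call visit(clade, parent) on every non-root node, parents before children."""
    if parent is not None:          # the root has no parent to compare against
        visit(clade, parent)
    for child in clade.clades:      # empty list at terminal nodes ends the recursion
        crunch_clade(child, visit, clade)


# Small example tree: R has children X and C; X has children A and B.
a, b, c = Node("A"), Node("B"), Node("C")
x = Node("X", [a, b])
r = Node("R", [x, c])

pairs = []
crunch_clade(r, lambda node, parent: pairs.append((node.name, parent.name)))
print(pairs)
```

Every node is visited once and no path search is needed, so the whole traversal is linear in the size of the tree.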


Eric Talevich | 2 Nov 2010 16:44

Re: [Biopython] getting the parent of a Clade

On Tue, Nov 2, 2010 at 5:58 AM, Michael Thon <mike.thon <at> gmail.com> wrote:

> Hi Eric
> >
> > Do you or anyone else want to try plugging that all_parents function into
> your code to see if it helps significantly? If it does, I could add it as a
> Tree/Clade method in the next Biopython release.
> >
>
>
> I can try it - I have a few 1000 trees to parse so any differences in
> performance should be more obvious.
>
> But first, I realized that I should have explained the problem I'm solving
> in more detail, to see if I'm approaching it the right way.  I need to visit
> every node in the tree, and then compare the node to its parent and do some
> calculations.  I'm doing this by writing a recursion that starts with
> tree.clade and then calls itself twice with clade.clades[0] and
> clade.clades[1].  Then within the function I need to get the parent clade
> and do the calculations.
>
> def crunch_clade(tree, clade):
>        compute_data(clade, get_parent(tree, clade))
>        crunch_clade(tree, clade.clades[0])
>        crunch_clade(tree, clade.clades[1])
>
> Is there a better way to do it?  Like maybe starting with the terminal
> clades?
>
> Mike

Erick Matsen | 3 Nov 2010 13:06

[Biopython] Lightweight version of Biopython?

Hello there Biopython community--

We're writing some python code to use SCons for reproducible
bioinformatics research with intelligent dependencies. As part of the
project, we often need to do very simple bioinformatics tasks, such as
reading in various formats and spitting out others. We could use
Biopython for such things, but it's a very heavy dependency for such
trivial tasks.

I'm curious if there exists a Biopython "lite". The ideal situation
would be a tiny module that we could include directly in our project.
I have searched the Biopython mailing list and have yet to find
anything.

Thanks in advance,

Erick

Peter | 3 Nov 2010 13:10

Re: [Biopython] Lightweight version of Biopython?

On Wed, Nov 3, 2010 at 12:06 PM, Erick Matsen <matsen <at> fhcrc.org> wrote:
> Hello there Biopython community--
>
> We're writing some python code to use SCons for reproducible
> bioinformatics research with intelligent dependencies. As part of the
> project, we often need to do very simple bioinformatics tasks, such as
> reading in various formats and spitting out others. We could use
> Biopython for such things, but it's a very heavy dependency for such
> trivial tasks.
>
> I'm curious if there exists a Biopython "lite". The ideal situation
> would be a tiny module that we could include directly in our project.
> I have searched the Biopython mailing list and have yet to find
> anything.
>
> Thanks in advance,
>
> Erick

Hi Erick,

Why do you consider Biopython a heavy dependency? It can
be installed with no 3rd party libraries (although we do strongly
recommend NumPy, if you are not using anything numerical
you don't need it).

Peter

