Peter Cock | 1 Feb 18:22

Re: [Biopython] regarding retrieving antigen information of specific gene using Biopython

On Tue, Jan 31, 2012 at 7:00 AM, shweta dubey <sweta.dubey31 <at> gmail.com> wrote:
> hello everyone,
>
> I am new to Biopython. I have a set of genes and I want information on
> antigens specific to these genes from a database (suppose, Antigen
> Database).
>
> How can I do the same using Biopython?
>
> Thanks in advance
>
> Shweta Dubey

Hi,

Which antigen database are you trying to use? If it is one of the
NCBI ones you can probably use their Entrez API via Biopython.

Peter
_______________________________________________
Biopython mailing list  -  Biopython <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

Sarttu Bourvir | 2 Feb 16:45

[Biopython] parsing Blast results (xml)

Hi,
I am new to Biopython and am having problems parsing a BLAST result file
(XML format).
I can get out the alignments, alignment length, title, etc.
But I additionally need to print the query title, percent
similarity, and e-value.

How does one do that? Is there anywhere other than the Biopython cookbook
and help(Bio.Blast.NCBIXML.Record) to
look for information? I feel like I don't really understand the
Blast.Record and where in there things can be found.
Is the query sequence title in the header?

Example code would be greatly appreciated!
Thank you,

Peter Cock | 2 Feb 17:09

Re: [Biopython] parsing Blast results (xml)

On Thu, Feb 2, 2012 at 3:45 PM, Sarttu Bourvir <bpkth2012 <at> gmail.com> wrote:
> Hi,
> I am new to Biopython and am having problems parsing a BLAST result file
> (XML format).
> I can get out the alignments, alignment length, title, etc.
> But I additionally need to print the query title, percent
> similarity, and e-value.

Well e-value is easy, and covered in the tutorial - e.g.

# blast_record is a record obtained from Bio.Blast.NCBIXML (read or parse).
# print() calls are written for Python 3.
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        print('****Alignment****')
        print('sequence:', alignment.title)
        print('length:', alignment.length)
        print('e value:', hsp.expect)
        print(hsp.query[0:75] + '...')
        print(hsp.match[0:75] + '...')
        print(hsp.sbjct[0:75] + '...')

For percentage similarity I think you must use hsp.positives
and the alignment length of the HSP (hsp.align_length), i.e.
100 * hsp.positives / hsp.align_length. Likewise hsp.identities
can be used to get the percentage identity.
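
A minimal sketch of that calculation, assuming the HSP object exposes
identities, positives and align_length as integer attributes. The helper
name hsp_percentages and the demo numbers are made up for illustration, and
a stand-in object is used so the snippet runs without a BLAST file.

```python
from types import SimpleNamespace


def hsp_percentages(hsp):
    """Return (percent identity, percent similarity) for one HSP.

    Assumes hsp has integer attributes identities, positives, align_length.
    """
    length = hsp.align_length
    return (100.0 * hsp.identities / length, 100.0 * hsp.positives / length)


# Stand-in for a real HSP, with made-up numbers (not real BLAST output).
demo = SimpleNamespace(identities=45, positives=60, align_length=75)
identity, similarity = hsp_percentages(demo)
print("identity: %.1f%%, similarity: %.1f%%" % (identity, similarity))
```

With a parsed record, the same helper could be called on each hsp in the
loop above. The query title asked about should, as far as I know, be
available on the record itself as blast_record.query.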

> How does one do that?  Is there anywhere else than Biopython
> cookbook and help(Bio.Blast.NCBIXML.Record) to look for information.

I assume you know about dir(...) as well? e.g. try dir(hsp)
after the above example, or dir(alignment), to see what attributes
these objects have.
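
As a rough illustration of the dir(...) tip, the sketch below filters out
the underscore-prefixed names so only the data attributes remain. The helper
name public_attributes is hypothetical, and the object is a stand-in with
made-up values rather than a real HSP.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """List attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in object with made-up values; with Biopython you would pass hsp,
# alignment or blast_record instead.
demo_hsp = SimpleNamespace(expect=1e-20, identities=45, positives=60)
print(public_attributes(demo_hsp))
```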


[Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others

I am new to Biopython and have tried installing Biopython according to the instructions. When I run the tests
after installing I get many errors: 96 in all out of 154 test runs (see some examples below). Two
errors that keep popping up are not being able to find module Seq and module Alphabet.

ImportError: No module named Seq

ImportError: No module named Alphabet

NameError: name 'Seq' is not defined

NameError: name 'record' is not defined

NameError: name 'protein_rec' is not defined

NameError: name 'protein_rec' is not defined

Dennis



[Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others

I solved some of my earlier problems by adjusting the path with sys.path.append, adding the directories
where the packages are located. However, now I keep getting the error below. I've searched for this error but
can't find any mention of it. Can anyone help?

ERROR: Bio.Wise
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 327, in runTest
    module = __import__(name, None, None, name.split("."))
  File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/Wise/__init__.py", line 20, in <module>
    from Bio import SeqIO
  File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/SeqIO/__init__.py", line 308, in <module>
    import Seq
  File "/usr/local/biopython/biopython-1.58/Bio/Seq.py", line 31, in <module>
    import ambiguous_dna_complement, ambiguous_rna_complement
ImportError: No module named ambiguous_dna_complement
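
The traceback mixes files from two different directories, so one way to
investigate is to ask Python where it would load each module from. The
sketch below is a generic diagnostic, not a fix for this particular
install: the helper name module_origin is made up here, and it is written
for Python 3 whereas the traceback above comes from Python 2.7.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return None
    return spec.origin if spec is not None else None


# Show where each module of interest would come from (None if missing).
for name in ("Bio", "Bio.Seq", "Bio.Alphabet"):
    print(name, "->", module_origin(name))

# Show the search path in order; earlier entries win.
for entry in sys.path:
    print(entry)
```

If the reported locations point at more than one copy of the package, the
entries added with sys.path.append are a likely place to look.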

From: Obukowicz, Dennis R.
Sent: Tuesday, February 07, 2012 8:42 AM
To: 'biopython <at> lists.open-bio.org'
Subject: Problems Installing: Can't find modules Seq and Alphabet plus many others

I am new to Biopython and have tried installing Biopython according to the instructions. When I run the tests
after installing I get many errors: 96 in all out of 154 test runs (see some examples below). Two
errors that keep popping up are not being able to find module Seq and module Alphabet.

ImportError: No module named Seq


[Biopython] comparing sequences.qustion

Hi,

I have a list of more than 200,000 unique, short, equal-length sequences.
I do the following.

I am comparing all sequences against all sequences, so there will be
(200000 * 199999)/2 comparisons.
When two sequences differ from one another by only one letter, I then do
another, more detailed alignment using a BLOSUM matrix.

Currently I use the pairwise sequence comparison code found in Biopython
for both comparisons. For the simple comparison I set
match = 0
mismatch = -1
If the total alignment score is equal to -1 (meaning only one mismatch),
I go a step further and do a BLOSUM alignment.

This works, but it is taking a very long time. I suspect that is because I
am using two alignments, and I think that doing the first simple comparison
without the pairwise alignment code would speed up the calculation.
Unfortunately I don't have much more than a desktop to do this, so if
someone can suggest a quicker way to do this, I would appreciate it.
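
For equal-length sequences, the first pass probably does not need an
aligner at all: counting mismatched positions directly (the Hamming
distance) should give the same yes/no answer as the match = 0,
mismatch = -1 score, assuming gaps are not allowed. A minimal sketch, where
the function name differs_by_one is made up for illustration:

```python
def differs_by_one(a, b):
    """True if equal-length strings a and b differ at exactly one position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                # Stop early: a second mismatch already rules the pair out.
                return False
    return mismatches == 1


# Number of unordered pairs for n = 200000 sequences: n * (n - 1) / 2.
n = 200000
total_pairs = n * (n - 1) // 2
print(total_pairs)
```

This only makes each comparison cheaper. The number of pairs is still about
2 * 10**10, so reducing the number of pairs compared matters more.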

Thank you,
George

Eric Talevich | 8 Feb 02:50

Re: [Biopython] comparing sequences.qustion

On Tue, Feb 7, 2012 at 8:01 PM, George Devaniranjan
<devaniranjan <at> gmail.com> wrote:

> Hi,
>
> I have a list of more than 200,000 unique, short, equal-length sequences.
> I do the following.
>
> I am comparing all sequences against all sequences, so there will be
> (200000 * 199999)/2 comparisons.
> When two sequences differ from one another by only one letter, I then do
> another, more detailed alignment using a BLOSUM matrix.
>
> Currently I use the pairwise sequence comparison code found in Biopython
> for both comparisons. For the simple comparison I set
> match = 0
> mismatch = -1
> If the total alignment score is equal to -1 (meaning only one mismatch),
> I go a step further and do a BLOSUM alignment.
>
> This works, but it is taking a very long time. I suspect that is because I
> am using two alignments, and I think that doing the first simple comparison
> without the pairwise alignment code would speed up the calculation.
> Unfortunately I don't have much more than a desktop to do this, so if
> someone can suggest a quicker way to do this, I would appreciate it.
>
> Thank you,
> George
>

Nathan Edwards | 8 Feb 16:50

Re: [Biopython] comparing sequences.qustion


Classical method (essentially BYP, obligatory reference to Goldberg):

* for each sequence, divide it in two to get halves s1 and s2.
* place the sequence (or a reference/index to it) in a dictionary with list
values, at keys s1 and s2.

This is linear time.

Any pair of sequences that differ in only one position _must_ have at 
least one of their halves in common, so do detailed alignment on all 
pairs of sequences with a common key. You specified unique, so each pair 
must be considered at most once. If you had duplicates, these would be 
aligned for each of their halves (and you'd have to normalize these out, 
somehow). This will be a small fraction of all pairs, assuming these are 
not pathological sequences.

This works well as long as the halves have enough specificity - for DNA,
halves of length 10 should work. Note that this doesn't distinguish between
left-halves and right-halves, which might have the same key values, but
obviously won't differ by one. Fixing this is an easy modification. BTW,
this works even for edit-distance. The only concern is the use of the
in-memory dictionary data structure, which can get big.
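
The two bullet steps and the candidate-pair check can be sketched roughly as
follows. This is an illustration only, not the code from this message: the
function name one_mismatch_pairs is made up, tagging each key with its side
(0 for the left half, 1 for the right half) is just one way to make the
left/right distinction mentioned above, and a plain mismatch count stands in
for the detailed alignment.

```python
from collections import defaultdict
from itertools import combinations


def one_mismatch_pairs(seqs):
    """Return sorted index pairs (i, j), i < j, differing at exactly one position.

    Assumes all sequences in seqs have the same length.
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        mid = len(seq) // 2
        # Key on (side, half) so left and right halves never collide.
        buckets[(0, seq[:mid])].append(idx)
        buckets[(1, seq[mid:])].append(idx)

    found = set()
    for members in buckets.values():
        # Only sequences sharing a half can differ at exactly one position.
        for i, j in combinations(members, 2):
            mismatches = sum(x != y for x, y in zip(seqs[i], seqs[j]))
            if mismatches == 1:
                found.add((i, j))
    return sorted(found)


demo = ["ACGTACGT", "ACGTACGA", "TCGTACGT", "GGGGCCCC"]
print(one_mismatch_pairs(demo))
```

Collecting the pairs in a set means that duplicate input sequences, which
would share both halves, are not reported twice.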

Untested pseudocode:

from collections import defaultdict
from itertools import combinations

n = 20

Nathan Edwards | 8 Feb 17:08

Re: [Biopython] comparing sequences.qustion


Argh, Gusfield "Algorithms on Strings, Trees, and Sequences" is the 
obligatory string matching reference...

- n

On 2/8/2012 10:50 AM, Nathan Edwards wrote:
>
> Classical method (essentially BYP, obligatory reference to Goldberg):
>
> * for each sequence, divide it in two to get halves s1 and s2.
> * place the sequence (or a reference/index to it) in a dictionary with list
> values, at keys s1 and s2.
>
> This is linear time.
>
> Any pair of sequences that differ in only one position _must_ have at
> least one of their halves in common, so do detailed alignment on all
> pairs of sequences with a common key. You specified unique, so each pair
> must be considered at most once. If you had duplicates, these would be
> aligned for each of their halves (and you'd have to normalize these out,
> somehow). This will be a small fraction of all pairs, assuming these are
> not pathological sequences.
>
> This works well as long as the halves have enough specificity - for DNA,
> halves of length 10 should work. Note that this doesn't distinguish between
> left-halves and right-halves, which might have the same key values, but
> obviously won't differ by one. Fixing this is an easy modification. BTW,
> this works even for edit-distance. The only concern is the use of the
> in-memory dictionary data structure, which can get big.

David Martin | 9 Feb 15:36

[Biopython] Proteomics tools in BioPython

We are planning to develop some proteomics tools in Python and intend to submit them as part of Biopython.
Primarily we will be writing wrappers/parsers for the OpenMS tools and output formats, with analytic tools on
top of that. If anyone else is working on Python wrappers for OpenMS then I'd be happy to share expertise.

..d

Dr David Martin
College of Life Sciences
University of Dundee

The University of Dundee is a registered Scottish Charity, No: SC015096
