Liu, XiaoChuan | 4 May 2012 00:46

[Biopython] How to use SeqRecord to get the subseq location information


Dear all,

I face a problem: How to use SeqRecord to get the subseq location information?

My code is like this:

>>> from Bio.Seq import Seq

>>> simple_seq = Seq("gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugaguccugcucuuguucugagcaccaccccucucucaga")

>>> from Bio.SeqRecord import SeqRecord

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation

>>> example_feature = SeqFeature(FeatureLocation(25382494, 25382583), type="mRNA", strand=-1)

>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])

>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga',
Alphabet()), id='17_329.4', name='<unknown name>', description='<unknown description>', dbxrefs=[])

>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)]

>>> simple_seq_r.features[0]
SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)

>>> subseq=simple_seq_r[3:10]

Wibowo Arindrarto | 4 May 2012 08:09

Re: [Biopython] How to use SeqRecord to get the subseq location information

Hi Liu,

It looks like the problem is caused by the values you put in your
SeqFeature. Your sequence length is less than the feature location
values. If you try plugging in a number in range, like this:

>>> example_feature = SeqFeature(FeatureLocation(5, 7), type="mRNA", strand=-1)

You should still keep the feature in your subsequence, like so:

>>> subseq = simple_seq_r[3:10]
>>> subseq.features
[SeqFeature(FeatureLocation(ExactPosition(2), ExactPosition(4),
strand=-1), type='mRNA')]

Hope that helps :),

cheers,
Bow
_______________________________________________
Biopython mailing list  -  Biopython <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

Liu, XiaoChuan | 4 May 2012 18:19

Re: [Biopython] How to use SeqRecord to get the subseq location information

Hi Bow,

Thank you very much for your help!
But after following your suggestion, I still face the problem. See below:

>>> example_feature = SeqFeature(FeatureLocation(0, 88), type="mRNA", strand=-1)
>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga',
Alphabet()), id='17_329.4', name='<unknown name>', description='<unknown description>', dbxrefs=[])
>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(0),ExactPosition(88)), type='mRNA', strand=-1)]
>>> subseq=simple_seq_r[3:10]
>>> subseq
SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='<unknown name>',
description='<unknown description>', dbxrefs=[])
>>> subseq.features
[]

I still cannot get the location information for subseq. Why? Thank you very much!

Best,

Xiaochuan

-----Original Message-----
From: Wibowo Arindrarto [mailto:w.arindrarto <at> gmail.com] 
Sent: Friday, May 04, 2012 2:10 AM
To: Liu, XiaoChuan
Cc: biopython <at> biopython.org

Peter Cock | 4 May 2012 18:31

Re: [Biopython] How to use SeqRecord to get the subseq location information

On Fri, May 4, 2012 at 5:19 PM, Liu, XiaoChuan <xiaochuan.liu <at> mssm.edu> wrote:
> Hi Bow,
>
> Thank you very much for your help!
> But after following your suggestion, I still face the problem. See below:
>
> >>> example_feature = SeqFeature(FeatureLocation(0, 88), type="mRNA", strand=-1)
> >>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
> >>> simple_seq_r
> SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga',
> Alphabet()), id='17_329.4', name='<unknown name>', description='<unknown description>', dbxrefs=[])
> >>> simple_seq_r.features
> [SeqFeature(FeatureLocation(ExactPosition(0),ExactPosition(88)), type='mRNA', strand=-1)]
> >>> subseq=simple_seq_r[3:10]
> >>> subseq
> SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='<unknown name>',
> description='<unknown description>', dbxrefs=[])
> >>> subseq.features
> []
>
> I still cannot get the location information for subseq. Why? Thank you very much!
>

What numbers are you trying to get?

In your example the parent sequence (simple_seq_r) has a feature from
0 to 88, but when you slice a SeqRecord only features fully inside the
slice are kept - so no features are kept for the child record
(subseq). We do not break up larger features which straddle the cut
sites.
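
The rule described above can be illustrated without Biopython. The sketch
below is a plain-Python model of that rule, not Biopython's own
implementation. The function name `slice_features` and the `clip` option are
hypothetical, added only to show what trimming a straddling feature could
look like if that is the behaviour you want.

```python
def slice_features(features, start, stop, clip=False):
    """Map (feat_start, feat_end) pairs onto the slice [start:stop].

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    With clip=False, only features fully inside the slice survive (the
    rule described in this thread). With clip=True (a hypothetical
    extension), overlapping features are trimmed to fit instead of dropped.
    """
    kept = []
    for feat_start, feat_end in features:
        if clip:
            new_start = max(feat_start, start)
            new_end = min(feat_end, stop)
            if new_start >= new_end:
                continue  # no overlap with the slice at all
        else:
            if feat_start < start or feat_end > stop:
                continue  # straddles a cut site, so it is dropped
            new_start, new_end = feat_start, feat_end
        # Shift into the coordinate system of the sliced sequence.
        kept.append((new_start - start, new_end - start))
    return kept


print(slice_features([(5, 7)], 3, 10))              # feature inside the slice
print(slice_features([(0, 88)], 3, 10))             # feature straddles both cuts
print(slice_features([(0, 88)], 3, 10, clip=True))  # trimmed to the slice
```

With the slice 3:10, the feature (5, 7) maps to (2, 4), which matches the
output Bow showed, while the feature (0, 88) is dropped, which matches the
empty list Xiaochuan saw. With `clip=True` the same feature would come back
as (0, 7), i.e. covering the whole child sequence.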

Peter Cock | 6 May 2012 13:09

[Biopython] Fwd: 2012 SciPy Bioinformatics Workshop

Dear Biopythoneers,

Are any of us planning to attend the SciPy meeting? The 2012 SciPy
Bioinformatics Workshop is crying out for a Biopython-related talk... and
from the email below it sounds like they're not just looking for
developers' perspectives, but also for how Python is being used in
bioinformatics.

It is quite close after BOSC and ISMB, but July 19 doesn't actually clash:
http://www.open-bio.org/wiki/BOSC_2012

SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
clashes with the planned CodeFest too:
http://www.open-bio.org/wiki/EU_Codefest_2012

July is definitely conference season...

Peter

---------- Forwarded message ----------
From: *Chris Mueller*
Date: Thursday, May 3, 2012
Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
To: "chris.mueller <at> lab7.io" <chris.mueller <at> lab7.io>

We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in
conjunction with SciPy 2012 this July in Austin, TX.

Python in biology is not dead yet... in fact, it's alive and well!


Tiago Antão | 6 May 2012 13:16

Re: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop

Hi,

On Sun, May 6, 2012 at 12:09 PM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:
> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
> clashes with the planned CodeFest too:
> http://www.open-bio.org/wiki/EU_Codefest_2012

Are any people from here going to the codefest?

Tiago

Fields, Christopher J | 6 May 2012 17:03

Re: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop


On May 6, 2012, at 6:12 AM, "Peter Cock" <p.j.a.cock <at> googlemail.com> wrote:
> ...
> It is quite close after BOSC and ISMB, but July 19 doesn't actually clash:
> http://www.open-bio.org/wiki/BOSC_2012
> 
> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
> clashes with the planned CodeFest too:
> http://www.open-bio.org/wiki/EU_Codefest_2012
> 
> July is definitely conference season...

Galaxy community conference as well.

Chris 

> 
> Peter
> 
> ---------- Forwarded message ----------
> From: *Chris Mueller*
> Date: Thursday, May 3, 2012
> Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
> To: "chris.mueller <at> lab7.io" <chris.mueller <at> lab7.io>
> 
> 
> We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in
> conjunction with SciPy 2012 this July in Austin, TX.
> 
> Python in biology is not dead yet... in fact, it's alive and well!

Lenna Peterson | 6 May 2012 23:26

[Biopython] GSoC python variant update

Hi all,

I've written a few new posts on my blog; here's the latest:

http://arklenna.tumblr.com/post/22542372076/spot-isa-dog

I will attach a UML diagram and include the part of the post
addressing the diagram. Click through to the full post for a bonus
Einstein quote!

-------

My main goals include, but are not limited to:

 * Make the structure parser- and file-format-agnostic: an abstracted
OO design should allow anything to be slotted in (for example,
Marjan's C GFF parser?)
 * Maintain encapsulation: limit how much each object can see of
objects above and below it
 * Allow extension at multiple levels: some existing parsers may
process data in different ways; this structure should allow handling
both raw data and data in various formats.

The `Variant` object's constructor allows an end user to change the
default parsers. Practical implementation details of `parse()` and
`write()` will need to be finessed - for example, ways to help the
user sift through immense quantities of data. I'm still in the process
of comparing the data contained in VCF/GVF files as well as the APIs
of PyVCF and BCBio.GFF.
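
To make the idea of swappable parsers concrete, here is a minimal sketch. It
is purely illustrative and is not the design from the UML diagram: the
default parser and writer, the tab-separated record layout, and every name
other than `Variant`, `parse()` and `write()` are assumptions made for this
example.

```python
def default_parser(line):
    """Hypothetical default: read a tab-separated 'chrom<TAB>pos' record."""
    fields = line.rstrip("\n").split("\t")
    return {"chrom": fields[0], "pos": int(fields[1])}


def default_writer(record):
    """Hypothetical default: write a record back as 'chrom<TAB>pos'."""
    return "%s\t%d" % (record["chrom"], record["pos"])


class Variant:
    """Container whose parser and writer can be replaced by the end user."""

    def __init__(self, parser=None, writer=None):
        # Any callable with the same signature can be slotted in here,
        # so the container never needs to know the file format itself.
        self._parser = parser or default_parser
        self._writer = writer or default_writer
        self.records = []

    def parse(self, lines):
        """Parse an iterable of text lines, skipping blank ones."""
        self.records = [self._parser(line) for line in lines if line.strip()]
        return self.records

    def write(self):
        """Return the stored records as a list of formatted lines."""
        return [self._writer(record) for record in self.records]
```

A caller who wants a different format would pass their own callables, for
example `Variant(parser=my_vcf_line_parser)`, where `my_vcf_line_parser` is
whatever function they already have.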


Peter Cock | 7 May 2012 10:37

Re: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop

On Sun, May 6, 2012 at 12:16 PM, Tiago Antão <tiagoantao <at> gmail.com> wrote:
> Hi,
>
> On Sun, May 6, 2012 at 12:09 PM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:
>> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
>> clashes with the planned CodeFest too:
>> http://www.open-bio.org/wiki/EU_Codefest_2012
>
> Are any people from here going to the codefest?
>
> Tiago

Brad is going to the pre-BOSC CodeFest in California,
http://www.open-bio.org/wiki/Codefest_2012

I'm not sure if we have any Biopython folk signed up for the post-BOSC
EU CodeFest in Italy yet.
http://www.open-bio.org/wiki/EU_Codefest_2012

I aim to attend one of the CodeFests - trying to firm up summer
travel plans now...

Peter


George Devaniranjan | 8 May 2012 00:25

[Biopython] PDBParser

Hi,

I have a question about using PDBParser:

from Bio.PDB.PDBParser import PDBParser

parser=PDBParser()

structure=parser.get_structure("test", "1fat.pdb")
model=structure[0]
chain=model["A"]
residue=chain[1]

I want to use it to extract and WRITE to a file the coordinates of residues
10 to 20 only.
(or whatever residue range I specify)

Using PDBParser I can extract the residue IDs in the range, but how do I
trace back and write the file in the exact format found in the PDB, so
that I can view it in a program like VMD/PyMOL?
(That is, I want to write the coordinates and all other information as found
in the PDB, but only for the selected residues that I pass in.)
I know I can do it using VMD, but I want to do it for thousands of PDB files
and would like to build a database of such extracted fragments.

The other alternative is of course to go line by line in each file and
write the lines that match the residue range specified but I was wondering
if there is a way of doing the same thing using the PDBParser?

Thank you,
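
For reference, the line-by-line alternative mentioned above can be sketched
with the standard library alone. This is only an illustration under stated
assumptions: the function names `extract_residue_range` and `write_fragment`
are hypothetical, insertion codes and alternate models are ignored, and only
ATOM/HETATM records are copied. Bio.PDB also provides `PDBIO` together with a
`Select` class for writing out a subset of a parsed structure, which is
probably the more direct route when working from PDBParser.

```python
def extract_residue_range(pdb_lines, chain_id, first, last):
    """Return the ATOM/HETATM lines for residues first..last of one chain."""
    selected = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line) < 26:
            continue  # too short to hold the chain and residue columns
        # Fixed PDB columns: chain ID is column 22, residue number is 23-26.
        if line[21] != chain_id:
            continue
        try:
            res_num = int(line[22:26])
        except ValueError:
            continue  # malformed record; skip rather than guess
        if first <= res_num <= last:
            selected.append(line)
    return selected


def write_fragment(in_path, out_path, chain_id, first, last):
    """Copy the selected residues to a new PDB-format file."""
    with open(in_path) as handle:
        lines = extract_residue_range(handle, chain_id, first, last)
    with open(out_path, "w") as out:
        out.writelines(lines)  # lines keep their original formatting
        out.write("END\n")
```

For the example in the question this would be called as
`write_fragment("1fat.pdb", "1fat_A_10_20.pdb", "A", 10, 20)`, where the
output file name is just a placeholder. Because the selected lines are
copied unchanged, the fragment keeps the exact column layout of the input.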

