Frederico Moraes Ferreira | 22 Jul 20:14 2014

[Biopython] PDB parser: suppress warnings

Hi Biopythoneers,
Does anybody happen to know how to suppress warnings when reading PDBs 
with hydrogenated waters?

from Bio.PDB.PDBParser import PDBParser
struct = PDBParser().get_structure('tmp', pdbf)

My PDBs came from MD, so all water molecules have hydrogens, causing 
thousands of warnings like:
...
/usr/lib64/python2.7/site-packages/Bio/PDB/PDBParser.py:284: 
PDBConstructionWarning: PDBConstructionException: Atom 2H defined twice 
in residue <Residue HOH het=W resseq=2640 icode= > at line 71037.
Exception ignored.
Some atoms or residues may be missing in the data structure.
   % message, PDBConstructionWarning)
...

Thanks in advance.
Fred

-- 
Dr. Frederico Moraes Ferreira
University of Sao Paulo
Heart Institute, School of Medicine
Laboratory of Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900     Sao Paulo - SP
Brasil
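
One way to silence a specific warning category is Python's standard `warnings` filter. The sketch below is only an illustration: it uses a stand-in warning class and a hypothetical `parse_noisily()` function so that it runs without Biopython. With Biopython installed, the class to filter would be `Bio.PDB.PDBExceptions.PDBConstructionWarning` and the noisy call would be `PDBParser().get_structure(...)`; constructing the parser as `PDBParser(QUIET=True)` is another option.

```python
import warnings


class PDBConstructionWarning(Warning):
    """Stand-in for Bio.PDB.PDBExceptions.PDBConstructionWarning (assumption:
    with Biopython installed you would import the real class instead)."""


def parse_noisily():
    """Hypothetical placeholder for PDBParser().get_structure('tmp', pdbf)."""
    warnings.warn("Atom 2H defined twice in residue HOH", PDBConstructionWarning)
    return "structure"


def parse_quietly():
    """Run the noisy call with only this warning category ignored."""
    with warnings.catch_warnings():
        # The filter is active only inside this block; other warnings still show.
        warnings.simplefilter("ignore", PDBConstructionWarning)
        return parse_noisily()


if __name__ == "__main__":
    print(parse_quietly())
```

Scoping the filter with `catch_warnings()` keeps the suppression local to the parsing call, so unrelated warnings elsewhere in a script are not hidden.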


Mark Budde | 16 Jul 21:03 2014

[Biopython] upgrade problems on Mac

Hi all,
I am trying to upgrade to the current version on my Mac, but get an error. I've tried the suggestions online about

$ xcode-select --install

but that doesn't work.

I'm running enthought 2.7 (Python 2.7.3 -- EPD 7.3-2 (32-bit)) on OSX Version 10.9.3

Any help would be much appreciated.

Thanks,

-Mark


Here is what my session looks like:

*****************************

Marks-MacBook-Pro-3:biopython-1.64 markbudde$ sudo python setup.py install
Password:
running install
running build
running build_py
running build_ext
building 'Bio.cpairwise2' extension
Compiling with an SDK that doesn't seem to exist: /Developer/SDKs/MacOSX10.5.sdk
Please check your Xcode installation
gcc -DNDEBUG -g -O3 -arch i386 -isysroot /Developer/SDKs/MacOSX10.5.sdk -Qunused-arguments -Qunused-arguments -I/Library/Frameworks/Python.framework/Versions/7.3/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.5-i386-2.7/Bio/cpairwise2module.o
clang: warning: no such sysroot directory: '/Developer/SDKs/MacOSX10.5.sdk'
In file included from Bio/cpairwise2module.c:12:
/Library/Frameworks/Python.framework/Versions/7.3/include/python2.7/Python.h:33:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
         ^
1 error generated.
error: command 'gcc' failed with exit status 1



_______________________________________________
Biopython mailing list  -  Biopython <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython
Evan Parker | 16 Jul 19:24 2014

[Biopython] GSoC Biopython project update

Hello fellow Biopythoneers and OBF-SoC students,

I have a new blog post regarding my ongoing work with the lazy-loading and indexing sequence file parser. I've been stabilizing and testing the GenBank parser this last week. You can always check out my github commit log if you want to see the most recent changes.

- Evan Parker.
halima saker | 11 Jul 13:23 2014

[Biopython] blast against genomes in biopython

from Bio import Entrez, SeqIO
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW

result_handle = NCBIWWW.qblast(
    "blastn", "nr",
    "CACTTATTTAGTTAGCTTGCAACCCTGGATTTTTGTTTACTGGAGAGGCC",
    entrez_query='"Beutenbergia cavernae DSM 12333" [Organism]')
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print(hsp.query[0:75] + '...')
            print(hsp.match[0:75] + '...')
            print(hsp.sbjct[0:75] + '...')



This does not give me any output, although the query is actually a sequence from that genome, so I should get a result. Where is the error? Is the query correct?
Kenan Azam | 7 Jul 20:07 2014

[Biopython] Retrieving protein transcripts for a protein coding gene

Hi there,

I am using biopython's wrapper API for ncbi eutils to retrieve related proteins, identical proteins and variant proteins (transcripts, splice variants, etc) for a certain protein coding gene. 

This information is displayed for a protein coding gene on its ncbi page under the "mRNA and Protein(s)" section

http://www.ncbi.nlm.nih.gov/gene/10555

I am retrieving identical proteins via LinkName=protein_protein_identical and related via LinkName=protein_protein

http://www.ncbi.nlm.nih.gov/protein?LinkName=protein_protein_identical&from_uid=69122971

Is there a way to retrieve the transcripts for a protein coding gene?

Thank you, K

Evan Parker | 2 Jul 19:01 2014

[Biopython] GSoC Biopython update:

Hello fellow Biopythoneers and OBF-SoC students,

I have a new blog post regarding my ongoing work with the GenBank file format. You can always check out my github repository if you want to see my most recent commits.

-Evan Parker
Ismail Uddin | 29 Jun 12:43 2014

[Biopython] FASTA file parsing

Dear Sir or Madam,

I would like to post a question regarding FASTA file parsing using the BioPython module. The current tutorial online indicates how to parse a FASTA file, but the output is in the format Seq('<<sequence here>>', SingleLetterAlphabet())

I would like to know how one may simply print out the entire sequence without any adjoining text i.e. 'ACTACGGCGAT'

I ask this question because I am trying to write a script that will read each entry in the FASTA file and produce a dictionary with the ID as the key and the raw sequence as the value.

Thank you in advance for your help and cooperation,
Ismail Uddin
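
The dictionary described above can be sketched in plain Python as follows. This is a minimal illustration, not Biopython's own parser: `fasta_to_dict` is a hypothetical helper name, and it assumes the ID is the first whitespace-delimited word after ">". With Biopython, the usual route is to convert the `Seq` object with `str()`, for example `{rec.id: str(rec.seq) for rec in SeqIO.parse(path, "fasta")}`.

```python
import io


def fasta_to_dict(handle):
    """Map each record ID to its raw sequence string.

    Hypothetical helper: the ID is taken as the first word of the ">" line,
    and wrapped sequence lines are joined into one string.
    """
    chunks = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


if __name__ == "__main__":
    example = io.StringIO(">seq1 example\nACTA\nCGGCGAT\n>seq2\nGG\n")
    for seq_id, sequence in fasta_to_dict(example).items():
        print(seq_id, sequence)
```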
João Rodrigues | 26 Jun 11:31 2014

Re: [Biopython] Merging different pdbs into a single object structure and writing it

Good to hear!
Claudia Millán Nebot | 25 Jun 17:48 2014

[Biopython] Merging different pdbs into a single object structure and writing it

Hi :) I'm a newbie to BioPython and I am trying to do the following:

I have a set of different pdbs that I want to merge together into a single file. I would like to take into consideration that there could be issues with the naming, so, after reading a few other posts in this same list, I came up with the following code:

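from Bio.PDB import PDBParser, PDBIO

list_of_structures = []
for filename in list_of_filenames:
    parser = PDBParser()
    # Use the file name without its 4-character extension as the structure ID.
    structure = parser.get_structure(filename[:-4], filename)
    list_of_structures.append(structure)

# Give every chain a unique ID, starting from 'A' (ASCII 65).
i_chain = 65
for structure in list_of_structures:
    # Iterating a Structure directly yields models, so ask for chains explicitly.
    for chain in structure.get_chains():
        chain.id = chr(i_chain)
        i_chain += 1

io = PDBIO()
for structure in list_of_structures:
    # Each call replaces the structure set by the previous call.
    io.set_structure(structure)
io.save(clust_fold + key[:-4] + "_fused.pda")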

This is not working. I guess I am just replacing the structure each time I call io.set_structure(), so only the last one gets written. Since there is no such thing as an append_structure() method, I have just tried a silly thing. So my question is: what is the best way to get the PDBs merged? Should I save them as independent unfold entities and then write them by using a Select class?

Thanks in advance and regards,

Claudia Millán (cmncri <at> ibmb.csic.es)

Crystallographic Methods Group

http://chango.ibmb.csic.es

Institut de Biologia Molecular de Barcelona (IBMB-CSIC)

Barcelona, Spain
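
For illustration only, the merge can also be sketched without Bio.PDB by rewriting the chain ID column of the coordinate records while concatenating the files. This swaps in a plain-text technique, not the Bio.PDB object approach asked about. `merge_pdb_texts` is a hypothetical helper; it assumes standard fixed-column PDB records (chain ID in column 22), keeps only ATOM, HETATM, ANISOU and TER records, and does not renumber atom serial numbers.

```python
import string

# Candidate chain IDs, handed out in order: A-Z, a-z, 0-9.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits
RECORDS_WITH_CHAIN = ("ATOM", "HETATM", "ANISOU", "TER")


def merge_pdb_texts(pdb_texts):
    """Concatenate PDB-format strings, giving every input chain a fresh ID."""
    merged = []
    next_id = 0
    for text in pdb_texts:
        mapping = {}  # original chain ID -> new chain ID, reset for each input
        for line in text.splitlines():
            if not line.startswith(RECORDS_WITH_CHAIN):
                continue  # drop headers, END records and anything else
            if len(line) < 22:
                merged.append(line)  # e.g. a bare "TER" with no chain column
                continue
            old_id = line[21]  # column 22 in 1-based PDB numbering
            if old_id not in mapping:
                if next_id >= len(CHAIN_IDS):
                    raise ValueError("more chains than available chain IDs")
                mapping[old_id] = CHAIN_IDS[next_id]
                next_id += 1
            merged.append(line[:21] + mapping[old_id] + line[22:])
    merged.append("END")
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    record = "ATOM".ljust(21) + "A" + "   1"
    print(merge_pdb_texts([record + "\n", record + "\n"]))
```

Within Bio.PDB itself, the alternative is to collect all chains under a single model and hand that one structure to PDBIO, which avoids the repeated set_structure() calls.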

Ivan Gregoretti | 25 Jun 05:14 2014

Re: [Biopython] File format autodetection.

Hello everybody.

After considering all contributions, for our case I have decided on a
solution based on first_line.startswith(">").

With that, I can now handle without exceptions "@" FASTQ, ">" FASTA
and the tricky "" empty file that is neither FASTQ nor FASTA (or
perhaps both at the same time).

Peter, great catch, empty files are common in our work.

Thank you all.

Ivan

Ivan Gregoretti, PhD
Bioinformatics
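
A minimal sketch of that first-line check, written for a read-once stream such as stdin. `count_records` is a hypothetical function, not the actual fxcounttags.py; it assumes FASTQ records span exactly four lines and treats an empty stream as zero records. The first line is put back in front of the stream with `itertools.chain`, which plays the role that `UndoHandle.peekline()` has in the quoted discussion.

```python
import io
import itertools


def count_records(stream):
    """Return (format_name, record_count) for a FASTA or FASTQ text stream."""
    first_line = stream.readline()
    if first_line == "":
        return "empty", 0  # neither FASTA nor FASTQ: nothing to count
    # Re-attach the consumed line so the count sees the whole stream.
    lines = itertools.chain([first_line], stream)
    if first_line.startswith(">"):
        return "fasta", sum(1 for line in lines if line.startswith(">"))
    if first_line.startswith("@"):
        # Assumes unwrapped FASTQ: header, sequence, "+", quality per record.
        return "fastq", sum(1 for _ in lines) // 4
    raise ValueError("first line starts with neither '>' nor '@'")


if __name__ == "__main__":
    print(count_records(io.StringIO(">a\nACGT\n>b\nGG\n")))
    print(count_records(io.StringIO("@r1\nACGT\n+\nIIII\n")))
    print(count_records(io.StringIO("")))
```

In a command-line tool the same function could be called with `sys.stdin` in place of the `io.StringIO` objects.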

On Tue, Jun 24, 2014 at 5:50 PM, Rohan Sachdeva <rsachdev <at> usc.edu> wrote:
> Seems like you could just do this in pure python?
>
> import sys
>
> for line in sys.stdin:
>     if line[0] == '>':
>         print('fasta')
>     elif line[0] == '@':
>         print('fastq')
>     break
>
>
>
> --------------------------------
> Rohan Sachdeva
> @archaeaologist
> Doctoral Candidate, John Heidelberg Laboratory
> Marine Environmental Biology
> University of Southern California
> Los Angeles, CA
> rsachdev <at> usc.edu
> (213) 740-4748
>
>
> On Tue, Jun 24, 2014 at 12:40 PM, Ivan Gregoretti <ivangreg <at> gmail.com>
> wrote:
>>
>> Thank you Lenna. It works now. It took me a little while to find that
>> I was expected to pass a file handle to UndoHandle().
>>
>> Ivan
>>
>>
>>
>> Ivan Gregoretti, PhD
>> Bioinformatics
>>
>>
>>
>> On Tue, Jun 24, 2014 at 2:59 PM, Lenna Peterson <arklenna <at> gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 24, 2014 at 2:00 PM, Ivan Gregoretti <ivangreg <at> gmail.com>
>> > wrote:
>> >>
>> >> Indeed, the STDIN stream is the challenge. That is why I thought that
>> >> the question was worth documenting in the Biopython list.
>> >>
>> >> Would anybody mind showing how peekline() is used? I tried using it on
>> >> a SeqIO.parse generator but I get an error:
>> >>
>> >> AttributeError: 'generator' object has no attribute 'peekline'
>> >
>> >
>> > peekline() is a method of UndoHandle, not the generator.
>> >
>> > Cheers,
>> >
>> > Lenna
>> >
>> >
>> >>
>> >>
>> >> I am using Biopython 1.61 and Python 2.7.3 on linux 64bit.
>> >>
>> >> Thank you,
>> >>
>> >> Ivan
>> >>
>> >>
>> >>
>> >>
>> >> Ivan Gregoretti, PhD
>> >> Bioinformatics
>> >>
>> >>
>> >> On Tue, Jun 24, 2014 at 1:41 PM, Fields, Christopher J
>> >> <cjfields <at> illinois.edu> wrote:
>> >> > On Jun 24, 2014, at 11:54 AM, Peter Cock <p.j.a.cock <at> googlemail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Ivan,
>> >> >>
>> >> >> Biopython's SeqIO does not (and will not) do automatic file
>> >> >> format detection, it is just too hard to get right so instead
>> >> >> that's the user's task:
>> >> >>
>> >> >> Zen of Python: Explicit is better than implicit.
>> >> >> http://legacy.python.org/dev/peps/pep-0020/
>> >> >>
>> >> >> (BioPerl's SeqIO can do format guessing)
>> >> >
>> >> > (somewhat)
>> >> >
>> >> > You are welcome to try it, but Bio::Tools::GuessSeqFormat is IMHO one
>> >> > of the misbegotten step-children of Bioperl; if you delve into it,
>> >> > you’ll find it also tries to guess whether something is a sequence or
>> >> > an alignment file. My general feeling is that if you don’t know the
>> >> > source of your data (and from that the format) then there is only so
>> >> > much we can do to help. Doing so from STDIN is even trickier.
>> >> >
>> >> > So, it’s there, it works in most cases so we keep it around, but
>> >> > caveat emptor. We don’t really maintain that module beyond very
>> >> > routine bug fixes.
>> >> >
>> >> >> Your use case is one which highlights a technical reason
>> >> >> why this is hard - you are using stdin, a read-once handle.
>> >> >> You cannot peek at the file, guess the format, seek back to
>> >> >> the beginning, and then give the handle to a specific parser.
>> >> >>
>> >> >> You could use Biopython's UndoHandle here, but it will
>> >> >> impose a (modest) performance overhead.
>> >> >>
>> >> >> from Bio.File import UndoHandle
>> >> >> help(UndoHandle)
>> >> >>
>> >> >> e.g. Use the .peekline() method to spot FASTA vs FASTQ?
>> >> >>
>> >> >> Peter
>> >> >
>> >> > That seems like a pretty reasonable option.
>> >> >
>> >> > chris
>> >> >
>> >> >> On Tue, Jun 24, 2014 at 5:16 PM, Ivan Gregoretti
>> >> >> <ivangreg <at> gmail.com>
>> >> >> wrote:
>> >> >>> Hello Biopythoneers,
>> >> >>>
>> >> >>> The question:
>> >> >>>
>> >> >>> What is the strategy currently used for file format autodetection?
>> >> >>>
>> >> >>>
>> >> >>> The context:
>> >> >>>
>> >> >>> I have written a command line program that gets a stream of FASTQ
>> >> >>> data and reports how many records are contained. You can visualise
>> >> >>> it like this
>> >> >>>
>> >> >>> zcat myfile.fq.gz | fxcounttags.py -i /dev/stdin -o /dev/stdout > myfile.counts
>> >> >>>
>> >> >>> That works fine for FASTQ but I need to extend the functionality to
>> >> >>> FASTA streams. How would you write fxcounttags.py to detect
>> >> >>> FASTQ/FASTA?
>> >> >>>
>> >> >>> Thank you,
>> >> >>>
>> >> >>> Ivan
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> Ivan Gregoretti, PhD
>> >> >>> Bioinformatics

Ivan Gregoretti | 24 Jun 18:16 2014

[Biopython] File format autodetection.

Hello Biopythoneers,

The question:

What is the strategy currently used for file format autodetection?

The context:

I have written a command line program that gets a stream of FASTQ data
and reports how many records are contained. You can visualise it like
this

zcat myfile.fq.gz | fxcounttags.py -i /dev/stdin -o /dev/stdout > myfile.counts

That works fine for FASTQ but I need to extend the functionality to
FASTA streams. How would you write fxcounttags.py to detect
FASTQ/FASTA?

Thank you,

Ivan

Ivan Gregoretti, PhD
Bioinformatics