Jakub Stanislaw Nowak | 29 Aug 01:06 2014

[Biopython] alignment in biopython using ClustalW

Hello,

I am trying to run an alignment using ClustalW in Python.

I have used this code to run it:

from Bio import AlignIO
from Bio.Align.Applications import ClustalwCommandline

cline = ClustalwCommandline("clustalw2", infile="test.fasta")
print(cline)
stdout, stderr = cline()
align = AlignIO.read("test.aln", "clustal")
print(align)

But it is not generating the alignment file. I think I have a problem setting the proper path to ClustalW.

This is the error:

clustalw2 -infile=test.fasta
Traceback (most recent call last):
  File "/Users/user/Google Drive/Bioinformatics/smallRNAseq/Python U densities/trial4old.py", line 95, in <module>
    stdout, stderr = cline()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/Bio/Application/__init__.py", line 513, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 127 from 'clustalw2 -infile=test.fasta', message '/bin/sh: clustalw2: command not found'


Can you suggest some solution?

Thanks,

Jakub
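
Return code 127 with "command not found" means the shell could not find a clustalw2 executable on the PATH, so one option is to pass the full path of the binary as the first argument of ClustalwCommandline. A minimal sketch of locating the binary first, using only the standard library; the helper name find_executable and the /usr/local/bin directory are illustrative assumptions, not part of Biopython:

```python
import os
import shutil


def find_executable(names=("clustalw2", "clustalw"), extra_dirs=()):
    """Return the full path of the first matching executable, or None."""
    for name in names:
        # shutil.which searches the directories listed in PATH.
        found = shutil.which(name)
        if found:
            return found
        # Fall back to directories that may not be on PATH (assumed locations).
        for directory in extra_dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


clustalw_exe = find_executable(extra_dirs=("/usr/local/bin",))
if clustalw_exe is None:
    print("ClustalW not found; install it or pass its full path explicitly")
else:
    print("Using", clustalw_exe)
```

If a path is found, it can replace the bare "clustalw2" string, for example ClustalwCommandline(clustalw_exe, infile="test.fasta").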


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
Biopython mailing list  -  Biopython <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython
def87 | 26 Aug 12:42 2014

[Biopython] phylo loop through nodes in tree

Hi,
 
I am trying to do a tree traversal with the Phylo package (just loop over every node). I have read the tree into a Newick tree object. The class documentation of TreeMixin shows me the available methods: _filter_search seems to be what I need because it says that it is for tree traversals. However, I don't want to apply a filter, because I want to loop over every node in the tree. That will probably be easy to do once I understand how _filter_search works (something like filter = 1 = true for all = loop over all nodes). But there is no documentation for this function, and I don't understand how it works just by reading its arguments:
_filter_search(self, filter_func, order, follow_attrs)
Could someone please tell me how to loop over every node (I need each node as a clade object, something like "for node in tree:")? I couldn't find anything regarding _filter_search in the cookbook.
 
Regards,
Robert
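
For looping over every node, the public find_clades() method of TreeMixin, called without a target, iterates over all clades (internal nodes and leaves), so the private _filter_search should not be needed. A minimal sketch, assuming Biopython is installed; the function name clade_names and the Newick string in the usage note are made up for illustration:

```python
from io import StringIO


def clade_names(newick_text, order="preorder"):
    """Return the name of every clade in the tree (None for unnamed nodes)."""
    # Imported inside the function so this sketch loads even without Biopython.
    from Bio import Phylo

    tree = Phylo.read(StringIO(newick_text), "newick")
    names = []
    # With no target given, find_clades() yields every Clade object.
    for clade in tree.find_clades(order=order):
        names.append(clade.name)
    return names
```

Calling clade_names("((A,B),C);") visits the root, the inner node and the three leaves. The order argument also accepts "postorder" and "level".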
_______________________________________________
Jim Mailman | 19 Aug 02:13 2014

[Biopython] convert coordinates from one sequence to another in an alignment

Hello everyone, 
I have two sequences, A and B, with about 90% identity and a few small indels. They are about 30 kb long. If I align sequence A to B using either BLAST or MUSCLE, how can I convert the coordinate of any base in one sequence into the coordinate of the base aligned to it in the other?
Is there something in Biopython to do this? Are there any existing tools for this? I looked at the 
Package Bio :: Package Blast :: Module Record :: Class HSP

If there are no existing tools, is this the best place to get started? 

Thanks,

Jim
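
Once a pairwise alignment is available as two gapped strings of equal length (as MUSCLE output or a BLAST HSP's query and subject strings would be), the conversion can be done by walking the columns and counting non-gap characters on each side. A minimal sketch in plain Python; the function name map_coordinates, the 0-based convention and the toy sequences are assumptions for illustration:

```python
def map_coordinates(aligned_a, aligned_b, gap="-"):
    """Map each 0-based position in ungapped A to its position in ungapped B.

    Positions of A that face a gap in B map to None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    pos_a = pos_b = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a != gap:
            # Record the partner position, or None if B has a gap here.
            mapping[pos_a] = pos_b if char_b != gap else None
            pos_a += 1
        if char_b != gap:
            pos_b += 1
    return mapping


a_to_b = map_coordinates("ACG-TA", "A-GCTA")
print(a_to_b)
```

Swapping the two arguments gives the B-to-A direction. For a BLAST HSP, the start offsets of the query and subject would have to be added to the positions, since an HSP usually covers only part of each sequence.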
_______________________________________________
Lior Glick | 18 Aug 16:14 2014

[Biopython] Using Biopython to retrieve details on an unknown sequence by BLAST

Dear Biopython list users,

I'm using Biopython for the first time. I have sequence data from unknown organisms, and I am trying to use BLAST to tell which organism they are most likely to have come from. I wrote the following function to do that:

def find_organism(file):
    """
    Receives a fasta file with a single seq, and uses BLAST to find
    from which organism it was taken.
    """
    # get seq from fasta file
    seqRecord = SeqIO.read(file,"fasta")
    # run BLAST
    blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
    # get first hit
    blastRecord = NCBIXML.read(blastResult)
    firstHit = blastRecord.alignments[0]
    # get hit's gi number
    title = firstHit.title
    gi = title.split("|")[1]
    # search NCBI for the gi number
    ncbiResult = Entrez.efetch(db="nucleotide", id=gi, rettype="gb", retmode="text")
    ncbiResultSeqRec = SeqIO.read(ncbiResult,"gb")
    # get organism
    annotatDict = ncbiResultSeqRec.annotations
    return(annotatDict['organism'])

It works fine, but takes about 2 minutes to retrieve the organism for each sequence, which seems very slow to me. I'm just wondering if I could do better. I know that I could create a local copy of the NCBI database to improve performance, and I might do that. However, I suspect that querying BLAST first, then taking the ID and using it to query Entrez, is not the way to go. Do you have any other suggestions for improvement?
Thanks!

_______________________________________________
Jurgens de Bruin | 14 Aug 09:03 2014

[Biopython] Writing large Fastq

Hi All,

I would appreciate any help on the following. I have a script that filters reads from FASTQ files; the files are 1.5 GB in size. I want to write the filtered reads to a new FASTQ file, and this is where I seem to have a bug, as the writing of the file never finishes. I left the script running for 4 hours with no result, so I stopped it. This is currently what I have:

from Bio import SeqIO
fastq_parser = SeqIO.parse(ls_file,ls_filetype)
wanted = (rec for rec in fastq_parser if rec.description in ll_llist )
ls_filename = "%s_filered.fastq"%ls_file.split(".")[0]
handle = open(ls_filename,'wb')
SeqIO.write(wanted, handle , "fastq")
handle.close()  

Thanks in advance
--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin
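
One possible cause, assuming ll_llist is a large Python list, is that "rec.description in ll_llist" scans the whole list for every read; converting it to a set makes each lookup constant time. The sketch below shows that idea with a plain-Python filter that avoids building SeqRecord objects. It assumes simple 4-line FASTQ records (no line-wrapped sequences), and the function names are made up for illustration:

```python
from itertools import islice


def filter_fastq_records(lines, wanted_titles):
    """Yield the 4-line FASTQ records whose title line is in wanted_titles."""
    wanted = set(wanted_titles)  # set membership tests are O(1)
    lines = iter(lines)
    while True:
        record = list(islice(lines, 4))  # title, sequence, '+', quality
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # The title is the first line without the leading '@'.
        title = record[0][1:].rstrip("\r\n")
        if title in wanted:
            yield "".join(record)


def filter_fastq_file(in_path, out_path, wanted_titles):
    """Stream in_path to out_path, keeping only the wanted records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fastq_records(src, wanted_titles))
```

The same set conversion can be applied to the original SeqIO generator expression without any other change. Note also that on Python 3 the output handle would need text mode ("w") rather than "wb".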
_______________________________________________
Evan Parker | 13 Aug 20:36 2014

[Biopython] GSoC Biopython project update - details and docs

Hey again Biopythoneers and OBF-SoC students,

Over the last week I have been working on reaching feature parity with the existing Index_db parser and doing various code cleanups. I have posted a blog update about my recent work if you're interested.

Have a good week,

-Evan Parker

_______________________________________________
Mark Budde | 31 Jul 07:40 2014

[Biopython] Adding a new restriction enzyme?

Hi,
I would like to add the restriction enzyme I-SceI for use as a Bio.Restriction class. Would the best way to accomplish this be to import Bio.Restriction, then manually add I-SceI to Bio.Restriction.Restriction_Dictionary.rest_dict, using another enzyme as a template? Will that be sufficient?
Thanks,
Mark
_______________________________________________
Evan Parker | 30 Jul 23:51 2014

[Biopython] GSoC Biopython project update - XML parsing

Hello again fellow Biopythoneers and OBF-SoC students,

I have a new blog post regarding my ongoing work with the lazy-loading and indexing sequence file parsers. Last week I spent some time working with the fundamentals of XML parsing in Python. Extracting file offset information from XML files is a moderately difficult task in Python, and I had to write a custom solution to pull this information efficiently.

Have a good week,

- Evan Parker
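
To illustrate what extracting file offsets from XML can look like, here is a minimal sketch using the standard library's expat bindings, which expose the parser's current byte position during callbacks. This is only one possible technique and not necessarily the custom solution described in the blog post; the function name element_offsets and the sample document are made up:

```python
import xml.parsers.expat


def element_offsets(xml_bytes, tag):
    """Return the byte offset of every start tag named `tag`."""
    offsets = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attributes):
        if name == tag:
            # Inside a callback, CurrentByteIndex is the offset of the
            # first byte of the event, i.e. the '<' of this start tag.
            offsets.append(parser.CurrentByteIndex)

    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return offsets


sample = b"<records><entry id='a'>x</entry><entry id='b'>y</entry></records>"
for offset in element_offsets(sample, "entry"):
    print(offset, sample[offset:offset + 14])
```

An index built this way could later be used to seek directly to a record in the file instead of parsing everything before it.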
_______________________________________________
Francesco Strozzi | 30 Jul 10:46 2014

[Biopython] EU OBF Codefest - September 18th and 19th 2014 <at> EBI - Hinxton UK

Dear all,
apologies if you receive this multiple times.

We are glad to announce that we are organising the European edition of
the OBF Codefest, which will take place on September 18th and 19th at
the EBI in Hinxton, UK. This is the second OBF Codefest this year, the
first was held this month in Boston just before the BOSC and ISMB
conferences and in September we will have the chance to expand and
carry on the work and discussions started in July in the USA.

The EU Codefest will precede the Genome
Informatics Conference in Cambridge, so we hope that developers
attending the main conference will also be interested in joining us
for a couple of days of coding and discussions on collaborative
projects and new ideas. The main topics proposed so far are:

* The OpenBio projects development (BioPerl, BioPython, BioRuby, BioJava)
* Semantic web technologies for biological data (e.g. RDF, OWL)
* Software deployment and bioinformatics pipelines, including
CloudBiolinux, Docker and GNU GUIX
* NoSQL databases and NGS data mining
* Biological data visualisation with e.g. D3/JS and BioJS.

We of course invite attendees to add other topics of interest.

For more information, you can visit the OBF Wiki page, which also
includes information on accommodation and registration:
http://www.open-bio.org/wiki/EU-Codefest_2014
If you plan to attend, please use the link in the registration section
and complete the simple steps on the EBI website, so that
we can be aware of the total number of attendees and arrange the
organisation accordingly. Registration is also completely free.

We would be grateful if you could also share this announcement across
your networks and social media; this will help spread the word.

Thanks and we hope to see many of you in September!

Francesco Strozzi
James Malone
Raoul Bonnal
Pjotr Prins
_______________________________________________

Ilya Flyamer | 25 Jul 14:55 2014

[Biopython] Edit feature's location

Hi,

is there a way to either change SeqFeature's location in place or create a copy with a different location? Assigning to feature.location.start raises an AttributeError.

My ultimate goal is to move all features in a GenBank file by a specific number of nucleotides (for example, add 1000 to all coordinates). If someone can suggest an easier way, I would appreciate it.

Best,
Ilya
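
Because location objects are immutable, one approach is to replace each feature's location with a shifted copy rather than editing it in place. In reasonably recent Biopython releases, adding an integer to a FeatureLocation returns a new location moved by that amount. A minimal sketch under that assumption; the function name shift_features is made up, and legacy sub_features are not handled:

```python
def shift_features(record, offset):
    """Move every feature of a SeqRecord by `offset` bases, in place."""
    for feature in record.features:
        # location + int builds a new, shifted location object, which
        # is then assigned back to the feature.
        feature.location = feature.location + offset
    return record
```

With a record obtained from SeqIO.read(..., "genbank"), calling shift_features(record, 1000) would add 1000 to all feature coordinates. The sequence itself is left untouched, so the shifted features may point past its end unless it is padded as well.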
_______________________________________________
Frederico Moraes Ferreira | 22 Jul 20:14 2014

[Biopython] PDB parser: suppress warnings

Hi Biopythonees,
Does anybody happen to know how to suppress warnings when reading PDBs 
with hydrogenated waters?

from Bio.PDB.PDBParser import PDBParser
struct = PDBParser().get_structure('tmp', pdbf)

My PDBs came from MD, so all water molecules have hydrogens, causing 
thousands of warnings like:
...
/usr/lib64/python2.7/site-packages/Bio/PDB/PDBParser.py:284: 
PDBConstructionWarning: PDBConstructionException: Atom 2H defined twice 
in residue <Residue HOH het=W resseq=2640 icode= > at line 71037.
Exception ignored.
Some atoms or residues may be missing in the data structure.
   % message, PDBConstructionWarning)
...

Thanks in advance.
Fred

-- 
Dr. Frederico Moraes Ferreira
University of Sao Paulo
Heart Institute, School of Medicine
Laboratory of Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900     Sao Paulo - SP
Brasil
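
Since these messages are issued through Python's standard warnings machinery, they can be filtered by category around the parsing call. A minimal sketch using only the standard library; the helper name call_ignoring is made up, and DemoWarning is a stand-in for Biopython's PDBConstructionWarning so that the example runs without Biopython:

```python
import warnings


def call_ignoring(category, func, *args, **kwargs):
    """Call func(*args, **kwargs) with warnings of `category` silenced."""
    # catch_warnings restores the previous filters on exit, so the
    # suppression only applies to this one call.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category)
        return func(*args, **kwargs)


class DemoWarning(UserWarning):
    """Stand-in for PDBConstructionWarning (assumption for this demo)."""


def noisy():
    warnings.warn("Atom defined twice", DemoWarning)
    return "parsed"


print(call_ignoring(DemoWarning, noisy))
```

With Biopython, the equivalent call would be call_ignoring(PDBConstructionWarning, PDBParser().get_structure, 'tmp', pdbf), with PDBConstructionWarning imported from Bio.PDB.PDBExceptions. As far as I know, PDBParser also accepts a QUIET=True argument that has a similar effect.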

_______________________________________________