Anna Kostikova | 23 May 14:03 2015

[Biopython] retmax for Entrez.elink

Dear list,

Is it possible to limit the number of matches returned by the
Entrez.elink function, with something like the retmax parameter of
Entrez.esearch?

Thanks a lot,
Anna
_______________________________________________
Biopython mailing list  -  Biopython <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython

Anna Simpson | 17 May 22:24 2015

[Biopython] Parsing xml from Bioproject without DTD - how to use schema?

Hi all,
I've been trying to parse XML files from an efetch query to the BioProject database, and I keep getting an error message about a missing DTD when using Entrez.read or Entrez.parse (and with validate=False I get no data at all). I found a 2013 post on this mailing list from someone with the same problem; he emailed NCBI and was told the following:

"Yes this is the "normal" but it is an oversight as a dtd was never created for this database. I will have to open a ticket to the developers to create this and have it included in the XML and on the DTD web page."

I've emailed NCBI about this again, but I'm guessing there still isn't one (I can't find it on the DTD index page). My searching suggests that there is a schema for BioProject, and that it could perhaps be used to parse these XML files. How might I go about doing that?

I've been trying to use XML parsers like ElementTree and Beautiful Soup but keep running into walls: how to pass an Entrez handle to a parser, and how to get at deeply nested information when the nesting differs for each XML document and I'm running this in a loop. It would be great if I could... stop doing that.

Thanks,
Anna
University of Washington, Seattle
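
One way around the nesting problem, offered as a sketch rather than a tested answer: the standard library's ElementTree accepts any file-like object, so it should accept the handle returned by Entrez.efetch (assumed here to behave like an open file), and Element.iter() finds a tag at any depth. The element names and the in-memory sample below are made up for illustration; they are not the real BioProject layout.

```python
import io
import xml.etree.ElementTree as ET

def collect_text(handle, tag):
    """Return the text of every <tag> element, however deeply it is nested."""
    tree = ET.parse(handle)  # works with any file-like object
    return [el.text for el in tree.getroot().iter(tag) if el.text]

# Stand-in for an efetch handle; the tags are hypothetical.
sample = io.BytesIO(
    b"<RecordSet>"
    b"<Doc><A><Title>first</Title></A></Doc>"
    b"<Doc><B><C><Title>second</Title></C></B></Doc>"
    b"</RecordSet>"
)
titles = collect_text(sample, "Title")
print(titles)
```

Because iter() walks the whole tree in document order, the same call works in a loop even when each record nests the tag differently. This sidesteps the DTD and the schema entirely and does no validation.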

Sauer, David | 13 May 17:49 2015

[Biopython] Local Uniprot

Hi all,
I have a script that queries the UniProt website for particular protein entries, following Biopython's and UniProt's own Python access notes. To be polite, I wait a few seconds between queries, but this makes my script fairly slow. I would like to keep the database locally, but the only downloads I can find for UniProtKB are two huge XML files, one for Swiss-Prot and one for TrEMBL. I am unclear how to parse these compared to the individual protein XML files on the website, which are easily parsed by Biopython.

Does anyone have guidance on parsing and running the UniProtKB locally?

Thanks in advance!

David Sauer

Da-Neng Wang Lab
Structural Biology Program
New York University School of Medicine
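
A possible starting point for the huge XML files, as a sketch under stated assumptions: stream the file with ElementTree's iterparse and discard each entry once it has been read, so the whole file never sits in memory. The namespace and the entry/accession tag names are assumptions about the UniProtKB dump layout, and the two-entry sample is made up. Biopython's SeqIO also lists a "uniprot-xml" format, which may handle the same files.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"  # assumed namespace of the UniProt dump

def iter_accessions(handle):
    """Yield the first accession of each <entry>, one entry at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            acc = elem.find(NS + "accession")
            if acc is not None:
                yield acc.text
            elem.clear()  # drop the finished entry's children

# Tiny made-up stand-in for the Swiss-Prot XML download.
sample = io.BytesIO(
    b'<uniprot xmlns="http://uniprot.org/uniprot">'
    b"<entry><accession>P00001</accession></entry>"
    b"<entry><accession>Q00002</accession></entry>"
    b"</uniprot>"
)
accessions = list(iter_accessions(sample))
print(accessions)
```

For repeated lookups, one pass like this could fill a local index (for example a dict or an SQLite table keyed by accession) so later queries avoid re-reading the file.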

Horea Christian | 13 May 13:47 2015

[Biopython] In Silico ~PCR

I recently started working on a function for in silico PCR. Basically, it identifies potential (and potentially unwanted) amplicons.

I have implemented some basic parallelization for the function, and it can currently run a PCR against the whole mouse genome in about 20 min on my laptop. I guess I could make it even faster by not BLASTing both primers against the whole genome, but BLASTing just one and then BLASTing the second downstream of the first (the function takes a maximum amplicon length argument).

Do you think this could be a worthwhile thing to include in biopython?


Best Regards,
Christian


-- 
Horea Christian, M.Sc.
Doctoral Researcher
Institute for Biomedical Engineering
Neuroscience Center Zurich
ETH Zurich and UZH

Email, horea.christ <at> gmail.com
Online portal, chymera.eu


Horea Christian | 13 May 13:38 2015

[Biopython] Thermocycler program file format(ting)

I would like to store my thermocycler (PCR) programs on my server to better keep track of them. Do you know any good format/style to do that in? Does Biopython have any thermocycler program objects?

Right now I am thinking of saving them in a normal .csv file, but the iterative aspect of PCR seems difficult to convey in a concise manner in a .csv file. Would you have any suggestions?


Best Regards,
Christian

--
Horea Christian, M.Sc.
Doctoral Researcher
Institute for Biomedical Engineering
Neuroscience Center Zurich
ETH Zurich and UZH

Email, horea.christ <at> gmail.com
Online portal, chymera.eu
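
One option for the iterative part, sketched with the standard library only: a nested format such as JSON can hold a "repeat" block directly, which a flat .csv cannot. The key names (temp_c, seconds, repeat, steps) and all temperatures and times below are hypothetical, chosen only to illustrate the shape.

```python
import json

# Hypothetical program: initial denaturation, 30 cycles, final extension.
program = {
    "name": "example-pcr",
    "steps": [
        {"temp_c": 95, "seconds": 180},
        {"repeat": 30, "steps": [
            {"temp_c": 95, "seconds": 30},
            {"temp_c": 58, "seconds": 30},
            {"temp_c": 72, "seconds": 60},
        ]},
        {"temp_c": 72, "seconds": 300},
    ],
}

def expand(steps):
    """Flatten nested repeat blocks into a list of (temp_c, seconds) tuples."""
    flat = []
    for step in steps:
        if "repeat" in step:
            flat.extend(expand(step["steps"]) * step["repeat"])
        else:
            flat.append((step["temp_c"], step["seconds"]))
    return flat

flat = expand(program["steps"])
text = json.dumps(program, indent=2)  # what would be written to the server
print(len(flat), "steps after expansion")
```

The stored file stays short and readable, while expand() recovers the full step-by-step run when it is needed, for example to compute total run time.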

Martin Mokrejs | 6 May 16:22 2015

[Biopython] Upcoming NCBI BLAST XML2 format

Hi,
are you aware of the upcoming changes to BLAST's XML format? There is still time for feedback before it is released. ;-)

ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/xml2.pdf

Martin

Patrick McGrath | 5 May 12:57 2015

[Biopython] Reading in chimera generated files

Hello,

I am trying to use Bio.PDB to read in a PDB file generated by MatchMaker in Chimera. It seems that Chimera changes HG to Hg and ZN to Zn in the last column of the PDB file, which causes an error:

assert not element or element == element.upper(), element
AssertionError: Hg

Is there any solution to this besides creating a script to manually edit the pdb file generated by chimera?
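
If no parser-side option turns up, the editing route can at least be kept small. The sketch below assumes the element symbol sits in columns 77-78 of ATOM/HETATM records, as in the standard PDB layout, and upper-cases only those two characters. The function name and the sample record are made up for illustration.

```python
def fix_element_case(line):
    """Upper-case the element symbol (columns 77-78) of an ATOM/HETATM record."""
    if line.startswith(("ATOM", "HETATM")) and len(line) >= 78:
        return line[:76] + line[76:78].upper() + line[78:]
    return line  # all other records pass through unchanged

# Made-up record, padded so that "Hg" lands in columns 77-78.
sample = "HETATM 1001 HG    HG A 301".ljust(76) + "Hg\n"
fixed = fix_element_case(sample)
print(repr(fixed[76:78]))
```

Applying this to each line of the Chimera output and writing the result to a new file (or an in-memory io.StringIO) before parsing would leave the rest of the file untouched.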

Ryan J. Hope | 4 May 19:11 2015

[Biopython] codon optimisation script

Does anyone know of a python script for codon optimization of a FASTA file using a codon usage table as a reference?

Hi, I need a template Python script (or a complete script, or perhaps just some advice) for converting a gene belonging to R. eutropha for heterologous expression in C. acetobutylicum. I have a codon usage table as a .txt file copied from Kazusa and the gene I'm interested in as FASTA. If anyone has a .py file they can upload, that would be great :)

Kind regards,
Ryan H.
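
As a possible template, here is a deliberately naive sketch: replace every codon with the synonymous codon that has the highest value in the usage table. It assumes the standard codon assignments, upper-case unambiguous DNA, and a usage table already loaded into a dict of codon to frequency. Parsing the Kazusa text file is not shown, and the frequencies below are invented.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def optimise(dna, usage):
    """Swap each codon for the synonymous codon with the highest usage value."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    whole = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, whole, 3)]
    return "".join(best[CODON_TO_AA[c]] for c in codons)

# Invented frequencies for two amino acids; missing codons count as 0.
usage = {"GCA": 0.5, "GCC": 0.1, "AAA": 0.9, "AAG": 0.1}
optimised = optimise("ATGGCCAAGTAA", usage)
print(optimised)
```

Always picking the single most frequent codon ignores factors such as GC content and repeats, so this is a starting point rather than a finished optimiser.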

Linlin Zhao | 30 Apr 16:34 2015

[Biopython] Suggestions for working sequence data

Hi There,

I am new to Biopython, but have some experience with Python. Python is
my favourite language, and I really want to learn and apply Biopython.

I need to work on a large dataset, the whole genomic sequences of all 
bacteria (~2000) from genBank, with size of about 80GB. Each folder 
within the dataset is one organism, which includes several files for 
different types of data.

My task is easy to describe but I need some suggestions of how to work 
on the whole dataset efficiently with Biopython:

1. load FASTA file .ffn (protein genes) in each folder, which looks like 
this

>gi|158303474|gb|CP000828.1|:3233-4009 Acaryochloris marina MBIC11017, complete genome
ATGCTAGGTGCAATTGC....

According to the address (3233-4009) of this gene, I then go to .fna 
file within the same folder which has the whole genome sequence, and 
read 20 base pairs (3213-3232) as the approximate promoter for the gene. 
Finally I construct a new gene sequence with address 3213-4009, which is 
just adding 20bp in front of the original gene sequence.

2. run motifs, for instance "TACGTC", through all those gene sequences 
to find out their frequency of appearance in all bacterial protein 
genes.

I hope my description is clear. In short, I need to work on two files
in each of the 2000 folders. How can I load and write the files
efficiently?

Would anyone give some hints?

Thanks in advance,
Linlin
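
The two steps described above can be sketched with plain Python, assuming the genome sequence has already been read into a string and that the header coordinates are 1-based and inclusive. Only forward-strand headers of the form shown in the post are handled; anything else raises an error. The function names are made up for illustration.

```python
import re

def parse_range(header):
    """Extract (start, end) from a header such as '...|:3233-4009 ...'."""
    match = re.search(r":(\d+)-(\d+)", header)
    if match is None:
        raise ValueError("no forward-strand coordinates in: " + header)
    return int(match.group(1)), int(match.group(2))

def with_upstream(genome, start, end, flank=20):
    """Return the gene plus up to `flank` bases immediately upstream."""
    begin = max(start - flank, 1)  # clamp at the start of the genome
    return genome[begin - 1:end]   # convert 1-based inclusive to a slice

def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    last = len(seq) - len(motif) + 1
    return sum(1 for i in range(last) if seq.startswith(motif, i))

header = "gi|158303474|gb|CP000828.1|:3233-4009 Acaryochloris marina MBIC11017"
start, end = parse_range(header)
print(start, end, count_motif("TACGTCTACGTC", "TACGTC"))
```

For 2000 folders, reading each .fna once and then looping over that folder's .ffn headers keeps the work per folder to two file reads. Iterating with os.walk or glob and writing results as they are produced should avoid holding the whole 80 GB dataset in memory.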

Katie Edmonds | 30 Apr 16:15 2015

[Biopython] graphics library suggestions?

Hi,

I want to write a script that reads a list of genes from a table, including their start and endpoint on the chromosome, and generates vector graphics arrows on a line, with lengths and positions to scale for their size and location in the genome, colored by some value given in the table. Does anyone have any suggestions for a useful library for creating nice vector graphics arrows?

Here's an example of what I'm talking about:
https://github.com/betainverse/other/blob/master/gene_arrows.pdf

Alternatively, is there already a good tool that does what I want?

Thanks,
Katie
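
If a dedicated library turns out to be more than is needed, SVG is simple enough to write by hand. The sketch below draws right-pointing arrows to scale on a single line, assuming the table has already been reduced to (start, end, colour) tuples; mapping a table value to a colour is left out, and the coordinates are invented. Biopython's GenomeDiagram module is another candidate worth checking.

```python
def arrow_points(x0, x1, y, h, head):
    """Corner points of a right-pointing arrow spanning x0..x1."""
    neck = max(x1 - head, x0)  # very short genes become a bare triangle
    return [(x0, y), (neck, y), (x1, y + h / 2), (neck, y + h), (x0, y + h)]

def genes_to_svg(genes, width=800, track_h=20, head=10):
    """genes: list of (start, end, colour) in chromosome coordinates."""
    scale = width / max(end for _, end, _ in genes)
    mid = track_h * 1.5  # vertical centre of the arrows
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
        % (width, track_h * 3),
        '<line x1="0" y1="%.1f" x2="%d" y2="%.1f" stroke="black"/>'
        % (mid, width, mid),
    ]
    for start, end, colour in genes:
        pts = arrow_points(start * scale, end * scale, track_h, track_h, head)
        coords = " ".join("%.1f,%.1f" % p for p in pts)
        parts.append('<polygon points="%s" fill="%s"/>' % (coords, colour))
    parts.append("</svg>")
    return "\n".join(parts)

# Invented gene coordinates and colours.
svg = genes_to_svg([(100, 900, "steelblue"), (1200, 2000, "tomato")])
print(svg)
```

Writing the returned string to a .svg file gives a vector image that opens in a browser or in Inkscape. Reverse-strand genes would need a mirrored arrow, which this sketch does not draw.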