Mark A. Jensen | 1 Apr 07:28

#bioperl bot talk

Hi All, 
Some cool stuff going on on the IRC node (freenode.net/#bioperl). 
Andrew Stewart has been prototyping an irc bot with Bioperl 
functionality built-in. The possibilities for improving support and 
logging our increasing irc traffic are terrifying. I've set up a 
wiki page (http://www.bioperl.org/wiki/Bots) under the new 
IRC category for discussions. Please feel free to contribute
use cases, ideas, praise and blame. 
cheers, 
Mark
Johann PELLET | 1 Apr 12:14

load_seqdatabase error with a specific locus from genbank

Hi all,

With the latest version of BioPerl and BioSQL, I have tried to insert  
entries from a GenBank file that I downloaded from the NCBI  
website (648 937 records).

After successfully loading ncbi_taxonomy, I am getting the following error  
message while loading sequences into the database.

perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname  
biosql

--------------------- WARNING ---------------------
MSG: The supplied lineage does not start near 'Human papillomavirus type 2c' (I was supplied 'Human papillomavirus - 2 | Alphapapillomavirus | Papillomaviridae')

The script does not stop until it reaches this entry: S67864

--------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed,  
values were ("1","19)","1","3") FKs (41914,<NULL>)
ERROR:  invalid input syntax for integer: "19)"

---------------------------------------------------
Could not store S67864:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in  
Bio::DB::BioSQL::LocationAdaptor::find_by_unique_key: ERROR:  current  
(Continue reading)

Florent Angly | 1 Apr 19:03

Re: taxonomy ID

FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you 
won't be able to put its information in a hash (unless you have a lot of 
memory).
Florent

Smithies, Russell wrote:
> The taxonomy information isn't in the blast output unless you created custom fasta headers for your blast database.
> The easiest way to get the tax_id for your accessions would be to download the gi->tax_id list from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> If you load that file into a hash, parse the accessions out of the blast hits then lookup the tax_id from that hash, I think it should be fairly fast.
>
> Checking which are prokaryotes and which are eukaryotes based on tax_id is a separate problem  :-)
> If you grab the taxdump.tar.gz file from the same site, the nodes.dmp file contained within lists what division each tax_id belongs to (Bacteria, Invertebrates, Mammals, Phages, Plants, etc) so you can probably work it out from that.
>
> It's not a very BioPerly solution but sometimes just looking up the answer from a file/table/hash is the simplest way.
>
> Hope this helps,
>
> Russell Smithies 
>
> Bioinformatics Applications Developer 
> T +64 3 489 9085 
> E  russell.smithies <at> agresearch.co.nz 
>
> Invermay  Research Centre 
> Puddle Alley, 
> Mosgiel, 
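The nodes.dmp suggestion quoted above could be sketched roughly as follows. This is only an illustration under some assumptions: that the taxdump .dmp files separate fields with tab-pipe-tab and end each line with tab-pipe, that the division id is the fifth column of nodes.dmp, and that division.dmp (from the same archive) maps division ids to names. The helper names are made up for this example and the sample rows are abbreviated.

```perl
use strict;
use warnings;

# Split one line of a taxdump .dmp file into its fields.
sub dmp_fields {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;                  # drop the trailing "\t|" terminator
    return split /\t\|\t/, $line, -1;    # -1 keeps trailing empty fields
}

# Build division id -> division name from a division.dmp filehandle.
sub read_divisions {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        my ($id, undef, $name) = dmp_fields($line);
        $name_for{$id} = $name;
    }
    return \%name_for;
}

# Build tax_id -> division id from a nodes.dmp filehandle
# (assumes the division id is the fifth column).
sub read_node_divisions {
    my ($fh) = @_;
    my %division_of;
    while (my $line = <$fh>) {
        my ($tax_id, $division_id) = (dmp_fields($line))[0, 4];
        $division_of{$tax_id} = $division_id;
    }
    return \%division_of;
}

# Abbreviated sample rows standing in for the real files.
my $nodes_dmp = "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\n"
              . "562\t|\t561\t|\tspecies\t|\tEC\t|\t0\t|\n";
my $division_dmp = "0\t|\tBCT\t|\tBacteria\t|\t\t|\n"
                 . "5\t|\tPRI\t|\tPrimates\t|\t\t|\n";

open my $nodes_fh, '<', \$nodes_dmp    or die "cannot open sample nodes: $!";
open my $div_fh,   '<', \$division_dmp or die "cannot open sample divisions: $!";

my $division_of = read_node_divisions($nodes_fh);
my $name_for    = read_divisions($div_fh);

for my $tax_id (sort { $a <=> $b } keys %$division_of) {
    my $division = $name_for->{ $division_of->{$tax_id} };
    print "$tax_id\t$division\n";
}
```

With the real files you would open nodes.dmp and division.dmp from the unpacked taxdump.tar.gz instead of the in-memory strings. Note that the division only gives a coarse grouping; deciding prokaryote versus eukaryote from it is still up to you.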

Miguel Pignatelli | 1 Apr 19:15

Is it possible to retrieve full pubmed articles

Hi all,

I have a list of PUBMED IDs and I am trying to retrieve automatically  
the *full article* in any format (not just the abstract). Is there any  
method in bioperl that allows this? any other solution?
Currently I am trying to solve this using WWW::Mechanize, but do you  
know of any other method to do this?

Any help would be appreciated,

Thanks in advance,

M;
Bryan Bishop | 1 Apr 20:18

Re: Is it possible to retrieve full pubmed articles

On Wed, Apr 1, 2009 at 12:15 PM, Miguel Pignatelli
<miguel.pignatelli <at> uv.es> wrote:
> I have a list of PUBMED IDs and I am trying to retrieve automatically the
> *full article* in any format (not just the abstract). Is there any method in
> bioperl that allows this? any other solution?
> Currently I am trying to solve this using WWW::Mechanize, but do you know of
> any other method to do this?

You can try pubget.com; it's a web gateway to download pubmedcentral
articles. Unfortunately this means it does not have pubmed articles.
What I have found with pubmed is that it's mainly a listing of
abstracts, and then the various papers may or may not be online in
their respective journals on the web somewhere else, and rarely are
there any links to the publisher website. So how are you using
WWW::Mechanize in this context? Is there some secret to obtaining
papers that are listed via pubmed? There are no magical links to the
publisher websites .. so what's going on?

- Bryan
http://heybryan.org/
1 512 203 0507
Smithies, Russell | 1 Apr 21:33

Re: taxonomy ID

There's always more than one way to do it.
I have no trouble loading it into a hash but you could just grep the file:

my (undef, $tax_id) = split(/\s+/, `grep -w -P "^$accession" gi_taxid_prot.dmp`);
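
If you have many IDs to look up, a middle ground between the full hash and one grep per ID might be a single pass over the file that only keeps the entries you need. A minimal sketch, assuming the dump has two whitespace-separated columns (GI, then tax_id); lookup_tax_ids is a made-up helper name and the sample data is invented:

```perl
use strict;
use warnings;

# Read a GI -> tax_id dump line by line and keep only the wanted GIs,
# so memory use depends on the number of queries, not the file size.
sub lookup_tax_ids {
    my ($dump_fh, @gis) = @_;
    my %wanted = map { $_ => 1 } @gis;
    my %tax_id_for;
    while (my $line = <$dump_fh>) {
        chomp $line;
        my ($gi, $tax_id) = split /\s+/, $line;
        next unless defined $tax_id && $wanted{$gi};
        $tax_id_for{$gi} = $tax_id;
        # Stop early once every requested GI has been found.
        last if keys(%tax_id_for) == keys(%wanted);
    }
    return \%tax_id_for;
}

# Invented sample rows standing in for the real dump file.
my $dump = "2\t9913\n3\t9913\n4\t9646\n";
open my $fh, '<', \$dump or die "cannot open sample data: $!";

my $tax_id_for = lookup_tax_ids($fh, 3, 4);
print "$_ => $tax_id_for->{$_}\n" for sort { $a <=> $b } keys %$tax_id_for;
```

For the real file you could read the compressed dump without unpacking it first, for example with open(my $fh, '-|', 'gzip', '-dc', 'gi_taxid_prot.dmp.gz').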

--Russell

> -----Original Message-----
> From: Florent Angly [mailto:florent.angly <at> gmail.com]
> Sent: Thursday, 2 April 2009 6:03 a.m.
> To: Smithies, Russell
> Cc: 'shalabh sharma'; 'bioperl-l'
> Subject: Re: [Bioperl-l] taxonomy ID
> 
> FYI, the gi_taxid_nucl.dmp.gz is very large, thus it's likely that you
> won't be able to put its information in a hash (unless you have a lot of
> memory).
> Florent
> 
> Smithies, Russell wrote:
> > The taxonomy information isn't in the blast output unless you created custom
> fasta headers for your blast database.
> > The easiest way to get the tax_id for your accessions would be to download
> the gi->tax_id list from
> ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz.
> > If you load that file into a hash, parse the accessions out of the blast
> hits then lookup the tax_id from that hash, I think it should be fairly fast.
> >
> > Checking which are prokaryotes and which are eukaryotes based on tax_id is a
> separate problem  :-)

Smithies, Russell | 1 Apr 21:48

Re: Is it possible to retrieve full pubmed articles

Not all articles have full-text at Pubmed but if you know the article ID, you can usually get the whole
article (if available) like this:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez

or as pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf

I'd just build a URL and use wget.
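
Building the URLs could look something like the sketch below. It only uses the two URL patterns shown above; pmc_urls is a made-up helper name, and the script prints wget command lines rather than fetching anything itself.

```perl
use strict;
use warnings;

# Return the HTML and PDF URLs for a PubMed Central article ID,
# following the two URL patterns given in this message.
sub pmc_urls {
    my ($artid) = @_;
    die "article ID must be numeric\n" unless $artid =~ /^\d+$/;
    my $base = 'http://www.pubmedcentral.nih.gov';
    return (
        html => "$base/articlerender.fcgi?artid=$artid&tool=pmcentrez",
        pdf  => "$base/picrender.fcgi?artid=$artid&blobtype=pdf",
    );
}

# Print one wget command per article ID.
for my $artid (1307096) {
    my %url = pmc_urls($artid);
    print "wget -O PMC$artid.pdf '$url{pdf}'\n";
}
```

The URL is single-quoted in the printed command so the shell does not interpret the ampersand.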

If you're searching Pubmed directly, use a query like this to ensure you only get articles with links to full text:

	cancer AND (free full text[sb])
eg 	http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&term=cancer+AND+(free+full+text[sb])

Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies <at> agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 

> -----Original Message-----
> From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-

Miguel Pignatelli | 2 Apr 00:14

Re: Is it possible to retrieve full pubmed articles

Thanks for the response,

I have PMIDs extracted from Genbank flat files, is there a way to  
convert PMIDs to PMCIDs?
I found this page:

http://www.ncbi.nlm.nih.gov/sites/pmctopmid

Is it possible to download the underlying conversion table for local  
use?

Thank you very much in advance,

M;

El 01/04/2009, a las 21:48, Smithies, Russell escribió:

> Not all articles have full-text at Pubmed but if you know the  
> article ID, you can usually get the whole article (if available)  
> like this:
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1307096&tool=pmcentrez
>
> or as pdf
> http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1307096&blobtype=pdf
>
> I'd just build a URL and use wget.
>
> If you're searching Pubmed directly, use a query like this to ensure  
> you only get articles with links to full text:
>

Smithies, Russell | 2 Apr 00:47

Re: Is it possible to retrieve full pubmed articles

Try this: http://www.pubmedcentral.nih.gov/about/ftp.html#Obtaining_DOIs

Use ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz to associate PMC articles with a PMC ID, a
PubMed ID, and the corresponding DOI.

PMC-ids.csv.gz is a comma separated file with the following fields:

    * Journal Title
    * ISSN
    * Electronic ISSN
    * Publication Year
    * Volume
    * Issue
    * Page
    * DOI (if available)
    * PMC ID
    * PubMed ID (if available)
    * Manuscript ID (if available)
    * Release Date (Mmm DD YYYY or live)
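
Given that field order, a PubMed ID to PMC ID lookup table could be built along these lines. This is a rough sketch with some assumptions: the last four fields (PMC ID, PubMed ID, Manuscript ID, Release Date) never contain commas, so they can be picked off from the end of the line even when a quoted journal title does contain one; the first line may be a header; pmid_to_pmcid is a made-up name and the sample rows are invented.

```perl
use strict;
use warnings;

# Build PubMed ID -> PMC ID from a filehandle on the unpacked CSV.
sub pmid_to_pmcid {
    my ($fh) = @_;
    my %pmcid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        next if @fields < 12;
        # Index from the end so commas inside the journal title do not matter.
        my ($pmcid, $pmid) = @fields[-4, -3];
        s/^"|"$//g for $pmcid, $pmid;         # strip optional surrounding quotes
        next unless $pmid =~ /^\d+$/;         # skips a header row and rows with no PubMed ID
        $pmcid_for{$pmid} = $pmcid;
    }
    return \%pmcid_for;
}

# Invented sample rows standing in for the real file.
my $csv = join "\n",
    'Journal Title,ISSN,eISSN,Year,Volume,Issue,Page,DOI,PMCID,PMID,Manuscript Id,Release Date',
    '"Hypothetical Journal, Series B",0000-0000,,2005,1,2,3,,PMC1307096,12345678,,live',
    'Another Made-up Journal,0000-0001,,2006,4,5,6,,PMC999,,,live',
    '';
open my $csv_fh, '<', \$csv or die "cannot open sample data: $!";

my $pmcid_for = pmid_to_pmcid($csv_fh);
print "$_ => $pmcid_for->{$_}\n" for sort keys %$pmcid_for;
```

If the real file turns out to quote fields differently, a proper CSV parser would be the safer choice.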

--Russell

> -----Original Message-----
> From: Miguel Pignatelli [mailto:miguel.pignatelli <at> uv.es]
> Sent: Thursday, 2 April 2009 11:14 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l <at> lists.open-bio.org'
> Subject: Re: [Bioperl-l] Is it possible to retrieve full pubmed articles
> 
> Thanks for the response,

Tristan Lefebure | 2 Apr 05:11

Bio::SimpleAlign, uniq_seq

Hi there,

I'm trying to use the uniq_seq function from the Bio::SimpleAlign module.
Here is the description:

 Title     : uniq_seq
 Usage     : $aln->uniq_seq():  Remove identical sequences in
             in the alignment.  Ambiguous base ("N", "n") and
             leading and ending gaps ("-") are NOT counted as
             differences.
 Function  : Make a new alignment of unique sequence types (STs)
 Returns   : 1. a new Bio::SimpleAlign object (all sequences renamed as "ST")
             2. ST of each sequence in STDERR
 Argument  : None

What I'm trying to obtain is the ST composition (i.e. what is supposed to go 
to STDERR), but I see nothing...

An example:

--------test.fasta:
>seq1
AAATTTC
>seq2
CAATTTC
>seq3
AAATTTC
-------

----------test.pl:
(Continue reading)

