Re: bioperl and kegg (out of memory problem)
Chris Fields <cjfields <at> uiuc.edu>
2007-04-02 15:32:59 GMT
Ambrose,
Data is persisting in your hashes (in particular DBLink objects),
which is eating away at your memory. If I take a sample KEGG gene
file and simply parse it:
while (my $seq = $io->next_seq) {
    print $seq->accession, "\n";
}
there are no memory issues, but if I store the data in hashes
declared outside the loop:
my (%dblink_KO, %dblink_Pfam, %dblink_PROSITE, %dblink_NCBIGI,
    %dblink_NCBIGENEID, %dblink_UniProt);
my (%pathway_name, %pathway_id, %ecnumbers, %crc64, %ntseq, %aaseq);
while (my $seq = $io->next_seq) {
    # store Bio::Seq data in hashes
}
I see problems even with a single genome file of KEGG records. You'll
definitely run into memory issues if you are parsing many genome
files, which you appear to be doing:
my (%dblink_KO, %dblink_Pfam, %dblink_PROSITE, %dblink_NCBIGI,
    %dblink_NCBIGENEID, %dblink_UniProt);
my (%pathway_name, %pathway_id, %ecnumbers, %crc64, %ntseq, %aaseq);
for my $genomefile (@genomelist) {