Re: Run time for large query files?
An addendum:
The fast e-value approximation is disabled by default in the CVS code.
To use it, add --fast-evalue-approximation to the mpiblast command line.
I tested it over the weekend with a blastp between E. coli and yeast
AA sequences. In that dataset the approximate e-values were correct to
within a factor of 2, so it may be desirable to raise the e-value cutoff
slightly when searching for very weak homology. For example, a hit given
an e-value of 8 by NCBI blastall may be given an approximate e-value of
11 by mpiblast (using --fast-evalue-approximation), which would be above
the default e-value cutoff of 10 and thus discarded.
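
To make the cutoff arithmetic concrete, here is a rough Python sketch. It
assumes the factor of 2 seen in my one test holds as a worst-case bound,
which is not guaranteed for other datasets. The function names are made up
for illustration and are not part of mpiblast.

```python
# Sketch only: error_factor=2.0 is the discrepancy observed in a single
# E. coli vs. yeast blastp test, not a guaranteed bound.

def widened_cutoff(target_cutoff, error_factor=2.0):
    """Return the e-value cutoff to request so that hits whose exact
    e-value is at most target_cutoff survive an approximation that may
    overestimate by up to error_factor."""
    if target_cutoff <= 0 or error_factor < 1:
        raise ValueError("need target_cutoff > 0 and error_factor >= 1")
    return target_cutoff * error_factor


def may_be_discarded(exact_evalue, target_cutoff, error_factor=2.0):
    """True if a hit that passes on its exact e-value could be reported
    above the cutoff once the approximation error is applied."""
    return exact_evalue <= target_cutoff < exact_evalue * error_factor


# The hit from the example above: exact e-value 8, default cutoff 10.
print(may_be_discarded(8, 10))
print(widened_cutoff(10))
```

Under that assumption, the widened value is what you would pass to -e. The
trade-off is that a wider cutoff also lets through some hits whose exact
e-value is above the cutoff you actually wanted.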
-Aaron
Stephen Ficklin wrote:
> Hi, I have a query file with 359,001 ESTs (218MB). I'm running a blastx using mpiBLAST against the
> Swiss-Prot UniProt database containing 241,242 protein sequences (119MB). The job on the master
> node has been doing all the work. The worker nodes just sit idle, and after more than 12 hours I still have no
> results and the worker nodes are still doing nothing.
>
> Here's my command-line entry:
> /usr/local/mpich/bin/mpirun -np 12 -machinefile /home/userx/test/mpiblast/machines
> /usr/local/mpiblast/bin/mpiblast -p blastx -i /local/scratch/rosaceae2006_6_14.trim.lib -d
> uniprot_sprot.fasta -e 1e-5 -F F -o /home/userx/test/mpiblast/results.out --debug=/test/mpiblast/debug.out
>
> I'm running this job on a 60-node cluster with each node having dual 64-bit AMD Opteron 2.2 GHz processors and
> 2GB of RAM. The database files, mpiblast binaries and output files all reside on NFS-mounted filesystems.
>
> Is mpiblast really just in the "preparation" phase before the workers get busy? Can anyone tell me if I am