Re: Run time for large query files?
2006-12-01 04:23:03 GMT
Hi Stephen,

The mpiblast behavior you are observing is "normal." It happens because mpiblast carefully calculates the exact effective search space size for every query in your large query set. Unfortunately, in mpiblast 1.4.0 that calculation is still a serial step, and it runs before the database search begins.

Others have also complained about how long the initial search space calculation takes on AA (amino acid) queries, so I have added a bit of code to mpiblast.cpp that should give a fast approximation to the true effective search space. My reasoning is that in most cases the error introduced by the approximation will be small enough not to matter (ask me if you want a more detailed explanation). The approximation takes the effective database size and query length adjustment computed for the first query and applies them to all subsequent queries. As a result, it is more likely to produce inaccurate e-values when the first query is not representative of the remaining queries, i.e. when the first query is substantially longer or shorter, or contains a substantially different amount of low-complexity sequence that the dust filter would remove.

I have committed the changes to mpiblast.cpp in our CVS repository. Follow these instructions to build from the CVS repository:
http://mpiblast.lanl.gov/Docs.FAQ.html#cvs
The changes may take up to a day to sync between the developer repository and the public repository.

Note that I have not yet had time to test the changes, but they are small, so hopefully they'll "just work." I'll make an effort to do some testing over the weekend.

Hope this helps,
-Aaron
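To make the approximation Aaron describes concrete, here is a minimal Python sketch contrasting the per-query calculation with the "reuse the first query's values" shortcut. It is not the actual mpiblast.cpp change. It assumes the textbook BLAST definitions: a length adjustment ell shortens the query to (m - ell) and each of the N database sequences, giving an effective database length of (n - N*ell), and the search space is their product. It solves for ell with the simple fixed point ell = ln(K(m - ell)(n - N*ell))/H; real BLAST uses a more elaborate length-adjustment routine. The helper names and the K and H values are hypothetical placeholders.

```python
import math

# Illustrative statistical parameters (hypothetical placeholders, not taken
# from mpiblast or from any specific scoring matrix).
K = 0.041
H = 0.14


def length_adjustment(query_len, db_len, num_seqs, k=K, h=H, max_iter=50):
    """Solve ell = ln(k * (m - ell) * (n - N*ell)) / h by fixed-point iteration.

    This is a simplified, assumed form of the BLAST length adjustment.
    """
    ell = 0.0
    for _ in range(max_iter):
        m_eff = max(query_len - ell, 1.0)
        n_eff = max(db_len - num_seqs * ell, 1.0)
        new_ell = math.log(k * m_eff * n_eff) / h
        converged = abs(new_ell - ell) < 1e-9
        ell = new_ell
        if converged:
            break
    return math.floor(ell)


def effective_space(query_len, db_len, num_seqs, ell):
    """Effective search space (m - ell) * (n - N*ell), clamped to stay positive."""
    return max(query_len - ell, 1) * max(db_len - num_seqs * ell, 1)


def exact_search_spaces(query_lens, db_len, num_seqs):
    """Per-query calculation: one length-adjustment solve per query (the slow serial path)."""
    return [
        effective_space(m, db_len, num_seqs, length_adjustment(m, db_len, num_seqs))
        for m in query_lens
    ]


def approx_search_spaces(query_lens, db_len, num_seqs):
    """Approximation: reuse the first query's length adjustment and effective DB size."""
    ell0 = length_adjustment(query_lens[0], db_len, num_seqs)
    eff_db0 = max(db_len - num_seqs * ell0, 1)
    return [max(m - ell0, 1) * eff_db0 for m in query_lens]


if __name__ == "__main__":
    # Hypothetical database: 100 Mbases spread over 300,000 sequences.
    db_len, num_seqs = 100_000_000, 300_000
    queries = [1000, 1100, 5000]
    exact = exact_search_spaces(queries, db_len, num_seqs)
    approx = approx_search_spaces(queries, db_len, num_seqs)
    for m, e, a in zip(queries, exact, approx):
        print(m, e, a, f"relative error {abs(a - e) / e:.2%}")
```

Only one length-adjustment solve remains in the approximate version, which is why the serial phase shrinks. The trade-off matches the caveat in the message: a query whose length differs a lot from the first one (5000 vs. 1000 above) gets a slightly different search space and therefore slightly different e-values.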