Re: Benchmarking Lingua::Stem
Allan Fields <afieldsml <at> idirect.ca>
2003-04-15 02:13:43 GMT
Hi,
On Mon, Apr 14, 2003 at 09:04:19AM -0700, Benjamin Franz wrote:
> On Mon, 14 Apr 2003, Benjamin Franz wrote:
>
> > While doing a little Googling to find out if/how people are using Perl
> > modules I've written, I ran across the Snowball-discuss list benchmarking
> > of Lingua::Stem last year
> > <URL:http://www.snowball.tartarus.org/archives/snowball-discuss/0193.html>
> >
> > I feel that your criticism of the performance of Lingua::Stem is
> > mis-placed. Your benchmark used it in its _lowest_ performance mode (one
> > word at a time, caching disabled). If you process _all_ the words in one
> > pass you will find its performance _exceeds_ its competition - by quite a
> > large margin. That is _without_ even turning on the stem caching system
> > (which can multiply the performance several times on large stemming
> > operations).
Yes, that's true.
I guess this should be followed up on the list as a point of
clarification. When I did those benchmarks, I assumed there would be only
one word per call and many calls to the subroutines, so subroutine overhead
was as much an issue as the implementation of the algorithm itself. I
remember looking at the batch features, but I decided to benchmark the
module the way I expected to call it. In hindsight, that probably wasn't
the fairest benchmarking strategy (and it was a quick benchmark).
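To illustrate the difference between the two calling styles, here is a
minimal sketch using only core Perl. Note that toy_stem is a hypothetical
stand-in written for this example, not Lingua::Stem's code or API, and the
word list is made up. It only shows how one call per word pays the
subroutine overhead once per word, while a batch call pays it once per list.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical toy stemmer (NOT Lingua::Stem): strips a few common English
# suffixes. It takes a list of words and returns a reference to the stems.
sub toy_stem {
    my @stems = map {
        my $w = lc $_;
        $w =~ s/(?:ing|ed|s)$// if length($w) > 4;
        $w;
    } @_;
    return \@stems;
}

# Made-up sample input: 300 words in total.
my @words = (qw(benchmarking stemmed words calls list cached)) x 50;

# Style 1: one word per call, so the call overhead is paid once per word.
sub per_word {
    my @out;
    push @out, toy_stem($_)->[0] for @words;
    return \@out;
}

# Style 2: the whole list in a single call.
sub batch { return toy_stem(@words) }

# Both styles must agree on the result; only the cost differs.
die "results differ" unless "@{ per_word() }" eq "@{ batch() }";

# Run each style for at least one CPU second and print a comparison table.
cmpthese(-1, { per_word => \&per_word, batch => \&batch });
```

The actual ratio will depend on the machine and on how expensive the
stemming itself is relative to the call, so the numbers from a toy like
this say nothing about any particular module.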
As for my other point: although your stemmer was one of the most featureful,
I just hadn't understood why some of the program structure used symbolic