Re: goodbye QuartzBufferedTable
Olly Betts <olly <at> survex.com>
2004-08-13 11:54:40 GMT
On Fri, Aug 13, 2004 at 09:10:44AM +0200, Arjen van der Meijden wrote:
> On 13-8-2004 1:56, Olly Betts wrote:
> > I should add something so the batch size can be set without
> > recompiling though.
>
> I'll watch the cvs-commits for this. Will you also allow a switch (or an
> environment value or whatever) on scriptindex to adjust this?
For the time being, I'll probably just pull the value from an
environment variable inside quartz itself. We should also look at
whether a document-count-based flush is the best approach - now that
we only cache changed postings in memory, counting the number of
cached postings might be more appropriate since that'll mostly
dictate memory usage and how much work the merging step does.
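
To make the idea concrete, here's a rough sketch of a posting-count-based flush with the threshold pulled from the environment. This is not quartz code: the variable name EXAMPLE_FLUSH_THRESHOLD, the default of 1000, and the PostingBuffer class are all made up for illustration, and a real backend would merge into its tables where the sketch just clears the cache.

```python
import os

DEFAULT_THRESHOLD = 1000  # illustrative default, not quartz's real value


def read_threshold(env=os.environ, name="EXAMPLE_FLUSH_THRESHOLD"):
    """Return the flush threshold from the environment, or the default.

    The variable name is hypothetical; unset, non-numeric, or
    non-positive values fall back to DEFAULT_THRESHOLD.
    """
    raw = env.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_THRESHOLD
    return value if value > 0 else DEFAULT_THRESHOLD


class PostingBuffer:
    """Caches changed postings in memory and flushes by posting count."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.postings = {}   # term -> list of document ids
        self.cached = 0      # number of postings currently cached
        self.flushes = 0     # number of flushes performed so far

    def add_document(self, docid, terms):
        # Each distinct term in the document adds one cached posting.
        for term in set(terms):
            self.postings.setdefault(term, []).append(docid)
            self.cached += 1
        # Flush on the number of cached postings, not on documents added.
        if self.cached >= self.threshold:
            self.flush()

    def flush(self):
        # A real backend would merge the cached postings into its tables here.
        self.postings.clear()
        self.cached = 0
        self.flushes += 1
```

With this shape, a few long documents trigger a flush as readily as many short ones, which is the point of counting postings rather than documents.
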
> Making it runtime/startuptime adjustable will at least allow easier
> searching for semi-optimal values. Finding the real-optimal values will
> probably cost a lot of extra time, while not really improving the
> performance that much.
I believe we can pick a reasonable default for most users. If you've
got 10,000,000 documents, it's worth your while spending a bit of time
tuning.
Also, with a smaller collection, it's nice to be able to see documents
searchable while the indexer is still running. With a large collection
you'd rather get the indexing done sooner.
Perhaps omindex (and maybe scriptindex) ought to force a flush after 10,
100, 1000 documents or something like that. Mind you, my first batch of