omindex --duplicates=duplicate ?
2003-06-06 01:12:12 GMT
I've just been converting omega's omindex to use the Xapian namespace.
I'm rather puzzled by the duplicate handling options.
You can choose to replace an existing document with the same URL as one
being added (which is sensible - it means you can reindex a document
tree by running omindex with an existing database, though deleted
documents won't be removed from the index).
You can also ignore a duplicate, which I can see might be useful in odd
circumstances - you might build a database by running omindex several
times with different filename->URL mappings, and two could produce the
same URL I suppose. But this option seems to be redundant, since you
could just run the omindex commands in reverse order with duplicates
replaced, so that the first, preserved entry becomes the last,
replacing entry. That way you can also update an existing database,
which you can't if you ignore duplicates.
The final option is to create another record with the same URL. I'm
hard pushed to think of circumstances in which this would be a useful
thing to do...
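
To make the three options concrete, here is a minimal sketch in Python. It
is a toy model only, not omindex's actual code: the index is a plain list of
(url, content) records, and the names add_document, build, run_a and run_b
are made up for illustration. It also checks the claim above that ignoring
duplicates gives the same result as replacing duplicates with the runs
reversed.

```python
def add_document(index, url, content, policy="replace"):
    """Add one record to a toy index under the given duplicate policy."""
    if policy not in ("replace", "ignore", "duplicate"):
        raise ValueError("unknown policy: " + policy)
    exists = any(u == url for u, _ in index)
    if exists and policy == "ignore":
        return  # keep the record that is already there
    if exists and policy == "replace":
        # drop every old record for this URL before adding the new one
        index[:] = [rec for rec in index if rec[0] != url]
    # "duplicate" falls through and appends a second record for the URL
    index.append((url, content))


def build(runs, policy):
    """Simulate several indexing runs into one fresh toy index."""
    index = []
    for run in runs:
        for url, content in run:
            add_document(index, url, content, policy)
    return index


# Two hypothetical runs whose filename->URL mappings collide on one URL.
run_a = [("/doc", "from mapping A")]
run_b = [("/doc", "from mapping B")]

ignored = build([run_a, run_b], "ignore")
replaced_reversed = build([run_b, run_a], "replace")
duplicated = build([run_a, run_b], "duplicate")
```

In this model the ignore run and the reversed replace run both end up with
the single record from mapping A, while the duplicate policy leaves two
records sharing one URL.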
Does anybody using omega use --duplicates with anything apart from the
default setting (replace duplicates)? If so, how are you making use of
it?
Cheers,
Olly
-------------------------------------------------------
> I'd like to remove duplicate duplicates at some point if there are no
> objections. I want to rework omindex to allow it to do proper updating
> (checking for removed documents too). Ignore duplicates and replace
> duplicates fit into this nicely, but duplicate duplicates doesn't...
I have no problems with that at all. Handling removed documents was
something I never got round to doing. Duplicate duplicates was pretty
much there because it was easy to do ...
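
For reference, "proper updating" in the sense discussed above could look
roughly like the sketch below. This is an assumption about the general
approach, not a description of how omindex does or will do it: the index is
modelled as a plain dict from URL to content, and reindex is a hypothetical
helper name.

```python
def reindex(index, current_files):
    """Update a toy index in place; both arguments map URL -> content."""
    seen = set()
    for url, content in current_files.items():
        index[url] = content  # replace-duplicates behaviour
        seen.add(url)
    # Anything not seen during this run has been removed from the tree.
    for url in list(index):
        if url not in seen:
            del index[url]


index = {"/a": "old text", "/gone": "stale"}
reindex(index, {"/a": "new text", "/b": "fresh"})
```

A duplicate-duplicates mode has no obvious place in this scheme, because
several records per URL leave it unclear which one an update should touch.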
J