2 Sep 04:17
Re: Proposed changes to omindex
Olly Betts <olly <at> survex.com>
2006-09-02 02:17:40 GMT
On Tue, Aug 29, 2006 at 04:22:29PM +0100, Olly Betts wrote:
> I've made some and I'm in the process of working through the rest.

OK, I've done fettling.

After some research, I went for the public domain MD5 implementation written by Colin Plumb. It's used widely (including in the Linux kernel), compiles as C++, and doesn't add further relicensing obstacles.

I couldn't find a guarantee that std::string::c_str() will have the correct alignment for access as a 32-bit integer (though I can believe it typically will be), so I've used memcpy() instead of reinterpret_cast<>.

I've been wondering if there should be a command-line option to enable/disable the MD5 checksumming. If you don't want to collapse identical documents, it's just overhead (slower indexing, bigger database, and probably some slowdown when sorting by lastmod with the current way we store values).

So I did some simple benchmarking by indexing /usr/share/doc on my Ubuntu box:

Without MD5:

real 1m56.279s
user 1m44.573s
sys 0m7.358s
58536 usrsharedoc
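To illustrate the alignment point about memcpy() versus reinterpret_cast<>, here is a minimal sketch. It is not the omindex code: the helper name read_u32 and its signature are made up for this example. It only shows the general technique of copying bytes into a properly aligned local variable, rather than dereferencing a cast pointer into the string's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from omindex): read a 32-bit value stored in
// native byte order at byte position `offset` in `s`.
//
// The string's buffer is not guaranteed to be suitably aligned for a
// uint32_t at an arbitrary offset, so the bytes are copied into a local
// variable, which is always correctly aligned, instead of doing
//     *reinterpret_cast<const std::uint32_t*>(s.data() + offset)
inline std::uint32_t read_u32(const std::string& s, std::size_t offset) {
    std::uint32_t value;
    // Guard against reading past the end of the string.
    assert(offset <= s.size() && s.size() - offset >= sizeof value);
    std::memcpy(&value, s.data() + offset, sizeof value);
    return value;
}
```

Calling read_u32(s, 1) works even though offset 1 is almost certainly misaligned for a 32-bit load. Compilers generally turn a fixed-size memcpy() like this into a plain load on platforms where unaligned access is cheap, so the safer form need not cost anything.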