Donovan Baarda | 18 Mar 03:46 2004

Re: MD4 checksum_seed


cc'ing to the librsync-dev list as this is pretty relevant there.

On Thu, 2004-03-18 at 04:41, Eran Tromer wrote:
> Hi,
> >>>Have you looked at the librsync rollsum.c implementation in CVS? It
> >>>replaced rs_calc_weak_sum and should be a fair bit faster.
> Here are run times for 70000000 calculations over the same all-zero block.
> "checksum.c" is librsync's current code (which differs from rsync's only
> in the chars are signed).
> "rollsum.c" is librsync's rollsumc.c from CVS.
> "XDELTA" is checksum.c changed to add char-to-short lookup table
> (I blindly converted every occurance of "buf[foo]" into
> "lookup_table[buf[foo]]"; the lookup table is xdelta's).
> Numbers in seconds; tested on a Pentium Xeon 1700MHz, gcc 3.2.3.
> checksum.c -O2 CHAR_OFFSET=0:  273.42
> checksum.c -O2 CHAR_OFFSET=31: 300.63
> checksum.c -O3 CHAR_OFFSET=0:   31.42
> checksum.c -O3 CHAR_OFFSET=31:  35.32
> rollsum.c -O2 CHAR_OFFSET=0:    99.47
> rollsum.c -O2 CHAR_OFFSET=31:   99.22
> rollsum.c -O3 CHAR_OFFSET=0:   100.18
> rollsum.c -O3 CHAR_OFFSET=31:   99.77
> XDELTA -O2:                    386.07
> XDELTA -O3:                     32.17

(Continue reading)

Eran Tromer | 18 Mar 05:04 2004

Re: MD4 checksum_seed


On 2004/03/18 04:46, Donovan Baarda wrote:

> Why would rollsum.c be slightly _slower_ for CHAR_OFFSET=0? 

Just measurement noise (background process kicking in and spoiling the
memory caches, or such). It sometimes errs slightly the other direction,
so I believe the two cases are identical.

>>* The extra cost of xdelta-type lookups is acceptable for -O2 and
>>  virtually nonexistant for -O3.
> The optimiser must something pretty amazing to minimise the table
> lookups at compile-time; pretty impressive.

It's impossible to be completely eliminate the lookups. Indeed, looking
at gcc's intermediate assembler output, checksum.c's main loop is 22
instructions with plain CHAR_OFFSET=0 and 28 with xdelta lookups. My
guess is that checkum.c -O3 leaves some unused slots in the Xeon's
pipeline, and the extra lookups happen to fall nicely into those slots.
Same effect with a Pentium III Katmai 550MHz, BTW.

> The next question is; is the table lookup benefits enough to justify the
> extra complexity... given there is negligible performance hit?
> I don't think the lookups actually improve the probability of avoiding
> random collisions by much, and although they make it slightly more
> complicated, they don't make it much more robust against intentional
> attacks either. The only true protection against intentional attacks is
(Continue reading)