Re: Open versus Chained HashMap
David MacIver <david.maciver <at> gmail.com>
2008-09-01 14:39:11 GMT
On Mon, Sep 1, 2008 at 3:24 PM, Ismael Juma <ismaelj <at> gmail.com> wrote:
> On Mon, 2008-09-01 at 14:52 +0100, David MacIver wrote:
>> > David has told me that
>> > the alternative of having separate arrays for keys, values and
>> > (optionally) hashes was substantially slower. Given that, chaining
>>
>> My theory is that this is because it increases the number of bounds
>> checks the JVM inserts dramatically. Presumably it also decreases
>> locality of reference (in a way that storing them inline in the array
>> wouldn't), but I'm not sure about this point.
>
> Right, I thought about that, but in theory using Entry objects does not
> provide better locality of reference since it just contains references
> anyway. But maybe the JVM is doing something that causes better
I had a convincing reason why it might happen, but I'm no longer sure
what it was.
It might be related to the position of the Entry objects. Given that
they're all freshly allocated in the current thread local allocation
block, most of them might still be in cache?
> locality of reference for Entry objects because it seemed to me that
> the additional bounds checks should not cause the kind of disparity
> that you saw. Storing everything in a single array would be interesting
I don't know. It wasn't an utterly dramatic slowdown, and on fairly
fast operations. Tripling the number of tests inside the lookup could
easily be a significant hit.
> and it's what IdentityHashMap in the JDK does. As you know,
The problem I had with this is that it would then require either
non-caching of the hash codes or boxing of the hash keys, so in the
former case it's potentially dramatically slower and in the latter it
wouldn't really decrease the memory use.
Non-caching hash codes could at least be interesting to experiment
with, as it works well with some cases (ints, strings, umm...
basically nothing else.
).
Hm. Actually, you don't need to recompute the hash of the keys already
in the table. Currently the hash for them *is* used, but only because
it's cheap to do so. If that was removed it would result in more
expensive equality tests within the map but lower memory usage and
less indirection. Worth a go, certainly.
> IdentityHashMap needs to solve a more restricted problem. They can rely on
> linear probing because reference hashCodes in HotSpot have good
> distribution and are also cheap to compute (so no need to cache them).
Right.
>> In theory the probing scheme I used may be more resilient to bad
>> hashes. I don't have a test case to prove this though.
>
> I don't understand why this is so. Note that ChainedHashMap uses the
> same hashOf method as OpenHashMap, but it does not need to probe because
> of the chaining nature.
The function chosen for probing is done in a way that can result in
very short probes where in principle you might end up with much longer
chains in the chaining.
Like I said, I don't have a test case that demonstrates this, so I
could be totally wrong.