exty | 1 Sep 2008 09:39
Picon

Re: Unable to establish connection to compilation daemon


I'm disappointed to tell you that my fix doesn't work. It only produced a
false positive. I should have tested it more thoroughly. The reason fsc
worked for me, wasn't that I had reconfigured the OS X Firewall, but that I
had just changed my network connection from WIFI to LAN.

This takes us to the bottom of the problem. On OS X, fsc seem to be picky
about what kind of network connection the computer has. If there's no IP for
en0, fsc can't establish a connection. It doesn't help that the WIFI card
(en1) or some other interface has an IP.

-exty

Martin Odersky wrote:
> 
> Thanks for the guide! Very helpful. Toni, I suggest we add this to the
> FAQ.
> 
> Cheers
> 
>  -- Martin
> 
> On Fri, Aug 29, 2008 at 6:01 PM, exty <explicit.type <at> gmail.com> wrote:
>>
>>
>> Martin Odersky wrote:
>>>
>>> On Sun, Aug 24, 2008 at 9:48 PM, exty <explicit.type <at> gmail.com> wrote:
>>>>
>>>> Hi,
(Continue reading)

Ismael Juma | 1 Sep 2008 15:33
Picon

Open versus Chained HashMap

Hi,

OpenHashMap from David MacIver was recently introduced to the Scala
library and it seems to perform much better than the existing
mutable.HashMap. It uses open addressing, but with wrapper Entry
objects. This is a bit unconvential since one of the main advantages of
open addressing versus chaining is that it does not require wrapper
objects, saving memory and reducing indirection. David has told me that
the alternative of having separate arrays for keys, values and
(optionally) hashes was substantially slower. Given that, chaining
should not impose much more overhead (just an additional reference in
each Entry object) and could even result in lower memory usage if one
takes into account that chained implementations perform relatively well
with higher load factors (typically 0.75 instead of 0.50 for open
addressing).

So, I decided to create a ChainedHashMap starting from OpenHashMap and
test the theory. Given the tests and benchmarks that David wrote for his
implementations, it seemed easy enough. :)

After getting all of David's tests to pass, I ran the benchmarks with 3
different variations for ChainedHashMap:

- ChainedHashMap (the default): load factor of 0.75 and resize
multiplier of 2

- ChainedHashMap medium: load factor of 0.65 and resize multiplier of 2.

- ChainedHashMap large: load factor of 0.50 and resize multiplier of 2.
This is the same as OpenHashMap, so with this configuration
(Continue reading)

Geoffrey Alan Washburn | 1 Sep 2008 15:45
Picon
Picon
Favicon

Re: vscaladoc vs. scaladoc

Christos KK Loverdos wrote:
> It has been mentioned by EPFL that vscaladoc will be integrated in 
> Scala. I think this means that it will replace the existing scaladocs 
> framework.

I would say the plan is not to just outright replace the existing 
implementation of scaladoc with vscaladoc, but to merge them to some 
degree.  For example, my initial examination seems to indicate that 
vscaladoc also doesn't make use of any the "ModelAdditions" that are 
used to provide documentation for things like Any and AnyRef.

In any event, like scalap, we could really use a volunteer to help with 
scaladoc, because we currently don't have nearly as much time to devote 
to it as would be ideal.

David MacIver | 1 Sep 2008 15:52
Picon

Re: Open versus Chained HashMap

On Mon, Sep 1, 2008 at 2:33 PM, Ismael Juma <ismaelj <at> gmail.com> wrote:
> Hi,
>
> OpenHashMap from David MacIver was recently introduced to the Scala
> library and it seems to perform much better than the existing
> mutable.HashMap. It uses open addressing, but with wrapper Entry
> objects. This is a bit unconvential since one of the main advantages of
> open addressing versus chaining is that it does not require wrapper
> objects, saving memory and reducing indirection. David has told me that
> the alternative of having separate arrays for keys, values and
> (optionally) hashes was substantially slower. Given that, chaining

My theory is that this is because it increases the number of bounds
checks the JVM inserts dramatically.  Presumably it also decreases
locality of reference (in a way that storing them inline in the array
wouldn't), but I'm not sure about this point.

> should not impose much more overhead (just an additional reference in
> each Entry object) and could even result in lower memory usage if one
> takes into account that chained implementations perform relatively well
> with higher load factors (typically 0.75 instead of 0.50 for open
> addressing).
>
> So, I decided to create a ChainedHashMap starting from OpenHashMap and
> test the theory. Given the tests and benchmarks that David wrote for his
> implementations, it seemed easy enough. :)

Glad they proved useful. :-)

> After getting all of David's tests to pass, I ran the benchmarks with 3
(Continue reading)

Ismael Juma | 1 Sep 2008 16:24
Picon

Re: Open versus Chained HashMap

On Mon, 2008-09-01 at 14:52 +0100, David MacIver wrote:
> > David has told me that
> > the alternative of having separate arrays for keys, values and
> > (optionally) hashes was substantially slower. Given that, chaining
> 
> My theory is that this is because it increases the number of bounds
> checks the JVM inserts dramatically.  Presumably it also decreases
> locality of reference (in a way that storing them inline in the array
> wouldn't), but I'm not sure about this point.

Right, I thought about that, but in theory using Entry objects does not
provide better locality of reference since it just contains references
anyway. But maybe the JVM is doing something that causes better
locality of reference for Entry objects because it seemed to me that
the additional bounds checks should not cause the kind of disparity
that you saw. Storing everything in a single array would be interesting
and it's what IdentityHashMap in the JDK does. As you know,
IdentityHashMap needs to solve a more restricted problem. They can rely on
linear probing because reference hashCodes in HotSpot have good
distribution and are also cheap to compute (so no need to cache them).

> In theory the probing scheme I used may be more resilient to bad
> hashes. I don't have a test case to prove this though. :-)

I don't understand why this is so. Note that ChainedHashMap uses the
same hashOf method as OpenHashMap, but it does not need to probe because
of the chaining nature.

> Is the source available anywhere? I'd be interested to take a look.

(Continue reading)

David Pollak | 1 Sep 2008 16:28
Gravatar

Re: Open versus Chained HashMap



Ismael Juma wrote:
On Mon, 2008-09-01 at 14:52 +0100, David MacIver wrote:
David has told me that the alternative of having separate arrays for keys, values and (optionally) hashes was substantially slower. Given that, chaining
My theory is that this is because it increases the number of bounds checks the JVM inserts dramatically. Presumably it also decreases locality of reference (in a way that storing them inline in the array wouldn't), but I'm not sure about this point.
Right, I thought about that, but in theory using Entry objects does not provide better locality of reference since it just contains references anyway. But maybe the JVM is doing something that causes better locality of reference for Entry objects because it seemed to me that the additional bounds checks should not cause the kind of disparity that you saw. Storing everything in a single array would be interesting and it's what IdentityHashMap in the JDK does. As you know, IdentityHashMap needs to solve a more restricted problem. They can rely on linear probing because reference hashCodes in HotSpot have good distribution and are also cheap to compute (so no need to cache them).
My understanding of the JVM is that objects are far better optimized than arrays.

In theory the probing scheme I used may be more resilient to bad hashes. I don't have a test case to prove this though. :-)
I don't understand why this is so. Note that ChainedHashMap uses the same hashOf method as OpenHashMap, but it does not need to probe because of the chaining nature.
Is the source available anywhere? I'd be interested to take a look.
I am attaching it. Note I basically started from OpenHashMap and modified it until it was using chaining and it passed the tests in order to benchmark it. I haven't cleaned it up since, so things might be a bit ugly. ;) Ismael
David MacIver | 1 Sep 2008 16:39
Picon

Re: Open versus Chained HashMap

On Mon, Sep 1, 2008 at 3:24 PM, Ismael Juma <ismaelj <at> gmail.com> wrote:
> On Mon, 2008-09-01 at 14:52 +0100, David MacIver wrote:
>> > David has told me that
>> > the alternative of having separate arrays for keys, values and
>> > (optionally) hashes was substantially slower. Given that, chaining
>>
>> My theory is that this is because it increases the number of bounds
>> checks the JVM inserts dramatically.  Presumably it also decreases
>> locality of reference (in a way that storing them inline in the array
>> wouldn't), but I'm not sure about this point.
>
> Right, I thought about that, but in theory using Entry objects does not
> provide better locality of reference since it just contains references
> anyway. But maybe the JVM is doing something that causes better

I had a convincing reason why it might happen, but I'm no longer sure
what it was.

It might be related to the position of the Entry objects. Given that
they're all freshly allocated in the current thread local allocation
block, most of them might still be in cache?

> locality of reference for Entry objects because it seemed to me that
> the additional bounds checks should not cause the kind of disparity
> that you saw. Storing everything in a single array would be interesting

I don't know. It wasn't an utterly dramatic slowdown, and on fairly
fast operations. Tripling the number of tests inside the lookup could
easily be a significant hit.

> and it's what IdentityHashMap in the JDK does. As you know,

The problem I had with this is that it would then require either
non-caching of the hash codes or boxing of the hash keys, so in the
former case it's potentially dramatically slower and in the latter it
wouldn't really decrease the memory use.

Non-caching hash codes could at least be interesting to experiment
with, as it works well with some cases (ints, strings, umm...
basically nothing else. :-) ).

Hm. Actually, you don't need to recompute the hash of the keys already
in the table. Currently the hash for them *is* used, but only because
it's cheap to do so. If that was removed it would result in more
expensive equality tests within the map but lower memory usage and
less indirection. Worth a go, certainly.

> IdentityHashMap needs to solve a more restricted problem. They can rely on
> linear probing because reference hashCodes in HotSpot have good
> distribution and are also cheap to compute (so no need to cache them).

Right.

>> In theory the probing scheme I used may be more resilient to bad
>> hashes. I don't have a test case to prove this though. :-)
>
> I don't understand why this is so. Note that ChainedHashMap uses the
> same hashOf method as OpenHashMap, but it does not need to probe because
> of the chaining nature.

The function chosen for probing is done in a way that can result in
very short probes where in principle you might end up with much longer
chains in the chaining.

Like I said, I don't have a test case that demonstrates this, so I
could be totally wrong. :-)

Ismael Juma | 1 Sep 2008 17:01
Picon

Re: Open versus Chained HashMap

On Mon, 2008-09-01 at 15:39 +0100, David MacIver wrote:
> On Mon, Sep 1, 2008 at 3:24 PM, Ismael Juma <ismaelj <at> gmail.com> wrote:
> > Right, I thought about that, but in theory using Entry objects does not
> > provide better locality of reference since it just contains references
> > anyway. But maybe the JVM is doing something that causes better
> 
> I had a convincing reason why it might happen, but I'm no longer sure
> what it was.
<some possible reasons snipped>

Yes, I think they're plausible, but hard to know for sure without some
lower-level profiling tools.

> > and it's what IdentityHashMap in the JDK does. As you know,
> 
> The problem I had with this is that
<...>
Yes, it's not clear if it's a win, but as you say might be worth a shot.

> >> In theory the probing scheme I used may be more resilient to bad
> >> hashes. I don't have a test case to prove this though. :-)
> >
> > I don't understand why this is so. Note that ChainedHashMap uses the
> > same hashOf method as OpenHashMap, but it does not need to probe because
> > of the chaining nature.
> 
> The function chosen for probing is done in a way that can result in
> very short probes where in principle you might end up with much longer
> chains in the chaining.
> 
> Like I said, I don't have a test case that demonstrates this, so I
> could be totally wrong. :-)

Oh I see. Interesting.

Ismael

Erik Engbrecht | 1 Sep 2008 17:19
Picon

Re: Open versus Chained HashMap

re: Identity hash codes
IIRC identity hash codes are effectively cached by the JVM...I think they're in the object header or something like that.  This needs to be done so that the hash code doesn't change from one invocation to another.  If you want a stable hash code, it either has to be based on something that won't change or once it is computed it has to be stored, so it becomes the thing that won't change.  Remember, Java objects move around in memory, so you can't use the physical address as a stable value.  You have to store something.

So you don't need to cache the value of identity hash code because the JVM is already doing it, not simply because it is cheap to compute (in fact initially computing and storing it requires some synchronization, so initial computation might not even been "cheap" for some definition of "cheap").

On Mon, Sep 1, 2008 at 10:39 AM, David MacIver <david.maciver <at> gmail.com> wrote:
On Mon, Sep 1, 2008 at 3:24 PM, Ismael Juma <ismaelj <at> gmail.com> wrote:
> On Mon, 2008-09-01 at 14:52 +0100, David MacIver wrote:
>> > David has told me that
>> > the alternative of having separate arrays for keys, values and
>> > (optionally) hashes was substantially slower. Given that, chaining
>>
>> My theory is that this is because it increases the number of bounds
>> checks the JVM inserts dramatically.  Presumably it also decreases
>> locality of reference (in a way that storing them inline in the array
>> wouldn't), but I'm not sure about this point.
>
> Right, I thought about that, but in theory using Entry objects does not
> provide better locality of reference since it just contains references
> anyway. But maybe the JVM is doing something that causes better

I had a convincing reason why it might happen, but I'm no longer sure
what it was.

It might be related to the position of the Entry objects. Given that
they're all freshly allocated in the current thread local allocation
block, most of them might still be in cache?

> locality of reference for Entry objects because it seemed to me that
> the additional bounds checks should not cause the kind of disparity
> that you saw. Storing everything in a single array would be interesting

I don't know. It wasn't an utterly dramatic slowdown, and on fairly
fast operations. Tripling the number of tests inside the lookup could
easily be a significant hit.

> and it's what IdentityHashMap in the JDK does. As you know,

The problem I had with this is that it would then require either
non-caching of the hash codes or boxing of the hash keys, so in the
former case it's potentially dramatically slower and in the latter it
wouldn't really decrease the memory use.

Non-caching hash codes could at least be interesting to experiment
with, as it works well with some cases (ints, strings, umm...
basically nothing else. :-) ).

Hm. Actually, you don't need to recompute the hash of the keys already
in the table. Currently the hash for them *is* used, but only because
it's cheap to do so. If that was removed it would result in more
expensive equality tests within the map but lower memory usage and
less indirection. Worth a go, certainly.

> IdentityHashMap needs to solve a more restricted problem. They can rely on
> linear probing because reference hashCodes in HotSpot have good
> distribution and are also cheap to compute (so no need to cache them).

Right.

>> In theory the probing scheme I used may be more resilient to bad
>> hashes. I don't have a test case to prove this though. :-)
>
> I don't understand why this is so. Note that ChainedHashMap uses the
> same hashOf method as OpenHashMap, but it does not need to probe because
> of the chaining nature.

The function chosen for probing is done in a way that can result in
very short probes where in principle you might end up with much longer
chains in the chaining.

Like I said, I don't have a test case that demonstrates this, so I
could be totally wrong. :-)



--
http://erikengbrecht.blogspot.com/
Ismael Juma | 1 Sep 2008 17:30
Picon

Re: Open versus Chained HashMap

On Mon, 2008-09-01 at 11:19 -0400, Erik Engbrecht wrote:
> re: Identity hash codes
> IIRC identity hash codes are effectively cached by the JVM.

Right, I know that[1]. I used the term compute too loosely and one could
say it was misleading. The point was simply that storing the hash in the
Map requires it to be computed once anyway, and subsequent invocations
are cheap enough (due to the caching as you say) that it's not worth
caching it in the Map itself.

Ismael

[1] That was one of the positive side-effects of the "Is Scala for
Academics and Egomaniacs" incident. ;)


Gmane