Hello,
I have a question regarding the performance of `notifyAll` calls
compared to entering and exiting monitors on the JVM.
Some background - recently there has been a lot of talk on the Scala
mailing lists about changing the Scala lazy vals implementation
which is currently deadlock-prone in cases where there is no reason
for it to be.
We have an alternative design that we are trying to evaluate in
terms of correctness and performance.
The full discussion is available here:
https://groups.google.com/group/scala-internals/browse_thread/thread/702801329e64f11f
I illustrate these changes with the hand-written version of our lazy
vals implementation.
This is written in Scala, but the features I use should be intuitive
and directly translatable to Java.
This is a lazy val declaration:
class LazyCell(x: Int) {
lazy val value = 0
}
This is what our compiler currently does for a `lazy val`
declaration:
final class LazySimCell(x: Int) {
<at> volatile var bitmap_0: Boolean = false
var value_0: Int = _
private def value_lzycompute(): Int = {
this.synchronized {
if (bitmap_0) {
value_0 = 0
bitmap_0 = true
}
}
value_0
}
def value = if (bitmap_0) value_0 else value_lzycompute()
}
The problem with this design is that if a `lazy val` right hand side
depends on `lazy val`s in different objects cyclically, but lazy
vals dependencies themselves
do not form a cycle, then this can lead to deadlocks.
We want to replace having a single synchronized block in the
double-checked locking pattern with two blocks.
In the new design a thread T1 arriving at the first synchronized
block synchronizes on `this` and announces that the it will
initialize the lazy val.
Subsequent readers of the lazy val then wait until they are
notified.
T1 then computes the value.
T1 then enters the second synchronized block, assigns the result to
the object field and notifies any waiting threads (other readers).
This is the newly proposed design, hand-written:
final class LazySimCellVersion2(x: Int) {
<at> volatile var bitmap_0: Byte = 0.toByte
var value_0: Int = _
private def value_lzycompute(): Int = {
this.synchronized {
if (bitmap_0 == 0.toByte) {
bitmap_0 = 1.toByte
} else {
while (bitmap_0 == 1.toByte) {
this.wait()
}
return value_0
}
}
val result = 0
this.synchronized {
value_0 = result
bitmap_0 = 2.toByte
this.notifyAll()
}
value_0
}
def value = if (bitmap_0 == 2.toByte) value_0 else
value_lzycompute()
}
I have run microbenchmarks to compare the two designs in a
non-contended mode as follows.
I measured the following:
- instantiating an object with a value
- instantiating an object and initializing its lazy value
- same thing, but with manually implemented lazy vals with boolean
bitmaps
- same thing, but with byte bitmaps
- the proposed lazy value change with 2 synchronizations blocks, but
without a notifying potential waiters
- same thing, but with a notify call
repeated 1000000 through 5000000 times, in steps of 1000000.
The instantiated object is assigned to a field each time, to discard
potential VM optimizations.
The initialization code is composed of creating an integer literal
`0` - I chose to minimum amount
of computation to avoid having the computation costs amortize the
lazy-val-related initialization costs.
This should thus be the borderline case where the program does
nothing else but instantiate objects and initialize their lazy vals.
My platform:
i7-2600, quad-core, hyperthreading, 3.4GHz
MacOS X 10.7.5
Java 7, update 4, 64-bit
Scala 2.10.1
The reports themselves are here:
http://lampwww.epfl.ch/~prokopec/lazyvals/report/
The repository with the executable microbenchmarks with the code
that produced these:
https://github.com/axel22/lazy-val-benchhttps://github.com/axel22/lazy-val-bench/blob/master/src/test/scala/example/SyncBench.scala
In this benchmark, the 2 synchronized blocks design with a notify is
5 times slower than the current lazy val implementation - and just
adding/removing the line with the `notifyAll` call changes things
considerably.
My question:
Is this expected? Is the cost of a `notifyAll` call expected to be
this high?
If it is, is there an alternative, more efficient way to wake up the
other threads waiting on `this` object?
Thank you,
Aleksandar Prokopec
--
Aleksandar Prokopec
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec