Aaron Dunlop | 1 Apr 2011 20:16
Picon

Possible deadlock in ForkJoinPool when parallelism = 1 ?

This is a follow-up to a similar posting at StackOverflow
(http://stackoverflow.com/questions/5493399/forkjoinpool-parallelism-1-deadlock).
I'm pretty new to the Fork-Join framework, so this is probably an
obvious case of user-error, but I haven't been able to figure it out,
and thus far, none of the comments on that post have resolved the
issue either.

I'm profiling a parallel algorithm over a range of thread-counts. My
tasks seem to work flawlessly if I create the ForkJoinPool with
parallelism > 1 (I've normally been running with 2-24 threads). But if
I create the ForkJoinPool with parallelism = 1, I see deadlocks after
an unpredictable number of iterations. And yes - setting parallelism =
1 is a strange practice, but I want to accurately ascertain the
overhead of the parallel implementation, which means comparing the
serial version and the parallel version run with a single thread.

Below is a simple example that illustrates the issue I'm seeing. The
'task' is a dummy iteration over a fixed array, divided recursively
into 16 subtasks. I chose this odd iteration simply to produce a
memory-bound workload - it's possible the task itself interacts oddly
with F-J or with JIT optimizations, but if so, I haven't been able to
tease out those interactions.

If run with THREADS = 2 (or more), it runs reliably to completion, but
if run with THREADS = 1, it invariably deadlocks. After an
unpredictable number of iterations, the main loop hangs in
ForkJoinPool.invoke(), waiting in task.join(), and the worker thread
exits. (I've been running between 10000 and 50000 ITERATIONS,
depending on the host hardware)

(Continue reading)

Doug Lea | 1 Apr 2011 22:25
Favicon

Re: Possible deadlock in ForkJoinPool when parallelism = 1 ?

On 04/01/11 14:16, Aaron Dunlop wrote:
> I'm pretty new to the Fork-Join framework, so this is probably an
> obvious case of user-error.

Thanks! This was an FJ error (of prematurely terminating a worker).
Now fixed in our sources and hopefully soon in openjdk builds.

-Doug
Christian Vest Hansen | 4 Apr 2011 21:47
Picon
Gravatar

What are the odds of an "early read?"

Hi,

Consider the situation where we have an ordinary field A and a volatile field B.
Thread 1 writes to A and then to B. Then thread 2 reads B and then A,
and the write to A is guaranteed to happen-before the read of A
because it piggy-backed on the ordering guarantees of B.
However, what if a third thread was doing another write sequence
similar to that of thread 1, but slightly delayed? Could thread 2 then
observe the write of thread 1 to B, and the write of thread 3 to A?

--

-- 
Venlig hilsen / Kind regards,
Christian Vest Hansen.
Dimitris Andreou | 4 Apr 2011 22:04
Picon

Re: What are the odds of an "early read?"

Of course it can happen:


E1: thread1 writes A, B
E2: thread2 reads B
E3: thread3 writes A, B
E4: thread2 reads A

Unless E3 "happens before" E1, but you didn't specify that.

On Mon, Apr 4, 2011 at 12:47 PM, Christian Vest Hansen <karmazilla <at> gmail.com> wrote:
Hi,

Consider the situation where we have an ordinary field A and a volatile field B.
Thread 1 writes to A and then to B. Then thread 2 reads B and then A,
and the write to A is guaranteed to happen-before the read of A
because it piggy-backed on the ordering guarantees of B.
However, what if a third thread was doing another write sequence
similar to that of thread 1, but slightly delayed? Could thread 2 then
observe the write of thread 1 to B, and the write of thread 3 to A?

--
Venlig hilsen / Kind regards,
Christian Vest Hansen.
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Marco Villalobos | 7 Apr 2011 08:04

Re: What are the odds of an "early read?"

I would say that since you did not specify the use of any lock or
synchronized blocks / methods, then all "happens-before" ordering is
thrown out the window, and this program will not be deterministic in
this multi-threaded scenario.

On Mon, Apr 4, 2011 at 12:47 PM, Christian Vest Hansen
<karmazilla <at> gmail.com> wrote:
> Hi,
>
> Consider the situation where we have an ordinary field A and a volatile field B.
> Thread 1 writes to A and then to B. Then thread 2 reads B and then A,
> and the write to A is guaranteed to happen-before the read of A
> because it piggy-backed on the ordering guarantees of B.
> However, what if a third thread was doing another write sequence
> similar to that of thread 1, but slightly delayed? Could thread 2 then
> observe the write of thread 1 to B, and the write of thread 3 to A?
>
> --
> Venlig hilsen / Kind regards,
> Christian Vest Hansen.
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest <at> cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
Doug Lea | 13 Apr 2011 02:07
Favicon

ConcurrentHashMap footprint and contention improvements


For years, we've known that ConcurrentHashMaps have initial
footprints (over 1000 bytes using default constructor) that
are too big for casual use. And that the best way to address
this would be to use the Fences API to emulate "final field"
memory model guarantees in the presence of lazy initialization.
But we aren't releasing the Fences API. So I committed a version
that instead uses Unsafe calls to essentially the same effect
(reducing initial footprint to around 100 bytes, plus a few
percent savings for large populated tables). Also, this
version includes throughput improvements under contention
(mainly by interleaving locking with probes, to absorb cache misses),
which can double performance on big tables with many threads.
While conceptually straightforward, these lead to many
line-of-code changes.

The main price paid for these improvements is a greater reliance
of "volatile" vs "final" reads, which are essentially equivalent
in cost on most machines, but can be more costly on ARM and POWER.
Even on these though, the net effect should be positive.

It would be helpful if members of this list could help check
that this is so. The committed version is now
in java.util.concurrent sources (at
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
and can be run by getting jsr166.jar and using
"java -Xbootclasspath/p:jsr166.jar" with any java7 build
or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html).
Also, as an alternative, I temporarily placed an unpackaged
source version (with the class renamed "CHM")
at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java
You can compile and somehow run in any java6/7 JVM.

While working on these changes, I also contemplated other
more extensive redesigns, including Cliff Click's non-blocking
version (http://sourceforge.net/projects/high-scale-lib/)
which usually has better scalability with large numbers
of threads solely using get and put, but not otherwise
uniformly a better choice.

-Doug
dmitry.miltsov | 13 Apr 2011 18:54
Picon
Favicon

Auto Reply: Concurrency-interest Digest, Vol 75, Issue 4

This is an auto-replied message.
I'm on vacation from April 12, returning to the office on April 18.

My backup persons are:
java.lang - Victor Rudometov;
java.security, javax.security - Paul Rank;
java.text, java.util - Yuri Gaevsky.

Please contact my manager Pavel Klodin regarding other issues. 

Thanks,
Dmitry Miltsov
Ben Manes | 13 Apr 2011 20:24
Picon
Favicon

Re: ConcurrentHashMap footprint and contention improvements

Another approach to consider for improving write throughput might be to guard the lock with a SynchronousQueue and attempt to transfer the operation to the thread currently holding the lock. If the transfer was successful, then the waiting thread would spin until the response materialized (e.g. per-op volatile result field). This would be inserted after the scanning phase exhausted its retries, as we can safely assume contention. Under high write loads this would increase the penalty for the thread performing the operation in exchange for reducing the contention of the lock (which should be a net benefit).
iv>

From: Doug Lea <dl <at> cs.oswego.edu>
To: "Concurrency-interest <at> cs.oswego.edu" <Concurrency-interest <at> cs.oswego.edu>
Sent: Tuesday, April 12, 2011 5:07 PM
Subject: [concurrency-interest] ConcurrentHashMap footprint and contention improvements


For years, we've known that ConcurrentHashMaps have initial
footprints (over 1000 bytes using default constructor) that
are too big for casual use. And that the best way to address
this would be to use the Fences API to emulate "final field"
memory model guarantees in the presence of lazy initialization.
But we aren't releasing the Fences API. So I committed a version
that instead uses Unsafe calls to essentially the same effect
(reducing initial footprint to around 100 bytes, plus a few
percent savings for large populated tables). Also, this
version includes throughput improvements under contention
(mainly by interleaving locking with probes, to absorb cache misses),
which can double performance on big tables with many threads.
While conceptually st raightforward, these lead to many
line-of-code changes.

The main price paid for these improvements is a greater reliance
of "volatile" vs "final" reads, which are essentially equivalent
in cost on most machines, but can be more costly on ARM and POWER.
Even on these though, the net effect should be positive.

It would be helpful if members of this list could help check
that this is so. The committed version is now
in java.util.concurrent sources (at
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
and can be run by getting jsr166.jar and using
"java -Xbootclasspath/p:jsr166.jar" with any java7 build
or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html).
Also, as an alternative, I temporarily placed an unpackaged
source version (with the class renamed "CHM")
at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java
You can compile and somehow run in any java6/7 JVM.

While wo rking on these changes, I also contemplated other
more extensive redesigns, including Cliff Click's non-blocking
version (http://sourceforge.net/projects/high-scale-lib/)
which usually has better scalability with large numbers
of threads solely using get and put, but not otherwise
uniformly a better choice.

-Doug

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Rodrigo Kumpera | 14 Apr 2011 21:52
Picon
Gravatar

Re: ConcurrentHashMap footprint and contention improvements

Did you consider other lock free algorithms such as Shalev-Shavit

Split-Ordered List based hashtable?


On Tue, Apr 12, 2011 at 9:07 PM, Doug Lea <dl <at> cs.oswego.edu> wrote:

For years, we've known that ConcurrentHashMaps have initial
footprints (over 1000 bytes using default constructor) that
are too big for casual use. And that the best way to address
this would be to use the Fences API to emulate "final field"
memory model guarantees in the presence of lazy initialization.
But we aren't releasing the Fences API. So I committed a version
that instead uses Unsafe calls to essentially the same effect
(reducing initial footprint to around 100 bytes, plus a few
percent savings for large populated tables). Also, this
version includes throughput improvements under contention
(mainly by interleaving locking with probes, to absorb cache misses),
which can double performance on big tables with many threads.
While conceptually straightforward, these lead to many
line-of-code changes.

The main price paid for these improvements is a greater reliance
of "volatile" vs "final" reads, which are essentially equivalent
in cost on most machines, but can be more costly on ARM and POWER.
Even on these though, the net effect should be positive.

It would be helpful if members of this list could help check
that this is so. The committed version is now
in java.util.concurrent sources (at
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
and can be run by getting jsr166.jar and using
"java -Xbootclasspath/p:jsr166.jar" with any java7 build
or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html).
Also, as an alternative, I temporarily placed an unpackaged
source version (with the class renamed "CHM")
at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java
You can compile and somehow run in any java6/7 JVM.

While working on these changes, I also contemplated other
more extensive redesigns, including Cliff Click's non-blocking
version (http://sourceforge.net/projects/high-scale-lib/)
which usually has better scalability with large numbers
of threads solely using get and put, but not otherwise
uniformly a better choice.

-Doug

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Martin Buchholz | 15 Apr 2011 03:13
Picon
Favicon

Re: ConcurrentHashMap footprint and contention improvements

I'm looking at the latest CHM.containsValue.

Suppose the first traversal over the segments discovers no entries
(map apparently empty).  The intent appears to be to retry, but in
this case both sum and last will be 0L, causing the search to be
abandoned.  An obvious fix would be to set last to be anything other
than 0L, and more "random" initial values for last (like reusing
serialVersionUID) would be safer still.

But even if you did that, you might conceivably have two consecutive
traversals both come up empty, with sum == 0L, in which case you will
never get around to doing a fully locked traversal?

Martin

On Tue, Apr 12, 2011 at 17:07, Doug Lea <dl <at> cs.oswego.edu> wrote:
>
> For years, we've known that ConcurrentHashMaps have initial
> footprints (over 1000 bytes using default constructor) that
> are too big for casual use. And that the best way to address
> this would be to use the Fences API to emulate "final field"
> memory model guarantees in the presence of lazy initialization.
> But we aren't releasing the Fences API. So I committed a version
> that instead uses Unsafe calls to essentially the same effect
> (reducing initial footprint to around 100 bytes, plus a few
> percent savings for large populated tables). Also, this
> version includes throughput improvements under contention
> (mainly by interleaving locking with probes, to absorb cache misses),
> which can double performance on big tables with many threads.
> While conceptually straightforward, these lead to many
> line-of-code changes.
>
> The main price paid for these improvements is a greater reliance
> of "volatile" vs "final" reads, which are essentially equivalent
> in cost on most machines, but can be more costly on ARM and POWER.
> Even on these though, the net effect should be positive.
>
> It would be helpful if members of this list could help check
> that this is so. The committed version is now
> in java.util.concurrent sources (at
> http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
> and can be run by getting jsr166.jar and using
> "java -Xbootclasspath/p:jsr166.jar" with any java7 build
> or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html).
> Also, as an alternative, I temporarily placed an unpackaged
> source version (with the class renamed "CHM")
> at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java
> You can compile and somehow run in any java6/7 JVM.
>
> While working on these changes, I also contemplated other
> more extensive redesigns, including Cliff Click's non-blocking
> version (http://sourceforge.net/projects/high-scale-lib/)
> which usually has better scalability with large numbers
> of threads solely using get and put, but not otherwise
> uniformly a better choice.
>
> -Doug
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest <at> cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>

Gmane