Romain Colle | 22 May 18:20

ForkJoinPool does not achieve expected parallelism

Hi Doug and all,


Using the latest version of the ForkJoinPool available on the jsr166y website, we ran into an issue/regression with our existing code base.

In our code, we have a single task that scans a fairly large fact table while applying some filtering conditions. Any row that passes this condition must then be processed.
The pattern that we use is that the scanning task saves (in an array) the rows that passed the condition and forks a processing task as soon as we have enough rows to process (say 1024).
We expect that these processing tasks will be picked up (i.e. stolen) by the other threads in the pool and executed while we continue scanning and filtering.
Unrelated to the current issue, we have a completion phase at the end of the scanning that ensures all the forked tasks have been executed.

To sum it up, we have a single task that scans some data and forks lots of processing task.
Previously (with the version from a few months back without the worker queues), this was working perfectly and all the worker threads were kept busy and executed the processing tasks.
Now, we see that only a few threads are being kept busy with the processing tasks (in addition to the scanning task). Dozens of threads are idle while work is piling up.

If have put together a simple test that exhibits this issue:

Looking at the ForkJoinPool code, it looks like the issue could be in ForkJoinPool.WorkQueue.push().
More specifically, we have the following:
                if ((n = (top = s + 1) - base) <= 2) {
                    if ((p = pool) != null)
                        p.signalWork();
                }

If tasks are being forked quickly enough, we will only signal work twice while the local queue is getting fairly large, and no extra help is being made available.
I naively modified this code to the following and got back to the initial behavior (full threads usage):
                if ((n = (top = s + 1) - base) <= pool.parallelism) {
                    if ((p = pool) != null)
                        p.signalWork();
                }

I'm not quite sure why the initial "2" was put there. Was it to avoid flooding the threads with signals if everybody is already at work? In that case we could do the signaling only if the AC count is < 0.
I'm sure there is a more elegant solution available, so any advice on whether this is indeed a core issue or if there is an issue with our pool usage would be appreciated!

Thanks, 

--
Romain Colle
R&D Project Manager
QuartetFS
2 rue Jean Lantier, 75001 Paris, France
http://www.quartetfs.com
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
√iktor Ҡlang | 21 May 19:43
Picon

Re: CallerRunsOrRejects for JDK8?



On Mon, May 21, 2012 at 7:19 PM, Aleksey Shipilev <aleksey.shipilev <at> gmail.com> wrote:
On Sat, May 12, 2012 at 8:05 PM, √iktor Ҡlang <viktor.klang <at> gmail.com> wrote:
> As a sidenote, it'd be nice to have an AsyncExecutor, which is never allowed
> to do caller-runs at all. Some code might rely on the block not being
> executed on submission.

I had always had a feeling this is the difference between
Executor.execute() and ExecutorService.submit(). submit() should not
allow to do caller-runs unless explicitly told so by rejection policy.
I would anticipate lots of people are betting on that.

Problem is all the baggage of ExecutorService. :/

Cheers,
 

-Aleksey.



--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: <at> viktorklang

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Min Zhou | 21 May 09:43
Picon

What's mean TLR?

Hi, all 

Here is some lines of comments in ThreadLocalRandom.java

    // Padding to help avoid memory contention among seed updates in
    // different TLRs in the common case that they are located near
    // each other.

What's the above TLR short for ?

Thanks,
Min
--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Chris Vest | 19 May 14:45
Picon
Gravatar

Promise implementation?

Hi,


What are the odds of getting a Promise implementation in j.u.c.? I occasionally find myself needing something like that.

I'm thinking it could perhaps look a bit like this (to show a concrete example of an API):

Cheers,
Chris
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Zhong Yu | 16 May 18:55
Picon

a volatile bug?

as reported on
http://stackoverflow.com/questions/10620680

basically there are

    volatile int a;
    int b;

Thread 1:

    b=1;
    a=1;

Thread 2:

    while(a==0)
        ;
    if(b==0)
        print("error");

"error" is seen printed on 32 bit JDK6 on 64bit machine
Chris Vest | 13 May 22:22
Picon
Gravatar

Object finalization

Hi,
I could not find any resources that I felt were clear and trustworthy enough, so...

I would like to have confirmed (or busted) my assumption that the following code is buggy, because the garbage collector might set the `a` field to null, before this `finalize` method can read it, thus preventing the method from attempting to cleanly dispose of the `Resource` in the `b` field — this all under the assertion that `safelyDispose` itself never throws:
public class SomeService {
  private final Resource a, b;
  
  public SomeService(Resource a, Resource b) {
    assert a != null && b != null;
    this.a = a;
    this.b = b;
  }
  <at> Override
  protected void finalize() throws Throwable {
    a.safelyDispose();
    b.safelyDispose();
  }
}
Cheers,
Chris
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Michael Barker | 13 May 09:40
Picon

Re: Quest for the optimal queue

>> > I'd like to explore alternatives to ConcurrentLinkedQueue, especially to
>> > get
>> > a bit lower latency and perhaps even lower mem usage.
>>
>> For which use case are you looking to improve latency; the latency
>> cost on the producing thread or latency for moving the message from
>> the publishing thread to the consuming thread?
>
>
> The latency between writers and reader. So the latter.

With most of the experimentation done on the Disruptor, the thing that
has the biggest impact on latency is how you notify the consumer
thread.  I've seen latencies as low as 180ns using
ConcurrentLinkedQueue, but required a full busy spin to get there.
That's compared to a lock/condition wake-up on ABQ of about 32 000ns.
The Disruptor has a number of different "WaitStrategies" based on the
trade off between CPU consumption and latency.

I've been meaning to explore a phased back-off strategy of busy-spin
-> spin with yield -> LockSupport.parkNanos.  Building good back-off
strategy in Java is a bit tricky as LockSupport.parkNanos is not
always predictable and you don't have fine grained access to some of
the OS timer facilities.  The latency of the LockSupport.parkNanos(1L)
tends to be somewhere in the region of 65µs on a reasonably modern
Linux kernel, which is okay, I have seen it run into milliseconds on
some revisions of MS Windows.

I'd be interested in what your consumer looks like.  You mentioned
having millions of the these queues, I'm assuming that you won't be
able to have millions of Java threads as consumers, so you'd be
consuming from multiple queues in a given thread?

Mike.
√iktor Ҡlang | 12 May 18:39
Picon

CallerRunsOrRejects for JDK8?

Hey,

How about this for inclusion in JDK8?

/**
 * The RejectedExecutionHandler used by Akka, it improves on CallerRunsPolicy
 * by throwing a RejectedExecutionException if the executor isShutdown.
 * (CallerRunsPolicy silently discards the runnable in this case, which is arguably broken)
 */
class CallerRunsOrRejects extends RejectedExecutionHandler {
  def rejectedExecution(runnable: Runnable, threadPoolExecutor: ThreadPoolExecutor): Unit = {
    if (threadPoolExecutor.isShutdown) throw new RejectedExecutionException("Shutdown")
    else runnable.run()
  }
}

Cheers,

--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: <at> viktorklang

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
√iktor Ҡlang | 11 May 17:00
Picon

Quest for the optimal queue

Hey guys,


I'd like to explore alternatives to ConcurrentLinkedQueue, especially to get a bit lower latency and perhaps even lower mem usage.

Behavior:
No locks
Unbounded
Single consumer
Multiple producers

Operations:

dequeue
enqueue
numberOfMessages // Would be nice to have as a constant, can be linear or simply not supported, doesn't really matter
hasMessages // Just a Boolean if there's anything in there at all, only needs to return true if something has been put in that hasn't been pulled out yet

Is there anything out there which is better than CLQ?

Cheers,

--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: <at> viktorklang

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest <at> cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Gmane