Anuj Shah | 20 Oct 21:41 2014

[jgroups-users] ExecutionService at least once behaviour

I've noticed that runnable tasks submitted to the ExecutionService can be run more than once!

This happens when a member executing the task is suspected and removed from the cluster. The protocol handles this by resubmitting the request in Executing#handleView. I note the following comment there:

                // The person currently servicing our request has gone down
                // without completing so we have to keep our request alive by
                // sending ours back to the coordinator

The problem is that there is no guarantee that the request was not completed.

In my application, a member executing the task had an absurdly long GC pause (30s), which meant it was temporarily removed from the cluster. When the pause ended, it happily continued executing the task, which had already been resubmitted and completed. The task in question modifies the database and is quite destructive, so you can imagine the fallout.

I was hoping to get opinions on whether this behaviour is correct, especially since we are implementing a standard java.util.concurrent interface. (I couldn't find anything in the ExecutorService JavaDoc to say it is wrong, though.)

Perhaps there could be control over the behaviour:
* At least once - assumes task failed and resubmits
* At most once - assumes task completed and cleans up - may not actually be complete
* Exactly once - not sure if this is possible
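
Until the service offers such a switch, one way to make a resubmitted task harmless is to have the task claim a unique ID before it does any work. The sketch below is only an illustration of that idea, not JGroups API: ClaimOnceTask, CLAIMED and the task ID are hypothetical names, and the in-memory set stands in for a store that all members share (for example a database table with a unique constraint on the task ID).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical wrapper: runs the delegate only if this task ID has not been claimed yet.
 * The in-memory set is a stand-in for a store shared by all cluster members;
 * as written it is NOT shared across JVMs.
 */
final class ClaimOnceTask implements Runnable {
    // Stand-in for a shared, durable claim store (assumption for this sketch).
    static final Set<String> CLAIMED = ConcurrentHashMap.newKeySet();

    private final String taskId;
    private final Runnable delegate;

    ClaimOnceTask(String taskId, Runnable delegate) {
        this.taskId = taskId;
        this.delegate = delegate;
    }

    @Override
    public void run() {
        // Set.add returns false when the ID is already present,
        // so only the first claimant runs the delegate.
        if (CLAIMED.add(taskId)) {
            delegate.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        Runnable destructive = runs::incrementAndGet;
        // Simulate the original submission and a resubmission after a false suspicion.
        new ClaimOnceTask("purge-job-1", destructive).run();
        new ClaimOnceTask("purge-job-1", destructive).run();
        System.out.println("executions: " + runs.get());
    }
}
```

Claiming before running gives at-most-once behaviour: if the claimant dies after the claim, the work is never done. Getting closer to exactly-once would need the claim and the task's own database changes to commit in the same transaction.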
_______________________________________________
javagroups-users mailing list
javagroups-users <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/javagroups-users
Bela Ban | 15 Oct 15:36 2014

Re: [jgroups-users] known issue with pauses on jdk1.7?


On 07/10/14 17:52, Bobby Bissett wrote:
> Hi all,
>
> I suspect this might be a garbage collection issue with the jdk, but
> wanted to ask if other people have run into it just in case. My protocol
> stack is below, and the jvm is from Oracle:
>
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>
> What we're seeing is that, after "some time," a member in a 3-node group
> seems to pause on one node for about 20 seconds. During that time there
> is no jgroups logging (Level.FINER) of heartbeat messages or logging
> from our own threads.

This points to a GC pause, as you suspected. FD will suspect the paused 
node and JGroups will exclude it from the cluster, only for it to be 
merged back later by MERGE3.

> The node leaves the view and eventually rejoins
> it, but the long time pause causes other parts of our app to fail (the
> app mostly creates a jdbc connection every 10 seconds to ping a database
> and then cleans up the connection). This wasn't an issue on OpenJDK 6.

You can mitigate this on the JGroups side and up the timeout*max_tries 
to be greater than your longest GC pause. Currently the product is 9s in 
your config below.
This will reduce the number of false suspicions, but not completely 
eliminate them. E.g. if you up the combined timeout to 30s, but have a 
50s GC pause (I've seen those before, on large heaps with misconfigured 
GC options), then all bets are off again.
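
The arithmetic behind that advice is small enough to write down. This is a sketch only: FdWindow and its method names are made up for illustration, and the 3000 ms / 3 tries figures are the ones from the quoted config.

```java
/** Worked arithmetic for the failure-detection window discussed above. */
final class FdWindow {
    /** Time (ms) a member may stay silent before it is suspected: timeout * max_tries. */
    static long suspicionWindowMs(long timeoutMs, int maxTries) {
        return timeoutMs * maxTries;
    }

    /** Smallest max_tries whose window strictly exceeds the given pause, for a fixed timeout. */
    static int minTriesToCover(long timeoutMs, long pauseMs) {
        return (int) (pauseMs / timeoutMs) + 1;
    }

    public static void main(String[] args) {
        long window = suspicionWindowMs(3000, 3); // values from the quoted config
        System.out.println("window = " + window + " ms");
        System.out.println("covers a 20 s pause: " + (window > 20_000));
        System.out.println("max_tries needed at 3000 ms: " + minTriesToCover(3000, 20_000));
    }
}
```

With the current values the window is 9000 ms, well short of the 20 s pause that was observed; at a 3000 ms timeout it would take 7 tries (21 s) to ride out a pause of that length.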

However, this will not help your DB connection code, as it'll hang as 
well. Also, if you invoke blocking RPCs into the paused node, the RPC 
will hang for the duration of the pause, or throw a TimeoutException if 
the max time of the call is less than the GC pause.

So the fix is really to tune your GC options, because that's the root 
cause. Modern JVMs allow you to define max pause times, and that should 
really do the job.
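
Before tuning, it helps to confirm how much time the JVM actually spends collecting. The sketch below reads the standard java.lang.management beans; GcStats is a made-up class name. The figures are accumulated totals per collector, not individual pause lengths, so GC logging is still the better tool for finding a single 20 s stall. On HotSpot the pause goal itself is set with a flag such as -XX:MaxGCPauseMillis (used with collectors like G1), which is a target rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads accumulated GC counters from the platform MXBeans. */
final class GcStats {
    /** Sum of accumulated collection time (ms) over all collectors; undefined (-1) values are skipped. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC time: " + totalGcTimeMs() + " ms");
    }
}
```

Sampling totalGcTimeMs() periodically from an application thread and logging the delta gives a rough picture of when collection time spikes.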

> Before I go diving into GC tuning, has anyone else seen this before? I
> have logs available if that helps any. At the time of the pause, there
> are 2 lines out of order in the log, one appearing 20 seconds later than
> it was supposed to.
>
> Thanks,
> Bobby
>
>         JChannel channel = new JChannel(false);
>          ProtocolStack stack = new ProtocolStack();
>          channel.setProtocolStack(stack);
>
>          stack.addProtocol(new TCP()
>              .setValue("oob_thread_pool_keep_alive_time", 5000)
>              .setValue("timer_keep_alive_time", 3000)
>              .setValue("bind_addr", InetAddress.getByName(<address>))
>              .setValue("bind_port", bindingPort)
>              .setValue("thread_pool_min_threads", 1)
>              .setValue("thread_pool_keep_alive_time", 5000)
>              .setValue("send_buf_size", 640000)
>              .setValue("oob_thread_pool_queue_max_size", 100)
>              .setValue("oob_thread_pool_max_threads", 8)
>              .setValue("thread_pool_queue_enabled", false)
>              .setValue("sock_conn_timeout", 300)
>              .setValue("oob_thread_pool_min_threads", 1)
>              .setValue("loopback", false)
>              .setValue("oob_thread_pool_queue_enabled", false)
>              .setValue("max_bundle_timeout", 30)
>              .setValue("thread_pool_queue_max_size", 100)
>              .setValue("recv_buf_size", 5000000))
>              .addProtocol(new TCPPING()
>                  .setValue("initial_hosts", <3 addresses here>)
>                  .setValue("num_initial_members", 3)) // default: 10
>              .addProtocol(new MERGE2()
>                  .setValue("min_interval", 10000)
>                  .setValue("max_interval", 30000))
>              .addProtocol(new FD_SOCK())
>              .addProtocol(new FD()
>                  .setValue("max_tries", getJGRoupsMaxTries()) // returns 3
>                  .setValue("timeout", getJGroupsTimeout())) // returns 3000
>              .addProtocol(new VERIFY_SUSPECT()
>                  .setValue("timeout", 1500))
>              .addProtocol(new BARRIER())
>              .addProtocol(new NAKACK2()
>                  .setValue("use_mcast_xmit", false))
>              .addProtocol(new UNICAST3()
>                  .setValue("conn_close_timeout", 5000L))
>              .addProtocol(new STABLE()
>                  .setValue("desired_avg_gossip", 50000)
>                  .setValue("max_bytes", 4000000)
>                  .setValue("stability_delay", 1000))
>              .addProtocol(createAuthProtocol())
>              .addProtocol(new GMS()
>                  .setValue("join_timeout", 3000)
>                  .setValue("print_local_addr", false))
>              .addProtocol(new MFC()
>                  .setValue("max_credits", 2000000)
>                  .setValue("min_credits", 800000))
>              .addProtocol(new FRAG2());
>          stack.init();

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

Bela Ban | 6 Oct 08:39 2014

[jgroups-users] Docker and JGroups

FYI,

[1] http://belaban.blogspot.ch/2014/10/jgroups-and-docker.html

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

Dave Rathnow | 3 Oct 20:20 2014

[jgroups-users] How to stop JGRP000010 warnings

We recently installed a third-party product in our network that uses a version of JBoss that includes JGroups. I do development using JBoss and, since the product was installed, I have been finding these messages in my log file:

WARN  [org.jgroups.protocols.UDP] () JGRP000010: packet from 192.168.170.57:45688 has different version (2.11.0) than ours (3.2.12); packet is discarded (received 7 identical messages from 192.168.170.57:45688 in the last 61,400 ms)

I have two questions:

1. What is causing this?
2. Is there any way to stop it?

Thanks,
Dave

 

Padki, Anuradha | 3 Oct 16:26 2014

[jgroups-users] jgroups access current list of GossipRouter Hosts

Hello,

Is there a way to find out the CURRENT list of GossipRouter hosts from a channel?

As far as I can see in the code, the list is not updated even when a GossipRouter host is not accessible.

On a related note, what exactly are the implications when none of the GossipRouter hosts are available?

I see that when this happens all the nodes/channels become primary/coordinator. Is this how it works?

Any response is very much appreciated.

Regards

-          Anu

Grega Pogačnik | 29 Sep 22:40 2014

[jgroups-users] total ordering protocols (sequencer)

Hi,

Does SEQUENCER depend on multicast, and thus on the UDP transport? I can imagine that the coordinator needs to send out everything "atomically"; otherwise there could be trouble if the coordinator crashes in the middle of a batch. Or how is that handled?

I apologize if that is already covered in some part of the documentation.

I recall reading that jgroups also had a token based solution but that got abandoned. How exactly did that work? What were the problems with it? Any plans to get it "resurrected"?

Thanks.

Regards,
Gregor
Bela Ban | 26 Sep 09:11 2014

[jgroups-users] JGroups 3.5.1.Final released

FYI, I released 3.5.1.Final [1].

It does contain a few important fixes, e.g.

- https://issues.jboss.org/browse/JGRP-1880: UDP.ip_ttl is always set to 
1. This causes JGroups messages sent across multiple switches not to get 
received

- https://issues.jboss.org/browse/JGRP-1885: a bind_addr of 0.0.0.0 now 
throws an exception. Setting bind_addr to the wildcard address would 
result in members not being able to communicate, and is a silent showstopper

- https://issues.jboss.org/browse/JGRP-1876: MergeView shows incorrect 
subgroups in some cases (the merged view is correct though). This can 
have an impact on merge handling code

- https://issues.jboss.org/browse/JGRP-1865: RELAY2 (xsite) doesn't work 
over FORK

- https://issues.jboss.org/browse/JGRP-187{4,5}: reduce traffic in a 
SYNC case
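
For applications that compute bind_addr themselves and cannot upgrade right away, the wildcard case from JGRP-1885 can be caught before the value reaches JGroups. This is a sketch using only java.net; BindAddrCheck and isWildcard are hypothetical names and are not part of JGroups.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Pre-flight check for a bind address given as an IP literal. */
final class BindAddrCheck {
    /** True if the IP literal is the wildcard address (0.0.0.0 for IPv4). */
    static boolean isWildcard(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format; no lookup is done.
            return InetAddress.getByName(ipLiteral).isAnyLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        String candidate = args.length > 0 ? args[0] : "0.0.0.0";
        if (isWildcard(candidate)) {
            System.out.println(candidate + " is the wildcard address; pick a concrete interface address");
        } else {
            System.out.println(candidate + " is a concrete address");
        }
    }
}
```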

Note that 3.6 will be baselined on JDK 7, while 3.5.x will continue to 
work with JDK 6.

[1] https://issues.jboss.org/browse/JGRP/fixforversion/12325575

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

Kowshik NS | 24 Sep 16:09 2014

Re: [jgroups-users] Node Recognition in Jgroups 3.0.6Final

Thanks Bela. We will check the issues below.

And sorry about the direct e-mail. I will post this to javagroups-users <at> lists.sourceforge.net

Regards,
Kowshik NS

-----Original Message-----
From: Bela Ban [mailto:belaban <at> yahoo.com] 
Sent: 24 September 2014 15:05
To: Kowshik NS
Subject: Re: Node Recognition in Jgroups 3.0.6Final

Could be a firewall issue. Are the members listening on port 7801? Are they binding to the 10.x.x.x addresses?
Please post to the JGroups mailing list, not to me directly next time...
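
A successful ICMP ping says little about whether a TCP port is reachable, so the firewall question can be checked from each machine with a plain TCP connect. The sketch below uses only java.net; PortProbe is a made-up name, and the host and port to probe are whatever the members actually bind to (pass them as arguments).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal TCP reachability probe. */
final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unroutable: a firewall, a wrong port,
            // or a member bound to a different address all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Running it on Machine 1 against Machine 2's address and port, and the other way round, while the members are up shows whether the connection is blocked in either direction.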

On 24/09/14 16:02, Kowshik NS wrote:
> Hi Bela,
>
> We are trying to set up a cluster on two different machines: A, B, C
> (Machine 1 nodes) and D, E, F (Machine 2 nodes).
>
> We tested this on our internal network and can see a proper cluster
> formed.
>
> However, in our customer environment, with the same settings, Machine 1
> is not able to recognise the nodes on Machine 2 and vice versa.
>
> The machines are able to ping each other.
>
> Please could you help us?
>
> We are using JGroups 3.0.6.Final and the TCP.xml shipped with 3.0.6.Final.
>
> The settings include:
>
> -Djgroups.bind_addr=10.125.45.11
> -Djgroups.tcpping.initial_hosts="10.125.45.11[7810],10.125.45.12[7810]"
> -Djava.net.preferIPv4Stack=true
>
> Best Regards,
>
> Kowshik NS
>

--
Bela Ban, JGroups lead (http://www.jgroups.org)

Bela Ban | 24 Sep 12:30 2014

[jgroups-users] JGroups 3.6 will require JDK 7

FYI,

as the subject says... JDK 6 was EOL'ed in 2011 [1], so it's time to 
move on.

3.5 and all patch releases off of that branch (3.5.1) will be the last 
release(s) on JDK 6.

[1] http://www.oracle.com/technetwork/java/eol-135779.html

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

Caleb Cushing | 18 Sep 18:55 2014

[jgroups-users] message size (216081) is greater than max bundling size (200000) but frag2 60k

I've asked this question on SO, but I'll repeat it here:

http://stackoverflow.com/q/25916835/206466

> [ERROR] [2014-08-28 07:09:17.100] failed sending message to cluster (216081 bytes): java.lang.Exception: message size (216081) is greater than max bundling size (200000). Set the fragmentation/bundle size in FRAG and TP correctly, cause: null - protocols.TCP [http-apr-8080-exec-10] [48488852 ms]

After doing some research, this is our configuration:

    <config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups
http://www.jgroups.org/schema/JGroups-3.1.xsd">
    <TCP bind_port="${ehcacheDistributed.jgroups.bindPort}"
         bind_addr="NON_LOOPBACK"
         loopback="false"
         recv_buf_size="${tcp.recv_buf_size:20M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         discard_incompatible_packets="true"
         max_bundle_size="200K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"

         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <TCPPING timeout="3000"
             initial_hosts="${ehcacheDistributed.jgroups.tcpping.initialhosts}"
             port_range="1"
             num_initial_members="10"/>
    <MERGE2  min_interval="10000"
             max_interval="30000"/>
    <FD_SOCK/>
    <FD timeout="35000" max_tries="6" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
                   discard_delivered_msgs="true"/>
    <UNICAST />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"

                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <!--RSVP resend_interval="2000" timeout="10000"/-->
    <pbcast.STATE_SOCK/>
    </config>

We're using JGroups 3.1.0. It would appear that JGroups isn't breaking
the message apart properly using FRAG2. Why is that? What am I missing?
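
To put numbers on that expectation: if FRAG2 had split the 216081-byte message at frag_size, no piece would come anywhere near the 200000-byte bundle limit. The sketch below is plain arithmetic, not JGroups code; FragMath is a made-up name, it assumes "60K" means 60000 bytes (by analogy with "200K" showing up as 200000 in the error), and it ignores header overhead.

```java
/** Back-of-the-envelope fragment arithmetic; header overhead is ignored. */
final class FragMath {
    /** Number of fragments needed for a message, using ceiling division. */
    static int fragmentCount(int messageSize, int fragSize) {
        return (messageSize + fragSize - 1) / fragSize;
    }

    /** Size of the final fragment; equals fragSize when the message divides evenly. */
    static int lastFragmentSize(int messageSize, int fragSize) {
        int remainder = messageSize % fragSize;
        return remainder == 0 ? fragSize : remainder;
    }

    public static void main(String[] args) {
        int messageSize = 216081; // from the error message
        int fragSize = 60000;     // frag_size="60K", assumed to mean 60000 bytes
        int maxBundle = 200000;   // from the error message
        System.out.println("fragments: " + fragmentCount(messageSize, fragSize));
        System.out.println("last fragment: " + lastFragmentSize(messageSize, fragSize) + " bytes");
        System.out.println("largest fragment fits in bundle: " + (fragSize <= maxBundle));
    }
}
```

That works out to 4 fragments (three of 60000 bytes and one of 36081), so the exception suggests this particular message reached the transport without being fragmented.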

-- 
Caleb Cushing

http://xenoterracide.com

Calendar:
https://www.google.com/calendar/embed?src=xenoterracide%40gmail.com&ctz=America/Chicago

Bela Ban | 29 Aug 11:01 2014

[jgroups-users] JGroups 3.5.0.Final released

FYI: http://belaban.blogspot.ch/2014/08/jgroups-350final-released.html

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
