
[jgroups-users] isolated node getting multiple view changes

Hi,

This is with version 3.4.7.Final (I'll upgrade in our next product version if I ever get there!), using the TCP-based stack below. A customer is testing how node isolation affects our clusters; e.g. if a master database node becomes isolated from the rest of the cluster, it needs to detect that and take appropriate action to prevent split-brain issues.

Given {A, B, C} running on AWS, they're simulating B/C in one zone becoming isolated from A in another zone by shutting off the network on A: 'service network stop', wait, then start. When this happens, B and C see A leave the view properly and do the right thing, which is to replace the master database.

On A, though, B and C leave in two separate view changes, back to back, 1-2 seconds apart. This causes problems: A doesn't see itself suddenly isolated from a majority of the cluster; instead it thinks that two other nodes failed separately, and so it doesn't do the right thing.
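
For context, our view handling on each node is roughly the following (a simplified sketch; the class name and the comments stand in for our real failover code):

        import org.jgroups.ReceiverAdapter;
        import org.jgroups.View;

        public class IsolationAwareReceiver extends ReceiverAdapter {
            private View lastView;

            @Override
            public void viewAccepted(View newView) {
                int lost = lastView == null ? 0 : lastView.size() - newView.size();
                lastView = newView;
                if (lost > 1) {
                    // majority vanished in one view change: assume WE are the
                    // isolated side and step down to avoid split-brain
                } else if (lost == 1) {
                    // a single member failed: replace the master if it was the master
                }
                // with the two back-to-back views described above, A only ever
                // sees lost == 1, so it takes the "single failure" branch twice
            }
        }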

Is there a setting I can change so that B and C both leave the view at the same time on A, or is there some other recommended way to handle this, for instance based on the timing of view changes? This works properly in our testing when we physically pull a network cable; bringing the network service down triggers the behavior described above, which I need to fix.

Thanks,
Bobby

        JChannel channel = new JChannel(false);
        ProtocolStack stack = new ProtocolStack();
        channel.setProtocolStack(stack);
        stack.addProtocol(new TCP()
            .setValue("oob_thread_pool_keep_alive_time", 5000)
            .setValue("timer_keep_alive_time", 3000)
            .setValue("bind_addr", InetAddress.getByName(<get address>))
            .setValue("bind_port", bindingPort)
            .setValue("thread_pool_min_threads", 1)
            .setValue("thread_pool_keep_alive_time", 5000)
            .setValue("send_buf_size", 640000)
            .setValue("oob_thread_pool_queue_max_size", 100)
            .setValue("oob_thread_pool_max_threads", 8)
            .setValue("thread_pool_queue_enabled", false)
            .setValue("sock_conn_timeout", 300)
            .setValue("oob_thread_pool_min_threads", 1)
            .setValue("loopback", false)
            .setValue("oob_thread_pool_queue_enabled", false)
            .setValue("max_bundle_timeout", 30)
            .setValue("thread_pool_queue_max_size", 100)
            .setValue("recv_buf_size", 5000000))
            .addProtocol(new TCPPING()
                .setValue("initial_hosts", <get hosts>)
                .setValue("num_initial_members", 3)) // default: 10
            .addProtocol(new MERGE2()
                .setValue("min_interval", 10000)
                .setValue("max_interval", 30000))
            .addProtocol(new FD_SOCK())
            .addProtocol(new FD()
                .setValue("max_tries", <our default is 8>)
                .setValue("timeout", <our default is 5000>))
            .addProtocol(new VERIFY_SUSPECT()
                .setValue("timeout", 1500))
            .addProtocol(new BARRIER())
            .addProtocol(new NAKACK2()
                .setValue("use_mcast_xmit", false))
            .addProtocol(new UNICAST3()
                .setValue("conn_close_timeout", 5000L))
            .addProtocol(new STABLE()
                .setValue("desired_avg_gossip", 50000)
                .setValue("max_bytes", 4000000)
                .setValue("stability_delay", 1000))
            .addProtocol(<our own auth protocol>)
            .addProtocol(new GMS()
                .setValue("join_timeout", 3000)
                .setValue("print_local_addr", false))
            .addProtocol(new MFC()
                .setValue("max_credits", 2000000)
                .setValue("min_credits", 800000))
            .addProtocol(new FRAG2())
            .addProtocol(new STATE_TRANSFER());
        stack.init(); // required once all protocols have been added (programmatic stack)


[jgroups-users] Atomic/reliable multicast to subgroup

Hi.

I have the following problem:
Suppose my cluster is {A, B, C, D} and I'm node A. I need to send a message to {B, C} atomically -- i.e. either both receive it or neither does. If one of them fails during the process (and is suspected by failure detection, so it falls out of the view), I don't want the other to receive the message (maybe I'll choose to resend when the new view is installed, maybe not). Finally, I want to know the result of the operation, namely whether both had the message delivered or neither did. I have no requirement for total order (but it's OK to have it).
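
For reference, the closest I've found so far is anycasting through a MessageDispatcher: it reports the per-target outcome, but it is not atomic (one member can deliver while the other is suspected mid-call). A sketch, where channel, b, c and payload are placeholders for my channel, the two target addresses and the data:

        import java.util.Arrays;
        import org.jgroups.Message;
        import org.jgroups.blocks.MessageDispatcher;
        import org.jgroups.blocks.RequestOptions;
        import org.jgroups.blocks.ResponseMode;
        import org.jgroups.util.Rsp;
        import org.jgroups.util.RspList;

        MessageDispatcher disp = new MessageDispatcher(channel, null, null);
        RequestOptions opts = new RequestOptions(ResponseMode.GET_ALL, 5000)
            .setAnycasting(true); // N unicasts to the listed members, no multicast
        RspList<Object> rsps = disp.castMessage(Arrays.asList(b, c),
                                                new Message(null, payload), opts);
        for (Rsp<Object> rsp : rsps.values()) {
            // rsp.wasReceived() / rsp.wasSuspected() give the per-member outcome,
            // but if C is suspected mid-call, B may already have delivered
        }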

The docs suggest that the TOA protocol might be appropriate, but its exact semantics aren't clear to me. It is also unclear how the sender can be notified of the result.

How can I do this with JGroups?

Ron

[jgroups-users] GridFilesystem

Hello,

I need to build a cluster that caches data, not in memory but on the file system.

I've found ReplCache and thought I could follow its code as an example, or adapt it to store the data as files.

But I discovered that there is already something like this, called GridFilesystem.

I couldn't find any example of how to use it, or any documentation, so first of all I wanted to ask whether my assumption is correct and these classes are meant for the case I'm describing.
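
To make the question concrete, from reading the code and the demo I imagine usage would look roughly like this (untested on my side, so the details may well be wrong):

        import java.io.File;
        import java.io.OutputStream;
        import org.jgroups.blocks.GridFile;
        import org.jgroups.blocks.GridFilesystem;
        import org.jgroups.blocks.ReplCache;

        // one replicated cache for file chunks, one for file metadata
        ReplCache<String, byte[]> data =
            new ReplCache<String, byte[]>("udp.xml", "data-cluster");
        ReplCache<String, GridFile.Metadata> metadata =
            new ReplCache<String, GridFile.Metadata>("udp.xml", "metadata-cluster");
        data.start();
        metadata.start();

        GridFilesystem fs = new GridFilesystem(data, metadata);
        File dir = fs.getFile("/cache");                      // java.io.File-compatible view
        OutputStream out = fs.getOutput("/cache/item.bin");   // chunks are stored in the grid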

And a second question: I noticed that some comments say this is an experimental feature, so how far has the development moved? Could it be used in production with some monitoring, or is it not ready at all?

Any info or examples about GridFilesystem would be greatly appreciated. I'm a first-time JGroups user, so any pointers or suggestions would be helpful.

Thank you,

Eugene


[jgroups-users] S3_PING - Extra hosts show up in members file

I am running JGroups 3.6.6.Final on a two-node cluster (for XWiki).

When I delete the members file in my S3 bucket and restart XWiki on both hosts, I notice that after the XWiki application starts up, the members (JGroups hosts) file in S3 contains multiple entries instead of just two:

wiki-gamma-2b-i-bdcdb578-57943 9c77acca-248d-6f81-e920-53404c41c112 172.22.163.161:7800 F
wiki-gamma-2c-i-43890a98-6186 214cb3e2-e740-2f14-754e-b8bc6bc6536d 172.23.183.179:7800 F
wiki-gamma-2c-i-43890a98-20775 80d26ae9-9738-aac7-ba38-5cf994f6db77 172.23.183.179:7800 F
wiki-gamma-2b-i-bdcdb578-55801 f02a05ca-5201-89ed-f651-ce694f85ce77 172.22.163.161:7800 F
wiki-gamma-2c-i-780385a3-9285 a2b05292-aeb3-6e2e-70b7-fbd5698e321a 172.23.153.66:7800 F
wiki-gamma-2b-i-bdcdb578-28740 2b059b54-48dc-a58e-b8f7-8eb293b08bfb 172.22.163.161:7800 F
wiki-gamma-2b-i-87add242-21142 91b3f796-9002-c447-4ace-ccef0fd6c0b9 172.22.133.102:7800 F
wiki-gamma-2c-i-43890a98-7350 2e9d96a5-27c1-ac95-409d-fe08ce148bbc 172.23.183.179:7800 T
wiki-gamma-2b-i-bdcdb578-46182 4aeb385a-fd17-9e87-27f9-1f1406abb0b6 172.22.163.161:7800 F
wiki-gamma-2b-i-bdcdb578-47213 70176220-3337-3767-f1ca-1e598b09b3af 172.22.163.161:7800 F
wiki-gamma-2c-i-43890a98-30807 df42bb9b-5ec1-7592-804e-989d7f4d5341 172.23.183.179:7800 F
wiki-gamma-2b-i-bdcdb578-54816 977ef0f2-4279-438f-a238-7357ab5434dd 172.22.163.161:7800 F
wiki-gamma-2c-i-43890a98-37675 3608cf76-8a81-4149-f117-612b6fc77dd0 172.23.183.179:7800 F
wiki-gamma-2c-i-43890a98-26930 52760af1-75b6-548b-338f-670a1e6714a6 172.23.183.179:7800 F

I would expect to see only two hosts listed in the above file (no other files get created). 

Any pointers as to what might be causing this? 

Do the hosts in a JGroups cluster need to be started in a particular order?

—Debajit

[jgroups-users] FILE_PING in cloud environments

Hello,

I am using a FILE_PING-based discovery protocol (NATIVE_S3 [1]) in my 
Wildfly cluster and noticed that the discovery files, and the entries in 
them, aren't removed when a node shuts down gracefully. Is that the 
intended behavior, or am I missing a configuration option?
This becomes a big problem when the cluster is long-running and very 
dynamic: the files keep growing, and entries for already-stopped cluster 
nodes are never removed. New nodes then take progressively longer to 
start, because they try to contact every listed node and fail for the 
ones that have already shut down.
Eventually a socket startup timeout in Wildfly is hit and the cluster 
becomes unusable. The main problem is that I can't even purge the list 
manually: even if I delete all the files, each node keeps an in-memory 
list of all the other nodes it knows of, so sooner or later a 
still-running node writes out a file again containing all entries, and 
when a new node joins the cluster it gets the whole list back.

I am using Wildfly 10.0.0.CR4 with JGroups 3.6.7-SNAPSHOT built from 
this commit: 
https://github.com/belaban/JGroups/commit/dea68562a80ec4ad493af3668ce7be711eacb7c5

Thanks for any help in advance!

Regards,
Christian

[1] https://github.com/Sweazer/jgroups-native-s3-ping/tree/wildfly10



[jgroups-users] Reducing the size of each message by ~22 bytes

FYI,
I just committed [1], which replaces the Headers object previously used 
to manage headers with a plain Header[] array inside Message itself. The 
array is manipulated by public static methods in Headers.

This saves roughly *22 bytes per message* compared to the previous code! 
The measurement excluded the actual array contents, which should be 
pretty much the same in both cases (the headers themselves).
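
For the curious, the back-of-the-envelope arithmetic (illustrative only, assuming a 64-bit JVM with compressed oops; not the exact pre-patch field layout):

        // before: every Message referenced a separate wrapper object
        class Headers {        // object header:           ~12 bytes
            Header[] hdrs;     // reference to the array:    4 bytes
            int count;         // bookkeeping:               4 bytes
        }                      // + padding to an 8-byte boundary
        // plus the Message -> Headers reference:            4 bytes
        // => roughly 22-24 bytes of pure overhead per message

        // after: the array lives directly in Message, no wrapper
        class Message {
            Header[] headers;  // manipulated via the static methods in Headers
        }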

[1] https://issues.jboss.org/browse/JGRP-1990

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)


[jgroups-users] JGroups - JChannel hanging in connect()

Hi,

I have a cluster set up with 5 nodes using TCP. It seems none of the nodes is able to join the cluster.

I took a thread dump and see that my application has not moved past the call to JChannel.connect().

Has anyone encountered this issue? I am on 3.6.6.Final. Thread dump below.

 

java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00002aab1b246418> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.jgroups.util.CondVar.waitFor(CondVar.java:64)
        at org.jgroups.util.Responses.waitFor(Responses.java:129)
        at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:82)
        at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:41)
        at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1087)
        at org.jgroups.protocols.FC.down(FC.java:428)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:136)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1038)
        at org.jgroups.JChannel.down(JChannel.java:790)
        at org.jgroups.JChannel._connect(JChannel.java:564)
        at org.jgroups.JChannel.connect(JChannel.java:294)
        - locked <0x00002aab178a1d08> (a org.jgroups.JChannel)
        at org.jgroups.JChannel.connect(JChannel.java:279)
        - locked <0x00002aab178a1d08> (a org.jgroups.JChannel)

 

 

-regards
Seemanto Barua
212.298.4097

 



[jgroups-users] What is PING over TUNNEL for?

The documentation for TUNNEL says "Note that it is not recommended to use TCPGOSSIP for discovery if TUNNEL is used (use PING instead).", and for PING it says "Initial (dirty) discovery of members. Used to detect the coordinator (oldest member), by mcasting PING requests to an IP multicast address." However, as far as I understand, TUNNEL only works with explicitly specified hosts (GossipRouters).

So I wonder: what is PING over TUNNEL used for? (I couldn't find any information in the documentation or the mailing-list archive.) [*]

Kind regards

[*] Removing PING from the stack causes an exception in GMS. Is PING included only to satisfy GMS's expectation that a discovery protocol is present?

[jgroups-users] Recommended architecture for heterogeneously connected networks

Hi everyone, I'm just starting in the JGroups world and I would really appreciate insights about the best architecture for my use case.
What I'm trying to accomplish is an auto-discovering, fully "inter-connected" heterogeneous network. What I mean by a heterogeneous network is one where:
  • not all nodes are directly reachable from each other;
  • every node is reachable _through_ some other (not predetermined) node.
In this situation, I may have a node connected to two networks (with a different IP on each), while other nodes fall into any of the following cases:
  1. on the same two networks (reachable via both IPs)
  2. on only one of the networks
  3. reachable only through another node in case 1 or 2
An additional difficulty is that, in some cases, connection establishment can be one-way only (e.g. because a node is behind a NAT).

I don't (and can't) know the topology of the network in advance, so any topology-dependent configuration must be done at runtime. Also, nodes can go down or come up at any time.

Sooo, my question is: is there a recommended configuration for this use case?
(I have been researching and have a few ideas in mind, but I don't want to post them yet, to avoid biasing the recommendations and to keep this mail shorter :) )

I would really appreciate your feedback!
Kind regards


[jgroups-users] Monitor org.jgroups.util.Table

Question:

Is there any way to monitor the Table object for each sender through probe? I am facing an issue where bursts of messages sent through JGroups eventually fill up the Table and we run out of memory.


Best,
Pooja

[jgroups-users] Memory consumption of org.jgroups.protocols.Discovery

Hello,

I recently looked at a heap dump and noticed that the field 
"ping_responses" uses a lot of memory. My dump shows it using around 
52 MB, because the map contains about 10000 entries. Each of these 
org.jgroups.util.Responses entries seems to contain 24 
org.jgroups.protocols.PingData entries.

As far as I can see in the code [1], the only method that removes 
entries from the map is addResponse, and that is only called from the up 
method. This is especially strange because the code suggests that the 
method removes an entry once the o.j.u.Responses object has its done 
flag set to true. Interestingly, my heap dump shows that all 
o.j.u.Responses objects have the done flag set to true, so I don't 
really understand why they are still in the map.
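
In other words, the pattern seems to be the following (my paraphrase with made-up names, not the real JGroups code):

        import java.util.Iterator;
        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        class PingResponsesSketch {
            final Map<Long, ResponsesStub> ping_responses = new ConcurrentHashMap<Long, ResponsesStub>();

            void addResponse(long id, Object rsp) {
                // pruning happens ONLY here, as a side effect of new traffic:
                for (Iterator<ResponsesStub> it = ping_responses.values().iterator(); it.hasNext();)
                    if (it.next().done)
                        it.remove();
                // ... the new response is then added to the matching entry.
                // Once discovery traffic stops, nothing ever prunes the map,
                // which would explain the lingering done == true entries.
            }

            static class ResponsesStub { volatile boolean done; }
        }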

Can anyone help me with that? My cluster has only been up for about 
3 days; if I don't restart it periodically, I will get out-of-memory 
errors sooner or later.

I am working with JGroups 3.6.4.Final within Wildfly 9.0.1.Final and the 
S3_PING protocol.

[1] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/Discovery.java#L519
