Andrei Palade | 21 Mar 15:19 2014

Re: [jgroups-users] UNICAST3 number of received messages

In the SEQUENCER, I managed to get the correct number of broadcasts made by the coordinator (given by bcasts_sent).
However, in UNICAST3 I'm still missing a few messages. This is what I get at the end of the file: 50000, 49953, 14475.
The first number is the number of broadcasts made by the SEQUENCER, the second is the number of unicast messages received in UNICAST3, and
the third is the timestamp. Ignore the other 2 numbers. To separate the retransmissions from the actual messages,
I used a map for each sender. UNICAST3 adds a sequence number in the header of the message, and I use that number
to identify whether the message has been retransmitted or not. What I'm trying to measure is the service + waiting time of messages
using the fixed-sequencer protocol. These are the numbers I get when not using FC. In this case I have five sending processes, each sending 10000 messages.
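
A minimal sketch of the per-sender map described above (the counter class is mine, not part of JGroups; how the seqno is extracted from the UNICAST3 data header differs across versions, so treat that part as an assumption):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.jgroups.Address;

public class UniqueMessageCounter {
    // sender -> set of UNICAST3 seqnos already counted
    private final Map<Address, Set<Long>> seen = new ConcurrentHashMap<>();

    // Returns true if (sender, seqno) is new, false if it is a retransmission.
    public boolean countIfNew(Address sender, long seqno) {
        return seen.computeIfAbsent(sender, s -> ConcurrentHashMap.newKeySet())
                   .add(seqno);
    }

    // Total number of distinct messages received across all senders.
    public long total() {
        return seen.values().stream().mapToLong(Set::size).sum();
    }
}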


PS. Sorry for duplicating this message, but since the first one is quite large, it awaits the moderator's approval before being posted.
You'll be able to see the entire file in the first message.


On 21 March 2014 14:00, Andrei Palade <andrey.palade <at> googlemail.com> wrote:
In the SEQUENCER, I managed to get the correct number of broadcasts made by the coordinator (given by bcasts_sent).
However, in UNICAST3 I'm still missing a few messages. Have a look at the attached txt file. The first number is the number
of broadcasts made by the SEQUENCER, the second is the number of unicast messages received in UNICAST3, and
the third is the timestamp. Ignore the other 2 numbers. To separate the retransmissions from the actual messages,
I used a map for each sender. UNICAST3 adds a sequence number in the header of the message, and I use that number
to identify whether the message has been retransmitted or not. What I'm trying to measure is the service + waiting time of messages
using the fixed-sequencer protocol. If you look at the attached file you will notice that a few unicast messages are missing.
These are the numbers I get when not using FC. In this case I have five sending processes, each sending 10000 messages.




--
Andrei
Bela Ban | 19 Mar 18:23 2014

[jgroups-users] UUID subclasses that have been deprecated and will be removed

I've deprecated and will remove most of the following classes.
Please let me know if you happen to use any of them.

- CanBeSiteMaster
- CanBeSiteMasterTopology
- PayloadUUID
- AdditionalDataUUID
- TopologyUUID (I think Infinispan uses it?)

All of these classes can be replaced by ExtendedUUID, which carries a
hashmap, so arbitrary data can be added (note that this class has not
yet been committed to master).

- SiteMaster
- SiteUUID

These classes might stay as helper classes, but their implementation 
will use ExtendedUUID.
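
For illustration only, here is how an address carrying a key/value map might be used. Since the class had not been committed to master at the time of writing, the method names below (randomUUID, put, get) are guesses and need to be verified:

import org.jgroups.util.ExtendedUUID;

public class ExtendedUUIDDemo {
    public static void main(String[] args) {
        // assumed factory method and byte[]-keyed map accessors
        ExtendedUUID addr = ExtendedUUID.randomUUID("node-A");
        addr.put("site".getBytes(), "SFO".getBytes()); // attach arbitrary data
        byte[] site = addr.get("site".getBytes());
        System.out.println(new String(site));
    }
}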

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

Andrei Palade | 18 Mar 00:52 2014

[jgroups-users] UNICAST3 number of received messages

In UNICAST3, how do I identify the number of received messages without
counting retransmissions? For example, if the receiving
process receives M1, M2, M3, I would like to know when I receive M3 that I
have received 3 messages. I don't really care that M1 or M2 have been
retransmitted a million times.


--
Andrei
Karim Ammous | 17 Mar 11:55 2014

[jgroups-users] MERGE3: Wrong number of subgroups

Hi Bela,

Using JGroups 3.4.2, a split occurred and a merge was processed successfully, but the number of subgroups is wrong (75 instead of 2).

Here is an extract of the MergeView:

Incoming-16,cluster,serv-00832113-39940:host:172.29.190.141:3:CL(GROUP01)[F] | MergeView view ID = [serv-00831835-64655:host:172.29.190.131:3:CL(GROUP01)[F]|35]
74 subgroups =  
[serv-00832171-19370:host:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832210-45460:host:172.29.190.164:3:CL(GROUP01)[F]]
[serv-00832171-19370:host:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802144-53027:host:172.29.190.152:3:CL(GROUP01)[F]]
[serv-00832171-19370:host:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802285-33969:host:172.29.190.202:3:CL(GROUP01)[F]]
[serv-00832171-19370:host:172.29.190.175:3:CL(GROUP01)[F]|34] (9) 
[serv-00832171-19370:host:172.29.190.175:3:CL(GROUP01)[F]
serv-00802287-39465:host:172.29.190.147:3:CL(GROUP01)[F]
serv-00832253-57473:host:172.29.190.137:3:CL(GROUP01)[F]
serv-00832120-46459:host:172.29.190.185:3:CL(GROUP01)[F]
...
...


The number of subgroups is returned by this instruction:
((MergeView)view).getSubgroups().size()
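
For context, a minimal sketch of the listener that produces the count above, using the standard 3.x API:

import org.jgroups.MergeView;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class MergeLogger extends ReceiverAdapter {
    @Override
    public void viewAccepted(View view) {
        if (view instanceof MergeView) {
            MergeView mv = (MergeView) view;
            System.out.println(mv.getSubgroups().size() + " subgroups");
            for (View sub : mv.getSubgroups()) // one View per pre-merge partition
                System.out.println("  " + sub.getViewId() + ": "
                                   + sub.getMembers().size() + " members");
        }
    }
}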

Please find enclosed the entire MergeView log file.

Best regards,

--
Karim AMMOUS
INFO Mar 17 05:19:38,277 |
Incoming-16,cluster,serv-00832113-39940:turret:172.29.190.141:3:CL(GROUP01)[F] | MergeView
view ID = [serv-00831835-64655:turret:172.29.190.131:3:CL(GROUP01)[F]|35]
74 subgroups =  
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832210-45460:turret:172.29.190.164:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802144-53027:turret:172.29.190.152:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802285-33969:turret:172.29.190.202:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|34] (9) 
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]
serv-00802287-39465:turret:172.29.190.147:3:CL(GROUP01)[F]
serv-00832253-57473:turret:172.29.190.137:3:CL(GROUP01)[F]
serv-00832120-46459:turret:172.29.190.185:3:CL(GROUP01)[F]
serv-00832065-27735:turret:172.29.190.158:3:CL(GROUP01)[F]
serv-00832104-58948:turret:172.29.190.197:3:CL(GROUP01)[F]
serv-00831918-55816:turret:172.29.190.199:3:CL(GROUP01)[F]
serv-00831853-62993:turret:172.29.190.159:3:CL(GROUP01)[F]
serv-00832143-55757:turret:172.29.190.198:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832250-26647:turret:172.29.190.163:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001612-61852:ces:172.29.190.240:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832127-49058:turret:172.29.190.134:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832072-34271:turret:172.29.190.192:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832145-50839:turret:172.29.190.136:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832113-39940:turret:172.29.190.141:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832218-56813:turret:172.29.190.203:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832089-18356:turret:172.29.190.154:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831911-63031:turret:172.29.190.204:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832194-49414:turret:172.29.190.146:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832249-12635:turret:172.29.190.191:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832132-16617:turret:172.29.190.132:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832102-15495:turret:172.29.190.173:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832071-12153:turret:172.29.190.179:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832260-26680:turret:172.29.190.183:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831757-64951:turret:172.29.190.196:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001681-37173:maa:172.29.190.239:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [term-ETE105100007-52231:turret:172.29.190.210:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [term-138985039-63401:turret:172.29.190.211:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832213-49678:turret:172.29.190.165:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [term-ETE105100006-15551:turret:172.29.190.212:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001612-62810:mdc:172.29.190.240:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832183-13274:turret:172.29.190.180:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832214-6339:turret:172.29.190.172:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832146-190:turret:172.29.190.130:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832199-11853:turret:172.29.190.188:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802473-55034:turret:172.29.190.193:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00631011-15141:turret:172.29.190.176:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802282-11228:turret:172.29.190.166:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [term-ETE105000047-46868:turret:172.29.190.213:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832263-2882:turret:172.29.190.139:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831998-60610:turret:172.29.190.145:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832056-39561:turret:172.29.190.161:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831794-55225:turret:172.29.190.156:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832182-2054:turret:172.29.190.148:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831782-65244:turret:172.29.190.160:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ETJ104516289-44868:turret:172.29.190.155:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832220-54368:turret:172.29.190.168:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001681-45989:mdc:172.29.190.239:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001612-18889:maa:172.29.190.240:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832219-63343:turret:172.29.190.133:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-ZM137S001681-47662:ces:172.29.190.239:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802300-17112:turret:172.29.190.157:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832177-30838:turret:172.29.190.184:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832154-31091:turret:172.29.190.151:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832090-29280:turret:172.29.190.162:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831771-56192:turret:172.29.190.171:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832251-5542:turret:172.29.190.174:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832215-7369:turret:172.29.190.143:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832246-63833:turret:172.29.190.150:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832070-15072:turret:172.29.190.189:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831866-21239:turret:172.29.190.140:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832164-23207:turret:172.29.190.201:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832197-1970:turret:172.29.190.182:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831835-64655:turret:172.29.190.131:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00542177-62445:turret:172.29.190.177:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832130-57782:turret:172.29.190.129:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831768-37756:turret:172.29.190.149:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832178-55918:turret:172.29.190.195:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832166-22590:turret:172.29.190.138:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832091-64821:turret:172.29.190.181:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831943-56550:turret:172.29.190.135:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832243-21232:turret:172.29.190.169:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00888453-61083:turret:172.29.190.200:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831841-32187:turret:172.29.190.167:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00802289-30213:turret:172.29.190.153:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832139-47243:turret:172.29.190.144:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832151-35016:turret:172.29.190.194:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00832115-6982:turret:172.29.190.187:3:CL(GROUP01)[F]]
[serv-00832171-19370:turret:172.29.190.175:3:CL(GROUP01)[F]|33] (1) [serv-00831800-52174:turret:172.29.190.178:3:CL(GROUP01)[F]]]
Bela Ban | 14 Mar 14:28 2014

[jgroups-users] JGroups 3.5.0.Beta1

I just released 3.5.0.Beta1 to Nexus and SourceForge.

The most important change from Alpha1 is [1], which does the following:

#1 A sender P sending a multicast message M loops back M up the stack 
and discards M at the transport level when received. This saves us 1 
thread (from a pool) and 1 de-serialization step as M is discarded as 
early as possible. Previously, M was discarded *after* de-serialization.

#2 P also loops back unicast message M with dest=P. Previously, with 
loopback=false, M would get serialized, sent, received and 
de-serialized. These steps are now skipped, saving us a serialization, 
sending, de-serialization step and 1 thread used for de-serialization.

I also got rid of the copy made when looping back a message, as this is not
needed.
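
For context, a hedged illustration of the self-delivery case in #2 (channel and cluster names are made up; default stack assumed):

import org.jgroups.JChannel;
import org.jgroups.Message;

public class SelfSendDemo {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel();  // default configuration
        ch.connect("demo-cluster");
        // dest == our own address: with JGRP-1765 this is looped back up the
        // stack directly, skipping serialization and the transport
        ch.send(new Message(ch.getAddress(), "hello, self"));
        ch.close();
    }
}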

It would be great to get feedback on this, especially its impact on
performance on different systems!
Cheers,

[1] https://issues.jboss.org/browse/JGRP-1765

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

cwjcsu | 12 Mar 07:38 2014

[jgroups-users] JGroups join happens minutes later than discovery

I'm using the JGroups 3.4.2 TCP stack with 9 nodes, with TCPPING.initialHosts set to a
static list.

If I start the program at the same time on each node, some nodes join the
cluster minutes later than when they are socket-connected, and
MembershipListener.viewAccepted fires on them with a MergeView. In my program I
consider that a split-brain, such as (9,7,2,8,3,5),(1),(4),(6), and that's
bad. The network is always OK.

Through the stack trace, I can see lots of discovery messages exchanged
between the later nodes and the previously joined nodes.

If I start the nodes one by one with a few seconds' delay, all is good.

How can I make sure all nodes join the cluster within 10 seconds when they
are started at the same time?

Some parameters:

TCP.bind_addr:ip address of each node

TCP.bind_port=3388

TCPPING.portRange=0

GMS.max_join_attempts=2

GMS.join_timeout=5000

The others are left at their defaults.

When I increase TCPPING.timeout to 10s and GMS.join_timeout to 13s, the
startup time of JChannel increases to 20-30s too, but fewer MergeViews are
accepted.

Why can't 9 nodes join the cluster within 10 seconds?


Scott Prater | 5 Mar 16:34 2014

Re: [jgroups-users] Problem adding more than two nodes to a TCP cluster

Thanks, Bela. I did drop the firewall, as one of the first troubleshooting steps; unfortunately, that
hasn't made any difference. I'll try the return_entire_cache setting, just to see what happens.

FWIW, I'm running this app on a cluster of VMware virtual machines running RHEL6, managed by the ESXi
hypervisor. I have seen earlier posts on this list indicating that others may have had similar
problems in similar environments. Does this ring any bells for anyone? Are there any gotchas you (or
anyone) are aware of, running a JGroups cluster on a suite of VMware virtual machines installed with RHEL6?

-- Scott

On 03/05/14, Bela Ban wrote:
> Hi Scott,
> 
> first of all, if you can use IP multicasting, I recommend UDP over TCP, 
> especially for large clusters.
> 
> If there's a firewall, disable it to see if this solves the problem. If 
> so, you can always punch holes in it.
> 
> Your config looks ok. The warning below occurs when the cache of a node 
> is missing the physical-to-logical address mapping for a node P. It then 
> asks for it just-in-time, so the next message send should succeed.
> 
> You could try setting TCPPING.return_entire_cache to true, but a config 
> like yours is widely used, and usually works, so I suspect a 
> network/firewall issue...
> 
> On 05/03/14 00:14, Scott Prater wrote:
> > I'm a new JGroups user, trying to get an existing application, built on
> > top of Infinispan, working in a clustered (replication) configuration.
> > I have five nodes, all on VMs in a private network. I can start the
> > first node up okay, and then the second node, but when I try to start
> > the third node, fourth node, fifth node, these later nodes report this
> > warning:
> >
> > WARN 16:33:37.326 (TCP) JGRP000032: Node3-5858: no physical address for
> > 41c900b2-0a8c-6873-1d3b-0594577d57cd, dropping message
> >
> > (substitute Node3 for Node4, etc.)
> >
> > After several of these, Infinispan shuts down with this exception:
> >
> > Caused by: org.infinispan.CacheException: Initial state transfer timed
> > out for cache MyRepository on Node3-5858
> > at
> > org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:221)
> >
> > and my application shuts down.
> >
> > Here's the config I am using for all the nodes:
> >
> > https://gist.github.com/sprater/9357672
> >
> > (I substitute "MyNodeAddress" for the node's hostname in each node's
> > config file.)
> >
> > I've tried using both UDP and TCP for discovery, with the same results,
> > leading me to think it might be a lower-level problem. I finally fell
> > back to the sample TCP configuration shipped in the jgroups jar.
> >
> > It doesn't matter in what order I start the individual nodes; the first
> > two see each other just fine, but the rest fail.
> >
> > I hope I'm just missing something simple. Any help would be greatly
> > appreciated.
> >
> > -- Scott
> >
> 
> -- 
> Bela Ban, JGroups lead (http://www.jgroups.org)

-- 
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison

Scott Prater | 5 Mar 00:14 2014

[jgroups-users] Problem adding more than two nodes to a TCP cluster

I'm a new JGroups user, trying to get an existing application, built on 
top of Infinispan, working in a clustered (replication) configuration. 
I have five nodes, all on VMs in a private network.  I can start the 
first node up okay, and then the second node, but when I try to start 
the third node, fourth node, fifth node, these later nodes report this 
warning:

WARN 16:33:37.326 (TCP) JGRP000032: Node3-5858: no physical address for 
41c900b2-0a8c-6873-1d3b-0594577d57cd, dropping message

(substitute Node3 for Node4, etc.)

After several of these, Infinispan shuts down with this exception:

Caused by: org.infinispan.CacheException: Initial state transfer timed 
out for cache MyRepository on Node3-5858
         at 
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:221)

and my application shuts down.

Here's the config I am using for all the nodes:

https://gist.github.com/sprater/9357672

(I substitute "MyNodeAddress" for the node's hostname in each node's 
config file.)

I've tried using both UDP and TCP for discovery, with the same results, 
leading me to think it might be a lower-level problem.  I finally fell 
back to the sample TCP configuration shipped in the jgroups jar.

It doesn't matter in what order I start the individual nodes; the first 
two see each other just fine, but the rest fail.

I hope I'm just missing something simple.  Any help would be greatly 
appreciated.

-- Scott

-- 
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison

Bram Klein Gunnewiek | 27 Feb 13:46 2014

[jgroups-users] Cant recover from ifdown; ifup when node uses a linux bridge

We use JGroups on a set-up where all nodes use network cards that are 
configured as a bridge instead of a normal network card. We use Ubuntu 
as our OS; a typical Ubuntu config for a NIC is this:

auto eth0
iface eth0 inet dhcp

However, we use eth0 as a bridge and ours looks like this:

auto br0
iface br0 inet dhcp
     bridge_ports eth0
     bridge_stp off
     bridge_fd 0

The problem we have is that our application (using JGroups 3.4.1.Final) 
can't recover when we put the bridge down and up again (ifdown 
br0; ifup br0). JGroups does not reestablish the connection to the 
cluster, and the node never gets merged back into it. When we do the 
exact same thing on a node with our network devices configured 
normally, JGroups does recover. Since JGroups seems to "eat" the 
exception, we also have no way of knowing that JGroups is in some sort of 
zombie state and needs to reconnect.
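
Lacking a callback, one workaround we could imagine is a watchdog. This is only a sketch under our own assumptions (a node alone in its view for more than 60s is considered broken, which only fits clusters that are never legitimately size 1), not a JGroups-provided recipe:

import org.jgroups.JChannel;

public class ReconnectWatchdog implements Runnable {
    private final JChannel ch;
    private final String cluster;
    private long aloneSince = -1;

    public ReconnectWatchdog(JChannel ch, String cluster) {
        this.ch = ch;
        this.cluster = cluster;
    }

    // schedule periodically, e.g. every 10 seconds
    public void run() {
        try {
            boolean alone = ch.isConnected()
                && ch.getView() != null && ch.getView().size() == 1;
            if (!alone) { aloneSince = -1; return; }
            if (aloneSince < 0)
                aloneSince = System.currentTimeMillis();
            else if (System.currentTimeMillis() - aloneSince > 60_000) {
                ch.disconnect();      // drop the stale state
                ch.connect(cluster);  // rejoin, re-running discovery
                aloneSince = -1;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}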

Here is a post about someone with the same thing: 
http://sourceforge.net/p/javagroups/mailman/message/9724641/

How do we need to handle these situations? Is this behavior caused by 
the Linux drivers? I don't really understand why having the interfaces 
configured as a bridge causes different behavior. The output we get 
after doing ifdown/ifup on the bridge looks like this (and continues 
until you stop the application):

13:03:48.978 [TransferQueueBundler,test,bram-ubuntuvm-2-10803] ERROR 
org.jgroups.protocols.UDP - JGRP000029: bram-ubuntuvm-2-10803: failed 
sending message to cluster (50 bytes): java.lang.Exception: 
dest=/ff0e:0:0:0:0:8:8:8:7600 (53 bytes), headers: NAKACK2: [MSG, 
seqno=47], UDP: [channel_name=test]

(I don't know why the destination is an IPv6 address. The address 
JGroups was previously bound to was an internal IPv4 address.)

sergez | 27 Feb 11:25 2014

[jgroups-users] JGroups communication in LAN

Hi! I've run into a problem.

I can launch the example from the Tutorial, and it is possible to test how a
bunch of nodes do their communication.

But when I try to launch SimpleChat on two machines separated over the LAN,
nothing happens.

So, the question is: how do I program nodes separated over the LAN? Is the
SimpleChat example from the Tutorial supposed to run over the LAN without any
changes being made, or not? If the latter, then it is not obvious from the
Manual how to program such things.

Thanks!


pooja khambhayata | 24 Feb 18:17 2014

[jgroups-users] Configure TCP.sock_conn_timeout programatically

While I was looking into the properties configuration, I realized there is no setter for sock_conn_timeout. I tried to find the setter in both TP and TCP. Is it not possible to set sock_conn_timeout programmatically? If anyone has tried this before by other means, please let me know.
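
One possibility, as a sketch assuming JGroups 3.x: protocols expose a generic setValue(String, Object) that writes @Property-annotated fields by name, and sock_conn_timeout is such a property on the TCP transport (verify both against your version):

import org.jgroups.JChannel;
import org.jgroups.protocols.TCP;

public class SetSockConnTimeout {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("tcp.xml"); // any TCP-based config
        TCP tcp = (TCP) ch.getProtocolStack().findProtocol(TCP.class);
        if (tcp != null)
            tcp.setValue("sock_conn_timeout", 2000); // ms; set before connect()
        ch.connect("demo-cluster");
    }
}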

Best,
Pooja
