Bill Hastings | 3 Jul 16:56 2015

Bootstrap code

Hi All

Can someone please point me to where the code for bootstrapping a new node exists?

--
Cheers
Bill
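
If it helps as a starting point (hedged; this is from memory of the 2.x source tree, so please verify against your checkout), the bootstrap path is roughly:

// org.apache.cassandra.dht.BootStrapper          -- token selection and kicking off the streaming of ranges to the new node
// org.apache.cassandra.service.StorageService    -- joinTokenRing()/bootstrap() drive the process when a node starts with auto_bootstrap enabled
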
John Wong | 3 Jul 04:37 2015

What are the problems with schema disagreement?

Hi.

Here is a schema disagreement we encountered.
Schema versions:
        b6467059-5897-3cc1-9ee2-73f31841b0b0: [10.0.1.100, 10.0.1.109]
        c8971b2d-0949-3584-aa87-0050a4149bbd: [10.0.1.55, 10.0.1.16, 10.0.1.77]
        c733920b-2a31-30f0-bca1-45a8c9130a2c: [10.0.1.221]

We deployed an application that sends a schema update (DDL=auto), and found this prod cluster had three different schema versions. Other existing applications were fine, so some people wondered what would happen if we left the problem alone until off hours.

Are there any concerns with not resolving the schema disagreement right away? FWIW, we went ahead and restarted 221 first, and continued with the rest of the minority nodes.

Thanks.

John
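
One way to guard against this from whichever application issues the DDL is to wait for schema agreement before doing anything else. A minimal sketch, assuming the DataStax Java driver 2.1.x (which exposes Metadata.checkSchemaAgreement()); the contact point and the DDL statement below are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SchemaAgreementCheck {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.1.55").build();  // any live node
        Session session = cluster.connect();

        // Placeholder DDL; in practice this is whatever the deploy issues.
        session.execute("CREATE TABLE IF NOT EXISTS myks.example (id int PRIMARY KEY, v text)");

        // Poll until every reachable node reports the same schema version, or give up after 30s.
        long deadline = System.currentTimeMillis() + 30000;
        boolean agreed = cluster.getMetadata().checkSchemaAgreement();
        while (!agreed && System.currentTimeMillis() < deadline) {
            Thread.sleep(1000);
            agreed = cluster.getMetadata().checkSchemaAgreement();
        }
        if (!agreed) {
            System.err.println("Schema still in disagreement; hold further DDL until it settles.");
        }
        cluster.close();
    }
}

On the server side, nodetool describecluster shows the same per-node schema versions, which is the quickest way to see which nodes are lagging.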

KZ Win | 3 Jul 02:01 2015

Joining a node caused load on some existing nodes to skyrocket

We have a six-node cluster, and when we attempted to join a new node to it, CPU load on two existing nodes gradually climbed to abnormally high levels. Stopping the join and shutting down Cassandra on the two high-load nodes restored cluster health (we have RF=3).

Does anyone have any insight into this Cassandra behavior? We have done node joins many times before; the most recent was just 4 days earlier.

The following unusual messages appeared in the relevant time period on the two nodes. We are using Cassandra 2.0.10.


Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR [GossipStage:1] CassandraDaemon.java (line 199) Exception in thread Thread[GossipStage:1,5,main]

Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException 

Jun 30 16:47:30 cass-22.pelotime.com     at org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301)

Jun 30 16:47:30 cass-22.pelotime.com     at org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)

Jun 30 16:47:30 cass-22.pelotime.com     at org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)

Jun 30 16:47:30 cass-22.pelotime.com     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)

Jun 30 16:47:30 cass-22.pelotime.com     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

Jun 30 16:47:30 cass-22.pelotime.com     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

Jun 30 16:47:30 cass-22.pelotime.com     at java.lang.Thread.run(Unknown Source)

Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO [GossipStage:2] Gossiper.java (line 910) Node /10.0.251.77 is now part of the cluster

Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with /10.0.251.77

Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [RequestResponseStage:138] Gossiper.java (line 876) InetAddress /10.0.251.77 is now UP

Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO [GossipStage:2] Gossiper.java (line 890) InetAddress /10.0.251.77 is now DOWN

Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with /10.0.251.77

Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [GossipTasks:1] Gossiper.java (line 658) FatClient /10.0.251.77 has been silent for 30000ms, removing from gossip

Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve




Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with /10.0.251.77

Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [RequestResponseStage:26] Gossiper.java (line 876) InetAddress /10.0.251.77 is now UP

Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with /10.0.251.77

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/10.0.251.77] StreamSession.java (line 454) [Stream #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/10.0.251.77] StreamSession.java (line 454) [Stream #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.SocketDispatcher.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.SocketChannelImpl.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/10.0.251.77] StreamSession.java (line 454) [Stream #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.SocketDispatcher.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.SocketChannelImpl.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/10.0.251.77] StreamSession.java (line 454) [Stream #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)



Jason Wee | 2 Jul 15:42 2015

Re: Experiencing Timeouts on one node

You should check the network connectivity for this node and also its average system load. Is that a typo or literally what it is, Cassandra 1.2.15.*1* and Java 6 update *85*?



On Thu, Jul 2, 2015 at 12:59 AM, Shashi Yachavaram <shashi007 <at> gmail.com> wrote:
We have a 28-node cluster, out of which only one node is experiencing timeouts.
We thought it was the RAID, but there are two other nodes on the same RAID without
any problem. The problem also goes away if we reboot the node, and then reappears
after seven days. The following hinted handoff timeouts are seen on the node
experiencing the timeouts. We did not notice any gossip errors.

I was wondering if anyone has seen this issue and how they resolved it.

Cassandra Version: 1.2.15.1
OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_85"

------------------------------------------------------------------------------------------------------------------------------------
INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
 INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
 INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
 INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
 INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
 INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
 INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
 INFO [HintedHandoff:2] 2015-06-17 22:52:27,144 HintedHandOffManager.java (line 296) Started hinted handoff for host: 6a2fa431-4a51-44cb-af19-1991c960e075 with IP: /192.168.1.117
 INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
 INFO [HintedHandoff:1] 2015-06-17 22:52:27,154 HintedHandOffManager.java (line 296) Started hinted handoff for host: cf03174a-533c-44d6-a679-e70090ad2bc5 with IP: /192.168.1.107
------------------------------------------------------------------------------------------------------------------------------------

Thanks
-shashi..

Serega Sheypak | 2 Jul 09:59 2015

Running java-driver in parallel (cassandra-driver-core 2.1.5): multithreading works extremely slowly.

Hi, I have weird driver behaviour. Can you please help me find the problem?
Problem: I try to insert data using 10 threads.
I see that the 10 threads start, they begin to insert some data, and then they hang. It takes an enormous amount of time to insert (several seconds for 1K inserts). It runs at 1K inserts per second if I use a single thread.
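
For what it's worth, the usual way to get insert throughput with the Java driver is not more blocking threads but asynchronous execution with a bounded number of in-flight requests. A minimal sketch, assuming driver 2.1.x with its bundled Guava; the keyspace, table, and column names are placeholders:

import java.util.concurrent.Semaphore;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class AsyncInsertSketch {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("myks");                              // placeholder keyspace

        PreparedStatement insert =
            session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)");  // placeholder table

        // Cap the number of concurrent requests instead of adding threads.
        final Semaphore inFlight = new Semaphore(256);

        for (int i = 0; i < 100000; i++) {
            inFlight.acquire();
            BoundStatement bound = insert.bind(i, "payload-" + i);
            ResultSetFuture future = session.executeAsync(bound);
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t)  { inFlight.release(); t.printStackTrace(); }
            });
        }

        inFlight.acquire(256);   // wait for the tail of outstanding requests to drain
        cluster.close();
    }
}

It is also worth confirming that a single Cluster/Session is shared across all ten threads rather than one per thread; Session is thread-safe and is meant to be shared.
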
amit tewari | 2 Jul 08:32 2015

C*/Nodejs REST API integration

Hi

I am using a REST API to insert data into C*, following the reference link below.


However, the insert performance is extremely poor.

Does anyone have experience with the Node.js driver for Cassandra, and with using async for better performance when inserting data through a Node.js REST API?

Thanks
Amit
Robert Wille | 1 Jul 23:52 2015

Truncate really slow

I have two test clusters, both 2.0.15. One has a single node and one has three nodes. Truncate on the three-node
cluster is really slow, but is quite fast on the single-node cluster. My test cases truncate tables
before each test, and > 95% of the time in my test cases is spent truncating tables on the three-node cluster.
Auto-snapshotting is off.

I know there’s some coordination that has to occur when a truncate happens, but it seems really
excessive. Almost one second to truncate each table with an otherwise idle cluster.

Any thoughts?

Thanks in advance

Robert

Kevin Burton | 1 Jul 23:22 2015

Lots of write timeouts and missing data during decommission/bootstrap

We get lots of timeouts when we decommission a node.  About 80% of them are write timeouts and about 20% are read timeouts.

We’ve tried adjusting stream throughput (and compaction throughput, for that matter) and that doesn’t resolve the issue.

We’ve increased write_request_timeout_in_ms … and read timeout as well.

Is there anything else I should be looking at?

I can’t seem to find the documentation that explains what the heck is happening.

--

Founder/CEO Spinn3r.com
Location: San Francisco, CA
… or check out my Google+ profile
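
Not a fix for the server side, but while a node is streaming out it can help to make the client tolerate transient coordinator timeouts. A minimal sketch, assuming the DataStax Java driver (this helper is an illustration, not part of the driver):

import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.ReadTimeoutException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class TimeoutRetry {
    // Re-execute a statement a few times with linear backoff when the
    // coordinator reports a read or write timeout.
    public static void executeWithRetry(Session session, Statement stmt, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                session.execute(stmt);
                return;
            } catch (WriteTimeoutException | ReadTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e;              // give up and surface the timeout
                }
                Thread.sleep(100L * attempt);
            }
        }
    }
}

Only do this for idempotent statements; a write timeout does not mean the write failed, so retrying a non-idempotent statement can apply it twice.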

Shashi Yachavaram | 1 Jul 18:59 2015

Experiencing Timeouts on one node

We have a 28-node cluster, out of which only one node is experiencing timeouts.
We thought it was the RAID, but there are two other nodes on the same RAID without
any problem. The problem also goes away if we reboot the node, and then reappears
after seven days. The following hinted handoff timeouts are seen on the node
experiencing the timeouts. We did not notice any gossip errors.

I was wondering if anyone has seen this issue and how they resolved it.

Cassandra Version: 1.2.15.1
OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_85"

------------------------------------------------------------------------------------------------------------------------------------
INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
 INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
 INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
 INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
 INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
 INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
 INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
 INFO [HintedHandoff:2] 2015-06-17 22:52:27,144 HintedHandOffManager.java (line 296) Started hinted handoff for host: 6a2fa431-4a51-44cb-af19-1991c960e075 with IP: /192.168.1.117
 INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
 INFO [HintedHandoff:1] 2015-06-17 22:52:27,154 HintedHandOffManager.java (line 296) Started hinted handoff for host: cf03174a-533c-44d6-a679-e70090ad2bc5 with IP: /192.168.1.107
------------------------------------------------------------------------------------------------------------------------------------

Thanks
-shashi..
Jayapandian Ponraj | 1 Jul 16:02 2015

High load on cassandra node

Hi, I have a 6-node cluster. I ran a major compaction on node 1, but found that the load reached very high levels on node 2. Is this explainable?

Attaching tpstats and metrics:

cassandra-2 ~]$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      185152938         0                 0
ReadStage                         0         0        1111490         0                 0
RequestResponseStage              0         0      168660091         0                 0
ReadRepairStage                   0         0          21247         0                 0
ReplicateOnWriteStage            32      6186       88699535         0              7163
MiscStage                         0         0              0         0                 0
HintedHandoff                     0         1           1090         0                 0
FlushWriter                       0         0           2059         0                13
MemoryMeter                       0         0           3922         0                 0
GossipStage                       0         0        2246873         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
CompactionExecutor                0         0          12353         0                 0
ValidationExecutor                0         0              0         0                 0
MigrationStage                    0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
PendingRangeCalculator            0         0             16         0                 0
MemtablePostFlusher               0         0          10932         0                 0

Message type           Dropped
READ                     49051
RANGE_SLICE                  0
_TRACE                       0
MUTATION                   269
COUNTER_MUTATION           185
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

Also, I saw that NativeTransportRequests had 23 active and 23 pending; I found this in OpsCenter.

Are there any settings I can change to keep the load under control?
I appreciate any help. Thanks.

Amlan Roy | 1 Jul 12:45 2015

Seed gossip version error

Hi,

I have a cluster running version 2.1.7. Two of the machines went down and are not joining the cluster even after a restart. I see the following WARN message in system.log on all the nodes:
system.log:WARN  [MessagingService-Outgoing-cassandra2.cleartrip.com/172.18.3.32] 2015-07-01 13:00:41,878 OutboundTcpConnection.java:414 - Seed gossip version is -2147483648; will not connect with that version

Please let me know if you have faced the same problem.

Regards,
Amlan


