Aravindan T | 23 Oct 15:38 2014

Cassandra Node Commissioning



We are facing several problems while commissioning new nodes into our existing cluster. The existing cluster (5 nodes) holds 13 TB of data, and 0.1 TB of new data is loaded daily. Ten days ago we started adding 5 more nodes. Partway through commissioning, the bootstrap process has failed many times, showing a STREAM FAILED error on the new node and BROKEN PIPE on the old node. Whenever this happens, we restart the new node. Here are a few questions about the node-joining process.


1) Whenever we restart a joining node, does the bootstrap process resume or restart from the beginning? If it restarts, should we wipe the data directories and do a fresh start?

2) How long might the node-joining process take (network bandwidth: 1 Gbps)?

3) Can we add a node directly by setting auto_bootstrap to false and then running nodetool repair on that node?

4) How can we monitor the progress of load balancing on each node?

5) Can we increase the streaming speed with nodetool setstreamthroughput? What is the difference between the stream_throughput_outbound_megabits_per_sec property in cassandra.yaml and nodetool setstreamthroughput?

6) Can we scp some data files from the old nodes to the new nodes and then restart?
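As a rough sanity check for question 2, a lower bound on streaming time can be estimated from the data each new node must receive and the streaming cap. The figures below are assumptions for illustration: after doubling from 5 to 10 nodes, each new node would own ~1.3 TB (13 TB / 10), and streaming defaults to Cassandra's stream_throughput_outbound_megabits_per_sec of 200 Mbit/s rather than the full 1 Gbps link.

```shell
#!/bin/sh
# Back-of-the-envelope lower bound on bootstrap time for one new node.
# Assumptions (adjust to your cluster): ~1.3 TB streamed in per new
# node, streaming capped at the default 200 Mbit/s.
DATA_GB=1300
THROUGHPUT_MBIT=200
SECONDS_NEEDED=$((DATA_GB * 8000 / THROUGHPUT_MBIT))  # 1 GB = 8000 Mbit
echo "At ${THROUGHPUT_MBIT} Mbit/s, ${DATA_GB} GB takes at least $((SECONDS_NEEDED / 3600)) hours"
```

Real bootstraps take longer than this bound (compaction, disk I/O, and other traffic all compete), but it shows why raising the throughput cap matters on a 1 Gbps network.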

Aravindan Thangavelu


Alain RODRIGUEZ | 23 Oct 11:18 2014

Operating on large cluster


I was wondering how you all handle a large cluster (50+ machines).

I mean, there are times you need to change configuration (cassandra.yaml) or send a command to one, some, or all nodes (cleanup, upgradesstables, setstreamthroughput, or whatever).

So far we have been using custom scripts for repairs and other routine maintenance, and cssh for specific one-shot actions on the cluster. But I guess this doesn't really scale; we could use pssh instead. For configuration changes we use Capistrano, which might scale properly.

So I would like to know: what methods do operators use on large clusters out there? Have some of you built open-sourced "cluster management" interfaces or scripts that could make operating large Cassandra clusters easier?
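The cssh/pssh approach can be sketched with nothing but a loop; the host names and the command below are placeholders, shown as a dry run (the echoed lines are what a fan-out tool would execute):

```shell
#!/bin/sh
# Dry-run fan-out: print the per-node invocation a tool like pssh
# would run.  HOSTS and CMD are illustrative placeholders.
HOSTS="cass01 cass02 cass03"
CMD="nodetool setstreamthroughput 400"
for h in $HOSTS; do
    # replace 'echo' with: ssh "$h" "$CMD" &   (then 'wait' after the loop)
    echo "ssh $h $CMD"
done
```

At 50+ nodes, an inventory-driven tool (Ansible, pssh with a hosts file, etc.) does the same thing with retries and output collection, which is where bare loops stop scaling.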

Alain RODRIGUEZ | 23 Oct 11:16 2014

Multi Datacenter / MultiRegion on AWS Best practice ?


We are currently wondering about the best way to configure the network architecture for a multi-DC Cassandra cluster.

Reading previous messages on this mailing list, I see 2 main ways to do this:

1 - Two private VPCs, joined by a VPN tunnel linking the 2 regions; C* using EC2Snitch (or PropertyFileSnitch) and private IPs.
2 - Two public VPCs; C* using EC2MultiRegionSnitch (and so public IPs for seeds and broadcast, private IPs for the listen address).

With solution 1 we are not confident about the VPN tunnel's stability and performance; the rest should work just fine.

With solution 2, we would need to open IPs one by one on at least 3 ports (7000, 9042, 9160). 100 entries in a security group would allow us a maximum of ~30 nodes. Another issue is that a ring describe (using Astyanax, let's say) would also give clients the public IPs; our clients, which are also inside the VPC, would have to go out to the internet before coming back into the VPC, creating unnecessary latency.
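The ~30-node figure follows directly from the rule budget: one security-group entry per (node IP, port) pair. A quick check of the arithmetic (the 100-rule cap is the AWS per-group limit mentioned above):

```shell
#!/bin/sh
# Sanity-check the cluster-size bound implied by the security group:
# one rule per node per port, capped at 100 rules per group.
RULE_LIMIT=100
PORTS=3        # 7000 (internode), 9042 (native protocol), 9160 (thrift)
MAX_NODES=$((RULE_LIMIT / PORTS))
echo "Max nodes per security group: ${MAX_NODES}"
```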

What is your advice regarding best practices for a multi-DC (cross-region) cluster inside the AWS cloud?

And by the way, how do you configure Astyanax, when using EC2MultiRegionSnitch (and public IPs for broadcast), to use private IPs instead of public ones?

Jens Rantil | 23 Oct 08:37 2014

Empty cqlsh cells vs. null


Not sure whether this is a DataStax-specific question to be asked elsewhere. If so, let me know.

Anyway, I have populated a Cassandra table from DSE Hive. When I fire up cqlsh and execute a SELECT against the table, I have columns of INT type that are empty. At first I thought these were null, but it turns out that cqlsh explicitly writes "null" in those cells. What should I make of this? A bug in Hive's serialization to Cassandra?


Jeremiah Anderson | 23 Oct 02:33 2014

Cassandra Developer - Choice Hotels



I am hoping to get the word out that we are looking for a Cassandra Developer for a full-time position at our office in Scottsdale, AZ. Please let me know what I can do to let folks know we are looking :)


Thank you!!  



Jeremiah Anderson | Sr. Recruiter

Choice Hotels International, Inc. (NYSE: CHH) |
6811 E Mayo Blvd, Ste 100, Phoenix, AZ 85054

P: 602.494.6648 | E: jeremiah_anderson <at>


Jens Rantil | 22 Oct 20:14 2014

Re: Increasing size of "Batch of prepared statements"


Apologize for the late answer.

On Mon, Oct 6, 2014 at 2:38 PM, shahab <shahab.mokari <at>> wrote:
But do you mean that inserting columns with large size (let's say a text with 20-30 K) is potentially problematic in Cassandra?

AFAIK, the size _warning_ you are getting relates to the size of the batch of prepared statements (INSERT INTO mykeyspace.mytable VALUES (?,?,?,?)). That is, it has nothing to do with the actual content of your row; 20-30 K shouldn't be a problem. But it's considered good practice to split larger payloads (maybe > 5 MB) into chunks, since that makes operations easier for your cluster and the data more likely to spread evenly across it.
What shall i do if I want columns with large size?

Just don't insert too many rows in a single batch and you should be fine. As Shane's JIRA ticket said, the warning is to let you know you are not following best practice when adding too many rows in a single batch, which can create bottlenecks on a single Cassandra node.
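The client-side chunking mentioned above can be sketched with standard tools; the file name and 12 MB size are illustrative, and each resulting chunk would then be inserted as its own row keyed by, say, (blob_id, chunk_number):

```shell
#!/bin/sh
# Sketch: break a large payload into 5 MB pieces before inserting
# each piece as a separate row.  'payload.bin' is an illustrative
# 12 MB dummy file created here just for the demo.
dd if=/dev/zero of=payload.bin bs=1048576 count=12 2>/dev/null
split -b 5M payload.bin chunk_     # -> chunk_aa, chunk_ab, chunk_ac
ls chunk_*
```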


Jens Rantil
Backend engineer
Tink AB

Phone: +46 708 84 18 32

Jeremy Franzen | 22 Oct 19:25 2014

Copy Error

Hey folks,

I am sure this is a simple oversight on my part, but I just cannot see the forest for the trees. Any ideas on this one?

copy strevus_data.strevus_metadata_data to 'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
Bad Request: Undefined name 0008000000000000000000000800000000000000000000100000000000000000000000000000000000 in selection clause

Jeremy J. Franzen
VP Operations | Strevus
Jeremy.franzen <at> 
T: +1.415.649.6234 | M: +1.408.726.4363
Compliance Made Easy.
... . -- .--. . .-. / ..-. ..

Donald Smith | 22 Oct 18:18 2014

Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

Question about the read path in Cassandra: if a partition/row is in the memtable and is being actively written to by other clients, will a READ of that partition also have to hit SSTables on disk (or in the page cache), or can it be serviced entirely from the memtable?


If you select all columns (e.g., “select * from ….”), then I can imagine that Cassandra would need to merge whatever columns are in the memtable with what’s in SSTables on disk.


But if you select a single column (e.g., “select Name from …. where id = ….”) and that column is in the memtable, I’d hope Cassandra could skip checking the disk. Can it do this optimization?


Thanks, Don


Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
donalds <at>


Jens Rantil | 22 Oct 16:05 2014

Cluster/node with inconsistent schema


I have a table that I dropped, recreated with two clustering primary key columns (it only had a single partition key before), and loaded the previous data back into the table.

I started noticing that a single node of mine was not able to execute `ORDER BY` queries on the table (while the other nodes were). What was interesting was that `DESCRIBE TABLE mytable` showed the correct PRIMARY KEY, and the schema version was the same on all machines when I looked at system.peers as well as system.local.

On the failing node I was seeing exceptions such as

I restarted the failing node in the belief that maybe I would force gossip into a consistent state. Now I am, instead, getting an RPC timeout when trying to SELECT against the table, while the logs are giving me

Any input appreciated. Would you suggest I drain the node, clear all SSTables (rm -fr /var/lib/cassandra/mykeyspace/mytable/*), boot up Cassandra, and run a full repair?
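For reference, the sequence described in the question would look like the following, shown here as a dry run (each step is echoed rather than executed; drop the echoes to run it for real — the keyspace/table and path are the ones from the question):

```shell
#!/bin/sh
# Dry run of the proposed node rebuild: drain, stop, wipe only the
# affected table's sstables, start, then repair that table.
echo "nodetool drain"
echo "service cassandra stop"
echo "rm -fr /var/lib/cassandra/mykeyspace/mytable/*"
echo "service cassandra start"
echo "nodetool repair mykeyspace mytable"
```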


———
Jens Rantil
Backend engineer
Tink AB
Email: jens.rantil <at>
Phone: +46 708 84 18 32
Web: Facebook | Linkedin | Twitter
Juho Mäkinen | 22 Oct 14:39 2014

Question on how to run incremental repairs

I'm having problems understanding how incremental repairs are supposed to be run.

If I try to run "nodetool repair -inc", Cassandra complains that "It is not possible to mix sequential repair and incremental repairs". However, running "nodetool repair -inc -par" seems to do the job, though I couldn't be sure whether this is the correct (and only?) way to run incremental repairs.

Previously I ran repairs with "nodetool repair -pr" on one node at a time, so that I could minimise the performance hit. I've understood that a single "nodetool repair -inc -par" command runs on all machines in the entire cluster, so doesn't that cause a big performance penalty? Can I run incremental repairs on one node at a time?

If running "nodetool repair -inc -par" every night on a single node is fine, should I still spread the runs out so that each node takes a turn executing the command each night?
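Spreading the runs out can be sketched with a cron job installed on every node that only fires on that node's turn; NODE_COUNT and the per-node NODE_INDEX below are placeholders, and whether this scheduling is appropriate for incremental repair is exactly the open question above:

```shell
#!/bin/sh
# Sketch: rotate a nightly repair across the cluster.  Run from cron
# on every node; each node is given a distinct NODE_INDEX (0-based)
# and only repairs when the day count lands on its index.
NODE_COUNT=5
DAYS=$(( $(date +%s) / 86400 ))   # whole days since the epoch
TURN=$((DAYS % NODE_COUNT))       # which node's turn it is tonight
echo "tonight is node ${TURN}'s turn"
# if [ "$TURN" -eq "$NODE_INDEX" ]; then nodetool repair -inc -par; fi
```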

One last, slightly deeper question: I've understood that incremental repairs don't re-repair SSTables which have already been repaired, but doesn't this mean that those repaired SSTables can't be checked for missing or incorrect data?

Thomas Whiteway | 22 Oct 13:34 2014

Performance Issue: Keeping rows in memory



I’m working on an application using a Cassandra (2.1.0) cluster where

- our entire dataset is around 22 GB
- each node has 48 GB of memory but only a single (mechanical) hard disk
- in normal operation we have a low level of writes and no reads
- very occasionally we need to read rows very fast (>1.5K rows/second), and we only read each row once


When we try to read the rows, it takes up to five minutes before Cassandra is able to keep up. The problem seems to be that it takes a while for the data to get into the page cache, and until then Cassandra can't retrieve it from disk fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down for the next 5 minutes).


Given that the total amount of data should fit comfortably in memory, I've been trying to find a way to keep the rows cached, but there doesn't seem to be a particularly good way to achieve this.


I’ve tried enabling the row cache and pre-populating it by querying every row before starting the load, which gives good performance; but the row cache isn’t really intended to be used this way, and we’d be fighting it to keep the rows in (e.g. by cyclically reading through all the rows during normal operation).


Keeping the page cache warm by running a background task that keeps accessing the sstable files would be simpler, and currently this is the solution we’re leaning towards; but we have less control over the page cache, it would be vulnerable to other processes evicting Cassandra’s files, and it generally feels like a bit of a hack.
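A minimal version of that background warmer is just re-reading the sstable data files; the data directory below is the stock default and an assumption about your layout, and a tool like vmtouch (if available) can do the same with visibility into what is actually resident:

```shell
#!/bin/sh
# Sketch: keep sstable pages warm by reading every data file and
# discarding the bytes.  Run periodically (e.g. from cron).
# /var/lib/cassandra/data is the default data directory; adjust to
# your cassandra.yaml data_file_directories.
DATA_DIR=/var/lib/cassandra/data
find "$DATA_DIR" -name '*-Data.db' -exec cat {} + > /dev/null
```

This only helps while the files fit in free memory alongside the heap, and as noted above it offers no protection against other processes competing for the cache.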


Has anyone had any success with trying to do something similar to this or have any suggestions for possible solutions?