Jens Rantil | 23 Oct 08:37 2014

Empty cqlsh cells vs. null


I'm not sure whether this is a DataStax-specific question that should be asked elsewhere. If so, let me know.

Anyway, I have populated a Cassandra table from DSE Hive. When I fire up cqlsh and execute a SELECT against the table, I have columns of INT type that appear empty. At first I thought these were null, but cqlsh explicitly writes "null" in cells that are null, so the empty cells seem to be something else. What can I make of this? A bug in Hive's serialization to Cassandra?
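For context, Cassandra's wire format does distinguish a true null from a zero-length value: for a non-text type like INT, a column can hold an empty byte buffer, which cqlsh renders as an empty cell, while a real null is printed as "null". The toy decoder below is only an illustration of that distinction (it is not the actual driver code):

```python
import struct

def decode_int_cell(raw):
    """Toy decoder for a Cassandra INT column value (illustration only).

    None     -> a true null; cqlsh prints "null"
    b""      -> a zero-length value; cqlsh prints an empty cell
    4 bytes  -> the actual big-endian integer
    """
    if raw is None:
        return None              # null: the cell was never set or was deleted
    if raw == b"":
        return ""                # empty value: present, but zero-length
    return struct.unpack(">i", raw)[0]

print(decode_int_cell(None))                   # the null case
print(decode_int_cell(b""))                    # the empty-cell case
print(decode_int_cell(struct.pack(">i", 42)))  # a real value: 42
```

So an empty cell and a "null" cell in cqlsh output genuinely mean different things on the wire.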


Sent from Mailbox
Jeremiah Anderson | 23 Oct 02:33 2014

Cassandra Developer - Choice Hotels



I am hoping to get the word out that we are looking for a Cassandra Developer for a full-time position at our office in Scottsdale, AZ. Please let me know what I can do to let folks know we are looking!


Thank you!!  



Jeremiah Anderson | Sr. Recruiter

Choice Hotels International, Inc. (NYSE: CHH) |
6811 E Mayo Blvd, Ste 100, Phoenix, AZ 85054

Phone: 602.494.6648 | Email: jeremiah_anderson <at>


Jens Rantil | 22 Oct 20:14 2014

Re: Increasing size of "Batch of prepared statements"


Apologies for the late answer.

On Mon, Oct 6, 2014 at 2:38 PM, shahab <shahab.mokari <at>> wrote:
But do you mean that inserting columns with large size (let's say a text with 20-30 K) is potentially problematic in Cassandra?

AFAIK, the size _warning_ you are getting relates to the size of the batch of prepared statements (INSERT INTO mykeyspace.mytable VALUES (?,?,?,?)). That is, it has nothing to do with the actual content of your row. 20-30 K shouldn't be a problem. But it's considered good practice to split larger files (maybe > 5 MB) into chunks, since that makes operations easier on your cluster and the data more likely to spread evenly across the cluster.
What shall i do if I want columns with large size?

Just don't insert too many rows in a single batch and you should be fine. Like Shane's JIRA ticket said, the warning is to let you know you are not following best practice when adding too many rows in a single batch. It can create bottlenecks on a single Cassandra node.
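To make "not too many rows per batch" concrete, here is a minimal client-side sketch of chunking rows before issuing batches. The chunk size of 100 and the row tuples are arbitrary assumptions for illustration, not a recommended setting:

```python
def chunks(rows, batch_size=100):
    """Yield successive slices of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# 1050 dummy rows standing in for whatever you are inserting
rows = [(n, "payload-%d" % n) for n in range(1050)]

batches = list(chunks(rows, batch_size=100))
# Each batch would then be bound against the prepared statement and
# executed on its own, keeping every batch well under the size warning.
print(len(batches))       # 11 batches: 10 full ones plus a tail of 50
print(len(batches[-1]))   # 50
```

The same idea applies regardless of driver: the point is simply to bound the number of rows per executed batch.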


Jens Rantil
Backend engineer
Tink AB

Phone: +46 708 84 18 32

Jeremy Franzen | 22 Oct 19:25 2014

Copy Error

Hey folks,

I am sure that this is a simple oversight on my part, but I just cannot see the forest for the trees. Any ideas on this one?

copy strevus_data.strevus_metadata_data to 'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
Bad Request: Undefined name 0008000000000000000000000800000000000000000000100000000000000000000000000000000000 in selection clause

Jeremy J. Franzen
VP Operations | Strevus
Jeremy.franzen <at> 
T: +1.415.649.6234 | M: +1.408.726.4363
Compliance Made Easy.
... . -- .--. . .-. / ..-. ..

Donald Smith | 22 Oct 18:18 2014

Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

Question about the read path in cassandra.  If a partition/row is in the Memtable and is being actively written to by other clients,  will a READ of that partition also have to hit SStables on disk (or in the page cache)?  Or can it be serviced entirely from the Memtable?


If you select all columns (e.g., “select * from ….”)   then I can imagine that cassandra would need to merge whatever columns are in the Memtable with what’s in SStables on disk.


But if you select a single column (e.g., “select Name from ….  where id= ….”) and if that column is in the Memtable, I’d hope cassandra could skip checking the disk.  Can it do this optimization?


Thanks, Don


Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
donalds <at>


Jens Rantil | 22 Oct 16:05 2014

Cluster/node with inconsistent schema


I have a table that I dropped, recreated with two clustering primary keys (only had a single partition key before), and loaded previous data into the table.

I started noticing that a single node of mine was not able to do `ORDER BY` executions on the table (while the other nodes were). What was interesting was that `DESCRIBE TABLE mytable` showed correct PRIMARY KEY, and schema version was the same on all machines when I looked at system.peers as well as system.local.

On the failing node I was seeing exceptions such as

I restarted the failing node in the belief that maybe it would force the gossip to get into a consistent state. Now I am, instead, getting an RPC timeout when trying to SELECT against the table, while the logs are giving me

Any input appreciated. Would you suggest I drain the node, clear all sstables (rm -fr /var/lib/cassandra/mykeyspace/mytable/*), boot up Cassandra and run a full repair?


———
Jens Rantil
Backend engineer
Tink AB
Email: jens.rantil <at>
Phone: +46 708 84 18 32
Web: Facebook Linkedin Twitter
Juho Mäkinen | 22 Oct 14:39 2014

Question on how to run incremental repairs

I'm having problems understanding how incremental repairs are supposed to be run.

If I try to run "nodetool repair -inc", Cassandra will complain that "It is not possible to mix sequential repair and incremental repairs". However, it seems that running "nodetool repair -inc -par" does the job, but I can't be sure whether this is the correct (and only?) way to run incremental repairs.

Previously I ran repairs with "nodetool repair -pr" on each node at a time, so that I could minimise the performance hit. I've understood that doing a single "nodetool repair -inc -par" command runs it on all machines in the entire cluster, so doesn't that cause a big performance penalty? Can I run incremental repairs on one node at a time?

If running "nodetool repair -inc -par" every night on a single node is fine, should I still spread them out so that each node takes a turn executing this command each night?

Last question is a bit deeper: what I've understood is that incremental repairs don't repair SSTables which have already been repaired, but doesn't this mean that these repaired SSTables can't be checked for missing or incorrect data?

Thomas Whiteway | 22 Oct 13:34 2014

Performance Issue: Keeping rows in memory



I’m working on an application using a Cassandra (2.1.0) cluster where

-          our entire dataset is around 22GB

-          each node has 48GB of memory but only a single (mechanical) hard disk

-          in normal operation we have a low level of writes and no reads

-          very occasionally we need to read rows very fast (>1.5K rows/second), and only read each row once.


When we try to read the rows it takes up to five minutes before Cassandra is able to keep up.  The problem seems to be that it takes a while to get the data into the page cache, and until then Cassandra can’t retrieve the data from disk fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down for the next 5 minutes).


Given that the total amount of data should fit comfortably in memory, I’ve been trying to find a way to keep the rows cached in memory, but there doesn’t seem to be a particularly great way to achieve this.


I’ve tried enabling the row cache and pre-populating the test by querying every row before starting the load which gives good performance, but the row cache isn’t really intended to be used this way and we’d be fighting the row cache to keep the rows in (e.g. by cyclically reading through all the rows during normal operation).


Keeping the page cache warm by running a background task to keep accessing the files for the sstables would be simpler and currently this is the solution we’re leaning towards, but we have less control over the page cache, it would be vulnerable to other processes knocking Cassandra’s files out, and it generally feels like a bit of a hack. 
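For what it's worth, a background warmer like the one described above can be sketched in a few lines: it just reads every sstable file sequentially so the kernel keeps those pages resident. The data directory path is an assumption, and this is a sketch of the hack, not a recommendation:

```python
import os

def warm_page_cache(data_dir, chunk_size=1 << 20):
    """Sequentially read every file under data_dir (1 MiB at a time) so the
    OS pulls it into the page cache. Returns the total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total

# e.g. run periodically from cron or a background thread
# (path is an assumption based on a default install):
# warm_page_cache("/var/lib/cassandra/data/mykeyspace")
```

As noted, this offers no guarantee that other processes won't evict the pages again; it only re-reads them on each pass.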


Has anyone had any success with trying to do something similar to this or have any suggestions for possible solutions?





Fredrik | 22 Oct 09:42 2014

Does Cassandra support running on Java 8?

Are there any official recommendations, or validations/tests done, with 
Cassandra >= 2.0 on Java 8?


Jimmy Lin | 21 Oct 05:24 2014

frequently update/read table and level compaction

I have a column family/table that has frequent updates on one of the columns, and one column that has infrequent updates. The rest of the columns never change. Our application also reads frequently from this table.

We have seen some read latency issues on this table and plan to switch it to leveled compaction. A few questions:

In the Cassandra server's JMX, there is a "readcount" attribute; what counts as a read? Does accessing a row increment the read count?

For the JMX "read latency", does it include consistency-level work (fetching data from other nodes) or coordinator-related workload?

From the doc, leveled compaction guarantees that all sstables in the same level are "non-overlapping"; what does that really mean? (I'm trying to visualize how this can reduce read latency.)

If I change my select CQL query not to include the frequently changed column, will that improve read latency?

How significant is the improvement when changing compaction from size-tiered to leveled? Is it a night-and-day difference? And is it true that while size-tiered compaction read latency will get worse over time, leveled compaction can give very consistent read latency for a long, long time?


Vishanth Balasubramaniam | 20 Oct 19:40 2014

Looking for a cassandra web interface


I am very new to Cassandra. I have started Cassandra inside an instance in my VM and I want to expose a Cassandra web interface. What is the most stable web interface for Cassandra, with a proper setup guide?

Thanks and Regards,