Andy Stec | 21 Nov 23:17 2014

Partial row read

We're getting strange results when reading from Cassandra 2.0 in php using this driver:

Here's the schema:

CREATE TABLE events (
  day text,
  last_event text,
  event_text text,
  mdn text,
  PRIMARY KEY ((day), last_event)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Here's the database contents:

cqlsh:msa> SELECT * FROM events;
 day    | last_event          | event_text | mdn
 141121 | 2014-46-21 20:46:45 | event text | 8471111111
(1 rows)

Here's the simple program in php that reads the database:

$dsn = "cassandra:host=;port=9160";
$db = new PDO ($dsn);
$db->exec ("USE msa");
$stmt = $db->query ("SELECT day, last_event, mdn, event_text FROM events");
if ($stmt === false) {
   var_dump ($db->errorInfo ());
} else {
   // PDO::query() already executes the statement; no extra execute() call is needed
   var_dump ($stmt->fetchAll ());
}

And this is the output the program produces.  Why is it not returning the full row? 

array(1) {
  array(6) {
    string(6) "141121"
    string(6) "141121"
    string(0) ""
    string(0) ""
    string(0) ""
    string(0) ""

Chris Hornung | 21 Nov 18:44 2014

bootstrapping node stuck in JOINING state


I have been bootstrapping 4 new nodes into an existing production cluster. The nodes were bootstrapped one at a time; the first two completed without errors, but I ran into issues with the third. The fourth node has not been started yet.

On bootstrapping the third node, the data streaming sessions completed without issue, but bootstrapping did not finish. The node is still stuck in the JOINING state 19 hours or so after data streaming completed.

Other reports of this issue seem to be related either to network connectivity problems between nodes, or to multiple nodes bootstrapping simultaneously. I haven't found any evidence of either situation, and there are no errors or stack traces in the logs.

I'm just looking for the safest way to proceed. I'm fine with removing the hanging node altogether; I'd just like confirmation that doing so wouldn't leave the cluster in a bad state, and to know which data points I should look at to gauge the situation.

If removing the node and starting over is OK, is any other maintenance on the existing nodes recommended? I've read of people scrubbing/rebuilding nodes coming out of this situation, but not sure if that's necessary.

Please let me know if any additional info would be helpful.

Chris Hornung

Rajanish GJ | 21 Nov 18:29 2014

max ttl for column

Does Hector or Cassandra impose a limit on the maximum TTL value for a column?

I am trying to insert a record into one of the column families and am seeing the following error.
Cassandra version : 1.1.12 
Hector  : 1.1-4

Any pointers appreciated. 

me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:ttl is too large. requested (951027277) maximum (630720000))
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.model.MutatorImpl.execute( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched( ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update( ~[hector-core-1.1-4.jar:na]

I also tried using CQL, and it seems to hang and never respond; trying again with a few combinations:
INSERT INTO users (key,id) values ('test6','951027277 secs') USING TTL 951027277 ; 
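For reference, the cap in the error message works out to exactly 20 years' worth of 365-day seconds, while the requested TTL is about 30 years. A quick sanity check (plain arithmetic, nothing Cassandra-specific):

```python
MAX_TTL = 630_720_000                  # cap reported in the InvalidRequestException
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31_536_000, ignoring leap days

print(MAX_TTL // SECONDS_PER_YEAR)     # 20 -> the cap is exactly 20 years
requested = 951_027_277
print(requested / SECONDS_PER_YEAR)    # ~30.16 years, well over the cap
```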

Rajanish GJ
apigee | rajanish <at>  
Rahul Neelakantan | 21 Nov 12:38 2014

Data not replicating consistently

I have a setup that looks like this

Dc1: 9 nodes
Dc2: 9 nodes
Dc3: 9 nodes
C* version: 2.0.10
RF: 2 in each DC
Empty CF with no data at the beginning of the test

Scenario 1 (happy path): I connect to a node in DC1 using CQLsh, validate that I am using CL=1, and insert 10 rows.
Then, using CQLsh, I connect to one node in each of the 3 DCs and, with CL=1, select * on the table; each DC shows all
10 rows.

Scenario 2 (datastax driver): using a program based on the datastax drivers, write 10 rows to DC1. The program uses CL=1 and also does
a read after write.
Then, using CQLsh, I connect to one node in each of the 3 DCs and, with CL=1 or LOCAL_QUORUM, select * on the table:
DC1 shows all 10 rows.
DC2 shows 8 or 9 rows.
DC3 shows 8 or 9 rows.
The missing rows never show up in DC2 and DC3 unless I do a CQLsh lookup with CL=ALL.

Why is there a difference in replication between writes performed using the datastax drivers and writes performed
using CQLsh?
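For what it's worth, with RF=2 per DC and CL=1 on both writes and reads, the Dynamo-style overlap rule (R + W > RF) is not satisfied, so reads are only eventually consistent regardless of which client wrote the data. A sketch of the arithmetic (illustrative only, not driver code):

```python
def overlapping(read_cl: int, write_cl: int, rf: int) -> bool:
    """A read is guaranteed to see the latest write only when the
    read and write replica sets must overlap: R + W > RF."""
    return read_cl + write_cl > rf

RF = 2                         # replicas per DC in the setup above
print(overlapping(1, 1, RF))   # False: a CL=1 read may miss a CL=1 write
print(overlapping(RF, 1, RF))  # True: CL=ALL (R=2) always overlaps the write
```

A CL=ALL read also touches every replica and repairs mismatches, which would explain why the missing rows appear after that lookup.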

Akhtar Hussain (JIRA) | 21 Nov 12:15 2014

[jira] Akhtar Hussain shared a search result with you


We have a Geo-red setup with 2 data centers having 3 nodes each. When we bring down a single Cassandra node
in DC2 with kill -9 <Cassandra-pid>, reads fail on DC1 with TimedOutException for a brief period of time
(~15-20 sec).

This message was sent by Atlassian JIRA

Jan Karlsson | 21 Nov 10:21 2014

high context switches



We are running a 3 node cluster with RF=3 and 5 clients in a test environment. The C* settings are mostly default. We noticed quite high context switching during our tests. On 100 000 000 keys/partitions we averaged around 260 000 cs (with a max of 530 000).


We were running 12 000~ transactions per second. 10 000 reads and 2000 updates.


Nothing is really wrong with that; however, I would like to understand why these numbers are so high. Have others noticed this behavior? How much context switching is expected, and why? What are the variables that affect this?
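For scale, the figures above amount to roughly 20 context switches per transaction (a crude ratio that ignores inter-node traffic and background work):

```python
avg_cs, max_cs = 260_000, 530_000  # context switches per second, from above
tps = 12_000                       # ~10 000 reads + 2 000 updates per second

print(round(avg_cs / tps, 1))      # 21.7 switches per transaction on average
print(round(max_cs / tps, 1))      # 44.2 at the observed peak
```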



Lu, Boying | 21 Nov 04:27 2014

A question about adding a new data center

Hi, all,


I read the document about how to add a new data center to an existing cluster, posted at

But I have a question: are all those steps executed only on the newly added data center, or on the existing clusters as well? (Step 7 is to be executed on the new cluster, according to the document.)






Stephane Legay | 20 Nov 17:36 2014

Fwd: sstable usage doubles after repair

I upgraded a 2-node cluster with RF = 2 from 1.0.9 to 2.0.11. I did rolling upgrades and ran upgradesstables after each upgrade. We then moved our data to new hardware by shutting down each node, moving its data to the new machine, and starting up with auto_bootstrap = false.

When all was done I ran a repair, and data went from 250 GB to 400 GB per node. A week later I am doing another repair, and data is filling the 800 GB drive on each machine. There is huge compaction activity on each node, constantly.

Where should I go from here? Will scrubbing fix the issue? 


Nikolai Grigoriev | 20 Nov 16:52 2014

coordinator selection in remote DC


There is something odd I have observed when testing a configuration with two DCs for the first time. I wanted to do a simple functional test to prove to myself (and my pessimistic colleagues ;) ) that it works.

I have a test cluster of 6 nodes, 3 in each DC, and a keyspace that is replicated as follows:

CREATE KEYSPACE xxxxxxx WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC2': '3',
  'DC1': '3'
};
I have disabled the traffic compression between DCs to get more accurate numbers.

I have set up a bunch of IP accounting rules on each node so they count the outgoing traffic from that node to each other node. I had rules for different ports but, of course, it is mostly about port 7000 (or 7001) when talking about inter-node traffic. Anyway, I have a table that shows the traffic from any node to any node's port 7000.

I ran a test with DCAwareRoundRobinPolicy and the client talking only to DC1 nodes. Everything looks fine: the client sent an identical amount of data to each of the 3 nodes in DC1, and these nodes inside DC1 (I was writing with LOCAL_ONE consistency) sent similar amounts of data to each other, representing exactly the two extra replicas.

However, when I look at the traffic from the nodes in DC1 to the nodes in DC2, the picture is different:

[per-node traffic counter table not preserved]

Nodes are in DC1, .159-161 in DC2. As you can see, each of the nodes in DC1 has sent a different amount of traffic to the remote nodes: 117Mb, 228Mb and 46Mb respectively. Both DCs have one rack.

So, here is my question. How does a node select the node in the remote DC to send the message to? I did a quick sweep through the code and could only find sorting by proximity (checking the rack and DC). So, considering that for each request I fire the targets are all 3 nodes in the remote DC, the list will contain all 3 nodes in DC2. And, if I understood correctly, the first node from the list is picked to send the message.

So, it seems to me that no round-robin-type logic is applied when selecting, from the list of targets in the remote DC, the node to forward the write to.

If this is true (and the numbers kind of show it is, right?), then perhaps the list of equal-proximity nodes should be shuffled randomly? Or, instead of the first target, a random one should be picked?
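The shuffle fix suggested above can be sketched like this (pure illustration, not Cassandra's actual code; the node names are made up):

```python
import random
from collections import Counter

remote_dc = ["dc2-node-a", "dc2-node-b", "dc2-node-c"]  # hypothetical DC2 targets

def pick_forward_target(targets, shuffle=False):
    candidates = list(targets)       # all at equal proximity (same remote DC)
    if shuffle:
        random.shuffle(candidates)   # break ties randomly instead of by sort order
    return candidates[0]             # head of the list receives the forwarded write

random.seed(42)
fixed = Counter(pick_forward_target(remote_dc) for _ in range(3000))
spread = Counter(pick_forward_target(remote_dc, shuffle=True) for _ in range(3000))
print(fixed)   # one node receives all 3000 forwarded writes
print(spread)  # load is split roughly evenly across the three nodes
```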

Nikolai Grigoriev

Andreas Finke | 20 Nov 16:00 2014

Upgrade: C* 2.0.8 -> C* 2.1.1 - Ten thousands of sstable files



We recently upgraded a 6-node cluster from Cassandra 2.0.7 to 2.1.1, sticking to this guide:


After the upgrade the cluster was less responsive than before, and one node did not come up at all. When checking the data directory, we discovered a huge number of sstable files:


# ls | wc -l

# ls | cut -d'-' -f5 | sort | uniq -c | sort
      1 snapshots
  81757 CompressionInfo.db
  81757 Data.db
  81757 Digest.sha1
  81757 Filter.db
  81757 Index.db
  81757 Statistics.db
  81757 Summary.db
  81757 TOC.txt
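The pipeline above groups files by the fifth '-'-separated field of the sstable file name. The same tally in Python, with made-up file names following the 2.1-era `<ks>-<table>-ka-<generation>-<Component>` shape:

```python
from collections import Counter

# hypothetical sstable file names; each generation has one file per component
files = [
    "myks-mytable-ka-1-Data.db", "myks-mytable-ka-1-Index.db",
    "myks-mytable-ka-2-Data.db", "myks-mytable-ka-2-Index.db",
]
by_component = Counter(name.split("-")[4] for name in files)
print(by_component)  # each component suffix appears once per sstable generation
```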


Has anyone who upgraded experienced this kind of problem?


Thanks and regards


Adil | 20 Nov 09:20 2014

logging over multi-datacenter

We have two data centers. We configured PasswordAuthenticator on each node and increased the RF of system_auth to the number of nodes (in each data center), as recommended.
We can log in via cqlsh without problems, but when I stop Cassandra on all nodes of one data center we can't log in from the other data center... this error is displayed as output:
Bad credentials] message="org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM"'

From what I understand we should be able to log in even if there is only one node UP, but it seems login has to reach QUORUM consistency level (across the 2 data centers).

My question is whether the CQL Java driver uses the same condition, and whether there is a way to set the consistency level to something like LOCAL_ONE.
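The lockout matches how QUORUM is computed: it needs a majority of all system_auth replicas across both data centers, which one surviving DC cannot provide once RF equals the node count in each DC. (As far as I know, only the default cassandra superuser is authenticated at QUORUM; other users are checked at ONE.) The arithmetic, as a sketch assuming 3 nodes per DC:

```python
def quorum(total_replicas: int) -> int:
    """Majority of all replicas: floor(n/2) + 1."""
    return total_replicas // 2 + 1

nodes_per_dc = 3                      # assumption: RF(system_auth) = nodes per DC
total = 2 * nodes_per_dc              # replicas across both data centers
print(quorum(total))                  # 4 replicas required for QUORUM
print(nodes_per_dc >= quorum(total))  # False: one DC alone cannot reach it
```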