Dan Kinder | 25 Nov 01:26 2014

large range read in Cassandra


We have a web crawler project currently based on Cassandra (https://github.com/iParadigms/walker, written in Go and using the gocql driver), with the following relevant usage pattern:

- Big range reads over a CF to grab potentially millions of rows and dispatch new links to crawl
- Fast insert of new links (effectively using Cassandra to deduplicate)

We ultimately planned on doing the batch processing step (the dispatching) in a system like Spark, but for the time being it is also in Go. We believe this should work fine given that Cassandra now properly allows chunked iteration of columns in a CF.
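Since the dispatch scan is the expensive part, one driver-agnostic way to keep any single range read small is to split the full Murmur3 token ring into subranges and issue one bounded query per subrange. A minimal sketch in Go (the split count and query shape are illustrative, not taken from the walker code):

```go
package main

import "fmt"

// Murmur3Partitioner tokens cover the full int64 range. Splitting the
// ring into n contiguous subranges lets a dispatcher run many small
// queries of the form
//   SELECT ... WHERE token(pk) > ? AND token(pk) <= ?
// instead of one huge range read.
const (
	minToken int64 = -9223372036854775808
	maxToken int64 = 9223372036854775807
)

// splitRing divides the token ring into n contiguous [start, end] pairs.
func splitRing(n int) [][2]int64 {
	// Full ring width (2^64 - 1) divided evenly; the last range
	// absorbs the rounding remainder.
	step := ^uint64(0) / uint64(n)
	ranges := make([][2]int64, n)
	start := minToken
	for i := 0; i < n; i++ {
		end := start + int64(step)
		if i == n-1 {
			end = maxToken
		}
		ranges[i] = [2]int64{start, end}
		start = end
	}
	return ranges
}

func main() {
	for _, r := range splitRing(4) {
		fmt.Printf("token > %d AND token <= %d\n", r[0], r[1])
	}
}
```

Each subrange query then hits a bounded slice of the ring, which should spread the load out instead of pinning one node for the whole scan.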

The issue is that periodically, while doing a particularly large range read, other operations time out because that node is "busy". In an experimental cluster with only two nodes (and a replication factor of 2), I'll get an error like "Operation timed out - received only 1 responses.", indicating that the second node took too long to reply. At the moment I have the long range reads set to consistency level ANY, but the rest of the operations are at QUORUM, so on this cluster they require responses from both nodes. The relevant CF is also using LeveledCompactionStrategy. This happens in both Cassandra 2.0 and 2.1.

Despite this error I don't see any significant I/O, memory consumption, or CPU usage.

Here are some of the configuration values I've played with:

Increasing timeouts:
read_request_timeout_in_ms: 15000                                                           
range_request_timeout_in_ms: 30000                                                          
write_request_timeout_in_ms: 10000                                                          
request_timeout_in_ms: 10000

Getting rid of caches we don't need:
key_cache_size_in_mb: 0
row_cache_size_in_mb: 0

Each of the 2 nodes has an HDD for the commit log and a single HDD for data. Hence the following thread config (maybe, since I/O is not an issue, I should increase these?):
concurrent_reads: 16
concurrent_writes: 32
concurrent_counter_writes: 32

Because I have a large number of columns and am not doing random I/O, I've increased this:
column_index_size_in_kb: 2048

It's something of a mystery why this error comes up. Of course, with a 3rd node it would get masked if I am doing QUORUM operations, but it still seems like it should not happen, and that there is some kind of head-of-line blocking or other issue in Cassandra. I would like to increase the amount of dispatching I'm doing, but this issue bogs things down when I do.

Any suggestions for other things we can try here would be appreciated.

Ankit Patel | 25 Nov 00:19 2014

Cassandra version 1.0.10 Data Loss upon restart

We are experiencing data loss with Cassandra 1.0.10 after we restarted the node without flushing. We see in the Cassandra logs that the commit logs were read back without any problems. Until the restart the data was correct. However, after the node restarted we retrieved an older version of the data (row caching is turned off). We are reading from and writing to a single Cassandra node that is replicated to a single-node setup at another data center. The times are synchronized across our machines. Has anyone experienced this type of behavior?


Ankit Patel

Kevin Burton | 24 Nov 21:57 2014

What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

I’m trying to track down some exceptions in our production cluster.  I bumped up our write load and now I’m getting a non-trivial number of these exceptions.  Somewhere on the order of 100 per hour.

All machines have a somewhat high CPU load because they’re doing other tasks.  I’m worried that my background tasks are simply overloading Cassandra, and one way to mitigate this is to nice them to the least favorable priority (that’s my first task).

But I can’t seem to track down any documentation on HOW to tune Cassandra to prevent these. I get the core theory behind all of this; I just need to find the docs so I can actually RTFM :)


Founder/CEO Spinn3r.com
Location: San Francisco, CA

Robert Wille | 23 Nov 16:41 2014

Getting the counters with the highest values

I’m working on moving a bunch of counters out of our relational database to Cassandra. For the most part,
Cassandra is a very nice fit, except for one feature on our website. We manage a time series of view counts
for each document, and display a list of the most popular documents in the last seven days. This seems like a
pretty strong anti-pattern for Cassandra, but also seems like something a lot of people would want to do.
If you’re keeping counters, it’s pretty likely that you’d want to know which ones have the highest values.

Here’s what I came up with to implement this feature. Create a counter table with primary key (doc_id,
day) and a single counter. Whenever a document is viewed, increment the counter for the document for today
and the previous six days. Sometime after midnight each day, compile the counters into a table with
primary key (day, count, doc_id) and no additional columns. For each partition in the counter table, I
would sum up the counters, delete any counters that are over a week old, and put the sum into the second table
with day = today. When I query the table, I would ask for data where day = yesterday. During the compilation
process, I would delete old partitions. In theory I’d only need two partitions. One that is being built,
and one for querying.
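A sketch of the two tables described above in CQL (names and types are my own illustration; the clustering order on the rollup table is an addition so the highest counts read back first):

```
-- Counter table: one partition per document, one row per day.
CREATE TABLE doc_views (
    doc_id  bigint,
    day     int,       -- e.g. 20141123
    views   counter,
    PRIMARY KEY (doc_id, day)
);

-- Nightly rollup: count is part of the clustering key, so within a
-- day's partition rows sort by view count.
CREATE TABLE popular_docs (
    day     int,
    count   bigint,
    doc_id  bigint,
    PRIMARY KEY (day, count, doc_id)
) WITH CLUSTERING ORDER BY (count DESC, doc_id ASC);

-- On each view, bump today and the previous six days:
--   UPDATE doc_views SET views = views + 1 WHERE doc_id = ? AND day = ?;
-- Query yesterday's compiled ranking:
--   SELECT doc_id, count FROM popular_docs WHERE day = ? LIMIT 100;
```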

I’d be interested to hear critiques on this strategy, as well as hearing how other people have
implemented a "most-popular" feature using Cassandra counters.


Stephane Legay | 23 Nov 00:25 2014

Compaction Strategy guidance

Hi there,

use case:

- Heavy write app, few reads.
- Lots of updates of rows / columns.
- Current performance is fine, for both writes and reads.
- Currently using SizeTieredCompactionStrategy

We're trying to limit the amount of storage used during compaction. Should we switch to LeveledCompactionStrategy? 
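For what it's worth, switching an existing table is a single ALTER (keyspace and table names below are placeholders). The usual trade-off: size-tiered major compactions can temporarily need up to ~50% free disk to rewrite the largest SSTables, while leveled compaction works in small fixed-size SSTables and needs much less headroom, at the cost of more write amplification:

```
ALTER TABLE mykeyspace.mytable
  WITH compaction = {'class': 'LeveledCompactionStrategy'};
```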

Andy Stec | 21 Nov 23:17 2014

Partial row read

We're getting strange results when reading from Cassandra 2.0 in php using this driver:

Here's the schema:

CREATE TABLE events (
  day text,
  last_event text,
  event_text text,
  mdn text,
  PRIMARY KEY ((day), last_event)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Here's the database contents:

cqlsh:msa> SELECT * FROM events;
 day    | last_event          | event_text | mdn
 141121 | 2014-46-21 20:46:45 | event text | 8471111111
(1 rows)

Here's the simple program in php that reads the database:

$dsn = "cassandra:host=;port=9160";
$db = new PDO ($dsn);
$db->exec ("USE msa");
$stmt = $db->query ("SELECT day, last_event, mdn, event_text FROM events");
if ($stmt === false) {
   var_dump ($db->errorInfo ());
} else {
   var_dump ($stmt->fetchAll ());
}

And this is the output the program produces.  Why is it not returning the full row? 

array(1) {
  array(6) {
    string(6) "141121"
    string(6) "141121"
    string(0) ""
    string(0) ""
    string(0) ""
    string(0) ""

Chris Hornung | 21 Nov 18:44 2014

bootstrapping node stuck in JOINING state


I have been bootstrapping 4 new nodes into an existing production cluster. Each node was bootstrapped one at a time; the first 2 completed without errors, but I ran into issues with the 3rd one. The 4th node has not been started yet.

On bootstrapping the third node, the data streaming sessions completed without issue, but bootstrapping did not finish. The node is still stuck in the JOINING state even 19 hours or so after data streaming completed.

Other reports of this issue seem to be related either to network connectivity issues between nodes, or to multiple nodes bootstrapping simultaneously. I haven't found any evidence of either of these situations, and there are no errors or stack traces in the logs.

I'm just looking for the safest way to proceed - I'm fine with removing the hanging node altogether, just looking for confirmation that wouldn't leave the cluster in a bad state, and what data points to be looking at to gauge the situation.

If removing the node and starting over is OK, is any other maintenance on the existing nodes recommended? I've read of people scrubbing/rebuilding nodes coming out of this situation, but not sure if that's necessary.

Please let me know if any additional info would be helpful.

Chris Hornung

Rajanish GJ | 21 Nov 18:29 2014

max ttl for column

Does Hector or Cassandra impose a limit on the max TTL value for a column?

I am trying to insert a record into one of the column families and am seeing the following error.
Cassandra version : 1.1.12 
Hector  : 1.1-4

Any pointers appreciated. 

me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:ttl is too large. requested (951027277) maximum (630720000))
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch(AbstractColumnFamilyTemplate.java:115) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched(AbstractColumnFamilyTemplate.java:163) ~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update(ColumnFamilyTemplate.java:69) ~[hector-core-1.1-4.jar:na]

Also tried using CQL; it seems to hang and not respond. Trying again with a few combinations:
INSERT INTO users (key,id) values ('test6','951027277 secs') USING TTL 951027277 ; 
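The error message above reports a maximum of 630720000 seconds, which is exactly 20 years (20 * 365 * 24 * 3600), while the requested 951027277 seconds is roughly 30 years. A small Go sketch of clamping a requested TTL client-side before writing (the helper name is mine, not part of Hector or Cassandra):

```go
package main

import "fmt"

// Cassandra rejects TTLs above 20 years; the error above reports
// "maximum (630720000)" seconds, i.e. 20 * 365 * 24 * 3600.
const maxTTLSeconds int64 = 630720000

// clampTTL caps a requested TTL at Cassandra's maximum.
func clampTTL(requested int64) int64 {
	if requested > maxTTLSeconds {
		return maxTTLSeconds
	}
	return requested
}

func main() {
	fmt.Println(clampTTL(951027277)) // the failing value: prints 630720000
	fmt.Println(clampTTL(86400))     // one day passes through: prints 86400
}
```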

Rajanish GJ
apigee | rajanish <at> apigee.com  
Rahul Neelakantan | 21 Nov 12:38 2014

Data not replicating consistently

I have a setup that looks like this

Dc1: 9 nodes
Dc2: 9 nodes
Dc3: 9 nodes
C* version: 2.0.10
RF: 2 in each DC
Empty CF with no data at the beginning of the test

Scenario 1 (happy path): I connect to a node in DC1 using CQLsh, validate that I am using CL=1, insert 10 rows.
Then using CQLsh connect to one node in each of the 3 DCs and with CL=1, select * on the table, each DC shows all
10 rows.

Scenario 2: using a program based on the DataStax drivers, write 10 rows to DC1. The program uses CL=1 and also does
a read after write.
Then using CQLsh connect to one node in each of the 3 DCs and with CL=1 or LocalQuorum, select * on the table,
DC1 shows all 10 rows.
DC2 shows 8 or 9 rows
DC3 shows 8 or 9 rows 
The missing rows never show up in DC2 and DC3 unless I do a CQLsh lookup with CL=all

Why is there a difference in replication between writes performed using the DataStax drivers and writes
performed using CQLsh?

Akhtar Hussain (JIRA | 21 Nov 12:15 2014

[jira] Akhtar Hussain shared a search result with you

    Akhtar Hussain shared a search result with you


    We have a geo-redundant setup with 2 data centers having 3 nodes each. When we bring down a single Cassandra node
in DC2 with kill -9 <Cassandra-pid>, reads fail on DC1 with TimedOutException for a brief amount of time
(~15-20 sec).

This message was sent by Atlassian JIRA

Jan Karlsson | 21 Nov 10:21 2014

high context switches



We are running a 3 node cluster with RF=3 and 5 clients in a test environment. The C* settings are mostly default. We noticed quite high context switching during our tests. On 100 000 000 keys/partitions we averaged around 260 000 context switches (with a max of 530 000).


We were running ~12 000 transactions per second: 10 000 reads and 2 000 updates.


Nothing really wrong with that however I would like to understand why these numbers are so high. Have others noticed this behavior? How much context switching is expected and why? What are the variables that affect this?