Ulrich Geilmann | 1 Apr 18:26 2015

replace_address vs add+removenode


The documentation suggests using the replace_address startup
parameter to replace a dead node. However, it doesn't explain why
this is preferable to adding a new node and retiring the dead one
with nodetool removenode.
I assume it can be more efficient since the new node can take over the
exact tokens of the dead node. Are there any other differences?
Could it be reasonable not to use replace_address, in the interest of
more uniform operations?

br, Ulrich

Firdousi Farozan | 1 Apr 17:54 2015

Table design for historical data


My requirement is to design a table for historical state information (not exactly time-series). For example: I have devices connecting to and disconnecting from the management platform. I want to know details such as name, MAC, OS, image, etc. for all devices connected to the management platform in a given interval (start and end time).

Any help on table design for this use-case?
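For what it's worth, one possible starting point is to partition connection events by a coarse time bucket and cluster by connect time, keeping the disconnect time as a regular column. The sketch below (table and column names are purely illustrative, not from this thread) just builds and prints the candidate DDL:

```java
// Illustrative only: one possible device-history table. Partition key is a
// day bucket; clustering columns are connect time plus device MAC, so a
// start/end window maps to a contiguous slice within each day's partition.
public class DeviceHistorySchema {
    static String ddl() {
        return "CREATE TABLE device_history (\n"
             + "  day text,                  // day bucket, e.g. '2015-04-01'\n"
             + "  connected_at timestamp,\n"
             + "  mac text,\n"
             + "  name text,\n"
             + "  os text,\n"
             + "  image text,\n"
             + "  disconnected_at timestamp, // null while still connected\n"
             + "  PRIMARY KEY (day, connected_at, mac)\n"
             + ")";
    }

    public static void main(String[] args) {
        System.out.println(ddl());
    }
}
```

An interval query then reads each day bucket overlapping [start, end] with a range condition on connected_at. Devices that connected before the window but were still connected during it are not covered by this slice alone; they would need either a separate "open connections" table or a client-side check on disconnected_at, which is why this is only a sketch.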


Craig Ching | 1 Apr 16:17 2015

Datastax driver object mapper and union field


We need to implement a union field in our Cassandra data model, and we're using the DataStax Mapper. Does anyone have recommendations for doing this? I'm thinking something like:

public class Value {
  int dataType;          // discriminator: which of the fields below is set
  String valueAsString;
  double valueAsDouble;
}

If the Value is a String, do we need to store a double as well (and vice versa)?  Or should we convert the double to a java.lang.Double and null it?  If we did the latter, do we have to worry about tombstones?
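To make the boxed-type variant mentioned above concrete, here is a sketch (the discriminator constants and factory methods are illustrative, not part of any driver API):

```java
// Union-style value: exactly one payload field is non-null, selected by
// dataType. valueAsDouble is a boxed Double (not double) so it can be null.
public class UnionValue {
    static final int STRING = 0;
    static final int DOUBLE = 1;

    int dataType;
    String valueAsString;  // non-null only when dataType == STRING
    Double valueAsDouble;  // non-null only when dataType == DOUBLE

    static UnionValue ofString(String s) {
        UnionValue v = new UnionValue();
        v.dataType = STRING;
        v.valueAsString = s;
        return v;
    }

    static UnionValue ofDouble(double d) {
        UnionValue v = new UnionValue();
        v.dataType = DOUBLE;
        v.valueAsDouble = d;
        return v;
    }
}
```

On the tombstone question: in the Cassandra versions current at the time, binding a null column value in an INSERT or UPDATE does write a tombstone for that column, so the null-the-unused-field approach carries roughly one tombstone per row; whether that matters depends on the read pattern.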

Thanks and appreciate any advice!

Amlan Roy | 1 Apr 16:03 2015

Frequent timeout issues


I am new to Cassandra. I have set up a cluster with Cassandra 2.0.13. I am writing the same data into both HBase and Cassandra and find that writes are extremely slow in Cassandra; I frequently see the exception “Cassandra timeout during write query at consistency ONE". The cluster size is the same for both HBase and Cassandra.

It looks like something is wrong with my cluster setup. What could the issue be? Data and commit logs are written to two separate disks.

Jake Luciani | 1 Apr 15:43 2015


CVE-2015-0225: Apache Cassandra remote execution of arbitrary code

Severity: Important

The Apache Software Foundation

Versions Affected:
Cassandra 1.2.0 to 1.2.19
Cassandra 2.0.0 to 2.0.13
Cassandra 2.1.0 to 2.1.3

Under its default configuration, Cassandra binds an unauthenticated
JMX/RMI interface to all network interfaces.  As RMI is an API for the
transport and remote execution of serialized Java, anyone with access
to this interface can execute arbitrary code as the running user.

1.2.x has reached EOL, so users of <= 1.2.x are recommended to upgrade
to a supported version of Cassandra, or to manually configure encryption
and authentication of JMX.
2.0.x users should upgrade to 2.0.14.
2.1.x users should upgrade to 2.1.4.
Alternatively, users of any version not wishing to upgrade can
reconfigure JMX/RMI to enable encryption and authentication according
to https://wiki.apache.org/cassandra/JmxSecurity

This issue was discovered by Georgi Geshev of MWR InfoSecurity.

Serega Sheypak | 1 Apr 14:53 2015

Testing sstableloader between Cassandra 2.1 DSE and community edition 2.1

Hi,  I have 2 cassandra clusters.
cluster1 is datastax community 2.1
cluster2 is datastax DSE

I can run sstableloader from cluster1 (Community) and stream data to cluster2 (DSE),
but I get an exception while streaming from cluster2 (DSE) to cluster1 (Community).

The exception is:

java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:282)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.serializers.BooleanSerializer.deserialize(BooleanSerializer.java:33)
    at org.apache.cassandra.serializers.BooleanSerializer.deserialize(BooleanSerializer.java:24)
    at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)
    at org.apache.cassandra.cql3.UntypedResultSet$Row.getBoolean(UntypedResultSet.java:102)
    at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumnsNoTriggers(CFMetaData.java:1701)
    at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1059)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:274)

Andrew Vant | 31 Mar 22:54 2015

Cross-datacenter requests taking a very long time.

I have a Cassandra 2.0.13 cluster with three datacenters, three nodes per datacenter. If I open cqlsh and do
a select with any consistency level that crosses datacenters (e.g. QUORUM or ALL), it works, but takes 2+
minutes to return. The same statement with consistency ONE or LOCAL_QUORUM is as fast as it should be. It
does not appear to be latency between centers; I can point cqlsh at a server in a different DC and it's not
noticeably slow.

I tried turning tracing on to get a better idea of what was happening, but it complains `Session <long hex
string> wasn't found`.

I'm not entirely sure what direction to look in to find the problem. 



Saurabh Sethi | 31 Mar 22:23 2015

Column value not getting updated

I have written a unit test that creates a column family, inserts a row in that column family and then updates the value of one of the columns.

After updating, the unit test immediately tries to read the updated value of that column, but Cassandra returns the old value.
  • I am using QueryBuilder API and not CQL directly.
  • I am using the consistency level of QUORUM for everything – insert, update and read.
  • Cassandra is running as a 3 node cluster with replication factor of 3.

Does anyone have any idea what is going on here?
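For reference, the consistency reasoning the test relies on can be made concrete: with replication factor 3, QUORUM is floor(3/2) + 1 = 2, and a read and a write overlap on at least one replica whenever R + W > RF, which is the read-your-writes guarantee. A quick sanity check of that arithmetic (not a diagnosis of the problem above):

```java
// Sanity check of the quorum overlap math behind read-your-writes:
// a read and a write see at least one common replica when R + W > RF.
public class QuorumMath {
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    static boolean readYourWrites(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);                           // 2 for RF=3
        System.out.println(readYourWrites(q, q, rf)); // true: 2 + 2 > 3
    }
}
```

Since 2 + 2 > 3 holds, QUORUM/QUORUM with RF=3 should return the updated value, so a stale read usually points elsewhere, for example client-side timestamps where the UPDATE got an older timestamp than the INSERT and therefore loses reconciliation.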

Caraballo, Rafael | 31 Mar 17:15 2015

Cassandra 2.0.12 - lots of "Got an IOException in internalRead!"

In Cassandra 2.0.12, we are seeing hundreds of these warnings on every node:


WARN [Thrift-Selector_16] 2015-03-31 00:48:49,700 Message.java (line 364) Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
    at com.thinkaurelius.thrift.util.mem.Buffer.readFrom(Buffer.java:96)
    at com.thinkaurelius.thrift.Message.internalRead(Message.java:360)
    at com.thinkaurelius.thrift.Message.read(Message.java:146)
    at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleRead(TDisruptorServer.java:588)
    at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:567)
    at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423)
    at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383)


Rafael Caraballo

Tiwari, Tarun | 31 Mar 16:42 2015

Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

Hi Experts,


I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app that loads data into a Cassandra table using the DataStax Spark connector.


Is there something else I need to import in the program, or a dependency I am missing?


RUNTIME ERROR: Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
    at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


Below is my Scala program:


/*** ld_Cassandra_Table.scala ***/

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

object ldCassandraTable {
  def main(args: Array[String]) {
    val fileName = args(0)
    val tblName = args(1)
    val keyspace = args(2) // target keyspace, passed as the third argument
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "<MASTER HOST>")
      .setMaster("<MASTER URL>")
      .setAppName("LoadCassandraTableApp")
    val sc = new SparkContext(conf)

    val normalfill = sc.textFile(fileName).map(line => line.split('|'))
    normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), line(6), line(7), line(8), line(9), line(10), line(11), line(12), line(13), line(14), line(15), line(16), line(17), line(18), line(19), line(20), line(21))).saveToCassandra(keyspace, tblName, SomeColumns("wfctotalid", "timesheetitemid", "employeeid", "durationsecsqty", "wageamt", "moneyamt", "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid", "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid", "unapprovedsw", "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
    println("Records Loaded to %s".format(tblName))
  }
}

Below is the sbt file:


name := "POC"

version := "0.0.1"

scalaVersion := "2.10.4"

// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)

Tarun Tiwari | Workforce Analytics-ETL | Kronos India



Robert Wille | 31 Mar 14:22 2015

Help understanding aftermath of death by GC

I moved my site over to Cassandra a few months ago, and everything has been just peachy until a few hours ago
(yes, it would be in the middle of the night) when my entire cluster suffered death by GC. By death by GC, I
mean this:

[rwille <at> cas031 cassandra]$ grep GC system.log | head -5
 INFO [ScheduledTasks:1] 2015-03-31 02:49:57,480 GCInspector.java (line 116) GC for
ConcurrentMarkSweep: 30219 ms for 1 collections, 7664429440 used; max is 8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:50:32,180 GCInspector.java (line 116) GC for
ConcurrentMarkSweep: 30673 ms for 1 collections, 7707488712 used; max is 8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:51:05,108 GCInspector.java (line 116) GC for
ConcurrentMarkSweep: 30453 ms for 1 collections, 7693634672 used; max is 8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:51:38,787 GCInspector.java (line 116) GC for
ConcurrentMarkSweep: 30691 ms for 1 collections, 7686028472 used; max is 8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:52:12,452 GCInspector.java (line 116) GC for
ConcurrentMarkSweep: 30346 ms for 1 collections, 7701401200 used; max is 8329887744

I’m pretty sure I know what triggered it. When I first started developing against Cassandra, I found the IN
clause to be supremely useful, and I used it a lot. Later I figured out it was a bad thing and repented and
fixed my code, but I missed one spot. A maintenance task spent a couple of hours repeatedly issuing queries
with IN clauses containing 1000 items each, and the whole system went belly up.

I get that my bad queries caused Cassandra to require more heap than was available, but here’s what I
don’t understand. When the crap hit the fan, the maintenance task died due to a timeout error, but the
cluster never recovered. I would have expected that once I was no longer issuing the bad queries, the
heap would get cleaned up and life would return to normal. Can anybody help me understand why Cassandra
wouldn’t recover? How is it that GC pressure can cause heap to become permanently uncollectable?

This makes me pretty worried. I can fix my code, but I don’t really have control over spikes. If memory
pressure spikes, I can tolerate some timeouts and errors, but if it can’t come back when the pressure is
gone, that seems pretty bad.

Any insights would be greatly appreciated