lewis john mcgibbney | 20 Sep 21:25 2014

[ANNOUNCE] Apache Gora 0.5 Release

Hi Folks,
Apologies for cross posting.
The Apache Gora team are pleased to announce the immediate availability of Apache Gora 0.5.

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop™ MapReduce support. Gora uses the Apache Software License v2.0.

This release addresses no fewer than 44 issues [0], many of them improvements and new functionality. Most notably, the release includes a new module for MongoDB, shim functionality to support multiple Hadoop versions, improved authentication for Accumulo, better documentation for many modules, and pluggable solrj implementations with a default of http (HttpSolrServer). Available options are http (HttpSolrServer), cloud (CloudSolrServer), concurrent (ConcurrentUpdateSolrServer) and loadbalance (LBHttpSolrServer).
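
For illustration only, a gora.properties sketch selecting one of these solrj implementations might look like the following; the gora.solrstore.* property names here are assumptions made for the sake of example (check the gora-solr documentation for the authoritative keys) — only the option values come from the list above:

    # Sketch of a gora.properties snippet (property names are assumptions,
    # not verified against gora-solr 0.5).
    gora.datastore.default=org.apache.gora.solr.store.SolrStore
    gora.solrstore.solr.url=http://localhost:8983/solr
    # One of: http (default), cloud, concurrent, loadbalance
    gora.solrstore.solr.solrjserver=http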

Suggested Gora database support is as follows:
  • Apache Avro 1.7.6
  • Apache Hadoop 1.0.1 and 2.4.0
  • Apache HBase 0.94.14
  • Apache Cassandra 2.0.2
  • Apache Solr 4.8.1
  • MongoDB 2.6
  • Apache Accumulo 1.5.1

Gora is released both as source code, which can be downloaded from our downloads page [1], and as Maven artifacts, which can be found on Maven Central [2].

Thank you

Lewis

(on behalf of the Apache Gora PMC)



--

http://people.apache.org/~lewismc || @hectorMcSpector || http://www.linkedin.com/in/lmcgibbney

Apache Gora V.P || Apache Nutch PMC || Apache Any23 V.P || Apache OODT PMC ||
Apache Open Climate Workbench PMC || Apache Tika PMC || Apache TAC
Erik Forsberg | 20 Sep 09:21 2014

Restart joining node

Hi!

On the same subject as before: due to a full disk during bootstrap, my joining nodes are stuck. What's the correct procedure here? Will a plain restart of the node do the right thing, i.e. continue where bootstrap stopped, or is it better to clean the data directories before starting the daemon again?
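
For concreteness, here is a shell sketch of the second option (clean the data directories, then restart), assuming the default /var/lib/cassandra layout and a service-managed daemon; the paths and service name are assumptions that vary per installation, and whether this is preferable to a plain restart is exactly the open question:

    # Sketch only: re-bootstrap a stuck joining node from scratch.
    # Adjust to your data_file_directories, commitlog_directory and
    # saved_caches_directory settings.
    sudo service cassandra stop                # or kill the daemon
    rm -rf /var/lib/cassandra/data/*           # discard partially streamed data
    rm -rf /var/lib/cassandra/commitlog/*
    rm -rf /var/lib/cassandra/saved_caches/*
    sudo service cassandra start               # node rejoins and streams afresh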

Regards,
\EF
Erik Forsberg | 20 Sep 09:11 2014

Running out of disk at bootstrap in low-disk situation

Hi!

We have unfortunately managed to put ourselves in a situation where we are really close to full disks on our existing 27 nodes.  

We are now trying to add 15 more nodes, but we are running out of disk space on the new nodes while they are joining.

We're using vnodes, on Cassandra 1.2.18 (yes, I know that's old, and I'll upgrade as soon as I'm out of this problematic situation). 

I've added all 15 nodes, with some time in between - definitely more than the 2-minute rule. But it seems that compaction is not keeping up with the incoming data. Or at least that's my theory.

What are the recommended settings to avoid this problem? I have now set the compaction throughput to 0 for unlimited compaction bandwidth, hoping that will help (will it?)

Will it help to lower the streaming throughput too? I'm unsure about that, since from observation it seems that compaction does not start until streaming from a node has finished. With 27 nodes sharing the incoming bandwidth, all of them will take equally long to finish, and only then can compaction occur. I guess I could limit streaming bandwidth on some of the source nodes too. Or am I completely wrong here?
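
For reference, both knobs can be adjusted at runtime with nodetool; a sketch (run on each node; the settings revert to the cassandra.yaml values on restart, and the units are worth double-checking for your version):

    # Unthrottle compaction (maps to compaction_throughput_mb_per_sec;
    # 0 means no limit)
    nodetool setcompactionthroughput 0
    # Throttle outbound streaming (maps to
    # stream_throughput_outbound_megabits_per_sec)
    nodetool setstreamthroughput 100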

Other ideas most welcome.

Regards,
\EF


Randy Fradin | 20 Sep 00:09 2014

Upgrade steps to address CASSANDRA-4411

I have a question about the steps listed in this article for addressing CASSANDRA-4411 when upgrading from a version <= 1.1.3 to a version >= 1.1.5 while using leveled compaction: http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps

It suggests first upgrading the entire cluster, then shutting down the nodes one at a time and running sstablescrub on each. My question is: why wouldn't I just run sstablescrub as I am upgrading each node in the first place? In other words, can I just shut down my pre-1.1.3 node, run sstablescrub, then start 1.1.5+, one node at a time? Is it necessary to have already started the new version of Cassandra at least once for the scrub to work?
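
For reference, sstablescrub ships in the Cassandra bin/ directory and runs offline, one table at a time, while the node is down; a sketch of the combined per-node sequence being asked about (keyspace/table names are placeholders, and whether this ordering is safe is exactly the open question):

    # Placeholder sketch of the per-node sequence under discussion.
    sudo service cassandra stop               # stop the pre-1.1.3 node
    # ... install the 1.1.5+ binaries here ...
    bin/sstablescrub my_keyspace my_table     # repeat for each LCS table
    sudo service cassandra start              # bring the upgraded node back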

Thanks

Les Hartzman | 19 Sep 23:46 2014

Help with approach to remove RDBMS schema from code to move to C*?

My company is using an RDBMS for storing time-series data. This application was developed before Cassandra and NoSQL. I'd like to move to C*, but ...

The application supports data coming from multiple models of devices. Because there is enough variability in the data, the main table holding the device data has only some core columns defined. The other columns are non-specific: a set of columns for numeric data and a set for character data. The use of these non-specific columns is defined in the code; column 'numeric_1' might hold a millisecond time for one device and a fault code for another. This appears to have been done to avoid modifying the schema whenever a new device was introduced. And they rolled their own db interface to support this mess.

Now, we could just use C* like an RDBMS - defining CFs to mimic the tables. But this just pushes a bad design from one platform to another.

Clearly there needs to be a code re-write. But what suggestions does anyone have on how to make this shift to C*?

Would you just lay out all of the columns represented by the different devices, naming them as they are used, and have jagged rows? Or is there some other way to approach this?
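
As one illustration of an alternative (a sketch with hypothetical names, not a prescription): model each reading as its own clustered row, so every device type stores only the metrics it actually produces and the rows stay naturally "jagged" without unused columns:

    -- Sketch: one clustered row per metric reading (names are hypothetical).
    -- Partition per device; clustering by time and metric name keeps a
    -- device's readings together while letting different device models
    -- store different metric sets with no schema changes.
    CREATE TABLE device_readings (
      device_id    text,
      event_time   timestamp,
      metric_name  text,      -- e.g. 'millisecond_time' or 'fault_code'
      metric_value text,      -- or separate numeric/text value columns
      PRIMARY KEY (device_id, event_time, metric_name)
    );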

Of course, the data miners already have scripts/methods for accessing the data from the RDBMS in the user-unfriendly form it's in now. This would have to be addressed as well, but until I know how to store the data, mining it gets ahead of things.

Thanks.

Les

DuyHai Doan | 19 Sep 23:19 2014

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

"It will merge requests to neighboring ranges when the same node is a replica for both of them.  Without vnodes, this usually results in all ranges for a node being merged.  With vnodes, merging still happens, but not all ranges can be merged." -->

But does it imply that with vnodes there is actually "extra work" to do when scanning indices? If so, is this "extra load" I/O bound or CPU bound?

On Fri, Sep 19, 2014 at 11:10 PM, Tyler Hobbs <tyler@datastax.com> wrote:

On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel <pateljay3001@gmail.com> wrote:

Btw, there is no data in the table. Table is empty. Query is fired on the empty table.

This is actually the worst case for secondary index lookups.
 

From the tracing output, I don't understand why it's doing multiple scans on one node. With non-vnode, there is only one scan per node and the same query works fine.

If you look at the output1.txt attached earlier, the coordinator is firing an index scan on a given node (for example, 192.168.51.22 in the snippet from output1.txt) multiple times for different token ranges. Why can't it fire only once? With non-vnode, it fires only once and the query comes back very fast.

It will merge requests to neighboring ranges when the same node is a replica for both of them.  Without vnodes, this usually results in all ranges for a node being merged.  With vnodes, merging still happens, but not all ranges can be merged.


--
Tyler Hobbs
DataStax

Donald Smith | 19 Sep 23:17 2014

Is it wise to increase native_transport_max_threads if we have lots of CQL clients?

If we have hundreds of CQL clients (for C* 2.0.9), should we increase native_transport_max_threads in cassandra.yaml from the default (128) to the number of clients? If we don't do that, I presume requests will queue up, resulting in higher latency. What's a reasonable max value for native_transport_max_threads?
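
For reference, the setting lives in cassandra.yaml, where it is commented out at its default of 128; a sketch, with 256 purely as an illustrative value rather than a recommendation:

    # cassandra.yaml sketch: maximum concurrently executing native-protocol
    # (CQL) requests; requests beyond this limit queue.
    native_transport_max_threads: 256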

 

Thanks, Don 

 

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
donalds@AudienceScience.com


 

Tim Dunphy | 19 Sep 23:05 2014

can't launch cassandra 2.1.0

Hey all,

 I'm attempting to upgrade from cassandra 2.0.10 to version 2.1.0. 

However when launching the new version I'm running into the following: 

[root@beta-new:/etc/alternatives/cassandrahome] #./bin/cassandra -f
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-cassandra-2.1.0/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-cassandra-2.1.0/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
INFO  21:02:28 Hostname: beta-new.jokefire.com
INFO  21:02:28 Loading settings from file:/usr/local/apache-cassandra-2.1.0/conf/cassandra.yaml
INFO  21:02:28 Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=Jokefire Cluster; column_index_size_in_kb=64; commitlog_directory=/var/lib/cassandra/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_throughput_mb_per_sec=16; concurrent_reads=32; concurrent_writes=32; cross_node_timeout=false; data_file_directories=[/var/lib/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=162.243.109.94; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/var/lib/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=162.243.109.94}]}]; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]
INFO  21:02:29 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
ERROR 21:02:29 Exception encountered during startup
java.lang.NoSuchMethodError: org.github.jamm.MemoryMeter.withGuessing(Lorg/github/jamm/MemoryMeter$Guess;)Lorg/github/jamm/MemoryMeter;
        at org.apache.cassandra.utils.ObjectSizes.<clinit>(ObjectSizes.java:34) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.dht.Murmur3Partitioner.<clinit>(Murmur3Partitioner.java:46) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at java.lang.Class.forName0(Native Method) ~[na:1.8.0]
        at java.lang.Class.forName(Class.java:259) ~[na:1.8.0]
        at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:463) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:483) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:238) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:129) ~[apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:109) [apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [apache-cassandra-2.1.0.jar:2.1.0]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [apache-cassandra-2.1.0.jar:2.1.0]
java.lang.NoSuchMethodError: org.github.jamm.MemoryMeter.withGuessing(Lorg/github/jamm/MemoryMeter$Guess;)Lorg/github/jamm/MemoryMeter;
        at org.apache.cassandra.utils.ObjectSizes.<clinit>(ObjectSizes.java:34)
        at org.apache.cassandra.dht.Murmur3Partitioner.<clinit>(Murmur3Partitioner.java:46)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:259)
        at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:463)
        at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:483)
        at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429)
        at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:238)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:129)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:109)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546)
Exception encountered during startup: org.github.jamm.MemoryMeter.withGuessing(Lorg/github/jamm/MemoryMeter$Guess;)Lorg/github/jamm/MemoryMeter;

I was just wondering if I could please have some guidance on how to get this version to launch.
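
One observation, offered as a guess rather than a confirmed diagnosis: the duplicate SLF4J bindings at the top of the output (logback from 2.1.0 alongside slf4j-log4j12 from the 2.0.x line) suggest jars from the old install are mixed into the 2.1.0 lib directory, and a stale jamm jar on the classpath would produce exactly this NoSuchMethodError. A sketch of how one might check:

    # Sketch: look for jars left over from the old install in lib/
    ls /usr/local/apache-cassandra-2.1.0/lib/jamm-*.jar
    ls /usr/local/apache-cassandra-2.1.0/lib/slf4j-log4j12-*.jar
    # Confirm the -javaagent flag points at the jamm jar shipped with 2.1.0
    grep javaagent /usr/local/apache-cassandra-2.1.0/conf/cassandra-env.sh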

Thanks!
Tim

--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

cass savy | 19 Sep 21:14 2014

Upgrade to DSE 4.5

We run on DSE 3.1.3 and only use Cassandra in our prod cluster.

What is the release that I need to be on right away? If I need to upgrade to DSE 4.5 (C* 2.0.7), I need to take three upgrade paths to get there. I see a lot of improvements to the Solr/Hadoop features in DSE 4.0 and above.

Can I upgrade to DSE 3.2.7 for now, stay on it for some time, and then later upgrade to DSE 4.5?

I do not want to upgrade through multiple versions on the same day, and it's hard to know which one is the culprit if we run into issues.

Have any of you run DSE 4.5 or DSE 3.2.7 in prod, and have you found any issues?


Check Peck | 19 Sep 17:41 2014

Wide Rows - Data Model Design

I am trying to use the wide-rows concept in my data model design for Cassandra. We are using Cassandra 2.0.6.

    CREATE TABLE test_data (
      test_id int,
      client_name text,
      record_data text,
      creation_date timestamp,
      last_modified_date timestamp,
      PRIMARY KEY (test_id, client_name, record_data)
    )
   
So I came up with the above table design. Does my table fall under the category of wide rows in Cassandra or not?

And is there any problem if I have three columns in my PRIMARY KEY? I guess the partition key will be test_id, right? And what about the other two?

In this table, we can have multiple record_data for same client_name.

Query Pattern will be -

select client_name, record_data from test_data where test_id = 1;
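
For reference, the general CQL semantics of that compound PRIMARY KEY (facts about CQL itself, not advice on this particular model) can be sketched as:

    -- In PRIMARY KEY (test_id, client_name, record_data):
    --   test_id     -> partition key (all rows sharing a test_id form one
    --                  partition, i.e. one "wide row" in storage terms)
    --   client_name -> first clustering column
    --   record_data -> second clustering column
    -- Multiple record_data values per client_name are distinct rows
    -- inside the same partition, sorted by client_name, then record_data:
    INSERT INTO test_data (test_id, client_name, record_data, creation_date, last_modified_date)
    VALUES (1, 'client_a', 'first record', '2014-09-19', '2014-09-19');
    INSERT INTO test_data (test_id, client_name, record_data, creation_date, last_modified_date)
    VALUES (1, 'client_a', 'second record', '2014-09-19', '2014-09-19');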
Tim Dunphy | 19 Sep 15:33 2014

what's cool about cassandra 2.1.0?

Hey all, 

I tried googling around to get an idea of what is new (and potentially cool) in the newest release of Cassandra, 2.1.0.

But all that I've been able to find so far is this kind of general statement about the new features. 


It doesn't seem to have a lot of detail! In particular, I'm curious about how CQL has been enhanced beyond just an incomplete list of new data types. I'd like to know what the performance improvements are, how the row cache has been improved, etc. You get the idea! So where can I find a more complete description of how this update is of benefit?

Thanks!
Tim

--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

