Michael Klishin | 20 Dec 22:43 2014

ANN Cassaforte 2.0 is released

Cassaforte [1] is a modern Clojure client for Apache Cassandra 
and DataStax Enterprise.

2.0 is a major release with breaking API changes. Release notes: 
http://blog.clojurewerkz.org/blog/2014/12/20/cassaforte-2-dot-0-0-is-released/

Documentation guides (up-to-date for 2.0):
http://clojurecassandra.info/articles/guides.html

1. http://clojurecassandra.info

--  
 @michaelklishin, github.com/michaelklishin

Jiri Horky | 19 Dec 20:20 2014

Node down during move

Hi list,

we added a new node to an existing 8-node cluster running C* 1.2.9 without
vnodes, and because we are almost totally out of space, we are shuffling
the tokens of one node after another (not in parallel). During one of these
move operations, the receiving node died and the streaming failed:

 WARN [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,227
StorageService.java (line 3703) Streaming to /X.Y.Z.18 failed
 INFO [RMI TCP Connection(12940)-X.Y.Z.17] 2014-12-19 19:25:56,233
ColumnFamilyStore.java (line 629) Enqueuing flush of
Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
 INFO [FlushWriter:3772] 2014-12-19 19:25:56,238 Memtable.java (line
461) Writing Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
ERROR [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,246
CassandraDaemon.java (line 192) Exception in thread Thread[Streaming to
/X.Y.Z.18:2,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

After restart of the receiving node, we tried to perform the move again,

Batranut Bogdan | 19 Dec 15:35 2014

Key Cache Questions

Hello all,
I just read that the default size of the key cache is 100 MB. Is it stored in memory or on disk? 
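In case it helps: the key cache lives in memory, and is only saved to disk periodically (under saved_caches_directory) so it can be reloaded warm after a restart. A quick way to inspect it, assuming a typical package-install layout (the yaml path is an assumption):

```shell
# Capacity and save period are set in cassandra.yaml (path is an assumption):
grep -E 'key_cache_size_in_mb|key_cache_save_period' /etc/cassandra/cassandra.yaml

# On a running node, current key cache usage is reported by nodetool:
nodetool info | grep -i 'key cache'
```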
Svec, Michal | 19 Dec 14:26 2014

Drivers performance


Hello,

I am in the middle of evaluating whether we should switch from Astyanax to the DataStax driver, so I wrote a simple benchmark that loads the same row by key 10,000 times, and I was surprised by the slowness of the DataStax driver. I uploaded it to GitHub.

https://github.com/michalsvec/astyanax-datastax-benchmark

 

It was tested against Cassandra 1.2 and 2.1. The testing conditions were naive (localhost, single node, …), but the difference is still huge.

 

10 000 iterations:

- Astyanax: 2734 ms
- Astyanax prepared: 1997 ms
- Datastax: 10230 ms

 

Is it really that slow, or am I missing something?
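One thing worth ruling out before blaming the driver: JVM warm-up and per-iteration setup (building Cluster/Session objects or re-preparing statements inside the loop) can dominate a 10,000-iteration run. A hedged sketch of repeating the run so JIT effects become visible (the jar and class names here are hypothetical stand-ins for the ones in the linked repo):

```shell
# Run the same benchmark several times in one go; if later runs are much
# faster than the first, warm-up rather than the driver is the likely culprit.
for run in 1 2 3; do
  time java -cp benchmark.jar com.example.DatastaxBenchmark
done
```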

 

Thank you for any advice.

Michal

 



Alain RODRIGUEZ | 19 Dec 11:14 2014

Multi DC informations (sync)

Hi guys,

We expanded our cluster to a multi-DC configuration.

Now I am wondering if there is any way to know:

1 - The replication lag between these 2 DCs (OpsCenter, nodetool, other?)
2 - How to make sure that sync is OK at any time

I guess big companies running Cassandra are interested in this kind of info, so I think something exists, but I am not aware of it.

Any other important information or advice about best practices or tricks for running a multi-DC setup (cross-region US <-> EU) is of course welcome!
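For context on what can be measured: Cassandra sends each write to the replicas in all DCs directly, so there is no single "replication lag" counter the way a relational follower has one; pending hinted handoff, dropped mutations, and regular repair are the usual proxies for "is sync OK". A sketch of the usual checks (exact output varies by version; this assumes 1.2+ nodetool):

```shell
nodetool status      # per-DC node state, load, and ownership
nodetool netstats    # in-flight streams between nodes/DCs
nodetool tpstats     # dropped MUTATION counts and hint-related stages
nodetool repair -pr  # scheduled per-node repair keeps the DCs converged
```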

cheers,

Alain
nitin padalia | 19 Dec 10:54 2014

Reset cfhistograms

Hi,
I am using Cassandra 2.1.2 with a 5-node cluster in a single DC.
I've read that histograms are reset after a node restart or a rerun of the command,
but in my case they do not reset on each run.
Could someone point out what the issue might be, or how I could reset them without restarting the node?
Thanks in advance!
-Nitin

diwayou | 19 Dec 06:41 2014

2014 nosql benchmark

I just read this benchmark PDF; does anyone have an opinion on it?
I think it's not fair to Cassandra.
URL: http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
http://msrg.utoronto.ca/papers/NoSQLBenchmark
Leon Oosterwijk | 19 Dec 04:55 2014

Cassandra 2.1.0 Crashes the JVM with OOM with heaps of memory free

All,

 

We have a Cassandra cluster which seems to be struggling a bit. I have one node which crashes continually, and others which crash sporadically. When they crash, it's with a JVM "couldn't allocate memory" error, even though there's heaps available. I suspect it's because of one table which is very big (500 GB) and has on the order of 500K-700K files in its directory. When I deleted the directory contents on the crashing node and ran a repair, the nodes around that node crashed while streaming the data. Here are the relevant bits from the crash file and environment.

 

Any help would be appreciated.

 

#

# There is insufficient memory for the Java Runtime Environment to continue.

# Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.

# Possible reasons:

#   The system is out of physical RAM or swap space

#   In 32 bit mode, the process size limit was hit

# Possible solutions:

#   Reduce memory load on the system

#   Increase physical memory or swap space

#   Check if swap backing store is full

#   Use 64 bit Java on a 64 bit OS

#   Decrease Java heap size (-Xmx/-Xms)

#   Decrease number of Java threads

#   Decrease Java thread stack sizes (-Xss)

#   Set larger code cache with -XX:ReservedCodeCacheSize=

# This output file may be truncated or incomplete.

#

#  Out of Memory Error (os_linux.cpp:2671), pid=1104, tid=139950342317824

#

# JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)

# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode linux-amd64 compressed oops)

# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

#

 

---------------  T H R E A D  ---------------

 

Current thread (0x00007f4acabb1800):  JavaThread "Thread-13" [_thread_new, id=19171, stack(0x00007f48ba6ca000,0x00007f48ba70b000)]

 

Stack: [0x00007f48ba6ca000,0x00007f48ba70b000],  sp=0x00007f48ba709a50,  free space=254k

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

V  [libjvm.so+0xa76cea]  VMError::report_and_die()+0x2ca

V  [libjvm.so+0x4e52fb]  report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x8b

V  [libjvm.so+0x8e4ec3]  os::Linux::commit_memory_impl(char*, unsigned long, bool)+0x103

V  [libjvm.so+0x8e4f8c]  os::pd_commit_memory(char*, unsigned long, bool)+0xc

V  [libjvm.so+0x8dce4a]  os::commit_memory(char*, unsigned long, bool)+0x2a

V  [libjvm.so+0x8e33af]  os::pd_create_stack_guard_pages(char*, unsigned long)+0x7f

V  [libjvm.so+0xa21bde]  JavaThread::create_stack_guard_pages()+0x5e

V  [libjvm.so+0xa29954]  JavaThread::run()+0x34

V  [libjvm.so+0x8e75f8]  java_start(Thread*)+0x108

C  [libpthread.so.0+0x79d1]

 

 

Memory: 4k page, physical 131988232k(694332k free), swap 37748728k(37748728k free)

 

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.20-b23) for linux-amd64 JRE (1.8.0_20-b26), built on Jul 30 2014 13:13:52 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

 

time: Fri Dec 19 14:37:29 2014

elapsed time: 2303 seconds (0d 0h 38m 23s)

 

OS:Red Hat Enterprise Linux Server release 6.5 (Santiago)

 

uname:Linux 2.6.32-431.5.1.el6.x86_64 #1 SMP Fri Jan 10 14:46:43 EST 2014 x86_64

libc:glibc 2.12 NPTL 2.12

rlimit: STACK 10240k, CORE 0k, NPROC 8192, NOFILE 65536, AS infinity

load average:4.18 4.79 4.54

 

/proc/meminfo:

MemTotal:       131988232 kB

MemFree:          694332 kB

Buffers:          837584 kB

Cached:         51002896 kB

SwapCached:            0 kB

Active:         93953028 kB

Inactive:       32850628 kB

Active(anon):   70851112 kB

Inactive(anon):  4713848 kB

Active(file):   23101916 kB

Inactive(file): 28136780 kB

Unevictable:           0 kB

Mlocked:               0 kB

SwapTotal:      37748728 kB

SwapFree:       37748728 kB

Dirty:             75752 kB

Writeback:             0 kB

AnonPages:      74963768 kB

Mapped:           739884 kB

Shmem:            601592 kB

Slab:            3460252 kB

SReclaimable:    3170124 kB

SUnreclaim:       290128 kB

KernelStack:       36224 kB

PageTables:       189772 kB

NFS_Unstable:          0 kB

Bounce:                0 kB

WritebackTmp:          0 kB

CommitLimit:    169736960 kB

Committed_AS:   92208740 kB

VmallocTotal:   34359738367 kB

VmallocUsed:      492032 kB

VmallocChunk:   34291733296 kB

HardwareCorrupted:     0 kB

AnonHugePages:  67717120 kB

HugePages_Total:       0

HugePages_Free:        0

HugePages_Rsvd:        0

HugePages_Surp:        0

Hugepagesize:       2048 kB

DirectMap4k:        5056 kB

DirectMap2M:     2045952 kB

DirectMap1G:    132120576 kB

 

Before you say it's a ulimit issue:

[501]> ulimit -a

core file size          (blocks, -c) 0

data seg size           (kbytes, -d) unlimited

scheduling priority             (-e) 0

file size               (blocks, -f) unlimited

pending signals                 (-i) 1030998

max locked memory       (kbytes, -l) 64

max memory size         (kbytes, -m) unlimited

open files                      (-n) 8192

pipe size            (512 bytes, -p) 8

POSIX message queues     (bytes, -q) 819200

real-time priority              (-r) 0

stack size              (kbytes, -s) 10240

cpu time               (seconds, -t) unlimited

max user processes              (-u) 8192

virtual memory          (kbytes, -v) unlimited

file locks                      (-x) unlimited

 

Here’s the filecount on one of the nodes for this very big table:

loosterw@NODE:/env/datacache/data/cassandra/data/datastore/bigtable-e58925706a3c11e4ba63adfbd009c4d6

> ls | wc -l

588636
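One hedged guess given those numbers: a native mmap failure while ~127 GB of RAM is mostly cached (not exhausted), combined with ~590k files in a single table directory, is consistent with hitting the kernel's per-process mapping limit rather than actually running out of memory. Cassandra mmaps SSTable components, and RHEL 6 defaults vm.max_map_count to 65530:

```shell
# Check the current per-process mmap limit:
sysctl vm.max_map_count

# Raise it (value from common Cassandra guidance; persist it in
# /etc/sysctl.conf to survive reboots):
sudo sysctl -w vm.max_map_count=1048575
```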

 

Thanks,

 

Leon

 

 


 

Rajath Subramanyam | 19 Dec 00:08 2014

Practical use of counters in the industry

Hi Folks, 

Have any of you come across blogs that describe how companies in the industry are using Cassandra counters in practice?

Thanks in advance. 

Regards,
Rajath
------------------------
Rajath Subramanyam

Y.Wong | 18 Dec 22:58 2014

Re: full gc too often

On Dec 4, 2014 11:14 PM, "Philo Yang" <ud1937@gmail.com> wrote:
Hi all,

I have a cluster on C* 2.1.1 and JDK 1.7u51. I am having trouble with full GCs: sometimes one or two nodes will run a full GC more than once per minute, taking over 10 seconds each time; the node then becomes unreachable and cluster latency increases.

I grepped the GCInspector log, and when a node is running fine without GC trouble there are two kinds of GC: 
ParNew GC in less than 300ms, which clears Par Eden Space and slightly grows CMS Old Gen / Par Survivor Space (since only GCs longer than 200ms are logged, there are only a few ParNew GCs in the log);
ConcurrentMarkSweep in 4000~8000ms, which shrinks CMS Old Gen a lot and slightly grows Par Eden Space; it runs once every 1-2 hours.

However, sometimes ConcurrentMarkSweep behaves strangely, like this:

INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0

Each time, Old Gen shrinks only a little and Survivor Space is cleared, but the heap stays full, so another full GC follows very soon and then the node goes down. If I restart the node, it runs fine without GC trouble. 

Can anyone help me find out why the full GCs can't reduce CMS Old Gen? Is it because there are too many objects in the heap that can't be collected? I think reviewing the table schema design and adding new nodes to the cluster is a good idea, but I still want to know if there is any other reason causing this trouble.
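The pattern in those GCInspector lines can be quantified; a minimal, self-contained sketch that extracts the CMS Old Gen before/after values to show how little each full GC reclaims (one sample line from the log above is inlined for illustration):

```shell
# Pull the "before -> after" CMS Old Gen bytes out of each
# ConcurrentMarkSweep line and print the amount reclaimed.
awk -F'CMS Old Gen: ' '/ConcurrentMarkSweep/ {
    split($2, a, " -> ")   # a[1] = bytes before, a[2] = "after; Par Eden ..."
    sub(/;.*/, "", a[2])   # trim everything after the first ";"
    printf "reclaimed %d bytes\n", a[1] - a[2]
}' <<'EOF'
INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
EOF
# → reclaimed 1952 bytes
```

A 12-second collection that frees ~2 KB of a ~3.5 GB old generation is the signature of a heap that is effectively full of live objects.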

Thanks,
Philo Yang

Or Sher | 18 Dec 16:17 2014

Replacing nodes disks

Hi all,

We have a situation where some of our nodes have smaller disks, and we would like to align all nodes by swapping the smaller disks for bigger ones without replacing the nodes. 
We don't have enough space to put the data on the / disk and copy it back to the bigger disks, so we would like to rebuild each node's data from the other replicas.

What do you think should be the procedure here?

I'm guessing it should be something like this, but I'm pretty sure it's not enough:
1. Shut down the C* node and server.
2. Replace the disks + recreate the same vg, lv, etc.
3. Start C* (normally?).
4. nodetool repair/rebuild?
*I think I might get some consistency issues for use cases relying on quorum reads and writes for strong consistency.
What do you say?
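For what it's worth, those steps could be sketched like this (service names are assumptions; and until the repair finishes, the node answers reads from empty data directories, so the quorum concern is real):

```shell
nodetool drain                 # flush memtables and stop accepting writes
sudo service cassandra stop
# ... swap the disks, recreate vg/lv and filesystems, restore the
#     data/commitlog directory layout and ownership ...
sudo service cassandra start   # rejoins with its old token, data dirs empty
nodetool repair                # stream this node's ranges back from replicas
```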

Another question (I know it depends on many factors, but I'd like to hear an experienced estimate): how much time would it take to rebuild a node with 250G of data?

Thanks in advance,
Or.

--
Or Sher
