Zhang, Charles | 3 May 20:09 2016

MigrationManager.java:164 - Migration task failed to complete

I have seen a bunch of these errors in the log files of some newly joined nodes. A Google search suggests that increasing the countdown latch timeout can solve the problem, but I assume that only helps future nodes when they join? Does anything need to be done for the existing nodes?
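For reference, the knob those search results point at seems to be a JVM property set in cassandra-env.sh (the property name below is what the posts claim MigrationTask reads for its countdown latch; I have not verified it against the source):

# unverified: raise the schema-migration wait (in seconds; the default is reportedly 1)
JVM_OPTS="$JVM_OPTS -Dcassandra.migration_task_wait_in_seconds=5"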

Zhang, Charles | 3 May 19:09 2016

Decommissioned node shows up in the gossip log

I decommissioned a seed node from the cluster; it no longer shows up in “nodetool status”, but it still shows up in the log files. The steps I took to decommission it were: first I added another node to the seed list and removed this node from it, then restarted all nodes; after that I ran “nodetool decommission” on the node (which by then was no longer a seed, since I had deleted it from the seed list). Now when I add a node to the cluster, I see entries like this in the log file:


INFO  [GossipStage:1] 2016-05-03 12:47:11,010 Gossiper.java:1010 - InetAddress /10.240.131.52 is now DOWN

INFO  [GossipStage:1] 2016-05-03 12:47:11,013 StorageService.java:2207 - Removing tokens [-1022524973634126820, -1080110197540691971, -1091685779109535219, -1215861550002479909, -1230125663037298110, -1276347863826680388, -1277321492456111075, -1301927496688170836, -1318545134739242389, -1322771136875660408, -1378340458058304278, -1408100110415631237, -1530631940375897404, -1535071782926085329, -1582947345179885144, -16012799486150337, -1696192817763838199, -1728191989986713155, -1745583338734596530, -1826663257009694585, -1851642471126916817, -1857538886064358546, -1901715377875763790, -1932177981591882421, -199072269576364247, -207608666599084705, -2184481684522309754, -2204020999427045206, -2205085893777233579, -2270561486823228034, -2308298192046286704, -235429323779874179, -2463024446587312335, -2495920187667193798, -2500756743826875052, -2506627651269234078, -2514401267607374003, -2545039589436377179, -2634503648039799520, -2646537028163124225, -2683738978574645155, -2807184187110003759, -2867402698030280784, -3114050892122318894, -3206044372292017688, -3281321558107483648, -3385749857713245200, -3416161302882661647, -3506129882816214786, -3513065641380195991, -3535041040510847711, -3613958158418786329, -3648155032806486758, -3665748511643959214, -381185030151387503, -3897484925082811126, -3938608392271975389, -3941659677176701884, -3947462638823321096, -4000556449482859203, -4110196499985462141, -4158458444329621178, -4227086524280114158, -4243909747517318822, -4280784900694217230, -4404716672756332652, -4455430621398252312, -4538061491874241556, -4572589909790935797, -4850346756544186378, -4952456886854152424, -5084536945116622949, -5326752651357993429, -5349130774372022115, -5460653359296793773, -5620302193040734534, -5623709120594154839, -5885389525591890638, -5949589155008107907, -5966013525235445845, -6026141079407570616, -6058868359575464435, -6077630467595631247, -6077845078374374077, -6138431646664499820, -6180882520195395456, -6243210077810081945, -6370284996672456580, -6415049304495129802, -6502707320839506073, -6803145485914681660, -6805877741054487708, -6926104547940671062, -6951403149841557452, -6953035076415327031, -6967464526766340870, -7011060314535843861, -7064863849614017322, -714981435304170064, -7157502981616220702, -7258779025703812914, -7360270914479075946, -73610861802671316, -7563240572942157514, -7635081293483408701, -7798006211363305085, -7813955538719900941, -786373677098356483, -8145051135140174940, -8176945151026682869, -8190382813904504028, -8272254174290282022, -8326684320313230680, -8481234483187517136, -8482289498657787899, -8565844176707815996, -8569487821947184157, -8798368038472556220, -8809438431651729380, -8813359577419549606, -8869977551074344265, -8895973234315752820, -8907908280791280135, -8908127052185996645, -8935928854496370061, -9015683177087077934, -9048056285157467189, -9110697723266307118, -981226497198399669, 1193342841401851448, 1230054248428394457, 1502987728312107227, 1637579501555626797, 1662468220619718905, 1679119062145242310, 1716958612112885132, 1746111140545268100, 1831765078257867905, 1837630369692254413, 1866757535234365933, 1980430493801992453, 2162782596967223630, 2267937126038851741, 2380570386757196663, 2445644030436787439, 2470624736744405884, 2535946784572437079, 2586884334596326732, 261609565442228971, 2807653039695258495, 2820914298989266652, 2850900328012773185, 2856965311103931071, 2878299073894089909, 293663570529846026, 2956982251255851218, 3004636240207923383, 3101507369252962007, 3127101442974941655, 
3158812977721178300, 3171231578612039887, 3195192564355682313, 3336607934515052432, 3343357098855461268, 3353820331372189514, 34251354331737275, 3436913366576384630, 3548175524364212676, 3848342826573950162, 3851303454317024860, 3952857562650343132, 3990151331571689362, 4018170568179076035, 4112557911914814333, 4226750948868360216, 4279483569511255118, 4294178225463448329, 4309249284353877565, 4421348708243454111, 4421511311707699441, 4452305848308117645, 4593652086244012593, 4738143394265363235, 4833752378621606367, 4875304341660013392, 4884031847996493712, 4976496196225414573, 5031371703499035791, 5127384657519231924, 5139891720119780283, 5314562593199080390, 5316612914890745425, 5387115956703859037, 5444853543757322231, 5522844729162057439, 5671052167577743873, 5756163457846314224, 5767535292654078271, 5768242158938098694, 5808105469522749152, 5834119627896273386, 5868847068643124331, 5894523213718053689, 5901084982008030187, 5969133132258951572, 5992456767231788605, 6068249796826621917, 6088196406347452820, 6130398062904147944, 626484591742388608, 6308434312037484469, 6339160504017876713, 6355666313243093645, 6562282736963199813, 6565521999605813437, 6576341382108600989, 6594910241855950023, 6657010037537452225, 6732264829847660872, 6750814542377717658, 6752841711698715847, 6753878602599511786, 676331118332583504, 6998945362730848079, 7084036693238869952, 7123724899150065453, 7196754593644784128, 7228947848951386141, 7307770750307662007, 7423504143497361658, 7459372710602518943, 7461980712111448886, 7490521993083116151, 7533000471778287703, 7539331611294816376, 7575124648716146082, 7588493168230482795, 7745663537548315506, 7755819622826291311, 7811819103680710573, 7837217903334494952, 7910337028040730748, 8035987416996982100, 8141008003319228604, 8460970302985166145, 8704724839068075193, 8708181464490295602, 8719281186367503476, 8743407730212179670, 8782996992749946235, 8848834787326682329, 8913871967919712952, 9044250345847868181, 9184734792478964742, 955142821741666815, 998590089428154995] for /10.240.131.52


10.240.131.52 is the node I decommissioned and now it does not show up in “nodetool status”…


Assuming this is an issue, what does it impact, and how should I fix it?
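If it does need cleaning up, the only mechanism I have found so far is force-purging the endpoint from gossip over JMX. A rough, unverified sketch (I am assuming the Gossiper MBean name and the unsafeAssassinateEndpoint operation, which newer "nodetool assassinate" reportedly wraps; 10.240.131.50 is a placeholder for any live node):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PurgeGossipState {
    public static void main(String[] args) throws Exception {
        // connect to the JMX port (7199 by default) of any live node
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.240.131.50:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // assumed MBean name for the gossiper
            ObjectName gossiper = new ObjectName("org.apache.cassandra.net:type=Gossiper");
            // force-remove the decommissioned node's state from gossip
            mbs.invoke(gossiper, "unsafeAssassinateEndpoint",
                       new Object[] { "10.240.131.52" },
                       new String[] { String.class.getName() });
        }
    }
}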


Charles

Kai Wang | 3 May 18:47 2016

Bloom filter memory usage disparity

Hi,

I have a table on a 3-node cluster, and I noticed that bloom filter memory usage is very different on one of the nodes. For the table in question, I checked CassandraMetricsRegistry$JmxGauge.[table]_BloomFilterOffHeapMemoryUsed.Value: two of the three nodes show 1.5 GB, while the third shows 2.5 GB.

What could be the reason?

The table uses LCS, with bloom_filter_fp_chance = 0.1.
It has about 16M keys and 140 GB of data.
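For reference, this is roughly how I sample the gauge on each node (a sketch with placeholder hostnames, keyspace and table; it assumes the 2.x metrics MBean naming with type=ColumnFamily, which newer versions rename to type=Table):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BloomFilterGaugeCheck {
    public static void main(String[] args) throws Exception {
        for (String host : new String[] { "node1", "node2", "node3" }) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // assumed metric name; myks/mytable are placeholders
                ObjectName gauge = new ObjectName(
                        "org.apache.cassandra.metrics:type=ColumnFamily,"
                        + "keyspace=myks,scope=mytable,name=BloomFilterOffHeapMemoryUsed");
                System.out.println(host + ": " + mbs.getAttribute(gauge, "Value"));
            }
        }
    }
}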

Thanks.
Oleksandr Petrov | 3 May 10:21 2016

Re: tombstone_failure_threshold being ignored?

If I understand the problem correctly, tombstone_failure_threshold is never reached because the ~2M objects may have been collected by different queries running in parallel, not by one query: no single query ever reached the threshold, although together they contributed to the OOM.

You can read a bit more about the anti-patterns (particularly the ones related to workloads that generate lots of tombstones): http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

You can also try running repairs/compactions more frequently. That said, I'd look more closely at the read queries first, possibly with tracing on, and check how many of them run in parallel. You could also decrease the warn threshold for tombstones to understand where the bounds are.
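For reference, both thresholds live in cassandra.yaml; the values below are the defaults, if I remember them correctly:

# cassandra.yaml
tombstone_warn_threshold: 1000       # log a warning when one query scans this many tombstones
tombstone_failure_threshold: 100000  # abort the query with TombstoneOverwhelmingException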

On Thu, Apr 28, 2016 at 7:23 PM Rick Gunderson <rgunderson <at> ca.ibm.com> wrote:
We are running Cassandra 2.2.3, 2 data centers, 3 nodes in each. The replication factor per datacenter is 3. The Xmx setting on the Cassandra JVMs is 4GB.

We have a workload that generates lots of tombstones, and Cassandra goes OOM in about 24 hours. We've lowered tombstone_failure_threshold to 25000, but we never see a TombstoneOverwhelmingException before the nodes start going OOM.

The table operation that looks to be the culprit is a scan of partition keys (i.e. we are scanning across narrow rows, not scanning within a wide row). The heap dump shows a RangeSliceReply containing an ArrayList with 1,823,230 org.apache.cassandra.db.Row objects and a retained heap size of 441 MiB. A look inside one of the Row objects shows an org.apache.cassandra.db.DeletionInfo object, so I assume that means the row has been tombstoned.

If all of the 1,823,230 Row objects are tombstoned (and it is likely that most of them are), is there a reason that the TombstoneOverwhelmingException never gets thrown?



Regards,

Rick (R.) Gunderson
Software Engineer
IBM Commerce, B2B Development - GDHA
Phone: 1-250-220-1053
E-mail: rgunderson <at> ca.ibm.com


1803 Douglas St
Victoria, BC V8T 5C3
Canada


--
Alex
Anubhav Kale | 2 May 18:55 2016

SS Table File Names not containing GUIDs

Hello,


I am wondering whether there is any reason the SSTable file name format doesn't include a GUID. As far as I can tell, the incrementing generation number isn't used for any special purpose in the code, and having a truly unique file name seems better in general.


Specifically, this causes some inconvenience when restoring snapshots. Ideally, I would like to restore just the system* keyspaces and boot the node. Then, once the node is taking live traffic, copy the SSTables over and do a DSE restart at the end to load the old data.


The problem is that it is possible to overwrite new data with old files if the file names match. I can't just rename a snapshotted file to some huge generation number, because as soon as that file is copied over, C* will use that number in its next-generation logic, potentially causing the same collision for the next snapshotted file.


How do people usually tackle this? Is there some easy solution that I am not seeing?
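One workaround I have been considering (not verified end to end) is to avoid copying files into the data directory altogether and instead stream the snapshot back with sstableloader, so the receiving node writes fresh generation numbers:

sstableloader -d <live_node_address> /path/to/snapshot/<keyspace>/<table>/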


Thanks!

Corry Opdenakker | 2 May 18:04 2016

In memory code and query executions

Hi all,

Is it possible to execute queries against an embedded Cassandra DB while completely bypassing the TCP (or IPC) protocol stack?
Apparently the embedded Cassandra is by default accessed using localhost as the hostname, which I assume results in an IPC-optimized connection.
Is there a way to fully omit the TCP/IPC stack and execute queries directly in-memory against the Cassandra database, preferably with a zero-copy (query result set to app code) approach?
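The closest thing I have found so far is Cassandra's internal CQL entry point. A rough, unverified sketch of what I mean (it assumes QueryProcessor.executeInternal is safe to call from application code, and my understanding is that it executes against the local node only; myks.mytable is a placeholder):

import org.apache.cassandra.cql3.QueryProcessor;
import org.apache.cassandra.cql3.UntypedResultSet;
import org.apache.cassandra.service.CassandraDaemon;

public class EmbeddedQuery {
    public static void main(String[] args) {
        // boot the embedded node inside this JVM
        CassandraDaemon daemon = new CassandraDaemon();
        daemon.activate();

        // no native protocol, no socket: call the CQL layer directly
        UntypedResultSet rs = QueryProcessor.executeInternal(
                "SELECT key, value FROM myks.mytable WHERE key = ?", "some-key");
        for (UntypedResultSet.Row row : rs)
            System.out.println(row.getString("key") + " -> " + row.getString("value"));
    }
}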

Cheers, C.
DuyHai Doan | 28 Apr 15:34 2016

[Announcement] Achilles 4.2.0 released

Hello all

 I am pleased to announce the release of Achilles 4.2.0.

 The biggest change is support for type-safe function calls in the SELECT DSL, as well as UDF/UDA declarations in Achilles.

 The generated DSL code enforces the type of each function call, so that the parameter and return types of each function match those of the column.


Regards

Duy Hai DOAN
Siddharth Verma | 27 Apr 19:41 2016

Query regarding spark on cassandra

Hi,
I don't know if anyone has faced this problem or not.
I am running a job where some data is loaded from a Cassandra table. From that data, I build some insert and delete statements and execute them (using forEach).

Code snippet:
// reuse a single session instead of opening a new one per statement
try (Session session = connector.openSession()) {
    boolean deleteStatus = session.execute(delete).wasApplied();
    boolean insertStatus = session.execute(insert).wasApplied();
    System.out.println(delete + ":" + deleteStatus);
    System.out.println(insert + ":" + insertStatus);
}

When I run it locally, I see the expected results in the table.

However, when I run it on a cluster, sometimes the changes show up and sometimes they don't. I checked the stdout in the Spark web UI, and both queries were printed along with "true".

I can't understand what the issue could be.

Any help would be appreciated.

Thanks,
Siddharth Verma
Jimmy Lin | 27 Apr 18:14 2016

how expensive is light weight transaction: if not exists

Hi all,
We'd like to consider using lightweight transactions along these lines:
BEGIN BATCH
  UPDATE table SET x = y WHERE id = A IF NOT EXISTS;
  UPDATE table SET x = y WHERE id = B IF NOT EXISTS;
  UPDATE table SET x = y WHERE id = C IF NOT EXISTS;
  UPDATE table SET x = y WHERE id = D IF NOT EXISTS;
APPLY BATCH;
(using LOCAL_QUORUM)
I know there is a lot going on behind Cassandra's lightweight transactions; roughly how much overhead does "if not exists" add?
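For what it's worth, my understanding (happy to be corrected) is that IF NOT EXISTS is the INSERT form rather than UPDATE, and that a conditional batch must stay within a single partition, so the sketch above would need to look more like this, with A-D as rows under one partition key:

-- hypothetical schema: CREATE TABLE t (pk text, id text, x text, PRIMARY KEY (pk, id));
BEGIN BATCH
  INSERT INTO t (pk, id, x) VALUES ('p1', 'A', 'y') IF NOT EXISTS;
  INSERT INTO t (pk, id, x) VALUES ('p1', 'B', 'y') IF NOT EXISTS;
  INSERT INTO t (pk, id, x) VALUES ('p1', 'C', 'y') IF NOT EXISTS;
  INSERT INTO t (pk, id, x) VALUES ('p1', 'D', 'y') IF NOT EXISTS;
APPLY BATCH;
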
Robert Sicoie | 27 Apr 15:21 2016

MX4J support broken in cassandra 3.0.5?

Hi guys,

I'm upgrading from Cassandra 2.1 to Cassandra 3.0.5, and MX4J support
seems to be broken: an empty HTML page is served:

 > GET / HTTP/1.1
 > Host: localhost:8081
 > User-Agent: curl/7.43.0
 > Accept: */*
 >
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< expires: now
< Server: MX4J-HTTPD/1.0
< Cache-Control: no-cache
< pragma: no-cache
< Content-Type: text/html

This is what I have in cassandra-env.sh
...
MX4J_PORT="-Dmx4jport=8081"
...
And the mx4j-tools.jar is in place.
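For completeness, the stock cassandra-env.sh I have also exposes an address knob next to the port one (both commented out by default, if my copy is representative); I have left the address untouched:

# MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"
# MX4J_PORT="-Dmx4jport=8081"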

It worked fine with Cassandra 2.1. Is there a new configuration needed
in 3.0.5?

Any advice?

Thanks,
Robert

Jake Luciani | 26 Apr 16:56 2016

[RELEASE] Apache Cassandra 2.2.6 released

The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.


Downloads of source and binary distributions are listed in our download
section: http://cassandra.apache.org/download/


This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/yCpWu7 (CHANGES.txt)
[2]: http://goo.gl/qktJUS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA

