DE VITO Dominique | 1 Aug 11:53 2014

question about commitlog segments and memlocking

Hi,

 

CLibrary.tryMlockall() is called at the very beginning of Cassandra's setup() method.

So the heap space is locked in memory (if the OS permissions allow it).

 

mlockall() is called with MCL_CURRENT: "MCL_CURRENT: Lock all pages currently mapped into the process's address space."

 

So, as far as I understand, the commitlog segments (and other off-heap structures) are NOT locked in memory and may be swapped out.

 

Is that your understanding as well?

 

If so, why not use mlockall(MCL_FUTURE) instead, or call mlockall() again after the commitlog segments are allocated?
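One way to observe the effect described above on Linux is to read the VmLck field of /proc/&lt;pid&gt;/status, which reports how much of a process's memory is actually locked. The sketch below is illustrative, not Cassandra code; the parser and the sample status excerpt are assumptions for demonstration:

```python
# Sketch: parse the VmLck line from a /proc/<pid>/status dump.
# With MCL_CURRENT, only pages mapped at the time of the mlockall() call
# (e.g. the heap) count toward VmLck; memory mapped later, such as
# commitlog segments, would not be included.

def parse_vmlck_kb(status_text):
    """Return the VmLck value (in kB) from a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            return int(line.split()[1])
    return 0

# Made-up excerpt: RSS is larger than the locked size, consistent with
# off-heap allocations that are resident but not locked.
sample = "VmPeak:\t 8388608 kB\nVmLck:\t 4194304 kB\nVmRSS:\t 6291456 kB"
print(parse_vmlck_kb(sample))  # 4194304
```

Comparing VmLck against the process RSS before and after commitlog segments are written would show whether they are covered by the lock.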

 

Thanks.

 

Regards,

Dominique

 

KZ Win | 1 Aug 11:46 2014

how do i know if nodetool repair is finished

I have a 2-node Apache Cassandra (2.0.3) cluster with a replication factor of 1. I changed the replication factor to 2 using the following command in cqlsh:

ALTER KEYSPACE "mykeyspace" WITH REPLICATION =   { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

I then ran the recommended "nodetool repair" after this ALTER.

The problem is that this command sometimes finishes very quickly. When it finishes like that, it normally says 'Lost notification...' and exits with a non-zero code.

So I just repeat 'nodetool repair' until it finishes without error. I also check that 'nodetool status' reports the expected disk space for each node (with replication factor 1, each node holds about 7GB, so after repair I expect about 14GB each, assuming no cluster activity in the meantime).

Is there a more correct way to determine that 'nodetool repair' has finished in this case?
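The retry loop described above can be separated into a small decision function: a repair run counts as finished only when the exit code is zero and the output does not contain the "Lost notification" message. This is a sketch of that logic only; wiring it to subprocess calls of `nodetool repair` is omitted since it needs a live cluster:

```python
# Sketch: decide whether a 'nodetool repair' invocation should be retried,
# based on its exit code and captured output, as described in the post.

def repair_needs_retry(exit_code, output):
    """True if the repair run should be considered failed and re-run."""
    return exit_code != 0 or "Lost notification" in output

print(repair_needs_retry(0, "Repair command #1 finished"))         # False
print(repair_needs_retry(1, ""))                                   # True
print(repair_needs_retry(0, "Lost notification. You should check"))  # True
```

Note that 'Lost notification' only means the JMX notification stream was dropped, not necessarily that the repair itself failed; checking the server logs for the repair session is the more reliable signal.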

Rahul Neelakantan | 31 Jul 22:44 2014

Authentication for Mx4j

Does anyone know how to enable basic authentication for MX4J with Cassandra? MX4J supports it, but I'm not sure how to pass the variables to enable it. I was able to set the listen address and port for the HTTP server, but I can't get authentication to work.

Rahul Neelakantan
Philo Yang | 31 Jul 19:44 2014

select many rows one time or select many times?

Hi all,

I have a 2.0.6 cluster, and one of my tables looks like this:
CREATE TABLE word (
  user text,
  word text,
  flag double,
  PRIMARY KEY (user, word)
)

Each "user" has about 10000 "word" rows per node. I need to select all rows where user='someuser' and word is in a large set of about 1000 words.

The C* documentation recommends against "SELECT ... IN" queries like:

select * from word where user='someuser' and word in ('a','b','aa','ab',...);

So for now I select all rows where user='someuser' and filter them on the client side rather than in C*. Of course, I use the DataStax Java Driver to page the result set with setFetchSize(1000). Is this the best way? I found that the system's load is high because of the large range query; should I switch to selecting one row at a time, 1000 times?

just like:
select * from word where user='someuser' and word = 'a';
select * from word where user='someuser' and word = 'b';
select * from word where user='someuser' and word = 'c';
.....

Which method puts less pressure on the Cassandra cluster?
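The client-side approach above can be sketched in a few lines: page through all rows of the partition (as the driver does with setFetchSize) and keep only the wanted words. With the wanted words in a set, each membership test is O(1), so filtering ~10000 rows is cheap compared to issuing 1000 separate point queries. The data below is made up; only the shape mirrors the schema in the post:

```python
# Sketch: client-side filtering of one user's partition.
# rows stands in for the paged result set of
#   SELECT * FROM word WHERE user = 'someuser'
rows = [("someuser", "word%d" % i, float(i)) for i in range(10000)]

# ~1000 words the client actually cares about; a set gives O(1) lookups.
wanted = {"word%d" % i for i in range(0, 2000, 2)}

filtered = [(u, w, f) for (u, w, f) in rows if w in wanted]
print(len(filtered))  # 1000
```

The trade-off: one partition scan costs a single (larger) request coordinated on one replica, while 1000 point queries cost 1000 round trips but each touches only one row.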

Thanks,
Philo Yang

Mark Reddy | 31 Jul 12:30 2014

Re: `system` keyspace replication

Hi Jens,

The system keyspace is configured with LocalStrategy, a strategy that stores data only on the local node. It is reserved for internal use and is also used for other things such as secondary indexes.

You cannot change the replication factor of the 'system' keyspace. If you attempt to do so, the command will not succeed and will return the following message: "Bad Request: Cannot alter system keyspace"


Mark


On Thu, Jul 31, 2014 at 10:58 AM, Jens Rantil <jens.rantil <at> tink.se> wrote:
Hi,

Datastax has a documentation page about configuring replication[1]. It mentions a couple of system keyspaces that they recommend increasing replication for. However, it does not mention the `system` keyspace.

Question: Is it recommended to increase replication factor for the `system` keyspace for production system?


Thanks,
Jens

Jens Rantil | 31 Jul 11:58 2014

`system` keyspace replication

Hi,

Datastax has a documentation page about configuring replication[1]. It mentions a couple of system keyspaces that they recommend increasing replication for. However, it does not mention the `system` keyspace.

Question: Is it recommended to increase replication factor for the `system` keyspace for production system?


Thanks,
Jens
Akshay Ballarpure | 31 Jul 08:45 2014

Cassandra - Pig integration

Hello,
I am trying to integrate Cassandra with Hadoop and Pig, and to load a CSV file into Cassandra using a Pig script. Can someone help?

root <at> hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# cat pigCasandra.pig
data = LOAD 'example.csv' using PigStorage(',') AS (row_id: chararray, value1: chararray, value2: int);
data_to_insert = FOREACH data GENERATE TOTUPLE( TOTUPLE('row_id',row_id) ), TOTUPLE(value1, value2);
STORE data_to_insert INTO 'cql://myschema/example?output_query=update example set value1 <at> #,value2 <at> #' USING CqlStorage();



root <at> hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# /home/hduser/pig/pig-0.13.0/bin/pig pigCasandra.pig
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-31 17:38:00,078 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-31 17:38:00,078 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/yarn/hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/apache-cassandra-2.0.9/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hduser/yarn/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-07-31 17:38:00,255 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-07-31 17:38:00,398 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-31 17:38:00,484 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-31 17:38:00,484 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:00,484 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2014-07-31 17:38:01,431 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,557 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,609 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/datastax/driver/core/policies/LoadBalancingPolicy
Details at logfile: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarpure <at> tcs.com
Website: http://www.tcs.com

Rahul Neelakantan | 31 Jul 03:04 2014

Question about Vnodes

Given the issue with repairs and vnodes (currently expected to be fixed in the 3.0 release), I am considering reducing the number of tokens per node. One of my clusters has 8 nodes with 256 tokens per node. The main keyspace on it has 40+ column families, and nodetool repair takes extremely long to complete on that keyspace.

I have two questions about this:
1) Is there a way or formula to determine an appropriate number of tokens per node? I am considering going as low as 32 or even 16 tokens per node.
2) What is the recommended procedure for reducing the number of vnodes? Do I just reduce it in cassandra.yaml and start removing tokens via nodetool?

Rahul Neelakantan
ankit tyagi | 30 Jul 14:50 2014

Error while converting data from sstable to json with sstable2json

Hi,

I am using sstable2json to convert data from an sstable into JSON. It gives me data in the format below:

{"key": "000d5549443030303030303738353000000639323765616400000a524541445355444f303100","columns": [["1406126067358:8:","",1406126067369000], ["1406126067358:8:errormessage","53cfc7f3",1406126067369000,"d"], ["1406126067358:8:value","VENDOR_PRODUCT_PERSISTENCE_COMPLETED",1406126067369000], ["1406126217012:14:","",1406126260468000], ["1406126217012:14:errormessage","UID0000007850for vendorCode : 927ead\t and vsku : READSUDO01\t is overridden by uploadid : UID0000007851",1406126260468000], ["1406126217012:14:value","VENDOR_PRODUCT_OVERWRITTEN",1406126260468000]]}

The key is given in hexadecimal form. When I convert it to ASCII, the key gives me the string below:
UID0000007850[0][0][6]927ead[0][0][10]READSUDO01[0]

But when I retrieve the same row with cassandra-cli, it shows this row key:
UID0000007850:927ead:READSUDO01

I want to know which encoding is used to store the data in Cassandra, and how I can decode this hexadecimal string into a proper ASCII string the way cassandra-cli does.
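The key above looks like a CompositeType row key: each component is encoded as a 2-byte big-endian length, the component bytes, then a 0x00 end-of-component byte, and cassandra-cli joins the decoded components with ':'. A minimal decoder sketch under that assumption (it also assumes the components are plain ASCII):

```python
# Sketch: decode a Cassandra CompositeType key from its hex form.
# Layout per component: [2-byte length][bytes][0x00 end-of-component]

def decode_composite(hex_key):
    data = bytes.fromhex(hex_key)
    parts, i = [], 0
    while i < len(data):
        length = int.from_bytes(data[i:i + 2], "big")  # 2-byte length
        i += 2
        parts.append(data[i:i + length].decode("ascii"))
        i += length + 1  # skip component bytes and the 0x00 end byte
    return parts

# The key from the sstable2json output above:
key = ("000d5549443030303030303738353000"
       "000639323765616400"
       "000a524541445355444f303100")
print(":".join(decode_composite(key)))  # UID0000007850:927ead:READSUDO01
```

This matches what cassandra-cli prints, and it also explains the [0][0][6]-style bytes in the naive ASCII conversion: they are the length prefixes and end-of-component markers.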

Regards,
Ankit Tyagi
 

Parag Patel | 30 Jul 12:41 2014

dropping secondary indexes

Hi,

 

I've noticed that our data model has many unnecessary secondary indexes. Is there a recommended procedure for dropping a secondary index on a very large table? Is there any sort of repair/cleanup that should be done after running the DROP command?

 

Thanks,

Parag

Parag Patel | 30 Jul 04:15 2014

bootstrapping new nodes on 1.2.12

Hi,

 

It's taking a while to bootstrap a 13th node into a 12-node cluster. The average node size is about 1.7TB. At the beginning of today we were close to 0.9TB on the new node, and 12 hours later we're at 1.1TB. I figured it would have finished by now, because when I looked in OpsCenter there were 2 transfers remaining: one at 0% and the other at 2%. Looking again now, those same transfers haven't progressed all day. Instead I see 9 more transfers (some of which are progressing).

 

1)      Would anyone be able to help me interpret this information from OpsCenter?

2)      Is there anything I can do to speed this up?
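A back-of-the-envelope check of the numbers in the post: the new node went from about 0.9TB to 1.1TB in 12 hours; at that observed rate, streaming up to the 1.7TB average node size would take roughly another day and a half. All figures are the poster's estimates, not measurements, and the new node may well settle below the average:

```python
# Sketch: estimate remaining bootstrap time from the observed stream rate.
streamed_tb = 1.1 - 0.9                      # data received in the window
hours = 12
rate_tb_per_h = streamed_tb / hours          # ~0.017 TB/h observed
remaining_tb = 1.7 - 1.1                     # assuming it reaches the average
print(round(remaining_tb / rate_tb_per_h))   # ~36 hours
```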

 

Thanks,

Parag

 

