Atul Saroha | 7 Feb 12:18 2016

Re: How to migrate MySQL to Cassandra - special case in Sqoop + DataStax

Hi Sachin.

We can index a collection (either the keys or the values of a map) and use it in a query. My concern, though, is how to migrate the data into this form from a MySQL table. A multi-threaded script or a Java program is an option, but a Sqoop-based approach would have been better.

Moreover, I am also concerned about this statement from the link below:

Keep collections small to prevent delays during querying because Cassandra reads a collection in its entirety. The collection is not paged internally.




Thanks and Regards
---------------------------------------------------------------------------------------------------------------------
Atul Saroha

Sr. Software Engineer
M: +91 8447784271 T: +91 124-415-6069 EXT: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA

On Sat, Feb 6, 2016 at 5:42 AM, Sachin Uplaonkar <up.sachi <at> gmail.com> wrote:
Hello Atul,


I tried the code below:

/**CREATE TABLE demo.phone (
details MAP<text, text>,
initials text,
time timeuuid,
PRIMARY KEY (initials, time)
);**/

//insert into phone (initials, time, details) values ('su', now(), {'Sachi': '408-797-9600'});
//insert into phone(initials, time, details) values ('su', now(), {'Sam': '555-121-2222'});
//insert into phone (initials, time, details) values ('NG', now(), {'National Geographic': '404-509-2000'});
//create index IF NOT EXISTS map_index on phone (KEYS(details));

select * from phone where details contains key 'Sam';





On Fri, Feb 5, 2016 at 9:47 AM, Sachin Uplaonkar <up.sachi <at> gmail.com> wrote:
Adding to the above:

- Search on Collection types is not yet supported in CQL. [Hence, if we create the table as per your model, we will not be able to query on it]

Refer: http://www.datastax.com/dev/blog/cql3_collections [Section: Things to Know]

On Fri, Feb 5, 2016 at 8:25 AM, Sachin Uplaonkar <up.sachi <at> gmail.com> wrote:
Hello Atul,

I am new to Cassandra, but here is what I tried:

** Not sure if this was intended by you **

CREATE TABLE demo.phone (
details MAP<text, text>,
initials text,
time timeuuid,
PRIMARY KEY (initials, time)
);

insert into phone (initials, time, details) values ('su', now(), {'Sachi': '408-797-9600'});
insert into phone(initials, time, details) values ('su', now(), {'Sam': '555-121-2222'});


select * from phone;


Results: 



On Thu, Feb 4, 2016 at 11:55 PM, Atul Saroha <atul.saroha <at> snapdeal.com> wrote:
MySQL Table:
User | PhoneNumber
--------------------------------
raman    1234
bhuvan  2345
atul       5678

Using single Collection column and map collection :

Phone Number
---------------------------
Map <raman,1234>
<bhuvan,2345>
<atul,5678>

Want to transform data in this way, i.e. key is mapped from value of "user" column in the map.

Any help will be appreciated.






--
Regards,
Sachin S Uplaonkar.




Will Hayworth | 7 Feb 00:28 2016

Back to the futex()? :(

tl;dr: other than CAS operations, what are the potential sources of lock contention in C*?

Hi all! :) I'm a novice Cassandra and Linux admin who's been preparing a small cluster for production, and I've been seeing something weird. For background: I'm running 3.2.1 on a cluster of 12 EC2 m4.2xlarges (32 GB RAM, 8 HT cores) backed by 3.5 TB GP2 EBS volumes. Until late yesterday, that was a cluster of 12 m4.xlarges with 3 TB volumes. I bumped it because while backloading historical data I had been seeing awful throughput (20K op/s at CL.ONE). I'd read through Al Tobey's amazing C* tuning guide once or twice before but this time I was careful and fixed a bunch of defaults that just weren't right, in cassandra.yaml/JVM options/block device parameters. Folks on IRC were super helpful as always (hat tip to Jeff Jirsa in particular) and pointed out, for example, that I shouldn't be using DTCS for loading historical data--heh. After changing to LTCS, unbatching my writes* and reserving a CPU core for interrupts and fixing the clocksource to TSC, I finally hit 80K early this morning. Hooray! :)

Now, my question: I'm still seeing a ton of blocked processes in vmstat, anywhere from 2 to 9 per 10-second sample period--and this is before EBS is even being hit! I've been trying in vain to figure out what this could be--GC seems very quiet, after all. On the advice of Al's page, I've been running strace and, indeed, I've been seeing tens of thousands of futex() calls in periods of 10 or 20 seconds. What eludes me is where this lock contention is coming from. I'm not using LWTs or performing CAS operations that I'm aware of. Assuming this isn't a red herring, what gives?
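
To put a number on the futex() volume described above, one option is to tally syscalls from a saved strace log. The following is a minimal Python sketch, not a definitive tool. It assumes the log was captured with strace -f and no timestamp options, so each line starts with an optional pid prefix followed by the syscall name. The function name count_syscalls and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches an optional "[pid 1234] " or "1234 " prefix, then the syscall name.
# Assumption: no timestamp columns precede the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines: Iterable[str]) -> Counter:
    """Count syscall entries per name; '<... resumed>' lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines in the shape strace -f prints them.
sample_lines = [
    "[pid  4242] futex(0x7f3c1c0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0",
    "[pid  4243] futex(0x7f3c1c0, FUTEX_WAKE_PRIVATE, 1) = 1",
    "[pid  4244] epoll_wait(12, [], 128, 1000) = 0",
    "[pid  4242] <... futex resumed> ) = 0",
]

counts = count_syscalls(sample_lines)
for name, total in counts.most_common():
    print(name, total)
```

Comparing the futex count against the other syscalls over the same window gives a rough sense of whether the contention is unusual or just background noise from the JVM's thread pools.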

Sorry for the essay--I just wanted to err on the side of more context--and thank you for any advice you'd like to offer,
Will

P.S. More background if you'd like--I'm running on Amazon Linux 2015.09, using jemalloc 3.6, JDK 1.8.0_65-b17. Here is my cassandra.yaml and here are my JVM args. I realized I neglected to adjust memtable_flush_writers as I was writing this--so I'll get on that. Aside from that, I'm not sure what to do. (Thanks, again, for reading.)

* They were batched for consistency--I'm hoping to return to using them when I'm back at normal load, which is tiny compared to backloading, but the impact on performance was eye-opening.
___________________________________________________________
Will Hayworth
Developer, Engagement Engine
Atlassian



Richard L. Burton III | 5 Feb 21:18 2016

Cassandra + OpsWorks

Although I have Chef + Knife Solo setting up my servers, I'm very curious whether anyone is using Cassandra + OpsWorks.

I ask because it seems like a very good solution for setting up servers in AWS and for scaling them out.

--
-Richard L. Burton III
@rburton
Dikang Gu | 5 Feb 21:00 2016

Questions about Counter updates.

Hi there,

I have a cluster that handles a lot of counter updates. When I run `nodetool tpstats`, I see a lot of MutationStage activity but no CounterMutationStage stats. Is this normal, or is it something I should worry about?

I'm using Cassandra 2.1.8 and the C driver.

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0             25         0                 0
RequestResponseStage              0         0             21         0                 0
MutationStage                     0         0       19284070         0                 0
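
For watching these two pools over time, the table above can be parsed programmatically. This is only a sketch in Python: it assumes the output has exactly the five numeric columns shown here, and the names parse_tpstats and FIELDS are illustrative, not part of any Cassandra tooling.

```python
from typing import Dict

# Sample text copied from the nodetool tpstats output above.
TPSTATS_OUTPUT = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0             25         0                 0
RequestResponseStage              0         0             21         0                 0
MutationStage                     0         0       19284070         0                 0
"""

FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text: str) -> Dict[str, Dict[str, int]]:
    """Return {pool_name: {field: value}} for every row with five numeric columns."""
    pools: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not "name + five integers".
        if len(parts) != len(FIELDS) + 1:
            continue
        if not all(part.isdigit() for part in parts[1:]):
            continue
        pools[parts[0]] = dict(zip(FIELDS, (int(part) for part in parts[1:])))
    return pools


pools = parse_tpstats(TPSTATS_OUTPUT)
print(pools["MutationStage"]["completed"])
print(pools["CounterMutationStage"]["completed"])
```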

Thanks

--
Dikang

Atul Saroha | 5 Feb 08:55 2016

How to migrate MySQL to Cassandra - special case in Sqoop + DataStax

MySQL Table:
User | PhoneNumber
--------------------------------
raman    1234
bhuvan  2345
atul       5678

Using single Collection column and map collection :

Phone Number
---------------------------
Map <raman,1234>
<bhuvan,2345>
<atul,5678>

Want to transform data in this way, i.e. key is mapped from value of "user" column in the map.

Any help will be appreciated.
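
For reference, the transformation described above can be sketched as a small script. This is a hedged illustration in Python, not a Sqoop job: it assumes the MySQL rows have already been fetched as (user, phone number) pairs, and the table demo.phone_book, its map column phone_number, and the key column book_id are hypothetical names. The rows are written in chunks so that no single statement carries a very large map literal.

```python
from typing import Dict, Iterable, List, Tuple


def cql_text(value: str) -> str:
    """Quote a string as a CQL text literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def rows_to_map(rows: Iterable[Tuple[str, object]]) -> Dict[str, str]:
    """Fold (user, phone_number) rows into one dict; later rows win on duplicates."""
    return {user: str(phone) for user, phone in rows}


def map_update_statements(
    phones: Dict[str, str], row_key: str, chunk_size: int = 100
) -> List[str]:
    """Build UPDATE statements that append map entries in chunks.

    demo.phone_book, phone_number and book_id are hypothetical names.
    """
    items = list(phones.items())
    statements = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        literal = ", ".join(f"{cql_text(k)}: {cql_text(v)}" for k, v in chunk)
        statements.append(
            "UPDATE demo.phone_book SET phone_number = phone_number + {"
            + literal
            + "} WHERE book_id = "
            + cql_text(row_key)
            + ";"
        )
    return statements


# The three rows from the MySQL table above.
mysql_rows = [("raman", 1234), ("bhuvan", 2345), ("atul", 5678)]
statements = map_update_statements(rows_to_map(mysql_rows), "all_users")
for statement in statements:
    print(statement)
```

In a real migration the statements would more likely be sent through a driver with bound parameters rather than built as strings; the string form is used here only to make the resulting map visible.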



Debraj Manna | 5 Feb 05:26 2016

Restart Cassandra automatically

Hi,

What is the best way to keep Cassandra running? My requirement is that if Cassandra stops for some reason, it should be started again automatically.

I tried to achieve this by adding Cassandra to supervisord. My supervisor conf for Cassandra looks like this:

[program:cassandra]
command=/bin/bash -c 'sleep 10 && bin/cassandra'
directory=/opt/cassandra/
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/cassandra_supervisor.err.log
stdout_logfile=/var/log/cassandra_supervisor.out.log

But it does not seem to work properly. Even after I stop Cassandra from supervisor, the Cassandra process still appears to be running when I do
ps -ef | grep cassandra

I also tried the configuration mentioned in this question but still no luck.

Can someone let me know the best way to keep Cassandra running in a production environment?

Environment
  • Cassandra 2.2.4
  • Debian 8

Thanks,



Jeff Ferland | 4 Feb 22:27 2016

System block cache vs. disk access and metrics

We struggled for a while to upgrade due to an out-of-order SSTables bug. During this time, load continued to
increase and we were eventually accessing the disk a lot. When we could finally expand the cluster, the
went down by an order of magnitude. This leads me to conclude that we had blown out the block cache.

Linux unfortunately doesn’t have a metric for tracking the block cache hit ratio. There is SystemTap,
which may be the way we have to go, but I’m wondering about Cassandra counters as well. If I can track the
ratio of SSTable reads vs. actual disk reads, I’ll have good enough data to not spend my
time writing up a SystemTap script.
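
The ratio described above could be estimated roughly as follows. This is a sketch in Python under loud assumptions: both inputs are hypothetical cumulative counters (one for SSTable reads as Cassandra sees them, one for reads that actually reached the disk), sampled twice, and one SSTable read is treated as one potential disk read, which is a simplification.

```python
from typing import Tuple


def cache_hit_ratio(sstable_reads: int, disk_reads: int) -> float:
    """Estimate the share of SSTable reads served without touching the disk.

    Both arguments are deltas over the same time window. The result is
    clamped to [0, 1] because one SSTable read can cause several disk reads.
    """
    if sstable_reads <= 0:
        raise ValueError("sstable_reads must be positive")
    ratio = 1.0 - disk_reads / sstable_reads
    return max(0.0, min(1.0, ratio))


def delta(before: Tuple[int, int], after: Tuple[int, int]) -> Tuple[int, int]:
    """Subtract two (sstable_reads, disk_reads) samples of cumulative counters."""
    return after[0] - before[0], after[1] - before[1]


# Hypothetical samples taken some seconds apart: (sstable_reads, disk_reads).
sample_before = (1000, 200)
sample_after = (6000, 700)

sstable_delta, disk_delta = delta(sample_before, sample_after)
ratio = cache_hit_ratio(sstable_delta, disk_delta)
print(f"estimated hit ratio: {ratio:.2f}")
```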

This brings about the following specific questions:
 * Which metric, if any, corresponds to the number of queries made by clients?
 * Which metric, if any, corresponds to the number of SSTable reads performed?

Metrics such as cassandra.ReadCount aren’t perfectly clear as to what they do and don’t indicate, so
feedback on that before I go on another source code reading adventure is welcomed.

Cheers all,
-Jeff
Flavien Charlon | 4 Feb 21:53 2016

"Not enough replicas available for query" after reboot

Hi,

My cluster was running fine. I rebooted all three nodes (one by one), and now all nodes are back up and running. "nodetool status" shows UP for all three nodes on all three nodes:

--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  xx.xx.xx.xx  331.84 GB  1       ?     d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
UN  xx.xx.xx.xx  317.2 GB   1       ?     de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
UN  xx.xx.xx.xx  291.61 GB  1       ?     b489c970-68db-44a7-90c6-be734b41475f  RAC1

However, now the client application fails to run queries on the cluster with:

Cassandra.UnavailableException: Not enough replicas available for query at consistency Quorum (2 required but only 1 alive)

The replication factor is 3. I am running Cassandra 2.1.7.
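
For context on the numbers in the error, a quorum is a majority of the replicas, so a replication factor of 3 needs 2 live replicas. A minimal Python sketch of that arithmetic (the function names are illustrative only):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def can_serve_quorum(replication_factor: int, alive_replicas: int) -> bool:
    """True if enough replicas are seen as alive to satisfy QUORUM."""
    return alive_replicas >= quorum(replication_factor)


required = quorum(3)
print(required)
print(can_serve_quorum(3, 1))
```

With RF 3, the "2 required but only 1 alive" message means the node coordinating the query considered only one replica for that data to be alive at the time.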

Any idea where that could come from or how to troubleshoot this further?

Best,
Flavien
aeljami.ext | 4 Feb 11:56 2016

Atomic Batch: Maintaining consistency between tables

Hello,

 

I read in the DataStax documentation:

 

“The coordinator node might also need to work hard to process a logged batch while maintaining consistency between tables”

 

Does this mean that the coordinator sends the mutations to all replica nodes and waits for RF acknowledgements, or only to one node in the replica set?

 

thx

Bhuvan Rawal | 4 Feb 11:37 2016

Want inputs about super column family vs map/list

Hi All,

There are two ways to achieve this:

1. Using a super column family:

raman | atul | bhuvan
---------------------------
1234  | 5678 | 2345

2. Using a single collection column:

Phone Number
---------------------------
Map <raman,1234>
<bhuvan,2345>
<atul,5678>

I would like to know which approach would be better in the following use cases:
  1. Frequent updates of the complete map
  2. Frequent reads of the complete map
  3. Frequent updates of specific fields only
  4. Frequent reads of specific fields only
Also, is there any way to configure the cassandra-stress tool to test this scenario?

Thanks & Regards,
Bhuvan
Edouard COLE | 4 Feb 09:36 2016

Duplicated key with an IN statement

Hello,

I just discovered this, and I think this is weird:

ed <at> debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
            ...     key int,
            ...     value int,
            ...     PRIMARY KEY (key)
            ... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-----+-------
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-----+-------
 123 |   456
 123 |   456 <----- WTF?

(2 rows)

Adding the same key multiple times to an IN clause makes the query return the same row multiple times.

This looks weird to me. Can anyone give me some feedback on this behavior?
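
If the duplicate rows are unwanted, one client-side workaround is to remove repeated keys before building the IN clause. A small Python sketch, assuming integer keys as in table t above; the helper name build_in_query is made up for illustration:

```python
from typing import Iterable


def build_in_query(table: str, column: str, keys: Iterable[int]) -> str:
    """Build a SELECT ... IN (...) query with duplicate keys removed.

    dict.fromkeys keeps the first occurrence of each key and preserves order.
    Keys are forced through int() so only integer literals reach the query.
    """
    unique_keys = list(dict.fromkeys(keys))
    key_list = ", ".join(str(int(key)) for key in unique_keys)
    return f"SELECT * FROM {table} WHERE {column} IN ({key_list});"


query = build_in_query("t", "key", [123, 123])
print(query)
```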

Edouard COLE

