How to guarantee consistency between counter and materialized view?
2014-03-11 23:30:52 GMT
On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang <shaochuan.wang <at> bloomreach.com> wrote:
> So, does anyone know how to do "describing the splits" and "describing the local rings" using the native protocol?
For a ring description, you would do something like "select peer, tokens from system.peers". I'm not sure about describe_splits().
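The local node's own tokens live in system.local, so (assuming the 2.0-era system tables) a full ring description would be something like:

    SELECT peer, tokens FROM system.peers;
    SELECT tokens, partitioner FROM system.local;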
Also, cqlsh uses the Python client, which also talks via the Thrift protocol. Does that mean it will be migrated to the native protocol soon as well?
Hi,

I am doing a presentation at Big Data Boston about how people are bridging the gap between OLTP/ingest-side databases and their analytic storage and queries. One class of systems I am talking about is systems like HBase and DSE that let you run map reduce against your OLTP dataset. I remember reading at some point that DSE allows you to provision dedicated hardware for map reduce, but the docs didn't seem to fully explain how that works. I looked at http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaStrt.html

My question is: what kind of provisioning can I do? Can I provision dedicated hardware for just the filesystem, or can I also provision replicas that are dedicated to the filesystem and also serve reads for map reduce jobs? What kind of support is there for keeping OLTP reads from hitting the Hadoop storage nodes, and how does this relate to doing quorum reads and writes?

Thanks,
Ariel
Hey all - My company is working on introducing a configuration service system to provide config data to several of our applications, to be backed by Cassandra. We're already using Cassandra for other services, and at the moment our pending design just puts all the new tables (9 of them, I believe) in one of our pre-existing keyspaces. I've got a few questions about keyspaces that I'm hoping for input on; some Google hunting didn't turn up obvious answers, at least not for recent versions of Cassandra.

1) What trade-offs are being made by using a new keyspace versus re-purposing an existing one (that is in active use by another application)? Organization is the obvious answer; I'm looking for any technical reasons.
2) Is there any per-keyspace overhead incurred by the cluster?
3) Does it impact on-disk layout at all for tables to be in a different keyspace from others? Is any sort of file fragmentation potentially introduced just by doing this in a new keyspace as opposed to an existing one?
4) Does it add any metadata overhead to the system keyspace?
5) Why might we *not* want to make a separate keyspace for this?
6) Does anyone have experience with creating additional keyspaces to the point that Cassandra can no longer handle it? Note that we're *not* planning to do this; I'm just curious.

Cheers,
Martin
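P.S. For concreteness, this is the kind of statement we'd be running; the keyspace name and replication settings here are hypothetical:

    CREATE KEYSPACE config_service
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};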
Hi All,

I've faced an issue with Cassandra 2.0.5. I have a 6-node cluster with the random partitioner, still using tokens instead of vnodes. Because we're changing hardware, we decided to migrate the cluster to 6 new machines and switch the partitioning from token-based to vnodes. I followed the instructions at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html and started Cassandra on 6 new nodes in a new DC. Everything seemed to work correctly; the nodes were seen by all others as up and normal.

Then I performed nodetool repair -pr on the first of the new nodes. But the process fell into an infinite loop, sending/receiving merkle trees over and over. It hung on one very small KS and there was no hope it would ever stop (the process ran the whole night). So I decided to stop the repair and restart Cassandra on this particular new node. After the restart I tried the repair one more time with another small KS, but it also fell into an infinite loop. So I decided to abandon the procedure of adding a datacenter, remove the nodes from the new DC, and start from scratch.

After running removenode on all the new nodes, I wiped the data dir and started Cassandra on a new node once again. During the start, messages like "org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=98bb99a2-42f2-3fcd-af67-208a4faae5fa" appeared in the logs. Google said they may indicate problems with schema version consistency, so I performed describe cluster in cassandra-cli and got:

Cluster Information:
   Name: Metadata Cluster
   Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        76198f8b-663f-3434-8860-251ebc6f50c4: [220.127.116.11]
        f48d3512-e299-3508-a29d-0844a0293f3a: [18.104.22.168]
        16ad2e35-1eef-32f0-995c-e2cbd4c18abf: [22.214.171.124]
        72352017-9b0d-3b29-8c55-ed86f30363c5: [126.96.36.199]
        7f1faa84-0821-3311-9232-9407500591cc: [188.8.131.52]
        85cd0ebc-5d33-3bec-a682-8c5880ee2fa1: [184.108.40.206]

So now I have 6 different schema versions in the cluster. How could that happen? How can I bring my cluster back to a consistent state? What did I do wrong while extending the cluster, such that nodetool fell into an infinite loop? At first sight the data looks ok; I can read from the cluster and I'm getting the expected output.

best regards
Aleksander
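P.S. For reference, the same schema-version information is visible from CQL via the system tables (assuming the 2.0 schema), which makes it easy to watch whether the versions converge back to a single value:

    SELECT schema_version FROM system.local;
    SELECT peer, schema_version FROM system.peers;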
As I understand it, even though a quorum write fails, the data is still (more than likely) saved and will become eventually consistent through the well-known mechanisms. I have a case where I would rather this not happen: if the quorum write fails, I would prefer that the data NEVER becomes consistent and the old values remain.

After a bit of pondering, I came up with the idea of simply making my write a conditional update based on a previous value. In my use case, I will not be contending with any other writes of the same primary key, and this write operation is rare in the grand scheme of things. Using this approach, the desired effect is that if the write fails, it will not eventually happen without the app's knowledge.

Is this approach sound? If so, it sounds like a really cool potential addition to CQL, like:

    UPDATE tab SET col=? WHERE key=? AUTHORITATIVE

Thoughts?
Wayne
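P.S. To be concrete, the conditional update I have in mind uses the IF clause that 2.0's lightweight transactions already provide (a sketch; table, column, and values are hypothetical):

    UPDATE tab SET col = 'new' WHERE key = 'k1' IF col = 'old';

The response includes an [applied] column telling you whether the condition held and the write went through.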
How do I check for NULL values in CQL3?
I am trying to write a CQL equivalent of the SQL below:
SELECT * FROM table1 WHERE col2 IS NULL;
While inserting data into C*, CQL won't let me insert NULL; I have to pass '' for strings and 0 for integers.
But I have '' and 0 as valid data, so those values conflict with the fillers I have to use instead of NULL.
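To make the conflict concrete, a sketch of the filler workaround (table and columns are hypothetical, and the SELECT assumes a secondary index on col2):

    INSERT INTO table1 (col1, col2) VALUES ('id1', '');  -- '' standing in for NULL
    SELECT * FROM table1 WHERE col2 = '';                -- matches both fillers and real empty strings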
> You should be able to achieve what you're looking for with a trigger vs. a modification to the core of Cassandra.
Well, good point.
It leads to the question:
(a) Are triggers executed on all (local + remote) coordinator nodes (and then N DCs => N coordinator nodes => N executions of the trigger)?
(b) Or are triggers executed only on the first coordinator node, and not on the (next/remote-DC) coordinator nodes?
My opinion is (b), and in that case, triggers won't do the job. (b) would make sense, because the first coordinator node would augment the original row mutations and propagate them towards the other coordinator nodes; then there would be no need to execute triggers on the other (remote) coordinator nodes.
Does anyone know how trigger execution works: is it (a) or (b)?
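For concreteness, here is roughly the hook we're discussing: in 2.0 a trigger is a Java class implementing ITrigger, deployed on the nodes and attached with CREATE TRIGGER. A minimal sketch (the class name and the notification idea are hypothetical, not a real notification mechanism):

    import java.nio.ByteBuffer;
    import java.util.Collection;
    import java.util.Collections;
    import org.apache.cassandra.db.ColumnFamily;
    import org.apache.cassandra.db.RowMutation;
    import org.apache.cassandra.triggers.ITrigger;

    public class NotifyTrigger implements ITrigger
    {
        // Invoked for each update to a table this trigger is attached to;
        // any mutations returned here are applied along with the original write.
        public Collection<RowMutation> augment(ByteBuffer partitionKey, ColumnFamily update)
        {
            // Hypothetically: publish (partitionKey, update) to some message bus here,
            // rather than augmenting the write.
            return Collections.<RowMutation>emptyList();
        }
    }

Attached with something like:

    CREATE TRIGGER notify ON myks.mytable USING 'NotifyTrigger';

Whether augment() runs once on the first coordinator or again per DC is exactly the (a)/(b) question above.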
On Mon, Mar 10, 2014 at 10:06 AM, DE VITO Dominique <dominique.devito <at> thalesgroup.com> wrote:
> On 03/10/2014 07:49 AM, DE VITO Dominique wrote:
> > If I update a data on DC1, I just want apps "connected-first" to DC2
> > to be informed when this data is available on DC2 after replication.
> If I run a SELECT, I'm going to receive the latest data per the read conditions (ONE, TWO, QUORUM), regardless of location of the client connection. If using network aware topology, you'll get the most current data in that DC.
> > When using Thrift, one way could be to modify CassandraServer class,
> > to send notification to apps according to data coming in into the
> > coordinator node of DC2.
> > Is it "common" (~ the way to do it) ?
> > Is there another way to do so ?
> > When using CQL, is there a precise "src code" place to modify for the
> > same purpose ?
> Notifying connected clients about random INSERT or UPDATE statements that ran somewhere seems to be far, far outside the scope of storing data. Just configure your client to SELECT in the manner that you need.
> I may not fully understand your problem and could be simplifying things in my head, so feel free to expand.
First of all, thanks for your answer and your attention.
I know about SELECT.
The idea here is to avoid doing POLLING regularly, as it could easily become a performance nightmare.
The idea is to replace POLLING with PUSH, just as in SEDA architectures, CQRS architectures, or continuous querying in some data stores.
So, following this PUSH idea, it would be nice to inform apps connected to a preferred DC that some new data has been replicated and is now "available".
I hope it's clearer.