Stack | 1 Sep 2011 01:06

Re: HBase 0.90.4 missing from Maven repository?

My fault.  I missed promoting it.  Let me fix.
St.Ack

On Wed, Aug 31, 2011 at 10:52 AM, Terry Siu <Terry.Siu@...> wrote:
> Hi,
>
> Is there a reason why HBase 0.90.4 is not available for download from the Maven repository? I looked at:
>
> https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase/
>
> and the latest version I see is 0.90.3. Or is the above URL which I'm using incorrect? I'm setting up a Maven
> project to use 0.90.4 and would hate for others using it to have to manually install 0.90.4 in their local repository.
>
> Thanks!
> -Terry
>

Himanshu Vashishtha | 1 Sep 2011 02:28

Re: how to make tuning for hbase (every couple of days hbase region server/s crash)

Sorry, I missed the fact that you guys were talking about the OOME
(the exceptions I saw were socket timeouts).
Can you give the log snippet where it OOME'd? I want to explore this use
case :)

You have about 200 regions per server, and each region configured to 500MB
makes it about 100GB of data per server.
Each Region is considered open when the index blocks of all its StoreFiles have
been read; the default block size of an HFile is 64KB. Having a larger
block size will help in reducing the index size for each StoreFile. As Chris
said, looking at the RS metrics will give a lot of useful info, such as
storefileindex and blockcache.
I think that only increasing the Region size to 500MB will not reduce the memory
footprint (apart from reducing Region splits and some entries in '.META.'),
as one has to deal with the StoreFiles eventually. Yes, reducing the size of the
keyvalue coordinates will help in limiting the index size (I am sure you already
have an optimised schema).
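
To make the arithmetic above concrete, here is a rough sketch. It assumes every region is full and uncompressed and that the block index holds one entry per data block; both are simplifications, and the function names are made up for illustration.

```python
# Back-of-the-envelope figures from this thread; not measured values.
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3

regions_per_server = 200
region_size_bytes = 500 * MB


def data_per_server_gib(regions, region_bytes):
    """Total data held by one region server, in GiB."""
    return regions * region_bytes / GB


def index_entries(data_bytes, block_bytes):
    """Approximate block-index entries, assuming one entry per data block."""
    return data_bytes // block_bytes


total_bytes = regions_per_server * region_size_bytes
print("data per server (GiB):", data_per_server_gib(regions_per_server, region_size_bytes))
for block_kb in (64, 256):
    print(block_kb, "KB blocks ->", index_entries(total_bytes, block_kb * KB), "index entries")
```

Under these assumptions the entry count falls in proportion to the block size, which is why a larger block size shrinks the index. The memory actually used also depends on key length, which this sketch ignores.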

Your GC log snapshot says that CMS failed to free even 1 byte and then fell
back on a "stop-the-world" GC. Does this mean there was literally no garbage
in the heap during that time window? Or maybe your app was writing heavily
and concurrently to the RS. Since not even a single byte was freed,
using MSLAB will not help (if you haven't enabled it yet), as it is meant to
reduce fragmentation of the freed space, because CMS doesn't do any
compaction on its own.

What did you do to sort out this error eventually, Oleg? Did bumping the RS heap
fix it? Are you using compression while writing to HBase?

Thanks,

Andrew Purtell | 1 Sep 2011 02:35

Re: Coprocessors?

We have both backported coprocessors to our 0.90-ish HBase (FrankenBase?) and use them in production with
security enabled -- HBASE-3025, but more recent code than the patch on the issue. This code ported to HBase
trunk is here: https://github.com/trendmicro/hbase/tree/security

Backporting is definitely nontrivial if you are starting today; I don't recommend it.

Best regards,

        - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

>________________________________
>From: Li Pi <li@...>
>To: user@...
>Sent: Thursday, September 1, 2011 6:04 AM
>Subject: Re: Coprocessors?
>
>None so far. You can backport yourself, but I'm unsure on whether
>coprocessors are ready for production use.
>On Aug 31, 2011 10:41 AM, "Harsh J" <harsh@...> wrote:
>> Hello Frank,
>>
>> Coprocessors would be available in the 0.92 release onwards. I don't think
>a release has been cut for that yet, so you need to use the development
>branch if you'd like to jump into it right now.
>>
>> On 31-Aug-2011, at 8:39 PM, Frank <at>  <at>  wrote:
>>
>>>

Andrew Purtell | 1 Sep 2011 06:35

Re: HBase and Cassandra on StackOverflow


> http://www.quora.com/How-does-HBase-write-performance-differ-from-write-performance-in-Cassandra-with-consistency-level-ALL

Thanks, that was what I was referring to earlier in this thread. Now bookmarked.
Comments there from those more knowledgeable about Cassandra than I seem to indicate that N=3,W=3,R=1 is
not practical (one commenter I know to be an expert characterizes it as "suicidal"), and the comments in
the collapsed answer indicate there are corner cases known to Cassandra experts where HBase-equivalent
strong consistency cannot be maintained even with that setting.
 
So it seems that claims that Cassandra can provide consistency equivalent to HBase are erroneous.
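
For reference, the R/W/N rule of thumb quoted further down in this message can be written out as a small check. This is only a sketch of the rule as the quoted poster states it, with "Quorum" taken to mean a simple majority of N (an assumption). It says nothing about the corner cases mentioned above, where the guarantee reportedly does not hold in practice. The function names and return labels are made up for illustration.

```python
def quorum(n):
    """Simple majority of n replicas (assumed meaning of "Quorum")."""
    return n // 2 + 1


def classify(n, r, w):
    """Apply the quoted rule of thumb to a replication setting.

    n: replicas, r: replicas read per request, w: replicas written per request.
    """
    q = quorum(n)
    if r + w > n and w >= q:
        return "consistent"          # reads always overlap the latest write
    if w >= q:
        return "eventual"            # writes reach a majority, reads may lag
    return "reader-must-repair"      # readers have to detect and fix conflicts


for setting in [(3, 1, 3), (3, 2, 2), (3, 1, 2), (3, 1, 1)]:
    print(setting, classify(*setting))
```

By this rule alone, N=3,W=3,R=1 falls in the first category; the comments referenced above are about why that setting is still impractical.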

Best regards,

       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

>________________________________
>From: Gary Helmling <ghelmling@...>
>To: user@...
>Sent: Thursday, September 1, 2011 2:21 AM
>Subject: Re: HBase and Cassandra on StackOverflow
>
>> Since this is fairly off-topic at this point, I'll keep it short. The
>> simple
>> rule for Dynamo goes like this: if (R+W>N && W>=Quorum), then you're
>> guaranteed a consistent result always. You get eventual consistency if
>> W>=Quorum. If W<Quorum, then you can get inconsistent data that must be
>> detected/fixed by readers (often using timestamps or similar techniques).
>> Joe is right, enforcing (W=3, R=1, N=3) on a Dynamo system gives the same

sriram | 1 Sep 2011 07:35

Distributed Cache

Using the distributed cache, I put a common file in HDFS. It contains the frequent
words to remove. In the code I converted the words in the table into a hashtable and
removed those words from other documents if they occur.

The problem is that it removes these words for smaller files. If the file size
increases, then those words are not removed.

Is there any reason for this problem?

Stuti Awasthi | 1 Sep 2011 14:41

Get query in REST for HBASE

Hi Friends,

I am trying to use the REST server and am running a query using: http://localhost:8080/tablename/rowkey

This gives me the following output:

<CellSet><Row key="Z2FnYW5zaA=="><Cell timestamp="1314870712846"
column="aW5mbzphZ2U=">MzI=</Cell><Cell timestamp="1314870952929" column="aW5mbzpuYW1l">Z2FnYW4=</Cell></Row></CellSet>

Though I do receive the timestamp, the rest of the column families (info:name and info:age) are not sent in the response.

Am I missing a step? How can I receive the data for column families directly through REST?

Thanks & Regards
Stuti Awasthi
Sr Specialist


Andrew Purtell | 1 Sep 2011 14:51

Re: Get query in REST for HBASE

Because keys in HBase are byte[], the REST interface base64-encodes the row key, column name, and value if
you choose the XML representation.

> Though I do receive the timestamp, the rest of the column families (info:name and
> info:age) are not sent in the response.

You say you are looking for two values. This result has two values in the row:

            <CellSet><Row key="Z2FnYW5zaA==">
value 1 ->  <Cell timestamp="1314870712846" column="aW5mbzphZ2U=">MzI=</Cell>
value 2 ->  <Cell timestamp="1314870952929" column="aW5mbzpuYW1l">Z2FnYW4=</Cell>
            </Row></CellSet>

So this is probably correct. You need to run the row key, column name, and value through a base64 decoder, or
choose a binary representation (using an Accept header in the request of either
application/octet-stream or application/x-protobuf).
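
As an illustration of the decoding step, here is a minimal Python sketch that parses the XML shown above and base64-decodes the row key, column names, and values. It assumes the decoded bytes are UTF-8 text, which is true for this example but not in general since HBase stores raw bytes. The helper names are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# The XML response from the original question, unchanged.
RESPONSE = (
    '<CellSet><Row key="Z2FnYW5zaA==">'
    '<Cell timestamp="1314870712846" column="aW5mbzphZ2U=">MzI=</Cell>'
    '<Cell timestamp="1314870952929" column="aW5mbzpuYW1l">Z2FnYW4=</Cell>'
    '</Row></CellSet>'
)


def b64_text(encoded):
    """Base64-decode a string and interpret the bytes as UTF-8 (an assumption)."""
    return base64.b64decode(encoded).decode("utf-8")


def decode_cellset(xml_text):
    """Return a list of (row_key, {column: value}) pairs from a CellSet document."""
    rows = []
    for row in ET.fromstring(xml_text).iter("Row"):
        cells = {b64_text(cell.get("column")): b64_text(cell.text)
                 for cell in row.iter("Cell")}
        rows.append((b64_text(row.get("key")), cells))
    return rows


for key, cells in decode_cellset(RESPONSE):
    print(key, cells)
```

For the response in this thread, this yields row key gagansh with info:age = 32 and info:name = gagan, so both columns were in fact returned.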

Best regards,

        - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message -----
> From: Stuti Awasthi <stutiawasthi@...>
> To: "user@..." <user@...ache.org>
> Cc: 
> Sent: Thursday, September 1, 2011 8:41 PM
> Subject: Get query in REST for HBASE
> 

Edward Capriolo | 1 Sep 2011 19:53

Re: HBase and Cassandra on StackOverflow

On Wed, Aug 31, 2011 at 1:34 AM, Time Less <timelessness@...> wrote:

> Most of your points are dead-on.
>
> > Cassandra is no less complex than HBase. All of this complexity is
> > "hidden" in the sense that with Hadoop/HBase the layering is obvious --
> > HDFS, HBase, etc. -- but the Cassandra internals are no less layered.
> >
> > Operationally, however, HBase is more complex.  Admins have to configure
> > and manage ZooKeeper, HDFS, and HBase.  Could this be improved?
> >
>
> I strongly disagree with the premise[1]. Having personally been involved in
> the Digg Cassandra rollout, and spent up until a couple months ago being in
> part-time weekly contact with the Digg Cassandra administrator, and having
> very close ties to the SimpleGeo Cassandra admin, I know it is a fickle
> beast. Having also spent a good amount of time at StumbleUpon and Mozilla
> (and now Riot Games) I also see first-hand that HBase is far more stable and
> -- dare I say it? -- operationally more simple.
>
> So okay, HBase is "harder to set up" if following a step-by-step guide on a
> wiki is "hard,"[2] but it's FAR easier to administer. Cassandra is rife with
> cascading cluster failure scenarios. I would not recommend running Cassandra
> in a highly-available high-volume data scenario, but don't hesitate to do so
> for HBase.
>

Srikanth P. Shreenivas | 1 Sep 2011 20:52

Tall-Narrow vs. Flat-Wide Tables

Hi,

HBase: The Definitive Guide book's chapter 9 talks about Tall-Narrow vs Flat-wide tables. (http://ofps.oreilly.com/titles/9781449396107/advanced.html)

It seems to propose that Tall-Narrow tables (more rows, fewer columns) are the better design.  One of the issues it
raises with "Flat-wide" tables (fewer rows, more columns) is:
...
In addition, HBase can only split at row boundaries, which also enforces the recommendation to go with
tall-narrow tables. Imagine you have all emails of a user in a single row. This will work for the majority of
users, but there will be outliers that will have magnitudes of emails more in their inbox. So much so that a
single row could outgrow the maximum file/region size and work against the region split facility.
...

So, my question is: is it a bad idea to have a table like the one in the above example, wherein emails are stored by
adding columns?  I seem to have a similar table in my application, wherein I have a region size of 1GB and a cell
value of 10KB.  So, will I run into the region-split issue mentioned above after 100000 (1GB / 10KB = 100000) columns?
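
The estimate above can be written out as follows. This is only a sketch of the arithmetic, using decimal units as in the question. The `key_overhead_bytes` parameter is a hypothetical stand-in for the per-cell key (row key, family, qualifier, timestamp), which I have not measured.

```python
def cells_per_region(region_bytes, value_bytes, key_overhead_bytes=0):
    """How many cells fit in one region before it reaches region_bytes.

    key_overhead_bytes is a hypothetical per-cell allowance for the stored key.
    """
    return region_bytes // (value_bytes + key_overhead_bytes)


REGION_BYTES = 10 ** 9   # 1GB, decimal units as in the question
VALUE_BYTES = 10 ** 4    # 10KB

print(cells_per_region(REGION_BYTES, VALUE_BYTES))
print(cells_per_region(REGION_BYTES, VALUE_BYTES, key_overhead_bytes=100))
```

Any per-cell key overhead lowers the count, so the single-row limit would be reached somewhat before 100000 columns.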

Regards,
Srikanth


Ryan Rawson | 1 Sep 2011 21:12

Re: HBase and Cassandra on StackOverflow

On Thu, Sep 1, 2011 at 10:53 AM, Edward Capriolo
<edlinuxguru@...> wrote:
> On Wed, Aug 31, 2011 at 1:34 AM, Time Less <timelessness@...> wrote:
>
>> Most of your points are dead-on.
>>
>> > Cassandra is no less complex than HBase. All of this complexity is
>> > "hidden" in the sense that with Hadoop/HBase the layering is obvious --
>> > HDFS, HBase, etc. -- but the Cassandra internals are no less layered.
>> >
>> > Operationally, however, HBase is more complex.  Admins have to configure
>> > and manage ZooKeeper, HDFS, and HBase.  Could this be improved?
>> >
>>
>> I strongly disagree with the premise[1]. Having personally been involved in
>> the Digg Cassandra rollout, and spent up until a couple months ago being in
>> part-time weekly contact with the Digg Cassandra administrator, and having
>> very close ties to the SimpleGeo Cassandra admin, I know it is a fickle
>> beast. Having also spent a good amount of time at StumbleUpon and Mozilla
>> (and now Riot Games) I also see first-hand that HBase is far more stable and
>> -- dare I say it? -- operationally more simple.
>>
>> So okay, HBase is "harder to set up" if following a step-by-step guide on a
>> wiki is "hard,"[2] but it's FAR easier to administer. Cassandra is rife with
>> cascading cluster failure scenarios. I would not recommend running Cassandra
>> in a highly-available high-volume data scenario, but don't hesitate to do so

