James Estes | 28 Jul 19:07 2015

Data loss after split in 0.98.12


We've been running with HBase 0.98.12 and Hadoop 2.6.0* for about 3 months
now with really no issues in 4 clusters. However, recently we've been
seeing some issues. I'm not sure they're related to the combination, and
they may be fixed in 1.1.1 (which we are in the process of rolling out
soon), but I wanted to post them here in case anyone can help understand
what is going on, or wanted to dig in to see if this issue could be
affecting others.

The most critical issue is data loss. This happened only once, and is the
first time I've ever personally seen HBase lose data. From what we can
tell, a region was compacting, then a split started for the same region
while the compaction was in progress (it had finished 3/5 column families).
The split starts to wait for the compaction, and the compaction cancels
(presumably because of the split). Then the split starts to progress. It
initializes the daughter regions. Then the region server crashes. The
region server crash is not related to the split (it had been crashing daily
for another reason, related to scanning a large row as part of a custom
daily backup, which happened to be occurring at the same time as this
split).

When the region comes up, data is missing from the region that was
compacting and splitting (per some monitoring tests we have that scan for
known static data sets). There are logs indicating that the daughter
regions have no store files, so I suspect that the daughter regions
replaced the parent region before the store files were fully associated
with the daughter regions. It could also be that the large row is failing
the split, but then I'd hope the parent region would be restored and abort
the split.

Varun Sharma | 27 Jul 23:21 2015

HBase maven package built against Hadoop 2.X


Is there a Maven package for HBase 0.94 built against Hadoop 2.x? I think
the default one available is built against Hadoop 1.x.
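For reference, the 0.94 artifacts were published built against Hadoop 1 by default; the usual workaround in that era was to rebuild the 0.94 source locally with the Hadoop 2 profile and install it into the local Maven repository. A sketch of the build invocation (verify the profile name against the pom in your checkout):

```shell
# From a checkout of the HBase 0.94 source tree:
# build against Hadoop 2 and install into the local Maven repo.
mvn clean install -DskipTests -Dhadoop.profile=2.0
```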

Rose, Joseph | 27 Jul 22:05 2015

"Logging in" with my application?


I feel like I’m missing something basic, here. I’m trying to do a
privileged action using the VisibilityClient. That class is leaning on
UserGroupInformation, from Hadoop, and it’s picking up the user who’s
currently logged into the shell, not the Kerberos service principal that
should be used for these sorts of actions. I’ve been banging my head
against this code and can’t find the missing piece in the docs.

I’m trying to pick up the application’s service principal from a keytab,
since I don’t want to have to run kinit before starting the application in
the production environment. Besides, keytabs hold service principals.

Can you point me to the docs that explain how to set this up? Or do you
have any pointers yourself? I’m using version 0.98.12
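For what it's worth, the usual pattern for a standalone service is to log in from the keytab explicitly before making any HBase calls, via Hadoop's UserGroupInformation: call UserGroupInformation.setConfiguration(conf) with a Configuration that has hadoop.security.authentication=kerberos, then UserGroupInformation.loginUserFromKeytab(principal, keytabPath). Subsequent client operations then run as the service principal rather than the shell user, with no kinit needed. For the ZooKeeper side of a secure HBase client, a JAAS file can also point the Kerberos login module at the keytab; a sketch, where the keytab path and principal are placeholders:

```
// jaas.conf -- passed to the JVM with -Djava.security.auth.login.config=jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/myapp.keytab"
  principal="myapp/host.example.com@EXAMPLE.COM"
  useTicketCache=false;
};
```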



Amit Tewari | 27 Jul 21:03 2015

HBase Administration (paid) help needed

Hi All

We have basic experience running HBase up to about 1TB data size. However
we are expecting our data size to increase soon and are worried that we may
not have configured things appropriately to scale.

To that end we would like to consult with someone on this list to make sure
we don't make dumb mistakes and set up things right.

Our setup runs on AWS with HBase for data storage, MR for data load, Hive
for data manipulation, and Phoenix for webapp querying.

If you or anybody you know is interested in helping us out please send me a
private email.

We will make sure we compensate you appropriately for your time.

Vishwakarma, Chhaya | 27 Jul 14:11 2015

Using Hbase REST API

Hi All,

I'm researching the HBase REST API for my use case. I have a JSON file on a machine outside of the HBase
cluster (no HBase client installed).
The requirement is to put the file into an HBase table with the columns below:
Rec_Id  File_Id  Message  Timestamp
File_Id: this will be the file name
Message: contains the content of the JSON file
Is it possible to do this with the HBase REST API? If not, what would be another solution? Please point me to any good link.
Please help
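For what it's worth, the HBase REST gateway (Stargate) does support this: you PUT a JSON body in which row key, column names, and values are all base64-encoded. A minimal sketch, assuming the REST server runs on rest-host:8080 and the table/column-family names ("files", "cf") are placeholders:

```python
# Sketch: build the JSON body for a PUT to the HBase REST gateway.
# The REST schema requires row keys, column names, and values base64-encoded.
import base64
import json

def b64(s: str) -> str:
    """Base64-encode a UTF-8 string, as the HBase REST schema requires."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")

def build_put_payload(row_key: str, columns: dict) -> str:
    """Build the JSON body for PUT /<table>/<row> on the REST gateway."""
    cells = [{"column": b64(col), "$": b64(val)} for col, val in columns.items()]
    return json.dumps({"Row": [{"key": b64(row_key), "Cell": cells}]})

payload = build_put_payload(
    "rec-0001",
    {"cf:File_Id": "input.json", "cf:Message": '{"hello": "world"}'},
)

# To actually send it (requires a reachable REST server; not executed here):
# import urllib.request
# req = urllib.request.Request(
#     "http://rest-host:8080/files/rec-0001", data=payload.encode(),
#     headers={"Content-Type": "application/json"}, method="PUT")
# urllib.request.urlopen(req)
```

The REST server runs separately from the region servers (started with `hbase rest start`), so the sending machine needs no HBase client at all, just HTTP.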
Jeetendra Gangele | 27 Jul 09:39 2015

Migration of data from PostgreSQL

I have production data in PostgreSQL and I want to migrate the data to HBase.
Also, if there are any changes in my PostgreSQL data, I want those changes to
be applied to HBase as well.

Since I am migrating from production, I don't want to send too many
requests to my server; this HBase data should also always be in sync with PostgreSQL.

Sqoop won't work here; are there any other alternatives?
Ted Yu | 26 Jul 03:39 2015

Re: configurations for compactions

Talat has given a summary of how to view config values.

You can also see the default values by searching hbase-default.xml.
Descriptions of the configs are available in the refguide.


On Wed, Jul 22, 2015 at 1:32 AM, Talat Uyarer <talat@...> wrote:

> Hi Shushant,
> hbase-site.xml only stores values that differ from the defaults. If you do
> not write any settings in hbase-site.xml, HBase uses the default settings.
> You can reach your active HBase configuration from the web interfaces of
> HBase's Master and RegionServers. In 1.0 the interface ports changed. If
> your HBase version is 0.98 or below, you can use these links:
> http://<hbase master server ip>:60010/conf or http://<hbase regionserver
> ip>:60030/conf
> For 1.0 and above:
> http://<hbase master server ip>:16010/conf or http://<hbase regionserver
> ip>:16030/conf
> Talat
> 2015-07-22 7:29 GMT+03:00 Shushant Arora <shushantarora09@...>:
> > Where are the compaction configurations defined?
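As a concrete illustration of the above: the compaction-related defaults live in hbase-default.xml inside the HBase jar, and anything you want to change goes in hbase-site.xml. A sketch of a few commonly tuned compaction settings (the values shown are illustrative, not recommendations):

```xml
<!-- hbase-site.xml: example compaction overrides (illustrative values) -->
<property>
  <!-- Minimum number of StoreFiles before a minor compaction is triggered -->
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>
<property>
  <!-- Interval between automatic major compactions, in ms (0 disables them) -->
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value>
</property>
<property>
  <!-- Block new writes when a store accumulates this many files -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>10</value>
</property>
```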

F. Jerrell Schivers | 26 Jul 03:28 2015

Protocol message was too large error during bulk load


I'm getting the following error when I try to bulk load some data into
an HBase table at the end of a MapReduce job:

org.apache.hadoop.mapred.YarnChild: Exception running child :
com.google.protobuf.InvalidProtocolBufferException: Protocol message
was too large.  May be malicious.  Use CodedInputStream.setSizeLimit()
to increase the size limit.

This process was working fine until recently, so presumably as the
dataset has grown I've hit the default 64MB protobuf message size limit.

How can I increase this limit?  I'm doing the bulk load
programmatically, and I haven't found a way to call
CodedInputStream.setSizeLimit() as suggested.

Only one reducer is failing, out of 500.  Is there any way to figure
out which keys are in that reducer?  When this happened once in the
past I was able to trace the problem to one particular key
corresponding to a very wide row.  Since I knew that key wasn't
important I simply removed it from the dataset.  However I'm having no
luck this time around.

One last question.  Can someone explain what exactly is exceeding this
size limit?  Is it the size of a particular row, or something else?

I'm running HBase 0.98.2.


apratim sharma | 24 Jul 20:20 2015

Hbase major compaction question

I have an HBase table with a wide row, almost 2K columns per row. Each
KV is approximately 2.1KB in size.
I have populated this table with generated HFiles using an MR job.
There are no write or mutate operations performed on this table.

So once I am done with a major compaction on this table, ideally no further
major or minor compaction should be required if the table is not modified.
What I observe is that if I make some configuration change that needs a
restart of my HBase service, then after the restart the compaction work on
the table appears to be lost: if I start a major compaction on the table
again, it takes a long time to compact the table.

Is this expected behavior? I am curious what causes the major compaction to
take a long time if nothing has changed on the table.

I would really appreciate any help.


Ted Yu | 24 Jul 13:49 2015

Re[2]: region servers stuck

Is it possible for you to upgrade to 0.98.10+ ?

I will take a look at your logs later. 


Friday, July 24, 2015, 7:15 PM +0800 from Konstantin Chudinov  <kchudinov <at> griddynamics.com>:
>Hello Ted,
>Thank you for your answer!
>Hadoop and HBase versions are:
>2.3.0-cdh5.1.0 (the Hadoop and HDFS version)
>About HDFS... I don't see anything special in the logs. I've attached them to this message. Btw, it's
another server, which also crashed (I've lost the HDFS logs of the previous server), so the HBase logs are in the
archive as well.
>Best regards,
>Konstantin Chudinov
>On 23 Jul 2015, at 20:44, Ted Yu < yuzhihong <at> gmail.com > wrote:
>>What release of HBase do you use ?
>>I looked at the two log files but didn't find such information. 
>>In the log for node 118, I saw something such as the following:
>>Failed to connect to / for block, add to deadNodes and continue 
>>Was hdfs healthy around the time region server got stuck ?

jeevi tesh | 24 Jul 11:43 2015


I'm getting the following error:
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause
of approximately
After this, HBase hangs.
I changed the garbage collector in HBase to G1 GC, but the error still persists.
I'm working with HBase 0.96.2 on a single node; HBase is installed on a
single machine with JDK 1.7.
If you have any solution, kindly let me know.
with regards
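For reference, the GC choice for HBase is set through HBASE_OPTS in conf/hbase-env.sh. A hedged sketch of enabling G1 (the pause target is illustrative; note that JvmPauseMonitor warnings can also come from swapping or an undersized heap rather than GC itself):

```shell
# conf/hbase-env.sh: example G1 settings (illustrative, not a recommendation)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
# Also check that the machine is not swapping and that the heap
# (HBASE_HEAPSIZE) fits comfortably in physical memory.
```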