Arun Patel | 29 Aug 16:31 2015

HBase Import Error

I was able to successfully export data to S3 using the command below.

hbase org.apache.hadoop.hbase.mapreduce.Export docs
s3n://KEY:ACCESSKEY@fdocshbase/data/bkp1 1440760612 1440848237

I was then able to import the data into a new table (after creating it) with
this command:

hbase org.apache.hadoop.hbase.mapreduce.Import docsnew
s3n://KEY:ACCESSKEY@fdocshbase/data/bkp1

I exported other time ranges to other directories, such as bkp2 and bkp3.

When I tried to import all the directories into HBase at once, I got a
FileNotFoundException.

[hdfs@ip-172-31-59-10 ~]$ hbase org.apache.hadoop.hbase.mapreduce.Import
docsnew s3n://KEY:ACCESSKEY@fdocshbase

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
SLF4J: Found binding in
SLF4J: See for an
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-08-29 09:21:02,058 INFO  [main] impl.TimelineClientImpl: Timeline
service address: http://ip-XXXXXXXXX.ec2.internal:8188/ws/v1/timeline/
2015-08-29 09:21:02,214 INFO  [main] client.RMProxy: Connecting to
ResourceManager at ip-XXXXXXXXX.ec2.internal/XXXXXXXXX:8050
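
One way to sidestep the question of whether Import descends into
subdirectories is to run it once per export directory. The following is a
minimal sketch, assuming the exports live under data/bkp1, data/bkp2 and
data/bkp3 as described in the post. The helper name buildImportCommands is
hypothetical, and it only builds the command lines; it does not run them.

```javascript
// Hypothetical helper: builds one Import command line per export directory.
// Table name, bucket URI and directory names come from the post above;
// nothing is executed here.
function buildImportCommands(table, baseUri, dirs) {
  const base = baseUri.replace(/\/+$/, ''); // drop any trailing slashes
  return dirs.map((dir) =>
    `hbase org.apache.hadoop.hbase.mapreduce.Import ${table} ${base}/${dir}`);
}

const commands = buildImportCommands(
  'docsnew',
  's3n://KEY:ACCESSKEY@fdocshbase/data',
  ['bkp1', 'bkp2', 'bkp3']
);
commands.forEach((c) => console.log(c));
```

Each printed line matches the single-directory Import that already worked for
bkp1, so the three can be run one after another against the same table.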

Buntu Dev | 27 Aug 20:58 2015

HBase schema design

I'm planning on writing a time series of user action events, including user
profile, attributes and product purchase transactions, to answer these
questions:

- What are the events leading up to the user's conversion, i.e., purchase?
- What are the different attributes that changed over a given time period?
- What is the LTV of a given user?
- Retrieve the list of attributes set/enabled for a given user at some point
  in time.

As a newbie to HBase, I wanted to confirm that a tall table design, i.e., with
row key <userid>_<timestamp>, is _not_ the right design for these reasons:

* scanning for the latest state of a user seems to be an expensive operation,
since not all the columns will be available in the latest event for the user

* constructing a row key always requires a timestamp to be appended if I'm
not using regex filtering

* fetching the user's state at some point in time t1 involves fetching all the
"<userid>*" rows and looking up the row with timestamp <= t1

Are these valid concerns?
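
For the third concern, one commonly used variation is to store a reversed
timestamp in the key so that newer events sort first. The sketch below is an
in-memory illustration only: it assumes 13-digit epoch-millisecond timestamps,
and the names rowKey, latestAtOrBefore and MAX_TS are hypothetical. A sorted
array stands in for HBase's sorted rows. It does not address the first concern
(columns missing from the latest event).

```javascript
// Assumption: timestamps are epoch milliseconds that fit in 13 digits.
const MAX_TS = 9999999999999;

// Hypothetical key layout: <userid>_<MAX_TS - timestamp>, zero-padded so that
// lexicographic order matches numeric order. Newer events sort first.
function rowKey(userId, tsMillis) {
  return `${userId}_${String(MAX_TS - tsMillis).padStart(13, '0')}`;
}

// In-memory stand-in for a forward scan over sorted row keys: return the first
// key for this user at or after the start key for time t1, or null if none.
function latestAtOrBefore(sortedKeys, userId, t1) {
  const start = rowKey(userId, t1);
  const prefix = `${userId}_`;
  const hit = sortedKeys.find((k) => k >= start && k.startsWith(prefix));
  return hit === undefined ? null : hit;
}

const keys = [1000, 2000, 3000].map((ts) => rowKey('u1', ts));
keys.push(rowKey('u2', 1500));
keys.sort(); // HBase keeps rows sorted by key; sort() mimics that here

console.log(latestAtOrBefore(keys, 'u1', 2500)); // key of the event at ts 2000
console.log(latestAtOrBefore(keys, 'u1', 500));  // no event that early
```

With this layout, "state at t1" becomes a scan that starts at the key for t1
and stops at the first row, and "latest state" is the same lookup with t1 set
to the current time.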

jackiehbaseuser | 27 Aug 03:38 2015

hbase pre-split


How many ways are there to pre-split an HBase table?

Thank you very much!

Best regards!
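
One approach is to supply explicit split points when the table is created,
for example through the HBase shell's SPLITS option. The following is a
minimal sketch, assuming row keys are uniformly distributed fixed-width hex
strings (such as hashed prefixes). The function names and the table and
family names 'events' and 'cf' are hypothetical; the script only computes the
boundaries and prints a shell statement.

```javascript
// Evenly spaced split points over a fixed-width hex key space.
function hexSplitPoints(numRegions, width = 8) {
  if (!Number.isInteger(numRegions) || numRegions < 2) {
    throw new Error('numRegions must be an integer >= 2');
  }
  const space = 16 ** width; // exactly representable for width <= 13
  const points = [];
  for (let i = 1; i < numRegions; i++) {
    const boundary = Math.floor((space / numRegions) * i);
    points.push(boundary.toString(16).padStart(width, '0'));
  }
  return points;
}

// Formats an HBase shell "create" statement with the given split points.
function createStatement(table, family, splits) {
  const quoted = splits.map((s) => `'${s}'`).join(', ');
  return `create '${table}', '${family}', SPLITS => [${quoted}]`;
}

console.log(createStatement('events', 'cf', hexSplitPoints(4)));
```

HBase also ships a RegionSplitter utility with HexStringSplit and UniformSplit
algorithms, which may be a better fit than hand-computed boundaries.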

donmai | 26 Aug 18:58 2015

Sporadic incorrect table directory structure when running restore_snapshot


Occasionally when I run restore_snapshot on HBase 0.98.10, it appears that
the table directory structure created by the restore_snapshot command is
not correct:


Is what it should be, but I end up with


The extra foo should not be there, so the master gets stuck in a loop
trying to figure out why there isn't a .tabledesc/.tableinfo.0000000001 in
rootdir/data/default/foo (it's in rootdir/data/default/foo/foo). Moving
everything from rootdir/data/default/foo/foo to rootdir/data/default/foo/
unblocks HBase master and allows it to proceed.

This doesn't always happen, so my question is: why does it happen? The logs
don't seem to be showing anything.

Buntu Dev | 26 Aug 09:30 2015

HBase event triggers

I'm planning on ingesting web page events via Flume into HBase and wanted to
know if there are any HBase-related projects that let you define rules to
trigger an action if given criteria are met. For instance, define a rule to
notify the author of a page if the number of page views is significantly
higher/lower in the past hour.

Please note that I'm ingesting into HBase for other reasons, but I wanted to
build some sort of event trigger as well. I realize this can be custom built,
but if there are any projects along these lines, I would definitely want to
check them out.
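
To make the example rule concrete, here is a minimal sketch of what "a
significantly higher/lower number of page views in the past hour" could mean
in code. Everything here is an assumption for illustration: the function name
checkPageViews, the input shape (an array of hourly counts, oldest first) and
the default factor of 2 are not taken from any HBase-related project.

```javascript
// Hypothetical rule: compare the most recent hour against the mean of the
// earlier hours. `factor` is an assumed threshold for "significantly".
function checkPageViews(hourlyCounts, factor = 2) {
  if (hourlyCounts.length < 2) return null; // not enough history
  const latest = hourlyCounts[hourlyCounts.length - 1];
  const earlier = hourlyCounts.slice(0, -1);
  const mean = earlier.reduce((sum, c) => sum + c, 0) / earlier.length;
  if (latest > mean * factor) return 'higher';
  if (latest < mean / factor) return 'lower';
  return null; // within the normal range, no notification
}

console.log(checkPageViews([100, 100, 100, 250]));
console.log(checkPageViews([100, 100, 100, 40]));
console.log(checkPageViews([100, 100, 100, 120]));
```

The hourly counts themselves would have to come from wherever the events are
aggregated; the check is independent of how they are stored.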

donmai | 25 Aug 18:30 2015

Using HBase with a shared filesystem (gluster, nfs, s3, etc)


I'm curious about how exactly region movement works with regard to data
transfer. As I understand it from the docs, in an HDFS-backed cluster a
region movement / transition only changes things in meta; all data movement
for locality is handled by HDFS. In the case where rootdir is a shared file
system, there shouldn't be any data movement with a region reassignment,
correct? I'm running into performance issues where region assignment takes a
very long time, and I'm trying to figure out why.

Gautam Borah | 27 May 22:53 2015

optimal size for hbase.hregion.memstore.flush.size and its impact

Hi all,

The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
Could anyone kindly explain what the impact would be if we increase this to a
higher value such as 512 MB, 800 MB or more?

We have a very write-heavy cluster. We also run periodic endpoint
coprocessor-based jobs every 10 minutes that operate on the data written in
the last 10-15 minutes. We are trying to manage the memstore flush operations
such that the hot data remains in the memstore for at least 30-40 minutes, so
that the job hits disk only every 3rd or 4th time it operates on the hot data
(it does a scan).

We have region server heap size of 20 GB and set the, = .45 = .55

We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of the
heap is utilized by the memstore before it flushes.

With hbase.hregion.memstore.flush.size=512MB, we are able to increase the heap
utilization by the memstore to 35%.

It would be very helpful for us to understand the implications of a higher
hbase.hregion.memstore.flush.size for a long-running cluster.
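
As a back-of-envelope check on the two observations above, the sketch below
works out how many completely full memstores would account for the observed
heap share. It assumes that the observed share consists only of memstores
that have each reached the flush size, which is a simplification; the
function name fullMemstoresImplied is hypothetical.

```javascript
const MB = 1024 * 1024;
const GB = 1024 * MB;

// How many memstores, each filled to the flush size, would add up to the
// observed share of the heap. The share is passed as an integer percent.
function fullMemstoresImplied(heapBytes, observedHeapPercent, flushSizeBytes) {
  const memstoreBytes = (heapBytes * observedHeapPercent) / 100;
  return memstoreBytes / flushSizeBytes;
}

// 20 GB heap, 10% used at a 128 MB flush size: 2 GB / 128 MB
console.log(fullMemstoresImplied(20 * GB, 10, 128 * MB));
// 20 GB heap, 35% used at a 512 MB flush size: 7 GB / 512 MB
console.log(fullMemstoresImplied(20 * GB, 35, 512 * MB));
```

Under that simplification both settings point to a similar number of regions
receiving writes at once, which would suggest that memstore heap usage here
scales roughly with the flush size multiplied by the number of actively
written regions.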

Chandrashekhar Kotekar | 24 Aug 16:32 2015

thrift.ProcessFunction: Internal error processing get


I have generated Node.js files using Thrift and am trying to get a single row
from HBase. I am getting a "thrift.ProcessFunction: Internal error processing
get" <> error when I execute the Node.js code. When I try to put a dummy
column to an existing row, I get this error from Node.js: <>. There's no
error on the Thrift server side for the 'put' operation.

Can anyone please help?

Chandrashekhar Kotekar
Mobile - +91 8600011455

Chandrashekhar Kotekar | 24 Aug 14:00 2015

Thrift node.js code not working


I am trying to use the following code to test the HBase Thrift interface for
Node.js, but it is not working.

var thrift = require('thrift');
var hbase = require('./gen-nodejs/THBaseService');
var hbaseTypes = require('./gen-nodejs/hbase_types');

var connection = thrift.createConnection('nn2', 9090, {
  transport: thrift.TBufferedTransport//,
  //protocol : thrift.TBinaryProtocol
});
console.log('connection : ' + connection );

var client = thrift.createClient(hbase, connection);
for(a in client) {
    console.log(a);
}

connection.on('connect', function(){
  console.log('connected to hbase.');
  client.get('AD_COMPANY_V1', '028fffac57101a1fa5f9aa53a6d0', 'CF:Id', null, function(err, data){
    console.log(data);
  });
  connection.end();
});

connection.on('error', function(err){

Yu Li | 24 Aug 11:41 2015

Which version of branch-1 to pick up for production environment

Hi All,

As titled, we're currently using 0.98.12 in our production environment but
would like to do some planning for an upgrade to branch-1. I can see that 1.1.2
is coming soon, but 1.2 is also in progress with some important improvements,
so my question has two parts: a) for planning (not action), which one is better
to investigate, 1.1.2 or 1.2? and b) is there a due date for 1.2?

Also, I've heard that a well-known company is planning to upgrade its
production cluster to 1.2, and it would be great if anyone could confirm this
or share more information. :-P

Any suggestions/information would be highly appreciated, and thanks in
advance.

Best Regards,

jackiehbaseuser | 23 Aug 17:01 2015

hbase rowkey design ways


How many ways are there to design an HBase row key? Please give some examples.

Thank you very much!

Best regards!
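
Two techniques that come up often in row key design are salting (prefixing
the key with a bucket number derived from the key) and reversing the key, both
of which spread sequential keys across regions. The following is a minimal
sketch, not a recommendation: the function names, the two-digit salt format
and the character-sum hash are all assumptions chosen to keep the example
self-contained.

```javascript
// Illustrative hash only: the sum of the key's character codes. A real
// deployment would likely use a stronger hash function.
function simpleHash(s) {
  let sum = 0;
  for (const ch of s) sum += ch.codePointAt(0);
  return sum;
}

// Salting: prefix the key with a bucket number derived from the key itself,
// so the same key always maps to the same bucket. Assumes buckets <= 100.
function saltedKey(key, buckets) {
  const salt = simpleHash(key) % buckets;
  return `${String(salt).padStart(2, '0')}_${key}`;
}

// Reversing: put the fastest-changing part of the key first.
function reversedKey(key) {
  return key.split('').reverse().join('');
}

console.log(saltedKey('user42', 4));
console.log(reversedKey('20150823'));
```

Because the salt is computed from the key, a reader can recompute it for a
point lookup; range scans, however, have to be issued once per bucket.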