Stack | 30 Nov 18:02 2015

Re: How to optimize the GC in HBase

On Sat, Nov 28, 2015 at 4:44 PM, 聪聪 <175998806@...> wrote:

> hi, all:
> This morning a regionserver shut down. By analyzing the log, I guess it
> was caused by GC. So: how can I optimize GC in HBase? I am using
> hbase-0.98.6-cdh5.2.0, and I am looking for your help with this.
> The JVM configuration is as follows:
> export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m
> -XX:MaxPermSize=256m -XX:+UseG1GC -server -XX:+DisableExplicitGC
> -XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0
> -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=40
> -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB
> -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log"
What the others have said.

How did you arrive at the above configurations?

Looks like you are keeping the GC log in the logs dir at
gc-HOSTNAME-hbase.log, and with those print flags your GC output should be
rich with lots of helpful detail.

What is your loading like?  All reads? All writes?  Big objects? A mix?
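
(If it helps when digging through that output: below is a minimal sketch,
in Java, of pulling the long pauses out of a HotSpot GC log. It assumes
pause lines that end in ", <seconds> secs]", which is what
-XX:+PrintGCDetails typically produces; the exact format varies by JVM
version, so treat the regex as a starting point, not a parser.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scan a HotSpot GC log and print every pause longer than a threshold.
// Assumes pause lines ending in ", <seconds> secs]".
public class GcPauseScan {
  private static final Pattern PAUSE = Pattern.compile(", ([0-9.]+) secs\\]");

  public static void main(String[] args) throws Exception {
    String file = args[0];                     // e.g. gc-HOSTNAME-hbase.log
    double thresholdSecs = args.length > 1 ? Double.parseDouble(args[1]) : 1.0;
    BufferedReader in = new BufferedReader(new FileReader(file));
    String line;
    while ((line = in.readLine()) != null) {
      Matcher m = PAUSE.matcher(line);
      if (m.find() && Double.parseDouble(m.group(1)) >= thresholdSecs) {
        System.out.println(line);              // a pause worth looking at
      }
    }
    in.close();
  }
}

Run as, e.g., "java GcPauseScan gc-myhost-hbase.log 5" to list every pause
of five seconds or more.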

(Continue reading)

Rajeshkumar J | 30 Nov 17:30 2015

Row Versions in Apache Hbase


  I am new to Apache HBase, and I know that when we try to insert a row
key value which is already present in a table, the new value either
replaces the old one or is discarded. I also came across row versions,
through which we can store different versions of a row key based on
timestamp. Can anyone correct me if I am wrong? I also need to know
whether there is any way to store more than one row for a row-key value.
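
(For what it's worth, versions in HBase are kept per cell, not per row: a
put to an existing row key writes a new cell version, and the column
family's VERSIONS setting controls how many are retained. Below is a
minimal sketch against the 0.98-era client API; the table 't1' and family
'f1' with VERSIONS => 3 are made-up examples, not anything from the
original post.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Hypothetical table 't1' whose family 'f1' was created with VERSIONS => 3.
    HTable table = new HTable(conf, "t1");

    byte[] row = Bytes.toBytes("row1");
    byte[] fam = Bytes.toBytes("f1");
    byte[] qual = Bytes.toBytes("q1");

    // Two puts to the same cell: the second does not discard the first,
    // it adds a newer version (as long as the family keeps >1 version).
    table.put(new Put(row).add(fam, qual, Bytes.toBytes("v1")));
    table.put(new Put(row).add(fam, qual, Bytes.toBytes("v2")));

    // A plain Get returns only the newest version; ask for more explicitly.
    Get get = new Get(row);
    get.setMaxVersions(3);
    Result r = table.get(get);
    for (Cell c : r.getColumnCells(fam, qual)) {
      System.out.println(c.getTimestamp() + " -> "
          + Bytes.toString(CellUtil.cloneValue(c)));
    }
    table.close();
  }
}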

吴国泉 | 30 Nov 11:42 2015

Re: How to optimize the GC in HBase

     I also ran into this problem (cong,pu is my partner). Our HBase
regionserver is configured with 92G of RAM. Sometimes the regionserver
crashes; I checked the log, but no GC had happened, which is very strange.
     Here is part of the regionserver log:

2015-11-29 00:48:30,521 DEBUG compactions.Compactor: Compacting keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see
2015-11-29 00:49:46,912 INFO  zookeeper.ClientCnxn: Client session timed out, have not heard from server in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and attempting reconnect
2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see
2015-11-29 00:49:46,912 INFO  zookeeper.ClientCnxn: Client session timed out, have not heard from
(Continue reading)

Marko Dinic | 29 Nov 23:19 2015

Rowkey design

Hello, everyone!

I'm new to HBase and I need help designing rowkeys for a use case that
looks like this:

- Products are listed, where each product has a product id.
- Each product has a timestamp.
- Each product is created in a certain place (e.g. city)
- Each product is created by some unit (e.g. factory)

I would like to be able to scan products from a certain time period, from
a certain place, or from a certain unit.

I read about salting to avoid hot-spotting, and I understand that rows are
stored sequentially by rowkey. This should allow me to scan for a certain
time period using the following rowkey:


And I can specify the period using STARTROW, ENDROW.

What confuses me is how to include the place (and maybe the unit) in the
key and still be able to select products from a certain place during a
certain time period.

If I limit myself to scanning by only one of the above (time range OR
place), one idea is to duplicate the data into two different tables, one
with (salt-productId-timestamp) keys and the other with
(salt-productId-place) keys. Is that recommended or not?

So, how should I construct my keys?
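
(A minimal sketch of one possible layout along those lines: a one-byte
salt, then fixed-width place and timestamp fields, then the product id.
All names, widths, and the bucket count here are illustrative assumptions,
not a recommendation. Note the cost of salting: a place-and-period query
becomes one scan per salt bucket, merged client-side.)

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ProductKeys {
  private static final int BUCKETS = 16; // illustrative bucket count

  // key = [1-byte salt][4-byte placeId][8-byte timestamp][productId]
  // Fixed-width fields keep lexicographic order equal to (place, time)
  // order within each bucket; Bytes.toBytes(long) sorts correctly for
  // the non-negative epoch-millis timestamps assumed here.
  static byte[] key(int salt, int placeId, long ts, String productId) {
    return Bytes.add(new byte[] { (byte) salt },
        Bytes.add(Bytes.toBytes(placeId), Bytes.toBytes(ts)),
        Bytes.toBytes(productId));
  }

  // "Products in place P between t1 (inclusive) and t2 (exclusive)"
  // becomes one bounded scan per salt bucket.
  static List<Scan> scansForPlaceAndPeriod(int placeId, long t1, long t2) {
    List<Scan> scans = new ArrayList<Scan>();
    for (int salt = 0; salt < BUCKETS; salt++) {
      scans.add(new Scan(key(salt, placeId, t1, ""),
                         key(salt, placeId, t2, "")));
    }
    return scans;
  }
}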
(Continue reading)

Otis Gospodnetić | 29 Nov 04:05 2015

Official Docker image


I was going through Cosmin Lehene's and noticed there
was no official HBase Docker image:

Is there a particular reason for that, did it just slip everyone's mind,
or is it just not important?

Andrew Purtell | 27 Nov 20:14 2015

[ANNOUNCE] HBase is now available for download

Apache HBase is now available for download. Get it from an Apache
mirror [1] or Maven repository.

The list of changes in this release can be found in the release notes [2],
with the exception of one change removed for this patch release: a revert
of HBASE-14689 (Addendum and unit test for HBASE-13471), a change that was
problematic when committed to later code lines. You can also find the list
of changes in this release below following this announcement.

Thanks to all who contributed to this release.

The HBase Dev Team



HBASE-12911 Client-side metrics
HBASE-12986 Compaction pressure based client pushback
HBASE-13318 RpcServer.getListenerAddress should handle when the accept
channel is closed
HBASE-13330 Region left unassigned due to AM & SSH each thinking the
assignment would be done by the other
HBASE-14283 Reverse scan doesn’t work with HFile inline index/bloom blocks
HBASE-14347 Add a switch to DynamicClassLoader to disable it
HBASE-14366 NPE in case visibility expression is not present in labels
table during importtsv run
HBASE-14436 HTableDescriptor#addCoprocessor will always make
(Continue reading)

Mukesh Jha | 27 Nov 19:11 2015

High get/scan rates on HBase table even if no readers are on

I'm working with Cloudera HBase v0.98; my HBase table has ~5k regions.

From the Cloudera UI charts I see a lot of get & scan operations active on
my table even after I shut down all the reader applications.

I suspect that this is impacting my scan performance.

So I'd like to know if there is a way I can identify the hosts issuing
these get/scan operations. I tried netstat and similar Linux commands
without much luck.

Sumit Nigam | 27 Nov 05:42 2015


Need some help/inputs.
I have set hbase.client.retries.number to 40. I double-checked this to
make sure it is 40 and not 400 as reported below!

However, I notice the following in my logs:

2015-11-19 16:35:02,687 WARN  [htable-pool5-t1] client.AsyncProcess: #3, table=ldmns:indx_parameterstore, attempt=401/400 failed 2 ops, last exception: Connection refused on,44031,1447969471426, tracking started Thu Nov 19

I am not sure why 400 attempts are being made. The problem is that it
seems to show 2+ hours taken to fail because of these 400 attempts.

Then, I am also not sure why, just a little later, another thread succeeds
(again after 2 hours!!):

2015-11-19 16:35:21,921 INFO  [htable-pool6-t3] client.AsyncProcess: #4, table=ldmns:indx_parameterstore, attempt=402/400 SUCCEEDED on,46620,1447971819216, tracking started Thu Nov 19 14:23:54 PST 2015

What surprises me is that it seems to have gone even beyond 400 attempts.
That looks like a bug to me; it seems it would have kept trying beyond 400
attempts until it succeeded. Or am I missing something? Also, where is
this 400 coming from, if not from hbase.client.retries.number?

This is with HBase 0.98.14.
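
(One thing worth ruling out before calling it a bug: which configuration
the client JVM actually loaded. A minimal probe is sketched below;
HConstants.HBASE_CLIENT_RETRIES_NUMBER really is the constant for
hbase.client.retries.number, while the rest is just diagnostic scaffolding
run on the same classpath as the client.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;

public class RetriesProbe {
  public static void main(String[] args) {
    // Reads hbase-site.xml (and friends) from the classpath.
    Configuration conf = HBaseConfiguration.create();
    // HConstants.HBASE_CLIENT_RETRIES_NUMBER == "hbase.client.retries.number"
    System.out.println("retries = "
        + conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER,
                      HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER));
    // Where did the value come from? Useful when several XMLs are on the
    // classpath. (getPropertySources is a Hadoop 2.x Configuration method.)
    String[] sources = conf.getPropertySources(HConstants.HBASE_CLIENT_RETRIES_NUMBER);
    if (sources != null) {
      for (String s : sources) System.out.println("  set by: " + s);
    }
  }
}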

sudhir patil | 26 Nov 12:18 2015

How to change the blocksize for an existing HBase table? Does alter table work?

I want to change the block size of an existing table. Will the alter table
shown below change the block size for my table? Also, do I need to run a
major compaction after the alter?

disable 'table1'
alter 'table1', { NAME => 'f1',  BLOCKSIZE => '16384', COMPRESSION => … }
enable 'table1'
major_compact  'table1'
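
(For reference, the same change through the 0.98-era Java admin API looks
roughly like the sketch below; the table and family names are copied from
the shell example above. The new block size only applies to HFiles written
after the change, which is why a major compaction is the usual follow-up.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ChangeBlocksize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    TableName table = TableName.valueOf("table1");

    HColumnDescriptor f1 =
        admin.getTableDescriptor(table).getFamily(Bytes.toBytes("f1"));
    f1.setBlocksize(16384);             // new HFile block size for 'f1'

    admin.disableTable(table);
    admin.modifyColumn(table, f1);      // equivalent of the shell 'alter'
    admin.enableTable(table);

    // Existing HFiles keep their old block size until rewritten, so kick
    // off a major compaction (asynchronous, like the shell command).
    admin.majorCompact("table1");
    admin.close();
  }
}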
Rich Bowen | 25 Nov 18:32 2015

[ANNOUNCE] CFP open for ApacheCon North America 2016

Community growth starts by talking with those interested in your
project. ApacheCon North America is coming, are you?

We are delighted to announce that the Call For Presentations (CFP) is
now open for ApacheCon North America. You can submit your proposed
sessions at
for big data talks and
for all other topics.

ApacheCon North America will be held in Vancouver, Canada, May 9-13th
2016. ApacheCon has been running every year since 2000, and is the place
to build your project communities.

While we will consider individual talks, we prefer to see related
sessions that are likely to draw users and community members. When
submitting your talk, work with your project community and with related
communities to come up with a full program that will walk attendees
through the basics and on into mastery of your project in example use
cases. Content that introduces what's new in your latest release is also
of particular interest, especially when it builds upon existing
well-known application models. The goal should be to showcase your
project in ways that will attract participants and encourage engagement
in your community. Please remember to involve your whole project
community (user and dev lists) when building content. This is your chance
to create a project-specific event within the broader ApacheCon
conference.

Content at ApacheCon North America will be cross-promoted as
mini-conferences, such as ApacheCon Big Data and ApacheCon Mobile, so
(Continue reading)

Arul | 25 Nov 15:03 2015

HBase Aggregation


I am new to HBase and am doing a POC. We have a detail table to which rows
are continuously added (thousands of rows every few minutes). We want to
build a summary table, read it from the UI, and display it to the user.
The summary table would be an aggregate of the detail table. I was
thinking of using MapReduce to populate the summary table. Is that the
right approach, and how do I make sure the MapReduce jobs run every few
minutes so that the summary table is updated with the latest data? Thanks
in advance.
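
(A minimal sketch of the usual MapReduce wiring for this, against the
0.98-era API. The table names 'detail' and 'summary', the family 'd', and
the count-per-group aggregate are placeholder assumptions; derive the
group key and the aggregate from whatever the summary is actually keyed
on, and schedule the job with whatever scheduler you already use.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SummaryJob {

  // Map each detail row to (groupKey, 1); the group key here is a
  // placeholder that assumes "group-..." row keys.
  static class DetailMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
      ctx.write(new Text(key.split("-")[0]), ONE);
    }
  }

  // Sum the counts and write one summary row per group.
  static class SummaryReducer
      extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : vals) sum += v.get();
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(sum));
      ctx.write(null, put);   // TableOutputFormat ignores the key
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "detail-to-summary");
    job.setJarByClass(SummaryJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);         // larger batches for MR scans
    scan.setCacheBlocks(false);   // don't churn the block cache
    TableMapReduceUtil.initTableMapperJob("detail", scan,
        DetailMapper.class, Text.class, LongWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("summary", SummaryReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}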

