Shushant Arora | 6 May 03:36 2016

hbase doubts

1. Why is it better for read performance to have a single file per region rather than multiple files? Why can't multiple threads read multiple files and give better performance?

2. Does an HBase region server have a single thread for compactions and splits across all the regions it holds? Why wouldn't one thread per region work better than sequential compactions/splits for all regions on a region server?

3. Why does HBase flush and compact the memstores of all the families of a table at the same time, irrespective of their size, when even one memstore reaches its flush size?
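On question 2: region servers do not compact one region at a time with a single thread; they use shared "small" and "large" compaction thread pools (plus a separate pool for splits), so several regions can compact in parallel. A hedged hbase-site.xml sketch — these property names exist in hbase-default.xml, but the values below are purely illustrative:

```xml
<!-- Threads for compactions whose total input size is under
     hbase.regionserver.thread.compaction.throttle -->
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>2</value>
</property>
<!-- Threads for large compactions (e.g. major compactions) -->
<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>2</value>
</property>
```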

Lex Toumbourou | 5 May 05:53 2016

Export HBase snapshot to S3 creates empty root directory (prefix)

Hi all,

I'm having a couple of problems with exporting HBase snapshots to S3. I am
running HBase version 1.2.0.

I have a table called "domain"

And I have created a snapshot for it:

hbase(main):003:0> snapshot 'domain', 'domain-aws-test'
0 row(s) in 0.3310 seconds


I am attempting to export it to S3 using the following command:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
"domain-aws-test" -copy-to s3://my-hbase-snapshots/domain-aws-test

Now, when I view the snapshot metadata in the S3 bucket, there's nothing under the expected prefix:

> aws s3 ls my-hbase-snapshots/domain-aws-snapshots

But there is data under:

> aws s3 ls my-hbase-snapshots/\/domain-aws-test/
                           PRE .hbase-snapshot/
2016-05-05 13:38:12          1 .hbase-snapshot
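One thing worth trying (an assumption on my part, not a confirmed fix): the legacy s3:// block filesystem is known to lay out keys oddly, while the s3a:// connector writes plain object prefixes. The same export with the scheme swapped would look like:

```
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot "domain-aws-test" \
  -copy-to s3a://my-hbase-snapshots/domain-aws-test
```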

(Continue reading)

Shuai Lin | 5 May 03:07 2016

Some regions never get online after a region server crashes

Hi list,

Last weekend I got a region server crashed, but some regions never got
online again on other RSes. I've gone through the logs, and here is the
timeline about some of the events:

* 13:03:50 one of the region servers, rs-node7, died because of a disk failure. The master started to split rs-node7's WALs:

2016-04-30 13:03:50,953 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
logs for,60020,1458724695128 before assignment; region
2016-04-30 13:03:50,966 DEBUG
org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to
2016-04-30 13:03:50,966 INFO
org.apache.hadoop.hbase.master.SplitLogManager: started splitting 33 logs
in [hdfs://nameservice1/hbase/WALs/,60020,1458724695128-splitting]
for [,60020,1458724695128]

* 13:10:47 WAL splits done, master began to re-assign regions

2016-04-30 13:10:47,655 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning
133 region(s) that,60020,1458724695128 was carrying
(and 0 regions(s) that were opening on this server)
2016-04-30 13:10:47,665 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 133
region(s) across 6 server(s), round-robin=true
(Continue reading)

Dave Birdsall | 5 May 00:20 2016

Does Scan API guarantee key order?


Suppose I have an HBase table with many regions, and possibly many rows in
the memstore from recent additions.

Suppose I have a program that opens a Scan on the table, from start to
finish. Full table scan.

Does HBase guarantee that rows are returned in key order? Or might it jump
around, say, read one region first, then maybe another (and not necessarily
in region order)?
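For context on what a single Scan does: within a region, the scanner merge-sorts the already-sorted memstore and HFile streams with a KeyValue heap, and the client walks regions sequentially in start-key order, so rows come back in row-key order. A JDK-only toy of that heap merge (hypothetical keys, not the HBase API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ScanMergeSketch {
    // Merge several individually sorted sources (think: memstore + HFiles)
    // into one globally sorted stream, as a scanner's KeyValue heap does.
    static List<String> mergeScan(List<List<String>> sources) {
        // Heap entries: {sourceIndex, positionWithinSource}, ordered by current key.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sources.get(e[0]).get(e[1])));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(sources.get(top[0]).get(top[1]));
            if (top[1] + 1 < sources.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> sources = Arrays.asList(
            Arrays.asList("row1", "row4"),   // memstore
            Arrays.asList("row2", "row5"),   // HFile 1
            Arrays.asList("row3"));          // HFile 2
        System.out.println(mergeScan(sources)); // prints [row1, row2, row3, row4, row5]
    }
}
```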


Tokayer, Jason M. | 3 May 14:52 2016

Hbase ACL

I am working on HBase ACLs in order to lock a particular cell value against writes by a user for an indefinite amount of time. This same user will be writing to HBase during normal program execution, and he needs to be able to continue writing to other cells during the single-cell lock period. I’ve been experimenting with simple authentication (i.e. no Kerberos), and the plan is to extend this to a Kerberized cluster once I get it working.

First, I am able to grant ‘user-X’ read and write permissions on a particular namespace. In this way user-X can write to any HBase table in that namespace during normal execution. What I need to be able to do next is to set user-X’s permissions on a particular cell to read-only and have that take precedence over the table permissions. I found a parameter in the codebase, namely OP_ATTRIBUTE_ACL_STRATEGY_CELL_FIRST, that seems to allow for this prioritization of cell-level over table-/column-level permissions. But I cannot figure out how to set it with the key OP_ATTRIBUTE_ACL_STRATEGY. Is it possible to set the strategy to cell-level prioritization, preferably in hbase-site.xml? This feature is critical to our cell-level access control.
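Not an answer on the strategy key itself, but possibly relevant: cell ACLs are attached at write time, and the hbase shell supports an ACL attribute on put. A hedged sketch — the table, row, and qualifier names here are made up:

```
hbase(main):001:0> put 'mytable', 'row1', 'cf:locked', 'v', {ACL => {'user-X' => 'R'}}
```

Whether a cell-level 'R' overrides a broader table-level grant then depends on the evaluation strategy the question is about.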

Warmest Regards,
Jason Tokayer, PhD


Andrew Purtell | 2 May 17:07 2016

[ANNOUNCE] Apache HBase 0.98.19 is now available for download

Apache HBase 0.98.19 is now available for download. Get it from an Apache
mirror [1] or Maven repository. The list of changes in this release can be
found in the release notes [2] or at the bottom of this announcement.

Thanks to all who contributed to this release.

The HBase Dev Team


HBASE-11830 TestReplicationThrottler.testThrottling failed on virtual boxes
HBASE-12148 Remove TimeRangeTracker as point of contention when many
threads writing a Store
HBASE-12511 namespace permissions - add support from table creation
privilege in a namespace 'C'
HBASE-12663 unify getTableDescriptors() and
HBASE-12674 Add permission check to getNamespaceDescriptor()
HBASE-13700 Allow Thrift2 HSHA server to have configurable threads
HBASE-14809 Grant / revoke Namespace admin permission to group
HBASE-14870 Backport namespace permissions to 98 branch
HBASE-14983 Create metrics for per block type hit/miss ratios
HBASE-15191 CopyTable and VerifyReplication - Option to specify batch size,
HBASE-15212 RpcServer should enforce max request size
HBASE-15234 ReplicationLogCleaner can abort due to transient ZK issues
HBASE-15368 Add pluggable window support
HBASE-15386 PREFETCH_BLOCKS_ON_OPEN in HColumnDescriptor is ignored
HBASE-15389 Write out multiple files when compaction
HBASE-15400 Use DateTieredCompactor for Date Tiered Compaction
HBASE-15405 Synchronize final results logging single thread in PE, fix
wrong defaults in help message
HBASE-15412 Add average region size metric
HBASE-15460 Fix infer issues in hbase-common
HBASE-15475 Allow TimestampsFilter to provide a seek hint
HBASE-15479 No more garbage or beware of autoboxing
HBASE-15527 Refactor Compactor related classes
HBASE-15548 SyncTable: sourceHashDir is supposed to be optional but won't
work without
HBASE-15569 Make Bytes.toStringBinary faster
HBASE-15582 SnapshotManifestV1 too verbose when there are no regions
HBASE-15587 FSTableDescriptors.getDescriptor() logs stack trace erroneously
HBASE-15614 Report metrics from JvmPauseMonitor
HBASE-15621 Suppress Hbase SnapshotHFile cleaner error messages when a
snapshot is going on
HBASE-15622 Superusers does not consider the keytab credentials
HBASE-15627 Miss space and closing quote in
HBASE-15629 Backport HBASE-14703 to 0.98+
HBASE-15637 TSHA Thrift-2 server should allow limiting call queue size
HBASE-15640 L1 cache doesn't give fair warning that it is showing partial
stats only when it hits limit
HBASE-15647 Backport HBASE-15507 to 0.98
HBASE-15650 Remove TimeRangeTracker as point of contention when many
threads reading a StoreFile
HBASE-15661 Hook up JvmPauseMonitor metrics in Master
HBASE-15662 Hook up JvmPauseMonitor to REST server
HBASE-15663 Hook up JvmPauseMonitor to ThriftServer
HBASE-15664 Use Long.MAX_VALUE instead of HConstants.FOREVER in
HBASE-15665 Support using different StoreFileComparators for different
HBASE-15672 fails
HBASE-15673 [PE tool] Fix latency metrics for multiGet
HBASE-15679 Assertion on wrong variable in
Shushant Arora | 1 May 12:36 2016

hbase architecture doubts

1. Does HBase use a ConcurrentSkipListMap (CSLM) to store data in the memstore?

2. When the memstore is flushed to HDFS, does it dump the memstore's ConcurrentSkipListMap as an HFile (v2)? And how does it compute blocks out of the CSLM and dump them into HDFS?

3. After dumping the in-memory CSLM of the memstore to an HFile, is the memstore content discarded? And if a read request comes in while the memstore is being dumped, is it served from a copy of the memstore, or is the discard of the memstore blocked until the read request completes?

4. When a read request comes in, does it look in the in-memory CSLM first and then in the HFiles? And what is a Log-Structured Merge (LSM) tree, and how is it used in HBase?
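On question 1: yes, the default memstore implementation keeps cells in a ConcurrentSkipListMap, which is why a flush can stream cells into an HFile writer already in key order. A JDK-only sketch of the property the memstore relies on (toy string keys, not HBase Cell types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class MemstoreSketch {
    // Insert rows in arbitrary order; iterate in sorted key order,
    // as a memstore snapshot is iterated during a flush.
    static List<String> sortedKeys(List<String> insertionOrder) {
        ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
        for (String row : insertionOrder) {
            memstore.put(row, "value-" + row);
        }
        return new ArrayList<>(memstore.keySet());
    }

    public static void main(String[] args) {
        // prints [row1, row2, row3] despite out-of-order inserts
        System.out.println(sortedKeys(Arrays.asList("row3", "row1", "row2")));
    }
}
```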

Ted Yu | 29 Apr 19:13 2016

Re: How can i get hbase table memory used? Why hdfs size of hbase table double when i use bulkload?

For #1, can you clarify whether your workload is read heavy, write heavy, or a mixed load of reads and writes?

For #2, have you run major compaction after the second bulk load ?

On Thu, Apr 28, 2016 at 9:16 PM, Jone Zhang <joyoungzhang@...> wrote:

> *1. How can I get the memory used by an hbase table?*
> *2. Why does the HDFS size of an hbase table double when I use bulkload?*
> bulkload file to qimei_info
> 101.7 G  /user/hbase/data/default/qimei_info
> bulkload same file to qimei_info again
> 203.3 G  /user/hbase/data/default/qimei_info
> hbase(main):001:0> describe 'qimei_info'
>  'qimei_info', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
>  1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => '2147483647',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> 1 row(s) in 1.4170 seconds
> *Best wishes.*
> *Thanks.*
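On Ted's #2 point: with VERSIONS => '1', the duplicate cells from the second bulk load keep occupying space until a major compaction rewrites the doubled HFiles. One way to check (assuming that compaction has not yet run) is to trigger it from the hbase shell:

```
hbase(main):002:0> major_compact 'qimei_info'
```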
Manisha Sethi | 28 Apr 10:33 2016

No server address listed in hbase:meta for region SYSTEM.CATALOG


My hbase:meta table entries are:
SYSTEM.CATALOG,,1461831992343.a6daf63bd column=info:regioninfo, timestamp=1461831993549,
value={ENCODED => a6daf63bde1f1456ca4acee228b8f5fe, NAME => 'SYSTEM
e1f1456ca4acee228b8f5fe.                .CATALOG,,1461831992343.a6daf63bde1f1456ca4acee228b8f5fe.',
STARTKEY => '', ENDKEY => ''}
hbase:namespace,,1461821579649.7edd6a09 column=info:regioninfo, timestamp=1461821581196,
value={ENCODED => 7edd6a099dc3612b7dafa52f380ac3e6, NAME => 'hbase:
9dc3612b7dafa52f380ac3e6.               namespace,,1461821579649.7edd6a099dc3612b7dafa52f380ac3e6.',
STARTKEY => '', ENDKEY => ''}
hbase:namespace,,1461821579649.7edd6a09 column=info:seqnumDuringOpen,
timestamp=1461831928239, value=\x00\x00\x00\x00\x00\x00\x00\x1C

While I do scan 'SYSTEM.CATALOG' I get exception:

No server address listed in hbase:meta for region SYSTEM.CATALOG,.............

My aim is to connect to HBase through Phoenix, but even a scan from the hbase shell is not working. I can see entries in the meta table, and I tried flush and compact on meta as well. But no progress...

I am using HBase 1.2

Manisha Sethi


James Johansville | 28 Apr 00:36 2016

Question on writing scan coprocessors


I'd like to write a coprocessor similar to the RegionObserverExample example: that is, a scan coprocessor which intercepts and selectively filters scan results.

My problem is that I need to be able to filter out Results based on a Scan attribute. preScannerNext(), as used in the example above, does not allow for this, since the Scan object is not passed down to the method.

Any guidance on how to accomplish this?
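One pattern worth considering (a sketch under the assumption you are on the 1.x RegionObserver API, where the Scan is available at open time): read the attribute in preScannerOpen and push it into the Scan as a Filter, so no per-next hook is needed. The attribute key and filter class below are made up:

```java
// Sketch only, not compiled against a cluster: 1.x RegionObserver API.
@Override
public RegionScanner preScannerOpen(ObserverContext<RegionCoprocessorEnvironment> ctx,
                                    Scan scan, RegionScanner s) throws IOException {
  byte[] attr = scan.getAttribute("my.custom.filter");  // hypothetical attribute key
  if (attr != null) {
    Filter mine = new MyAttributeDrivenFilter(attr);    // hypothetical Filter impl
    // Wrap any existing filter so both still apply.
    scan.setFilter(scan.getFilter() == null
        ? mine
        : new FilterList(scan.getFilter(), mine));
  }
  return s;
}
```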

Saad Mufti | 27 Apr 17:27 2016

HBase Write Performance Under Auto-Split


Does anyone have experience with HBase write performance under auto-split conditions? Our keyspace is randomized, so all regions reach their size thresholds at roughly the same time. Early on, when we had the 1024 regions we started with, they all decided to auto-split within an hour or so; now that we're up to 6000 regions, the process seems to be spread over 12 hours or more as they slowly reach their thresholds.

During this time our writes, for which we use a shared BufferedMutator, suffer: writes time out and the underlying AsyncProcess thread pool seems to fill up. That means callers of our service see their response times shoot up as they spend time trying to drain the buffer and submit mutations to the thread pool. So overall system time suffers and we can't keep up with our input load.

Are there any guidelines on the size of the BufferedMutator to use? We are
even considering running performance tests without the BufferedMutator to
see if it is buying us anything. Currently we have it sized pretty large at
around 50 MB but maybe having it too big is not a good idea.
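For reference, the default client write buffer is set by hbase.client.write.buffer (2 MB by default in 1.x), and a BufferedMutator inherits it unless overridden via BufferedMutatorParams.writeBufferSize(). A hedged config sketch — the property name is real, the value is only an illustration, not a sizing recommendation:

```xml
<!-- Client-side write buffer; flushed when full. Larger buffers mean
     fewer RPCs but more client memory and bigger bursts per flush. -->
<property>
  <name>hbase.client.write.buffer</name>
  <value>8388608</value> <!-- 8 MB -->
</property>
```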

Any help/advice would be most appreciated.