Stack | 18 May 2013 00:10

HEADS-UP: Upcoming bay area meetups and hbasecon

We have some meetups happening over the next few months.  Sign up if you
are interested in attending (or if you would like to present, write me
off-list).

First up, there is hbasecon2013 (http://hbasecon.com) on June 13th in SF.
 It is shaping up to be a great community day out with a bursting agenda of
whole-grained, high-fibre hbase and ecosystem talks.  Check out the
schedule here:  http://www.hbasecon.com/schedule/

The Hadoop Summit is happening down south at the San Jose Convention Center
later in June and there is a 'Birds of a Feather' HBase meetup scheduled
for June 25th, the day before the Hadoop Summit opens.  Sign up here:
http://www.meetup.com/hbaseusergroup/events/119154442/

In July, we'll be back to our roughly monthly schedule with a meetup hosted
by the kindly folks over at Twitter.  Sign up here:
http://www.meetup.com/hbaseusergroup/events/119929152/

Go easy,
St.Ack
Heng Sok | 17 May 2013 20:17

Later version of HBase Client has a problem with DNS

Hi all,

I have been trying to run a MapReduce job that uses HBase as both source
and sink. I have HBase 0.94.2 and Hadoop 2.0 installed from the Cloudera
repository, following their instructions.

When I use HBase client package version 0.94.2 or above, I get the
DNS-related error below. When I use HBase client package 0.92.1 against the
HBase version I have installed (0.94.2), everything seems to work fine. But
I want to use the newer HBase client package and hope someone can tell me
what is wrong. I hope this is not a bug, which is why I am raising the
issue here.

I have disabled IPv6 and am not using it at all. I am not sure why it can't
parse the string for the DNS client.

13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.22.1.el6.x86_64
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:user.name=hbase
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:user.home=/var/run/hbase
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Client environment:user.dir=/tmp
13/03/26 05:00:51 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= <MY DOMAIN NAME>:2181 sessionTimeout=180000 watcher=hconnection
13/03/26 05:00:51 INFO zookeeper.ClientCnxn: Opening socket connection to server /46.4.115.71:2181
13/03/26 05:00:51 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6941@<MY DOMAIN NAME>
13/03/26 05:00:51 WARN client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
13/03/26 05:00:51 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the
default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore

Jinyuan Zhou | 17 May 2013 16:52

bulk load skipping tsv files

Hi,
I wonder if there is a tool similar to
org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a TSV
file and creates HFiles that are ready to be loaded into the corresponding
regions by another tool,
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want is to
read from an HBase table and create HFiles directly. I think I know how to
write such a class by following the steps in the ImportTsv class, but I
wonder if someone has already done this.
Thanks,
Jack

--
Jinyuan (Jack) Zhou
Jan Lukavský | 17 May 2013 15:40

Scanner returning keys out of order

Hi all,

we are seeing very strange behavior of HBase (version 0.90.6-cdh3u5) in 
the following scenario:

  1) Open scanner and start scanning.
  2) Check the order of returned keys (a simple test that each key is 
lexicographically greater than the previous one).
  3) The check may occasionally fail.
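
For reference, the order check in step 2 can be sketched as below. This is a minimal illustration in Python, not the code used in the test described here: the function name `first_out_of_order` and the sample keys are hypothetical, and it assumes row keys are compared as raw bytes, byte by byte, with each byte treated as unsigned.

```python
from typing import Iterable, Optional, Tuple


def first_out_of_order(keys: Iterable[bytes]) -> Optional[Tuple[bytes, bytes]]:
    """Return the first (previous, current) pair that breaks strict ascending order.

    Python compares bytes objects lexicographically by unsigned byte value,
    which is the ordering this sketch assumes for row keys.
    """
    previous = None
    for current in keys:
        if previous is not None and current <= previous:
            return previous, current
        previous = current
    return None  # every key was strictly greater than the one before it


# Hypothetical keys standing in for what a scanner might return.
ordered = [b"row1", b"row1c", b"row2"]
broken = [b"row1", b"row2", b"row1c"]
```

Returning the offending pair, rather than a plain boolean, makes it easier to log both keys together with a timestamp and line the failure up with compaction activity in the region server logs.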

When we investigated this a bit, we saw a major compaction occurring on 
the scanned region at precisely the time of the check failure. 
Interestingly, the compaction occurred on a column family that was 
not actually involved in the scan. Has anyone else seen this? Is it 
possible that this is fixed in a later version of HBase?

Thanks,
  Jan

Rishabh Agrawal | 17 May 2013 12:53

Doubt Regarding HLogs

Hello,

I am working with the HLogs of HBase and have a question: HDFS shows the size of the latest log file as zero, but when I open it I see data in it. When I add more data, a new file with zero size is created and the previous HLog file gets its size. This applies to each region server. Following is the purged screenshot of the same:

 

I have set the following parameters in hbase-site.xml for logs:

<property>
    <name>hbase.regionserver.logroll.period</name>
    <value>3600000</value>
</property>
<property>
    <name>hbase.master.logcleaner.ttl</name>
    <value>604800000</value>
</property>
<property>
    <name>hbase.regionserver.optionallogflushinterval</name>
    <value>3000</value>
</property>
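
To make the three values above easier to read, here is a small Python sketch that converts them from milliseconds. The property names and values are copied from the configuration above; the helper `describe_ms` is hypothetical and exists only for this illustration.

```python
from datetime import timedelta

# Values copied from the hbase-site.xml fragment above (all in milliseconds).
settings_ms = {
    "hbase.regionserver.logroll.period": 3600000,
    "hbase.master.logcleaner.ttl": 604800000,
    "hbase.regionserver.optionallogflushinterval": 3000,
}


def describe_ms(value_ms: int) -> timedelta:
    """Convert a millisecond setting into a timedelta for readability."""
    return timedelta(milliseconds=value_ms)


for name, value in settings_ms.items():
    print(f"{name} = {value} ms = {describe_ms(value)}")
```

In other words, the settings correspond to a log roll period of one hour, a log cleaner TTL of seven days, and an optional flush interval of three seconds.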

 

 

I plan to read the log files for some validation work. Please help me understand this behavior of HBase.

Thanks and Regards

Rishabh Agrawal

Software Engineer

Impetus Infotech (India) Pvt. Ltd.

(O) +91.731.426.9300 x4526

(M) +91.812.026.2722

www.impetus.com

 








Varun Sharma | 17 May 2013 09:22

Re: Question about HFile seeking

Thanks Stack and Lars for the detailed answers - This question is not
really motivated by performance problems...

So the index does know which part of the HFile key is the row and which
part is the column qualifier. That's what I needed to know. I initially
thought it saw the key as an opaque concatenation (row + column
qualifier + timestamp), in which case it would be difficult to run prefix
scans, since prefixes could bleed across the row and column boundary.

Varun

On Thu, May 16, 2013 at 11:54 PM, Michael Stack <stack@...> wrote:

> On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <varun@...> wrote:
>
>> Referring to your comment above again
>>
>> "If you doing a prefix scan w/ row1c, we should be starting the scan at
>> row1c, not row1 (or more correctly at the row that starts the block we
>> believe has a row1c row in it...)"
>>
>> I am trying to understand how you could seek right across to the block
>> containing "row1c" using the HFile Index. If the index is just built on
>> HFile keys and there is no demarcation b/w rows and col(s), you would hit
>> the block for "row1,col1". After that you would either need a way to skip
>> right across to "row1c" after you find that this is not the row you are
>> looking for or you will have to simply keep scanning and discarding
>> sequentially until you get "row1c". If you have to keep scanning and
>> discarding, then that is probably suboptimal. But if there is a way to skip
>> right across from "row1,col1" to "row1c", then thats great, though I wonder
>> how that would be implemented.
>>
> (ugh... meant to send the below at 5pm but see i didn't send it...
> anyways... see mailing list.. hopefully helps)
>
>
> The hfile index looks like an opaque byte array but it actually has a
> strong format.  In KV we have comparators that will look at this byte array
> and exploit the format to tease apart row from column from qualifier.
>
> I have to run just now.  Will give you better answer this evening up on
> list.
>
> St.Ack
>
Varun Sharma | 16 May 2013 22:56

Question about HFile seeking

Let's say I have the following in my table:

            col1
row1        v1          ------> HFile entry would be "row1,col1,ts1-->v1"
            ol1
row1c       v2          ------> HFile entry would be "row1c,ol1,ts1-->v2"

Now I issue a prefix scan asking for row "row1c". How do we seek: do we
seek directly to row1c, or would we seek to row1 first and then to row1c?
I ask because the concatenated HFile keys are the same for both entries; I
simply moved one character from the column into the row.
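
The ambiguity described above can be reproduced with a short Python sketch. It assumes two hypothetical encoders: `naive_key`, which simply concatenates row and column, and `prefixed_key`, which writes a 2-byte row length in front of the row, similar in spirit to a length-prefixed key layout. The family, timestamp, and type fields are left out, so this illustrates the idea rather than the real on-disk format.

```python
import struct


def naive_key(row: bytes, column: bytes) -> bytes:
    """Concatenate row and column with no boundary marker (hypothetical)."""
    return row + column


def prefixed_key(row: bytes, column: bytes) -> bytes:
    """Prefix the row with its length as a big-endian unsigned short (simplified)."""
    return struct.pack(">H", len(row)) + row + column


def split_key(key: bytes) -> tuple:
    """Recover (row, column) from a key produced by prefixed_key."""
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    return row, key[2 + row_len:]


def compare_keys(a: bytes, b: bytes) -> int:
    """Compare by row first, then by column, using the decoded boundaries."""
    ka, kb = split_key(a), split_key(b)
    return (ka > kb) - (ka < kb)


first = prefixed_key(b"row1", b"col1")
second = prefixed_key(b"row1c", b"ol1")
```

With the naive encoding both entries collapse to the same bytes, `row1col1`. With the length prefix, a comparator can decode the row boundary and order `row1` before `row1c` regardless of the column bytes. Note that the comparison has to decode the prefix first; comparing the prefixed bytes directly would sort by row length rather than by row content.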

Thanks
Varun
James Taylor | 16 May 2013 21:29

[ANNOUNCE] Phoenix 1.2 is now available

We are pleased to announce the immediate availability of Phoenix 1.2 
(https://github.com/forcedotcom/phoenix/wiki/Download). Here are some of 
the release highlights:

* Improve performance of multi-point and multi-range queries (20x plus) 
using new skip scan
* Support TopN queries (3-70x faster than Hive)
* Control row key order when defining primary key columns
* Salt tables declaratively to prevent hot spotting
* Specify columns dynamically at query time
* Write Phoenix-compliant HFiles from Pig scripts and Map/Reduce jobs
* Support SELECT DISTINCT
* Leverage essential column family feature
* Bundle command line terminal interface
* Specify scale and precision on decimal type
* Support fixed length binary type
* Add TO_CHAR, TO_NUMBER, COALESCE, UPPER, LOWER, and REVERSE built-in 
functions

HBase 0.94.4 or above is required with HBase 0.94.7 being recommended. 
For more detail, please see our announcement: 
http://phoenix-hbase.blogspot.com/2013/05/announcing-phoenix-12.html

Regards,

James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/

Varun Sharma | 16 May 2013 20:49

Key Value collision

Hi,

I am wondering what happens when we add the following:

row, col, timestamp --> v1

A flush happens. Now, we add

row, col, timestamp --> v2

A flush happens again. In this case, if MAX_VERSIONS == 1, how is the tie
broken during reads and during minor compactions? Is it arbitrary?
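
To make the question concrete: one way such a tie could be made deterministic is to attach a sequence number to each flushed file and prefer the entry from the file with the higher number. The Python sketch below shows that policy only. It is an assumption made for illustration and does not claim to describe what HBase does; the names `Cell`, `merge_latest`, and the `seq` field are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    row: str
    col: str
    timestamp: int
    value: str


def merge_latest(files: Iterable[Tuple[int, List[Cell]]]) -> Dict[Tuple[str, str, int], str]:
    """Merge flushed files, keeping one value per (row, col, timestamp).

    Each file is a (seq, cells) pair, where seq is a hypothetical flush
    sequence number. On an exact coordinate tie, the cell from the file with
    the higher seq wins, so the result does not depend on input order.
    """
    winners: Dict[Tuple[str, str, int], Tuple[int, str]] = {}
    for seq, cells in files:
        for cell in cells:
            coord = (cell.row, cell.col, cell.timestamp)
            if coord not in winners or seq > winners[coord][0]:
                winners[coord] = (seq, cell.value)
    return {coord: value for coord, (seq, value) in winners.items()}


first_flush = (1, [Cell("row", "col", 100, "v1")])
second_flush = (2, [Cell("row", "col", 100, "v2")])
```

Under this assumed policy the second write, v2, is returned whichever order the two files are read in. Without some such secondary ordering, the winner would depend on merge order.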

Thanks
Varun
Tianying Chang | 16 May 2013 19:12

NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

Hi,

When our customers (using TSDB) load a large amount of data into HBase, we
see many NullPointerExceptions in the RS logs, as below. I checked the
source code: it seems that when trying to obtain the lock for a row key, if
the entry for that row already exists and "waitForBlock" is false, it does
not retry but just returns a null value. I can see that in
doMiniBatchMutation() waitForBlock is set to false (in most other places
"waitForBlock" is always set to true).

This exception is thrown from the function lockRow(), which has been
deprecated. I am not sure why it is deprecated or what replaces it. Is this
normal? If so, HBase should not write this misleading error message to the
log. Or should the client call some other API?

Thanks
Tian-Ying

2013-05-14 12:45:30,911 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Row lock
-3430274391270203797 explicitly acquired by client
2013-05-14 12:45:30,911 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
{"processingtimems":29783,"call":"lockRow([B@339a6a5c, [B@5ebcd87b), rpc version=1, client
version=29, methodsFingerPrint=0","client":"10.53.106.37:58892","starttimems":1368560701128,"queuetimems":847,"class":"HRegionServer","responsesize":0,"method":"lockRow"}
2013-05-14 12:46:00,911 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error
obtaining row lock (fsOk: true)
java.lang.NullPointerException
                at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
                at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2346)
                at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2332)
                at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:384)
                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
2013-05-14 12:46:02,514 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
get([B@2166c821,
{"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"id":["tagv"]},"maxVersions":1,"row":"slcsn-s00314.slc.ebay.com"}),
rpc version=1, client version=29, methodsFingerPrint=0 from 10.53.106.37:58892: output error
2013-05-14 12:46:02,514 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020
caught: java.io.IOException: Connection reset by peer
                at sun.nio.ch.FileDispatcher.write0(Native Method)
                at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
Viral Bajaria | 16 May 2013 11:16

GET performance degrades over time

Hi,

My setup is as follows:
24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
hbase 0.94.4
5-7 regions per regionserver

I am doing an avg of 4k-5k random gets per regionserver per second and the
performance is acceptable in the beginning. I have also done ~10K gets for
a single regionserver and got the results back in 600-800ms. After a while
the performance of the GETs starts degrading. The same ~10K random gets
start taking upwards of 9s-10s.
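
The figures above imply the following amortized per-get times and slowdown. The Python sketch only restates the numbers quoted in this message (about 10K gets, 600-800 ms at first, 9-10 s later); the helper name `per_get_ms` is hypothetical, and the per-get figure is simply batch time divided by batch size, which ignores any parallelism across handlers.

```python
GETS = 10_000  # approximate batch size quoted above


def per_get_ms(batch_seconds: float, gets: int = GETS) -> float:
    """Amortized wall-clock time per get, in milliseconds."""
    return batch_seconds * 1000.0 / gets


good = (0.6, 0.8)   # seconds for ~10K gets at the start
bad = (9.0, 10.0)   # seconds for ~10K gets after degradation

good_ms = tuple(per_get_ms(s) for s in good)
bad_ms = tuple(per_get_ms(s) for s in bad)

# Smallest and largest slowdown consistent with the quoted ranges.
slowdown_min = bad[0] / good[1]
slowdown_max = bad[1] / good[0]

print(f"per get: {good_ms} ms at first, {bad_ms} ms later")
print(f"slowdown: {slowdown_min:.1f}x to {slowdown_max:.1f}x")
```

That works out to roughly 0.06-0.08 ms per get at first against 0.9-1.0 ms later, a slowdown of about 11x to 17x.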

As for the HBase settings I have modified: I have disabled major
compaction, increased the region size to 100G, and bumped up the handler
count to 100.

I monitored ganglia for metrics that vary when the performance shifts from
good to bad and found that fsPreadLatency_avg_time is almost 25x higher on
the badly performing regionserver. fsReadLatency_avg_time is also slightly
higher, but not by as much (around 2x).

I took a thread dump of the regionserver process and also did CPU
utilization monitoring. The CPU cycles were being spent
on org.apache.hadoop.hdfs.BlockReaderLocal.read and stack trace for threads
running that function is below this email.

Any pointers on why positional reads degrade over time? Or is this just an
issue of disk I/O, and should I start looking into that?

Thanks,
Viral

====stacktrace for one of the handlers doing blockread====
"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:220)
at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
- locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
- locked <74d81ea7> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
