Jianshi Huang | 20 Aug 11:35 2014

Hbase InputFormat for multi-row + column range, how to do it?

Hi,

I know TableInputFormat and HFileInputFormat can both set ROW_START and
ROW_END, but neither of them can set a column range (like what we do with
ColumnRangeFilter).

So how can I do a column range scan with an HBase InputFormat? Is there an
implementation available? If not, how much effort do you think it would take
to implement one?
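
One idea I have is to attach a ColumnRangeFilter to the Scan that gets
serialized into the job configuration, alongside the row range. Below is a
rough sketch of what I mean (table name, row/column boundaries, and the
counting mapper are made-up placeholders; I haven't verified that
TableInputFormat honors the filter):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ColumnRangeScanJob {
    static class RangeMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
            ctx.getCounter("scan", "rows").increment(1); // just proving rows arrive
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "column-range-scan");

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row-000")); // ROW_START equivalent
        scan.setStopRow(Bytes.toBytes("row-999"));  // ROW_END equivalent
        // Column range, pushed down as a server-side filter on the Scan:
        scan.setFilter(new ColumnRangeFilter(
                Bytes.toBytes("colA"), true,     // min column, inclusive
                Bytes.toBytes("colM"), false));  // max column, exclusive

        TableMapReduceUtil.initTableMapperJob("mytable", scan, RangeMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Would the filter carried on the serialized Scan be respected here, or does
TableInputFormat only look at the start/stop rows?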

Best,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Nick Dimiduk | 20 Aug 07:53 2014

Shout-out for Misty

Our docs are getting a lot of love lately, courtesy of one Misty
Stanley-Jones. As someone who joined this community by way of
documentation, I'd like to say: Thank you, Misty!

-n
牛兆捷 | 20 Aug 05:41 2014

performance of block cache

Hi all:

I saw some interesting results on the Hortonworks blog (block cache
performance evaluation:
http://zh.hortonworks.com/wp-content/uploads/2014/03/perfeval_blockcache_v2.pdf).

In these results, the ratio of memory footprint to database size is held
fixed while the absolute values are increased.

In my mind, performance should become worse for the larger ratio as the
absolute values increase. For example, for BucketCache (tmpfs), the
difference between the DB:RAM = 1.5 and DB:RAM = 4.5 curves becomes larger
as memory increases: the DB:RAM = 1.5 result increases roughly linearly,
while the DB:RAM = 4.5 result increases exponentially.

However, for BucketCache (heap) and LruBlockCache, the results are not what
I expected. The DB:RAM = 1.5 and DB:RAM = 4.5 curves both increase
exponentially, but the relative difference between them is not consistent as
memory increases. Taking LruBlockCache as an example, the difference between
the DB:RAM = 1.5 and DB:RAM = 4.5 curves becomes smaller from 20 GB to
50 GB, but larger again from 50 GB to 60 GB.

How can I analyze the cause of this result? Any ideas?


Demai Ni | 20 Aug 03:28 2014
Picon

Re: Scan output to file on each regserver node?

Nick, 

Thanks for the quick response. I will definitely look into Hadoop Streaming.

What do you think about AggregationClient? Its work is carried out at the
region/regionserver level; instead of a count/min/avg, could a similar
method be used to write the data out to the local file system?
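
For reference, the kind of AggregationClient call I have in mind looks
roughly like this; a sketch assuming the 0.98-era API, with the
AggregateImplementation coprocessor loaded on the table, and made-up
family/qualifier names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountExample {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        // Requires org.apache.hadoop.hbase.coprocessor.AggregateImplementation
        // to be loaded on 'table1' (e.g. via hbase.coprocessor.region.classes).
        AggregationClient client = new AggregationClient(conf);
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q")); // hypothetical names
        long rows = client.rowCount(TableName.valueOf("table1"),
                new LongColumnInterpreter(), scan);
        System.out.println("rows = " + rows);
    }
}

Since the count is computed region by region and merged on the client, I
imagine a file-writing variant would need a custom endpoint coprocessor
rather than AggregationClient itself.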

Demai on the run

On Aug 19, 2014, at 5:04 PM, Nick Dimiduk <ndimiduk@...> wrote:

> This sounds an awful lot like a map-only MR job... With Hadoop Streaming,
> you should be able to achieve your goal of piping to an arbitrary process.
> 
> 
> On Tue, Aug 19, 2014 at 4:26 PM, Demai Ni <nidmgg@...> wrote:
> 
>> Dear experts,
>> 
>> I understand that I can do a simple command like:
>> 
>> echo "scan 'table1'" | hbase shell > myoutput
>> 
>> The scenario I am thinking of is to:
>> 1) output to the local file system (like Linux) instead of HDFS
>> 2) have each regionserver output only its own data to its node's file system
>> 
>> To elaborate on 2) a bit: basically, this would be like exporting HBase data
>> to the local file system without going through the network. On each node,
>> one file would be created.

Wei Liu | 20 Aug 01:56 2014

Multiple column families vs Multiple tables

We are doing schema design for our application. One thing we are not so
clear about is multiple column families (more than 3, probably 4-5) vs.
multiple tables. In our use case, we will have the same number of rows in
all these column families, but some column families may be modified more
often than others, and some column families will have more columns than
others (thousands vs. several).

The reason we are considering multiple column families is that they can
probably give us better performance when a search needs data from multiple
column families, for example, searching for a row with value X in column
family A and value Y in column family B.
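
To make that concrete, with a single table and two column families the
search could be one scan with a FilterList; this is only a sketch with
made-up family, qualifier, and value names:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class CrossFamilyScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // 0.96-era client API

        // Keep a row only if BOTH predicates hold: A:col1 == x AND B:col2 == y.
        SingleColumnValueFilter inA = new SingleColumnValueFilter(
                Bytes.toBytes("A"), Bytes.toBytes("col1"),
                CompareOp.EQUAL, Bytes.toBytes("x"));
        inA.setFilterIfMissing(true); // drop rows that lack the column entirely
        SingleColumnValueFilter inB = new SingleColumnValueFilter(
                Bytes.toBytes("B"), Bytes.toBytes("col2"),
                CompareOp.EQUAL, Bytes.toBytes("y"));
        inB.setFilterIfMissing(true);

        Scan scan = new Scan();
        scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL,
                Arrays.<Filter>asList(inA, inB)));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

With two separate tables, the same predicate would require two scans and a
client-side join on the row key.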

On the other hand, we saw the following paragraph in the user guide which
is scary to us:
"HBase currently does not do well with anything above two or three column
families so keep the number of column families in your schema low.
Currently, flushing and compactions are done on a per Region basis so if
one column family is carrying the bulk of the data bringing on flushes, the
adjacent families will also be flushed though the amount of data they carry
is small. When many column families the flushing and compaction interaction
can make for a bunch of needless i/o loading (To be addressed by changing
flushing and compaction to work on a per column family basis). For more
information on compactions, see Section 9.7.6.7, “Compaction”
<http://hbase.apache.org/book.html#compaction>."

Can anyone please shed some light on this topic? Thanks in advance.

Thanks,
Wei

Demai Ni | 20 Aug 01:26 2014

Scan output to file on each regserver node?

Dear experts,

I understand that I can do a simple command like:

echo "scan 'table1'" | hbase shell > myoutput

The scenario I am thinking of is to:
1) output to the local file system (like Linux) instead of HDFS
2) have each regionserver output only its own data to its node's file system

To elaborate on 2) a bit: basically, this would be like exporting HBase data to the local file system without going through the network. On each node, one file would be created.

Is there a way to achieve this? Actually, the receiving side of 1) doesn't have to be a file system; it could be another process that consumes the data. But let's use the file system to simplify the scenario for now.
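
To make 2) concrete, here is a rough sketch of the kind of map-only job I
imagine (the local output path and row formatting are made up, and it relies
on the scheduler placing each map task on the regionserver hosting its
region, which is best-effort locality rather than a guarantee):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class LocalExportJob {
    static class ExportMapper extends TableMapper<NullWritable, NullWritable> {
        private PrintWriter out;

        @Override
        protected void setup(Context ctx) throws IOException {
            // One file per map task in a node-local directory (path is hypothetical).
            String file = "/tmp/hbase-export-" + ctx.getTaskAttemptID().getTaskID();
            out = new PrintWriter(new FileWriter(file));
        }

        @Override
        protected void map(ImmutableBytesWritable key, Result row, Context ctx) {
            out.println(row); // replace with real formatting as needed
        }

        @Override
        protected void cleanup(Context ctx) {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "local-export");
        TableMapReduceUtil.initTableMapperJob("table1", new Scan(),
                ExportMapper.class, NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);                         // map-only
        job.setOutputFormatClass(NullOutputFormat.class); // real output is the local file
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}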

Thanks

Demai on the run
Shahab Yunus | 19 Aug 22:11 2014

Splitting an existing table with new keys.

I have a table that is already created and has some data. I want to split it
through code, using the HBaseAdmin API, into multiple regions, while
specifying keys that do not exist in the table.

I am getting the exception below, which makes sense because the key doesn't
exist yet. But at table-creation time we can indeed pre-split using keys
that don't exist.

Is it possible to do this for a table that already exists and has data?
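
For illustration, this is roughly what I am attempting (table name and split
key are just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // The split point does not yet exist as a row key in the table.
            admin.split(Bytes.toBytes("mytable"), Bytes.toBytes("nonexistent-key"));
        } finally {
            admin.close();
        }
    }
}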

Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
org.apache.hadoop.hbase.NotServingRegionException:

Using HBase 0.98.1-cdh5.1.0

Thanks a lot.

Regards,
Shahab
Gayatri Rout | 19 Aug 11:40 2014

hbase get command doesn't return rows

Team,

We have recently upgraded to HBase 0.96. When we run the get command, it
doesn't return any rows for long columns; this was working fine in 0.94.

Regards,
Gayatri

Ayache Khettar | 19 Aug 11:49 2014

Hbase Master is aborted when regionserver is started

Hi

I am upgrading HBase from 0.94.6 to hbase-0.98.5-hadoop2. I am running in
pseudo-distributed mode (one master and one regionserver) with an external
ZooKeeper (but all running locally). I start the HBase master first and all
is well; then, as soon as I start the regionserver node, the master gets
aborted. See below for the logs from the master node and the config setup.
Am I missing anything?

Kind regards,

CONFIG SETUP:

hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0

V.J. LeBlanc | 18 Aug 23:42 2014

Development Connectivity Issues in "Stand-Alone" Mode

As part of my workflow, I have a stand-alone HBase setup that I use for
debugging. My development environment is on my laptop and I often develop
from different locations. The problem I run into is that when I change
locations, I am unable to connect to my local HBase server (even after
stopping and restarting it). I am currently using 0.98.5. I have boiled the
problem down to the following reproduction steps:

   1. Start HBase.
   2. Open an HBase shell and list tables.
   3. Stop HBase.
   4. Go to another location with a different internet connection.
   5. Start HBase.
   6. Open an HBase shell and try to list tables.
      1. This is the point I get to and have yet to fix. I cannot proceed
      without rebooting my machine.

The call stack from the HBase shell looks like:

hbase(main):001:0> list
TABLE
2014-08-18 16:41:29.985 java[23621:1003] Unable to load realm info from
SCDynamicStore
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/local/hbase-0.98.4-hadoop1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/local/hadoop-1.2.1/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
2014-08-18 16:41:50,997 ERROR [main] zookeeper.RecoverableZooKeeper:

Armaselu, Cristian | 18 Aug 23:19 2014

hbase is not deleting the cell when a Put with a KeyValue, KeyValue.Type.Delete is submitted

Hello,

We're running HBase 0.96.1.1 under CDH 5.0.2 and we're seeing different
behavior than with CDH 4.8 (HBase 0.94.xx).

Running the code below creates an empty cell, instead of no cell, for the
KeyValue.Type.Delete line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

        String tname = "my_test";
        Configuration configuration = HBaseConfiguration.create();
        HBaseAdmin baseAdmin = new HBaseAdmin(configuration);

        // Recreate the table from scratch with a single family "default".
        baseAdmin.disableTable(tname);
        baseAdmin.deleteTable(tname);
        TableName tableName = TableName.valueOf(tname);
        HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
        HColumnDescriptor columnDescriptor = new HColumnDescriptor("default");
        tableDescriptor.addFamily(columnDescriptor);
        baseAdmin.createTable(tableDescriptor);

        final String rowKey = "12345";
        final Put p = new Put(Bytes.toBytes(rowKey));
        for (int j = 0; j < 3; j++) {
            for (int i = 0; i < 6; i++) {
                final byte[] family = Bytes.toBytes("default");
                final byte[] column = Bytes.toBytes("c" + i);
                if (i == 5) {
                    // Column c5 carries an explicit delete marker inside the Put.
                    p.add(new KeyValue(p.getRow(), family, column,
                            HConstants.LATEST_TIMESTAMP, KeyValue.Type.Delete));
                } else {
                    p.add(family, column, Bytes.toBytes("c" + i + "_value_" + (j + 1)));
                }
            }
        }
        HTableInterface table = new HTable(configuration, tableName);
        table.put(p);

In the hbase shell:

hbase(main):003:0* scan 'my_test'
ROW     COLUMN+CELL
 12345  column=default:c0, timestamp=1408396641845, value=c0_value_3
 12345  column=default:c1, timestamp=1408396641845, value=c1_value_3
 12345  column=default:c2, timestamp=1408396641845, value=c2_value_3
 12345  column=default:c3, timestamp=1408396641845, value=c3_value_3
 12345  column=default:c4, timestamp=1408396641845, value=c4_value_3
 12345  column=default:c5, timestamp=1408396641845, value=
1 row(s) in 0.3580 seconds

Is there any way we can get the old behavior back? To be more exact, I
expect the hbase shell scan to return the following:

hbase(main):003:0* scan 'my_test'
ROW     COLUMN+CELL
 12345  column=default:c0, timestamp=1408396641845, value=c0_value_3
 12345  column=default:c1, timestamp=1408396641845, value=c1_value_3
 12345  column=default:c2, timestamp=1408396641845, value=c2_value_3
 12345  column=default:c3, timestamp=1408396641845, value=c3_value_3
 12345  column=default:c4, timestamp=1408396641845, value=c4_value_3
1 row(s) in 0.3580 seconds

Thanks,

Cristian Armaselu
Solution Architect
Shared Technology Services

6021 Connection Drive
Irving, TX 75039
carmaselu-Y8gIj/Zv+KRBDgjK7y7TUQ@public.gmane.org
