Ajay | 18 Dec 05:41 2014

HBase for Real Time Analytics

Hi,

Can HBase be used for, or is it a best fit for, real-time analytics? I went
through a couple of benchmarks of Cassandra vs. HBase (most of them done 3
years ago). They mentioned that HBase is designed for intensive reads (but
has higher latency for writes).

Secondly, I want to understand how it is faster for, or better supports,
intensive reads (or how it is better than Cassandra) even though it is
strongly consistent (CP). In our case we may have both high writes and high
reads (more reads though, say 60% reads). In that case, is HBase a better
fit? We are planning to use Spark as the in-memory computation engine.

Thanks
Ajay
Aaron Beppu | 18 Dec 03:44 2014

Efficient use of buffered writes in a post-HTablePool world?

Hi All,

TLDR; in the absence of HTablePool, if HTable instances are short-lived,
how should clients use buffered writes?

I’m working on migrating a codebase from using 0.94.6 (CDH4.4) to 0.98.6
(CDH5.2). One issue I’m confused by is how to effectively use buffered
writes now that HTablePool has been deprecated[1].

In our 0.94 code, a pathway could get a table from the pool, configure it
with table.setAutoFlush(false), and write Puts to it. Those writes would go
to the table instance’s writeBuffer and would only be flushed when the
buffer was full, or when we were ready to close out the pool. We were
intentionally choosing to have fewer, larger writes from the client to the
cluster, and we knew we were giving up a degree of safety in exchange (i.e.
if the client dies after it’s accepted a write but before the flush for
that write occurs, the data is lost). This seems to be generally considered
a reasonable choice (cf. the HBase Book [2], section 14.8.4).
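
For illustration, the old pattern looked roughly like this (a sketch; the
pool size, names, and the puts collection are made up):

HTablePool pool = new HTablePool(conf, 10);
HTableInterface table = pool.getTable("mytable");
table.setAutoFlush(false);    // buffer writes client-side
for (Put put : puts) {
  table.put(put);             // puts accumulate in the table's writeBuffer
}
table.close();                // returns the table to the pool; the buffer is
                              // flushed when full or when the pool closes out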

However in the 0.98 world, without HTablePool, the endorsed pattern [3]
seems to be to create a new HTable via table =
stashedHConnection.getTable(tableName, myExecutorService). However, even if
we do table.setAutoFlush(false), because that table instance is
short-lived, its buffer never gets full. We’ll create a table instance,
write a put to it, try to close the table, and the close call will trigger
a (synchronous) flush. Thus, not having HTablePool seems like it would
cause us to have many more small writes from the client to the cluster, and
basically wipe out the advantage of turning off autoflush.
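
In code, the problem looks roughly like this (a sketch reusing the names
from the endorsed pattern; conf, tableName, and the put are assumed):

HTableInterface table = stashedHConnection.getTable(tableName, myExecutorService);
table.setAutoFlush(false);    // asks for buffering, but...
table.put(put);               // ...only this one put ever lands in the buffer
table.close();                // close() triggers a synchronous flush of that put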

More concretely :
(Continue reading)

Wilm Schumacher | 18 Dec 01:11 2014

What's the best way to reduce to map or array?

Hi,

if I can guarantee that the size of the array/map is reasonably small,
what would be the best way to reduce to an array, ArrayList, map or
something like that?

Something like TableReducer, but for an object in the memory of the main
thread.

I thought of using a TableReducer and scanning the resulting table, but
this could lead to race conditions. And if I used a FileOutputFormat, I
would have to write a lot of code for handling different output files,
parsing the files, etc. This isn't that hard, but I would rather avoid
writing and maintaining a lot of code for this task if there is a smarter
way to do it.
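
For concreteness, the TableReducer route would end with roughly this
client-side scan into a map (a sketch, assuming the result fits in memory;
the connection, table, and column names are made up):

HTableInterface table = connection.getTable("job_output");
Map<String, Long> result = new HashMap<String, Long>();
ResultScanner scanner = table.getScanner(new Scan());
try {
  for (Result r : scanner) {
    byte[] value = r.getValue(Bytes.toBytes("f"), Bytes.toBytes("count"));
    result.put(Bytes.toString(r.getRow()), Bytes.toLong(value));
  }
} finally {
  scanner.close();
  table.close();
}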

Any hint is appreciated :/

Best wishes,

Wilm
Jurriaan Mous | 17 Dec 18:32 2014

Async RpcClient

Hi,

I have been working on a Netty 4 based async HBase client to fit better
within the event-driven server I have been developing:
https://github.com/jurmous/async-hbase-client/tree/HBase-0.99

Recently I have been submitting some patches to make it easier to switch
out the RpcClient of HBase. This is to enable HBase to use the client
itself in all communication. I wanted to do this to run HBase's tests
against the client to check that it is solid on all edge cases, but also to
enable HBase to possibly migrate to an async client. These were committed
on master and branch-1:
https://issues.apache.org/jira/browse/HBASE-12597
https://issues.apache.org/jira/browse/HBASE-12684

Now I am at the next step where I want to contribute back the AsyncRpcClient itself. 

I have opened this issue to add AsyncRpcClient:
https://issues.apache.org/jira/browse/HBASE-12684
In the current patch the new async client is the default.
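
For anyone who wants to flip between the two clients while testing,
selecting an implementation should be just configuration (a sketch: I'm
assuming the lookup key used by the pluggable RpcClient factory, and the
class name here is illustrative):

Configuration conf = HBaseConfiguration.create();
// Assumed factory lookup key (an assumption for illustration); when
// unset, the existing blocking client would be used.
conf.set("hbase.rpc.client.impl",
    "org.apache.hadoop.hbase.ipc.AsyncRpcClient");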

3 questions:

Can anyone with a proper Kerberos setup test whether the async client
works? SASL digest auth works, but I haven’t tested Kerberos yet.

Can anyone with benchmarking know-how test how the performance of this
client compares to the current client? The performance should of course be
great in all relevant metrics if it is ever to become the main client.

What will we do with the old RpcClient if the async RpcClient is
introduced? It would be great to remove it so HBase can internally base
everything async (like AsyncProcess) on the async RPC client, and this would not be
(Continue reading)

SandhiyaNagaraj | 17 Dec 07:21 2014

Hbase_cluster formation in Windows

Hi 

I formed a cluster on Linux successfully, but I can't create an HBase
cluster on Windows. Please tell me how to create an HBase cluster on
Windows.

Thanks & Regards


Wilm Schumacher | 17 Dec 14:10 2014

Re: Cannot connect to Hbase via Java API

Hi,

I didn't find the content of the conf/regionservers file attached.

Could you check whether it contains "localhost" or "host"/"myHost"? It
should contain "host" or "myHost".

Did you edit the content of the files for the attachment? Sometimes it is
"myHost", sometimes it is "host". This should be consistent.

Furthermore, I once read that someone had a problem with IPv6 and HBase
(thus none of my installations use IPv6 ... just to be safe). Perhaps you
should turn it off and restart Hadoop and HBase, too. Just as a test.
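
For reference, the usual way to force IPv4 is a JVM flag in
conf/hbase-env.sh (a sketch of the common setting; the same flag can go
into Hadoop's env scripts):

export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"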

And yes, it is strange that all the other stuff is working.

Hope this helps,

Wilm

Am 17.12.2014 um 13:40 schrieb Marco:
> Hi Wilm,
>
> I've attached the logs. The region server logs only contain debug
> messages and mostly like the pattern, which I've pasted.
> I'm using the HortonWorksStack and have a single machine, on which
> runs the complete stack (no cluster).
>
> Hbase shell, Hive and Apache Phoenix works fine.
(Continue reading)

Marco | 16 Dec 15:19 2014

Cannot connect to Hbase via Java API

Hi,

Hbase is installed correctly and working (hbase shell works fine).

But I'm not able to use the Java API to connect to an existing Hbase Table:

<<<
val conf = HBaseConfiguration.create()

conf.clear()

conf.set("hbase.zookeeper.quorum", "ip:2181");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("hbase.zookeeper.dns.nameserver", "ip");
conf.set("hbase.regionserver.port","60020");
conf.set("hbase.master", "ip:60000");

val hTable = new HTable(conf, "truck_events")

>>>

The code is actually Scala, but I think it is clear what I am trying to
achieve. I've also tried using hbase-site.xml instead of configuring it
manually, but the result is the same.

In response I got:
14/12/16 15:10:05 INFO zookeeper.ZooKeeper: Initiating client
connection, connectString=ip:2181 sessionTimeout=30000
watcher=hconnection
14/12/16 15:10:10 INFO zookeeper.ClientCnxn: Opening socket connection
(Continue reading)

Scott Richter | 16 Dec 14:15 2014

HBase with Redis/Memcached?

Hello,

I am designing an architecture for a website that shows analytics on a huge
quantity of data. This data is stored in one HBase table and needs to be
accessed in a semi-random manner. Typically, a big block of contiguous
rowkeys will be read at once (say a few thousand rows) and some data
displayed based on them. Where these blocks fall within the table is the
random aspect.
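
Roughly, each page view would issue one bounded scan like this (a sketch;
the connection, table name, and key scheme are made up):

Scan scan = new Scan(Bytes.toBytes("block-00042"), Bytes.toBytes("block-00043"));
scan.setCaching(1000);        // fetch rows in large batches to cut round trips
HTableInterface table = connection.getTable("analytics");
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result r : scanner) {
    // aggregate/render values for the page
  }
} finally {
  scanner.close();
  table.close();
}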

I am trying to figure out how fast I can expect HBase to be. Can I link the
webpage directly to HBase for this reading and expect real-time page loads
(<1 sec)? Or do I need a distributed cache like Redis, so that if the user
requests the same data over and over I don't waste time pulling it from
HBase when it has already been loaded?

In other words, generally speaking, are HBase and Redis/Memcached
redundant, or is there a strong use case for using HBase as the on-disk
storage and Redis or Memcached for in-memory caching to improve
performance?
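
The caching layer I have in mind is plain cache-aside, roughly (a sketch
assuming a Redis client like Jedis; the key scheme, the 5-minute TTL, and
the readBlockFromHBase helper are all made up):

Jedis jedis = new Jedis("redis-host");
String cacheKey = "block:" + blockId;
String page = jedis.get(cacheKey);
if (page == null) {
  // Miss: read the block from HBase (the scan above) and cache the result.
  page = readBlockFromHBase(blockId);   // hypothetical helper
  jedis.setex(cacheKey, 300, page);     // expire after 5 minutes
}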

Thanks,
Scott
Sivasubramaniam, Latha | 16 Dec 01:08 2014

Trying to import data

I am trying to import data into an HBase table and tried the following as
an example:

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,b,c datatsv hdfs://data.tsv

This command complains that data.tsv does not exist in HDFS, but when I run
hadoop fs -ls, I do see the file.

Then I tried the following, with data.tsv as a local file:

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,b,c datatsv /home/hadoop/BigdataEDW/data.tsv

and I get the following output. I tried both with and without the HBase
table datatsv created.

2014-12-15 17:52:25,449 INFO  [main] client.RMProxy: Connecting to ResourceManager at rtr-dev-spark4/10.153.24.132:8032
2014-12-15 17:52:25,559 INFO  [main] Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
2014-12-15 17:52:25,562 INFO  [main] Configuration.deprecation: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class
2014-12-15 17:52:25,562 INFO  [main] Configuration.deprecation: mapreduce.job.counters.limit is
deprecated. Instead, use mapreduce.job.counters.max
2014-12-15 17:52:25,565 INFO  [main] Configuration.deprecation: dfs.permissions is deprecated.
Instead, use dfs.permissions.enabled
2014-12-15 17:52:25,567 INFO  [main] Configuration.deprecation: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class
2014-12-15 17:52:25,569 INFO  [main] Configuration.deprecation: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class
2014-12-15 17:52:25,571 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
2014-12-15 17:52:25,604 INFO  [main] mapreduce.TableOutputFormat: Created table instance for datatsv
2014-12-15 17:52:26,806 INFO  [main] ipc.Client: Retrying connect to server:
(Continue reading)

lars hofhansl | 15 Dec 19:53 2014

0.94 going forward

Over the past few months the rate of change into 0.94 has slowed significantly.
0.94.25 was released on Nov 15th, and since then we had only 4 changes.

This could mean two things: (1) 0.94 is very stable now or (2) nobody is using it (at least nobody is
contributing to it anymore).

If anybody out there is still using 0.94 and is not planning to upgrade to
0.98 or later soon (which will require downtime), please speak up.
Otherwise it might be time to think about EOL'ing 0.94.

It's not actually much work to do these releases, especially when they are so small, but I'd like to continue
only if they are actually used.
In any case, I am going to spin 0.94.26 with the current 4 fixes today or tomorrow.

-- Lars

Stack | 15 Dec 19:45 2014

Upcoming meetups: Jan+Feb 2015

On January 15th, we're meeting at AppDynamics in San Francisco. We have
some nice talks lined up [1]. On Feb 17th, let's meet around Strata+Hadoop
World in San Jose [2]. If you are interested in hosting or speaking, write
the organizers.
Thanks,
St.Ack

1. http://www.meetup.com/hbaseusergroup/events/218744798/
2. http://www.meetup.com/hbaseusergroup/events/219260093/
