Jonathan Gray | 1 Aug 2010 20:39

RE: Memory Consumption and Processing questions


> -----Original Message-----
> From: Jacques [mailto:whshub@...]
> Sent: Friday, July 30, 2010 1:16 PM
> To: user@...
> Subject: Memory Consumption and Processing questions
> 
> Hello all,
> 
> I'm planning an HBase implementation and had some questions I was hoping
> someone could help with.
> 
> 1. Can someone give me a basic overview of how memory is used in HBase?
>  In various places on the web, people state that 16-24 GB is the minimum for
> region servers if they also operate as HDFS/MR nodes.  Assuming that HDFS/MR
> nodes consume ~8 GB, that leaves a "minimum" of 8-16 GB for HBase.  It seems
> like lots of people suggest using even 24 GB+ for HBase.  Why so much?
> Is it simply to avoid GC problems?  To have data in memory for fast random
> reads?  Or something else?

Where exactly are you reading this from?  I'm not actually aware of people using 24GB+ heaps for HBase.

I would not recommend using less than 4GB for RegionServers.  Beyond that, it very much depends on your
application.  8GB is often sufficient but I've seen as much as 16GB used in production.
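To make the heap discussion a little more concrete, here is a rough sketch of how a RegionServer heap gets divided between the MemStores (write buffer) and the HFile block cache (read cache). It assumes the 0.20-era defaults of `hbase.regionserver.global.memstore.upperLimit = 0.4` (past this, the RegionServer forces flushes) and `hfile.block.cache.size = 0.2`; these values are assumptions here, not taken from this thread, so check the `hbase-default.xml` that ships with your version.

```java
/**
 * Rough split of a RegionServer heap between MemStores (write buffer) and the
 * HFile block cache (read cache). The two fractions are ASSUMED 0.20-era defaults.
 */
public class RegionServerHeapBudget {
    // Assumed default of hbase.regionserver.global.memstore.upperLimit
    static final double MEMSTORE_UPPER_LIMIT = 0.4;
    // Assumed default of hfile.block.cache.size
    static final double BLOCK_CACHE_FRACTION = 0.2;

    /** MB of the heap that a given fraction corresponds to (rounded down). */
    static long share(long heapMb, double fraction) {
        return (long) (heapMb * fraction);
    }

    /** Heap left for everything else (store file indexes, RPC handling, GC headroom, ...). */
    static long remainderMb(long heapMb) {
        return heapMb - share(heapMb, MEMSTORE_UPPER_LIMIT) - share(heapMb, BLOCK_CACHE_FRACTION);
    }

    public static void main(String[] args) {
        // The heap sizes mentioned in this thread: 4 GB minimum, 8 GB typical, 16 GB seen in production.
        for (long heapMb : new long[] {4096, 8192, 16384}) {
            System.out.printf("heap=%5d MB  memstores<=%5d MB  block cache=%5d MB  rest=%5d MB%n",
                    heapMb,
                    share(heapMb, MEMSTORE_UPPER_LIMIT),
                    share(heapMb, BLOCK_CACHE_FRACTION),
                    remainderMb(heapMb));
        }
    }
}
```

A bigger heap mostly buys a bigger block cache and more MemStore room; it does not change the proportions unless you change the two settings.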

Jacques | 2 Aug 2010 02:29

Re: Memory Consumption and Processing questions

Thanks, that was very helpful.

Regarding 24 GB: I saw people using servers with 32 GB of memory (a recent
thread here and on hstack.org). I extrapolated HBase's share since it seems
most people use ~8 GB for HDFS/MR.

-Jacques

Venkatesh | 2 Aug 2010 05:13

HTable object - how long it is valid


Hi,

If I construct a new HTable() object when my app initializes, is it valid until my app shuts down?
I read in earlier postings that it is better to construct an HTable once for performance reasons.
I wonder whether the underlying connection and other resources are kept around forever for put/scan/etc.

Also, when do I call close()? Upon every operation (put/get/etc.), to avoid memory leaks?

thanks
venkatesh

Jonathan Gray | 2 Aug 2010 06:08

RE: Memory Consumption and Processing questions

One reason not to extrapolate that is that leaving lots of memory for the Linux buffer cache is a good way to
improve the overall performance of typically I/O-bound applications like Hadoop and HBase.

Also, I'm not sure that "most people use ~8 for hdfs/mr".  DataNodes don't require much memory
(though they generally run with a 1GB heap); their performance will improve with more free memory for the OS
buffer cache.  As for MR, it completely depends on the tasks being run.  The TaskTrackers themselves don't
require significant memory either, so it comes down to the number of tasks per node and the memory
requirements of those tasks.

Unfortunately, you can't generalize the requirements too much, especially for MR.

JG
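As a back-of-the-envelope illustration of the point above, the sketch below subtracts the Java heaps running on a node from its physical RAM to estimate what is left for the Linux buffer cache. Every number in `main` (a 32 GB node, an 8 GB RegionServer, 1 GB DataNode and TaskTracker heaps, 1 GB per task, the task counts) is a hypothetical input, not a recommendation from this thread, and OS overhead and non-heap JVM memory are ignored.

```java
/** Estimates the RAM left for the OS buffer cache after all JVM heaps on a node. */
public class NodeMemoryBudget {
    /**
     * @param totalMb        physical RAM on the node
     * @param regionServerMb RegionServer heap (-Xmx)
     * @param dataNodeMb     DataNode heap
     * @param taskTrackerMb  TaskTracker daemon heap
     * @param taskSlots      concurrent map + reduce tasks on the node
     * @param childHeapMb    heap of each MR child task JVM
     * @return MB left for the buffer cache (negative means the node is over-committed)
     */
    static long bufferCacheMb(long totalMb, long regionServerMb, long dataNodeMb,
                              long taskTrackerMb, int taskSlots, long childHeapMb) {
        long heaps = regionServerMb + dataNodeMb + taskTrackerMb + taskSlots * childHeapMb;
        return totalMb - heaps;
    }

    public static void main(String[] args) {
        // Hypothetical 32 GB node; only the number of concurrent tasks changes.
        long few = bufferCacheMb(32768, 8192, 1024, 1024, 4, 1024);
        long many = bufferCacheMb(32768, 8192, 1024, 1024, 12, 1024);
        System.out.println("4 tasks x 1 GB  -> ~" + few + " MB for the buffer cache");
        System.out.println("12 tasks x 1 GB -> ~" + many + " MB for the buffer cache");
    }
}
```

The swing between the two runs comes entirely from the MR task count, which is why the MR side is so hard to generalize.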

Sindy | 2 Aug 2010 07:32

"connectString" to point to the server instead of "localhost"

Hi,

I have one master and one slave.

Now I'm writing a Java program in Eclipse on a third PC to connect to
HBase, and I get this:

10/08/02 12:40:33 INFO zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181
10/08/02 12:40:34 WARN zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.SelectionKeyImpl@2c84d9
java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
10/08/02 12:40:34 WARN zookeeper.ClientCnxn: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.shutdownInput(Unknown Source)
	at sun.nio.ch.SocketAdaptor.shutdownInput(Unknown Source)
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
10/08/02 12:40:34 WARN zookeeper.ClientCnxn: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException

connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@82c01f
How can I make the "connectString" point to the server instead of
"localhost"?
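The `connectString=localhost:2181` in the log is what a client ends up with when it finds no ZooKeeper quorum setting and falls back to defaults. As far as I know, the HBase client of this era assembles that string from `hbase.zookeeper.quorum` (a comma-separated host list, default `localhost`) and `hbase.zookeeper.property.clientPort` (default 2181), normally read from an `hbase-site.xml` on the client's classpath, unless a `zoo.cfg` on the classpath takes precedence. The sketch below only mimics that derivation with `java.util.Properties` to show which setting controls it; it is not HBase's own code, and the host names are hypothetical.

```java
import java.util.Properties;

/** Mimics how a ZooKeeper connect string is assembled from quorum hosts and a client port. */
public class ZkConnectString {
    static String connectString(Properties conf) {
        // With nothing configured, these defaults reproduce the failing "localhost:2181".
        String quorum = conf.getProperty("hbase.zookeeper.quorum", "localhost");
        String port = conf.getProperty("hbase.zookeeper.property.clientPort", "2181");
        StringBuilder sb = new StringBuilder();
        for (String host : quorum.split(",")) {
            host = host.trim();
            if (host.isEmpty()) continue;          // tolerate stray commas
            if (sb.length() > 0) sb.append(',');
            sb.append(host).append(':').append(port);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(connectString(conf));                // localhost:2181
        conf.setProperty("hbase.zookeeper.quorum", "master");  // hypothetical master host name
        System.out.println(connectString(conf));                // master:2181
    }
}
```

On the real client, putting the master's host name in `hbase.zookeeper.quorum` (in the `hbase-site.xml` on the Eclipse project's classpath, or via `conf.set(...)` on the `HBaseConfiguration` before creating the `HTable`) should have the equivalent effect.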

Sasha Maksimenko | 2 Aug 2010 09:36

Re: [stargate] transaction?

Hi!
Thanks for the answer. I use very simple code:

org.apache.hadoop.hbase.stargate.client.Client client = new
org.apache.hadoop.hbase.stargate.client.Client();
Response put = client.post("http://hostname:port/task/2/value",
"application/octet-stream", "1".getBytes());
System.out.println(put.getCode()+new String(put.getBody()));
client.shutdown();
In the first invocation I use the correct column name "value" and everything
is OK. After that I use the wrong column name "valueS" and get this exception:

503javax.ws.rs.WebApplicationException:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column
family values does not exist in region task,,1280306891116 in table {NAME =>
'task', FAMILIES => [{NAME => 'page', COMPRESSION => 'NONE', VERSIONS =>
'3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'value', VERSIONS => '1', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}

The next time, I change the column back, but the problem still exists. When I
restart the server, the problem disappears.

On Fri, Jul 30, 2010 at 7:36 PM, Andrew Purtell <apurtell@...> wrote:

> > seems stargate saves state of previous requests.
>
> If so that's unintentional, and not the way the Jersey/JAX-RS framework

Michelan Arendse | 2 Aug 2010 10:21

Extending IndexSpecifications

Hi,

I have a column that holds multiple sets of data, and I am trying to implement secondary indexing as shown by
Rajeev Sharma at http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html. The
problem I ran into was that HBase does not have a unique index key generator, so we wrote our own, and it
is working fine.

We have tried multiple column families with the JSON string example, without success.

So now my questions are:

* Has anyone implemented an example of multiple column families and done indexing on that?
* Does anyone have a workaround for indexing multiple values in a column that contains multiple sets of data
  of the same type and column field?
* Any suggestions as to how to do something like this?

If the answer to any of these questions is yes, please provide some sample code as well.
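One common way to index a column that holds several values is to write one index row per value, with the index row key formed from the value followed by the primary row key, so that a prefix scan on a value finds every matching row. The sketch below models that layout with an in-memory `TreeMap` standing in for a sorted index table; it is not the IndexedTable/IndexSpecification API, and the `\u0000` separator and the method names are hypothetical choices (it assumes values never contain that character).

```java
import java.util.*;

/** In-memory model of a value -> row secondary index with multiple values per cell. */
public class MultiValueIndex {
    // Sorted like an HBase table: index row key -> primary row key.
    private final TreeMap<String, String> index = new TreeMap<>();
    // Hypothetical separator; sorts before every other character.
    private static final char SEP = '\u0000';

    /** Writes one index entry per value found in the primary row's column. */
    void put(String primaryRow, List<String> values) {
        for (String v : values) {
            index.put(v + SEP + primaryRow, primaryRow);
        }
    }

    /** Prefix "scan" over the index: every primary row whose column contains the value. */
    List<String> rowsWithValue(String value) {
        String start = value + SEP;
        String stop = value + (char) (SEP + 1); // first key past the prefix
        return new ArrayList<>(index.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        MultiValueIndex idx = new MultiValueIndex();
        idx.put("row1", Arrays.asList("red", "blue"));
        idx.put("row2", Arrays.asList("blue", "green"));
        System.out.println(idx.rowsWithValue("blue")); // [row1, row2]
        System.out.println(idx.rowsWithValue("red"));  // [row1]
    }
}
```

Because the separator ends the value part, a lookup for "bl" does not match "blue". A real implementation would also have to delete stale index rows when a value is removed from the primary row.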

Kind regards,
Michelan Arendse

Sasha Maksimenko | 2 Aug 2010 11:32

Re: [stargate] transaction?

Same error in HBase 0.20.6.

Héctor Izquierdo Seliva | 2 Aug 2010 12:14

Re: Thousands of tables

> If you expect that some contiguous rows would be really overused, then
> change the row key. UUIDs for example would spread them all over the
> regions.
> 
> In 0.20 you can do a close_region in the shell; that will move the
> region to the first region server that checks in. In 0.90 we are working
> on better load balancing, more properly tuned to region traffic.

I'm on 0.89. Is any of this work in there? I cannot use UUIDs, as I
need to address rows by key, but I could add a hash of the key to the
beginning so keys are more evenly distributed. This is, of course, in case
I go with a single table.
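The hash-prefix idea can be sketched as follows. Because the prefix is computed from the key itself, a reader can still rebuild the full row key for a point get, while sequential keys land in different parts of the key space. Using MD5 and a one-byte (two hex character) prefix is an assumption for illustration; note that a range scan in the original key order no longer works once keys are prefixed this way.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Prefixes row keys with a short hash of themselves so sequential keys spread out. */
public class SaltedKey {
    /** Returns "<2 hex chars of MD5(key)>-<key>"; deterministic, so point gets still work. */
    static String salted(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x", digest[0] & 0xff) + "-" + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Recovers the original key by dropping the 3-character "xx-" prefix. */
    static String unsalted(String saltedKey) {
        return saltedKey.substring(3);
    }

    public static void main(String[] args) {
        // Adjacent keys get unrelated prefixes, so they no longer sort next to each other.
        for (String k : new String[] {"user1000", "user1001", "user1002"}) {
            System.out.println(salted(k));
        }
    }
}
```

With a one-byte prefix the keys fall into at most 256 buckets, which is usually plenty to spread load over a handful of region servers.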

I filled a 3-node cluster with data (around 6 GB), and the read
performance was very bad (lots of LRU flushes and misses). I'll test
this approach and see if it works better.

Thank you very much!

Eran Kutner | 2 Aug 2010 12:25

Which LZO library to use?

Hi,
I want to enable LZO compression on my cluster, but I see there are a few
alternatives, and the wiki page itself is very confusing, so it's not clear
which is the right choice. I was looking at this page:
http://wiki.apache.org/hadoop/UsingLzoCompression. At the top it recommends
using Kevin Weil's version (which seems to be the same one released by
Twitter) but warns that it doesn't contain all fixes, and lower in the article
it refers to the original Google Code repository
(http://code.google.com/p/hadoop-gpl-compression/).
The thing that concerns me most is future compatibility: whichever library
I pick now, I want to be certain that my compressed data will still be
readable when I upgrade to the next major version of Hadoop and HBase. It
seems that only the Google Code project has newer releases compatible with
future versions of Hadoop.

So I'm looking for recommendations on which library to use.

Thanks,
Eran

Gmane