Andrew Hu | 1 Oct 01:01 2011

RE: Hbase - Solr Integration


Thanks, Drew, for your suggestions and ideas. Very helpful.

-Andrew

> Date: Fri, 30 Sep 2011 10:17:50 -0400
> Subject: Re: Hbase - Solr Integration
> From: drew.dahlke@...
> To: user@...
> 
> Hi David,
> 
> I did a little proof of concept a few weeks ago, indexing hundreds of
> millions of rows from HBase in Solr using the near real time stuff in
> Solr's trunk.
> 
> You *could* write MapReduce jobs against HBase to generate Lucene
> indexes on a periodic basis if you want, but that's not going to be
> real time in the least. If that interests you, take a peek at the
> source code for Katta.
> 
> Like you, I wanted updates to be indexed in near real time. At the
> time of writing, they haven't made a point release of Solr that
> includes the near real time code that came out of Twitter. It's been
> merged into trunk and is actually quite stable. Check out trunk,
> compile it, and then configure the near real time stuff. They've
> introduced the concept of 'soft commits' which make new documents
> available to the index in near real time without all the overhead of
> flushing to disk (hard commit). In my case, I set it to automatically
> soft commit once a second and hard commit once an hour.
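> 
> For reference, these knobs live in solrconfig.xml's <updateHandler>
> section. A minimal sketch matching the once-a-second / once-an-hour
> setup above (element names as of trunk; check your checkout, since the
> NRT configuration was still settling at the time):
> 
>   <!-- soft commit: new documents become searchable about once a
>        second, without flushing segments to disk -->
>   <autoSoftCommit>
>     <maxTime>1000</maxTime>
>   </autoSoftCommit>
> 
>   <!-- hard commit: durable flush to disk once an hour -->
>   <autoCommit>
>     <maxTime>3600000</maxTime>
>   </autoCommit>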

Christopher Dorner | 1 Oct 13:05 2011

Best way to write to multiple tables in one map-only job

Hello,

I am building an RDF store using HBase and experimenting with different 
index tables and schema designs.

For the input, I have a file where each line is an RDF triple in N3 format.

I need to write to multiple tables, since I have to build several index 
tables. To reduce IO and avoid reading the file several times, I want 
to do that in one map-only job. Later the file will contain a few 
million triples.

I am experimenting in pseudo-distributed mode so far but will be able to 
run it on our cluster soon.
Storing the data in the tables does not need to be speed-optimized at 
any cost; I just want to do it as simply and quickly as possible.

What is the best way to write to more than one table in one map task?

a)
I can either use "MultiTableOutputFormat.class" and write in map() using:
ImmutableBytesWritable table = new ImmutableBytesWritable(tableName);
Put put = new Put(key);
put.add(kv);  // kv is a KeyValue built for this index
context.write(table, put);

Can I write to, e.g., 6 tables in this way by creating a new Put for 
each table?
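
For illustration, here is a rough sketch of what I mean (the two index 
table names stand in for my six, and the N3 parsing is naive); the 
driver would also call 
job.setOutputFormatClass(MultiTableOutputFormat.class) and 
job.setNumReduceTasks(0):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TripleIndexMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  // Placeholder index tables; the real job would use six of these.
  private static final ImmutableBytesWritable SPO =
      new ImmutableBytesWritable(Bytes.toBytes("index_spo"));
  private static final ImmutableBytesWritable OSP =
      new ImmutableBytesWritable(Bytes.toBytes("index_osp"));
  private static final byte[] CF = Bytes.toBytes("t");

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Naive split of one N3 line into subject, predicate, object.
    String[] t = line.toString().split("\\s+", 3);

    // One Put per table, each keyed for its own index order.
    Put spo = new Put(Bytes.toBytes(t[0] + t[1]));
    spo.add(CF, Bytes.toBytes("o"), Bytes.toBytes(t[2]));
    context.write(SPO, spo);

    Put osp = new Put(Bytes.toBytes(t[2] + t[0]));
    osp.add(CF, Bytes.toBytes("p"), Bytes.toBytes(t[1]));
    context.write(OSP, osp);
  }
}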

But how can I turn off autoFlush and set writeBufferSize in this case? 
I think autoflush is not a good idea here, with lots of puts coming in.
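
If MultiTableOutputFormat does not expose those settings, the alternative 
I can think of is to skip the output format and manage the HTable 
instances myself in the mapper. A rough sketch (assuming a small, fixed 
set of index tables; names are again placeholders, and the driver would 
use NullOutputFormat since nothing is written through the context):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BufferedWriteMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private HTable[] tables;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
    String[] names = { "index_spo", "index_osp" };  // placeholders
    tables = new HTable[names.length];
    for (int i = 0; i < names.length; i++) {
      tables[i] = new HTable(conf, names[i]);
      tables[i].setAutoFlush(false);                   // buffer puts client-side
      tables[i].setWriteBufferSize(12 * 1024 * 1024);  // e.g. 12 MB
    }
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException {
    // Build one Put per index table, as in the sketch above; with
    // autoFlush off, HTable.put() only buffers until the buffer fills.
    Put put = new Put(Bytes.toBytes(line.toString()));
    put.add(Bytes.toBytes("t"), Bytes.toBytes("v"), Bytes.toBytes(""));
    tables[0].put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    for (HTable t : tables) {
      t.flushCommits();  // push whatever is still buffered
      t.close();
    }
  }
}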

Christopher Dorner | 1 Oct 13:19 2011

question about writing to columns with lots of versions in map task

Hello,

I am reading a file containing RDF triples in a map job. The RDF triples 
are then stored in a table where columns can have lots of versions.
So I need to store many values for one rowKey in the same column.

I have observed that reading the file is very fast, so some values are 
put into the table with the same timestamp and therefore overwrite an 
existing value.

How can I avoid that? The timestamps are not needed for anything later.

Could I simply use some sort of custom counter?

How would that work in fully distributed mode? I am working in 
pseudo-distributed mode for testing purposes right now.

Thank you and regards,
Christopher

Christopher Dorner | 1 Oct 20:05 2011

Re: question about writing to columns with lots of versions in map task

Hi again,

I think I solved my issue.

I simply use the byte offset of the row currently read by the mapper as 
the timestamp for the Put. This is unique for my input file, which 
contains one triple per row, so the timestamps are unique.
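
In code it is just the extra timestamp argument to Put.add(). A minimal 
sketch (table name, column family, and qualifier are placeholders):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OffsetTimestampMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private static final ImmutableBytesWritable TABLE =
      new ImmutableBytesWritable(Bytes.toBytes("triples"));  // placeholder
  private static final byte[] CF = Bytes.toBytes("cf");      // placeholder

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] t = line.toString().split("\\s+", 3);
    Put put = new Put(Bytes.toBytes(t[0]));  // rowKey = subject, for example
    // TextInputFormat's key is the byte offset of the line, which is
    // unique within the file, so two puts to the same row and column
    // can no longer collide on the timestamp.
    put.add(CF, Bytes.toBytes(t[1]), offset.get(), Bytes.toBytes(t[2]));
    context.write(TABLE, put);
  }
}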

Regards,
Christopher

On 01.10.2011 13:19, Christopher Dorner wrote:
> Hello,
>
> I am reading a file containing RDF triples in a map job. The RDF triples
> are then stored in a table where columns can have lots of versions.
> So I need to store many values for one rowKey in the same column.
>
> I have observed that reading the file is very fast, so some values are
> put into the table with the same timestamp and therefore overwrite an
> existing value.
>
> How can I avoid that? The timestamps are not needed for anything later.
>
> Could I simply use some sort of custom counter?
>
> How would that work in fully distributed mode? I am working in
> pseudo-distributed mode for testing purposes right now.
>
> Thank you and regards,
> Christopher

danoomistmatiste | 1 Oct 21:51 2011

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null


Hi, this is CDH version hbase-0.90.3-cdh3u1 (my Hadoop is also cdh3u1). 
All my processes are running; when I do a jps it shows:

763 HMaster
564 JobTracker
1782 Jps
353 NameNode
1579 HRegionServer
506 SecondaryNameNode
640 TaskTracker
726 HQuorumPeer

Yet from my shell, when I do a list or try to execute any command, it says:

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null
--

-- 
View this message in context: http://old.nabble.com/ERROR%3A-org.apache.hadoop.hbase.MasterNotRunningException%3A-null-tp32574927p32574927.html
Sent from the HBase User mailing list archive at Nabble.com.

Akash Ashok | 1 Oct 22:00 2011

Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null

Could you check if there were any exceptions in the HMaster log?

Cheers,
Akash A

On Sun, Oct 2, 2011 at 1:21 AM, danoomistmatiste <kkhambadkone@...> wrote:

>
> Hi, this is CDH version hbase-0.90.3-cdh3u1 (my Hadoop is also cdh3u1).
> All my processes are running; when I do a jps it shows:
>
> 763 HMaster
> 564 JobTracker
> 1782 Jps
> 353 NameNode
> 1579 HRegionServer
> 506 SecondaryNameNode
> 640 TaskTracker
> 726 HQuorumPeer
>
> Yet from my shell, when I do a list or try to execute any command, it says:
>
> ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null
> --
> View this message in context:
> http://old.nabble.com/ERROR%3A-org.apache.hadoop.hbase.MasterNotRunningException%3A-null-tp32574927p32574927.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Amandeep Khurana | 1 Oct 23:08 2011

HBase NYC meetup - day before Hadoop World 2011

Hello HBasers,

Hadoop World 2011 (Nov 8th & 9th) is coming up soon, and a bunch of Hadoop
and HBase users will be attending. We are having a meetup the evening
before Hadoop World (Nov 7th) to talk about HBase and Hadoop topics that
are not being covered in the Hadoop World sessions.

The details of the meetup are as follows:

Date: Nov 7th 2011
Time: 6pm - 8:30pm (we can hang out after that if people want to)
Venue: AppNexus, 28 West 23rd St, 5th Floor, New York, NY

The intention is to have 3-4 short talks (~15 mins each), followed by some
unconference-style sessions. If you are interested in giving a talk or
suggesting a topic that you would want to discuss there, let us know. It
would be awesome to hear about your use cases, experiences, challenges,
lessons learned, etc. with HBase.

You can RSVP at http://www.meetup.com/hbaseusergroup/events/35682812/

And if you haven't yet registered for Hadoop World, you can do that here ->
http://www.hadoopworld.com/

See you there!

-ak
danoomistmatiste | 2 Oct 02:12 2011

Re: ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null


Hello Akash, I managed to fix the issue: the DataNode wasn't up. I had
switched to the cdh3u1 version of Hadoop, HBase, and Hive and was trying
out the Hive-to-HBase link. The message is totally misleading, I believe.
Since there were remnants of the old directories in the DFS namespace, I
had to clean them out, after which the DataNode started and now everything
is working fine.

    --Krish.

Akash Ashok wrote:
> 
> Could you check if there were any exceptions in the HMaster log?
> 
> Cheers,
> Akash A
> 
> On Sun, Oct 2, 2011 at 1:21 AM, danoomistmatiste
> <kkhambadkone@...> wrote:
> 
>>
>> Hi,  This is CDH version  hbase-0.90.3-cdh3u1 (my hadoop is also cdh3u1)
>> All
>> my procs are running, when I do a jps it shows,
>>
>> 763 HMaster
>> 564 JobTracker
>> 1782 Jps
>> 353 NameNode
>> 1579 HRegionServer

Mark | 2 Oct 19:19 2011

HBase Hush Application

I am trying to run Hush, the HBase URL shortener application from 
https://github.com/larsgeorge/hbase-book, on my local machine in 
pseudo-distributed mode using Cloudera CDH3, but I keep receiving 
the following error:

...
INFO [main] (ZooKeeper.java:373) - Initiating client connection, connectString=localhost:2181
sessionTimeout=180000 watcher=hconnection
INFO [main-SendThread()] (ClientCnxn.java:1041) - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181
INFO [main-SendThread(localhost:2181)] (ClientCnxn.java:949) - Socket connection established to
localhost/0:0:0:0:0:0:0:1:2181, initiating session
INFO [main-SendThread(localhost:2181)] (ClientCnxn.java:738) - Session establishment complete on
server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x132c59cf1100004, negotiated timeout = 40000
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
     at java.lang.String.substring(String.java:1937)
     at org.apache.hadoop.hbase.ServerName.parseHostname(ServerName.java:81)
     at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:63)
     at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:62)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:568)
     at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
     at com.hbasebook.hush.schema.SchemaManager.process(SchemaManager.java:126)
     at com.hbasebook.hush.HushMain.main(HushMain.java:57)

When I type jps I see HMaster; however, I do not see any mention of 
ZooKeeper. Is this to be expected? Is the above error I am receiving 
due to a misconfiguration of ZooKeeper, or is it something completely 
unrelated? Is there something wrong with my hostname? Any ideas why I am 
receiving this error?

Any help would be greatly appreciated. Thanks.

