Dru Jensen | 1 Aug 01:06 2008

Re: newbie - map reduce not distributing

UPDATE:  I modified the RowCounter example and verified that it is  
sending the same row to multiple map tasks also. Is this a known bug  
or am I doing something truly as(s)inine?  Any help is appreciated.

On Jul 30, 2008, at 3:02 PM, Dru Jensen wrote:

> J-D,
>
> Again, thank you for your help on this.
>
> hitting the HBASE Master port 60010:
> System 1 - 2 regions
> System 2 - 1 region
> System 3 - 3 regions
>
> In order to demonstrate the behavior I'm seeing, I wrote a test class.
>
> public class Test extends Configured implements Tool {
>
>     public static class Map extends TableMap {
>
>         @Override
>         public void map(ImmutableBytesWritable key, RowResult row,  
> OutputCollector output, Reporter r) throws IOException {
>
>             String key_str = new String(key.get());
>             System.out.println("map: key = " + key_str);
>         }
>
>     }
(Continue reading)

Jean-Daniel Cryans | 1 Aug 02:42 2008

Re: newbie - map reduce not distributing

Dru,

There is something truly weird with your setup. I would advise running your
code (the simple one that only logs the rows) with DEBUG on. See the FAQ
<http://wiki.apache.org/hadoop/Hbase/FAQ#5> for how to do it, then get back
to us with the syslog and stdout output. This way we will have more
information on how the scanners are handling this.
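
For reference, one common way to do that (a sketch, assuming the stock
conf/log4j.properties that ships with HBase) is to set the HBase logger to
DEBUG there and restart the daemons:

    # in conf/log4j.properties (hypothetical stock file layout)
    log4j.logger.org.apache.hadoop.hbase=DEBUG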

Also FYI, I ran the same code as yours with 0.2.0 on my setup and had no
problems.

J-D

On Thu, Jul 31, 2008 at 7:06 PM, Dru Jensen <drujensen@...> wrote:

> UPDATE:  I modified the RowCounter example and verified that it is sending
> the same row to multiple map tasks also. Is this a known bug or am I doing
> something truly as(s)inine?  Any help is appreciated.
>
>
> On Jul 30, 2008, at 3:02 PM, Dru Jensen wrote:
>
>  J-D,
>>
>> Again, thank you for your help on this.
>>
>> hitting the HBASE Master port 60010:
>> System 1 - 2 regions
>> System 2 - 1 region
>> System 3 - 3 regions
(Continue reading)

Jean-Daniel Cryans | 1 Aug 02:46 2008

Re: Linking 2 MapReduce jobs together?

Erik,

Well, your data has to be somewhere between the two jobs... So I'd say yes:
put it in HBase or HDFS and reuse it from there (or maybe I didn't understand
your question).
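
For illustration, here is a minimal sketch of chaining two jobs through an
intermediate HDFS directory with the old mapred API; the class name, paths
and job wiring are assumptions for the example, not code from this thread:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    // Job 1 writes its reduce output to an intermediate HDFS directory.
    Path intermediate = new Path("/tmp/job1-output");  // hypothetical path

    JobConf job1 = new JobConf(ChainedJobs.class);
    job1.setJobName("job1");
    FileInputFormat.setInputPaths(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, intermediate);
    // ... set mapper/reducer and key/value classes for job 1 here ...
    JobClient.runJob(job1);  // blocks until job 1 finishes

    // Job 2 reads the intermediate directory as its input.
    JobConf job2 = new JobConf(ChainedJobs.class);
    job2.setJobName("job2");
    FileInputFormat.setInputPaths(job2, intermediate);
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    // ... set mapper/reducer and key/value classes for job 2 here ...
    JobClient.runJob(job2);
  }
}

With no mapper or reducer set, both jobs just run as identity jobs, but the
pattern is the same whichever classes you plug in.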

Regards,

J-D

On Thu, Jul 31, 2008 at 10:42 AM, Erik Holstad <erikholstad@...>wrote:

> Is it possible to feed the output from the reduce phase of job 1 directly
> in as the input to job number 2, or is the best way to write it to an
> HBase table or to HDFS and then fetch it in the second job?
>
> Erik
>
Jean-Daniel Cryans | 1 Aug 03:08 2008

Re: bug in RowCounter.java

Yair,

Yes, it is a problem and I don't personally use that class. This will fix
the problem: https://issues.apache.org/jira/browse/HBASE-791

Thx!

J-D

On Wed, Jul 30, 2008 at 5:26 PM, Yair Even-Zohar
<yaire@...>wrote:

> I looked at the code in 0.2.0 and args[0] is used twice:
>
>    c.set("hbase.master", args[0]);
>
> and
>
>    // First arg is the output directory.
>
(Continue reading)

Jean-Daniel Cryans | 1 Aug 03:16 2008

Re: help with reduce phase understanding

Pavel,

Since each map task processes exactly one region, a given row is stored in
only one region, and all intermediate values for a given key end up at a
single reducer, there will be no stale data in this situation.
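
To make the "single reducer per key" part concrete, here is an illustrative
partitioner mirroring Hadoop's default hash-based behaviour (a sketch, not
the actual HashPartitioner source): identical keys always hash to the same
partition, so one reducer sees every value for a given row key.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Illustrative partitioner: identical keys always land in the same
// partition, so a given row key is never split across reducers.
public class RowHashPartitioner<K, V> implements Partitioner<K, V> {
  public void configure(JobConf job) {
    // nothing to configure in this sketch
  }
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}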

J-D

On Wed, Jul 30, 2008 at 10:09 AM, Pavel <pavlikus@...> wrote:

> Hi,
>
> I feel I lack an understanding of the MapReduce approach and would like to
> ask some questions (mainly about its reduce part). Below is a reduce job
> that counts the values for a given row key and inserts the resulting count
> into another table using the same row key.
>
> What makes me doubt is that I cannot figure out how that code would work if
> several reducers are running. Is it possible that they will process values
> for the same row key and, as a consequence, write stale data into the
> table? Say reducerA has counted a total of 5 messages while reducerB
> counted 3; would that all end up as a value of 8 in the resulting table?
>
> Thank you.
> Pavel
>
> public class MessagesTableReduce extends TableReduce<Text, LongWritable> {
>
(Continue reading)

Yabo-Arber Xu | 1 Aug 11:40 2008

Hbase single-Node cluster config problem

Greetings,

I am trying to set up an HBase cluster. To simplify the setup, I first
tried a single-node cluster, where the HDFS name/data nodes run on one
computer and the HBase master/regionserver also run on the same computer.
HDFS passed the tests and works well, but for HBase, when I try to create a
table using the HBase shell, it keeps printing the following message:

08/08/01 02:30:29 INFO ipc.Client: Retrying connect to server:
ec2-67-202-24-167.compute-1.amazonaws.com/10.254.199.132:60000. Already
tried 1 time(s).

I checked the hbase log, and it has the following error:

2008-08-01 02:30:24,337 ERROR org.apache.hadoop.hbase.HMaster: Can not start
master
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3313)
        at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3347)
Caused by: java.net.SocketTimeoutException: timed out waiting for rpc
response
        at org.apache.hadoop.ipc.Client.call(Client.java:514)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
(Continue reading)

Jean-Daniel Cryans | 1 Aug 15:13 2008

Re: Hbase single-Node cluster config problem

Arber,

It seems that your master is unable to communicate with HDFS (that's the
SocketTimeoutException). To correct this, I would check that HDFS is running
by looking at its web UI, make sure that the ports are open (using telnet,
for example), and also check that HDFS uses the default ports.
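
As a point of reference, the HDFS address HBase connects to is whatever
hbase.rootdir points at, so a minimal hbase-site.xml for a single-node setup
might look like the sketch below; the hostname and ports are assumptions and
must match your fs.default.name / namenode address:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- must use the same host and port the namenode actually listens on -->
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <!-- host:port that clients and region servers use to reach the master -->
    <value>localhost:60000</value>
  </property>
</configuration>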

J-D

On Fri, Aug 1, 2008 at 5:40 AM, Yabo-Arber Xu <arber.research@...>wrote:

> Greetings,
>
> I am trying to set up an HBase cluster. To simplify the setup, I first
> tried a single-node cluster, where the HDFS name/data nodes run on one
> computer and the HBase master/regionserver also run on the same computer.
> HDFS passed the tests and works well, but for HBase, when I try to create
> a table using the HBase shell, it keeps printing the following message:
>
> 08/08/01 02:30:29 INFO ipc.Client: Retrying connect to server:
> ec2-67-202-24-167.compute-1.amazonaws.com/10.254.199.132:60000. Already
> tried 1 time(s).
>
> I checked the hbase log, and it has the following error:
>
> 2008-08-01 02:30:24,337 ERROR org.apache.hadoop.hbase.HMaster: Can not
> start
> master
> java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
(Continue reading)

Pavel | 1 Aug 15:36 2008

Re: help with reduce phase understanding

Thank you a lot for your answer, Jean-Daniel. I think I now understand how
that scenario works.

I have another scenario (probably not doable with mapred, though): I need to
get the total row count for the whole table. I think I could use the Reporter
to increment a counter in the map phase, but how can I get the counter value
saved into the 'results' table afterwards? Can you please advise how I can
achieve that? Also, what is the preferred way to get a table's row count?

Thank you for your help!
Pavel

2008/8/1 Jean-Daniel Cryans <jdcryans@...>

> Pavel,
>
> Since each map task processes exactly one region, a given row is stored in
> only one region, and all intermediate values for a given key end up at a
> single reducer, there will be no stale data in this situation.
>
> J-D
>
> On Wed, Jul 30, 2008 at 10:09 AM, Pavel <pavlikus@...> wrote:
>
> > Hi,
> >
> > I feel I lack an understanding of the MapReduce approach and would like
> > to ask some questions (mainly about its reduce part). Below is a reduce
> > job that counts the values
(Continue reading)

Yair Even-Zohar | 1 Aug 15:40 2008

RE: help with reduce phase understanding

Actually, there is a RowCounter under the mapred package. There is a bug in
the 0.2.0 candidate release, but it was fixed yesterday. You may want to
check the new one (see https://issues.apache.org/jira/browse/HBASE-791).

I would have done so, but I probably have a bigger HDFS problem on my
cluster :-)

Thanks
-Yair
-----Original Message-----
From: Pavel [mailto:pavlikus@...] 
Sent: Friday, August 01, 2008 8:37 AM
To: hbase-user@...
Subject: Re: help with reduce phase understanding

Thank you a lot for your answer, Jean-Daniel. I think I now understand how
that scenario works.

I have another scenario (probably not doable with mapred, though): I need to
get the total row count for the whole table. I think I could use the Reporter
to increment a counter in the map phase, but how can I get the counter value
saved into the 'results' table afterwards? Can you please advise how I can
achieve that? Also, what is the preferred way to get a table's row count?

Thank you for your help!
Pavel
(Continue reading)

Jean-Daniel Cryans | 1 Aug 15:46 2008

Re: help with reduce phase understanding

It was committed late last night, so it's fixed in TRUNK. Another big issue
got fixed as well, so there is a good chance that we'll see a release
candidate 2 soon.

Pavel, FYI, doing a row count is really non-trivial in HBase. A scan over
all the rows may take more than an hour because it's not distributed (it
goes one row after the other), so mapred is well suited for that.
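
To sketch the counter-based approach Pavel described, here is a rough,
generic outline against the old mapred API: the maps bump a counter per
record and the client reads the aggregated total once runJob() returns.
Plain text input stands in for the table scan here, and the class name, enum
and paths are invented for illustration; with HBase you would put the same
incrCounter() call inside a TableMap like the Test.Map earlier in this
thread.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;

public class CountWithCounters {

  // Hypothetical counter summed by the framework across all map tasks.
  public static enum MyCounters { ROWS }

  public static class CountMap extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, NullWritable> {
    public void map(LongWritable key, Text value,
        OutputCollector<NullWritable, NullWritable> output, Reporter reporter)
        throws IOException {
      // One increment per record seen by this map task.
      reporter.incrCounter(MyCounters.ROWS, 1);
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(CountWithCounters.class);
    conf.setJobName("count-with-counters");
    conf.setMapperClass(CountMap.class);
    conf.setNumReduceTasks(0);  // map-only job, nothing to reduce
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(NullWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // runJob() blocks until the job finishes and returns its counters.
    RunningJob job = JobClient.runJob(conf);
    Counters counters = job.getCounters();
    long total = counters.getCounter(MyCounters.ROWS);
    System.out.println("total records: " + total);
    // The client can now save this total wherever it needs to go,
    // e.g. insert it into a 'results' table.
  }
}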

J-D

On Fri, Aug 1, 2008 at 9:40 AM, Yair Even-Zohar <yaire@...>wrote:

> Actually, there is a RowCounter under the mapred package. There is a bug
> in the 0.2.0 candidate release, but it was fixed yesterday. You may want
> to check the new one (see
> https://issues.apache.org/jira/browse/HBASE-791).
>
> I would have done so, but I probably have a bigger HDFS problem on my
> cluster :-)
>
> Thanks
> -Yair
> -----Original Message-----
> From: Pavel [mailto:pavlikus@...]
> Sent: Friday, August 01, 2008 8:37 AM
> To: hbase-user@...
> Subject: Re: help with reduce phase understanding
>
> Thank you a lot for your answer, Jean-Daniel. I think I now understand how
> that scenario works.
(Continue reading)

