Daniel Dai (JIRA | 1 Sep 2011 01:05

[Updated] (PIG-2221) Couldnt find documentation for ColumnMapKeyPrune optimization rule


     [
https://issues.apache.org/jira/browse/PIG-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2221:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.10
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Both pig-2221.patch and PIG-2221-help.patch have been committed to trunk and the 0.9 branch.

> Couldnt find documentation for ColumnMapKeyPrune optimization rule
> ------------------------------------------------------------------
>
>                 Key: PIG-2221
>                 URL: https://issues.apache.org/jira/browse/PIG-2221
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.8.1
>            Reporter: Vivek Padmanabhan
>            Assignee: Corinne Chandel
>             Fix For: 0.9.1, 0.10
>
>         Attachments: PIG-2221-help.patch, pig-2221.patch
>
>

Bill Graham (JIRA | 1 Sep 2011 02:05

[Updated] (PIG-2257) AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor


     [
https://issues.apache.org/jira/browse/PIG-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated PIG-2257:
-----------------------------

    Attachment: PIG-2257_1.patch

Attaching patch which fixes this issue.

> AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-2257
>                 URL: https://issues.apache.org/jira/browse/PIG-2257
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: PIG-2257_1.patch
>
>
> PIG-2195 introduced the {{schema_file}} constructor param to {{AvroStorage}}. This field is
> currently only supported when passing constructor data via JSON though (a different code path is used
> when constructor data is a String array).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Bill Graham (JIRA | 1 Sep 2011 02:05

[Updated] (PIG-2257) AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor


     [
https://issues.apache.org/jira/browse/PIG-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated PIG-2257:
-----------------------------

    Status: Patch Available  (was: Open)

> AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-2257
>                 URL: https://issues.apache.org/jira/browse/PIG-2257
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: PIG-2257_1.patch
>
>
> PIG-2195 introduced the {{schema_file}} constructor param to {{AvroStorage}}. This field is
> currently only supported when passing constructor data via JSON though (a different code path is used
> when constructor data is a String array).


Thejas M Nair (JIRA | 1 Sep 2011 03:21

[Updated] (PIG-2228) support partial aggregation in map task


     [
https://issues.apache.org/jira/browse/PIG-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-2228:
-------------------------------

    Attachment: PIG-2228.1.patch

PIG-2228.1.patch: initial patch; more test cases will be added in another patch.

> support partial aggregation in map task
> ---------------------------------------
>
>                 Key: PIG-2228
>                 URL: https://issues.apache.org/jira/browse/PIG-2228
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.10
>
>         Attachments: PIG-2228.1.patch
>
>
> h3. Introduction
> Pig does (sort-based) partial aggregation on the map side through the use of a combiner. MR serializes the
> output of the map to a buffer, sorts it on the keys, deserializes it, and passes the values grouped on the keys to
> the combiner phase. The same work as the combiner can be done in the map phase itself by using a hash map on the keys.
> This hash-based (partial) aggregation can be done with or without a combiner phase.
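
The hash-based approach described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in PIG-2228.1.patch: SUM is used only as an example aggregate, and the function names and the max_keys flush threshold are hypothetical.

```python
def map_side_partial_sum(records, max_keys=1000):
    """Aggregate (key, value) pairs in a hash map inside the map phase.

    SUM is only an illustrative aggregate; max_keys is a hypothetical
    memory bound, not a real Pig setting.
    """
    partial = {}
    for key, value in records:
        partial[key] = partial.get(key, 0) + value
        if len(partial) > max_keys:
            # Too many distinct keys: emit what we have and start over.
            yield from partial.items()
            partial.clear()
    # Emit whatever is left at the end of the map task's input.
    yield from partial.items()


def final_sum(partials):
    """Reduce-side merge of the partial sums (a combiner could run first)."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals
```

Because a flush can emit the same key more than once, the map output is only partially aggregated; the reduce side (or an optional combiner) still has to merge the partial results.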

JArod Wen (JIRA | 1 Sep 2011 03:29

[Created] (PIG-2259) Black hole of multiple level dereference on "bag in bag" structure: cannot reach deeper levels

Black hole of multiple level dereference on "bag in bag" structure: cannot reach deeper levels
----------------------------------------------------------------------------------------------

                 Key: PIG-2259
                 URL: https://issues.apache.org/jira/browse/PIG-2259
             Project: Pig
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.9.0
         Environment: Pig 0.9.0 local version, on Linux x86 and Mac OS X 10.7.1
            Reporter: JArod Wen

I noticed that dereference cannot reach the second level of a bag in a "bag in bag" structure. Here is an example:

For the following script:

a = load 'grade.dat' as (name, age, gpa);
b = load 'rate.dat' as (state, age, rate);
ag = group a by (name, age);
c = cogroup ag by group.age, b by age;
cf = foreach c generate $1.$0;

The relation c has the schema:

bytearray, bag{tuple(tuple(bytearray, bytearray), bag{tuple(bytearray, bytearray,
bytearray)})}, bag{tuple(bytearray, bytearray, bytearray)}

So for c, $1.$0 means the first field of the bag "ag", which will be the tuple group(name, age). However,
$1.$0.$0 and $1.$0.$0.$0 return that same tuple and do not dereference any deeper. In fact, we can append an
arbitrary number of ".$0" after $1.$0 and still stay at the same position.

JArod Wen (JIRA | 1 Sep 2011 03:33

[Commented] (PIG-2224) Incorrect arity test in AstValidator.g with ALL and column-based grouping condition together in cogroup


    [
https://issues.apache.org/jira/browse/PIG-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095064#comment-13095064
] 

JArod Wen commented on PIG-2224:
--------------------------------

Thanks Alan! Your suggestion makes sense for this scenario. 

> Incorrect arity test in AstValidator.g with ALL and column-based grouping condition together in cogroup
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2224
>                 URL: https://issues.apache.org/jira/browse/PIG-2224
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.9.0
>         Environment: Suse Linux 9/MacOS(10.7)
>            Reporter: JArod Wen
>              Labels: ALL, arity, astvalidator, cogroup, grunt
>             Fix For: 0.9.1
>
>         Attachments: pig-2224.diff.patch, pig-2224.diff.patch
>
>
> When ALL and a column-based grouping condition are used together in COGROUP, the arity test in
> AstValidator.g (line 242) incorrectly sets the arity and causes an exception. For example, assume we have
> the following two relations:

Daniel Dai (JIRA | 1 Sep 2011 06:29

[Updated] (PIG-2239) Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself


     [
https://issues.apache.org/jira/browse/PIG-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2239:
----------------------------

    Attachment: PIG-2239-0.patch

Attaching initial patch PIG-2239-0.patch. Some notes on the patch:
1. There is no change in the Java code; we still produce pig.jar and pig-withouthadoop.jar. However,
pig-withouthadoop.jar is more lightweight (no Hadoop dependencies, only Pig dependencies).

2. If HADOOP_HOME is defined, bin/pig will invoke hadoop runJar to run pig-withouthadoop.jar, org.apache.pig.Main.

3. If HADOOP_HOME is not defined, bin/pig falls back to the old way: linking to the bundled hadoop 20.2 libraries.

4. I didn't see any conflicting options between the Pig and Hadoop command lines. The only change is that the pig
command line now supports Hadoop generic options, which were not supported before.
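
The launch decision in notes 2 and 3 can be sketched roughly as below. This is an illustration in Python rather than the actual bin/pig shell script; the function name, the pig_home parameter, and the exact classpath of the fallback branch are assumptions made for the example.

```python
import posixpath


def build_pig_command(env, pig_home, args):
    """Return the argv that a launcher might run (hypothetical helper).

    env is a dict of environment variables; pig_home is an assumed
    install directory containing the two jars named in the notes.
    """
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        # HADOOP_HOME is set: let "bin/hadoop jar" run the lightweight jar.
        return [
            posixpath.join(hadoop_home, "bin", "hadoop"),
            "jar",
            posixpath.join(pig_home, "pig-withouthadoop.jar"),
            "org.apache.pig.Main",
            *args,
        ]
    # No HADOOP_HOME: fall back to forming the java command directly,
    # using the jar that bundles the Hadoop libraries (assumed classpath).
    return [
        "java",
        "-cp",
        posixpath.join(pig_home, "pig.jar"),
        "org.apache.pig.Main",
        *args,
    ]
```

For example, build_pig_command({"HADOOP_HOME": "/opt/hadoop"}, "/opt/pig", ["-x", "local"]) takes the first branch, while an empty environment takes the fallback.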

> Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2239
>                 URL: https://issues.apache.org/jira/browse/PIG-2239
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Ashutosh Chauhan
>         Attachments: PIG-2239-0.patch

Alan Gates (JIRA | 1 Sep 2011 06:37

[Updated] (PIG-2260) Make all pig e2e enhancements available to 0.9 branch


     [
https://issues.apache.org/jira/browse/PIG-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2260:
----------------------------

    Attachment: test0_9.patch

We may not want to check this in, as it's hard to qualify it as a bug fix.  But if anyone wants to use the e2e
harness against the 0.9 branch they can apply this patch.

> Make all pig e2e enhancements available to 0.9 branch
> -----------------------------------------------------
>
>                 Key: PIG-2260
>                 URL: https://issues.apache.org/jira/browse/PIG-2260
>             Project: Pig
>          Issue Type: Test
>          Components: tools
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: test0_9.patch
>
>
> There have been a lot of changes in the e2e testing (including making it all work from ant and adding tests
> for new functionality) since the 0.9 release.  It should be possible to test 0.9 with these improvements.


Alan Gates (JIRA | 1 Sep 2011 06:37

[Created] (PIG-2260) Make all pig e2e enhancements available to 0.9 branch

Make all pig e2e enhancements available to 0.9 branch
-----------------------------------------------------

                 Key: PIG-2260
                 URL: https://issues.apache.org/jira/browse/PIG-2260
             Project: Pig
          Issue Type: Test
          Components: tools
            Reporter: Alan Gates
            Assignee: Alan Gates
         Attachments: test0_9.patch

There have been a lot of changes in the e2e testing (including making it all work from ant and adding tests
for new functionality) since the 0.9 release.  It should be possible to test 0.9 with these improvements.


Daniel Dai (JIRA | 1 Sep 2011 06:49

[Updated] (PIG-1270) Push limit into loader


     [
https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1270:
----------------------------

    Attachment: PIG-1270-2.patch

PIG-1270-2.patch fixes all unit tests. However, I didn't see a noticeable performance improvement. The
script I tested is:

a = load 'studenttab20m' as (name, age, gpa);
b = limit a 10;
dump b;

I ran it in both local mode and mapreduce mode.

Further investigation is needed to find out why performance does not improve.
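
For readers unfamiliar with the idea, pushing a limit into the loader means the loader itself stops reading once enough records have been produced. The sketch below illustrates that idea in Python; it is not taken from the patch, and CountingSource and load are hypothetical names introduced only for this example.

```python
from itertools import islice


class CountingSource:
    """Hypothetical input that counts how many lines were actually read."""

    def __init__(self, lines):
        self.lines = lines
        self.read = 0

    def __iter__(self):
        for line in self.lines:
            self.read += 1
            yield line


def load(source, limit=None):
    """Hypothetical loader: lazily parse tab-separated lines.

    With a limit pushed in, parsing stops after `limit` records, so the
    rest of the input is never read.
    """
    records = (tuple(line.rstrip("\n").split("\t")) for line in source)
    return records if limit is None else islice(records, limit)
```

With a source of 100 lines, list(load(source, limit=10)) reads only 10 of them, whereas applying the limit after a full load would read all 100. Whether this shows up as a measurable speedup depends on where the time is actually spent.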

> Push limit into loader
> ----------------------
>
>                 Key: PIG-1270
>                 URL: https://issues.apache.org/jira/browse/PIG-1270
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai