Tom White (JIRA | 28 Dec 20:50 2008

[jira] Commented: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories


    [
https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659462#action_12659462
] 

Tom White commented on PIG-569:
-------------------------------

Hadoop has supported this only since 0.19 (see HADOOP-3498, PIG-252). Are you using Hadoop 0.19? You could
be getting this error since Pig is using the libraries from Hadoop 0.17 or 0.18.
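
For reference, the glob in the quoted report below is a brace alternation whose alternatives themselves contain path separators. A minimal Python sketch of the expected expansion follows. It is not Hadoop's or Pig's implementation; the function name expand_braces is hypothetical, and nested braces are not handled.

```python
def expand_braces(pattern):
    """Expand {a,b,c} groups left to right; alternatives may contain '/'."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    if end == -1:
        return [pattern]  # unbalanced brace: treat the pattern literally
    prefix, suffix = pattern[:start], pattern[end + 1:]
    results = []
    for alternative in pattern[start + 1:end].split(","):
        # Recurse so that any later brace group in the suffix is expanded too.
        results.extend(expand_braces(prefix + alternative + suffix))
    return results


for path in expand_braces("dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}"):
    print(path)
```

Each alternative is spliced into the surrounding path as a whole, so an alternative such as dir1/subdir1 contributes two path components rather than one.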

> Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-569
>                 URL: https://issues.apache.org/jira/browse/PIG-569
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: FC Linux x86/64, Pig revision 724576
>            Reporter: Kevin Weil
>             Fix For: types_branch
>
>
> Pig cannot handle LOAD statements with Hadoop globs where the globs have subdirectories.  For example, 
> A = LOAD 'dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}' USING ...
> A similar statement in Hadoop, hadoop dfs -ls dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}, does work correctly.
> The output of running the above load statement in pig, built from svn revision 724576, is:

Pradeep Kamath (JIRA | 29 Dec 20:00 2008

[jira] Created: (PIG-580) Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach

Combiner should also be used when there are distinct aggregates in a foreach following a group provided
there are no non-algebraics in the foreach 
---------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-580
                 URL: https://issues.apache.org/jira/browse/PIG-580
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch

Currently Pig uses the combiner only when a foreach follows a group and each element in the
foreach generate is one of the following:
1) simple project of the "group" column
2) Algebraic UDF

The above conditions exclude use of the combiner for distinct aggregates, even though the distinct operation itself
is combinable (irrespective of whether it feeds an algebraic or non-algebraic UDF). So the following
foreach should also be combinable:
{code}
..
b = group a by $0;
c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); }
{code}

The combiner optimizer should cause the distinct to be combined and the final combine output should feed
the COUNT() and SUM() in the reduce.
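
The idea can be sketched outside Pig as follows. This is a minimal Python model under stated assumptions, not Pig's combiner code: the function names combine_distinct and reduce_distinct are hypothetical, records are (group key, value) tuples standing in for ($0, $1), and each list represents the output of one map task.

```python
from collections import defaultdict


def combine_distinct(records):
    """Combiner step: drop duplicate tuples within each group of one map's output."""
    groups = defaultdict(set)
    for key, value in records:
        groups[key].add((key, value))
    return groups


def reduce_distinct(partials):
    """Reduce step: merge the partial distinct sets, then apply COUNT and SUM."""
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # distinct across all map outputs
    # COUNT(x) is the number of distinct tuples; SUM(x.$1) sums their second field.
    return {key: (len(tuples), sum(t[1] for t in tuples))
            for key, tuples in merged.items()}


map1 = [("a", 1), ("a", 1), ("a", 2), ("b", 5)]
map2 = [("a", 2), ("a", 3), ("b", 5), ("b", 5)]
result = reduce_distinct([combine_distinct(map1), combine_distinct(map2)])
print(result)
```

Because a union of sets is associative and commutative, the distinct can be applied partially in the combiner and finished in the reduce, while COUNT and SUM run only on the final distinct set.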


Pradeep Kamath (JIRA | 29 Dec 20:02 2008

[jira] Created: (PIG-581) Pig should enable an option to disable the use of optimizer

Pig should enable an option to disable the use of optimizer
-----------------------------------------------------------

                 Key: PIG-581
                 URL: https://issues.apache.org/jira/browse/PIG-581
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch

There are some cases where a combiner optimization chosen by Pig may actually be slower than the
non-optimized version. For example, using the combiner to address the issue reported in
https://issues.apache.org/jira/browse/PIG-580 could result in slower execution if the distinct on
groups of values does not actually shrink those groups. This is very data dependent, however, and the user
may know beforehand that this might be the case and may wish to disable the use of the optimizer. Pig should
provide an option to do so.
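
As a rough illustration of the data dependence, the following Python sketch computes how much a distinct shrinks one group of values. The function name distinct_shrink_ratio is hypothetical and is not part of Pig; a ratio close to 1.0 means the combiner would do extra work without reducing the data passed to the reduce.

```python
def distinct_shrink_ratio(values):
    """Return len(distinct values) / len(values); 1.0 means no shrinkage."""
    if not values:
        return 1.0  # an empty group cannot shrink
    return len(set(values)) / len(values)


print(distinct_shrink_ratio([1, 2, 3, 4]))  # every value is unique
print(distinct_shrink_ratio([7, 7, 7, 7]))  # heavy duplication
```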

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Pradeep Kamath (JIRA | 29 Dec 22:48 2008

[jira] Created: (PIG-582) POProject assumes that when "overloaded" field is true, the type of input is BAG

POProject assumes that when "overloaded" field is true, the type of input is BAG
--------------------------------------------------------------------------------

                 Key: PIG-582
                 URL: https://issues.apache.org/jira/browse/PIG-582
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch

POProject.getNext(Tuple) checks if the "overloaded" field is true, and if so casts the input it obtained
to a Bag. The "overloaded" field is set based on the value of the overloaded field in LOProject in
LogToPhyTranslator. LOProject sets overloaded to true when it has a non-ExpressionOperator successor.
This does not strictly mean that the type of the input received by POProject is a Bag, and POProject should not
assume so. In most real-world scripts, a Project that has overloaded set to true does have input of
type Bag (since typically the input to the project is a relation, which is a BAG). However, here is a
contrived example where things go wrong:
{code}
a = load 'distinct.input' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b {
        l = distinct group;
        generate l;
};
explain c;
dump c;
{code}

When executed, this script throws the following exception because POProject assumes the input type
as described:

Santhosh Srinivasan (JIRA | 29 Dec 23:15 2008

[jira] Created: (PIG-583) Bag constants used in non foreach statements cause lexical errors

Bag constants used in non foreach statements cause lexical errors
-----------------------------------------------------------------

                 Key: PIG-583
                 URL: https://issues.apache.org/jira/browse/PIG-583
             Project: Pig
          Issue Type: Bug
          Components: grunt
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Priority: Minor
             Fix For: types_branch

The use of bag constants in non-foreach statements causes lexical errors in Pig. The root cause is grunt's
inability to distinguish between a nested block and a bag constant in non-foreach statements.

{code}
grunt> a = load 'input'; 
grunt> b = filter a by ($0 eq {(1)});

2008-12-29 14:12:15,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
java.io.IOException: Encountered " <FILTEROP> "eq "" at line 1, column 21.
Was expecting one of:
    "*" ...
    ")" ...
    "." ...
    "+" ...
    "-" ...
    "/" ...
    "%" ...

Santhosh Srinivasan (JIRA | 29 Dec 23:27 2008

[jira] Created: (PIG-584) Error handling in Pig

Error handling in Pig
---------------------

                 Key: PIG-584
                 URL: https://issues.apache.org/jira/browse/PIG-584
             Project: Pig
          Issue Type: New Feature
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Assignee: Santhosh Srinivasan
             Fix For: types_branch

This JIRA tracks the error handling feature in Pig.


Santhosh Srinivasan (JIRA | 29 Dec 23:31 2008

[jira] Created: (PIG-585) Error handling requirements

Error handling requirements
---------------------------

                 Key: PIG-585
                 URL: https://issues.apache.org/jira/browse/PIG-585
             Project: Pig
          Issue Type: Sub-task
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Assignee: Santhosh Srinivasan
             Fix For: types_branch

The error handling feature requirements are documented at: http://wiki.apache.org/pig/PigErrorHandling


Santhosh Srinivasan (JIRA | 29 Dec 23:33 2008

[jira] Created: (PIG-587) Error handling design

Error handling design
---------------------

                 Key: PIG-587
                 URL: https://issues.apache.org/jira/browse/PIG-587
             Project: Pig
          Issue Type: Sub-task
          Components: documentation
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Assignee: Santhosh Srinivasan
             Fix For: types_branch

The error handling design will be at: http://wiki.apache.org/pig/PigErrorHandlingDesign


Santhosh Srinivasan (JIRA | 29 Dec 23:33 2008

[jira] Created: (PIG-586) Error handling functional specification

Error handling functional specification
---------------------------------------

                 Key: PIG-586
                 URL: https://issues.apache.org/jira/browse/PIG-586
             Project: Pig
          Issue Type: Sub-task
          Components: documentation
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Assignee: Santhosh Srinivasan
             Fix For: types_branch

The error handling functional specification will be at: http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification


Santhosh Srinivasan (JIRA | 29 Dec 23:35 2008

[jira] Updated: (PIG-585) Error handling requirements


     [
https://issues.apache.org/jira/browse/PIG-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-585:
------------------------------------

    Component/s: documentation

> Error handling requirements
> ---------------------------
>
>                 Key: PIG-585
>                 URL: https://issues.apache.org/jira/browse/PIG-585
>             Project: Pig
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> The error handling feature requirements are documented at: http://wiki.apache.org/pig/PigErrorHandling

