Hadoop QA (JIRA) | 1 Jul 01:22 2010

Commented: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7


    [
https://issues.apache.org/jira/browse/PIG-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884094#action_12884094
] 

Hadoop QA commented on PIG-1367:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448416/PIG-1367.patch
  against trunk revision 958666.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/356/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/356/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html

Romain Rigaux (JIRA) | 1 Jul 02:07 2010

Updated: (PIG-1404) PigUnit - Pig script testing simplified.


     [
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Romain Rigaux updated PIG-1404:
-------------------------------

    Attachment: PIG-1404-3.patch
                PIG-1404-3-doc.patch

Sorry it took some time, but here is an updated patch with some improvements that simplify PigUnit a
little further.

I also added some documentation, but I guess it should be reviewed as a separate patch once this one can be committed.

> PigUnit - Pig script testing simplified. 
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>            Assignee: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:

Xuefu Zhang (JIRA) | 1 Jul 02:13 2010

Commented: (PIG-1321) Logical Optimizer: Merge cascading foreach


    [
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884105#action_12884105
] 

Xuefu Zhang commented on PIG-1321:
----------------------------------

Here is the scope of this type of optimization:

Pre-condition: 
1. two consecutive foreach statements.
2. the second foreach statement is a simple inner plan in which the only statement is a GENERATE statement.
In other words, the second foreach statement must be something like "FOREACH A GENERATE ...."

Optimization result:
The two foreach statements will be merged into one. The new foreach statement keeps the first old foreach
statement's inner plan with the new expressions for the GENERATE statement. These new expressions are
generated based on those in the second foreach generate statement, combined with those in the first
foreach generate statement. For instance, suppose we have the following pig script:

A = load 'file.txt' as (a, b, c);
B = foreach A generate a+b as u, c-b as v;
C = foreach B generate $0+5, v;
dump C;

The optimized plan after merge-foreach optimization will be equivalent to the following pig script:

A = load 'file.txt' as (a, b, c);
C = foreach A generate a+b+5, c-b;
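
The substitution described above can be illustrated with a small Java sketch. This is not the optimizer's code: Pig rewrites logical plans, whereas this toy works on expression strings, and the class and method names (MergeForeachSketch, merge) are hypothetical. It replaces each positional reference ($0) or alias (v) in the second foreach with the parenthesized expression from the first foreach; parentheses are added to preserve precedence, and numeric literals containing letters (such as 5e3) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, string-level illustration of merging two cascading foreach statements.
class MergeForeachSketch {
    // Matches a positional reference ($0, $1, ...) or a bare identifier.
    private static final Pattern REF =
            Pattern.compile("\\$(\\d+)|[A-Za-z_][A-Za-z_0-9]*");

    // first: alias -> expression of the first foreach, in output order.
    // second: expressions of the second foreach's GENERATE statement.
    static List<String> merge(LinkedHashMap<String, String> first, List<String> second) {
        List<String> firstExprs = new ArrayList<>(first.values());
        List<String> merged = new ArrayList<>();
        for (String expr : second) {
            Matcher m = REF.matcher(expr);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String replacement;
                if (m.group(1) != null) {
                    // Positional reference: look up the first foreach's output by index.
                    replacement = "(" + firstExprs.get(Integer.parseInt(m.group(1))) + ")";
                } else if (first.containsKey(m.group())) {
                    // Alias defined by the first foreach.
                    replacement = "(" + first.get(m.group()) + ")";
                } else {
                    // Anything else (for example a function name) is left untouched.
                    replacement = m.group();
                }
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            merged.add(out.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> first = new LinkedHashMap<>();
        first.put("u", "a+b");
        first.put("v", "c-b");
        // Prints [(a+b)+5, (c-b)]
        System.out.println(merge(first, Arrays.asList("$0+5", "v")));
    }
}
```

For the script above, the sketch yields (a+b)+5 and (c-b), which matches the merged GENERATE statement up to redundant parentheses.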

Ashutosh Chauhan (JIRA) | 1 Jul 03:18 2010

Commented: (PIG-1389) Implement Pig counter to track number of rows for each input files


    [
https://issues.apache.org/jira/browse/PIG-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884116#action_12884116
] 

Ashutosh Chauhan commented on PIG-1389:
---------------------------------------

1.
{code}
+    /**
+     * Returns the counter name for the given input file name
+     * 
+     * @param fname the input file name
+     * @return the counter name
+     */
+    public static String getMultiInputsCounterName(String fname) {
+        return MULTI_INPUTS_RECORD_COUNTER +
+                new Path(fname).getName();
+    }

{code}

It's dangerous to assume that the input is a file name. It may not be; it can be a JDBC location string. In
particular, new Path(fname) parses fname and throws an exception if the String is not in the form it expects.
So, at the various places in the patch, don't assume the path refers to a file location, and in particular
avoid using Path(); deal in Strings instead.

2. In PigRecordReader, initialization of Counters should be done in initialize() instead of
getCurrentValue(); that will avoid branching on every call of getCurrentValue().
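
As a rough illustration of point 1, one way to derive a counter name while dealing only in Strings is sketched below. This is an assumption about how it could be done, not the patch's eventual code: the class name, the shortName helper, and the value of MULTI_INPUTS_RECORD_COUNTER are all hypothetical. It takes whatever follows the last '/' and falls back to the whole location string, so a JDBC location never goes through Path parsing.

```java
// Hypothetical sketch: derive a counter name from a location String without using Path.
class CounterNameSketch {
    // Assumed prefix; the real constant's value is not shown in the quoted patch.
    static final String MULTI_INPUTS_RECORD_COUNTER = "Input records from ";

    static String getMultiInputsCounterName(String location) {
        return MULTI_INPUTS_RECORD_COUNTER + shortName(location);
    }

    // Returns the text after the last '/', ignoring trailing slashes.
    // Falls back to the original string when nothing is left.
    static String shortName(String location) {
        if (location == null) {
            return "";
        }
        String s = location.trim();
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--; // drop trailing slashes such as "dir/"
        }
        s = s.substring(0, end);
        String tail = s.substring(s.lastIndexOf('/') + 1);
        return tail.isEmpty() ? location : tail;
    }

    public static void main(String[] args) {
        System.out.println(getMultiInputsCounterName("hdfs://namenode/data/part-00000"));
        System.out.println(getMultiInputsCounterName("jdbc:mysql://dbhost/mydb"));
    }
}
```

No exception is thrown for non-file locations here, at the cost of possibly producing the same short name for two different inputs.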

Brian Mansell (JIRA) | 1 Jul 07:23 2010

Created: (PIG-1477) Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER operator)

Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER operator)
-------------------------------------------------------------------------------

                 Key: PIG-1477
                 URL: https://issues.apache.org/jira/browse/PIG-1477
             Project: Pig
          Issue Type: Bug
          Components: documentation
    Affects Versions: 0.7.0
            Reporter: Brian Mansell
            Priority: Trivial

Documentation syntax should reflect the correct code indicated in the tutorial script.

Documentation syntax 
{code}
ordered_uniq_frequency = ORDER filtered_uniq_frequency BY (hour, score);
{code}

Above syntax results in this error:
{code}
2010-06-30 22:12:16,412 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
parsing. Encountered " "," ", "" at line 1, column 64.
Was expecting:
    ")" ..
{code}

(Correct) Tutorial script syntax
{code}
ordered_uniq_frequency = ORDER filtered_uniq_frequency BY hour, score;
{code}


Updated: (PIG-1295) Binary comparator for secondary sort


     [
https://issues.apache.org/jira/browse/PIG-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-1295:
------------------------------------------------

    Attachment: PIG-1295_0.7.patch

Addressed a small bug and added ASF license to source files

> Binary comparator for secondary sort
> ------------------------------------
>
>                 Key: PIG-1295
>                 URL: https://issues.apache.org/jira/browse/PIG-1295
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Gianmarco De Francisci Morales
>             Fix For: 0.8.0
>
>         Attachments: PIG-1295_0.1.patch, PIG-1295_0.2.patch, PIG-1295_0.3.patch, PIG-1295_0.4.patch,
PIG-1295_0.5.patch, PIG-1295_0.6.patch, PIG-1295_0.7.patch
>
>
> When the Hadoop framework does the sorting, it will try to use the binary version of the comparator if available. The
benefit of a binary comparator is that we do not need to instantiate the object before we compare. We see a ~30%

Olga Natkovich (JIRA) | 1 Jul 18:27 2010

Updated: (PIG-1477) Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER operator)


     [
https://issues.apache.org/jira/browse/PIG-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1477:
--------------------------------

    Fix Version/s: 0.8.0

> Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER operator)
> -------------------------------------------------------------------------------
>
>                 Key: PIG-1477
>                 URL: https://issues.apache.org/jira/browse/PIG-1477
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.7.0
>            Reporter: Brian Mansell
>            Priority: Trivial
>             Fix For: 0.8.0
>
>
> Documentation syntax should reflect the correct code indicated in the tutorial script.
> Documentation syntax 
> {code}
> ordered_uniq_frequency = ORDER filtered_uniq_frequency BY (hour, score);
> {code}
> Above syntax results in this error:
> {code}

Ashutosh Chauhan (JIRA) | 1 Jul 19:55 2010

Commented: (PIG-1424) Error logs of streaming should not be placed in output location


    [
https://issues.apache.org/jira/browse/PIG-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884365#action_12884365
] 

Ashutosh Chauhan commented on PIG-1424:
---------------------------------------

This turns out to be much more involved than I initially thought. The assumption that the output/input location is
a file-based path exists in more than one place in Pig. In particular, Streaming makes this assumption
explicit and has it in its semantics. We need to be careful about streaming semantics before we fix this.
More at: http://wiki.apache.org/pig/PigStreamingFunctionalSpec

> Error logs of streaming should not be placed in output location
> ---------------------------------------------------------------
>
>                 Key: PIG-1424
>                 URL: https://issues.apache.org/jira/browse/PIG-1424
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>             Fix For: 0.8.0
>
>
> This becomes a problem when the output location is anything other than a filesystem. Output will be written to
the DB, but where should the logs generated by streaming go? Clearly, they can't be written into the DB. This blocks
PIG-1229, which introduces writing to a DB from Pig.


Dmitriy V. Ryaboy (JIRA) | 1 Jul 20:13 2010

Commented: (PIG-1434) Allow casting relations to scalars


    [
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884369#action_12884369
] 

Dmitriy V. Ryaboy commented on PIG-1434:
----------------------------------------

A couple of thoughts that came out of the Pig contributor meeting:

1) rather than scalar, we should make this work for single-tuple relations. That way a user can do something
like this: 

{code}
A = load 'data' as (x, y, z);
B = group A all;
C = foreach B generate COUNT(A) as count, MAX(A.y) as max;
.....
X = ....
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}

2) Writing the intermediate relation to a file can cause hotspots. We should push this into the distributed
cache. In cases when the dist. cache is turned off, we can at least increase the replication factor to some
large-ish number (10, maybe, like the jobs?)

> Allow casting relations to scalars
> ----------------------------------
>
>                 Key: PIG-1434

Aniket Mokashi (JIRA) | 1 Jul 20:47 2010

Commented: (PIG-928) UDFs in scripting languages


    [
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884378#action_12884378
] 

Aniket Mokashi commented on PIG-928:
------------------------------------

An extension of this JIRA, to track progress on inline script UDFs with a define clause, has been added at https://issues.apache.org/jira/browse/PIG-1471

> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz,
RegisterPythonUDF2.patch, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch,
RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc.  This frees users from
needing to compile Java, generate a jar, etc.  It also opens Pig to programmers who prefer scripting
languages over Java.
