Olga Natkovich (JIRA | 1 Sep 2010 01:22

Created: (PIG-1584) deal with inner cogroup

deal with inner cogroup
-----------------------

                 Key: PIG-1584
                 URL: https://issues.apache.org/jira/browse/PIG-1584
             Project: Pig
          Issue Type: Bug
            Reporter: Olga Natkovich
             Fix For: 0.9.0

The current implementation of inner in the case of cogroup conflicts with join. We need to decide whether
to fix inner cogroup or just remove the functionality if it is not widely used.

--

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Giridharan Kesavan (JIRA | 1 Sep 2010 01:26

Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken


     [ https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan updated PIG-1583:
------------------------------------

    Status: Patch Available  (was: Open)

> piggybank unit test TestLookupInFiles is broken
> -----------------------------------------------
>
>                 Key: PIG-1583
>                 URL: https://issues.apache.org/jira/browse/PIG-1583
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1583-1.patch
>
>
> Error message:
> 10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
> attempt_20100831093139211_0001_m_000000_3: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
> error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles 

Giridharan Kesavan (JIRA | 1 Sep 2010 01:26

Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken


     [ https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan updated PIG-1583:
------------------------------------

    Status: Open  (was: Patch Available)

Submitting to Hudson.

> piggybank unit test TestLookupInFiles is broken
> -----------------------------------------------
>
>                 Key: PIG-1583
>                 URL: https://issues.apache.org/jira/browse/PIG-1583
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1583-1.patch
>
>
> Error message:
> 10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
> attempt_20100831093139211_0001_m_000000_3: 

Olga Natkovich (JIRA | 1 Sep 2010 01:28

Commented: (PIG-1506) Need to clarify the difference between null handling in JOIN and COGROUP


    [ https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904829#action_12904829 ]

Olga Natkovich commented on PIG-1506:
-------------------------------------

I verified that the 0.8 code correctly handles multi-column keys with nulls.

> Need to clarify the difference between null handling in JOIN and COGROUP
> ------------------------------------------------------------------------
>
>                 Key: PIG-1506
>                 URL: https://issues.apache.org/jira/browse/PIG-1506
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Olga Natkovich
>            Assignee: Corinne Chandel
>             Fix For: 0.8.0
>
>

Yan Zhou (JIRA | 1 Sep 2010 01:36

Updated: (PIG-1501) need to investigate the impact of compression on pig performance


     [ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1501:
--------------------------

    Release Note: 
This feature saves the HDFS space used to store intermediate data generated by Pig and may also improve
query execution speed. In general, the more intermediate data a query generates, the greater the storage
savings and speedup.

There are no backward compatibility issues as a result of this feature.

Two Java properties control this behavior:

pig.tmpfilecompression (default: false) specifies whether temporary files should be compressed.

pig.tmpfilecompression.codec specifies which compression codec to use when pig.tmpfilecompression is
true. Currently, Pig accepts only "gz" and "lzo" as values. Because LZO is licensed under the GPL, Hadoop
may need additional configuration to use the LZO codec. Please refer to
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details.

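A minimal Python sketch of the two property rules described in this release note, for readers checking these settings in their own tooling. `resolve_tmpfile_codec` and `SUPPORTED_CODECS` are hypothetical names, not Pig APIs, and treating a missing or unlisted codec as an error is an assumption; the note does not say how Pig reacts in that case.

```python
from typing import Mapping, Optional

# Codecs the PIG-1501 release note says Pig currently accepts.
SUPPORTED_CODECS = {"gz", "lzo"}


def resolve_tmpfile_codec(props: Mapping[str, str]) -> Optional[str]:
    """Return the codec for Pig's temporary files, or None if compression is off.

    Hypothetical helper modeling the documented property rules; not part of Pig.
    """
    # pig.tmpfilecompression defaults to false when it is absent.
    enabled = props.get("pig.tmpfilecompression", "false").strip().lower() == "true"
    if not enabled:
        return None  # the codec property only matters when compression is on
    codec = props.get("pig.tmpfilecompression.codec")
    # Assumption: a missing or unlisted codec is treated as an error here.
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported pig.tmpfilecompression.codec: {codec!r}")
    return codec


print(resolve_tmpfile_codec({}))  # prints None
print(resolve_tmpfile_codec({"pig.tmpfilecompression": "true",
                             "pig.tmpfilecompression.codec": "gz"}))  # prints gz
```

Note that with compression disabled the codec value is ignored entirely, matching the note's "if true, then" wording.
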
An example is the following "test.pig" script:

register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent:long, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B1 = filter A by timespent == 4;
B = load '/user/pig/tests/data/pigmix/queryterm' as (query_term);
C = join B1 by query_term, B by query_term using 'skewed' parallel 300;

Yan Zhou (JIRA | 1 Sep 2010 01:38

Updated: (PIG-1501) need to investigate the impact of compression on pig performance


     [ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1501:
--------------------------

    Release Note: 
This feature saves the HDFS space used to store intermediate data generated by Pig and may also improve
query execution speed. In general, the more intermediate data a query generates, the greater the storage
savings and speedup.

There are no backward compatibility issues as a result of this feature.

Two Java properties control this behavior:

pig.tmpfilecompression (default: false) specifies whether temporary files should be compressed.

pig.tmpfilecompression.codec specifies which compression codec to use when pig.tmpfilecompression is
true. Currently, Pig accepts only "gz" and "lzo" as values. Because LZO is licensed under the GPL, Hadoop
may need additional configuration to use the LZO codec. Please refer to
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details.

An example is the following "test.pig" script:

register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent:long, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B1 = filter A by timespent == 4;
B = load '/user/pig/tests/data/pigmix/queryterm' as (query_term);
C = join B1 by query_term, B by query_term using 'skewed' parallel 300;

Scott Carey (JIRA | 1 Sep 2010 01:54

Commented: (PIG-1506) Need to clarify the difference between null handling in JOIN and COGROUP


    [ https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904835#action_12904835 ]

Scott Carey commented on PIG-1506:
----------------------------------

I have just confirmed that it works fine on 0.7 but not on 0.5, so this was fixed in 0.6 or 0.7. I suppose I can
take some null guards out of my scripts now :)

This was my test:

{code}
A = LOAD '/tmp/test.txt' as (a,b,c);
B = LOAD '/tmp/test.txt' as (a,b,c);
C = JOIN A by (a,b), B by (a,b);

DUMP A;
DUMP C;
{code}

With 0.5 I get:
A:
(fred,1,3)
(bob,,4)
C:
(bob,,4,bob,,4)
(fred,1,3,fred,1,3)
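The 0.5 output above shows (bob,,4) joining itself even though its b field is null. For comparison, here is a small Python model of SQL-style inner-join null handling, which Pig's documentation describes for JOIN: a multi-column key with a null in any position matches nothing. It assumes, as the comment implies but does not show, that on 0.7 the bob row drops out of C and only the fred row remains. `inner_join` is a hypothetical helper, not Pig code, and the empty field is modeled as None.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], ...]


def inner_join(left: Iterable[Row], right: Iterable[Row],
               key_cols: Sequence[int]) -> List[Row]:
    """Hash inner join on a multi-column key with SQL-style null handling.

    A row whose key has a null (None) in any column never matches,
    not even another row with the same null-containing key.
    """
    # Build side: index right rows by key, skipping null-containing keys.
    index: Dict[Row, List[Row]] = {}
    for right_row in right:
        key = tuple(right_row[i] for i in key_cols)
        if any(k is None for k in key):
            continue
        index.setdefault(key, []).append(right_row)

    # Probe side: emit the concatenation of every matching pair.
    joined: List[Row] = []
    for left_row in left:
        key = tuple(left_row[i] for i in key_cols)
        if any(k is None for k in key):
            continue
        for right_row in index.get(key, []):
            joined.append(tuple(left_row) + tuple(right_row))
    return joined


# The two records from the test above; the empty field is modeled as None.
rows = [("fred", "1", "3"), ("bob", None, "4")]
for out in inner_join(rows, rows, key_cols=(0, 1)):
    print(out)  # prints ('fred', '1', '3', 'fred', '1', '3')
```

Joining on column a alone would keep both rows, since neither name is null; it is the null in b that removes bob from the (a,b) join.
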

Olga Natkovich (JIRA | 1 Sep 2010 02:02

Created: (PIG-1585) Add new properties to help and documentation

Add new properties to help and documentation
--------------------------------------------

                 Key: PIG-1585
                 URL: https://issues.apache.org/jira/browse/PIG-1585
             Project: Pig
          Issue Type: Bug
            Reporter: Olga Natkovich
            Assignee: Olga Natkovich
             Fix For: 0.8.0

New properties:

Compression:

pig.tmpfilecompression (default: false) specifies whether temporary files should be compressed. If it is
true, pig.tmpfilecompression.codec specifies which compression codec to use. Currently, Pig accepts only
"gz" and "lzo" as values. Because LZO is licensed under the GPL, Hadoop may need additional configuration
to use the LZO codec. Please refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details.

Combining small files:

pig.noSplitCombination: disables combining multiple small input files into block-sized splits

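To make pig.noSplitCombination concrete, the sketch below models split combination as greedy packing of small input files into splits no larger than a limit such as the block size (64 MB here is an illustrative value). `combine_splits` is hypothetical and deliberately simplified; Pig's actual combiner may differ, for example by also taking block locations into account.

```python
from typing import List, Sequence


def combine_splits(file_sizes: Sequence[int], max_split_size: int,
                   no_split_combination: bool = False) -> List[List[int]]:
    """Group input files (by index) into splits of at most max_split_size.

    Simplified greedy model: files are packed in order and each file is
    treated as one indivisible unit, so a file larger than max_split_size
    ends up alone in its own split.
    """
    if no_split_combination:
        # Models pig.noSplitCombination: no combining across files.
        return [[i] for i in range(len(file_sizes))]

    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(file_sizes):
        # Close the current split if adding this file would exceed the limit.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        splits.append(current)
    return splits


sizes_mb = [10, 20, 30, 50]  # hypothetical small files, sizes in MB
print(combine_splits(sizes_mb, max_split_size=64))  # prints [[0, 1, 2], [3]]
print(combine_splits(sizes_mb, max_split_size=64, no_split_combination=True))
# prints [[0], [1], [2], [3]]
```

With combining enabled, four small files need two map tasks instead of four; setting the flag restores one task per file.
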

Ashutosh Chauhan (JIRA | 1 Sep 2010 02:06

Commented: (PIG-1501) need to investigate the impact of compression on pig performance


    [ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904843#action_12904843 ]

Ashutosh Chauhan commented on PIG-1501:
---------------------------------------

If it's not backward-incompatible, is there any specific reason to default pig.tmpfilecompression to
false? This seems like a useful feature, so shouldn't it be true by default?

> need to investigate the impact of compression on pig performance
> ----------------------------------------------------------------
>
>                 Key: PIG-1501
>                 URL: https://issues.apache.org/jira/browse/PIG-1501
>             Project: Pig
>          Issue Type: Test
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: compress_perf_data.txt, compress_perf_data_2.txt, PIG-1501.patch, PIG-1501.patch, PIG-1501.patch
>
>
> We would like to understand how compressing map results as well as reducer output in a chain of MR
> jobs impacts performance. We can use PigMix queries for this investigation.

Thejas M Nair (JIRA | 1 Sep 2010 02:19

Updated: (PIG-1572) change default datatype when relations are used as scalar to bytearray


     [ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1572:
-------------------------------

    Attachment: PIG-1572.2.patch

PIG-1572.2.patch:
- Fixed loss of lineage information during translation in the explain call.
- Added a cast on the output of ReadScalars so that type information is not lost when the optimizer resets the schema.

Unit tests and test-patch have passed. The patch is ready for review.

     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

> change default datatype when relations are used as scalar to bytearray