jira | 13 Feb 08:00 2016

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key         Summary
PIG-4808    PluckTuple overwrites regex if used more than once in the same script
            https://issues.apache.org/jira/browse/PIG-4808
PIG-4806    UDFContext can be overwritten with empty properties in the task for inputs and outputs during initialization
            https://issues.apache.org/jira/browse/PIG-4806
PIG-4777    Enable "TestEvalPipelineLocal" for spark
            https://issues.apache.org/jira/browse/PIG-4777
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4734    TOMAP schema inferring breaks some scripts in type checking for bincond
            https://issues.apache.org/jira/browse/PIG-4734
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4616    Fix UT errors of TestPigRunner in Spark mode
            https://issues.apache.org/jira/browse/PIG-4616
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551

Siddhi Mehta | 12 Feb 23:30 2016

Customizable Error Handling for Loaders in Pig

Hello Everyone,

Following the same approach as PIG-4704
<https://issues.apache.org/jira/browse/PIG-4704>, I want to introduce the
notion of
'*Customizable Error Handling for Loaders in Pig to deal with bad records*'.

There were some discussions and patches available for this in PIG-3059
<https://issues.apache.org/jira/browse/PIG-3059>.

I would like to take a similar approach to the one taken for Error Handling
in Storers.
We will use the same ErrorHandling and ErrorHandler interfaces that the
storers use.

The idea is to intercept the getNext() call and wrap it in a try/catch that
invokes the ErrorHandler if the loader implements ErrorHandling.

Looking into the Pig codebase, I see that every getNext() call on a loader is
made from PigRecordReader.nextKeyValue().

The approach I am thinking of now is to create a LoadFuncDecorator to
delegate/wrap calls from PigRecordReader to the loader's getNext().
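
To make the idea concrete, here is a rough sketch of such a decorator. It does
not use Pig's real classes: RecordLoader, LoadErrorHandler, HandlesLoadErrors
and FlakyLoader are hypothetical stand-ins for LoadFunc, ErrorHandler and
ErrorHandling, and records are plain strings. It also assumes the loader has
already moved past a bad record when it throws, so that retrying getNext()
makes progress.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a loader: next record, or null at end of input.
interface RecordLoader {
    String getNext() throws IOException;
}

// Hypothetical stand-in for the error-handler callbacks.
interface LoadErrorHandler {
    void onSuccess();
    void onError(Exception e); // may rethrow to fail the task
}

// Hypothetical opt-in interface, mirroring the ErrorHandling idea.
interface HandlesLoadErrors {
    LoadErrorHandler getErrorHandler();
}

// The record reader would call this wrapper instead of the loader itself.
final class LoadFuncDecoratorSketch implements RecordLoader {
    private final RecordLoader delegate;
    private final LoadErrorHandler handler; // null if the loader did not opt in

    LoadFuncDecoratorSketch(RecordLoader delegate) {
        this.delegate = delegate;
        this.handler = (delegate instanceof HandlesLoadErrors)
                ? ((HandlesLoadErrors) delegate).getErrorHandler()
                : null;
    }

    @Override
    public String getNext() throws IOException {
        if (handler == null) {
            return delegate.getNext(); // behaviour unchanged for other loaders
        }
        while (true) {
            try {
                String record = delegate.getNext();
                if (record != null) {
                    handler.onSuccess();
                }
                return record;
            } catch (Exception e) {
                // Assumption: the loader already advanced past the bad record.
                handler.onError(e);
            }
        }
    }
}

// Toy loader for the sketch: an empty line counts as a bad record.
final class FlakyLoader implements RecordLoader, HandlesLoadErrors {
    private final Iterator<String> lines;
    final List<String> errors = new ArrayList<>();
    int successes = 0;

    FlakyLoader(String... lines) {
        this.lines = Arrays.asList(lines).iterator();
    }

    @Override
    public String getNext() throws IOException {
        if (!lines.hasNext()) {
            return null;
        }
        String line = lines.next();
        if (line.isEmpty()) {
            throw new IOException("bad record");
        }
        return line;
    }

    @Override
    public LoadErrorHandler getErrorHandler() {
        return new LoadErrorHandler() {
            @Override public void onSuccess() { successes++; }
            @Override public void onError(Exception e) { errors.add(e.getMessage()); }
        };
    }
}
```

Loaders that do not opt in go through the wrapper untouched, so existing
behaviour would be preserved. A real handler would presumably rethrow from
onError once some error threshold is exceeded.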

Thoughts? Does this cover all cases?

-Siddhi

Eyal Allweil (JIRA) | 12 Feb 22:49 2016

[Updated] (PIG-4808) PluckTuple overwrites regex if used more than once in the same script


     [
https://issues.apache.org/jira/browse/PIG-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eyal Allweil updated PIG-4808:
------------------------------
    Status: Patch Available  (was: Open)

> PluckTuple overwrites regex if used more than once in the same script
> ---------------------------------------------------------------------
>
>                 Key: PIG-4808
>                 URL: https://issues.apache.org/jira/browse/PIG-4808
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>         Attachments: PIG-4808.patch
>
>
> If you define two PluckTuples with different regexes and use them together, each will overwrite the other's regex field (this does not affect cases where non-regex prefixes are used).
> Example (from testOutput in TestPluckTuple.java, modified)
> a = load 'a' using mock.Storage();
> b = load 'b' using mock.Storage();
> c = join a by x, b by x;
> define pluck1 PluckTuple('a::.*');
> define pluck2 PluckTuple('b::.*');
> d = foreach c generate pluck1(*) AS pluck1, pluck2(*) AS pluck2;

Eyal Allweil (JIRA) | 12 Feb 22:48 2016

[Updated] (PIG-4808) PluckTuple overwrites regex if used more than once in the same script


     [
https://issues.apache.org/jira/browse/PIG-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eyal Allweil updated PIG-4808:
------------------------------
    Attachment: PIG-4808.patch

This patch fixes the bug by removing the static modifier, and adds a new test to check for this case. I also
changed the order of the main conditional in the exec method, because checking the boolean (which is fast)
should come before the pattern matching (which is slower).
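
For readers who have not looked at the class, the sketch below illustrates the
kind of bug a static field causes. It is not the PluckTuple source: StaticPluck
and InstancePluck are hypothetical, simplified classes that only keep a
compiled pattern, once in a static field and once in an instance field.

```java
import java.util.regex.Pattern;

// Buggy shape: the pattern is static, so all instances share a single field
// and the most recently constructed instance wins.
final class StaticPluck {
    private static Pattern pattern;

    StaticPluck(String regex) {
        pattern = Pattern.compile(regex); // overwrites every other instance's regex
    }

    boolean matches(String alias) {
        return pattern.matcher(alias).matches();
    }
}

// Fixed shape: without the static modifier each instance keeps its own pattern.
final class InstancePluck {
    private final Pattern pattern;

    InstancePluck(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    boolean matches(String alias) {
        return pattern.matcher(alias).matches();
    }
}
```

With two StaticPluck objects built from 'a::.*' and 'b::.*', both end up
matching only b:: aliases, which mirrors the symptom in the issue. The
InstancePluck objects stay independent.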

> PluckTuple overwrites regex if used more than once in the same script
> ---------------------------------------------------------------------
>
>                 Key: PIG-4808
>                 URL: https://issues.apache.org/jira/browse/PIG-4808
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>         Attachments: PIG-4808.patch
>
>
> If you define two PluckTuples with different regexes and use them together, each will overwrite the other's regex field (this does not affect cases where non-regex prefixes are used).
> Example (from testOutput in TestPluckTuple.java, modified)
> a = load 'a' using mock.Storage();
> b = load 'b' using mock.Storage();

Eyal Allweil (JIRA) | 12 Feb 22:14 2016

[Commented] (PIG-4803) Improve performance of regex-based builtin functions


    [
https://issues.apache.org/jira/browse/PIG-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145309#comment-15145309
] 

Eyal Allweil commented on PIG-4803:
-----------------------------------

Definitely a bug in PluckTuple - I opened https://issues.apache.org/jira/browse/PIG-4808 for it.

> Improve performance of regex-based builtin functions
> ----------------------------------------------------
>
>                 Key: PIG-4803
>                 URL: https://issues.apache.org/jira/browse/PIG-4803
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>              Labels: perfomance, regex
>         Attachments: PIG-4803.patch
>
>
> There are three strategies used by Pig's regex-based built in functions.
> 1) REPLACE doesn't do any pattern caching.
> 2) REGEX_EXTRACT and REGEX_EXTRACT_ALL attempt to cache a single pattern as an instance variable.
> 3) PluckTuple attempts to cache a single pattern statically. (doesn't this cause problems if two
clashing defines for different PluckTuples are used?)
> I have a little fix and a medium fix in mind. The little fix is to give REPLACE a similar caching strategy, and
to fix PluckTuple, if the static nature of the pattern is indeed a problem.
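
As an illustration of strategy 2 applied to a REPLACE-like function, here is a
minimal sketch of caching a single pattern as an instance variable. It is not
taken from the attached patch: CachedReplaceSketch and its compilations
counter are hypothetical and exist only to show when recompilation happens.

```java
import java.util.regex.Pattern;

final class CachedReplaceSketch {
    private String lastRegex;    // regex string the cached pattern was built from
    private Pattern lastPattern; // compiled form of lastRegex
    int compilations = 0;        // instrumentation for this sketch only

    String replace(String input, String regex, String replacement) {
        // Recompile only when the regex differs from the one seen last time.
        if (!regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compilations++;
        }
        return lastPattern.matcher(input).replaceAll(replacement);
    }
}
```

This helps in the common case where the regex argument is a constant in the
script. If the regex changes on every record, the cache never hits and the
cost is one extra string comparison per call.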

Eyal Allweil (JIRA) | 12 Feb 22:04 2016

[Created] (PIG-4808) PluckTuple overwrites regex if used more than once in the same script

Eyal Allweil created PIG-4808:
---------------------------------

             Summary: PluckTuple overwrites regex if used more than once in the same script
                 Key: PIG-4808
                 URL: https://issues.apache.org/jira/browse/PIG-4808
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.15.0
            Reporter: Eyal Allweil
            Assignee: Eyal Allweil

If you define two PluckTuples with different regexes and use them together, each will overwrite the
other's regex field (this does not affect cases where non-regex prefixes are used).

Example (from testOutput in TestPluckTuple.java, modified)

a = load 'a' using mock.Storage();
b = load 'b' using mock.Storage();
c = join a by x, b by x;
define pluck1 PluckTuple('a::.*');
define pluck2 PluckTuple('b::.*');
d = foreach c generate pluck1(*) AS pluck1, pluck2(*) AS pluck2;

Expected:

((1,hey,2),(1,sasf,5))
((2,woah,3),(2,woah,6))

Actual:

Daniel Dai (JIRA) | 12 Feb 19:55 2016

[Updated] (PIG-4728) Compilation against hbase 1.x fails with hbase-hadoop1-compat not found


     [
https://issues.apache.org/jira/browse/PIG-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4728:
----------------------------
    Attachment: PIG-4728-1.patch

The reason we cannot upgrade HBase to 1.1 is that HBase dropped support for Hadoop 1. Until we reach
consensus to drop Hadoop 1 in Pig, we cannot commit the patch.

Attaching the patch here to make Pig work with HBase 1.1 under Hadoop 2 only.

> Compilation against hbase 1.x fails with hbase-hadoop1-compat not found
> -----------------------------------------------------------------------
>
>                 Key: PIG-4728
>                 URL: https://issues.apache.org/jira/browse/PIG-4728
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Ted Yu
>             Fix For: 0.16.0
>
>         Attachments: PIG-4728-1.patch
>
>
> With the following change:
> {code}
> diff --git a/ivy/libraries.properties b/ivy/libraries.properties
> index c40befd..41ce9fb 100644

Rohini Palaniswamy (JIRA) | 12 Feb 16:39 2016

[Commented] (PIG-4759) Fix Classresolution_1 e2e failure


    [
https://issues.apache.org/jira/browse/PIG-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144720#comment-15144720
] 

Rohini Palaniswamy commented on PIG-4759:
-----------------------------------------

Committed PIG-4759-fix-testfailure.patch to trunk. Thanks for the review Daniel.

> Fix Classresolution_1 e2e failure
> ---------------------------------
>
>                 Key: PIG-4759
>                 URL: https://issues.apache.org/jira/browse/PIG-4759
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4759-1.patch, PIG-4759-2.patch, PIG-4759-fix-testfailure.patch
>
>
>   We had left it as a known issue to be fixed later, as it was a very odd and uncommon usage put in just for that particular test case: storing into a file with one StoreFunc but reading it back with a different reader in the same script. But we came across one of our users doing exactly that: storing bags using PigStorage, reading them back with TextLoader, and processing them as plain strings later on.

--

Rohini Palaniswamy (JIRA) | 12 Feb 16:32 2016

[Commented] (PIG-4728) Compilation against hbase 1.x fails with hbase-hadoop1-compat not found


    [
https://issues.apache.org/jira/browse/PIG-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144709#comment-15144709
] 

Rohini Palaniswamy commented on PIG-4728:
-----------------------------------------

I am not sure about what is required for BigTop. Will leave it to [~daijy] to comment on that. 

The unit tests for HBase have not been fixed to work with HBase 1.1 yet. That does not mean it will not work. There are
users like [~shashikantkulkarni] who have been able to compile against HBase 1.1 and have it work
successfully. But for official certification the unit tests have to be fixed.

> Compilation against hbase 1.x fails with hbase-hadoop1-compat not found
> -----------------------------------------------------------------------
>
>                 Key: PIG-4728
>                 URL: https://issues.apache.org/jira/browse/PIG-4728
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Ted Yu
>             Fix For: 0.16.0
>
>
> With the following change:
> {code}
> diff --git a/ivy/libraries.properties b/ivy/libraries.properties
> index c40befd..41ce9fb 100644
> --- a/ivy/libraries.properties

jira | 12 Feb 08:00 2016

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key         Summary
PIG-4806    UDFContext can be overwritten with empty properties in the task for inputs and outputs during initialization
            https://issues.apache.org/jira/browse/PIG-4806
PIG-4777    Enable "TestEvalPipelineLocal" for spark
            https://issues.apache.org/jira/browse/PIG-4777
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4734    TOMAP schema inferring breaks some scripts in type checking for bincond
            https://issues.apache.org/jira/browse/PIG-4734
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4616    Fix UT errors of TestPigRunner in Spark mode
            https://issues.apache.org/jira/browse/PIG-4616
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539

Pallavi Rao (JIRA) | 12 Feb 07:59 2016

[Commented] (PIG-4777) Enable "TestEvalPipelineLocal" for spark


    [
https://issues.apache.org/jira/browse/PIG-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144180#comment-15144180
] 

Pallavi Rao commented on PIG-4777:
----------------------------------

+1 for the new patch. [~xuefuz], the patch can be committed.

> Enable "TestEvalPipelineLocal" for spark
> ----------------------------------------
>
>                 Key: PIG-4777
>                 URL: https://issues.apache.org/jira/browse/PIG-4777
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: prateek vaishnav
>             Fix For: spark-branch
>
>         Attachments: test_patch, test_path_v2
>
>
> The latest Jenkins report (https://builds.apache.org/job/Pig-spark/lastUnsuccessfulBuild/#showFailuresLink) shows that the following unit tests fail:
> org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph

