liyunzhang_intel (JIRA | 5 May 08:54 2016

[Updated] (PIG-4886) Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode


     [
https://issues.apache.org/jira/browse/PIG-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4886:
----------------------------------
    Attachment: PIG-4886.patch

[~kexianda], [~mohitsabharwal], [~pallavi.rao] and [~xuefuz]:
Changes in PIG-4886.patch:
1. Implement PigSplit#getLocationInfo to fix the NPE.

> Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode
> ----------------------------------------------------------------------
>
>                 Key: PIG-4886
>                 URL: https://issues.apache.org/jira/browse/PIG-4886
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4886.patch
>
>
> Use the branch code (119f313) to test the following Pig script in Spark mode:
> {code}
> A = load './SkewedJoinInput1.txt' as (id,name,n);

liyunzhang_intel (JIRA | 5 May 08:50 2016

[Updated] (PIG-4886) Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode


     [
https://issues.apache.org/jira/browse/PIG-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4886:
----------------------------------
    Description: 
Use the branch code (119f313) to test the following Pig script in Spark mode:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name);
store D into './testFRJoin.out';
{code}

cat bin/SkewedJoinInput1.txt 
{noformat}
100	apple1	aaa
200	orange1	bbb
300	strawberry	ccc
{noformat}

cat bin/SkewedJoinInput2.txt 
{noformat}
100	apple1
100	apple2
100	apple2
200	orange1
200	orange2
300	strawberry

Prashant Kommireddi (JIRA | 5 May 08:47 2016

[Updated] (PIG-4724) GROUP ALL must create an output record in case there is no input


     [
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-4724:
-------------------------------------
    Issue Type: Improvement  (was: Bug)

> GROUP ALL must create an output record in case there is no input
> ----------------------------------------------------------------
>
>                 Key: PIG-4724
>                 URL: https://issues.apache.org/jira/browse/PIG-4724
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.16.0
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter outputs no rows, since we are grouping on ALL the expected output should probably be (ALL, 0).
> The implementation generates a pseudo key “all” for every input record on the map side, so that the reduce
> side can combine all input together. However, this does not work for empty input, since the reduce side then
> receives nothing. If the input is empty, yield a pseudo “all, 0” to reduce


liyunzhang_intel (JIRA | 5 May 08:45 2016

[Created] (PIG-4886) Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode

liyunzhang_intel created PIG-4886:
-------------------------------------

             Summary: Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode
                 Key: PIG-4886
                 URL: https://issues.apache.org/jira/browse/PIG-4886
             Project: Pig
          Issue Type: Sub-task
            Reporter: liyunzhang_intel
            Assignee: liyunzhang_intel

Use the branch code (119f313) to test the following Pig script in Spark mode:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name);
store D into './testFRJoin.out';
{code}

cat bin/SkewedJoinInput1.txt 
{noformat}
100	apple1	aaa
200	orange1	bbb
300	strawberry	ccc
{noformat}

cat bin/SkewedJoinInput2.txt 
{noformat}
100	apple1
100	apple2

Daniel Dai (JIRA | 5 May 08:08 2016

[Commented] (PIG-4724) GROUP ALL must create an output record in case there is no input


    [
https://issues.apache.org/jira/browse/PIG-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271935#comment-15271935
] 

Daniel Dai commented on PIG-4724:
---------------------------------

This has to be done through PigGenericMapReduce.cleanup, since no input will be fed to the data pipeline. We
should maintain a flag indicating that this is a group all query (refer to GroupByConstParallelSetter.java),
and if there is no input (in PigGenericMapReduce.reduce), we should generate a pseudo record.
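
The idea in this comment can be sketched roughly as follows. This is a toy Python model, not Pig code: the class GroupAllReducer, its is_group_all flag, and the run helper are hypothetical names that only mirror a reduce/cleanup lifecycle. The real change would presumably live in PigGenericMapReduce, whose internals are not shown here.

```python
class GroupAllReducer:
    """Toy model of a reduce task for `group B ALL` followed by COUNT(B)."""

    def __init__(self, is_group_all):
        # Assumption: the planner sets this flag for GROUP ALL queries.
        self.is_group_all = is_group_all
        self.saw_input = False
        self.output = []

    def reduce(self, key, values):
        # Called once per key; never called when the input is empty.
        self.saw_input = True
        self.output.append((key, len(list(values))))

    def cleanup(self):
        # Called once at the end of the task, even if reduce never ran.
        if self.is_group_all and not self.saw_input:
            self.output.append(("all", 0))  # pseudo record for empty input


def run(records, is_group_all=True):
    """Drive the model: all records share the pseudo key "all"."""
    reducer = GroupAllReducer(is_group_all)
    if records:
        reducer.reduce("all", records)
    reducer.cleanup()
    return reducer.output
```

With empty input, run([]) yields the single pseudo record, while a non-empty input is counted as usual and no extra record is added.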

> GROUP ALL must create an output record in case there is no input
> ----------------------------------------------------------------
>
>                 Key: PIG-4724
>                 URL: https://issues.apache.org/jira/browse/PIG-4724
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>
> {code}
> A = load 'data';
> B = filter A by $0 == 'THIS_DOES_NOT_EXIST';
> C = group B ALL;
> D = foreach C generate group, COUNT(B);
> {code}
> Even if the filter did not output any rows, since we are grouping on ALL the expected output should probably

jira | 5 May 08:00 2016

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key         Summary
PIG-4882    Remove hardcoded groovy.grape.report.downloads=true from DownloadResolver
            https://issues.apache.org/jira/browse/PIG-4882
PIG-4880    Overlapping of parameter substitution names inside&outside a macro fails with NPE
            https://issues.apache.org/jira/browse/PIG-4880
PIG-4879    Pull latest version of joda-time
            https://issues.apache.org/jira/browse/PIG-4879
PIG-4874    Remove schema tuple reference overhead for replicate join hashmap
            https://issues.apache.org/jira/browse/PIG-4874
PIG-4873    InputSplit.getLocations return null and result a NPE in Pig
            https://issues.apache.org/jira/browse/PIG-4873
PIG-4871     Not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark
            https://issues.apache.org/jira/browse/PIG-4871
PIG-4854    Merge spark branch to trunk
            https://issues.apache.org/jira/browse/PIG-4854
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long. 
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4797    Analyze JOIN performance and improve the same.
            https://issues.apache.org/jira/browse/PIG-4797
PIG-4788    the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
            https://issues.apache.org/jira/browse/PIG-4788
PIG-4771    Implement FR Join for spark engine
            https://issues.apache.org/jira/browse/PIG-4771
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745

liyunzhang_intel (JIRA | 4 May 10:52 2016

[Resolved] (PIG-4290) Enable unit test "TestMapSideCogroup" for spark


     [
https://issues.apache.org/jira/browse/PIG-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel resolved PIG-4290.
-----------------------------------
    Resolution: Fixed

> Enable unit test "TestMapSideCogroup" for spark
> -----------------------------------------------
>
>                 Key: PIG-4290
>                 URL: https://issues.apache.org/jira/browse/PIG-4290
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: TEST-org.apache.pig.test.TestMapSideCogroup.txt
>
>
> error log is attached

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


liyunzhang_intel (JIRA | 4 May 10:51 2016

[Resolved] (PIG-4778) Enable "TestStoreLocal" for spark


     [
https://issues.apache.org/jira/browse/PIG-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel resolved PIG-4778.
-----------------------------------
    Resolution: Fixed

> Enable "TestStoreLocal" for spark
> ---------------------------------
>
>                 Key: PIG-4778
>                 URL: https://issues.apache.org/jira/browse/PIG-4778
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> In the latest Jenkins report (https://builds.apache.org/job/Pig-spark/lastUnsuccessfulBuild/#showFailuresLink),
> the following unit tests fail:
> org.apache.pig.test.TestStoreLocal.testCleanupOnFailureMultiStore
> org.apache.pig.test.TestStoreLocal.testCleanupOnFailure


liyunzhang_intel (JIRA | 4 May 08:03 2016

[Updated] (PIG-4553) Implement secondary sort using 1 shuffle not twice


     [
https://issues.apache.org/jira/browse/PIG-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4553:
----------------------------------
    Attachment: PIG-4553.patch

> Implement secondary sort using 1 shuffle not twice
> --------------------------------------------------
>
>                 Key: PIG-4553
>                 URL: https://issues.apache.org/jira/browse/PIG-4553
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4553.patch
>
>
> Currently we implement secondary key sort in GlobalRearrangeConverter#convert with two shuffles:
> the first shuffle in repartitionAndSortWithinPartitions, the second shuffle in groupBy.
> {code}
> public RDD<Tuple> convert(List<RDD<Tuple>> predecessors,
>                               POGlobalRearrangeSpark physicalOperator) throws IOException {
> ....
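
For readers unfamiliar with the goal of this issue, a single-shuffle secondary sort can be sketched as below. This is a plain Python model under stated assumptions, not the attached patch and not Spark code: the function secondary_sort_one_shuffle and its (key, secondary, value) record layout are hypothetical. The partition-then-sort step stands in for what repartitionAndSortWithinPartitions does, and the grouping is a linear scan inside each partition instead of a second shuffle.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort_one_shuffle(records, num_partitions=2):
    """records: iterable of (key, secondary, value) tuples."""
    # The single "shuffle": route each record by its main key only,
    # so all records with the same key land in the same partition.
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[hash(rec[0]) % num_partitions].append(rec)

    result = {}
    for part in partitions:
        # Sort inside the partition by (key, secondary); nothing moves
        # between partitions here.
        part.sort(key=itemgetter(0, 1))
        # Equal keys are now adjacent, so grouping is a local scan.
        for key, group in groupby(part, key=itemgetter(0)):
            result[key] = [(sec, val) for _, sec, val in group]
    return result
```

Each key maps to its values ordered by the secondary field, and the only data movement is the initial routing by key.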

jira | 4 May 08:00 2016

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key         Summary
PIG-4882    Remove hardcoded groovy.grape.report.downloads=true from DownloadResolver
            https://issues.apache.org/jira/browse/PIG-4882
PIG-4880    Overlapping of parameter substitution names inside&outside a macro fails with NPE
            https://issues.apache.org/jira/browse/PIG-4880
PIG-4879    Pull latest version of joda-time
            https://issues.apache.org/jira/browse/PIG-4879
PIG-4874    Remove schema tuple reference overhead for replicate join hashmap
            https://issues.apache.org/jira/browse/PIG-4874
PIG-4873    InputSplit.getLocations return null and result a NPE in Pig
            https://issues.apache.org/jira/browse/PIG-4873
PIG-4871     Not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark
            https://issues.apache.org/jira/browse/PIG-4871
PIG-4854    Merge spark branch to trunk
            https://issues.apache.org/jira/browse/PIG-4854
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long. 
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4797    Analyze JOIN performance and improve the same.
            https://issues.apache.org/jira/browse/PIG-4797
PIG-4788    the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
            https://issues.apache.org/jira/browse/PIG-4788
PIG-4771    Implement FR Join for spark engine
            https://issues.apache.org/jira/browse/PIG-4771
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745

