Brian Johnson (JIRA) | 3 Mar 15:30 2015

[Commented] (PIG-4408) Merge join should support replicated join as a predecessor


    [
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345115#comment-14345115
] 

Brian Johnson commented on PIG-4408:
------------------------------------

That's very strange. Daniel's patch looks like it's doing the same thing as my original patch, and all the
tests passed for me with that patch as well as with the second patch I submitted. I created the test case that is failing
for you and submitted the patch for it with PIG-4166. What happens if you revert the change to
POCollectedGroup? Alternatively, we could use the second patch I submitted, which makes a much smaller
change to the process.

> Merge join should support replicated join as a predecessor
> ----------------------------------------------------------
>
>                 Key: PIG-4408
>                 URL: https://issues.apache.org/jira/browse/PIG-4408
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.14.0
>            Reporter: Brian Johnson
>            Assignee: Brian Johnson
>             Fix For: 0.15.0
>
>         Attachments: PIG-4408-3.patch, patch, patch
>
>
> Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work
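
The reasoning in the description can be illustrated with a small sketch. It is not Pig code: `replicated_join` and `merge_join` are hypothetical helper names, and the rows are made-up data. It only shows why a map-side hash join that streams its large input leaves the key order intact, so a sort-merge join can consume the result directly. For brevity, the merge join assumes the right-hand keys are unique.

```python
from typing import Dict, Iterable, Iterator, Tuple


def replicated_join(big: Iterable[Tuple], small: Dict[int, str]) -> Iterator[Tuple]:
    """Map-side hash join: stream `big` in its existing order and probe `small`.

    No shuffle or sort happens, so the output keeps the order of `big`.
    """
    for row in big:
        key = row[0]
        if key in small:
            yield row + (small[key],)


def merge_join(left: Iterable[Tuple], right: Iterable[Tuple]) -> Iterator[Tuple]:
    """Sort-merge join on the first field.

    Both inputs must already be sorted by key. Right-hand keys are assumed
    to be unique (a simplification for this sketch).
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        key = row[0]
        # Advance the right side until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            yield row + current[1:]


if __name__ == "__main__":
    big = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]   # sorted by key
    small = {1: "x", 2: "y", 4: "z"}                  # fits in memory
    right = [(2, "R2"), (3, "R3"), (4, "R4")]         # sorted by key
    for joined_row in merge_join(replicated_join(big, small), right):
        print(joined_row)
```

Because `replicated_join` is a generator over the sorted input, its output can be fed straight into `merge_join` without an intermediate sort.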

Praveen Rachabattuni (JIRA) | 3 Mar 10:30 2015

[Updated] (PIG-4193) Make collected group work with Spark


     [
https://issues.apache.org/jira/browse/PIG-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen Rachabattuni updated PIG-4193:
--------------------------------------
    Status: Open  (was: Patch Available)

> Make collected group work with Spark
> ------------------------------------
>
>                 Key: PIG-4193
>                 URL: https://issues.apache.org/jira/browse/PIG-4193
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: Praveen Rachabattuni
>             Fix For: spark-branch
>
>         Attachments: PIG-4193-1.patch
>
>
> Related e2e tests: CollectedGroup_1 - CollectedGroup_6
> Sample script:
> a = load '/user/pig/tests/data/singlefile/studenttab10k';
> b = order a by $0;
> store b into '/user/pig/out/praveenr-1411383735-nightly.conf/CollectedGroup_1.out.intermediate';
> exec;
> register ./lib/java/testudf.jar;

Praveen Rachabattuni (JIRA) | 3 Mar 10:30 2015

[Commented] (PIG-4193) Make collected group work with Spark


    [
https://issues.apache.org/jira/browse/PIG-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344812#comment-14344812
] 

Praveen Rachabattuni commented on PIG-4193:
-------------------------------------------

Currently, PIG-4193-1.patch makes a change in POCollectedGroup. We would need to revert this change and
find a workaround.

> Make collected group work with Spark
> ------------------------------------
>
>                 Key: PIG-4193
>                 URL: https://issues.apache.org/jira/browse/PIG-4193
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: Praveen Rachabattuni
>             Fix For: spark-branch
>
>         Attachments: PIG-4193-1.patch
>
>
> Related e2e tests: CollectedGroup_1 - CollectedGroup_6
> Sample script:
> a = load '/user/pig/tests/data/singlefile/studenttab10k';
> b = order a by $0;

Ratandeep Ratti (JIRA) | 3 Mar 09:11 2015

[Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.


    [
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344711#comment-14344711
] 

Ratandeep Ratti commented on PIG-4417:
--------------------------------------

I think we should refrain from making the location of the ivy settings file configurable. It is better to always pick it
up from PIG_CONF_DIR/ivysettings.xml. That way, users will not have a backdoor for supplying their
own ivy settings.

Having a standard location will help ensure that resources and dependencies are downloaded only
from configured repositories. For instance, admins may want to configure the ivy settings so that no
resources are downloaded from external repos and only internal repositories are used.
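
A minimal sketch of the lookup proposed here, written in Python for illustration only. The function name `ivy_settings_path` is hypothetical and this is not how Pig implements it. The point is that the path is derived solely from PIG_CONF_DIR, so a user-supplied override such as the IVY_SETTINGS_LOCATION variable mentioned elsewhere in this thread is deliberately ignored.

```python
from pathlib import Path
from typing import Mapping


def ivy_settings_path(env: Mapping[str, str]) -> Path:
    """Return the single allowed ivy settings location.

    Only PIG_CONF_DIR is consulted; any other variable in `env`
    (for example IVY_SETTINGS_LOCATION) is ignored on purpose.
    """
    conf_dir = env.get("PIG_CONF_DIR")
    if not conf_dir:
        raise ValueError("PIG_CONF_DIR is not set")
    return Path(conf_dir) / "ivysettings.xml"


if __name__ == "__main__":
    env = {
        "PIG_CONF_DIR": "/etc/pig/conf",
        "IVY_SETTINGS_LOCATION": "/tmp/my-own-settings.xml",  # ignored
    }
    print(ivy_settings_path(env))
```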

> Pig's register command should support automatic fetching of jars from repo.
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4417
>                 URL: https://issues.apache.org/jira/browse/PIG-4417
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Akshay Rai
>
> Currently Pig's register command takes a local path to a dependency jar.
> This clutters the local file system, as users may forget to remove the jar later.
> It would be nice if Pig supported a Gradle-like notation to download the jar from a repository.
> Ex: At the top of the Pig script a user could add
> register '<group>:<module>:<version>'; 

Akshay Rai (JIRA) | 3 Mar 09:11 2015

[Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.


    [
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344709#comment-14344709
] 

Akshay Rai commented on PIG-4417:
---------------------------------

RB Link: https://reviews.apache.org/r/31662/

> Pig's register command should support automatic fetching of jars from repo.
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4417
>                 URL: https://issues.apache.org/jira/browse/PIG-4417
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Akshay Rai
>
> Currently Pig's register command takes a local path to a dependency jar.
> This clutters the local file system, as users may forget to remove the jar later.
> It would be nice if Pig supported a Gradle-like notation to download the jar from a repository.
> Ex: At the top of the Pig script a user could add
> register '<group>:<module>:<version>'; 
> It should be backward compatible and should support a local file path if so desired.
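
As a rough illustration of the backward-compatibility requirement, the sketch below classifies a `register` argument as either a `<group>:<module>:<version>` coordinate or a local path. It is written in Python for brevity and is not the patch under review. The names `Coordinate`, `LocalJar`, and `parse_register_arg`, and the rule that a coordinate contains no slashes or whitespace, are assumptions made for this example.

```python
import re
from typing import NamedTuple, Union


class Coordinate(NamedTuple):
    group: str
    module: str
    version: str


class LocalJar(NamedTuple):
    path: str


# Assumed rule: exactly three colon-separated parts, none containing
# a colon, a slash, a backslash, or whitespace.
_COORDINATE = re.compile(r"^[^:/\\\s]+:[^:/\\\s]+:[^:/\\\s]+$")


def parse_register_arg(arg: str) -> Union[Coordinate, LocalJar]:
    """Treat `arg` as a repository coordinate if it looks like one,
    otherwise fall back to the existing local-path behaviour."""
    if _COORDINATE.match(arg):
        group, module, version = arg.split(":")
        return Coordinate(group, module, version)
    return LocalJar(arg)


if __name__ == "__main__":
    print(parse_register_arg("org.apache.avro:avro:1.7.7"))
    print(parse_register_arg("./lib/java/testudf.jar"))
```

Anything that does not match the coordinate pattern is passed through untouched, which keeps existing scripts that register local jars working.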

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Akshay Rai (JIRA) | 3 Mar 08:06 2015

[Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.


    [
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344645#comment-14344645
] 

Akshay Rai commented on PIG-4417:
---------------------------------

To add to the above doc, 
4. The location of the ivysettings file can also be configured using IVY_SETTINGS_LOCATION.

> Pig's register command should support automatic fetching of jars from repo.
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4417
>                 URL: https://issues.apache.org/jira/browse/PIG-4417
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Akshay Rai
>
> Currently Pig's register command takes a local path to a dependency jar.
> This clutters the local file system, as users may forget to remove the jar later.
> It would be nice if Pig supported a Gradle-like notation to download the jar from a repository.
> Ex: At the top of the Pig script a user could add
> register '<group>:<module>:<version>'; 
> It should be backward compatible and should support a local file path if so desired.


jira | 3 Mar 08:00 2015

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (21 issues)

Subscriber: pigdaily

Key         Summary
PIG-4377    Skewed outer join produce wrong result in some cases
            https://issues.apache.org/jira/browse/PIG-4377
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4300    Enable unit test "TestSample" for spark
            https://issues.apache.org/jira/browse/PIG-4300
PIG-4287    Enable unit test "TestLimitVariable" for spark
            https://issues.apache.org/jira/browse/PIG-4287
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4193    Make collected group work with Spark
            https://issues.apache.org/jira/browse/PIG-4193
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952

Daniel Dai (JIRA) | 3 Mar 07:36 2015

[Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.


    [
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344619#comment-14344619
] 

Daniel Dai commented on PIG-4442:
---------------------------------

Sorry Prashant, accidentally being fast :)

> Eliminate redundant RPC call to get file information in HPath.
> --------------------------------------------------------------
>
>                 Key: PIG-4442
>                 URL: https://issues.apache.org/jira/browse/PIG-4442
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: PIG-4442.001.patch
>
>
> The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to get the block size and
replication.  In the case of HDFS, this results in 2 separate but identical RPC transactions with the
NameNode.  The situation is the same for many other alternative {{FileSystem}} implementations too.  We
can get a minor latency improvement and lighten some RPC load on the remote services by using a single call
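
The idea can be sketched with a stand-in file system that counts simulated RPCs. This is illustrative Python, not the Java patch: `FileStatus`, `CountingFileSystem`, and both `describe_*` functions are hypothetical names, and the only fact taken from the issue is that block size and replication both come from the same status lookup.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FileStatus:
    block_size: int
    replication: int


class CountingFileSystem:
    """Stand-in for a remote file system; each lookup counts as one RPC."""

    def __init__(self, statuses: Dict[str, FileStatus]) -> None:
        self._statuses = statuses
        self.rpc_calls = 0

    def get_file_status(self, path: str) -> FileStatus:
        self.rpc_calls += 1
        return self._statuses[path]


def describe_two_calls(fs: CountingFileSystem, path: str) -> Tuple[int, int]:
    """Pattern described in the issue: one lookup per attribute."""
    block_size = fs.get_file_status(path).block_size
    replication = fs.get_file_status(path).replication
    return block_size, replication


def describe_one_call(fs: CountingFileSystem, path: str) -> Tuple[int, int]:
    """Proposed pattern: fetch the status once and read both fields."""
    status = fs.get_file_status(path)
    return status.block_size, status.replication


if __name__ == "__main__":
    fs = CountingFileSystem({"/data/a": FileStatus(134217728, 3)})
    print(describe_one_call(fs, "/data/a"), "RPCs:", fs.rpc_calls)
```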

Praveen Rachabattuni (JIRA) | 3 Mar 05:08 2015

[Reopened] (PIG-4374) Add SparkPlan in spark package


     [
https://issues.apache.org/jira/browse/PIG-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen Rachabattuni reopened PIG-4374:
---------------------------------------

Reopening this JIRA for follow-up cleanup tasks.

> Add SparkPlan in spark package
> ------------------------------
>
>                 Key: PIG-4374
>                 URL: https://issues.apache.org/jira/browse/PIG-4374
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: How to change a physical plan to a spark plan.docx, PIG-4374_1.patch,
PIG-4374_2.patch.1, PIG-4374_3.patch.1, PIG-4374_4.patch,
PIG-4374_4.patch_Jenkins_result.png, PIG-4374_6.patch, PIG-4374_6_patch_jekins.png,
PIG-4374_7.patch, PIG-4374_8.patch, PIG-4374_8.patch.jenkins.png, Pig-spark #76 [Jenkins .png, jenkins_PIG-4374_1_patch.png.png
>
>
> In the current code, there are the following classes in the mapreduce and tez packages.
> mapreduce:
> MRCompiler

Prashant Kommireddi (JIRA) | 3 Mar 02:25 2015

[Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.


    [
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344249#comment-14344249
] 

Prashant Kommireddi commented on PIG-4442:
------------------------------------------

[~daijy] beat me to it :)

> Eliminate redundant RPC call to get file information in HPath.
> --------------------------------------------------------------
>
>                 Key: PIG-4442
>                 URL: https://issues.apache.org/jira/browse/PIG-4442
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: PIG-4442.001.patch
>
>
> The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to get the block size and
replication.  In the case of HDFS, this results in 2 separate but identical RPC transactions with the
NameNode.  The situation is the same for many other alternative {{FileSystem}} implementations too.  We
can get a minor latency improvement and lighten some RPC load on the remote services by using a single call

Prashant Kommireddi (JIRA) | 3 Mar 02:25 2015

[Commented] (PIG-4442) Eliminate redundant RPC call to get file information in HPath.


    [
https://issues.apache.org/jira/browse/PIG-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344248#comment-14344248
] 

Prashant Kommireddi commented on PIG-4442:
------------------------------------------

+1. Will commit this shortly.

> Eliminate redundant RPC call to get file information in HPath.
> --------------------------------------------------------------
>
>                 Key: PIG-4442
>                 URL: https://issues.apache.org/jira/browse/PIG-4442
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: PIG-4442.001.patch
>
>
> The {{HPath}} class makes 2 separate calls to {{FileSystem#getFileStatus}} to get the block size and
replication.  In the case of HDFS, this results in 2 separate but identical RPC transactions with the
NameNode.  The situation is the same for many other alternative {{FileSystem}} implementations too.  We
can get a minor latency improvement and lighten some RPC load on the remote services by using a single call

