jira | 29 Mar 08:01 2015

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (24 issues)

Subscriber: pigdaily

Key         Summary
PIG-4481    e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
            https://issues.apache.org/jira/browse/PIG-4481
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4452    Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error
            https://issues.apache.org/jira/browse/PIG-4452
PIG-4425    Upgrade to Spark 1.3
            https://issues.apache.org/jira/browse/PIG-4425
PIG-4422    Implement visitMergeJoin in SparkCompiler
            https://issues.apache.org/jira/browse/PIG-4422
PIG-4377    Skewed outer join produce wrong result in some cases
            https://issues.apache.org/jira/browse/PIG-4377
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4193    Make collected group work with Spark
            https://issues.apache.org/jira/browse/PIG-4193
PIG-4111    Make Pig compiles with avro-1.7.7

Hao Zhu (JIRA) | 29 Mar 01:40 2015

[Commented] (PIG-4485) Can Pig disable RandomSampleLoader when doing "Order by"


    [
https://issues.apache.org/jira/browse/PIG-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385583#comment-14385583
] 

Hao Zhu commented on PIG-4485:
------------------------------

BTW: I have confirmed this behavior on Pig 0.12 on CDH 5.3 and also Pig 0.13 on MapR 4.0.1.

> Can Pig disable RandomSampleLoader when doing "Order by"
> --------------------------------------------------------
>
>                 Key: PIG-4485
>                 URL: https://issues.apache.org/jira/browse/PIG-4485
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Hao Zhu
>            Priority: Critical
>
> When reading parquet files with "order by":
> {code}
> a = load '/xxx/xxx/parquet/xxx.parquet' using ParquetLoader();
> b = order a by col1 ;
> c = limit b 100 ;
> dump c
> {code}
> Pig always spawns a Sampler job at the beginning:
> {code}

Apache Jenkins Server | 28 Mar 14:41 2015

Jenkins build is back to normal : Pig-trunk-commit #2078

See <https://builds.apache.org/job/Pig-trunk-commit/2078/>

Praveen Rachabattuni (JIRA) | 28 Mar 09:58 2015

[Commented] (PIG-4425) Upgrade to Spark 1.3


    [
https://issues.apache.org/jira/browse/PIG-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385199#comment-14385199
] 

Praveen Rachabattuni commented on PIG-4425:
-------------------------------------------

+1. Committed to Spark branch. Thanks [~mohitsabharwal]

Will mark this resolved once we have the Jenkins report.

> Upgrade to Spark 1.3
> --------------------
>
>                 Key: PIG-4425
>                 URL: https://issues.apache.org/jira/browse/PIG-4425
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: PIG-4425.patch
>
>
> Needed for improvements like SPARK-4874. Several unit tests need number of records in output file.


jira | 28 Mar 08:01 2015

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (24 issues)

Subscriber: pigdaily

Key         Summary
PIG-4481    e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
            https://issues.apache.org/jira/browse/PIG-4481
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4452    Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error
            https://issues.apache.org/jira/browse/PIG-4452
PIG-4425    Upgrade to Spark 1.3
            https://issues.apache.org/jira/browse/PIG-4425
PIG-4422    Implement visitMergeJoin in SparkCompiler
            https://issues.apache.org/jira/browse/PIG-4422
PIG-4377    Skewed outer join produce wrong result in some cases
            https://issues.apache.org/jira/browse/PIG-4377
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4193    Make collected group work with Spark
            https://issues.apache.org/jira/browse/PIG-4193
PIG-4111    Make Pig compiles with avro-1.7.7

Hao Zhu (JIRA) | 28 Mar 00:15 2015

[Commented] (PIG-4485) Can Pig disable RandomSampleLoader when doing "Order by"


    [
https://issues.apache.org/jira/browse/PIG-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384875#comment-14384875
] 

Hao Zhu commented on PIG-4485:
------------------------------

BTW, even if we "set pig.random.sampler.sample.size 0" in the Pig grunt shell, it still launches the sampler job, with 0 rows sampled:

{code}
Input(s):
Successfully sampled 0 records from: "/user/xxx/xxx/parquet/xxx/xxx.parquet"
Successfully read 1111 records from: "/user/xxx/xxx/parquet/xxx/xxx.parquet"
{code}

So I do not understand why we cannot disable it.

> Can Pig disable RandomSampleLoader when doing "Order by"
> --------------------------------------------------------
>
>                 Key: PIG-4485
>                 URL: https://issues.apache.org/jira/browse/PIG-4485
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Hao Zhu
>            Priority: Critical
>

Mohit Sabharwal (JIRA) | 28 Mar 00:06 2015

[Commented] (PIG-4425) Upgrade to Spark 1.3


    [
https://issues.apache.org/jira/browse/PIG-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384863#comment-14384863
] 

Mohit Sabharwal commented on PIG-4425:
--------------------------------------

Upgraded the Spark dependency in Ivy and added implementations for the two new SparkListener methods in JobMetricsListener.

[~praveenr019], [~kellyzly], could you please review the patch? Thanks.

> Upgrade to Spark 1.3
> --------------------
>
>                 Key: PIG-4425
>                 URL: https://issues.apache.org/jira/browse/PIG-4425
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: PIG-4425.patch
>
>
> Needed for improvements like SPARK-4874. Several unit tests need number of records in output file.


Mohit Sabharwal (JIRA) | 28 Mar 00:03 2015

[Updated] (PIG-4425) Upgrade to Spark 1.3


     [
https://issues.apache.org/jira/browse/PIG-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohit Sabharwal updated PIG-4425:
---------------------------------
    Status: Patch Available  (was: Open)

> Upgrade to Spark 1.3
> --------------------
>
>                 Key: PIG-4425
>                 URL: https://issues.apache.org/jira/browse/PIG-4425
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: PIG-4425.patch
>
>
> Needed for improvements like SPARK-4874. Several unit tests need number of records in output file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Mohit Sabharwal (JIRA) | 28 Mar 00:03 2015

[Updated] (PIG-4425) Upgrade to Spark 1.3


     [
https://issues.apache.org/jira/browse/PIG-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohit Sabharwal updated PIG-4425:
---------------------------------
    Attachment: PIG-4425.patch

> Upgrade to Spark 1.3
> --------------------
>
>                 Key: PIG-4425
>                 URL: https://issues.apache.org/jira/browse/PIG-4425
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: PIG-4425.patch
>
>
> Needed for improvements like SPARK-4874. Several unit tests need number of records in output file.


Hao Zhu (JIRA) | 27 Mar 23:32 2015

[Created] (PIG-4485) Can Pig disable RandomSampleLoader when doing "Order by"

Hao Zhu created PIG-4485:
----------------------------

             Summary: Can Pig disable RandomSampleLoader when doing "Order by"
                 Key: PIG-4485
                 URL: https://issues.apache.org/jira/browse/PIG-4485
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.13.0
            Reporter: Hao Zhu
            Priority: Critical

When reading parquet files with "order by":
{code}
a = load '/xxx/xxx/parquet/xxx.parquet' using ParquetLoader();
b = order a by col1 ;
c = limit b 100 ;
dump c
{code}

Pig always spawns a Sampler job at the beginning:
{code}
Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1426804645147_1270	1	1	8	8	8	8	4	4	4	4	b	SAMPLER
job_1426804645147_1271	1	1	10	10	10	10	4	4	4	4	b	ORDER_BY,COMBINER
job_1426804645147_1272	1	1	2	2	2	2	4	4	4	4	b		hdfs:/tmp/temp-xxx/tmp-xxx,
{code}

The issue is that when reading many files, the first sampler job can take a long time to finish.
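
For context on why a SAMPLER job precedes ORDER_BY: a total order across several reducers is usually obtained by range-partitioning keys on cut points estimated from a sample of the input. The Python below is only a minimal sketch of that general idea, not Pig's actual implementation; the function names (pick_boundaries, partition_for, total_order_sort) and the bucket layout are hypothetical and chosen for illustration.

```python
import bisect
import random


def pick_boundaries(sample, num_partitions):
    """Return up to num_partitions - 1 cut points taken from the sorted sample."""
    ordered = sorted(sample)
    if not ordered or num_partitions <= 1:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(i * step)] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Index of the range partition (the hypothetical 'reducer') for this key."""
    return bisect.bisect_right(boundaries, key)


def total_order_sort(records, num_partitions, sample_size, seed=0):
    """Sample, derive cut points, range-partition, then sort each partition."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    boundaries = pick_boundaries(sample, num_partitions)
    buckets = [[] for _ in range(num_partitions)]
    for record in records:
        buckets[partition_for(record, boundaries)].append(record)
    # Concatenating the sorted buckets in order gives the globally sorted data.
    return [sorted(bucket) for bucket in buckets]
```

In this sketch an empty sample produces no cut points, so every record lands in the first bucket: the result is still sorted, but the work is no longer spread across partitions. Whether Pig behaves the same way when the sample size is set to 0 is not established here.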

Mohit Sabharwal (JIRA) | 27 Mar 21:04 2015

[Commented] (PIG-4422) Implement visitMergeJoin in SparkCompiler


    [
https://issues.apache.org/jira/browse/PIG-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384512#comment-14384512
] 

Mohit Sabharwal commented on PIG-4422:
--------------------------------------

Thanks, [~praveenr019]

+1 for the patch after addressing RB comments.

> Implement visitMergeJoin in SparkCompiler
> -----------------------------------------
>
>                 Key: PIG-4422
>                 URL: https://issues.apache.org/jira/browse/PIG-4422
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Praveen Rachabattuni
>             Fix For: spark-branch
>
>
> In PIG-4374_6.patch, SparkCompiler#visitMergeJoin is marked "TODO".
