Rohini Palaniswamy (JIRA) | 25 Nov 22:23 2014

[Commented] (PIG-4232) UDFContext is not initialized in executors when running on Spark cluster


    [
https://issues.apache.org/jira/browse/PIG-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225227#comment-14225227
] 

Rohini Palaniswamy commented on PIG-4232:
-----------------------------------------

Adding a getter and setter is not at all an issue. I am good with that. Will leave it to [~praveenr019] to
review the Spark related changes and commit the patch.

> UDFContext is not initialized in executors when running on Spark cluster
> ------------------------------------------------------------------------
>
>                 Key: PIG-4232
>                 URL: https://issues.apache.org/jira/browse/PIG-4232
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4232.patch, PIG-4232_1.patch, PIG-4232_2.patch, PIG-4232_3.patch,
PIG-4232_4.patch, pouserfunc.readObject.stacktrace.txt
>
>
> UDFContext is used in a lot of features across the Pig code base. For example, it's used in PigStorage to pass
column information between the frontend and the backend code. https://github.com/apache/pig/blob/spark/src/org/apache/pig/builtin/PigStorage.java#L246-L247

--
This message was sent by Atlassian JIRA

Cheolsoo Park (JIRA) | 25 Nov 21:23 2014

[Updated] (PIG-4343) Tez auto parallelism fails at query compile time


     [
https://issues.apache.org/jira/browse/PIG-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-4343:
-------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: PIG-3446

> Tez auto parallelism fails at query compile time
> ------------------------------------------------
>
>                 Key: PIG-4343
>                 URL: https://issues.apache.org/jira/browse/PIG-4343
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.14.0
>            Reporter: Cheolsoo Park
>
> I was running some legacy MR jobs in Tez mode to do perf benchmarks. But when
{{pig.tez.auto.parallelism}} is enabled (by default), Pig fails with the following error-
> {code}
> org.apache.pig.impl.plan.VisitorException: ERROR 0: java.io.IOException: Cannot estimate
parallelism for scope-892, effective parallelism for predecessor scope-892 is -1
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter.visitTezOp(ParallelismSetter.java:189)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:232)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:49)
>     at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:429)

Daniel Dai (JIRA) | 25 Nov 19:59 2014

[Commented] (PIG-2599) Mavenize Pig


    [
https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225006#comment-14225006
] 

Daniel Dai commented on PIG-2599:
---------------------------------

[~brocknoland], I don't think anyone is actively working on it. Are you interested in helping?

> Mavenize Pig
> ------------
>
>                 Key: PIG-2599
>                 URL: https://issues.apache.org/jira/browse/PIG-2599
>             Project: Pig
>          Issue Type: New Feature
>          Components: build
>            Reporter: Daniel Dai
>            Assignee: Vimuth Fernando
>              Labels: gsoc2014
>             Fix For: 0.15.0
>
>         Attachments: PIG-2599-wip.zip, maven-pig.1.zip, maven-wip.xml
>
>
> Switch Pig build system from ant to maven.
> This is a candidate project for Google summer of code 2014. More information about the program can be found
at https://cwiki.apache.org/confluence/display/PIG/GSoc2014


Michael Prim (JIRA) | 25 Nov 09:48 2014

[Commented] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records


    [
https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224242#comment-14224242
] 

Michael Prim commented on PIG-4326:
-----------------------------------

The supportForMapsOfArraysOfRecords.patch file is the updated one. Should I delete the first attempts
from the attached files?

> AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4326
>                 URL: https://issues.apache.org/jira/browse/PIG-4326
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Michael Prim
>            Assignee: Michael Prim
>             Fix For: 0.15.0
>
>         Attachments: PIG-4326-0.patch, mapsOfArraysOfRecords.patch, supportForMapsOfArraysOfRecords.patch
>
>
> I tried to convert the Avro schema of a map of arrays of records into the proper Pig schema and always got
empty map schemas in Pig.
> The reason is that the AvroStorageSchemaConversionUtilities does only assume records or primitive

liyunzhang_intel (JIRA) | 25 Nov 09:45 2014

[Commented] (PIG-4232) UDFContext is not initialized in executors when running on Spark cluster


    [
https://issues.apache.org/jira/browse/PIG-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224236#comment-14224236
] 

liyunzhang_intel commented on PIG-4232:
---------------------------------------

Hi [~rohini], can you help review PIG-4232_4.patch?  I have mailed the details of the problem to rohini.aditya@...

> UDFContext is not initialized in executors when running on Spark cluster
> ------------------------------------------------------------------------
>
>                 Key: PIG-4232
>                 URL: https://issues.apache.org/jira/browse/PIG-4232
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4232.patch, PIG-4232_1.patch, PIG-4232_2.patch, PIG-4232_3.patch,
PIG-4232_4.patch, pouserfunc.readObject.stacktrace.txt
>
>
> UDFContext is used in a lot of features across the Pig code base. For example, it's used in PigStorage to pass
column information between the frontend and the backend code. https://github.com/apache/pig/blob/spark/src/org/apache/pig/builtin/PigStorage.java#L246-L247


Zhang, Liyun | 25 Nov 09:41 2014

Can you help review PIG-4232_4.patch

Hi Rohini,
   We ran into a problem while fixing JIRA PIG-4232 (https://issues.apache.org/jira/browse/PIG-4232):
UDFContext is not initialized in executors when running on Spark cluster.
The patch PIG-4232_4.patch now contains some modifications outside the
org.apache.pig.backend.hadoop.executionengine.spark package.

  POUserFunc.java
        +  public void setFuncInputSchema(){
        +         setFuncInputSchema(signature);
        +   }

Why do we need to add setFuncInputSchema() to POUserFunc.java?
The no-argument setFuncInputSchema() is added so that setFuncInputSchema(String signature) can be called from Spark. In the
previous code, the call chain
POUserFunc#readObject->POUserFunc#instantiateFunc(FuncSpec)->POUserFunc#setFuncInputSchema(String)->
EvalFunc#setInputSchema reaches EvalFunc#setInputSchema only when the variable "tmpS" is not null. But in Spark mode tmpS is null, which
means UDFContext has not yet been initialized from the configuration file when POUserFunc#readObject is called.
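
The timing problem described above can be reproduced outside Pig with a small stand-alone sketch. Everything in it is an assumption made for illustration: DeferredSchemaDemo, CONTEXT and UserFunc are hypothetical stand-ins for UDFContext and POUserFunc, and the schema is a plain String rather than Pig's Schema class. Only the property key prefix and the two setFuncInputSchema overloads come from the mail.

```java
import java.util.Properties;

public class DeferredSchemaDemo {
    // Hypothetical stand-in for UDFContext: empty until the backend loads its configuration.
    public static final Properties CONTEXT = new Properties();

    // Hypothetical stand-in for POUserFunc: keeps its signature so the lookup can be retried.
    public static class UserFunc {
        public final String signature;
        public String inputSchema; // stays null if the context was empty at lookup time

        public UserFunc(String signature) {
            this.signature = signature;
            // Mirrors the lookup that happens during deserialization (readObject).
            setFuncInputSchema(signature);
        }

        // No-argument overload, as in the patch: retry the lookup with the stored signature.
        public void setFuncInputSchema() {
            setFuncInputSchema(signature);
        }

        public void setFuncInputSchema(String sig) {
            String tmpS = CONTEXT.getProperty("pig.evalfunc.inputschema." + sig);
            if (tmpS != null) {
                inputSchema = tmpS;
            }
        }
    }

    public static void main(String[] args) {
        UserFunc f = new UserFunc("sig1");   // context not initialized yet
        System.out.println(f.inputSchema);   // prints "null"

        CONTEXT.setProperty("pig.evalfunc.inputschema.sig1", "a:int");
        f.setFuncInputSchema();              // retry once the context is populated
        System.out.println(f.inputSchema);   // prints "a:int"
    }
}
```

The point of the no-argument overload is that a caller outside the class (here main, in the patch the Spark code) can trigger the lookup again without knowing the signature.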

POUserFunc#setFuncInputSchema

   public void setFuncInputSchema(String signature) {

        Properties props = UDFContext.getUDFContext().getUDFProperties(func.getClass());

        Schema tmpS=(Schema)props.get("pig.evalfunc.inputschema."+signature);

        if(tmpS!=null) {  // tmpS is null when POUserFunc#readObject->POUserFunc#instantiateFunc(FuncSpec)->POUserFunc#setFuncInputSchema(String) is called

jira | 25 Nov 08:00 2014

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (22 issues)

Subscriber: pigdaily

Key         Summary
PIG-4340    PigStorage fails parsing empty map.
            https://issues.apache.org/jira/browse/PIG-4340
PIG-4326    AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
            https://issues.apache.org/jira/browse/PIG-4326
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4264    Port TestAvroStorage to tez local mode
            https://issues.apache.org/jira/browse/PIG-4264
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4239    "pig.output.lazy" not works in spark mode
            https://issues.apache.org/jira/browse/PIG-4239
PIG-4207    Make python udfs work with Spark
            https://issues.apache.org/jira/browse/PIG-4207
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4103    Fix TestRegisteredJarVisibility(after PIG-4083)
            https://issues.apache.org/jira/browse/PIG-4103
PIG-4066    An optimization for ROLLUP operation in Pig
            https://issues.apache.org/jira/browse/PIG-4066
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002

Cheolsoo Park (JIRA) | 24 Nov 23:45 2014

[Commented] (PIG-4343) Tez auto parallelism fails at query compile time


    [
https://issues.apache.org/jira/browse/PIG-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223693#comment-14223693
] 

Cheolsoo Park commented on PIG-4343:
------------------------------------

My workaround is to disable {{pig.tez.auto.parallelism}}, but I think it would be nicer if it is
automatically disabled instead of failing when parallelism cannot be estimated.
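
A minimal sketch of the fallback suggested above, assuming the estimator signals failure with -1 as in the error message. ParallelismFallbackDemo and resolveParallelism are hypothetical names for illustration; this is not the code in Pig's ParallelismSetter.

```java
public class ParallelismFallbackDemo {
    // Sentinel mirroring the "-1" in the error message: parallelism could not be estimated.
    public static final int UNKNOWN = -1;

    // Hypothetical helper: use the estimate when one exists, otherwise fall back
    // to the statically configured parallelism instead of throwing.
    public static int resolveParallelism(int estimated, int configured) {
        if (estimated == UNKNOWN) {
            return configured;
        }
        return estimated;
    }

    public static void main(String[] args) {
        System.out.println(resolveParallelism(UNKNOWN, 10)); // prints 10
        System.out.println(resolveParallelism(4, 10));       // prints 4
    }
}
```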

> Tez auto parallelism fails at query compile time
> ------------------------------------------------
>
>                 Key: PIG-4343
>                 URL: https://issues.apache.org/jira/browse/PIG-4343
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Cheolsoo Park
>
> I was running some legacy MR jobs in Tez mode to do perf benchmarks. But when
{{pig.tez.auto.parallelism}} is enabled (by default), Pig fails with the following error-
> {code}
> org.apache.pig.impl.plan.VisitorException: ERROR 0: java.io.IOException: Cannot estimate
parallelism for scope-892, effective parallelism for predecessor scope-892 is -1
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter.visitTezOp(ParallelismSetter.java:189)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:232)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:49)
>     at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)

Cheolsoo Park (JIRA) | 24 Nov 23:43 2014

[Created] (PIG-4343) Tez auto parallelism fails at query compile time

Cheolsoo Park created PIG-4343:
----------------------------------

             Summary: Tez auto parallelism fails at query compile time
                 Key: PIG-4343
                 URL: https://issues.apache.org/jira/browse/PIG-4343
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Cheolsoo Park

I was running some legacy MR jobs in Tez mode to do perf benchmarks. But when {{pig.tez.auto.parallelism}}
is enabled (by default), Pig fails with the following error-
{code}
org.apache.pig.impl.plan.VisitorException: ERROR 0: java.io.IOException: Cannot estimate
parallelism for scope-892, effective parallelism for predecessor scope-892 is -1
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter.visitTezOp(ParallelismSetter.java:189)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:232)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:49)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:429)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:301)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
    at org.apache.pig.LipstickPigServer.launchPlan(LipstickPigServer.java:151)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
    at org.apache.pig.PigServer.execute(PigServer.java:1364)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)

Daniel Dai (JIRA) | 24 Nov 23:18 2014

[Updated] (PIG-4341) Add CMX support to pig.tmpfilecompression.codec


     [
https://issues.apache.org/jira/browse/PIG-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4341:
----------------------------
    Fix Version/s:     (was: site)
                   0.15.0

> Add CMX support to pig.tmpfilecompression.codec
> -----------------------------------------------
>
>                 Key: PIG-4341
>                 URL: https://issues.apache.org/jira/browse/PIG-4341
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.13.0
>            Reporter: fang fang chen
>             Fix For: 0.15.0
>
>
> Pig supports compression codecs (GZ, GZIP, LZO), but the latest Pig does not support the CMX codec yet. cmx is
"com.ibm.biginsights.compress.CmxCodec". This information can also be found in the latest release
pig-0.13.0 documentation: http://pig.apache.org/docs/r0.13.0/perf.html.
> Besides, I tested the CMX codec with pig-0.13.0. These were the settings:
> SET pig.tmpfilecompression true;
> SET pig.tmpfilecompression.codec cmx;
> Error:
> Caused by: java.io.IOException: Invalid temporary file compression codec [cmx]. Expected

Daniel Dai (JIRA) | 24 Nov 23:16 2014

[Updated] (PIG-4342) Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT


     [
https://issues.apache.org/jira/browse/PIG-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4342:
----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to both 0.14 branch and trunk. Thanks Rohini!

> Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT
> -------------------------------------------------------------
>
>                 Key: PIG-4342
>                 URL: https://issues.apache.org/jira/browse/PIG-4342
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>             Fix For: 0.14.1
>
>         Attachments: PIG-4342-1.patch
>
>
> One of the users encountered
> {code}
> Unexpected internal error. Pig Internal Error. Invalid preprocessor command
> specified : %DECLARE

