jira | 23 Oct 08:00 2014

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (21 issues)

Subscriber: pigdaily

Key         Summary
PIG-4241    Auto local mode mistakenly converts large jobs to local mode when using with Hive tables
            https://issues.apache.org/jira/browse/PIG-4241
PIG-4239    "pig.output.lazy" not works in spark mode
            https://issues.apache.org/jira/browse/PIG-4239
PIG-4224    Upload Tez payload history string to timeline server
            https://issues.apache.org/jira/browse/PIG-4224
PIG-4160    -forcelocaljars / -j flag when using a remote url for a script
            https://issues.apache.org/jira/browse/PIG-4160
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4103    Fix TestRegisteredJarVisibility(after PIG-4083)
            https://issues.apache.org/jira/browse/PIG-4103
PIG-4084    Port TestPigRunner to Tez
            https://issues.apache.org/jira/browse/PIG-4084
PIG-4066    An optimization for ROLLUP operation in Pig
            https://issues.apache.org/jira/browse/PIG-4066
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911

Apache Jenkins Server | 23 Oct 00:24 2014

Build failed in Jenkins: Pig-trunk #1686

See <https://builds.apache.org/job/Pig-trunk/1686/changes>

Changes:

[rohini] PIG-4039: New interface for resetting static variables for jvm reuse (rohini)

------------------------------------------
[...truncated 132 lines...]
    [mkdir] Created dir: <https://builds.apache.org/job/Pig-trunk/ws/build/classes>
    [mkdir] Created dir: <https://builds.apache.org/job/Pig-trunk/ws/build/test/classes>
    [mkdir] Created dir: <https://builds.apache.org/job/Pig-trunk/ws/test/org/apache/pig/test/utils/dotGraph/parser>
    [mkdir] Created dir: <https://builds.apache.org/job/Pig-trunk/ws/src-gen/org/apache/pig/data/parser>
     [move] Moving 1 file to <https://builds.apache.org/job/Pig-trunk/ws/build/ivy/lib/Pig>

cc-compile:
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file
<https://builds.apache.org/job/Pig-trunk/ws/src/org/apache/pig/tools/pigscript/parser/PigScriptParser.jj>
. . .
   [javacc] File "TokenMgrError.java" does not exist.  Will create one.
   [javacc] File "ParseException.java" does not exist.  Will create one.
   [javacc] File "Token.java" does not exist.  Will create one.
   [javacc] File "JavaCharStream.java" does not exist.  Will create one.
   [javacc] Parser generated successfully.
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file
<https://builds.apache.org/job/Pig-trunk/ws/src/org/apache/pig/tools/parameters/PigFileParser.jj>
. . .

Rohini Palaniswamy (JIRA) | 22 Oct 23:18 2014

[Created] (PIG-4246) HBaseStorage should implement getShipFiles

Rohini Palaniswamy created PIG-4246:
---------------------------------------

             Summary: HBaseStorage should implement getShipFiles
                 Key: PIG-4246
                 URL: https://issues.apache.org/jira/browse/PIG-4246
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy

HBaseStorage.initializeHBaseClassLoaderResources() uses TableMapReduceUtil APIs to add
dependency jars. That sets the tmpjars setting, which makes JobClient ship the jars to HDFS and use that
path in the distributed cache. This bypasses the optimizations in PIG-2672 and PIG-3861, which avoid
shipping the jars to HDFS. Instead, it should implement the getShipFiles() API introduced in PIG-4141 so
that PIG-2672 or PIG-3861 can avoid shipping the same jar to HDFS multiple times for a job.
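
The idea of a storage function reporting its own dependency jars, rather than setting tmpjars itself, can be sketched roughly as follows. This is a minimal standalone illustration and not the HBaseStorage patch: the class ShipFilesSketch and its static getShipFiles(Class...) helper are hypothetical stand-ins, and the only assumption borrowed from the issue is that the framework asks for a list of file paths and decides itself how to ship them.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ShipFilesSketch {

    /**
     * Hypothetical helper: returns the locations (jar or class directory) that
     * contain the given classes, without duplicates and in first-seen order.
     * The caller, not this method, decides how the files are shipped.
     */
    static List<String> getShipFiles(Class<?>... dependencyClasses) {
        Set<String> paths = new LinkedHashSet<>();
        for (Class<?> c : dependencyClasses) {
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null) {
                continue; // JDK bootstrap classes have no code source to ship
            }
            URL location = source.getLocation();
            if (location != null) {
                paths.add(location.getPath());
            }
        }
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        // The same class listed twice yields a single entry; String is skipped.
        System.out.println(getShipFiles(ShipFilesSketch.class, ShipFilesSketch.class, String.class));
    }
}
```

Because the method only returns paths, a caller that already tracks shipped jars can skip the ones it has seen, which is the behaviour the issue wants to preserve.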

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Rohini Palaniswamy (JIRA) | 22 Oct 19:43 2014

[Updated] (PIG-4039) New interface for resetting static variables for jvm reuse


     [
https://issues.apache.org/jira/browse/PIG-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4039:
------------------------------------
    Summary: New interface for resetting static variables for jvm reuse  (was: New interface for resetting
static variables)

> New interface for resetting static variables for jvm reuse
> ----------------------------------------------------------
>
>                 Key: PIG-4039
>                 URL: https://issues.apache.org/jira/browse/PIG-4039
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>         Attachments: PIG-4039-1.patch, PIG-4039-2.patch
>
>
> In Tez, when there is container reuse, static variables, thread locals, etc. have to be reinitialized to
avoid memory leaks or wrong values. For the short term, we ended up making some of the static
variables public or adding a destroy method to each class, which is hacky. Also, it does not help users who will
want something similar to be done in their UDFs or LoadFunc. We need to define an interface with a
reset/destroy method, find all loaded classes implementing that interface, and call destroy on them in
PigProcessor.close(). ServiceLoader and annotations are some of the ways to find classes implementing
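
A reset interface of the kind described in the issue might look roughly like the sketch below. All names here (Resettable, CounterHolder, REGISTERED, resetAll) are hypothetical and are not taken from the PIG-4039 patches. The explicit registry is only a stand-in for whatever discovery mechanism (ServiceLoader, annotations) is actually used.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetSketch {

    /** Hypothetical interface for classes that hold state across task runs. */
    interface Resettable {
        void reset();
    }

    /** Example class with static state that must not leak into a reused JVM. */
    static class CounterHolder implements Resettable {
        static int counter = 0;

        @Override
        public void reset() {
            counter = 0;
        }
    }

    /** Hypothetical registry standing in for class discovery. */
    static final List<Resettable> REGISTERED = new ArrayList<>();

    /** What a processor's close() hook could do before the container is reused. */
    static void resetAll() {
        for (Resettable r : REGISTERED) {
            r.reset();
        }
    }

    public static void main(String[] args) {
        REGISTERED.add(new CounterHolder());
        CounterHolder.counter = 42; // state left behind by a previous task
        resetAll();
        System.out.println(CounterHolder.counter); // prints 0
    }
}
```

With a single interface, user UDFs and LoadFuncs could opt in by implementing it, instead of each class exposing its own public statics or destroy method.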

Rohini Palaniswamy (JIRA) | 22 Oct 19:43 2014

[Updated] (PIG-4039) New interface for resetting static variables


     [
https://issues.apache.org/jira/browse/PIG-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4039:
------------------------------------
    Attachment: PIG-4039-2.patch

Thanks Daniel. Checked into branch-0.14 and trunk. Had to remove an @Override that was added to
getProgress() in PigStatusReporter as that failed compilation in Hadoop 1.x. PIG-4039-2.patch is the
final patch checked in.

> New interface for resetting static variables
> --------------------------------------------
>
>                 Key: PIG-4039
>                 URL: https://issues.apache.org/jira/browse/PIG-4039
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>         Attachments: PIG-4039-1.patch, PIG-4039-2.patch
>
>
> In Tez, when there is container reuse, static variables, thread locals, etc. have to be reinitialized to
avoid memory leaks or wrong values. For the short term, we ended up making some of the static
variables public or adding a destroy method to each class, which is hacky. Also, it does not help users who will

Rohini Palaniswamy (JIRA) | 22 Oct 19:43 2014

[Updated] (PIG-4039) New interface for resetting static variables for jvm reuse


     [
https://issues.apache.org/jira/browse/PIG-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4039:
------------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

> New interface for resetting static variables for jvm reuse
> ----------------------------------------------------------
>
>                 Key: PIG-4039
>                 URL: https://issues.apache.org/jira/browse/PIG-4039
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>         Attachments: PIG-4039-1.patch, PIG-4039-2.patch
>
>
> In Tez, when there is container reuse, static variables, thread locals, etc. have to be reinitialized to
avoid memory leaks or wrong values. For the short term, we ended up making some of the static
variables public or adding a destroy method to each class, which is hacky. Also, it does not help users who will
want something similar to be done in their UDFs or LoadFunc. We need to define an interface with a
reset/destroy method, find all loaded classes implementing that interface, and call destroy on them in

Rohini Palaniswamy (JIRA) | 22 Oct 18:37 2014

[Created] (PIG-4245) 1-1 edge vertices should use same jvm opts

Rohini Palaniswamy created PIG-4245:
---------------------------------------

             Summary: 1-1 edge vertices should use same jvm opts
                 Key: PIG-4245
                 URL: https://issues.apache.org/jira/browse/PIG-4245
             Project: Pig
          Issue Type: Sub-task
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.14.0

We use map jvm opts for root vertices and reduce jvm opts for intermediate and leaf vertices. For a simple
order by or skewed join, the Sampler will be a root vertex and the Partitioner will be an intermediate vertex. If
the map opts and reduce opts are different, containers will not be reused and a lot of resources will be
wasted when the reduce opts specify higher memory than the map opts, which is usually the case. The partitioner
does not do much and just does identity mapping, so we should be fine using map opts there.
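
The selection rule proposed above can be sketched as a small function. This is only an illustration under stated assumptions: the method chooseJvmOpts, its parameters, and the example option strings are hypothetical, and the sketch assumes a vertex connected to its predecessor by a 1-1 edge simply inherits that predecessor's opts.

```java
public class VertexOptsSketch {

    /**
     * Hypothetical rule: root vertices get map opts; a vertex fed by a 1-1 edge
     * reuses its predecessor's opts so the container can be reused; every other
     * vertex gets reduce opts.
     *
     * @param oneToOnePredecessorOpts opts of the 1-1 predecessor, or null if none
     */
    static String chooseJvmOpts(boolean isRoot, String oneToOnePredecessorOpts,
                                String mapOpts, String reduceOpts) {
        if (isRoot) {
            return mapOpts;
        }
        if (oneToOnePredecessorOpts != null) {
            return oneToOnePredecessorOpts;
        }
        return reduceOpts;
    }

    public static void main(String[] args) {
        String mapOpts = "-Xmx1024m";
        String reduceOpts = "-Xmx2048m";

        String sampler = chooseJvmOpts(true, null, mapOpts, reduceOpts);
        String partitioner = chooseJvmOpts(false, sampler, mapOpts, reduceOpts);
        String reducer = chooseJvmOpts(false, null, mapOpts, reduceOpts);

        // prints -Xmx1024m -Xmx1024m -Xmx2048m
        System.out.println(sampler + " " + partitioner + " " + reducer);
    }
}
```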

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Eyal Allweil (JIRA) | 22 Oct 09:54 2014

[Commented] (PIG-2836) Namespace in Pig macros collides with Pig scripts


    [
https://issues.apache.org/jira/browse/PIG-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179692#comment-14179692
] 

Eyal Allweil commented on PIG-2836:
-----------------------------------

From which version is this fixed? I'm using 0.11.1 and have a script in which this occurs. Was the macro
relation name suffixing introduced in Pig 0.12?

> Namespace in Pig macros collides with Pig scripts
> -------------------------------------------------
>
>                 Key: PIG-2836
>                 URL: https://issues.apache.org/jira/browse/PIG-2836
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt, parser
>    Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>            Reporter: Russell Jurney
>            Assignee: Alan Gates
>            Priority: Critical
>              Labels: bacon, confit, goto, hash, macros, pig, sad
>
> Relation names in macros collide with relation names in the calling pig script. This is my most common
source of errors and it makes writing macros hard. Suggest that the macro processor create a unique
namespace for all relations in a macro other than $in and $out. Prepend something to each relation name or
somehow create a unique per-macro namespace.
> This may conflict with some uses of macros where relation names are passed through passively, but this is

jira | 22 Oct 08:00 2014

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (22 issues)

Subscriber: pigdaily

Key         Summary
PIG-4241    Auto local mode mistakenly converts large jobs to local mode when using with Hive tables
            https://issues.apache.org/jira/browse/PIG-4241
PIG-4239    "pig.output.lazy" not works in spark mode
            https://issues.apache.org/jira/browse/PIG-4239
PIG-4224    Upload Tez payload history string to timeline server
            https://issues.apache.org/jira/browse/PIG-4224
PIG-4160    -forcelocaljars / -j flag when using a remote url for a script
            https://issues.apache.org/jira/browse/PIG-4160
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4103    Fix TestRegisteredJarVisibility(after PIG-4083)
            https://issues.apache.org/jira/browse/PIG-4103
PIG-4084    Port TestPigRunner to Tez
            https://issues.apache.org/jira/browse/PIG-4084
PIG-4066    An optimization for ROLLUP operation in Pig
            https://issues.apache.org/jira/browse/PIG-4066
PIG-4039    New interface for resetting static variables
            https://issues.apache.org/jira/browse/PIG-4039
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952

Daniel Dai (JIRA) | 22 Oct 00:08 2014

[Commented] (PIG-4039) New interface for resetting static variables


    [
https://issues.apache.org/jira/browse/PIG-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179210#comment-14179210
] 

Daniel Dai commented on PIG-4039:
---------------------------------

+1

> New interface for resetting static variables
> --------------------------------------------
>
>                 Key: PIG-4039
>                 URL: https://issues.apache.org/jira/browse/PIG-4039
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>         Attachments: PIG-4039-1.patch
>
>
> In Tez, when there is container reuse, static variables, thread locals, etc. have to be reinitialized to
avoid memory leaks or wrong values. For the short term, we ended up making some of the static
variables public or adding a destroy method to each class, which is hacky. Also, it does not help users who will
want something similar to be done in their UDFs or LoadFunc. We need to define an interface with a
reset/destroy method, find all loaded classes implementing that interface, and call destroy on them in

Rohini Palaniswamy (JIRA) | 22 Oct 00:07 2014

[Updated] (PIG-3861) duplicate jars get added to distributed cache


     [
https://issues.apache.org/jira/browse/PIG-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3861:
------------------------------------
    Attachment: PIG-3861-5.patch

Rebased the patch. Removed the List to Set conversion as Daniel said order needs to be maintained.
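
For readers wondering why a plain List-to-Set conversion is a problem when order matters, here is a small standalone sketch. It is not the code in the patch, and the class and method names are hypothetical. It shows one common way to drop duplicate entries while keeping first-seen order, using LinkedHashSet, whereas a HashSet would give no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class JarDedupSketch {

    /** Removes duplicate entries while keeping the order of first occurrence. */
    static List<String> dedupKeepingOrder(List<String> jars) {
        return new ArrayList<>(new LinkedHashSet<>(jars));
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("b.jar", "a.jar", "b.jar", "c.jar", "a.jar");
        System.out.println(dedupKeepingOrder(jars)); // prints [b.jar, a.jar, c.jar]
    }
}
```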

> duplicate jars get added to distributed cache
> ---------------------------------------------
>
>                 Key: PIG-3861
>                 URL: https://issues.apache.org/jira/browse/PIG-3861
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: PIG-3681-1.patch, PIG-3861-2.patch, PIG-3861-3.patch, PIG-3861-4.patch, PIG-3861-5.patch
>
>
> PigContext's scriptJars should handle de-duplication of jars to account for script engines, e.g.
JythonScriptEngine, performing various jar loading for modules and sometimes adding the same jar twice.
Also, JobControlCompiler.shipToHdfs() needs a check against adding the same jar more than once, under
different randomly incremented sub-dirs.
