Daniel Dai (JIRA) | 25 Apr 03:31 2014

[Commented] (PIG-3672) Pig should not check for hardcoded file system implementations


    [
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980606#comment-13980606
] 

Daniel Dai commented on PIG-3672:
---------------------------------

+1 for PIG-3672-3.patch

> Pig should not check for hardcoded file system implementations
> --------------------------------------------------------------
>
>                 Key: PIG-3672
>                 URL: https://issues.apache.org/jira/browse/PIG-3672
>             Project: Pig
>          Issue Type: Bug
>          Components: data, parser
>    Affects Versions: 0.10.0, 0.12.0, 0.11.1
>            Reporter: Suhas Satish
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.13.0
>
>         Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, PIG-3672.patch
>
>
> QueryParserUtils.java has the code - 
> result.add("hdfs://"+thisHost+":"+uri.getPort());
> I propose to make it generic like - 
> result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
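
The proposed change can be sketched in isolation as follows. This is only an illustration, assuming a hypothetical helper named hostEntry; it is not the actual QueryParserUtils code, and the webhdfs URI is just an example of a non-HDFS scheme.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from QueryParserUtils.java.
class SchemeAwareHosts {
    // Builds "scheme://host:port" from the URI's own scheme instead of a hardcoded "hdfs".
    // Note: URI.getPort() returns -1 when the URI has no explicit port.
    static String hostEntry(URI uri, String thisHost) {
        return uri.getScheme() + "://" + thisHost + ":" + uri.getPort();
    }

    public static void main(String[] args) {
        List<String> result = new ArrayList<>();
        result.add(hostEntry(URI.create("hdfs://namenode:8020/user/data"), "namenode"));
        result.add(hostEntry(URI.create("webhdfs://gateway:50070/user/data"), "gateway"));
        System.out.println(result);
    }
}
```

With the scheme taken from the URI, the same code path serves any file system implementation rather than HDFS only.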

jira | 25 Apr 03:03 2014

Subscription: PIG patch available

Issue Subscription
Filter: PIG patch available (15 issues)

Subscriber: pigdaily

Key         Summary
PIG-3901    Organize the Pig properties file and document all properties
            https://issues.apache.org/jira/browse/PIG-3901
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3874    FileLocalizer temp path can sometimes be non-unique
            https://issues.apache.org/jira/browse/PIG-3874
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3867    Added hadoop home to build classpath for build pig with unit test on windows
            https://issues.apache.org/jira/browse/PIG-3867
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3861    duplicate jars get added to distributed cache
            https://issues.apache.org/jira/browse/PIG-3861
PIG-3825    Stats collection needs to be changed for hadoop2 (with auto local mode)
            https://issues.apache.org/jira/browse/PIG-3825
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3672    Pig should not check for hardcoded file system implementations
            https://issues.apache.org/jira/browse/PIG-3672
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635

Mona Chitnis (JIRA) | 25 Apr 02:31 2014

[Updated] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories


     [
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated PIG-3891:
------------------------------

    Attachment: PIG-3891-1.patch

Attaching patch. End-to-end test done - using MultiStorage, split and both.
Example output:
{code}
Output(s):
Successfully stored 4 records (94 bytes) in: "hdfs://localhost:8020/user/chitnis/split1"
Successfully stored 2 records (47 bytes) in: "hdfs://localhost:8020/user/chitnis/split2"
Successfully stored 3 records (60 bytes) in: "hdfs://localhost:8020/user/chitnis/split3"
{code}

Unit tests are included in the patch, but they may still need minor tweaks.

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mona Chitnis

Mona Chitnis (JIRA) | 25 Apr 02:31 2014

[Updated] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories


     [
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated PIG-3891:
------------------------------

    Attachment: PIG-3891-branch-0.11.patch

Attaching fix for class JobStats.java (pig 0.11)

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mona Chitnis
>         Attachments: PIG-3891-1.patch, PIG-3891-branch-0.11.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output directory, so if files are stored under subdirectories (e.g. with MultiStorage), it does not report the bytes written correctly.
> 0.11 shows the correct number of total bytes written, so this is a regression. A quick look at the code shows that JobStats.addOneOutputStats() in 0.11 also does not iterate recursively and that its code is the same as FileBasedOutputSizeReader. Need to investigate where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
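
The difference between counting only top-level files and counting recursively can be sketched as below. This is a minimal illustration using java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API; the class and method names are hypothetical and nothing here is taken from the attached patches.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration; not the FileBasedOutputSizeReader or JobStats code.
class OutputSizeSketch {
    // Size of one file, with the checked exception wrapped for use in streams.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts only regular files directly inside dir (the behaviour the issue describes).
    static long topLevelBytes(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).mapToLong(OutputSizeSketch::sizeOf).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts regular files at any depth below dir.
    static long recursiveBytes(Path dir) {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile).mapToLong(OutputSizeSketch::sizeOf).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree: 10 bytes at the top level, 20 bytes in a subdirectory.
    static Path makeSampleTree() {
        try {
            Path root = Files.createTempDirectory("output-size-sketch");
            Files.write(root.resolve("part-0"), new byte[10]);
            Path sub = Files.createDirectories(root.resolve("key1"));
            Files.write(sub.resolve("part-0"), new byte[20]);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = makeSampleTree();
        System.out.println("top level only: " + topLevelBytes(root));
        System.out.println("recursive:      " + recursiveBytes(root));
    }
}
```

On the sample tree the top-level count misses the 20 bytes written under the subdirectory, which mirrors the under-reporting described for outputs that use subdirectories.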


Daniel Dai (JIRA) | 25 Apr 02:09 2014

[Commented] (PIG-3904) Pig support windows i18n


    [
https://issues.apache.org/jira/browse/PIG-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980526#comment-13980526
] 

Daniel Dai commented on PIG-3904:
---------------------------------

The patch does not apply to trunk. There have been quite a few changes, and we are now using Reader instead of
InputStream. Can you check whether trunk works for you?
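
As general background on why a Reader matters for non-ASCII paths, the sketch below decodes the same bytes with two different charsets. It is only an illustration with hypothetical names; it does not reproduce the Pig code in trunk, and whether trunk actually resolves this issue is exactly what the comment asks the reporter to check.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from the Pig code base.
class ReaderCharsetDemo {
    // The path from the issue, written with Unicode escapes so the source file stays ASCII.
    static final String PATH = "/tmp/\u6d4b\u8bd5/pwInput";

    // Reads the first line of the stream, decoding bytes with the given charset.
    static String readLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes PATH as UTF-8 bytes and decodes it again with the given charset.
    static String roundTrip(Charset decodeWith) {
        byte[] utf8 = PATH.getBytes(StandardCharsets.UTF_8);
        return readLine(new ByteArrayInputStream(utf8), decodeWith);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 round trip intact:      " + roundTrip(StandardCharsets.UTF_8).equals(PATH));
        System.out.println("ISO-8859-1 round trip intact: " + roundTrip(StandardCharsets.ISO_8859_1).equals(PATH));
    }
}
```

A Reader built with an explicit charset decodes the path the same way on every locale, whereas relying on the platform default can corrupt it on systems whose default is not UTF-8.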

> Pig support windows i18n
> ------------------------
>
>                 Key: PIG-3904
>                 URL: https://issues.apache.org/jira/browse/PIG-3904
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.2, 0.9.3
>         Environment: Windows 7(de_DE/fr_FR/zh_CN)
>            Reporter: Lizhao.Du
>             Fix For: 0.9.3
>
>         Attachments: PIG-3904.patch
>
>
> Running a Pig script with Pig on Windows (de_DE) fails.
> The error message is "Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput", but /tmp/测试/pwInput does exist.

Daniel Dai (JIRA) | 25 Apr 02:01 2014

[Updated] (PIG-3865) Remodel the XMLLoader to work to be faster and more maintainable


     [
https://issues.apache.org/jira/browse/PIG-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3865:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.13.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

+1, except that I removed the @author tag, since it is not allowed in Apache.

Patch committed to trunk, thanks Ahmed!

> Remodel the XMLLoader to work to be faster and more maintainable
> ----------------------------------------------------------------
>
>                 Key: PIG-3865
>                 URL: https://issues.apache.org/jira/browse/PIG-3865
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Ahmed Eldawy
>            Assignee: Ahmed Eldawy
>            Priority: Minor
>             Fix For: 0.13.0
>
>         Attachments: PIG-3865-2.patch, PIG-3865-test.txt, XMLLoader.java, bad-file.xml.bz2, test-file-2.xml.bz2

Lorand Bendig (JIRA) | 25 Apr 01:59 2014

[Created] (PIG-3911) Define unique fields with @OutputSchema

Lorand Bendig created PIG-3911:
----------------------------------

             Summary: Define unique fields with @OutputSchema
                 Key: PIG-3911
                 URL: https://issues.apache.org/jira/browse/PIG-3911
             Project: Pig
          Issue Type: Improvement
            Reporter: Lorand Bendig
            Assignee: Lorand Bendig

As a continuation of PIG-2361, I think that {{@OutputSchema}} could be extended in order to eliminate the
repeating patterns of {{EvalFunc#outputSchema()}} found in most UDFs. 
I'd come up with the following syntax:

Complex schema definition:
{code}
@OutputSchema("y:bag{t:tuple(len:int,word:chararray,${0}:int)},${1}:chararray,${2}:bytearray")
@SchemaFields({
  @Unique(name="word"),
  @Unique(name="${0}"),
  @Unique(name="${1}", prefix="id"),
  @Unique(name="${2}", prefix="item", postfix="id")}
)
public class MyUDF {...}
{code}
Rewrite rules:
{code}
word => "word" + "_" + nextSchemaId
${0} => this.getClass.getName().toLower() + "_" + nextSchemaId

Daniel Dai (JIRA) | 25 Apr 01:31 2014

[Commented] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.


    [
https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980488#comment-13980488
] 

Daniel Dai commented on PIG-3880:
---------------------------------

Not sure what happened, but here is something that might be related. The Pig fat jar contains commons-io.jar, but
pig-withouthadoop.jar does not. However, when you run bin/pig, Pig invokes "hadoop jar
pig-withouthadoop.jar", and hadoop provides the commons-io classes at runtime.

> After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-3880
>                 URL: https://issues.apache.org/jira/browse/PIG-3880
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.13.0
>            Reporter: David Medinets
>
> I pulled trunk from subversion using the following commands:
> mkdir pig
> cd pig
> svn co http://svn.apache.org/repos/asf/pig/trunk
> cd trunk
> ant
> export PATH=$PATH:$HOME/pig/trunk/bin

Daniel Dai (JIRA) | 25 Apr 01:25 2014

[Commented] (PIG-3910) Cannot use DUMP command in macro


    [
https://issues.apache.org/jira/browse/PIG-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980474#comment-13980474
] 

Daniel Dai commented on PIG-3910:
---------------------------------

DUMP is handled by the Grunt parser, so it cannot be used inside a macro. This is part of PIG-2597, which [~kyunghoj] is going to
work on for GSoC 2014.

> Cannot use DUMP command in macro
> --------------------------------
>
>                 Key: PIG-3910
>                 URL: https://issues.apache.org/jira/browse/PIG-3910
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>            Reporter: Ory
>
> When trying to use a 'DUMP' command inside Macros, I get a syntax error:
> ERROR 1200: <file /tmp/pig-plugin/Ory/1398254029560/ORY_macros.pig, line 27> Failed to parse macro 'num_and_dump'. Reason: <file /tmp/pig-plugin/Ory/1398254029560/ORY_macros.pig, line 3, column 0>  Syntax error, unexpected symbol at or near 'DUMP'
> I have found one reference on the web to this being due to issues between the PigLatin parser and the Grunt parser, but that was 2 years ago and I can find no further information on this.


Daniel Dai (JIRA) | 25 Apr 00:49 2014

[Updated] (PIG-3737) Bundle dependent jars in distribution in %PIG_HOME%/lib folder


     [
https://issues.apache.org/jira/browse/PIG-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3737:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.13.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Cheolsoo for review!

> Bundle dependent jars in distribution in %PIG_HOME%/lib folder
> --------------------------------------------------------------
>
>                 Key: PIG-3737
>                 URL: https://issues.apache.org/jira/browse/PIG-3737
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Shuaishuai Nie
>            Assignee: Daniel Dai
>             Fix For: 0.13.0
>
>         Attachments: PIG-3737-2.patch, PIG-3737.1.patch
>
>
> Pig should bundle with dependencies like avro.jar and json-simple.jar


Daniel Dai (JIRA) | 25 Apr 00:47 2014

[Resolved] (PIG-3905) 0.12.1 release can't be build for Hadoop2


     [
https://issues.apache.org/jira/browse/PIG-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-3905.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.13.0
         Assignee: Daniel Dai
     Hadoop Flags: Reviewed

Compilation fails only when a local Maven repository is used. Committing to trunk first.

Also the fix speeds up the compilation. 

> 0.12.1 release can't be build for Hadoop2
> -----------------------------------------
>
>                 Key: PIG-3905
>                 URL: https://issues.apache.org/jira/browse/PIG-3905
>             Project: Pig
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.12.1
>            Reporter: Konstantin Boudnik
>            Assignee: Daniel Dai
>            Priority: Blocker
>             Fix For: 0.13.0
>

