KUNAL GANDHRE | 24 Apr 14:05 2014

How to update a Mahout classification model with new records?


How do I update a Mahout classification model with new records?
I am using Naive Bayes for text classification. I want to classify Facebook
posts and tweets. I trained a model on 700 posts, and I can classify new
posts as they come into the system. My question is simple: as the system
classifies new posts, I want to update the model based on that output.
What is this kind of machine learning called?

Training/Learning ---> Model ---> Test ---> Training/Learning ---> Model

Can you please help me with this? Sorry for my bad English.
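Updating a model as new labeled data arrives is usually called online or incremental learning. A minimal Python sketch, assuming a plain multinomial Naive Bayes over whitespace-tokenized text; the class `IncrementalNaiveBayes` and its `update`/`predict` methods are hypothetical and are not Mahout's API. Because this kind of model is just per-label document and term counts, a newly labeled post can be folded in by incrementing counts instead of retraining on all 700 posts.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes whose counts can be updated one
    labeled document at a time (illustration only, not Mahout's API)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive (Laplace) smoothing
        self.doc_counts = Counter()              # label -> number of documents
        self.term_counts = defaultdict(Counter)  # label -> term -> occurrences
        self.total_terms = Counter()             # label -> total term occurrences
        self.vocab = set()                       # all terms seen so far

    def update(self, tokens, label):
        """Fold one labeled document into the counts; no full retraining."""
        self.doc_counts[label] += 1
        for t in tokens:
            self.term_counts[label][t] += 1
            self.total_terms[label] += 1
            self.vocab.add(t)

    def predict(self, tokens):
        """Return the label with the highest log posterior (up to a constant)."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.total_terms[label] + self.alpha * v
            for t in tokens:
                # smoothed log likelihood; unseen terms get alpha / denom
                score += math.log((self.term_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = IncrementalNaiveBayes()
nb.update("great phone love it".split(), "positive")
nb.update("terrible battery hate it".split(), "negative")
print(nb.predict("love this phone".split()))  # positive

# A newly labeled post arrives: update the existing model in place.
nb.update("battery died again".split(), "negative")
```

The sketch assumes `update` is called with a label that has been checked; feeding the classifier's own predicted labels back in, unverified, tends to reinforce its mistakes.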

Best regards,


Kunal Gandhre

Software Developer

Facebook: facebook.com/gandhre.kunal

Phone: +91-8087382982
Sebastian Schelter | 24 Apr 12:19 2014

Welcome Pat Ferrel as new committer on Mahout


This is to announce that the Project Management Committee (PMC) for
Apache Mahout has asked Pat Ferrel to become a committer, and we are
pleased to announce that he has accepted.

Being a committer makes contributing to the project easier: in addition
to posting patches on JIRA, a committer has write access to the code
repository. It also means that we now have yet another person who can
commit patches submitted by others to our repo *wink*

Pat, we look forward to working with you in the future. Welcome! It 
would be great if you could introduce yourself with a few words.


Darshan Sonagara | 22 Apr 15:12 2014

Getting an error with the qualcluster command

I want to analyze the clusters I produced with Mahout's k-means algorithm.
The qualcluster command takes a -c command-line argument. What kind of file do I need to give
as input for k-means?
I did it for streaming k-means and it worked, but every time I run qualcluster I get different
values for all parameters in the CSV file.

Himanshu | 22 Apr 14:27 2014

Does Mahout handle missing values in train and test data, for Decision Forest?

In Weka it is possible to mark a field with a question mark "?" for unknown
values, and these are handled. Is there a similar way to mark
"unknown"/"missing" field values in Mahout training and test data as well?

I would appreciate any suggestions or pointers. Breiman describes two ways to
handle missing values.
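Breiman's first ("fast") method replaces a missing value of attribute m in a training case of class j with the median of m over class j for a numeric attribute, or with the most frequent value of m in class j for a categorical one. A minimal Python sketch of that preprocessing step, assuming rows are plain lists, the Weka-style "?" marks a missing value, and class labels are known; the function `fill_missing_by_class` is hypothetical, and this does not describe how Mahout's Decision Forest itself treats missing values.

```python
from collections import Counter
from statistics import median

MISSING = "?"  # assumed Weka-style marker for a missing value


def fill_missing_by_class(rows, labels, categorical):
    """Breiman's fast fill, training data only: replace a missing value of
    attribute m in a class-j row with the class-j median of m (numeric) or
    the most frequent class-j value of m (categorical)."""
    n_attrs = len(rows[0])
    fills = {}
    for label in set(labels):
        class_rows = [r for r, y in zip(rows, labels) if y == label]
        for m in range(n_attrs):
            observed = [r[m] for r in class_rows if r[m] != MISSING]
            if not observed:
                continue  # nothing to impute from; value stays missing
            if m in categorical:
                fills[(label, m)] = Counter(observed).most_common(1)[0][0]
            else:
                fills[(label, m)] = median(observed)
    return [
        [fills.get((y, m), v) if v == MISSING else v for m, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]


rows = [[1.0, "a"], ["?", "a"], [3.0, "?"], [10.0, "b"], ["?", "b"], [20.0, "b"]]
labels = ["x", "x", "x", "y", "y", "y"]
print(fill_missing_by_class(rows, labels, categorical={1}))
# [[1.0, 'a'], [2.0, 'a'], [3.0, 'a'], [10.0, 'b'], [15.0, 'b'], [20.0, 'b']]
```

The filled rows could then be written back out as training data; test cases, whose labels are unknown, need separate handling that this sketch does not cover.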

Darshan Sonagara | 22 Apr 09:11 2014

Question Regarding Entropy calculation in Mahout

I am a final-year BE student from Gujarat, India, currently studying in the
Information Technology branch. My final-year project is document
clustering using Hadoop.
At this stage I am able to get the final result from the clusterdump command,
in which I can see the number of documents in each cluster and their
respective top terms.
I am also applying various distance measures with k-means.
The problem is that I want to check whether my clustering is good or
bad, and for that I need to calculate the entropy. I have no
idea how to calculate entropy in Mahout or with any other technique.
By finding the entropy I can draw a good conclusion.
So please, can anyone help me with this?
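One common external measure is the size-weighted entropy of the known categories inside each cluster: for cluster j with n_j documents, E_j = -sum_i p_ij * log2(p_ij), where p_ij is the fraction of its documents in category i, and the overall score is E = sum_j (n_j / n) * E_j. A score of 0 means every cluster is pure. A minimal Python sketch, assuming you have already exported each document's cluster id together with a known (human-assigned) category label; the function `clustering_entropy` is hypothetical and is not a Mahout command.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted entropy of known class labels within each cluster.
    0.0 means every cluster contains a single class; higher is worse."""
    members = defaultdict(list)  # cluster id -> class labels of its documents
    for c, y in zip(cluster_ids, class_labels):
        members[c].append(y)
    n = len(class_labels)
    total = 0.0
    for labels in members.values():
        n_j = len(labels)
        e_j = 0.0  # entropy of this cluster: -sum p * log2(p)
        for count in Counter(labels).values():
            p = count / n_j
            e_j -= p * math.log2(p)
        total += (n_j / n) * e_j  # weight by cluster size
    return total


print(clustering_entropy([0, 0, 1, 1], ["sport", "sport", "tech", "tech"]))  # 0.0
print(clustering_entropy([0, 0, 1, 1], ["sport", "tech", "sport", "tech"]))  # 1.0
```

Entropy only measures purity, so it favors many small clusters; it is usually reported alongside the number of clusters k.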



Regards from:

Darshan Sonagara
Collaborative Platform Lead, SSN Team | Gujarat Section.

Vice-Chairperson | GCET IEEE SB.

Phone: +91 9408002452

LinkedIn: Darshan Sonagara <http://www.linkedin.com/pub/darshan-sonagara/64/11a/b54>
Facebook: Darshan Sonagara <http://www.facebook.com/darshansonagara>
Christopher Eugene | 19 Apr 13:12 2014

Re: Installation on Ubuntu

Thank you, Donald. I had a tough time installing PredictionIO yesterday, but I
managed to crack it!

On Sat, Apr 19, 2014 at 12:33 AM, Donald Szeto <donald <at> prediction.io> wrote:

> Regarding "Quick question are all the methods available in mahout available
> on PREDICTIONIO?", the answer is that almost all CF algorithms from Mahout
> have been integrated.
> Regards,
> Donald
> On Fri, Apr 18, 2014 at 2:25 PM, Donald Szeto <donald <at> prediction.io>
> wrote:
> > We are glad to help answer any Mahout + PredictionIO questions here.
> >
> > Regards,
> > Donald
> >



Bob Morris | 18 Apr 23:47 2014

Grumble about (lack of) warning of deprecation of Canopy KMeans

I was taken aback that the immensely touted and convenient Canopy
KMeans package was deprecated today [1] in the incubating Mahout 1.0,
with no warning that I could find, at least back through March. Even
then, I can see only in retrospect that [2] hinted that Streaming KMeans
is preferable. I only noticed this when, while tinkering with my apparently
successful use of CanopyDriver, I ended up in the JavaDoc from the SVN repo,
which is where a public document [3] directs developers.

At the very least, [4] should begin with a deprecation warning (I
commented on JIRA MAHOUT-1513 to this effect). Also, perhaps [3] should
point to the most recently released JavaDoc rather than the SVN trunk.

Apologies for the grumble if I did in fact miss a warning posted here,
in which case I redirect my grumble at myself.


[1] https://issues.apache.org/jira/browse/MAHOUT-1513
[2] http://mail-archives.us.apache.org/mod_mbox/mahout-user/201403.mbox/%3C1394662437.49533.YahooMailNeo%40web163502.mail.gq1.yahoo.com%3E
[3] https://mahout.apache.org/developers/developer-resources.html
[4] http://mahout.apache.org/users/clustering/canopy-clustering.html

Robert A. Morris

Emeritus Professor of Computer Science
100 Morrissey Blvd
Boston, MA 02125-3390


Terry Blankers | 18 Apr 22:18 2014

Re: lucene2seq error: field does not exist in the index

No problem Suneel, I've been traveling & unavailable myself until now.

> On 4/13/14, 6:12 PM, Suneel Marthi wrote:
>> Apologies for the delayed response Terry.
>> Mahout's presently at Lucene 4.6.1 (both 0.9 and trunk).  The 
>> practice so far has been to upgrade to the latest Lucene version 
>> right before a planned release.
>> Not sure what has changed in Solr/Lucene 4.7.1.
>> You could try one of the following:
>> a) Is your index spread across multiple shards? 

No, I'm using a fairly simple installation with no sharding.

>> b) Upgrade Mahout locally to Lucene 4.7.1, run your tests again, and
>> see if that works.

Actually I'm using Solr 4.2.1 and I did build Mahout locally from trunk 
about 2 weeks ago against Hadoop 2.3. I've tested against my local build 
and against 0.9 binaries. Sorry for the confusion.

>> c) It could possibly be a bug in lucene2seq, and we may not have
>> adequate test coverage. Could you create a unit test to reproduce this
>> scenario?
>> Would it be possible for you to share a sample index along with the Solr

Christopher Eugene | 18 Apr 22:11 2014

Re: Installation on Ubuntu

Sorry, I thought I had replied to the list :). Can I ask PredictionIO-related
questions on the list too?

On Fri, Apr 18, 2014 at 11:06 PM, Sebastian Schelter <ssc <at> apache.org> wrote:

> Please reply to the list, not to me in person :)
> On 04/18/2014 10:05 PM, Christopher Eugene wrote:
>> Thank you, Sebastian. I could've sworn I saw something involving Mahout and
>> PHP not so long ago. Quick question: are all the methods available in
>> Mahout available in PredictionIO?
>> On Fri, Apr 18, 2014 at 10:53 PM, Sebastian Schelter <ssc <at> apache.org>
>> wrote:

Christopher Eugene | 18 Apr 21:49 2014

Re: Installation on Ubuntu

@Sebastian: I have version 1.7. @Andrew: I plan on using Mahout with PHP,
since I heard that there is a new API. Or am I wrong?

On Fri, Apr 18, 2014 at 10:45 PM, Andrew Musselman <
andrew.musselman <at> gmail.com> wrote:

> I would say if you want to get started, just grab the pre-built version via
> the "download" button on the home page of http://mahout.apache.org
> E.g., following those links you would end up here:
> http://apache.cs.utah.edu/mahout/0.9 and then get either the "-src" or
> non-"-src" version and use the pre-built jars and examples.
> On Fri, Apr 18, 2014 at 12:39 PM, Christopher Eugene
> <xriseugene <at> gmail.com>wrote:
> > Hello,
> > I want to install Mahout on Ubuntu 14.04. I had previously tried in vain to
> > install on 13.10. Could the version of Java be the problem? I am compiling
> > from source. Any help will be appreciated.
> > --
> > Omar Christopher Eugene

Christopher Eugene | 18 Apr 21:39 2014

Installation on Ubuntu

I want to install Mahout on Ubuntu 14.04. I had previously tried in vain to
install on 13.10. Could the version of Java be the problem? I am compiling
from source. Any help will be appreciated.

Omar Christopher Eugene