Benjamin Eckstein | 24 Oct 00:51 2014

Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

Hello, I have 2 lines of code that produce a class not found exception

Joe Gong | 22 Oct 08:30 2014

Is the neural network model ready to use now?

Hi everyone,
I want to use an ANN model on Hadoop, but it is not included in Mahout. I found this:
Early implementation of a Multi Layer Perceptron (MLP) classifier
Can this model be used now?


gongjuntai | 22 Oct 05:02 2014

Is the ANN model ready to use now?

Hi everyone,
I want to use an ANN model on Hadoop, but it is not included in Mahout. I found this:
Early implementation of a Multi Layer Perceptron (MLP) classifier

Can this model be used now?

Si Chen | 22 Oct 02:26 2014

using Mahout to classify customer service and sales emails?

Hi everybody,

Do you think Mahout could be used to classify a stream of customer
service/sales request emails, for example as "sales inquiry", "service
request", "return request", as well as "satisfied" vs "unsatisfied"?

We've worked on the first part -- building streams of customer emails and
communications -- and would like to partner with somebody to help us with
the classification part.


Si Chen
Open Source Strategies, Inc.

Mahesh Balija | 22 Oct 00:04 2014

Mahout Vs Spark

Hi Team,

The Spark framework is gaining attention among open-source Big Data
frameworks, with support for a variety of components such as:

1) Shark
2) GraphX
3) MLlib
4) Streaming

With the rapid development of algorithms supporting clustering,
classification, regression, etc. in the MLlib package and inbuilt support for
I am trying to differentiate between Mahout and Spark; here is a small
comparison:

  Features                   Mahout   Spark
  Clustering                 Y        Y
  Classification             Y        Y
  Regression                 Y        Y
  Dimensionality Reduction   Y        Y
  Java                       Y        Y
  Scala                      N        Y
  Python                     N        Y
  Numpy                      N        Y
  Hadoop                     Y        Y
  Text Mining                Y        N
  Scala/Spark Bindings       Y        N/A
  Scalability                Y        Y

Apart from the above, Mahout covers a wider range of machine learning
algorithms than Spark, with many utilities and APIs.
Mahout 1.0 also provides support for Scala and Spark bindings.

I am trying to draw a clear line between Mahout and Spark.
Can you shed some light on the key differences and on what is unique about
the Mahout framework? Am I missing any important distinction that makes
Mahout the only choice for scalable machine learning?

Yang | 21 Oct 23:13 2014

mahout kmeans gives a random result for short documents

we are trying to run kmeans  on some product titles
so that we could cluster together similar products
like "nike flex sneaker size 9" vs "nike flex sneaker size 8"
it works fine for most
but it turns out that a lot of the titles are very short (particularly
after filtering stopwords)
so I got many 1-word or 2-word titles
and somehow these got lumped together into a huge cluster
whose members have no similarity to each other at all
I followed some specific examples in this cluster,
it seems that the algorithm is indeed doing what it's supposed to do.

has anybody had similar experience clustering particularly short "documents"?
more generally, are there any tricks to force the members to "jump" out and
join another cluster? (I do see other smaller clusters, with matching words)
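One possible explanation, offered as an assumption rather than a diagnosis of this data set: with bag-of-words vectors, two very short titles that share no terms have zero cosine similarity, so the distance measure cannot tell them apart and they are all equally far from every "real" cluster. A minimal sketch in plain Python (not Mahout code; the helper name and the example titles "socks" and "umbrella" are made up for illustration):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two titles."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

long_a = "nike flex sneaker size 9"
long_b = "nike flex sneaker size 8"
short_a = "socks"      # hypothetical 1-word title
short_b = "umbrella"   # hypothetical 1-word title

# Four of five terms are shared, so the similarity is close to 0.8.
print(cosine_similarity(long_a, long_b))
# No shared terms, so the similarity is 0.0.
print(cosine_similarity(short_a, short_b))
```

If this is what is happening, the short titles carry no signal that could pull them toward a specific cluster, which would be consistent with the algorithm "doing what it's supposed to do".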

Aspasia Beneti | 20 Oct 16:31 2014

Questions concerning k-means

Hello all,

I am using k-means to cluster some data and I have the following two
questions:
1. In a Cluster, what is the difference between the centre and the centroid
in this specific implementation? I was trying to understand the convergence
condition by looking at the code, and I saw that the distance between the
centre and the centroid is calculated. I think I understand what the
centroid is, but what is the centre then?

2. Why do the k-means results contain 2 final clusters? I compared the
results with R kmeans and it seems that the latter final cluster is the
right one. But what is the first one then?

Sorry if this information is somewhere available but I couldn't find it so
far. Any help will be much appreciated.
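For the first question, one common reading, stated here as an assumption and not as a description of Mahout's actual code: the "centre" is where the cluster sat at the start of the iteration, the "centroid" is the mean of the points assigned to it during that iteration, and the cluster counts as converged once the two are within a convergence delta of each other. A minimal sketch of that test in plain Python (all names are illustrative):

```python
import math

def centroid(points):
    """Component-wise mean of the points assigned to a cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def check_convergence(center, assigned_points, convergence_delta):
    """Return (converged, new_center) for one cluster after one iteration."""
    new_center = centroid(assigned_points)
    return euclidean(center, new_center) <= convergence_delta, new_center

center = [0.0, 0.0]                  # centre at the start of the iteration
points = [[1.0, 1.0], [3.0, 1.0]]    # points assigned during the iteration

first_pass, center = check_convergence(center, points, 0.5)
second_pass, center = check_convergence(center, points, 0.5)
print(first_pass, second_pass, center)
```

In the first pass the centre moves from the origin to the mean of the two points, so the cluster has not converged; in the second pass the centre no longer moves and the test succeeds.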

Lee S | 20 Oct 04:14 2014

How to use naivebayes on ordinary data not on text files?

I have an ordinary data file containing labels and feature vectors.
How can I use naivebayes to classify it?
The example on the official website works with text files. Can it be used
on ordinary files?

I wonder whether *trainnb* can be used directly on data files, provided the
format of the data file is OK.
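A sketch of the reshaping step, under an assumption that should be checked against your Mahout version: as far as I recall, trainnb reads a SequenceFile of Text keys and VectorWritable values and takes the label from a key of the form /label/id. The Python below only shows how a "label,f1,f2,..." line could be turned into such a key/vector pair; the function name and the comma-separated input format are hypothetical, and writing the SequenceFile itself is not shown.

```python
def to_labeled_record(line: str, row_id: int):
    """Parse 'label,f1,f2,...' into a ('/label/row_id', [f1, f2, ...]) pair.

    The '/label/id' key layout is an assumption about what trainnb expects.
    """
    label, *features = line.strip().split(",")
    return f"/{label}/{row_id}", [float(x) for x in features]

rows = ["spam,1,0,3", "ham,0,2,0"]
records = [to_labeled_record(line, i) for i, line in enumerate(rows)]
for key, vector in records:
    print(key, vector)
```

Feature values are kept as plain floats here; count-like, non-negative features are the usual fit for a multinomial naive Bayes model.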
Pat Ferrel | 19 Oct 19:49 2014

Upgrade to Spark 1.1.0?

Several people have experienced problems using Spark 1.0.2 with Mahout so I tried it. Spark 1.0.1 is no
longer a recommended version and so is a little harder to get and people seem to be using newer versions. I
discovered that Mahout compiles with 1.0.2 in the pom and executes the tests but fails a simple test on a
cluster. It has an anonymous function name error, which causes a class not found. This looks like a Scala
thing but not sure. At first blush this means we can’t upgrade to Spark 1.0.2 without some relatively deep
diving so I’m giving up on it for now and trying Spark 1.1.0, the current stable version that actually had
an RC cycle. It uses the same version of Scala as 1.0.1 

On Spark 1.1.0 Mahout builds and runs tests fine but on a cluster I get a class not found for a random number
generator used in mahout common. I think it’s because it is never packaged as a dependency in a “job”
jar assembly so tried adding it to the spark pom. Not sure if this is the right way to solve this so if anyone
has a better idea please speak up.

Getting off the dubious Spark 1.0.1 version is turning out to be a bit of work. Does anyone object to
upgrading our Spark dependency? I’m not sure if Mahout built for Spark 1.1.0 will run on 1.0.1 so it may
mean upgrading your Spark cluster.

Lee S | 19 Oct 17:55 2014

Why do seqdumper and clusterdumper produce output on local disk?

When I run the two commands in hadoop mode, the output is all produced on
the local disk. Why is the output in the hdfs in hadoop mode to preserve a

parnab kumar | 19 Oct 14:56 2014

mahout in action source with mahout 0.9

     The source code that comes with Mahout in Action seems to be
compatible only with older versions of Mahout. Is updated code for the
latest 0.9 version available anywhere?