Joe Gong | 22 Oct 08:30 2014

Is the neural network model ready for use now?

Hi everyone,
I want to use an ANN model on Hadoop, but it is not included in Mahout. I found
this:
Early implementation of a Multi Layer Perceptron (MLP) classifier
https://issues.apache.org/jira/browse/MAHOUT-1265
I would like to know whether this model can be used now.

Thanks.
Joe
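For background, the MLP in MAHOUT-1265 is a feed-forward network: each layer applies a weighted sum plus bias followed by a squashing function. A minimal forward pass in plain Python (illustrative only — this is not the Mahout API, and the weights are made up):

```python
import math

def sigmoid(x):
    # Logistic squashing function, as used by typical MLP layers
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, layers):
    """Propagate input vector x through a list of layers.

    Each layer is (weights, biases), where weights[i][j] connects
    input j to unit i of that layer.
    """
    for weights, biases in layers:
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Tiny 2-2-1 network with fixed, made-up weights
hidden = ([[0.5, -0.5], [0.3, 0.8]], [0.1, -0.1])
output = ([[1.0, -1.0]], [0.0])
y = forward([1.0, 0.0], [hidden, output])
print(y[0])  # a single score in (0, 1)
```

Training (backpropagation) and the actual Mahout class names are beyond this sketch; the JIRA ticket and the 0.9 source tree are the place to check the API's current status.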


Si Chen | 22 Oct 02:26 2014

using Mahout to classify customer service and sales emails?

Hi everybody,

Do you think Mahout could be used to classify a stream of customer
service/sales request emails, for example as "sales inquiry", "service
request", "return request", as well as "satisfied" vs "unsatisfied"?

We've worked on the first part -- building streams of customer emails and
communications -- and would like to partner with somebody to help us with
the classification part.

Thanks!

-----
Si Chen
Open Source Strategies, Inc.
twitter.com/opentaps

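For the classification part, Mahout's naive Bayes is a common starting point for routing emails into categories like these. A toy multinomial naive Bayes with Laplace smoothing, in plain Python (illustrative only — not Mahout code, and the example emails are invented):

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text). Returns per-label word counts and doc counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_docs / sum(label_counts.values()))  # class prior
        for w in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the class
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("sales inquiry", "price quote for enterprise license"),
        ("service request", "my install fails with an error"),
        ("return request", "please refund and accept this return")]
wc, lc = train(docs)
print(classify("what is the price of a license", wc, lc))  # "sales inquiry"
```

In Mahout itself the equivalent pipeline would be seq2sparse to vectorize the emails followed by trainnb/testnb; the "satisfied" vs "unsatisfied" split can be trained as a second, independent classifier over the same vectors.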
Mahesh Balija | 22 Oct 00:04 2014

Mahout Vs Spark

Hi Team,

As the Spark framework is gaining more attention among open-source Big Data
frameworks, with support for a variety of applications like:

1) Shark
2) GraphX
3) MLlib
4) Streaming

and with rapid development of algorithms for clustering, classification,
regression, etc. in the MLlib package, plus built-in support for Scala,
I am trying to differentiate between Mahout and Spark. Here is a small
list:

  Feature                    Mahout   Spark
  Clustering                 Y        Y
  Classification             Y        Y
  Regression                 Y        Y
  Dimensionality Reduction   Y        Y
  Java                       Y        Y
  Scala                      N        Y
  Python                     N        Y
  NumPy                      N        Y
  Hadoop                     Y        Y
  Text Mining                Y        N
  Scala/Spark Bindings       Y        N/A
  Scalability                Y        Y
Apart from the above, Mahout has broader coverage of machine learning
algorithms, with many utilities and APIs, as opposed to Spark. And Mahout 1.0
provides support for Scala and Spark bindings.

I am trying to demarcate between Mahout and Spark.
Can you throw some light on the key differences and the uniqueness of the
Mahout framework? Am I missing any important distinction that makes Mahout
the only choice for scalable machine learning?

Best,

Yang | 21 Oct 23:13 2014

mahout kmeans gives a random result for short documents

We are trying to run k-means on some product titles
so that we can cluster together similar products,
like "nike flex sneaker size 9" vs "nike flex sneaker size 8".
It works fine for most,
but it turns out that a lot of the titles are very short (particularly
after filtering stopwords),
so I got many 1-word or 2-word titles,
and somehow these got lumped together into a huge cluster
which does not have any similarity between the members at all.
I followed some specific examples in this cluster,
and it seems that the algorithm is indeed doing what it's supposed to do.

Does anybody have similar experience clustering particularly short "documents"?
Generally, any tricks to force the members to "jump" out and join another
cluster? (I do see other, smaller clusters with matching words.)

Thanks
Yang
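One way to see why the short titles lump together: after stopword filtering, most 1-2 word titles share no terms with each other, so their pairwise cosine similarity is exactly 0 and they sit (nearly) equidistant from every centroid; tie-breaking then funnels them into a single cluster. A quick check in plain Python (illustrative, using sparse term-weight dicts rather than Mahout vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term->weight dicts (assumed non-empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb)

# One-word titles share no terms, so every pairwise similarity is 0:
d1 = {"nike": 1.0}
d2 = {"adidas": 1.0}
d3 = {"sneaker": 1.0}
print(cosine(d1, d2), cosine(d1, d3))  # 0.0 0.0
```

With every short doc at maximum distance from everything, cluster membership is effectively arbitrary, which matches the random lumping described above; common workarounds include dropping ultra-short documents, enriching them with extra fields (brand, category), or using character n-grams instead of whole words.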
Aspasia Beneti | 20 Oct 16:31 2014

Questions concerning k-means

Hello all,

I am using k-means to cluster some data and I have the following two
questions:

1. In a Cluster, what is the difference between the centre and the centroid in
this specific implementation? I was trying to grasp the convergence
condition by looking at the code, and I saw that the distance between the
centre and the centroid is calculated. I think I understand what the
centroid is, but what is the centre then?

2. Why do the k-means results have 2 final clusters? I compared the results
with R's kmeans and it seems that the latter final cluster is the right one.
But what is the first one then?

Sorry if this information is somewhere available but I couldn't find it so
far. Any help will be much appreciated.

Aspasia
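On question 1, a common reading of k-means implementations (hedged — verify against the Mahout source for your version): the "centre" is the cluster's stored mean from the previous iteration, while the "centroid" is the mean recomputed from the points assigned in the current iteration; the cluster has converged when the two nearly coincide. A sketch in plain Python:

```python
def centroid(points):
    """Component-wise mean of the points assigned to a cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def converged(center, points, eps=1e-3):
    """Converged when the stored center has (almost) stopped moving,
    i.e. it lies within eps of the centroid of the currently
    assigned points."""
    c = centroid(points)
    dist = sum((a - b) ** 2 for a, b in zip(center, c)) ** 0.5
    return dist < eps

pts = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
print(centroid(pts))                # [1.0, 1.0]
print(converged([1.0, 1.0], pts))   # True: center equals the new centroid
```

Under this reading, the distance computed in the code is exactly the `dist` above, and iteration stops once it falls below the convergence delta.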
Lee S | 20 Oct 04:14 2014

How to use naivebayes on ordinary data not on text files?

I have an ordinary data file containing labels and feature vectors.
How can I use naive Bayes to classify it?
The example on the official website uses text files. Can it be used
on ordinary files?

I wonder if *trainnb* can be used directly on data files, as long as the
format of the data file is OK.
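One caveat worth noting: Mahout's naive Bayes (trainnb) is a multinomial/complementary implementation geared toward nonnegative, count-like feature weights, so arbitrary numeric features may need encoding first. For continuous features, a Gaussian naive Bayes is the textbook alternative — this is not what trainnb does, but a self-contained sketch in plain Python shows the idea of running naive Bayes directly on labeled feature vectors (toy data, invented for illustration):

```python
import math
from collections import defaultdict

def fit(rows):
    """rows: (label, [features]). Per-class mean/variance per feature."""
    by_label = defaultdict(list)
    for label, x in rows:
        by_label[label].append(x)
    stats = {}
    for label, xs in by_label.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        # Small floor on the variance avoids division by zero
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9
                 for col, m in zip(zip(*xs), means)]
        stats[label] = (means, vars_, n)
    return stats

def predict(x, stats):
    total = sum(n for _, _, n in stats.values())
    best, best_lp = None, float("-inf")
    for label, (means, vars_, n) in stats.items():
        lp = math.log(n / total)  # class prior
        for v, m, s2 in zip(x, means, vars_):
            # Log of the Gaussian density for this feature
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

rows = [("a", [1.0, 1.0]), ("a", [1.2, 0.9]),
        ("b", [5.0, 5.0]), ("b", [4.8, 5.2])]
stats = fit(rows)
print(predict([1.1, 1.0], stats))  # "a"
```

To stay within Mahout, the data file would need to be converted to vectors in a SequenceFile keyed by label (the format trainnb reads), for example via a small conversion job.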
Pat Ferrel | 19 Oct 19:49 2014

Upgrade to Spark 1.1.0?

Several people have experienced problems using Spark 1.0.2 with Mahout, so I tried it. Spark 1.0.1 is no
longer a recommended version, so it is a little harder to get, and people seem to be using newer versions. I
discovered that Mahout compiles with 1.0.2 in the pom and passes the tests, but fails a simple test on a
cluster. It has an anonymous function name error, which causes a class-not-found. This looks like a Scala
thing, but I'm not sure. At first blush this means we can't upgrade to Spark 1.0.2 without some relatively deep
diving, so I'm giving up on it for now and trying Spark 1.1.0, the current stable version that actually had
an RC cycle. It uses the same version of Scala as 1.0.1.

On Spark 1.1.0, Mahout builds and runs its tests fine, but on a cluster I get a class-not-found for a random number
generator used in mahout-common. I think it's because it is never packaged as a dependency in a "job"
jar assembly, so I tried adding it to the spark pom. Not sure if this is the right way to solve this, so if anyone
has a better idea please speak up.

Getting off the dubious Spark 1.0.1 version is turning out to be a bit of work. Does anyone object to
upgrading our Spark dependency? I’m not sure if Mahout built for Spark 1.1.0 will run on 1.0.1 so it may
mean upgrading your Spark cluster.   
Lee S | 19 Oct 17:55 2014

Why do seqdumper and clusterdumper produce output on the local disk?

When I run the two commands in hadoop mode, the output is all produced on
the local disk. Shouldn't the output be in HDFS in hadoop mode, to preserve
consistency?
parnab kumar | 19 Oct 14:56 2014

mahout in action source with mahout 0.9

Hi,
     The source code that comes with Mahout in Action seems to be
compatible only with older versions of Mahout. Is code updated for the
latest 0.9 version available anywhere?

Thanks,
Parnab
Henrique Lemos | 17 Oct 02:35 2014

Problem with clusterdump in Reuters Example

Hi,
I'm a newbie in both the Mahout and Hadoop environments, so I started by
running some predefined examples, such as the Reuters clustering.
Everything went well until I tried to visualize the results with the
clusterdump command. I tried this:

mahout clusterdump -i /teste/reuters-kmeans/clusters-*-final -o
/teste/clusterdump -d
/teste/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 -dt
sequencefile -b 100 -n 20 --evaluate -dm
org.apache.mahout.common.distance.CosineDistanceMeasure --pointsDir
/teste/reuters-kmeans/clusteredPoints

And, although I've changed the output directory to anything possible,
I always get this error:

Exception in thread "main" java.io.IOException: Unable to create
parent directories of /teste/clusterdump
    at com.google.common.io.Files.createParentDirs(Files.java:645)
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:186)
(...)

I don't think it has anything to do with HDFS permissions, since all
the previous steps (seq2sparse, kmeans, etc.) went well and created
their desired directories.

I would appreciate any help or tips. And sorry if I'm sending this
email to the wrong mailing list.

Thanks,

