Andrew Palumbo | 24 May 02:39 2016

Welcome Trevor Grant as a new Mahout Committer

In recognition of Trevor Grant's contributions to the Mahout project,
notably his Zeppelin integration work, the PMC has invited him to join the
Mahout project as a committer and is pleased to announce that he has
accepted our invitation.

As is customary, I will leave it to Trevor to provide a little bit of
background about himself.

Congratulations and Welcome!

-Andrew Palumbo
On Behalf of the Mahout PMC

Clustering options


Since the clustering algorithms are deprecated in Mahout Samsara, how can I
use Mahout to run a clustering algorithm? I use Mahout to cluster papers'
keywords: I take a set of keywords and cluster them to find groups of
related keywords. How can I update my code for Mahout Samsara? Any
suggestions?

Mario Levitin | 20 May 12:59 2016

Top-N recommendation with matrix factorization


If one is using a matrix factorization based method, then to generate a
top-N recommendation for a user, all of that user's unknown ratings need
to be predicted (so that the N items with the highest predicted ratings can
be recommended). On a site with millions of items, this means that making a
top-N recommendation for one user requires predicting that user's rating on
millions of items. This seems rather inefficient. I have two questions.

The first one is general: do you know how this can be done more
efficiently, or how large real-world sites do it?
Second, how can I do this efficiently with Mahout?
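
The brute-force approach described above can be sketched in plain Scala. This is a minimal illustration, not a Mahout API: `TopNRecommender`, `userVec`, `itemVecs` and `alreadyRated` are hypothetical names, and the model is assumed to predict a rating as the dot product of a user factor vector and an item factor vector. It still scores every unrated item, but keeps only the best N in a bounded min-heap instead of sorting millions of predictions.

```scala
import scala.collection.mutable

// Hypothetical sketch, not a Mahout API: brute-force top-N for one user,
// assuming predicted rating = dot(userFactors, itemFactors).
object TopNRecommender {

  // Predicted rating for one item: dot product of the two factor vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Scores every unrated item but keeps at most n of them in a min-heap,
  // so memory is O(n) and no full sort of all predictions is needed.
  def topN(userVec: Array[Double],
           itemVecs: Map[Int, Array[Double]],
           alreadyRated: Set[Int],
           n: Int): Seq[(Int, Double)] = {
    // "Less than" means "higher score", so the queue head is the LOWEST kept score.
    implicit val lowestScoreOnTop: Ordering[(Int, Double)] =
      Ordering.fromLessThan[(Int, Double)]((x, y) => x._2 > y._2)
    val heap = mutable.PriorityQueue.empty[(Int, Double)]
    for ((item, vec) <- itemVecs if !alreadyRated.contains(item)) {
      val score = dot(userVec, vec)
      if (heap.size < n) heap.enqueue((item, score))
      else if (n > 0 && score > heap.head._2) {
        heap.dequeue()              // evict the current weakest candidate
        heap.enqueue((item, score))
      }
    }
    heap.toSeq.sortWith(_._2 > _._2) // best first
  }
}
```

With this design the per-user cost is still linear in the number of items; avoiding that scan entirely would need some way to prune candidates before scoring, which this sketch does not attempt.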

Suneel Marthi | 19 May 02:23 2016

[ANNOUNCE] Apache Mahout 0.12.1 Release

The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.1,
a minor release following the 0.12.0 release on April 11, 2016.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

Mahout 0.12.1 is a maintenance release over Mahout 0.12.0 that addresses
the following issues, including several in the Apache Flink integration:

MAHOUT-1859: Disable non-working msurf and mgrid before Mahout 0.12.1

MAHOUT-1848: drmSampleKRows in FlinkEngine should generate a dense or
sparse matrix

MAHOUT-1847: drmSampleRows in FlinkEngine doesn't wrap Int Keys when
ClassTag is of type Int

MAHOUT-1841: Matrices.symmetricUniformView(...) returning values in the
wrong range

MAHOUT-1836: Order and add missing parameters for
DictionaryVectorizer.createTermFrequencyVectors() javadoc parameter

MAHOUT-1835: Remove countsPerPartition in Flink/blas/package.scala

MAHOUT-1834: Setup Travis CI for Mahout

Suneel Marthi | 19 May 00:07 2016

[VOTE] Apache Mahout 0.12.1 Release

This is the vote for release 0.12.1 of Apache Mahout.

The vote will run for at least 72 hours and will close on
May 21st, 2016. Please download, test, and vote with:

[ ] +1, accept RC as the official 0.12.1 release of Apache Mahout
[ ] +0, I don't care either way,
[ ] -1, do not accept RC as the official 0.12.1 release of Apache Mahout,

Maven staging repo:

The git tag to be voted upon is release-0.12.1

Nantia Makrynioti | 12 May 04:33 2016

Negative probabilities


I am using the classifyFullInstance method on a Naive Bayes model, but when
I print the elements of the generated vector, the probabilities are
negative. What might be the reason for this?

Thanks a lot,
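
One likely reason, stated here as an assumption to check against your Mahout version: as far as we know, Mahout's Naive Bayes classifiers compute each label's score as a sum of log-weights, so the vector returned by the full-classification methods holds log-likelihood scores, which are negative, rather than probabilities. If normalized probabilities are needed, the scores can be exponentiated and normalized with the log-sum-exp trick. The sketch below works on plain arrays; `LogScores` and its methods are hypothetical helpers, not part of Mahout.

```scala
// Hypothetical helpers, not part of Mahout: the input is assumed to be the
// per-label log-likelihood scores copied out of the classifier's result vector.
object LogScores {

  // exp(s - max) / sum(exp(s_j - max)): a softmax computed with the
  // log-sum-exp trick so very negative scores do not all underflow to 0.
  def toProbabilities(logScores: Array[Double]): Array[Double] = {
    require(logScores.nonEmpty, "need at least one label score")
    val max = logScores.reduce(_ max _)
    val exps = logScores.map(s => math.exp(s - max))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Index of the best label: the highest (least negative) score.
  def argMax(scores: Array[Double]): Int =
    scores.indices.reduce((i: Int, j: Int) => if (scores(j) > scores(i)) j else i)
}
```

Because exp is monotonic, the best label is the same before and after normalization; the conversion only matters when calibrated-looking probabilities are needed.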

mahout-user | 11 May 18:31 2016

Emailing: Photo 05-11-2016, 31 10 86

Rohit Jain | 10 May 11:43 2016

Reading the output of spark-rowsimilarity in Scala

I am writing Scala code to pull data from a database and run row-similarity
analysis. After running spark-rowsimilarity, I want to read the data
returned by the function and write it directly back to a MySQL database.
But I don't know how to read the data from the IndexedDataset returned by
val data = SimilarityAnalysis.rowSimilarityIDS(myIDs)
In the debugger, the data type is shown as IndexedDataset, which contains

Thanks & Regards,

*Rohit Jain*
Web developer | Consultant
Mob +91 8097283931

Donni Khan | 9 May 09:09 2016

Top terms in the results of k-means clustering

Hello everyone,

I want to know how the top terms in the results of k-means clustering are
computed. Is there any semantic relationship between them, or are they just
the most frequent terms?

I would be glad if anyone could give me some tips or point me to tutorials
about that.

Thank you,
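
As far as we understand Mahout's ClusterDumper, the top terms are not chosen semantically: each cluster's centroid is the mean of its member vectors (typically TF-IDF), and the terms with the largest centroid weights are listed via the dictionary. The plain-Scala sketch below illustrates that idea under this assumption; `TopTerms`, sparse vectors represented as `Map[Int, Double]`, and the `dictionary` map are hypothetical stand-ins for Mahout's vector and dictionary types.

```scala
// Hypothetical sketch of the idea, not Mahout's ClusterDumper code: sparse
// document vectors are Map(termIndex -> weight), e.g. TF-IDF weights.
object TopTerms {

  // Cluster centroid = element-wise mean of the member vectors
  // (a term missing from a document counts as weight 0).
  def centroid(members: Seq[Map[Int, Double]]): Map[Int, Double] = {
    val sums = members.flatten.groupBy(_._1).map { case (term, entries) =>
      term -> entries.map(_._2).sum
    }
    sums.map { case (term, total) => term -> total / members.size }
  }

  // The k terms with the largest centroid weight, translated back to words
  // through the dictionary (index -> term) built during vectorization.
  def topTerms(center: Map[Int, Double],
               dictionary: Map[Int, String],
               k: Int): Seq[(String, Double)] =
    center.toSeq
      .sortWith(_._2 > _._2)
      .take(k)
      .map { case (idx, w) => dictionary.getOrElse(idx, s"<unknown:$idx>") -> w }
}
```

Under this reading, a term ranks highly when it has a high average weight across the cluster's documents; any apparent semantic relatedness among the top terms comes only from their co-occurrence in those documents.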

Rohit Jain | 7 May 12:05 2016

RowSimilarity: NotSerializableException

Hello everyone,

I want to run the Spark RowSimilarity recommender on data obtained from
MongoDB. For this purpose, I've written the code below, which takes input
from MongoDB and converts it to an RDD of objects. This needs to be passed
to IndexedDatasetSpark, which is then passed to SimilarityAnalysis.

import org.apache.hadoop.conf.Configuration
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
import org.apache.spark.rdd.{NewHadoopRDD, RDD}
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

object SparkExample extends App {
  val mongoConfig = new Configuration()

  val sparkConf = new SparkConf()
  val sc = new SparkContext("local", "SparkExample", sparkConf)

  val documents: RDD[(Object, BSONObject)] = sc.newAPIHadoopRDD(

Sree Eedupuganti | 6 May 13:07 2016

Recommendation Engine based on Content Filtering

Command: ./mahout recommenditembased
-Dmapred.output.dir=/user/temp/output --usersFile /user/mahout/users.txt
--numRecommendations 2 --booleanData --similarityClassname

16/05/06 11:00:34 INFO Job: Running job: job_1461844363112_0017
16/05/06 11:00:39 INFO Job: Job job_1461844363112_0017 running in uber mode : false
16/05/06 11:00:39 INFO Job:  map 0% reduce 0%
16/05/06 11:00:42 INFO Job: Task Id : attempt_1461844363112_0017_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "leticia9jdqf <at>"

Any suggestions, please?

Best Regards,
Sreeharsha Eedupuganti
Data Engineer
innData Analytics Private Limited
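
A likely cause, based on the error: Mahout's Hadoop item-based recommender expects user and item IDs that parse as Java longs, and the input string in the log appears to be part of an e-mail address (the archive masks "@" as "<at>"). One workaround is to map string IDs to numeric IDs before writing the preference file (and the `--usersFile`), then map the recommendations back. The sketch below uses hypothetical helpers (`IdDictionary`, `PrefsConverter`), not Mahout classes, and assumes preferences arrive as (user, item) string pairs suitable for `--booleanData`.

```scala
import scala.collection.mutable

// Hypothetical helper, not a Mahout class: assigns each distinct string ID a
// dense numeric Long ID (0, 1, 2, ...) and remembers the reverse mapping.
class IdDictionary {
  private val toId = mutable.HashMap.empty[String, Long]
  private val toName = mutable.ArrayBuffer.empty[String]

  // Returns the existing ID, or allocates the next one on first sight.
  def id(name: String): Long =
    toId.getOrElseUpdate(name, { toName += name; (toName.size - 1).toLong })

  // Translates a numeric ID from the recommender output back to the original string.
  def name(id: Long): String = toName(id.toInt)
}

object PrefsConverter {
  // Turns (user, item) string pairs into "userID,itemID" lines with numeric
  // IDs, the shape a boolean-preference input file would need.
  def toNumericCsv(prefs: Seq[(String, String)],
                   users: IdDictionary,
                   items: IdDictionary): Seq[String] =
    prefs.map { case (u, i) => s"${users.id(u)},${items.id(i)}" }
}
```

The same `users` dictionary would be used to translate the entries of the users file, so that both inputs refer to the same numeric IDs.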