Viral Parikh | 26 Nov 00:27 2014

Clusters built with Mahout 0.8: can I use clusterdump from Mahout 0.9?

To Whom It May Concern:

I have k-means clusters built with Mahout 0.8. Since then we have upgraded our Hadoop stack, and Mahout
to version 0.9.

However, I want to run clusterdump on the clusters we created with 0.8. When I run clusterdump from
0.9, I get this error:

Exception in thread "main" java.lang.ClassCastException:
org.apache.mahout.clustering.classify.WeightedVectorWritable cannot be cast to org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable

In the Apache Mahout 0.9 release notes, I see that the ClusteredPoints should be
WeightedPropertyVectorWritable rather than WeightedVectorWritable.

What should I do in this case? I still need a clusterdump of the clusters built with 0.8.
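A minimal sketch of why the cast fails and one way around it, assuming (as the stack trace and the release notes suggest) that in 0.9 WeightedPropertyVectorWritable is a subclass of WeightedVectorWritable that adds a property map. The two classes below are hypothetical plain-Java stand-ins, not the Mahout classes: a record written as the base type cannot be downcast to the subclass, but it can be re-wrapped with an empty property map. Applying this to real data would mean reading the 0.8 clusteredPoints SequenceFile and rewriting it with the 0.9 type, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusteredPointsUpgradeSketch {

    // Hypothetical stand-in for the 0.8 record type: a weight plus a vector.
    static class WeightedVector {
        final double weight;
        final double[] vector;

        WeightedVector(double weight, double[] vector) {
            this.weight = weight;
            this.vector = vector;
        }
    }

    // Hypothetical stand-in for the 0.9 record type: the same fields plus properties.
    static class WeightedPropertyVector extends WeightedVector {
        final Map<String, String> properties;

        WeightedPropertyVector(double weight, double[] vector, Map<String, String> properties) {
            super(weight, vector);
            this.properties = properties;
        }
    }

    // Converts an old record by re-wrapping it with an empty property map.
    static WeightedPropertyVector upgrade(WeightedVector old) {
        if (old instanceof WeightedPropertyVector) {
            return (WeightedPropertyVector) old; // already the new type
        }
        return new WeightedPropertyVector(old.weight, old.vector, new HashMap<>());
    }

    public static void main(String[] args) {
        WeightedVector oldRecord = new WeightedVector(1.0, new double[] {0.5, 2.0});
        try {
            // Downcasting a base-class instance mirrors the failure in the stack trace.
            WeightedPropertyVector direct = (WeightedPropertyVector) oldRecord;
            System.out.println("cast succeeded: " + direct.weight);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getClass().getSimpleName());
        }
        WeightedPropertyVector upgraded = upgrade(oldRecord);
        System.out.println("upgraded weight=" + upgraded.weight + ", properties=" + upgraded.properties);
    }
}
```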

I look forward to your reply.

Thank you,

Jakub Stransky | 25 Nov 14:31 2014

Algorithms: Apriori, FPGrowth

Hello experienced mahout users,

I am new to the Mahout library and I am having trouble finding a starting
point for association rule mining, as I see neither Apriori nor FPGrowth
on the list of implemented algorithms. On the other hand, I found several
blog posts that refer to Mahout for implementations of those algorithms.
I am a bit confused about what the current state is and where to find appropriate

Any hint would be appreciated.
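Independently of what Mahout ships, the Apriori idea itself is small. The following is a plain-Java sketch of textbook Apriori frequent-itemset mining, not a Mahout class; the transactions and the minimum support of 3 are made-up illustration values. Association rules would then be derived from the frequent itemsets and their support counts, which is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AprioriSketch {

    // Returns every itemset whose support (number of transactions containing it)
    // is at least minSupport, mapped to that support count.
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> frequent = new HashMap<>();

        // Level 1 candidates: every single item.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(Collections.singleton(item));
            }
        }

        while (!candidates.isEmpty()) {
            // Count step: one pass over the transactions per level.
            Map<Set<String>, Integer> counts = new HashMap<>();
            for (Set<String> t : transactions) {
                for (Set<String> c : candidates) {
                    if (t.containsAll(c)) {
                        counts.merge(c, 1, Integer::sum);
                    }
                }
            }

            // Keep only candidates that meet the support threshold.
            List<Set<String>> level = new ArrayList<>();
            for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
                if (e.getValue() >= minSupport) {
                    level.add(e.getKey());
                    frequent.put(e.getKey(), e.getValue());
                }
            }

            // Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets,
            // and prune any candidate that has an infrequent k-subset.
            Set<Set<String>> next = new HashSet<>();
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    Set<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1 && allSubsetsFrequent(union, frequent)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return frequent;
    }

    // Apriori property: every subset of a frequent itemset is itself frequent.
    static boolean allSubsetsFrequent(Set<String> itemset, Map<Set<String>, Integer> frequent) {
        for (String item : itemset) {
            Set<String> subset = new TreeSet<>(itemset);
            subset.remove(item);
            if (!frequent.containsKey(subset)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "diapers", "beer", "eggs"),
                Set.of("milk", "diapers", "beer", "cola"),
                Set.of("bread", "milk", "diapers", "beer"),
                Set.of("bread", "milk", "diapers", "cola"));
        frequentItemsets(transactions, 3)
                .forEach((itemset, support) -> System.out.println(itemset + " support=" + support));
    }
}
```

FPGrowth finds the same frequent itemsets without generating candidates, by compressing the transactions into a prefix tree, which is why it tends to scale better than the candidate-and-count loop above.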

unmesha sreeveni | 25 Nov 09:47 2014

[blog] How to do Update operation in hive-0.14.0


Hope this link helps those who are trying out ACID
properties in Hive 0.14.


*Thanks & Regards *

*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
Ashok Harnal | 23 Nov 04:00 2014

Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: cannot be cast to

I am using Mahout 0.7 installed with Cloudera. After creating the user-feature and
item-feature matrices in HDFS, I run the following command:

 mahout recommendfactorized --input /user/ashokharnal/seqfiles \
   --userFeatures $res_out_file/U/ --itemFeatures $res_out_file/M/ \
   --numRecommendations 1 --output $reommendation --maxRating 1

After some time, I get the following error:

14/11/23 08:28:20 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/23 08:28:20 WARN mapred.LocalJobRunner: job_local954305987_0001
java.lang.Exception: java.lang.RuntimeException:
java.lang.ClassCastException: cannot be cast to
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: cannot be cast to
    at org.apache.hadoop.mapred.MapTask.runNewMapper(

Ashok Harnal | 22 Nov 13:37 2014
Mkdirs failed to create file--Mahout 0.9 parallelALS

I am using Mahout 0.9.

I am running the following Mahout command to build a recommender engine
with matrix factorization:

mahout parallelALS --input $in_file --output $out_file --lambda 0.1 \
  --implicitFeedback true --alpha 0.8 --numFeatures 15 --numIterations 10 \
  --numThreadsPerSolver 1 --tempDir $tmp

I get the following error:

Mkdirs failed to create
    at org.apache.hadoop.fs.FileSystem.create(

Parimi Rohit | 20 Nov 01:34 2014

Bi-Factorization vs Tri-Factorization for recommender systems

Hi All,

Are there any (dis)advantages of using tri-factorization (||X - USV'||) as
opposed to bi-factorization (||X - UV'||) for recommender systems? I have
been reading a lot about tri-factorization and how it can be seen as a
co-clustering of rows and columns, and I was wondering whether such a
technique is implemented in Mahout.

Also, I am particularly interested in implicit-feedback datasets, and the
only MF approach I am aware of is the ALS-WR for implicit feedback data
implemented in Mahout. Are there any other MF techniques? If not, is it
possible (and useful) to extend some tri-factorization to handle
implicit feedback along the lines of "Collaborative Filtering for Implicit
Feedback Datasets" (the approach implemented in Mahout)?
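For reference, the objectives being compared can be written as follows. This is a sketch: X is the m×n rating matrix; in the bi-factorization U is m×k and V is n×k; in the tri-factorization U is m×k, S is k×l and V is n×l; ' denotes transpose as in the question, and regularization on the first two objectives is omitted. The third line is the implicit-feedback objective from the Hu, Koren and Volinsky paper as I understand it, where x_u and y_i are the rows of U and V, r_{ui} is the observed value, p_{ui} the binary preference and c_{ui} the confidence; Mahout's solver may scale λ differently.

```latex
\begin{align*}
&\min_{U,V}\; \lVert X - UV' \rVert_F^2 && \text{(bi-factorization)}\\
&\min_{U,S,V}\; \lVert X - USV' \rVert_F^2 && \text{(tri-factorization)}\\
&\min_{U,V}\; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u' y_i\bigr)^2
  + \lambda\Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr) && \text{(implicit feedback)}\\
&\text{where } p_{ui} = \begin{cases} 1 & r_{ui} > 0\\ 0 & \text{otherwise} \end{cases},
 \qquad c_{ui} = 1 + \alpha\, r_{ui}
\end{align*}
```

In the tri-factorization, U and V can be read as row- and column-cluster memberships with S linking row clusters to column clusters, which is the co-clustering interpretation mentioned above; in the bi-factorization that coupling is absorbed into U and V.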

I apologize that this question is very general and might not be specific
to Mahout; I would really appreciate any

Hersheeta Chandankar | 19 Nov 08:11 2014

Scores of Complementary Naive Bayes Classifier

Hi,

I've been working with Mahout's Complementary Naive Bayes classifier to
categorize documents.

Could anyone help me understand exactly how the 'scores' given by the
classifier are calculated? What factors do they depend on?

Is there any way to set a threshold value for the score?
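A sketch of the complement Naive Bayes scoring rule from Rennie et al. (2003), on which Mahout's complementary classifier is based, written in plain Java with made-up counts. For each label, a feature's weight comes from how often the feature occurs in the *other* labels' training data, and a document's score for a label is the sum of its feature values times those weights; the label with the highest score wins. As I understand it, Mahout additionally divides each weight by a per-label normalizer, which is omitted here. Because these scores are unnormalized log-space sums that grow with document length, a raw threshold on them is hard to interpret; comparing the gap between the top two labels is one hypothetical alternative.

```java
public class ComplementNbScoreSketch {

    // counts[label][feature]: total (e.g. TF-IDF) weight of each feature in each
    // label's training documents. alpha is the smoothing parameter.
    static double score(double[][] counts, int label, double[] doc, double alpha) {
        int numFeatures = doc.length;
        double grandTotal = 0;
        double labelTotal = 0;
        double[] featureTotals = new double[numFeatures];
        for (int l = 0; l < counts.length; l++) {
            for (int f = 0; f < numFeatures; f++) {
                featureTotals[f] += counts[l][f];
                grandTotal += counts[l][f];
                if (l == label) {
                    labelTotal += counts[l][f];
                }
            }
        }
        double complementTotal = grandTotal - labelTotal; // weight in all OTHER labels
        double result = 0;
        for (int f = 0; f < numFeatures; f++) {
            if (doc[f] == 0) {
                continue;
            }
            double complementFeature = featureTotals[f] - counts[label][f];
            // Features that are rare in the complement get large positive weights.
            double weight = -Math.log((complementFeature + alpha)
                    / (complementTotal + alpha * numFeatures));
            result += doc[f] * weight;
        }
        return result;
    }

    // Predicted label = the label with the highest score.
    static int classify(double[][] counts, double[] doc, double alpha) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int label = 0; label < counts.length; label++) {
            double s = score(counts, label, doc, alpha);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two labels, three features (made-up training totals).
        double[][] counts = {
            {5, 1, 0},
            {0, 1, 4},
        };
        double[] doc = {2, 1, 0};
        for (int label = 0; label < counts.length; label++) {
            System.out.printf("label %d score %.3f%n", label, score(counts, label, doc, 1.0));
        }
        System.out.println("predicted label: " + classify(counts, doc, 1.0));
    }
}
```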

Thanks,
Lee S | 19 Nov 04:12 2014

How to deal with categorical and date data in Mahout?

Hi all,
 Do you have any good practices for dealing with categorical data?
 Does Mahout provide a tool class that can do the conversion?
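A common general-purpose approach, not a specific Mahout tool, is to one-hot encode categorical values and to break dates into categorical or numeric parts. A minimal plain-Java sketch; the colour values and the choice of day-of-week, month and days-since-epoch columns are illustrative assumptions:

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoricalAndDateFeatures {

    // Gives each distinct category value its own column index, in first-seen order.
    static Map<String, Integer> buildDictionary(List<String> values) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String v : values) {
            dict.putIfAbsent(v, dict.size());
        }
        return dict;
    }

    // One-hot encoding: 1.0 in the value's column, 0.0 elsewhere.
    // Values not seen when building the dictionary map to all zeros.
    static double[] oneHot(String value, Map<String, Integer> dict) {
        double[] out = new double[dict.size()];
        Integer index = dict.get(value);
        if (index != null) {
            out[index] = 1.0;
        }
        return out;
    }

    // Expands a date into 7 day-of-week columns, 12 month columns,
    // and one numeric column counting days since 1970-01-01.
    static double[] dateFeatures(LocalDate date) {
        double[] out = new double[7 + 12 + 1];
        out[date.getDayOfWeek().getValue() - 1] = 1.0; // Monday = column 0
        out[7 + date.getMonthValue() - 1] = 1.0;       // January = column 7
        out[19] = date.toEpochDay();
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> colours = buildDictionary(List.of("red", "green", "red", "blue"));
        System.out.println(colours);
        System.out.println(Arrays.toString(oneHot("green", colours)));
        System.out.println(Arrays.toString(dateFeatures(LocalDate.of(2014, 11, 19))));
    }
}
```

The days-since-epoch column is on a much larger scale than the 0/1 columns, so it usually needs rescaling before use with distance-based methods.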
Eyal Allweil | 18 Nov 15:12 2014

Using a predefined test set instead of using the split command when using the classifiers

Hi all,
I am trying to test Mahout's classifier but skip the split step (as was asked in this question). I have three
categories plus "other" in my test. When I use the split command and let Mahout do the split (say, with
randomSelectionPct = 20), I get perfectly good results. But when I split the input into test and training sets
myself (so I can compare with other solutions), the classifier chooses "other" for everything.
I am following the suggestion from this question. Does anyone have an idea of how I can do this, or what I am
doing wrong?
Thanks in advance!
Eyal
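The post doesn't show enough to diagnose the "other"-for-everything result, but two things are worth checking when splitting by hand: every category (including "other") should appear in both sets in similar proportions, and the training and test documents should be vectorized with the same dictionary, so that a feature index means the same term in both. A plain-Java sketch of the first point, a per-category random hold-out in the spirit of randomSelectionPct; the class and method names are hypothetical, and this is not Mahout's split implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class StratifiedSplitSketch {

    // Hypothetical holder for the two halves, keyed by category label.
    static final class Split {
        final Map<String, List<String>> train = new TreeMap<>();
        final Map<String, List<String>> test = new TreeMap<>();
    }

    // Holds out about testPct percent of EACH category's documents, so every
    // category appears in both sets in roughly its original proportion.
    static Split split(Map<String, List<String>> docsByLabel, int testPct, long seed) {
        Random rng = new Random(seed); // fixed seed and sorted labels => reproducible split
        Split result = new Split();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(docsByLabel).entrySet()) {
            List<String> docs = new ArrayList<>(e.getValue());
            Collections.shuffle(docs, rng);
            int numTest = Math.round(docs.size() * testPct / 100.0f);
            result.test.put(e.getKey(), new ArrayList<>(docs.subList(0, numTest)));
            result.train.put(e.getKey(), new ArrayList<>(docs.subList(numTest, docs.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docsByLabel = new TreeMap<>();
        for (String label : new String[] {"cat1", "cat2", "cat3", "other"}) {
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                ids.add(label + "-doc" + i);
            }
            docsByLabel.put(label, ids);
        }
        Split s = split(docsByLabel, 20, 42L);
        for (String label : s.train.keySet()) {
            System.out.println(label + ": train=" + s.train.get(label).size()
                    + " test=" + s.test.get(label).size());
        }
    }
}
```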

Referenced question: Mahout 0.9: Using own test set instead of using split command

Richard Hanson | 17 Nov 14:49 2014

Incremental refresh() with Psql

Hello, I'm trying to use Mahout with PostgreSQL via the JDBC DataModel (ReloadFromJDBCDataModel wrapped around PostgreSQLBooleanPrefJDBCDataModel), and I want to read only new rows from the database when calling refresh(). Is that possible? I have seen that the MongoDB implementation does something along those lines, and FileDataModel does too, via delta files.
The reason is that reading the whole database each time refresh() is called is very time-consuming, partly because the database is not necessarily local.
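One common pattern for this is a high-water mark: remember the largest auto-increment id (or timestamp) already loaded, and on refresh fetch only rows above it. A plain-Java sketch of that pattern with an in-memory stand-in for the table; PrefRow, PrefSource and the query in the comment are hypothetical, and the sketch does not use Mahout's DataModel API. A high-water mark only sees inserts, so deleted or changed preferences would still need an occasional full reload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IncrementalRefreshSketch {

    // Hypothetical row shape: auto-increment id plus a boolean (user, item) preference.
    static final class PrefRow {
        final long id;
        final long userId;
        final long itemId;

        PrefRow(long id, long userId, long itemId) {
            this.id = id;
            this.userId = userId;
            this.itemId = itemId;
        }
    }

    // Stand-in for the database. Against PostgreSQL this could be a query such as
    // "SELECT id, user_id, item_id FROM prefs WHERE id > ? ORDER BY id" (hypothetical schema).
    interface PrefSource {
        List<PrefRow> rowsAfter(long lastSeenId);
    }

    private final PrefSource source;
    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    private long lastSeenId = 0; // high-water mark: largest id loaded so far

    IncrementalRefreshSketch(PrefSource source) {
        this.source = source;
    }

    // Loads only rows newer than the high-water mark and merges them in.
    int refresh() {
        List<PrefRow> fresh = source.rowsAfter(lastSeenId);
        for (PrefRow r : fresh) {
            itemsByUser.computeIfAbsent(r.userId, k -> new HashSet<>()).add(r.itemId);
            lastSeenId = Math.max(lastSeenId, r.id);
        }
        return fresh.size();
    }

    Set<Long> itemsFor(long userId) {
        return itemsByUser.getOrDefault(userId, Collections.emptySet());
    }

    public static void main(String[] args) {
        List<PrefRow> table = new ArrayList<>();
        PrefSource source = after -> {
            List<PrefRow> out = new ArrayList<>();
            for (PrefRow r : table) {
                if (r.id > after) {
                    out.add(r);
                }
            }
            return out;
        };
        IncrementalRefreshSketch model = new IncrementalRefreshSketch(source);
        table.add(new PrefRow(1, 10, 100));
        table.add(new PrefRow(2, 10, 101));
        System.out.println("first refresh loaded " + model.refresh() + " rows");
        table.add(new PrefRow(3, 11, 100));
        System.out.println("second refresh loaded " + model.refresh() + " rows");
        System.out.println("user 10 items: " + model.itemsFor(10));
    }
}
```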

Thank you & regards,
Donni Khan | 17 Nov 14:01 2014

How to choose the initial clusters for k-means from TF-IDF vectors

Hi All,

I'm working on text clustering. I want to select specific documents (as
vectors) to be the centroids for k-means.
I have created the TF-IDF vectors for my dataset using Mahout, and I would like
to choose the initial clusters from those TF-IDF vectors.

Does anyone have an idea how I can do this with Mahout?
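As far as I know, Mahout's kmeans driver reads its initial centroids from the path given with --clusters (-c) and only writes random seeds there when --numClusters (-k) is also given, so hand-picked TF-IDF vectors would have to be written to that path in the format the driver expects (worth confirming against the 0.9 KMeansDriver). The plain-Java sketch below only illustrates seeding k-means with chosen documents; seedIndices, the cosine distance and the toy vectors are illustrative assumptions, not Mahout's API.

```java
import java.util.Arrays;

public class SeededKMeansSketch {

    // Cosine distance = 1 - cosine similarity; a common choice for TF-IDF vectors.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 1.0;
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    // k-means seeded with the documents at seedIndices instead of random picks.
    // Returns the cluster index assigned to each document.
    static int[] cluster(double[][] docs, int[] seedIndices, int maxIterations) {
        int k = seedIndices.length;
        int dim = docs[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = docs[seedIndices[c]].clone();
        }
        int[] assignment = new int[docs.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each document joins its nearest centroid.
            boolean changed = false;
            for (int d = 0; d < docs.length; d++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (cosineDistance(docs[d], centroids[c]) < cosineDistance(docs[d], centroids[best])) {
                        best = c;
                    }
                }
                if (best != assignment[d]) {
                    assignment[d] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no document moved
            }
            // Update step: each centroid moves to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int d = 0; d < docs.length; d++) {
                sizes[assignment[d]]++;
                for (int i = 0; i < dim; i++) {
                    sums[assignment[d]][i] += docs[d][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] > 0) {
                    for (int i = 0; i < dim; i++) {
                        centroids[c][i] = sums[c][i] / sizes[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy "TF-IDF" vectors over a three-term vocabulary.
        double[][] docs = {
            {1.0, 0.2, 0.0},
            {0.9, 0.0, 0.1},
            {0.0, 0.1, 1.0},
            {0.1, 0.0, 0.8},
        };
        int[] seeds = {0, 2}; // hand-picked: document 0 and document 2
        System.out.println(Arrays.toString(cluster(docs, seeds, 10)));
    }
}
```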

Many thanks in advance.