Yang | 31 Oct 00:37 2014

NaN produced by SSVD?

We are running SSVD on a relatively small dataset (8000 rows, 64
columns). We ran it with rank = 58, since the sampling parameter is p=5.

The result had NaN in multiple columns.

Why would this appear?

I am now running with a lower rank (20) to see if it goes away.

Thanks
Yang
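
For anyone reproducing this, here is a minimal Python sketch (not Mahout code) of two quick checks: how wide the projection k + p is relative to the smaller matrix dimension, and which output columns contain NaN. It assumes that p in the post is the oversampling parameter; the function names and the list-of-rows matrix format are made up for illustration.

```python
import math

def projection_width(rank_k, oversampling_p):
    """Width of the random projection in a randomized SVD: k + p.

    Assumption: p is the oversampling parameter mentioned in the post.
    """
    return rank_k + oversampling_p

def nan_columns(rows):
    """Return the sorted indices of columns that contain at least one NaN."""
    bad = set()
    for row in rows:
        for j, value in enumerate(row):
            if math.isnan(value):
                bad.add(j)
    return sorted(bad)

# Numbers from the post: 8000 x 64 input, rank 58, p = 5.
width = projection_width(58, 5)
print(width, "of", min(8000, 64), "available dimensions")

# Hypothetical 2 x 3 output matrix, only to show the NaN scan.
sample = [[0.1, float("nan"), 0.3],
          [0.4, 0.5, 0.6]]
print(nan_columns(sample))
```

With the numbers from the post, k + p is 63, just under the 64 columns. Whether that near-full width is related to the NaN values is not established here.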
Sean Farrell | 30 Oct 07:15 2014

Mahout documentation

Hi,

I'm new to Mahout and am currently coming to grips with the various
algorithms. Is there a repository somewhere of documentation that explains
what each of the parameters in each of the algorithms does? I know there is
a basic helpfile for each task, but I'm looking for something a bit more in
depth. For example, I'm trying to work out what setting the partial
implementation (i.e. -p) flag does in the buildforest task.

Thanks,

Sean
Tom LAMPERT | 28 Oct 22:21 2014

Lucene version compatibility

Hello,

Does anyone know whether it is possible to upgrade the version of Lucene that Mahout's lucene.vector and
lucene2seq functions are compatible with? Currently they require Lucene 4.6.1 indexes, but this
version is already quite dated...

Kind regards,

Tom
Niko Gamulin | 28 Oct 21:54 2014

Spectral clustering issue

Hi,

I have tried to run the spectral clustering example
<http://mahout.apache.org/users/clustering/spectral-clustering.html> and
got the following error:

Exception in thread "main" java.io.FileNotFoundException: No such file
or directory 'hdfs://172.31.16.9:9000/user/hadoop/temp/calculations/unitvectors'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:759)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:507)
	at org.apache.mahout.clustering.spectral.kmeans.EigenSeedGenerator.buildFromEigens(EigenSeedGenerator.java:68)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:244)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:127)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:118)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:69)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

I tried to run it on Amazon EMR using versions 0.8 and 0.9, and then
a version compiled from the repository.

Does anyone know whether there is a bug in the code, or whether the EMR
cluster is not configured correctly?

Regards,
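
One detail visible in the trace above: the missing path uses the hdfs:// scheme, while the top frames belong to S3NativeFileSystem and EmrFileSystem. The Python sketch below only makes that observation mechanical. It is not a diagnosis, and the helper name is hypothetical.

```python
from urllib.parse import urlparse

def scheme_appears_in_frame(path, top_frame):
    """True if the path's URI scheme occurs in the top stack frame's name.

    A rough heuristic for illustration only, not a Hadoop API.
    """
    scheme = urlparse(path).scheme
    return scheme in top_frame.lower()

# Path and top frame copied from the trace in the post.
path = "hdfs://172.31.16.9:9000/user/hadoop/temp/calculations/unitvectors"
frame = "com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus"
print(scheme_appears_in_frame(path, frame))
```

The check is deliberately crude: it compares a scheme string with a class name, so it can only flag that the path and the filesystem class look like they belong to different storage backends.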

Marko Dinić | 28 Oct 15:35 2014

Mahout 0.9 on Hadoop 0.20.2

Hello,

I have a Hadoop cluster on which Hadoop 0.20.2 is installed. Is there a
way to use Mahout 0.9 on that cluster?

I understand that Mahout 0.9 is based on Hadoop 1.2.1, but I have this
constraint, so I cannot install another version of Hadoop on it.

Thanks,
Marko

sleefd | 27 Oct 15:47 2014

Re: Re: compatibility of hadoop and mahout version

You have to keep your installed Hadoop version consistent with the version defined in pom.xml or the Hadoop
in lib/hadoop. If not, you have to compile a new Mahout distribution for your
Hadoop version. This can be done with a Maven command.

Sent from a Samsung mobile device

-------- Original message --------
From: jyotiranjan panda <tell2jyoti <at> gmail.com>
Date: 2014-10-27 PM6:08 (GMT+08:00)
To: user <at> mahout.apache.org
Subject: Re: compatibility of hadoop and mahout version

Thanks Suneel,
I have now changed the Hadoop version from 2.3.0 to 1.2.1 and am getting a new error.
-----------------------------------------------------------------------------------------------------------------
SLF4J: slf4j-api 1.6.x (or later) is incompatible with this binding.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x.
Exception in thread "main" java.lang.NoSuchMethodError:
org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
    at org.slf4j.LoggerFactory.bind(LoggerFactory.java:128)
    at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:107)
    at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:295)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:269)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
    at org.apache.mahout.common.AbstractJob.<clinit>(AbstractJob.java:90)
    at
com.jyoti.mahout.HelloWorldClustering.main(HelloWorldClustering.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

jyotiranjan panda | 27 Oct 06:34 2014

compatibility of hadoop and mahout version

Hi,
I just started learning Mahout last week.
I am facing a lot of problems running the sample examples in Mahout. Before
I ask about those errors, I want to confirm the version compatibility of
the daemons.

I am using Apache Hadoop 2.3.0 with mahout-distribution-0.9 on a 32-bit
Ubuntu 14.04 laptop.
Running the mahout command on its own gives no errors and lists all
the valid program names as output. But executing a clustering example
gives the error below.

---------------------------------------------------------------------------
hduser <at> localhostjp:/usr/local/hadoop$ hadoop jar mahouttest.jar
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 3 more
-----------------------------------------------------------------------------------------------------------

Regards
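
The trace above says org.apache.mahout.math.Vector could not be found when mahouttest.jar was run. A quick way to see whether a given jar bundles that class is to list its entries. The Python sketch below does that with the standard zipfile module; the helper name is hypothetical, and the demo builds a throwaway jar rather than reading the poster's file.

```python
import os
import tempfile
import zipfile

def jar_contains_class(jar_path, class_name):
    """Check whether a jar (a zip archive) has an entry for the given class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Demo on a throwaway jar that holds only a made-up application class.
demo_dir = tempfile.mkdtemp()
demo_jar = os.path.join(demo_dir, "demo.jar")
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr("com/example/Hello.class", b"")

has_app_class = jar_contains_class(demo_jar, "com.example.Hello")
has_mahout_vector = jar_contains_class(demo_jar, "org.apache.mahout.math.Vector")
print(has_app_class, has_mahout_vector)
```

If the check comes back false for the application jar, the Mahout classes have to reach the classpath some other way; which way suits this setup is not settled in the excerpt above.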

Niko Gamulin | 26 Oct 22:19 2014

Problems with K-Means Spectral Clustering on EMR

Hi,

I tried to run the spectral clustering example from the Mahout website on EMR.

I uploaded the following files to the bucket:
affinity.txt (affinity matrix)
mahout-core-0.9-job.jar
mahout-core-0.9.jar
update-lucene.sh
lucene-4.3.0.tgz

The update-lucene.sh contains the following:

#!/bin/bash
cd /home/hadoop
wget https://s3.amazonaws.com/hellomahout/lucene-4.3.0.tgz
tar -xzf lucene-4.3.0.tgz
cd lib
rm lucene-*.jar
cd ..
cd lucene-4.3.0
find . | grep lucene- | grep jar$ | xargs -I {} cp {} ../lib

The Cluster configuration is the following:

Hadoop Distribution: Amazon, AMI version: 3.2.1

EC2 instance types:
Master: m1.large, 1
Core: m1.large, 1

Gourav Khaneja | 24 Oct 19:00 2014

Dimensions with value "Zero" (0) are not appearing in the kmeans cluster output

Hello,

I have a set of 10-dimensional vectors which I want to group into
clusters. I ran the mahout kmeans clustering program as follows:

$ mahout kmeans --input input/  --output output/ --clusters clusters/ -k 20
-xm sequential --maxIter 10000 -ow  -cd 0.0000000000005

It produces clusters as follows:

gourav <at> mustang2:~$ mahout clusterdump -i output/clusters-*-final/ -o dump;
cat dump

VL-422383{n=29

                            c=[93.241, 0.241, 187383906066.860, 0.070,
0.057, 0.042, 0.000]

                            r=[237.392, 0.625, 29412153437.220, 0.236,
0.036, 0.049, 0.001]}

VL-344819{n=133921

                            c=[50.032, 775.298, -0.000, 300288032.310,
-0.043, 0.031, 0.016, 0.000]

                            r=[233.523, 142338.059, 0.007, 92781073.166,
0.267, 0.026, 0.018, 0.000]}

VL-344939{n=3

Hersheeta Chandankar | 24 Oct 17:58 2014

Categorization of documents using clustering and classification

Hi,

I have a collection of crawled text documents on different topics which I
want to categorize into pre-decided categories like travel, sports,
education, etc.
For this I first clustered these documents using k-means clustering
and then built a complementary naive Bayes model from the clustered
documents.
The accuracy and reliability of the model were 83% and 63% respectively.
The problem is that, on deploying the model, the results are
absurd
(e.g. a sports document is categorized under the business category).
On analyzing the problem, I found that the clusters formed were not clean
(they contained unrelated documents), which may have led to the creation
of a wrong dictionary file.

To avoid this, is there any other way to get the input data
preprocessed and clustered?
Or is there an alternative approach that could be used for the
categorization?

Thanks,
-Hersheeta
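
To put a number on how "clean" the clusters are before training on them, one option is to hand-label a small sample and compute per-cluster purity, i.e. the share of the most common label in each cluster. A minimal Python sketch follows; the function name and the sample data are made up for illustration and do not come from the post.

```python
from collections import Counter, defaultdict

def cluster_purity(assignments):
    """Map each cluster id to the fraction of its documents that carry
    the cluster's most common label.

    assignments: iterable of (cluster_id, label) pairs.
    """
    labels_by_cluster = defaultdict(list)
    for cluster_id, label in assignments:
        labels_by_cluster[cluster_id].append(label)
    purity = {}
    for cluster_id, labels in labels_by_cluster.items():
        top_count = Counter(labels).most_common(1)[0][1]
        purity[cluster_id] = top_count / len(labels)
    return purity

# Hypothetical hand-labeled sample: (cluster id, true category).
sample = [(0, "sports"), (0, "sports"), (0, "business"), (0, "sports"),
          (1, "travel"), (1, "education")]
result = cluster_purity(sample)
print(result)
```

Clusters with low purity mix several categories, so using the cluster id as a training label would feed mislabeled documents to the classifier.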
Benjamin Eckstein | 24 Oct 00:51 2014

Invoking Mahout 0.9 with Lucene 4.6.1 ClassNotFoundException

Hello, I have two lines of code that produce a ClassNotFoundException
