Re: RowSimilarity API -- illegal argument exception from org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio()
Pat Ferrel <pat <at> occamsmachete.com>
2015-07-17 17:21:51 GMT
I use it as a trait that your Scala main object (or wherever your entry point is) extends. Then just call the
method to get a SparkContext and MahoutDistributedContext created and made available as implicits.
This code creates a Spark context, so call it before any distributed operations that need a context, and do not
create a context separately. You can just copy the code if you don’t want to use the trait.
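The trait pattern described above can be sketched roughly as follows. This is a minimal, self-contained sketch, not the snippet referred to in this thread: `DistributedContext`, `DistributedContextProvider`, `createContext`, `Ops.describe`, and `RowSimMain` are hypothetical names, and the context class is a plain stand-in so the example runs without Spark or Mahout on the classpath. In a real app, the body of `createContext` would create the Spark-backed context through Mahout (for example via `mahoutSparkContext` in `org.apache.mahout.sparkbindings`; check its parameters for your Mahout version).

```scala
// Hypothetical stand-in for a distributed context. In a real Mahout app this
// would be the Spark-backed context created through Mahout's Spark bindings.
final class DistributedContext(val master: String, val appName: String)

// Mix this trait into your main object. Call createContext() once, before any
// distributed operation; downstream code then picks the context up implicitly.
trait DistributedContextProvider {
  private var ctx: Option[DistributedContext] = None

  // Creates the context on the first call; later calls return the same
  // instance, so the program never ends up with a second, separate context.
  def createContext(master: String, appName: String): DistributedContext = {
    if (ctx.isEmpty) ctx = Some(new DistributedContext(master, appName))
    ctx.get
  }

  // Exposed as an implicit so methods taking an implicit context find it.
  implicit def context: DistributedContext =
    ctx.getOrElse(throw new IllegalStateException("call createContext() first"))
}

// Hypothetical operation standing in for a library call that needs a context.
object Ops {
  def describe(label: String)(implicit dc: DistributedContext): String =
    s"$label on ${dc.master} (${dc.appName})"
}

// Entry point: extending the trait brings the implicit context into scope.
object RowSimMain extends DistributedContextProvider {
  def main(args: Array[String]): Unit = {
    createContext("local[4]", "RowSimTest")
    println(Ops.describe("rowSimilarity")) // resolves `context` implicitly
  }
}
```

Making `createContext` idempotent mirrors the advice above: the context is created once, up front, and every later distributed call reuses it instead of building its own.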
BTW, a new version of item and row similarity is going into the master branch this weekend; it should run a
fair bit faster. The master now runs on Spark 1.3.1 and even 1.4, and includes many optimizations in the
On Jul 17, 2015, at 7:03 AM, Hegner, Travis <THegner <at> trilliumit.com> wrote:
I appreciate your trying to figure it out. I have also been unable to reproduce this error when using a local
(even threaded) master. I have only gotten it to occur when running on the YARN cluster. I am actually in
the process of building a new Hadoop/YARN/Spark cluster from scratch, and will test it out there as well. My
old cluster is up to date, but has been upgraded many times. Perhaps I'll have better luck with the new one.
I'm a little confused about where to put the snippet you provided, or how to use it (sorry, still new to Scala). Can you
describe that in the context of the RowSimTest project on GitHub? Maybe even send a pull request to it if you are
really feeling generous! Just something to give me an idea of how to integrate it; even if it doesn't work, I can
figure it out from there. I can then apply it to my actual codebase and see if it makes a difference with a full dataset.
For the time being, I have reverted to swapping my <tag>, <document_ids> pairs in a map and running them through
cooccurrencesIDSs(), just to move forward with my project. I do want to get this solved, though, for the
betterment of the community.
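One reading of the workaround above is inverting tag-to-documents pairs into document-to-tags rows before handing them to a cooccurrence calculation. The sketch below shows only that swap, using plain Scala collections. The input shape, the `tagToDocs` sample data, and the `invert` helper are hypothetical illustrations, not code from the thread, and the Mahout call itself is not shown.

```scala
// Assumed input shape: each tag paired with the IDs of the documents carrying it.
val tagToDocs: Map[String, Seq[String]] = Map(
  "scala" -> Seq("d1", "d2"),
  "spark" -> Seq("d2", "d3")
)

// Swap every (tag, docId) pair to (docId, tag), then group the tags by document.
def invert(m: Map[String, Seq[String]]): Map[String, Seq[String]] =
  m.toSeq
    .flatMap { case (tag, docs) => docs.map(doc => (doc, tag)) }
    .groupBy(_._1)
    .map { case (doc, pairs) => doc -> pairs.map(_._2) }

val docToTags = invert(tagToDocs)
println(docToTags("d2")) // tags attached to document d2
```

With documents as rows and tags as columns, cooccurrence between columns then relates tags that appear on the same documents. On a cluster, the same swap would be an RDD `map` over the pairs followed by a grouping step.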