Mahout Clustering Help Please
David Kaplan <davkap92 <at> gmail.com>
2015-08-12 12:49:29 GMT
I hope someone can point me in the right direction. I'm very new to Mahout.
Here's my scenario:
I have written a system that uses Scrapy to collect classified-ad items
(phones, cars, antiques and many more) from multiple websites. All the
items are then ingested into Solr, roughly 3 million entries, which serves
as the backend for my search engine.
I want to extract meaningful information from this data, for example to
calculate realistic average prices. I need guidance, and perhaps examples,
on accurate outlier detection, categorization, etc. I'm an extreme beginner
in machine learning, so I need to know whether that is what I should be
using.
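One non-ML baseline for a "realistic average" is the median combined with a median-absolute-deviation (MAD) filter (the "modified z-score"), which tolerates a few absurd prices far better than the mean does. The sketch below is plain Python and assumes the prices for one search have already been pulled out of Solr into a list. The function name `robust_price_summary` is hypothetical, and the 3.5 cutoff is a commonly cited default for the modified z-score, not a value tuned for classifieds data.

```python
from statistics import median


def robust_price_summary(prices, threshold=3.5):
    """Filter outliers with the modified z-score (median/MAD).

    Returns (median of kept prices, kept prices, rejected prices).
    threshold=3.5 is an assumed, commonly used cutoff.
    """
    if not prices:
        raise ValueError("no prices given")
    med = median(prices)
    # Median absolute deviation: a spread estimate that a few extreme
    # listings cannot inflate the way they inflate the standard deviation.
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # At least half the prices equal the median; nothing to score against.
        return med, list(prices), []
    kept, rejected = [], []
    for p in prices:
        # 0.6745 rescales MAD so the score is comparable to an ordinary
        # z-score when the data are roughly normal.
        score = 0.6745 * (p - med) / mad
        (rejected if abs(score) > threshold else kept).append(p)
    return median(kept), kept, rejected
```

With prices `[650, 700, 720, 680, 15, 690, 9000]` the plain mean is about 1636, while this sketch returns a median of 690 and rejects 15 and 9000.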
Part of my challenge is the broad range of items and categories, each with
a different degree of skew. For example, it is hard to find outliers in
"iphone" results when many of them are cheap iPhone accessories.
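Any outlier rule that assumes a single price population struggles here: when phones and accessories share the query "iphone", the price distribution has two humps, and the "outliers" are really a different category. A minimal sketch of grouping by category before summarizing, assuming each listing is available as a hypothetical `(category, price)` pair; the function name, labels and prices are illustrative only.

```python
from collections import defaultdict
from statistics import median


def median_price_by_category(listings):
    """Group (category, price) pairs and return {category: median price}.

    The (category, price) input shape is an assumption, e.g. built from
    Solr documents that already carry a category field.
    """
    groups = defaultdict(list)
    for category, price in listings:
        groups[category].append(price)
    return {cat: median(prices) for cat, prices in groups.items()}
```

For made-up "iphone" listings with phone prices 650, 700, 720 and accessory prices 10, 12, 15, 20, the overall median is 20 (an accessory price), while grouping gives 700 for phones and 13.5 for accessories. Outlier filtering can then run within each group.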
Basically, it seems I need to cluster or classify, but I'm not sure exactly
how to go about it, because I already have categories for 500K of the
entries, for example "Cell Phones & Accessories - Accessories".
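Because 500K entries already carry a category, this looks closer to supervised classification (train on the labeled entries, then predict categories for the rest) than to clustering. Naive Bayes over title words is a common first text classifier, and Mahout has shipped a Naive Bayes implementation; the sketch below is a self-contained pure-Python illustration of the technique, not Mahout's API. The class name, the tokenizer and the toy training titles and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes over title words with add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()
        for title, label in zip(titles, labels):
            words = tokenize(title)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, title):
        words = tokenize(title)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Work in log space: log prior + sum of log word likelihoods.
            score = math.log(count / self.n_docs)
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing a class.
                numerator = self.word_counts[label][w] + 1
                score += math.log(numerator / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on toy titles such as "iphone 6 case leather" (accessory) and "iphone 5s 16gb black" (phone), it labels "iphone 6 silicone case" as an accessory: the shared word "iphone" carries little signal, while "case" tips the balance. On real data, predicted categories would feed the per-category price summaries rather than replace them.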
And then there is the question of actually connecting Mahout to Solr...