1 Aug 2011 05:29
Re: Parallel FPGrowth driver - what is a good demo?
Lance Norskog <goksron <at> gmail.com>
2011-08-01 03:29:05 GMT
I've rewritten the FPGrowth wiki page. It is still a bit ragged. Please critique for content.

https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining

On Thu, Jul 28, 2011 at 12:59 AM, Lance Norskog <goksron <at> gmail.com> wrote:
> Ok, now I've succeeded in running fpgrowth, both sequential and
> mapreduce, from the 'fpg' job and the flag that chooses between
> 'sequential' and 'mapreduce'. I've done this with two different
> datasets, accidents.dat and retail.dat. I only ran the first thousand
> lines of both datasets for time reasons.
>
> Both sequential and mapreduce locate the same ids as being in
> patterns. Examining the patterns in detail, they do not match, but
> patterns involving id X are generally the same size. Successive runs
> of each variant give exactly the same results, so having sequential
> and mapreduce give different result sets is puzzling. Pulling the
> distances is a little difficult with text processing.
>
> What can account for the different outputs of map/reduce and
> sequential (pseudo-distributed) modes?
>
> On 7/27/11, Lance Norskog <goksron <at> gmail.com> wrote:
>> I'll prep a current version.
>>
>> On 7/27/11, Robin Anil <robin.anil <at> gmail.com> wrote:
>>> On Tue, Jul 26, 2011 at 11:06 PM, Lance Norskog <goksron <at> gmail.com>