Re: Stream Classification of Data Set
2008-11-01 00:08:16 GMT
> Hi. I'm a student new to machine learning and interested in dealing with a
> large data set. I've read the FAQ article concerning the topic
> (http://weka.sourceforge.net/wiki/index.php/Classifying_large_datasets), and
> I'm interested in stream, rather than batch methods of dealing with this
> problem. You mention that the UpdateableClassifier interface handles this
> but considering that this pretty much only covers Lazy and Bayesian
> classifiers I'm guessing this is actually intended for incremental
> classifiers (ie. train, use, then train some more), a subset of classifiers
> that don't require the full data set to reside in memory. Unless I'm
> mistaken several other algorithms such as neural networks, not to mention
> the humble conjunctive rule should be able to handle input sequentially
> (without random access), and hence have a 'buildClassifier' method that
> handles a stream of data to reduce the memory burden.
>
> I thought this was rather basic so I've been scrounging around for a couple
> days looking for such capabilities. Before I start mucking around the source
> or resort to MOA I was wondering if:
> a. I'm wrong and all non-Bayesian and non-Lazy classifiers truly need random
> access to the training data or...

Yes, most of the classifiers need that kind of access (or at least their
current implementations do), because they compute per-attribute statistics
such as infogain over the whole training set.

> b. Weka already has these basic capabilities and I'm just not spotting them

Incremental classifier support is there; what's needed are contributions
of classifiers that implement it.

> or...
> c. Weka simply lacks such capabilities.
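
To make the distinction in this thread concrete, here is a minimal sketch of a learner that sees each training instance exactly once, in order, and never holds the data set in memory. It is plain Java and deliberately does not use Weka: the class and method names (StreamingPerceptronSketch, OnlinePerceptron, trainFromStream) and the "f1,f2,...,label" line format are assumptions made for illustration only. In Weka itself, the corresponding hook is the updateClassifier(Instance) method of the UpdateableClassifier interface mentioned in the question.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;

// Hypothetical sketch: none of these names come from Weka.
class StreamingPerceptronSketch {

    /** A linear two-class model that is updated one instance at a time. */
    static class OnlinePerceptron {
        private final double[] weights; // last slot holds the bias term

        OnlinePerceptron(int numFeatures) {
            weights = new double[numFeatures + 1];
        }

        /** Returns 1 if the weighted sum is positive, otherwise 0. */
        int predict(double[] x) {
            double sum = weights[weights.length - 1];
            for (int i = 0; i < x.length; i++) {
                sum += weights[i] * x[i];
            }
            return sum > 0 ? 1 : 0;
        }

        /** Perceptron rule: weights change only when the prediction is wrong. */
        void update(double[] x, int label) {
            int error = label - predict(x);
            if (error == 0) {
                return;
            }
            for (int i = 0; i < x.length; i++) {
                weights[i] += error * x[i];
            }
            weights[weights.length - 1] += error;
        }
    }

    /**
     * Reads "f1,f2,...,label" lines sequentially and feeds each one to the
     * model. Only the current line is held in memory. Returns the number of
     * instances seen.
     */
    static int trainFromStream(BufferedReader in, OnlinePerceptron model) {
        int seen = 0;
        Iterator<String> lines = in.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(",");
            double[] x = new double[parts.length - 1];
            for (int i = 0; i < x.length; i++) {
                x[i] = Double.parseDouble(parts[i].trim());
            }
            int label = Integer.parseInt(parts[parts.length - 1].trim());
            model.update(x, label);
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        // Toy data standing in for a large file read from disk.
        String data = "2.0,1.0,1\n-1.0,-2.0,0\n1.5,0.5,1\n-2.0,-1.0,0\n";
        OnlinePerceptron model = new OnlinePerceptron(2);
        int seen = trainFromStream(new BufferedReader(new StringReader(data)), model);
        System.out.println("instances seen: " + seen);
        System.out.println("prediction for (1,1): " + model.predict(new double[] {1.0, 1.0}));
    }
}
```

The update step needs only the current instance and the current weights, which is why this kind of learner fits a stream. A learner that ranks attributes by infogain, by contrast, needs counts over all instances before it can make its first decision, so it cannot be driven this way without a different algorithm.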