Tricks to improve J48 model performance?
2007-07-31 23:06:07 GMT
Hello. First let me say that I'm something of a novice at data mining, so thank you in advance for bearing with me. I'm running J48 with default parameters on a table with a boolean dependent variable (the class) and 2954 independent variables (the attributes), all of which are integer or decimal values.
I've been adding new attributes to try to improve the model's performance, but the predictive accuracy never rises above 60%, even though the new attributes are being used in the tree. Now here's the weird part: if I look through the tree and remove attributes that don't seem to be predictive, the accuracy goes down.
I'm getting some good classification groups, so I should be able to get better than 60% accuracy. The fact that it sticks at 60% makes me think that I've hit some plateau with the J48 algorithm and my data set size. Maybe a change in parameters or algorithm would help? Or maybe I need to use guided attribute selection? WEKA is so full of options that I'm a little confused about where to start. Any suggestions or pointers to references would be greatly appreciated.
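To make the attribute selection idea concrete, here is a minimal sketch in plain Java of one possible screen: score each numeric attribute by the best information gain of a single threshold split against the boolean class, then rank the attributes by that score. The class `InfoGainRanker` and its methods are hypothetical names for illustration, not WEKA classes, and the column-per-attribute array layout is an assumption. J48 itself chooses splits by gain ratio and then prunes, so this is only a rough first pass, not what J48 does internally.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper (not part of WEKA): ranks numeric attributes by information gain. */
final class InfoGainRanker {

    /** Entropy in bits of a two-class sample with pos positives and neg negatives. */
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (pos == 0 || neg == 0) return 0.0; // pure or empty sample
        double p = (double) pos / n;
        double q = (double) neg / n;
        return -(p * Math.log(p) + q * Math.log(q)) / Math.log(2);
    }

    /** Best information gain over all binary splits "value <= threshold" of one attribute. */
    static double bestSplitGain(double[] values, boolean[] labels) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Visit instances in increasing order of the attribute value.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));

        int totalPos = 0;
        for (boolean b : labels) if (b) totalPos++;
        int totalNeg = n - totalPos;
        double base = entropy(totalPos, totalNeg);

        double best = 0.0;
        int leftPos = 0, leftNeg = 0;
        for (int k = 0; k < n - 1; k++) {
            if (labels[order[k]]) leftPos++; else leftNeg++;
            // A threshold can only sit between two distinct attribute values.
            if (values[order[k]] == values[order[k + 1]]) continue;
            int left = k + 1;
            int right = n - left;
            double conditional = (left * entropy(leftPos, leftNeg)
                    + right * entropy(totalPos - leftPos, totalNeg - leftNeg)) / n;
            best = Math.max(best, base - conditional);
        }
        return best;
    }

    /** Attribute indices from highest to lowest gain; columns[j] holds all values of attribute j. */
    static int[] rank(double[][] columns, boolean[] labels) {
        double[] gain = new double[columns.length];
        Integer[] idx = new Integer[columns.length];
        for (int j = 0; j < columns.length; j++) {
            gain[j] = bestSplitGain(columns[j], labels);
            idx[j] = j;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer j) -> gain[j]).reversed());
        return Arrays.stream(idx).mapToInt(Integer::intValue).toArray();
    }
}
```

Calling `InfoGainRanker.rank(columns, labels)` returns the attribute indices in descending order of gain, so one could retrain on only the top-ranked attributes and compare cross-validated accuracy. With 2954 attributes, any such ranking should be computed on the training folds only, otherwise the accuracy estimate will be optimistic.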
_______________________________________________ Wekalist mailing list Wekalist <at> list.scms.waikato.ac.nz https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist