Re: error measures of nominal attributes
P. Klaas-Welter <P.Klaas-Welter <at> web.de>
2005-07-01 09:35:42 GMT
Dear Eibe, dear Weka-users,
thank you very much! You were right: I was mistaken about the vectors. But I've finally got it :-)
So how should the values of the error measures be interpreted? Here are some assumptions based on what I have understood (from your book, this mailing list, etc.):
*) If the root mean squared error differs greatly from the mean absolute error, this could indicate that the data contains large and/or many outliers.
*) The relative absolute error compares the model's error to the error of a simple baseline (ZeroR for nominal classes).
If this value is >100%, the simple baseline does better.
*) If the root relative squared error is >100% and the relative absolute error is <100%, this indicates that the actual algorithm has more problems
with outliers than the simple baseline (ZeroR for nominal classes).
Am I right with these assumptions?
But there are some points that are not clear to me. Please have a look at my example:
Decision Table:
Options: -R -I
Number of training instances: 17177
Number of Rules : 5049
Non matches covered by IB1.
Best first search for feature set,
terminated after 5 non improving subsets.
Evaluation (for feature selection): CV (leave one out)
Feature set: 1,2,3,4,5
Correctly Classified Instances 5531 32.2019 %
Incorrectly Classified Instances 11645 67.7981 %
Kappa statistic 0.1378
Mean absolute error 0.0073
Root mean squared error 0.0663
Relative absolute error 88.7076 %
Root relative squared error 103.7543 %
Total Number of Instances 17176
About every third instance was correctly classified. But the Kappa statistic is quite bad (possible best value: 1, worst value: 0).
On the other hand, the mean absolute error is quite good (possible best value: 0, worst value: 1). How should this be interpreted?
With best wishes, Petra
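
For reference when reading the Kappa statistic in the output above, here is a minimal sketch of Cohen's kappa in its textbook form, computed from a confusion matrix. It is not taken from the Weka source; the function name `cohen_kappa` and the example matrix are hypothetical. Note that under this definition a kappa of 0 means agreement at chance level, and negative values are possible when the classifier agrees with the true classes less often than chance would.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of instances of actual class i
    that were predicted as class j (hypothetical layout).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement: expected diagonal mass if predictions were
    # independent of the actual classes, given the same marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2

    return (observed - expected) / (1 - expected)


# Made-up two-class example, not data from this thread.
example = [[20, 5],
           [10, 15]]
print(cohen_kappa(example))
```

Because kappa subtracts the agreement expected by chance, an accuracy of about one third can still correspond to a low kappa when some classes are much more frequent than others.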
Eibe Frank <eibe <at> cs.waikato.ac.nz> wrote on 30.06.05 23:10:27:
The formula for the sum is correct but it seems like you are misunderstanding how the two vectors are computed. One of the vectors contains the predicted class probabilities that are output by the model for a particular instance, the other vector contains the observed class probabilities for that particular instance. The latter(!) vector has one element that is 1 (the one for the actual class of the instance) and all other elements are 0.
Cheers, Eibe
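
The computation Eibe describes above can be sketched in Python as follows. This is an illustration under stated assumptions, not Weka's actual code: all function names are hypothetical, the baseline priors are passed in directly (in the thread they come from ZeroR's estimate on the training data), and the division by the number of classes m is an assumption about the normalisation. That factor cancels in the two relative measures either way.

```python
import math


def one_hot(actual, m):
    """Observed class probabilities: 1 for the actual class, 0 elsewhere."""
    return [1.0 if i == actual else 0.0 for i in range(m)]


def instance_errors(predicted, actual):
    """Absolute and squared (quadratic loss) error for one instance."""
    observed = one_hot(actual, len(predicted))
    abs_err = sum(abs(p - a) for p, a in zip(predicted, observed))
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, observed))
    return abs_err, sq_err


def error_measures(predictions, actuals, priors):
    """MAE, RMSE and their relative versions against a prior-only baseline.

    predictions: one predicted class-probability vector per instance
    actuals:     index of the actual class per instance
    priors:      baseline class probabilities (what ZeroR would output)
    """
    n, m = len(actuals), len(priors)
    model = [instance_errors(p, a) for p, a in zip(predictions, actuals)]
    base = [instance_errors(priors, a) for a in actuals]

    # Assumption: errors are averaged over instances and over the m classes.
    mae = sum(e[0] for e in model) / (n * m)
    rmse = math.sqrt(sum(e[1] for e in model) / (n * m))
    base_mae = sum(e[0] for e in base) / (n * m)
    base_rmse = math.sqrt(sum(e[1] for e in base) / (n * m))

    return {
        "mae": mae,
        "rmse": rmse,
        "rae": 100.0 * mae / base_mae,     # relative absolute error, in %
        "rrse": 100.0 * rmse / base_rmse,  # root relative squared error, in %
    }


# Made-up example with two classes and two instances.
result = error_measures(
    predictions=[[0.8, 0.2], [0.3, 0.7]],
    actuals=[0, 1],
    priors=[0.5, 0.5],
)
print(result)
```

A relative value above 100% means the model's error is larger than the error of the prior-only baseline on the same instances.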
On Jul 1, 2005, at 1:47 AM, P. Klaas-Welter wrote:
> I just noticed that the formula for the sum is not readable, therefore:
>
> d_j = \sum_{i=1}^{m} | p_i - a_{ji} |
> means: the sum from i=1 to m over | p_i - a_ji |
>
> "P. Klaas-Welter" <P.Klaas-Welter <at> web.de> wrote on 30.06.05 12:06:43:
>
> Dear Eibe,
>
> thank you very much! This was very helpful (and now I also found the
> right point in the book).
>
> Just to be sure, and because the error measures are so important, I
> would like to describe the other error values and ask you to check
> whether I'm right:
> Let the nominal attribute have m different values. Let the vector P
> contain all probabilities p_i that the nominal attribute has the value
> i. Those probabilities come from the frequencies of each value i. Let
> the vector A_j be the result from the model for the instance j. When k
> is the value that the model computed for instance j, then all entries
> in A_j are zero except a_jk, which is 1.
> To compute the mean absolute error you have to compute, for each
> instance, the absolute difference of vector P and vector A_j. This is
> done component-wise and is then summed up:
> d_j = \sum_{i=1}^{m} | p_i - a_{ji} |.
> These differences d_j (for the single instances) are then summed up
> over all instances and then divided by the number of instances.
>
> And for the root mean squared error, in d_j you don't take the absolute
> value but the square. And before dividing by the number of
> instances you take the square root.
>
> Thank you very much! And with best regards, Petra
>
> Eibe Frank wrote on 29.06.05 23:35:59:
> >
> > On Jun 30, 2005, at 12:50 AM, P. Klaas-Welter wrote:
> >
> > > Could someone please help me to understand the mean absolute error,
> > > root mean squared error, relative absolute error and root relative
> > > squared error of nominal attributes?
> > >
> > > I know that one can find this question several times in this
> > > mailing list. But none of these could really help me. Or does
> > > someone know where to find a comprehensive explanation?
> > >
> > > As far as I understood (with help from what I read from Eibe Frank):
> > > root relative squared error: Let Y be the root mean squared error
> > > that is computed for the single class prior probabilities
> > > (frequencies). These probabilities are estimated from the training
> > > data with a simple Laplace estimator. Let X be the root mean squared
> > > error that came from the prediction of the model. Then the ~ is
> > > 100 * X / Y.
> > > So what is done with the mean value for numerical classes is done
> > > with estimated probabilities for nominal classes. The same for the
> > > relative absolute error.
> >
> > Yes, that's correct. Y is the error obtained from the probability
> > estimates generated by ZeroR (which just estimates the prior
> > probabilities).
> >
> > The squared error for a particular instance is given by the
> > "quadratic loss" function mentioned in our book (where we talk about
> > evaluating probability estimates). It's the sum of the squared
> > differences between the predicted class probabilities for a
> > particular instance and the observed class probabilities for that
> > instance (which are either 0 or 1). The absolute error is computed in
> > the same way by taking the absolute value of each difference instead
> > of the square.
> >
> > Cheers,
> > Eibe
_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist