Re: Can't have more folds than instances
Sebastian Luna Valero <sebastian <at> uma.es>
2012-06-01 14:30:54 GMT
Hi Stelios,
I think that the message that you get from WEKA is correct. I will try
to explain myself but correct me if I am wrong, please.
If you have a set with 2 instances, U = {1, 2}, you can apply 2-fold
cross-validation like this:
1. Select Te1 = {1} as testing set and Tr1 = {2} as training set, and
obtain the classification results.
2. Select Te2 = {2} as testing set and Tr2 = {1} as training set, and
obtain the classification results.
3. Average results of the above classifications.
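The three steps above can be sketched in Python as follows. This is only an illustration, not WEKA code: `evaluate` is a hypothetical callback standing in for "train a classifier on the training set and score it on the testing set", and the scores returned by `toy_evaluate` are made up for the example.

```python
def cross_validate_two_instances(U, evaluate):
    """2-fold cross-validation on a set with exactly 2 instances.

    `evaluate(train, test)` is a hypothetical callback: it should train a
    classifier on `train`, classify `test` and return a score.
    """
    if len(U) != 2:
        raise ValueError("this sketch expects exactly 2 instances")
    scores = []
    for instance in sorted(U):
        test = {instance}            # Te_i: a single instance
        train = set(U) - test        # Tr_i = U - Te_i: the other instance
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)  # step 3: average the fold results


# Usage with a toy evaluator that records each split and returns made-up scores.
splits = []

def toy_evaluate(train, test):
    splits.append((test, train))
    return 1.0 if test == {1} else 0.5

mean = cross_validate_two_instances({1, 2}, toy_evaluate)
print(splits)  # [({1}, {2}), ({2}, {1})]
print(mean)    # 0.75
```

Note that each instance is used for testing exactly once, and the training set is always the rest of U.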
However, are you able to apply 3-fold cross-validation with U? The
answer should be "no", since you would need at least one more instance
to select a third testing set (and its training set) that is disjoint
from the ones already selected.
In general, you have the following:
In k-fold cross-validation, you need x >= k instances in the original
set U, since you select k disjoint sets for testing (Te1, ..., Tek) and
their corresponding training sets (Tr1 = U - Te1, ..., Trk = U - Tek).
In fact, when x = k you are actually applying leave-one-out
cross-validation (LOOCV).
On the other hand, if you have x < k instances in U, it is not
possible to select k disjoint, non-empty testing sets (and therefore k
testing/training partitions).
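The general rule can be sketched as a small Python function that builds the k (testing set, training set) pairs and refuses when k > x. The function name and the round-robin assignment of instances to folds are assumptions made for illustration (real tools usually shuffle the data first); the error text simply mirrors the subject line of this thread and is not claimed to be WEKA's exact message.

```python
def k_fold_splits(instances, k):
    """Return k (testing set, training set) pairs for k-fold cross-validation.

    The testing sets Te_1..Te_k are disjoint, non-empty and cover all
    instances; each training set is Tr_i = U - Te_i. Instances are dealt
    into folds round-robin without shuffling, purely for illustration.
    """
    items = sorted(instances)
    x = len(items)
    if k < 2:
        raise ValueError("need at least 2 folds")
    if k > x:
        # k disjoint, non-empty testing sets need at least k instances.
        raise ValueError(f"Can't have more folds ({k}) than instances ({x})")
    folds = [set(items[i::k]) for i in range(k)]  # round-robin assignment
    return [(test, set(items) - test) for test in folds]


print(k_fold_splits({1, 2}, 2))  # [({1}, {2}), ({2}, {1})]

try:
    k_fold_splits({1, 2}, 3)
except ValueError as err:
    print(err)  # Can't have more folds (3) than instances (2)

# With x = k every testing set holds exactly one instance: LOOCV.
loocv = k_fold_splits(range(1, 8), 7)
print([sorted(test) for test, _ in loocv])  # [[1], [2], [3], [4], [5], [6], [7]]
```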
Concretely, in your case you are applying 10-fold cross-validation to a
set that has fewer than 10 instances, and for the reason above that is
not possible. My advice is to count the number of instances in your
dataset (x) and then select the number of folds accordingly (x >= k).
For example, if you have 7 instances in your dataset, select k = 7, 6,
5, 4, 3 or 2 (depending on your needs).
Normally, when the number of instances is so low, I would recommend
LOOCV (i.e., k = x = 7 in the example above).
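The advice above amounts to clamping k to the number of instances: keep the preferred number of folds when there are enough instances, otherwise fall back to k = x (LOOCV). The function names and the default of 10 folds are assumptions for this sketch; it is not WEKA's implementation.

```python
def valid_fold_counts(x):
    """Every k for which k-fold cross-validation is possible on x instances."""
    return list(range(2, x + 1))


def choose_folds(x, preferred=10):
    """Use `preferred` folds if there are enough instances, else k = x (LOOCV)."""
    if x < 2:
        raise ValueError("need at least 2 instances for cross-validation")
    return min(preferred, x)


print(valid_fold_counts(7))  # [2, 3, 4, 5, 6, 7]
print(choose_folds(7))       # 7  -> LOOCV
print(choose_folds(25))      # 10
```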
I hope my explanation is clear.
Regards,
Sebastian.
On Fri, 1 Jun 2012 14:40:34 +0200, Stelios Togias wrote:
> Hi,
>
> The javadocs for EM clusterer for weka 3.6.6 state:
>
> _"The number of folds is fixed to 10, as long as the number of
> instances in the training set is not smaller 10. If this is the case
> the number of folds is set equal to the number of instances."_
>
> Shouldn't this mean I should not be receiving this error when I use
> fewer than 10 instances?
>
> On the other hand, I am not supplying a separate training and test
> set. Could this be the reason?
>
> Thanks
> Stelios
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html