1 Jun 16:19
[R-sig-eco] glm-model evaluation
> We've mostly gotten out of the area where I know enough statistically to
> speak with confidence, but I'll risk some lumps anyway...
>
> I always thought that the idea of retaining a portion of the data for
> validation was a good idea. I asked David Anderson about this personally
> and he said he couldn't see any reason to do that. Using likelihood, he
> thought the best approach was to use all the data to determine the best
> model.
>
> I'm pretty muddy on the difference between selecting a good model with AIC
> (which is sometimes referred to as being predictive in nature) and what is
> meant by post-hoc validation of predictive ability (aside from testing on
> another data set). I've often seen the "leave-one-out" approach used to
> "validate" a model. If anyone has a good reference that differentiates the
> two with an example, I'd really appreciate it.

I think it is a matter of principle. In my view, statistical inference theory only covers estimation of parameters and prediction of new data GIVEN a model, whereas model selection requires a larger theory. The AIC fits this view very well, since Akaike's theorem joins statistical inference theory with information theory. Together, these two theories provide the tools for model selection (or model identification, sensu Akaike).

I agree with Anderson: I would always use all my data to get the best fit of my model by likelihood. Cross-validation is ad hoc, whereas the AIC is grounded in solid theory.

Rubén
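For readers who want to see the two approaches from this thread side by side, here is a minimal sketch in Python with NumPy. It is not taken from the thread. It assumes simulated data and Gaussian linear models (a GLM with identity link) fitted by least squares, and the helper names `fit_ols`, `aic_gaussian` and `loo_mse` are made up for illustration. Both criteria are computed from a single fit to all the data: the AIC from the maximized likelihood, and the leave-one-out error from the leverage identity that holds for least squares.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients, residuals and leverages."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages are the diagonal of the hat matrix; with X = QR they are
    # the row sums of Q**2 (assumes X has full column rank).
    Q, _ = np.linalg.qr(X)
    lev = np.sum(Q**2, axis=1)
    return beta, resid, lev


def aic_gaussian(resid, n_coef):
    """AIC = 2k - 2 log L for a Gaussian model, using all the data."""
    n = resid.size
    sigma2 = np.mean(resid**2)          # maximum-likelihood variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = n_coef + 1                      # coefficients plus the variance
    return 2 * k - 2 * loglik


def loo_mse(resid, lev):
    """Leave-one-out mean squared prediction error for least squares.

    Uses the identity e_i / (1 - h_ii) for the deleted residual, so no
    refitting is needed.
    """
    return np.mean((resid / (1 - lev)) ** 2)


# Simulated data (an assumption for illustration): a quadratic signal.
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0, 0.5, n)

# Candidate models: polynomials of degree 1 to 5.
results = {}
for degree in range(1, 6):
    X = np.vander(x, degree + 1, increasing=True)
    beta, resid, lev = fit_ols(X, y)
    results[degree] = (aic_gaussian(resid, X.shape[1]), loo_mse(resid, lev))
    print(f"degree {degree}: AIC = {results[degree][0]:.2f}, "
          f"LOO MSE = {results[degree][1]:.4f}")
```

Lower is better for both columns. The sketch only shows how each quantity is obtained; it does not settle which criterion is preferable, and for non-Gaussian GLMs the leverage shortcut is not exact, so the leave-one-out error would have to be computed by refitting.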