Ruben Roa Ureta | 1 Jun 16:19

[R-sig-eco] glm-model evaluation

> We've mostly gotten out of the area where I know enough statistically to
> speak with confidence, but I'll risk some lumps anyway...
>
> I always thought that the idea of retaining a portion of the data for
> validation was a good idea. I asked David Anderson about this personally and
> he said he couldn't see any reason to do that. Using likelihood, he thought
> the best approach was to use all the data to determine the best model.
>
> I'm pretty muddy on the difference between selecting a good model with AIC
> (which is sometimes referred to as being predictive in nature) and what is
> meant by post-hoc validation of predictive ability (aside from testing on
> another data set). I've often seen the "leave-one-out" approach used to
> "validate" a model. If anyone has a good reference that differentiates the
> two with an example, I'd really appreciate it.

I think it is a matter of principles. In my view statistical inference
theory only covers estimation of parameters and prediction of new data
GIVEN a model, whereas model selection requires a larger theory. The AIC
fits very well in this view since Akaike's theorem joins statistical
inference theory with information theory. These two theories together
provide the tools to make model selection (or model identification, sensu
Akaike).
I agree with Anderson that I would always use all my data to best fit my
model with the likelihood. Cross-validation is ad hoc whereas the AIC is
grounded on solid theory.
Rubén
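
For readers who want to see the two approaches from this thread side by side, here is a minimal sketch in R. It is not taken from any of the posts: the data are simulated, and the names dat, m1, m2 and loo_deviance are made up for illustration. Both candidate models are fitted to all the data and compared by AIC, and the same two models are then scored by leave-one-out cross-validation.

```r
# Simulated data; every name here is illustrative, not from the thread.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x1))  # x2 has no true effect

# Candidate models fitted to ALL the data by maximum likelihood
m1 <- glm(y ~ x1,      family = binomial, data = dat)
m2 <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Information-theoretic comparison: the lower AIC is preferred
aic <- AIC(m1, m2)
print(aic)

# Leave-one-out cross-validation: refit without row i, then score row i.
# The response column is assumed to be called 'y'.
loo_deviance <- function(formula, data) {
  dev <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit <- glm(formula, family = binomial, data = data[-i, ])
    p <- predict(fit, newdata = data[i, , drop = FALSE], type = "response")
    dev[i] <- -2 * dbinom(data$y[i], 1, p, log = TRUE)
  }
  sum(dev)
}
loo <- c(m1 = loo_deviance(y ~ x1, dat),
         m2 = loo_deviance(y ~ x1 + x2, dat))
print(loo)
```

AIC needs one fit per model, whereas the cross-validated score needs n refits per model; whether the two rank the candidates the same way depends on the data.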

Re: [R-sig-eco] Publication quality graphics in R

[Apologies if this is a duplicate; I seem to be having e-mail problems.]

I've also had trouble dealing with formatting 
issues from R to a format acceptable for 
journals.  But I found a really useful 
recommendation from Cadmus, the art folks for 
PNAS.  Here's a useful site: 
http://cpc.cadmus.com/da/tutorials.jsp and 
http://art.cadmus.com/da/instructions/ps80_win.jsp 
They're not specific to R, but there's some good general advice.

I, too, tend to save images as a .pdf 
(specifying final size and resolution), and 
then convert to TIF or EPS using the following (also advised by Cadmus):

For EPS:
Once you have a PDF file, you can open it with 
the “full” version of Acrobat and then do a “Save 
as EPS” – or – you can open your PDF with Illustrator and then “Save as EPS.”

For TIF:
Open the PDF file from within Photoshop. This 
will allow you to determine resolution 
(typically, 600 DPI is ideal for most figures). 
While in Photoshop, go to the menu and click 
"Layer>Flatten Image", crop (trim) excess white 
space around the figure, scale it to the correct 
size, and then "Save As…" a TIF file using LZW (not JPEG or ZIP) compression.
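
As an alternative to the Photoshop route, R's own graphics devices can write the PDF and an LZW-compressed TIF directly at the final size. This is only a sketch: the file names, the 3.5 x 3 inch size and the helper draw_figure() are placeholders, and the journal's own specifications should override them.

```r
# File names, figure size and draw_figure() are placeholders.
out_dir   <- tempdir()
pdf_file  <- file.path(out_dir, "figure1.pdf")
tiff_file <- file.path(out_dir, "figure1.tif")

draw_figure <- function() {
  plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "y")
}

# Vector output at final size (inches)
pdf(pdf_file, width = 3.5, height = 3)
draw_figure()
dev.off()

# Raster output at final size, 600 dpi, LZW compression,
# only if this build of R supports the tiff device
if (capabilities("tiff")) {
  tiff(tiff_file, width = 3.5, height = 3, units = "in",
       res = 600, compression = "lzw")
  draw_figure()
  dev.off()
}
```

Drawing the same function on both devices keeps the vector and raster versions identical in content.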

This usually works for the journals I've dealt 

Jeff Laake | 2 Jun 07:50

Re: [R-sig-eco] AIC, R-Mark, and nest survival

This problem piqued my interest, so I read some of the papers.  It is 
fairly easy to use the logistic-exposure approach with glm to analyze 
nest success data. I replicated some simple models with the mallard data 
using logistic-exposure approach.  However, there is one advantage of 
the MARK approach over Shaffer's logistic-exposure approach which he 
discusses in his paper.  In MARK, the survival parameters are daily (or 
whatever time interval) so it is easy to create "continuous" time/age 
covariates whereas with the logistic-exposure model it is necessary to 
split the data into observation intervals and use the average time/age 
for the interval.  It is a minor nuisance to split up the data but using 
average values for observation intervals may not be reasonable if  there 
are long time intervals.

As Shaffer showed, the two approaches use the same likelihood and will give 
the same results if they use the same covariates. Choosing a method will 
depend on the flexibility you want. Ultimately, the most flexible 
approach would be to take the basic likelihood and write your own code 
in R.
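
A rough sketch of what "write your own code in R" could look like for the interval-survival likelihood: daily survival is logistic in a covariate, and the probability of surviving an interval of t days is the daily survival raised to the power t. Everything below is an assumption for illustration (simulated data, the variable names age_c, t_days and survived, the use of the interval-average covariate); it is not Shaffer's or MARK's code.

```r
# Simulated nest-check intervals; all names and values are illustrative.
set.seed(42)
n        <- 300
age_c    <- runif(n, -1.5, 1.5)              # centred, scaled mean age in interval
t_days   <- sample(1:7, n, replace = TRUE)   # interval length in days
beta_true <- c(2.5, 0.3)
s_daily  <- plogis(beta_true[1] + beta_true[2] * age_c)
survived <- rbinom(n, 1, s_daily^t_days)     # 1 = nest survived the interval

X <- cbind(1, age_c)

# Negative log-likelihood: P(survive interval) = (daily survival)^t
negloglik <- function(beta) {
  s     <- plogis(drop(X %*% beta))
  p_int <- s^t_days
  p_int <- pmin(pmax(p_int, 1e-12), 1 - 1e-12)  # guard against log(0)
  -sum(dbinom(survived, 1, p_int, log = TRUE))
}

start <- c(0, 0)
fit <- optim(start, negloglik, method = "BFGS", hessian = TRUE)
print(fit$par)                          # estimates on the logit scale
print(sqrt(diag(solve(fit$hessian))))   # approximate standard errors
```

Because the likelihood is written out explicitly, a covariate that changes daily within an interval could be handled by replacing s^t_days with a product of daily survival probabilities.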

--jeff
T. Avery | 2 Jun 14:16

Re: [R-sig-eco] Publication quality graphics in R

Phil's and others' suggestions for publication-quality graphics are all 
good ones. I would like to point out that The GIMP (www.gimp.org) for 
raster images, applications based on Ghostscript/Ghostview/GSview 
(http://pages.cs.wisc.edu/~ghost/) or those apps themselves, or 
applications like PDFCreator/PDFtoolkit/PDFtkBuilder for postscript and 
PDF manipulation, and, finally, Inkscape (www.inkscape.org) for vector 
images, will do everything that the expensive cousins (Acrobat, 
Illustrator, Photoshop) will do and more. Plus most are crossplatform 
and open-source. All are free!

The main trick in graphics is to create the graphic at the size and 
resolution required for final publication. That way the journal does not 
have to resize/resample (at least not too much) and the chance of 
messing up the graphics is reduced. Remember that screen resolution is 
72 dpi (96 dpi in some cases) and that print graphics are generally done 
at 300 dpi so your working image on screen will be 300/72=4.2 x the 
print size (at proper 300 dpi) - 72 or 96 really doesn't matter as it is 
the final graphic size/resolution that is key. A lossy format (jpg) will 
reduce quality because its compression algorithm approximates the image. 
The result is blurry images since a black line on a white background 
will have steps of grey produced by the algorithm (just zoom in to pixel 
size to see what I mean). gif compresses losslessly but is limited to 
256 colours, so it can degrade colour gradients. Choose a lossless 
format (tiff, png), or vector formats (pdf, ps, eps) that are also 
lossless (by virtue of being vectorized) to guarantee that what you 
intend is seen. And, finally, use formats that are 
crossplatform/universal so that everyone can get along. Remember that 
eps embeds a tiff image of low quality for viewing/positioning purposes 
when using a layout program so don't get fooled into thinking that is 
your image!
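
The size arithmetic above can be checked in a few lines of R. This is a sketch with an assumed final size of 3.5 x 3 inches; the file name is a placeholder.

```r
# Assumed final print size and resolution (placeholders)
dpi       <- 300
width_in  <- 3.5
height_in <- 3

# Pixel dimensions the raster file must have at that size and resolution
px <- c(width = width_in * dpi, height = height_in * dpi)
print(px)

# How much larger the image appears on a 72 dpi screen than in print
screen_ratio <- dpi / 72
print(round(screen_ratio, 2))

# Lossless raster output written at the final size
png_file <- file.path(tempdir(), "figure1.png")
if (capabilities("png")) {
  png(png_file, width = width_in, height = height_in, units = "in", res = dpi)
  plot(1:10, type = "l")
  dev.off()
}
```

Giving the device the size in inches together with res lets R work out the pixel dimensions, so text and line widths keep their intended physical size.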


Emmanuel Paradis | 2 Jun 15:34

Re: [R-sig-eco] GEE and AIC

On 30.05.2008 02:18, tgarland <at> ucr.edu wrote:
> This is not a problem for contrasts of GLS-type methods. It can be a
> pain in the a** to code a whole bunch of categorical variables as dummy
> variables and then compute contrasts (depending on your software), but
> it is not a problem from the perspective of the math/stats.

You're right to add "depending on your software" because it's very easy
with R (saving the user's a**), but few people seem to know it. The
function model.matrix is the tool here. Suppose you have a factor with
four levels and two repetitions for each level:

R> (x <- gl(4, 2))
[1] 1 1 2 2 3 3 4 4
Levels: 1 2 3 4

Then you just specify to model.matrix that x is a predictor in the usual
R formula notation:

R> model.matrix(~ x)
   (Intercept) x2 x3 x4
1           1  0  0  0
2           1  0  0  0
3           1  1  0  0
4           1  1  0  0
5           1  0  1  0
6           1  0  1  0
7           1  0  0  1
8           1  0  0  1
attr(,"assign")
[1] 0 1 1 1

Nicholas Lewin-Koh | 2 Jun 17:51

Re: [R-sig-eco] Publication quality graphics in R

Hi,
following this thread I have seen several misunderstandings that I think
should be cleared up. Firstly, we should be careful what is meant by
"publication quality". One interpretation is, for a particular journal,
a good-resolution graphic in the format they require. In general, the
meaning refers to the quality and portability of the graphic for publishing
in different media while retaining as much of the original detail as
possible.
Some journals require submission in MSworst. For importing graphics
into a word document, wmf is microsucks vector format, and is
probably the most suitable for most statistical graphics. For images
a bitmap format like png or tiff is most suitable. I would avoid jpeg,
as the main purpose of jpeg is compression. If you need to edit
a graphic outside R, wmf and svg will allow you to ungroup the graphics
components and edit them individually in most good drawing programs.
Personally I have had good experiences with svg and inkscape. For colour
graphics where colour gradients are important, I would recommend exporting
and viewing the graphics in a program with good colour management. R is
not tied to a colour management system and it is trial and error to
get colours printed correctly. There has been some discussion of
incorporating little cms, but that is probably a good "google summer of
code" project.

In regards to the post below, as of R 2.7, alpha blending is supported
on most devices if R was compiled with cairo. This is the case
for the windows distribution, and the default for configure when
compiling from source on linux.
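
A quick way to check for cairo support and try semi-transparent colours is sketched below. The file name and the plotted data are placeholders; the svg() device is used only as one example of a cairo-based device.

```r
# Does this build of R have cairo support?
has_cairo <- capabilities("cairo")
print(has_cairo)

# Semi-transparent colours: the last two hex digits are the alpha channel
semi_red  <- rgb(1, 0, 0, alpha = 0.4)
semi_blue <- rgb(0, 0, 1, alpha = 0.4)
print(c(semi_red, semi_blue))

# Overlapping point clouds drawn with alpha blending to an SVG file
svg_file <- file.path(tempdir(), "overlap.svg")
if (has_cairo) {
  set.seed(5)
  svg(svg_file, width = 4, height = 4)
  plot(rnorm(200), rnorm(200), pch = 16, col = semi_red,
       xlab = "x", ylab = "y")
  points(rnorm(200, 1), rnorm(200, 1), pch = 16, col = semi_blue)
  dev.off()
}
```

The resulting SVG can be opened in Inkscape, where the individual points can be ungrouped and edited.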


[R-sig-eco] Optimization problem.

Hi, 

I am using R to estimate historical salmon abundance (39 years) from 10 escapement series and harvest data.
I construct a likelihood function and minimize the negative log-likelihood (with parameter constraints)
using optim(). This estimates a total of 49 parameters. I do the same in EXCEL with Solver. 

My problem is that optim() is slower to converge and produces less accurate parameter estimates
than EXCEL with Solver. 

I would appreciate any suggestions to speed up convergence and produce better parameter estimates.   
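
Some settings that are often worth trying for a bounded fit of this kind are sketched below: a bounded quasi-Newton method instead of optim()'s default Nelder-Mead, parameter rescaling through parscale, and a larger iteration limit. The objective here is a small made-up stand-in (nll, obs and start are hypothetical), not the salmon likelihood, so this shows only the calling pattern and makes no claim about how the real problem will behave.

```r
# Toy stand-in for a negative log-likelihood; all names are hypothetical.
set.seed(7)
obs <- rlnorm(50, meanlog = log(2000), sdlog = 0.3)

nll <- function(par) {
  # par[1] = median abundance (large scale), par[2] = log-scale sd (small scale)
  -sum(dlnorm(obs, meanlog = log(par[1]), sdlog = par[2], log = TRUE))
}

start <- c(1000, 1)

fit <- optim(start, nll,
             method  = "L-BFGS-B",             # quasi-Newton with box constraints
             lower   = c(1, 0.01),
             upper   = c(1e6, 5),
             control = list(parscale = c(1000, 0.1),  # rough size of each parameter
                            maxit    = 1000))

print(fit$par)
print(fit$convergence)   # 0 indicates optim reported convergence
print(fit$message)
```

Supplying an analytic gradient through the gr argument, where one can be derived, usually helps more than any of these settings, because optim() otherwise falls back on finite differences.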

Here is the log-likelihood function 

nyear <- length(Year)

w <- c(3, 2, 1, 1, 1, 2, 3, 3, 2, 2, 0.5)

# Log-likelihood function
mylikelihood <- function(x) {
  Nh <- x[1:nyear]
  q1 <- x[nyear + 1]
  q2 <- x[nyear + 2]
  q3 <- x[nyear + 3]
  q4 <- x[nyear + 4]

Kingsford Jones | 2 Jun 23:10

Re: [R-sig-eco] glm-model evaluation

I was hoping that someone well versed in the theory at the interface
of statistics and machine learning would take over, but since there
were no responders I'll give it a go, relying heavily on a quick
re-reading of Ch 7 of:

@book{hastie2001esl,
  title={{The Elements of Statistical Learning: Data Mining,
Inference, and Prediction}},
  author={Hastie, T. and Tibshirani, R. and Friedman, J.},
  year={2001},
  publisher={Springer}
}

I'll make a few comments in-line below, and then discuss some of the
main issues as I understand them.  I'll try to wrap it all up so we
stay relevant to the original question.

On Fri, May 30, 2008 at 9:15 PM, David Hewitt <dhewitt37@...> wrote:
>
> We've mostly gotten out of the area where I know enough statistically to
> speak with confidence, but I'll risk some lumps anyway...
>
> I always thought that the idea of retaining a portion of the data for
> validation was a good idea. I asked David Anderson about this personally and
> he said he couldn't see any reason to do that. Using likelihood, he thought
> the best approach was to use all the data to determine the best model.

I agree that all of the data should be used to fit the best model, but
ideally not all of it used to select the best model.
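
One way to read "select on part of the data, fit on all of it" is sketched below. The data are simulated and the names (candidates, holdout_dev, final_fit) are illustrative; the 140/60 split is arbitrary.

```r
# Simulated data; names and the split size are illustrative only.
set.seed(3)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.3 + dat$x1))

# Hold out 60 rows for model selection
train_idx <- sample(n, size = 140)
train <- dat[train_idx, ]
valid <- dat[-train_idx, ]

candidates <- list(small = y ~ x1, large = y ~ x1 + x2)

# Predictive deviance of each candidate on the held-out rows
holdout_dev <- sapply(candidates, function(f) {
  fit <- glm(f, family = binomial, data = train)
  p   <- predict(fit, newdata = valid, type = "response")
  -2 * sum(dbinom(valid$y, 1, p, log = TRUE))
})
print(holdout_dev)

# Refit the selected model to ALL the data for final estimates
best      <- names(which.min(holdout_dev))
final_fit <- glm(candidates[[best]], family = binomial, data = dat)
print(coef(final_fit))
```

With a single split the choice can depend on which rows happen to be held out, which is one reason repeated or k-fold splits are often preferred.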


Simon Blomberg | 3 Jun 03:16

Re: [R-sig-eco] GEE and AIC

On Mon, 2008-06-02 at 15:34 +0200, Emmanuel Paradis wrote:
[snip]

> R> model.matrix(~ x)[, -1]
>    x2 x3 x4
> 1  0  0  0
> 2  0  0  0
> 3  1  0  0
> 4  1  0  0
> 5  0  1  0
> 6  0  1  0
> 7  0  0  1
> 8  0  0  1

Or even just model.matrix(~ x-1)
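
A short check suggests the two calls are not interchangeable. Dropping the intercept column keeps treatment coding with the first level as the reference (three columns), whereas removing the intercept from the formula gives one indicator column per level (four columns). The object names treat and ind are just for illustration.

```r
x <- gl(4, 2)

# Treatment contrasts with the intercept column dropped: level 1 is the reference
treat <- model.matrix(~ x)[, -1]
print(colnames(treat))
print(dim(treat))

# No-intercept formula: one 0/1 indicator column for every level
ind <- model.matrix(~ x - 1)
print(colnames(ind))
print(dim(ind))
```

Which form is wanted depends on whether the downstream method expects a reference level or a full set of indicators.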

Simon.

-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au


Andrew Rominger | 3 Jun 06:38

[R-sig-eco] Inference, logistic regression

Dear list,

Please pardon this beginner's-level question; I feel it's not quite up  
to the same caliber as recent discussions.

I'm working with a simple logistic regression model comparing the  
presence/absence of an insect species against an index of plant  
species turnover:

> foo<-glm(bout.psol$pres.de~bout.psol$index,family=binomial)

The term bout.psol$pres.de is binary 0,1; and bout.psol$index is continuous.

I'd like to use a likelihood ratio statistic to test the significance  
of this regression, but I'm a little uncertain as to how to proceed.   
When I call summary(foo), I get...

Call:
glm(formula = bout.psol$pres.de ~ bout.psol$index,
     family = binomial)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-1.7180  -1.1289   0.6314   1.0323   1.7499

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           0.30584    0.23095   1.324  0.18542
bout.psol.edit$index  0.04552    0.01439   3.163  0.00156 **
---
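
For a model of this form, a likelihood ratio test against the intercept-only model can be sketched as follows. The data frame bout below is simulated as a stand-in for bout.psol, so the numbers will not match the output above; the column names pres and index are assumptions.

```r
# Simulated stand-in for the poster's data; names are assumptions.
set.seed(11)
bout <- data.frame(index = runif(120, 0, 40))
bout$pres <- rbinom(120, 1, plogis(0.3 + 0.045 * (bout$index - 20)))

# Using data = keeps the coefficient names short (index, not bout$index)
foo  <- glm(pres ~ index, family = binomial, data = bout)
null <- glm(pres ~ 1,     family = binomial, data = bout)

# Likelihood ratio (deviance) test of the fitted model against the null model
lrt <- anova(null, foo, test = "Chisq")
print(lrt)

# The same statistic by hand: drop in deviance, referred to chi-squared
G  <- foo$null.deviance - foo$deviance
df <- foo$df.null - foo$df.residual
p  <- pchisq(G, df = df, lower.tail = FALSE)
print(c(G = G, df = df, p = p))
```

The z value in the summary table is a Wald test of the same hypothesis; the two tests usually agree closely in large samples but need not give identical p-values.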