mujeeb rahman | 1 May 05:30

Count data

Hi All
I am currently working on soil fauna, specifically the abundance of soil
macrofauna from different habitats. The abundance (count) data are
overdispersed and contain many extreme values (zeroes as well as counts
in the 100s).

The data contain the abundance of soil fauna from 15 land-use types (falling
under four ecosystems); for each land-use type I have 4 spatially
different plots. In each plot, 4 random soil monoliths were taken and
the soil fauna were counted.

I think it is a nested model: monoliths are nested in plots, and plots are
nested in land use. Poisson, negative binomial (NBD) or zero-inflated NBD
models may be appropriate, but must be validated. Is there any provision to
conduct such an analysis in R? If so, please give me details of packages
or examples, if any.

Any reply would be greatly appreciated.

Thanking all
Mujeeb Rahman P
KFRI, Peechi, Thrissur, Kerala, India
Roy Sanderson | 1 May 12:14

Multivariate proportional response data

Hello All

I have been given a set of proportion data, that consists of three
variables that sum to 1.0 that are the response, with two explanatory
variables (one the day of the experiment, 1 to 30, the other a two-level
treatment factor).

Given that the three response variables are non-independent, what is the
best approach to analysing them?  I'm reluctant to arcsine-transform them
and analyse each one independently, and the raw data are not available, only
the proportions.  Presumably some sort of multinomial technique might be
appropriate?

Many thanks for any hints.

Roy

-- 
Roy Sanderson
School of Biology
Devonshire Building
Newcastle University
Newcastle upon Tyne NE1 7RU
r.a.sanderson@...
0191 246 4835
Thomas Petzoldt | 1 May 13:09

Re: Count data

Hi Mujeeb,

I'm not yet completely sure about the particular question of your work, 
but you may want to read the following paper that appeared in the Journal of 
Statistical Software two months ago. It describes a package for 
organizing and processing ecological count data and also contains an 
overview of other ecologically relevant packages:

Solymos, P. (2009) Processing Ecological Data in R with the mefa 
Package. Journal of Statistical Software Vol. 29, Issue 8, Feb 2009.
http://www.jstatsoft.org/v29/i08

Note that JSS is an ISI listed OpenAccess Journal, so you can access the 
PDFs freely without cost.

Hope it helps.

Thomas Petzoldt

mujeeb rahman wrote:
> Hi All
> I am currently working on soil fauna, abundance of soil macrofauna
> from different habita. Due to the overdispersion, the abundance (count
> data) contains many extreme values (zeroes as well as 100s..!!!).
> 
> The data contains abundance soil fauna from 15 landuse type (coming
> under four ecosystems), for each landuse type I have 4 spatially
> different plots. I each plot 4 random soil monoliths are taken and
> soil fauna were counted.
> 

Devin Johnson | 1 May 18:53

Re: Multivariate proportional response data

Roy,
You have compositional data. Check out any references by John  
Aitchison. He has a book "The Statistical Analysis of Compositional  
Data" and a host of journal articles. The main approach is to use a  
multivariate logit (additive log-ratio) transform
[y_1,...,y_{k-1}] = [log(x_1/x_k),...,log(x_{k-1}/x_k)]
where [x_1,...,x_k] is your "sum-to-one" composition. Then analyze  
[y_1,...,y_{k-1}] using multivariate linear models.
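
The transform above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the compositions are simulated (they are not Roy's data), the names `day`, `treatment`, `alr1` and `alr2` are made up for the example, and the third component is arbitrarily chosen as the reference x_k. The log-ratio also assumes that no component is exactly zero.

```r
# Simulated stand-in data: 3-part compositions, 30 days x 2 treatments (hypothetical)
set.seed(1)
day <- rep(1:30, times = 2)
treatment <- factor(rep(c("control", "treated"), each = 30))
raw <- matrix(rexp(length(day) * 3), ncol = 3)  # strictly positive values
x <- raw / rowSums(raw)                         # each row now sums to 1

# Additive log-ratio transform, third component as the reference x_k
y <- log(x[, 1:2] / x[, 3])
colnames(y) <- c("alr1", "alr2")

# Multivariate linear model: a matrix response gives an "mlm" fit
fit <- lm(y ~ day + treatment)
summary(fit)

# Back-transform fitted log-ratios to compositions that sum to 1
fitted_alr <- fitted(fit)
fitted_comp <- cbind(exp(fitted_alr), 1) / (1 + rowSums(exp(fitted_alr)))
head(fitted_comp)
```

Multivariate tests for the explanatory variables could then be obtained with `anova(fit)`; the choice of reference component changes the coefficients but not the back-transformed fitted compositions.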

--Devin

On May 1, 2009, at 3:14 AM, Roy Sanderson wrote:

> Hello All
>
> I have been given a set of proportion data, that consists of three
> variables that sum to 1.0 that are the response, with two explanatory
> variables (one the day of the experiment, 1 to 30, the other a two-level
> treatment factor).
>
> Given that the three response variables are non-independent, what is the
> best approach to analysing them?  I'm reluctant to arcsine them, and
> analyse each one independently and the raw data are not available, only
> the proportions.  Presumably some sort of multinomial technique might be
> appropriate?
>

Thomas Petzoldt | 4 May 09:58

Re: Simultaneous estimation of nonlinear models

Hi Fred,

the new development package "FME" (Flexible Modelling Environment) 
contains (among others) functions to support fitting nonlinear models. 
It is focused on numerically solved differential equation models in 
ecology and related sciences, but it also works for other models.

The package has not yet been uploaded to CRAN but can be installed from the 
R-Forge development server:

 >  install.packages("FME", repos="http://r-forge.r-project.org")

Note that the latest binary version requires R version 2.9.0 (or 
newer).

Look at the first example ("Fitted with analytical solution") for 
function "modFit" or read chapter "Fitting the model to data" of the 
package vignette:

http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/pkg/FME/inst/doc/FME.pdf?root=fme

Hope it helps

Thomas Petzoldt

Fred Takahashi wrote:
 > Hello
 >
 > I work with plant ecophysiology and use R for about 5 years. Now I
 > need to fit nonlinear data of response of photosynthetic rate to CO2

Cortney Watt | 6 May 15:54

Statistical question regarding interaction terms

Hi All

I am a master’s student and I have a statistical question.  I have an
experiment evaluating the effects of intertidal elevation (fixed; 3 levels)
(iz) and seaweed canopy cover (fixed; 2 levels) (c_nc) on species richness
(richness). Quadrats are randomly distributed across 5 sites (random factor)
(site), with 4 replicates in each elevation and canopy treatment combination
per site; therefore, I am using a nested model.  The model is described by
Underwood in his Experiments in Ecology textbook (1997, page 367) and is:

Source                                   df   Mean square denominator

Intertidal zone                           2   S(I*C)
Canopy cover                              1   S(I*C)
Intertidal zone * Canopy cover            2   S(I*C)
Site (Intertidal zone * Canopy cover)    24   Error
Error                                    90
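
One possible way to set up a table like this in R is with aov() and an Error() stratum. The sketch below is an illustration only: the richness values are simulated, the data frame `design` and the identifier `site_id` (one unique label per site within each zone-by-canopy combination) are hypothetical, and the variable names `iz`, `c_nc`, `site` and `richness` follow the abbreviations given above.

```r
# Simulated stand-in data: 3 zones x 2 canopy levels x 5 sites x 4 quadrats
set.seed(42)
design <- expand.grid(rep  = 1:4,
                      site = 1:5,
                      c_nc = c("canopy", "no_canopy"),
                      iz   = c("low", "mid", "high"))

# Sites are nested in each zone-by-canopy cell: give every site a unique label
design$site_id <- interaction(design$iz, design$c_nc, design$site, drop = TRUE)

# Made-up response: overall mean + random site effect + quadrat-level noise
site_effect <- rnorm(nlevels(design$site_id), sd = 1)
design$richness <- 10 + site_effect[as.integer(design$site_id)] +
  rnorm(nrow(design), sd = 2)

# Fixed effects are tested against the site stratum, sites against the residual
fit <- aov(richness ~ iz * c_nc + Error(site_id), data = design)
summary(fit)
```

With a balanced layout like this, the site stratum should have 24 residual degrees of freedom and the within stratum 90, matching the table.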

The usual procedure is to run this “main effects” model and, when there is a
significant interaction term, to run simple effects at each level of A)
intertidal zone and B) canopy cover using this model (as an example for each

Philip Dixon | 7 May 12:37

Re: Statistical question regarding interaction terms (Cortney Watt)

Cortney,

There are at least two possible explanations:
1) Tests of interactions are less powerful than tests of simple effects (i.e. 
tests of canopy cover in each zone).  This is easy to see by defining 
contrasts among cell means in the 2 x 2 table of means (zone * canopy) and 
calculating the se of a simple effect and the se of a difference in simple 
effects (i.e. the interaction).
2) The denominator variances, site(I*C), are not similar in the three zones.  
The test of interaction in the overall anova is based on a pooled site(I*C) 
variance.  The analysis of each separate zone estimates three site(I*C) 
variances.  If the site(I*C) variance is large at your third elevation and 
smaller at the other two elevations, the tests of canopy effects can give 
different p-values, even if the canopy effects (differences in means) are 
similar in the three zones.
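
Point 1 can be checked with a few lines of R. This is a sketch under simplifying assumptions (independent cell means, a common variance and equal replication); the values of `sigma2` and `n` are arbitrary and only there to make the arithmetic concrete.

```r
# Hypothetical values, for illustration only
sigma2 <- 4   # common variance of the units averaged within a cell
n <- 5        # units (e.g. sites) per cell

# Standard error of a contrast among independent cell means
se_contrast <- function(w, sigma2, n) sqrt(sigma2 * sum(w^2) / n)

# Cell order: zone 1 canopy, zone 1 no canopy, zone 2 canopy, zone 2 no canopy
simple_effect <- c(1, -1,  0, 0)   # canopy effect within zone 1
interaction   <- c(1, -1, -1, 1)   # difference between the two simple effects

se_simple <- se_contrast(simple_effect, sigma2, n)
se_inter  <- se_contrast(interaction, sigma2, n)
se_inter / se_simple   # the interaction contrast has the larger standard error
```

Under these assumptions the interaction contrast involves four cell means instead of two, so its standard error is larger by a factor of sqrt(2), which is why an interaction test can be non-significant while some simple effects are significant.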

Best wishes,
Philip Dixon
Capelle, Jacob | 12 May 17:57

testing for distribution

Dear all,

I have a somewhat theoretical question which I hope might interest you and with which you can hopefully help me a bit.

In order to obtain ecological (survey) data, I am trying to predict the accuracy of a sampling
tool for estimating mussel density. For this reason I took a lot of samples at a fixed location and
counted the number of mussels in each sample. Because mussels are aggregated on the sediment, I had a lot of
zero values. To estimate the sample size I used a negative binomial distribution and obtained the k value and the mu
from fitdistr(x,"negative binomial") (MASS).

The question I have is: how can I test whether this distribution accurately describes my (zero-inflated count) data?

I am a bit familiar with the AIC but since I only have counts on one variable I cannot perform a GLS. 
Creating a vector with rnbinom() using the k and mu from the fitdistr() I plotted a histogram and compared it
with my data; this showed that it was roughly comparable, but I want to quantify this.

I have a biological background, not a statistical one, so I realize I may be asking silly questions.
But I hope someone can give me some hints.

Kind regards,

Jacob Capelle

PhD student
Wageningen Imares
The Netherlands
jacob.capelle@...
Manuel Spínola | 13 May 13:13

Re: testing for distribution

Dear Jacob,

Maybe you can use cluster sampling or adaptive cluster sampling  
(design-based estimation) to get a density estimate.
Best,

Manuel Spínola

Capelle, Jacob wrote:
> Dear all,
>  
> I have a kind of a theoretical question from which I hope it might interest you and hopefully can help me a bit.
>  
> In order to obtain ecological (surrvey) data, I try to make a prediction about the accuracy of a sampling
> tool to estimate mussel density. For this reason I took a lot of samples at a certain fixed location and
> counted the amount of mussels in each sample. Because mussels are aggregated on the sediment, I had a lot of
> zero values. To estimate the sample size I used a binomial distribution and obtained the k value and the mu
> from the fitdistr(x,"negative binomial") (MASS).
>  
> The question I have is: how can I test if this distribution accurately described my (zero inflated count) data?
>  
> I am a bit familiar with the AIC but since I only have counts on one variable I cannot perform a GLS. 
> Creating a vector with rnbinom() using the k and mu from the fitdistr() I plotted a histogram and compared
> it with my data, this showed that is was roughly comparable, but I want to quantify this.
>  
> I have a biological background not a statistical one, so I realize I can ask silly questions.
> But I hope someone can give me some hints. 
>  
> Kind regards,
>  

Erika Mudrak | 13 May 20:17

Re: testing for distribution

Jacob-  You can use a Chi-squared goodness-of-fit test, chisq.test(), for discrete distributions like the
negative binomial, and a Kolmogorov-Smirnov test, ks.test(), for continuous distributions.  Both
produce a p-value for the null hypothesis that your data come from the given distribution
with the stated parameters.  Use the parameter estimates from your fitdistr() results. If p>0.05 (or 0.1 or
whatever), you have no evidence that your data depart from that distribution; this does not prove that they come from it.

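For discrete distributions, try something like: 
fit <- fitdistr(ActualData, "negative binomial")   # fitdistr() is in MASS
vals <- 0:max(ActualData)
obs <- table(factor(ActualData, levels = vals))    # observed frequency of each count
p <- dnbinom(vals, size = fit$estimate["size"], mu = fit$estimate["mu"])
p[length(p)] <- 1 - sum(p[-length(p)])             # last class absorbs the upper tail
chisq.test(x = obs, p = p)
# Caveat: the p-value ignores that size and mu were estimated; sparse classes may need pooling
# This is akin to quantitatively comparing your histograms...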

For continuous distributions (such as beta), the code would be this: 
fit=fitdistr(...)
ks.test(ActualData, "pbeta", shape1=fit$estimate[1],shape2=fit$estimate[2])
# I've done this successfully

You can use AIC to compare whether another distribution fits your data better than the negative binomial does.  I think
it's possible for your data to "pass" the Chi-squared/Kolmogorov-Smirnov test for two different
distributions, but one will fit better than the other. 
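
As a rough sketch of such an AIC comparison, the example below fits a Poisson and a negative binomial distribution to the same counts with fitdistr() from MASS. The counts are simulated and the object names (`counts`, `fit_pois`, `fit_nb`) are made up for the example; with real data you would replace `counts` with your own vector.

```r
library(MASS)

# Simulated overdispersed counts with many zeros (hypothetical stand-in data)
set.seed(123)
counts <- rnbinom(200, size = 0.5, mu = 5)

# Fit both candidate distributions by maximum likelihood
fit_pois <- fitdistr(counts, "Poisson")
fit_nb   <- fitdistr(counts, "negative binomial")

# Lower AIC indicates the better-supported distribution
AIC(fit_pois, fit_nb)
```

For strongly aggregated counts like these, the negative binomial should have a much lower AIC than the Poisson. AIC only ranks the candidates against each other; it does not say whether the best one fits well in absolute terms.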

Erika Mudrak

-------------------------------------------
Erika Mudrak
Graduate Student
Department of Botany
University of Wisconsin-Madison
430 Lincoln Dr
Madison WI, 53706

