Agustin Lobo | 4 Nov 14:18

[Fwd: Component merge in hclust objects]

-------- Original Message --------
Subject: Component merge in hclust objects
Date: Mon, 03 Nov 2008 10:39:57 +0100
From: Agustin Lobo <aloboaleu@...>
Reply-To: Agustin.Lobo@...
To: r-help@... <r-help@...>

Having a non-standard clustering problem, I'm writing an ad hoc
procedure in R, but I would like my object to be a list like
the one produced by hclust. I have a question regarding the component
merge:

While the meaning of the negative elements is clear, I'm
confused about the following sentence in the hclust help page:
"If j is positive then the merge was with the cluster formed at the
(earlier) stage j of the algorithm"

As an example, I've made:

> hc <- hclust(dist(USArrests[1:10,]), "ave")
> hc$merge
       [,1] [,2]
  [1,]   -1   -8
  [2,]   -3   -5
  [3,]   -6  -10
  [4,]   -4    3
  [5,]    1    4
  [6,]   -2    2
  [7,]   -9    6
  [8,]    5    7
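
One way to check the reading of the help page is to expand each row of merge back into the observations it contains. The following is a minimal sketch under that reading, not code taken from hclust itself; the helper name members_at is hypothetical.

```r
# Same example as above: average linkage on the first 10 rows of USArrests
hc <- hclust(dist(USArrests[1:10, ]), "ave")

# Hypothetical helper (not part of the stats package): returns the
# observation indices contained in the cluster formed at stage i.
members_at <- function(merge, i) {
  unlist(lapply(merge[i, ], function(j) {
    if (j < 0) {
      -j                      # negative entry: single observation -j
    } else {
      members_at(merge, j)    # positive entry: cluster built at earlier stage j
    }
  }))
}

# Row 4 of the matrix above is (-4, 3): observation 4 joins the cluster
# formed at stage 3, which itself merged observations 6 and 10.
members_at(hc$merge, 4)

# The last stage should contain every observation exactly once.
sort(members_at(hc$merge, nrow(hc$merge)))
```

So a positive j in row i is a pointer to row j of the same matrix, and following the pointers recursively recovers the full membership of any cluster.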

HolyI N.M | 7 Nov 05:24

Academic assistance

Greetings to all R Helpers out there.
Please, I am new to this software. In attempting to analyse my data (attached here) I had the following problems:
1. Species richness could be calculated on separate plots but not on all of them. However, an accumulation curve could be obtained;
2. No diversity index could be calculated;
3. PCA, CCA, clustering, spatial distribution of spp. wrt environmental variables, spp. abundance, etc.
could not be calculated.
The complaint I got, which I could not solve, was: "Warning:1 variables of the community dataset (out of a total
of 367) are factors".
Please, could someone kindly help me to correct this so I can finish my work on time?
Thanks for your time.

Innocent Ndoh Mbue
Institute of Ecology and
Environmental Sciences
s/c
International corporation office
China University of Geosciences
388 lumo road; 430074, Wuhan
Phone:0086 027 67885947/0086 13419615739
holyi@... 
Alt n.mholyi@... 
S.Q: meet her 
My Ans.:im schule

      
site	% Clay	% Sand	% Silt	% Ca	% Mg	% K	% Na	% OC	% sat. bases	% OM	P (mg/kg)	Effective CEC	N (g/kg)	pH-water	pH-KCl	C/N	H + Al (meq/100 g)	Elevation (m)
PF01	8	84	16	52.7	16.22	29.39	1.69	1.94	37	1.36	2.55	3.1	0.6	4.69	4.01	32	0.14	168
PF02	16	69	30	60.37	17.07	21.34	1.22	2.11	34	1.36	3.38	3.4	1.38	4.52	3.88	15	0.12	163


Re: Academic assistance

HolyI,

To get help with something like this it is best to include the code 
along with a short description of what you are trying to do. Without 
that we are just guessing.

-Chris

HolyI N.M wrote:
> Greetings to all R Helpers out there.
> Please, I am new to this software. In attempting to analyse my data (attached here) I had the following problems:
> 1. Species richness could be calculated on separate plots but not on all of them. However, an accumulation curve could be obtained;
> 2. No diversity index could be calculated;
> 3. PCA, CCA, clustering, spatial distribution of spp. wrt environmental variables, spp. abundance, etc.
> could not be calculated.
> The complaint I got, which I could not solve, was: "Warning:1 variables of the community dataset (out of a
> total of 367) are factors".
> Please, could someone kindly help me to correct this so I can finish my work on time?
> Thanks for your time. 
>
>
> Innocent Ndoh Mbue
> Institute of Ecology and
> Environmental Sciences
> s/c
> International corporation office
> China University of Geosciences
> 388 lumo road; 430074, Wuhan
> Phone:0086 027 67885947/0086 13419615739
> holyi@... 

Joe Simonis | 7 Nov 16:53

Question About Syntax For Complex ANOVA Design

Hey Everyone,

    I'm helping a friend out with analyzing some of her data, and I 
haven't run an ANOVA like this in a while, and especially not in R.  I'm 
having a bit of trouble figuring out the correct syntax and so I was 
hoping to get feedback.  Any input would be welcomed.  As of now, I also 
don't have the data, but I've been told that sample size should be equal 
for all of the combinations (although that may not be true).  In any 
case, for now, let's assume all sample sizes are equal.

    The basics of the mensurative experiment are as follows:

    The study was looking at variation in physiological values (HSP) of 
intertidal mussels across a few different sites at three different times 
of year.  The sampling was done in New Zealand, with 2 sites sampled on 
each of the East and West coasts, and within each site, there were two 
sampling points (mussel bed location, MBL), one low in the intertidal and 
one high in the intertidal.  There are two levels of MBL (low and high) 
at each site and there are two sites for each coast.  I see this as MBL 
nested in site, nested in coast.  However, it seems to me that only site 
is a random factor.  Both MBLs were picked specifically at that site and 
were done so in a way to compare high to low locations, so that seems 
fixed to me.  Site was picked more to look at site-to-site variation 
(i.e. random factor).  And coasts were explicitly being compared (i.e. 
fixed factor). 

    So, I see that as a fixed factor nested in a random factor  nested 
in a fixed factor.  Does that make sense?  And then there's the bit 
about repeated measures, since they sampled mussels from each MBL 3 
times.  I don't think that necessarily complicates things too much, but 

Mike Dunbar | 7 Nov 17:27

Re: Question About Syntax For Complex ANOVA Design

Hi Joe

I think the command you want is probably simpler than you think:

 lme(HSP~coast*MBL, random= ~1|site)     
or
 lme(HSP~coast+MBL, random= ~1|site)     

coast and MBL have distinct levels so are fixed and site is random as you say.
Having site as random will take into account that there are repeated measures through time at each point (MBL).
Each site has two points (MBLs). lme will treat coast and MBL correctly provided that they are coded
correctly. You can check this by running the analysis and looking at the degrees of freedom for the fixed
effects. It's best not to think too hard about the structure in terms of fixed/random/fixed, even though
other stats packages might encourage this. Think of the random effects (including the error term) as
providing the structure and the fixed effects as slotting into that structure accordingly.
If you write random = ~time|site then you are saying random slopes for the time fixed effect, i.e. there is an
overall time trend and each site responds differently around that trend. I don't think this is what you
want as you don't specifically mention time trends.
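
As a rough illustration of the degrees-of-freedom check, here is a sketch on simulated data. The data frame d, the site labels, the effect sizes and the seed are all made up for the example; only the model formula comes from the text above.

```r
library(nlme)

set.seed(1)  # arbitrary seed, for reproducibility only

# Made-up balanced design: 4 sites (2 per coast), 2 MBLs per site,
# 3 sampling times treated here as replicates
d <- expand.grid(time = 1:3,
                 MBL  = c("low", "high"),
                 site = c("A", "B", "C", "D"))
d$coast <- factor(ifelse(d$site %in% c("A", "B"), "East", "West"))

# Simulated response: a site-level random shift plus a high-MBL effect
site_shift <- rnorm(4, sd = 2)
d$HSP <- 10 + 2 * (d$MBL == "high") +
  site_shift[as.integer(d$site)] + rnorm(nrow(d), sd = 1)

# Random intercept per site; coast and MBL as fixed effects
m1 <- lme(HSP ~ coast * MBL, random = ~ 1 | site, data = d)

# coast varies only between sites, so its denominator DF should be
# smaller than that of MBL, which varies within sites
anova(m1)
```

If coast ended up with the same denominator degrees of freedom as MBL, that would suggest the grouping or the coding of the data is not what was intended.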

Or if time of year is a factor, something like
lme(HSP~coast+MBL+time, random= ~1|site)    
But the problem here is that you may run out of replication to estimate any of the fixed effects, because each
combination of coast, MBL, site and time is unique. 

Also, a big BUT: even if you allow the three time samples to be replicates,
you are potentially going to have an issue if you only have four sites. This is not a lot to estimate a random
effect. One option is to treat site as fixed. There is an argument that site is indeed random, so it should be
treated as random, in which case I'm not sure that lme (or the newer lmer) will handle the full uncertainty
for small sample sizes correctly. To do that would need a more fully Bayesian approach. But I'm writing
that from memory, I don't have the reference to hand. 

stephen sefick | 7 Nov 17:35

Re: Academic assistance

We need to see how your data are set up; fake data with the problem built in would do.
My first guess, though, would be that (3) one variable is a factor: run
str(your.data.frame) and see.  This may or may not solve all of your problems.
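
A minimal sketch of that check, on a made-up data frame (comm, sp1 and sp2 are invented names, and the stray "x" stands in for whatever non-numeric entry might be hiding in the real file):

```r
# Toy community table in which one species column was read as a factor
# because of a single non-numeric entry
comm <- data.frame(sp1 = c(3, 0, 5),
                   sp2 = c("1", "x", "4"),
                   stringsAsFactors = TRUE)

str(comm)                         # sp2 shows up as a Factor

# Names of all factor columns
is_fac <- sapply(comm, is.factor)
names(comm)[is_fac]

# Convert via character, not directly, so that the original values are
# kept and not the internal level codes; entries that are not numbers
# become NA (with a warning), which points to the offending cells
comm$sp2 <- as.numeric(as.character(comm$sp2))
which(is.na(comm$sp2))
```

Once the offending cells are located, it is usually better to correct them in the source file and read the data in again.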

Stephen

On Fri, Nov 7, 2008 at 10:45 AM, Christian A. Parker <cparker@...> wrote:
> HolyI,
>
> To get help with something like this it is best to include the code along
> with a short description of what you are trying to do. Without that we are
> just guessing.
>
> -Chris
>
> HolyI N.M wrote:
>>
>> Greetings to all R Helpers out there.
>> Please, I am new to this software. In attempting to analyse my data
>> (attached here) I had the following problems:
>> 1. Species richness could be calculated on separate plots but not on all of
>> them. However, an accumulation curve could be obtained;
>> 2. No diversity index could be calculated;
>> 3. PCA, CCA, clustering, spatial distribution of spp. wrt environmental
>> variables, spp. abundance, etc. could not be calculated.
>> The complaint I got, which I could not solve, was: "Warning:1 variables of
>> the community dataset (out of a total of 367) are factors".
>> Please, could someone kindly help me to correct this so I can finish my
>> work on time?
>> Thanks for your time.

Joe Simonis | 7 Nov 20:13

Re: Question About Syntax For Complex ANOVA Design

Hi Mike,

    Thanks for the response, hopefully we can get this thing figured 
out.  And I hope things will be easier than I think--that's usually a 
good thing!  However, I do wonder if I didn't explain myself fully (too 
often the case).  My apologies for not clearly stating the hypotheses.  
Basically, the question is 'does HSP differ between high and low 
intertidal locations, and does that relationship depend on the site, the 
coast, and the time of year?'  That can be broken up into a number of 
single-factor H0s. 

    I'm not a marine person, but I'm pretty sure that the high-low MBL 
difference is well understood, since mussels in the high intertidal are 
exposed more to temperature stress, so they should be producing more 
heat shock protein (HSP).  However, this relationship likely varies 
depending on a number of things that vary across sites (geology, tidal 
action, etc).  There is almost certainly a time signature in production 
related to climatic cycles, so that's why they sampled 3 times during 
the year.  So, I guess they're mostly interested in how the MBL 
difference changes across space and time. 

    So, yes, I do want to know if time trends are different at the 
different sites (sorry for not articulating that).  So, in that case, I 
should have 'time' in the random part of the equation, correct?

    As for the other terms, I am confused by the way you suggested 
including them in the model, especially with regards to their not being 
nested (in your formulation).  The sampling regime (MBL within site 
within coast) is nested, and so the data should be analyzed as such.  
From the way I understand nestedness in R, I have to write it out as 

Mike Dunbar | 10 Nov 11:12

Re: Question About Syntax For Complex ANOVA Design

Hi Joe

Is time a continuous variable or a factor?

The thing is that the terms ARE nested. The nesting is defined by the random effects structure. The fixed
effects slot into that. The way this happens is defined by the coding in the data. So I assume you have
something like (simplified):

site	coast	MBL	HSP
A	E	U	..
A	E	L	..
A	E	U	..
A	E	L	..
B	W	U	..
B	W	L	..
B	W	U	..
B	W	L	..

From this it follows that coast is a site level covariate, in that each site can only have one coast. But MBL
is nested within site as the levels change within both A and B. lme is clever enough to work this out.
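
The coding can also be checked directly with cross-tabulations. A small sketch using the simplified layout above (HSP is left out because no values are given; the object names are arbitrary):

```r
# The simplified layout from the table above, without the HSP column
d <- data.frame(site  = c("A", "A", "A", "A", "B", "B", "B", "B"),
                coast = c("E", "E", "E", "E", "W", "W", "W", "W"),
                MBL   = rep(c("U", "L"), 4))

# Each site falls in exactly one coast: one non-zero cell per row
table(d$site, d$coast)

# Both MBL levels occur within every site: no zero cells
table(d$site, d$MBL)

# Same check as a number: distinct coasts per site (1 everywhere here)
coast_per_site <- tapply(d$coast, d$site, function(x) length(unique(x)))
coast_per_site
```

If any site showed more than one coast, coast would no longer be a site-level covariate and the data coding would need another look.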

On the hypotheses, I'd be worried about the single factor H0's. Ignoring time as a fixed effect for a moment,
and hence treating the measures through time as replicates, you are basically interested in:

 lme(HSP~coast*MBL, random= ~1|site) 

The coast * ML term tests for HSP high/low dependent on coast. To test this, fit the full model with method = "ML"
and compare it to  lme(HSP~coast+MBL, random= ~1|site, method ="ML") using anova(model1, model2).
There are a lot of technical issues with testing both fixed and random effects in mixed models; for details
see past posts on the R-sig-ME list and also on the R Wiki

hadley wickham | 10 Nov 13:59

Re: Question About Syntax For Complex ANOVA Design

> The coast * ML term tests for HSP high/low dependent on coast. To test this fit the full model with method = ML
> and compare it to  lme(HSP~coast+MBL, random= ~1|site, method ="ML") using anova(model1, model2).
> There are alot of technical issues with testing both fixed and random effects in mixed models, for details
> see past posts on the R-sig-ME list and also on the R Wiki
> (http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests). But lets ignore those, this
> should do OK.
>
> The if coast*ML is significant then no need to go any further. If it isn't then repeat from the coast+MBL,
> deleting one of those fixed terms and repeating.

I thought that this was generally a bad idea.  You don't lose anything
by keeping the non-significant terms in the model, but if you drop
them out you can falsely inflate the significance of other terms.

Hadley

-- 
http://had.co.nz/

Mike Dunbar | 10 Nov 16:22

Re: Question About Syntax For Complex ANOVA Design

(apologies - I should have written coast * MBL not ML) 

I'm not sure of my ground here, but surely you do lose something - you wouldn't retain coast:MBL if it's not
significant, as you lose degrees of freedom, and this gets worse the more terms and the more interactions
you consider. I think it's a different issue with the random effects, I can see a case for retaining a random
effect on design grounds even though it technically might not look significant, but I'm not so sure for
fixed effects. On that basis wouldn't we always be fitting the indecipherable A*B*C*D instead of
A+B+C+D, even if the additive effects are adequate?
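
For reference, the ML comparison under discussion can be sketched as follows, on simulated data. The data frame d, its effect sizes and the seed are invented for illustration; the two model formulas are the ones quoted in this thread. Whether terms should then be dropped is the open question here, and the code takes no side on it.

```r
library(nlme)

set.seed(42)  # arbitrary seed, for reproducibility only

# Made-up balanced design: 4 sites (2 per coast), 2 MBLs per site,
# 3 sampling times treated as replicates
d <- expand.grid(time = 1:3,
                 MBL  = c("low", "high"),
                 site = c("A", "B", "C", "D"))
d$coast <- factor(ifelse(d$site %in% c("A", "B"), "East", "West"))

# Simulated response with a site-level shift and an MBL effect only
site_shift <- rnorm(4, sd = 2)
d$HSP <- 10 + 2 * (d$MBL == "high") +
  site_shift[as.integer(d$site)] + rnorm(nrow(d), sd = 1)

# Both models fitted by maximum likelihood so that models with
# different fixed effects can be compared
full     <- lme(HSP ~ coast * MBL, random = ~ 1 | site, data = d, method = "ML")
additive <- lme(HSP ~ coast + MBL, random = ~ 1 | site, data = d, method = "ML")

# Likelihood ratio test for the coast:MBL interaction
cmp <- anova(full, additive)
cmp
```

The two models differ by a single parameter (the interaction), so the test has one degree of freedom.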

Mike

>>> "hadley wickham" <h.wickham@...> 10/11/2008 12:59 >>>
> The coast * ML term tests for HSP high/low dependent on coast. To test this fit the full model with method = ML
> and compare it to  lme(HSP~coast+MBL, random= ~1|site, method ="ML") using anova(model1, model2).
> There are alot of technical issues with testing both fixed and random effects in mixed models, for details
> see past posts on the R-sig-ME list and also on the R Wiki
> (http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests). But lets ignore those, this
> should do OK.
>
> The if coast*ML is significant then no need to go any further. If it isn't then repeat from the coast+MBL,
> deleting one of those fixed terms and repeating.

I thought that this was generally a bad idea.  You don't lose anything
by keeping the non-significant terms in the model, but if you drop
them out you can falsely inflate the significance of other terms.

Hadley

-- 
http://had.co.nz/

