Tim Sippel | 2 Oct 05:43

simm.levy in adehabitat

Hi-
I'm trying to use the simm.levy function in the package adehabitat (version
1.7, R version 2.7.2) to simulate Levy walks from an ltraj object
(time-stamped GPS coordinates of a possum).  I'm having trouble using the id
and burst of a given animal as the id and burst arguments of the simm.levy
function.  Below is a summary of my ltraj object (the object is named
'p.sett').

> p.sett
*********** List of class ltraj ***********
Type of the traject: Type II (time recorded)
Regular traject. Time lag between two locs: 900 seconds
Characteristics of the bursts:
       id      burst nb.reloc  NAs          date.begin            date.end
1   p1801 1988038937        6    1 2006-08-09 09:00:00 2006-08-09 10:15:00
2   p1801 1988038938        3    0 2006-08-10 02:00:00 2006-08-10 02:30:00
3   p1801 1988038939        9    1 2006-08-11 03:00:00 2006-08-11 05:00:00
4   p1801 1988038940       17    5 2006-08-12 02:00:00 2006-08-12 06:00:00
5   p1801 1988038941       23   10 2006-08-13 02:00:00 2006-08-13 07:30:00
...
237 p1801 1988039304      126  110 2007-08-10 01:15:00 2007-08-11 08:30:00
238 p1801 1988039305        1    0 2007-08-11 02:45:00 2007-08-11 02:45:00

Ideally, what I'd like to do is define the simulation using the
characteristics of a given 'burst' from my ltraj object. I'd appreciate some
pointers on the syntax for calling the simulation using the
characteristics of each row in p.sett. Sample code follows:

> sim.levy<-simm.levy(date=1:126, id=p.sett$id=="1801",
burst=p.sett$burst=="1988039304", x0=c(174.48255,-36.819058), typeII=T)

Clément Calenge | 2 Oct 10:57

Re: simm.levy in adehabitat

Hi Tim,

>> p.sett
> *********** List of class ltraj ***********
> Type of the traject: Type II (time recorded)
> Regular traject. Time lag between two locs: 900 seconds
> Characteristics of the bursts:
>        id      burst nb.reloc  NAs          date.begin            date.end
> 1   p1801 1988038937        6    1 2006-08-09 09:00:00 2006-08-09 10:15:00
> 2   p1801 1988038938        3    0 2006-08-10 02:00:00 2006-08-10 02:30:00
> 3   p1801 1988038939        9    1 2006-08-11 03:00:00 2006-08-11 05:00:00
> 4   p1801 1988038940       17    5 2006-08-12 02:00:00 2006-08-12 06:00:00
> 5   p1801 1988038941       23   10 2006-08-13 02:00:00 2006-08-13 07:30:00
> ...
> 237 p1801 1988039304      126  110 2007-08-10 01:15:00 2007-08-11 08:30:00
> 238 p1801 1988039305        1    0 2007-08-11 02:45:00 2007-08-11 02:45:00
>
> Ideally what I'd like to do is define the simulation using the
> characteristics of a given 'burst' from my ltraj object. I'd appreciate some
> pointers on the syntax of calling the simulation using the the
> characteristics of each row in p.sett. Sample code follows:
>
>> sim.levy<-simm.levy(date=1:126, id=p.sett$id=="1801",
> burst=p.sett$burst=="1988039304", x0=c(174.48255,-36.819058), typeII=T)
> Error in as.ltraj(data.frame(co, si), date, id, burst, typeII = typeII) :
>   id should be of the same length as xy, or of length 1
>
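
The error message quoted above says that id must have length 1 (or one value per relocation), whereas a comparison such as p.sett$id=="1801" produces a logical result rather than an identifier. A minimal sketch of building the arguments from one row of the burst summary follows. It assumes a hypothetical data frame `burst_row` typed in by hand from row 237 of the printed summary, and a UTC time zone; neither comes from the thread. The simm.levy call is left commented out, with the argument names taken from the sample code above.

```r
# Hypothetical one-row copy of burst 237 from the printed summary.
# The UTC time zone is an assumption; use the zone the data were recorded in.
burst_row <- data.frame(
  id         = "p1801",
  burst      = "1988039304",
  nb.reloc   = 126,
  date.begin = as.POSIXct("2007-08-10 01:15:00", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Length-1 character identifiers, which is what the error message asks for.
id_arg    <- as.character(burst_row$id)
burst_arg <- as.character(burst_row$burst)

# One time stamp per relocation, 900 seconds apart (the lag reported
# in the summary), starting at the beginning of the burst.
dates <- seq(burst_row$date.begin, by = 900, length.out = burst_row$nb.reloc)

# For contrast: a comparison returns a logical, not an identifier.
# It is FALSE here because the id is "p1801", not "1801".
bad_id <- burst_row$id == "1801"

# With adehabitat loaded, the call would then look something like:
# sim.levy <- simm.levy(date = dates, id = id_arg, burst = burst_arg,
#                       x0 = c(174.48255, -36.819058), typeII = TRUE)
```

The last element of `dates` lands on the date.end printed for that burst, which is a quick check that the lag and the number of relocations are consistent.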

Lisa Mandle | 3 Oct 00:27

Bias adjusted confidence intervals with popbio

Hello,

Does anyone know if it is possible to generate bias adjusted confidence
intervals for lambda from bootstrapped transition matrices within the popbio
package?

Any help or advice would be appreciated!

Thank you,

Lisa Mandle
Graduate Student
Botany Department
University of Hawaii at Manoa

ONKELINX, Thierry | 7 Oct 12:12

Clustering large data

Dear all,

We have a problem with a large dataset that we want to cluster. The
dataset is in a long format: 1154024 rows with presence data. Each row
has the name of the species and the location. We have 1381 species and
6354 locations.
The main problem is that we need the data in wide format (one row for
each location, one column for each species) for the clustering
algorithms. But the 6354 x 1381 dataframe is too big to fit into the
memory. At least when we use cast from the reshape package to convert
the dataframe from a long to a wide format.

Are there any clustering tools available that can work with the data in
a long format or with sparse matrices (only 13% of the matrix is
non-zero)? If they work with sparse matrices, how do we convert our
dataset to a sparse matrix? Other suggestions are welcome.

We are working with R 2.7.2 on WinXP with 2 GB RAM. --max-mem-size is
set to 2047M.

Thanks,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,

tyler | 7 Oct 14:35

Re: Clustering large data

"ONKELINX, Thierry" <Thierry.ONKELINX@...>
writes:

> Dear all,
>
> We have a problem with a large dataset that we want to cluster. The
> dataset is in a long format: 1154024 rows with presence data. Each row
> has the name of the species and the location. We have 1381 species and
> 6354 locations.
> The main problem is that we need the data in wide format (one row for
> each location, one column for each species) for the clustering
> algorithms. But the 6354 x 1381 dataframe is too big to fit into the
> memory. At least when we use cast from the reshape package to convert
> the dataframe from a long to a wide format.
>
> Are there any clustering tools available that can work with the data in
> a long format or with sparse matrices (only 13% of the matrix is
> non-zero)? If the work with sparse matrices: how to convert our dataset
> to a sparse matrix? Other suggestions are welcome.
>

6354 x 1381 should be well within your memory limit, so I assume it's
the intermediate steps that are fouling you up. Maybe you can do it in
pieces: 

1. subset the original two-column matrix to include only the first 100 sites
2. convert this subset to wide form
3. repeat 63 times for different subsets
4. rbind the resulting matrices
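
The four steps above can be sketched in base R as follows. The toy data, the column names `site` and `spp`, and the chunk size of 2 are all hypothetical; the real data would use chunks of about 100 sites. The species levels are fixed up front so that every piece has the same columns and rbind lines them up.

```r
# Toy long-format presence data (hypothetical column names and values).
long <- data.frame(
  site = c("s1", "s1", "s2", "s3", "s3", "s4"),
  spp  = c("a",  "b",  "a",  "c",  "a",  "b"),
  stringsAsFactors = FALSE
)

all_spp    <- sort(unique(long$spp))    # same columns in every piece
sites      <- sort(unique(long$site))
chunk_size <- 2                         # would be ~100 for the real data

# Step 1: split the site names into consecutive chunks.
chunks <- split(sites, ceiling(seq_along(sites) / chunk_size))

# Steps 2 and 3: convert each chunk of sites to wide form.
pieces <- lapply(chunks, function(s) {
  part <- long[long$site %in% s, ]
  tab  <- table(factor(part$site, levels = s),
                factor(part$spp,  levels = all_spp))
  matrix(tab, nrow = length(s), dimnames = list(s, all_spp))
})

# Step 4: stack the pieces into one site-by-species matrix.
wide <- do.call(rbind, pieces)
```

Only one chunk of the long table is cross-tabulated at a time, so the intermediate objects stay small even though the final matrix is the full size.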


Peter Solymos | 7 Oct 15:50

Re: Clustering large data

Dear Thierry,

the 'mefa' package should do this, and I am also interested in the
testing of the package for such a large number of species. I have used
it before with 75K records, but only with ~160 species and 1052 sites.
So please let me know if it worked!

You can do the clustering like this (SAMPLES and SPECIES are the two
columns in the long format, and have to be the same length):

x <- mefa(stcs(data.frame(SAMPLES,SPECIES)))
cl <- hclust(dist(x$xtab))

Hope this works,

Peter

Peter Solymos, PhD
Department of Mathematical and Statistical Sciences
University of Alberta
Edmonton, Alberta, T6G 2G1
CANADA

On Tue, Oct 7, 2008 at 4:12 AM, ONKELINX, Thierry
<Thierry.ONKELINX@...> wrote:
> Dear all,
>
> We have a problem with a large dataset that we want to cluster. The
> dataset is in a long format: 1154024 rows with presence data. Each row
> has the name of the species and the location. We have 1381 species and

Farrar.David | 7 Oct 15:56

Re: Clustering large data

Thierry, 

A search of CRAN for "sparse clustering" yielded cluster.dist {cba}, 
described as "Clustering a Sparse Symmetric Distance Matrix".  There were 
also sparse PCA packages and sparse matrix classes.  I have no experience 
with these procedures. 
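
On the sparse matrix classes mentioned above, here is a minimal sketch of turning long-format presence records into a sparse site-by-species matrix. It assumes the Matrix package and its sparseMatrix() constructor, which the thread does not name, and uses hypothetical toy data with columns `site` and `spp`. Whether a given clustering function accepts such an object is a separate question that this sketch does not address.

```r
library(Matrix)

# Toy long-format presence data (hypothetical column names and values).
long <- data.frame(
  site = c("s1", "s1", "s2", "s3", "s3", "s4"),
  spp  = c("a",  "b",  "a",  "c",  "a",  "b"),
  stringsAsFactors = FALSE
)

site_f <- factor(long$site)
spp_f  <- factor(long$spp)

# Each record becomes one non-zero cell: row = site, column = species.
# Repeated (site, species) pairs would be summed by sparseMatrix().
m <- sparseMatrix(
  i = as.integer(site_f),
  j = as.integer(spp_f),
  x = 1,
  dims     = c(nlevels(site_f), nlevels(spp_f)),
  dimnames = list(levels(site_f), levels(spp_f))
)
```

Only the non-zero cells are stored, so memory use grows with the number of presence records rather than with sites times species.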

As additional background, you might like to say what kind of clustering 
you want to do and whether some particular similarity/distance will be 
involved. 
Does your cluster analysis program take a data frame as input? 

However, it sounds like you are having problems with preliminary data 
processing, and may not yet know whether some cluster analysis procedure 
or other would choke on your matrix, once it is computed. 

It does seem surprising that you are running into trouble with a dataset 
of this size.  I assume you have checked that you have at least a couple 
of GB free. 

Farrar 

r-sig-ecology-bounces@... wrote on 10/07/2008 08:35:39 AM:

> "ONKELINX, Thierry" <Thierry.ONKELINX@...>
> writes:
> 
> > Dear all,
> >
> > We have a problem with a large dataset that we want to cluster. The


Re: Clustering large data

This method for converting long to wide format seems to work well with 
pretty large datasets and it uses only base functions.

# this function will return a site*species matrix
# based on the formula variable. Data does not need 
# to be grouped, the xtabs function will take care of
# summing any rows that are equal according to the 
# formula.
### units are the cell value
### site is the row value
### spp is the column value
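matrify <- function(datatable, formula = units ~ site + spp, relativize = FALSE) {
  # cross-tabulate: sums the cell value for each site/species combination
  tbl <- xtabs(formula, data = datatable)
  # copy the totals into a plain matrix and carry over the dimension names
  mx <- matrix(tbl, ncol = ncol(tbl))
  colnames(mx) <- colnames(tbl)
  rownames(mx) <- rownames(tbl)
  # optionally express each row as proportions of its row total
  if (relativize) mx <- mx / rowSums(mx)
  return(mx)
}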
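
A short usage sketch of the function above on hypothetical toy data. The function body is repeated (with TRUE/FALSE spelled out) so that the block runs on its own; the data frame `long` and its values are made up for illustration.

```r
# Same function as in the post, repeated so this block is self-contained.
matrify <- function(datatable, formula = units ~ site + spp, relativize = FALSE) {
  tbl <- xtabs(formula, data = datatable)
  mx <- matrix(tbl, ncol = ncol(tbl))
  colnames(mx) <- colnames(tbl)
  rownames(mx) <- rownames(tbl)
  if (relativize) mx <- mx / rowSums(mx)
  return(mx)
}

# Hypothetical long-format data: species "a" appears twice at site "s2",
# so xtabs sums its two rows into a single cell.
long <- data.frame(
  site  = c("s1", "s1", "s2", "s2", "s2"),
  spp   = c("a",  "b",  "a",  "a",  "c"),
  units = c(1, 1, 1, 1, 1)
)

counts <- matrify(long)                     # site-by-species totals
props  <- matrify(long, relativize = TRUE)  # each row rescaled to sum to 1
```

For pure presence data a `units` column of ones, as here, gives counts; a site with no records at all would have a zero row total and produce NaN when relativized.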

ONKELINX, Thierry wrote:
> Dear all,
>
> We have a problem with a large dataset that we want to cluster. The
> dataset is in a long format: 1154024 rows with presence data. Each row
> has the name of the species and the location. We have 1381 species and
> 6354 locations.
> The main problem is that we need the data in wide format (one row for
> each location, one column for each species) for the clustering
> algorithms. But the 6354 x 1381 dataframe is too big to fit into the

Farrar.David | 7 Oct 17:22

Re: Clustering large data

Thanks for the illustration of xtabs. 

A quibble: doesn't the following work, substituting as.matrix() for 
matrix()? 
(It does seem to preserve the dimensions and dimension names.) 

matrify<-function(datatable, formula = units~site+spp, relativize=F){
  tbl<-xtabs(formula,data=datatable)
  mx <-as.matrix(tbl)
  if (relativize==T) {mx<-mx/rowSums(mx)}
  return(mx)
}

"Christian A. Parker" <cparker@...> 
Sent by: r-sig-ecology-bounces@...
10/07/2008 11:04 AM

To: "ONKELINX, Thierry" <Thierry.ONKELINX@...>
cc: r-sig-ecology@...
Subject: Re: [R-sig-eco] Clustering large data

This method for converting long to wide format seems to work well with 
pretty large datasets and it uses only base functions.

# this function will return a site*species matrix
# based on the formula variable. Data does not need 
# to be grouped, the xtabs function will take care of

Brian Campbell | 7 Oct 19:18

Re: Clustering large data


I've recently been engaged in some exploratory data analysis also involving
cluster analysis, albeit on a much smaller dataset.  There are quite a few
packages (e.g. ecodist, vegan, pvclust) that include functions for
undertaking cluster analysis, but have you, or anyone else on here, looked
at alternative clustering methods with bootstrap permutation tests of the
nodes?  I've done this with pvclust, but I don't recall that function
including an argument for method="average".
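
For reference, average linkage on its own (without the bootstrap node tests asked about) is available through base R's hclust(). The sketch below uses a hypothetical toy presence/absence matrix and the "binary" distance from dist(); both choices are illustrative assumptions, not something the thread prescribes. pvclust's documentation lists a `method.hclust` argument for the linkage method, which is worth checking in ?pvclust.

```r
# Toy site-by-species presence/absence matrix (hypothetical data).
mx <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 1,
    0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("a", "b", "c"))
)

# "binary" distance: among species present at either site, the
# proportion present at only one of the two.
d <- dist(mx, method = "binary")

# Average-linkage hierarchical clustering.
cl <- hclust(d, method = "average")

# Cut the tree into two groups of sites.
groups <- cutree(cl, k = 2)
```

Here sites s1 and s2 share the same species and fall in one group, while s3 and s4 form the other.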

Brian

> To: tyler.smith@...
> From: Farrar.David@...
> Date: Tue, 7 Oct 2008 09:56:15 -0400
> CC: r-sig-ecology-bounces@...; r-sig-ecology@...
> Subject: Re: [R-sig-eco] Clustering large data
> 
> Thierry, 
> 
>  Search of CRAN with "sparse clustering" yielded cluster.dist {cba}, 
> defined as "Clustering a Sparse Symmetric Distance Matrix".  There were 
> also sparse PCA packages and sparse matrix classes.  I have no experience 
> with these procedures. 
> 
> As additional background, you might like to say what kind of clustering 
> you want to do and whether some particular similarity/distance will be 
> involved. 
> Does your cluster analysis program take a data frame as input? 
> 
> However, it sounds like you are having problems with preliminary data 
> processing, and may not yet know whether some cluster analysis procedure 
> or other would choke on your matrix, once it is computed. 