John Lewis | 4 May 16:20 2016

Fwd: R-sig-Geo Digest, Vol 153, Issue 4

I think there is a more fundamental problem that you should consider. The
use of p-values and fixed significance levels is being seriously questioned
as good statistical modelling practice. You might want to look at the
recent policy statement by the American Statistical Association on this
topic. There is even a strong movement to convince editors not to review
papers that base their results on these methods, or at least to advise the
use of different procedures.
If you are interested in reading this policy statement I would be happy to
send a pdf copy.
John Lewis


Message: 1
Date: Tue, 3 May 2016 23:11:29 +0000 (UTC)
From: "Thiago V. dos Santos" <thi_veloso <at>>
To: R-sig-geo Mailing List <r-sig-geo <at>>
Subject: [R-sig-Geo] Mask a map using statistical significance
Content-Type: text/plain; charset=UTF-8

Dear all,

In climate studies, it is a common practice to perform some statistical
test between two fields (maps), and then plot the resulting map using a
significance mask. This masking is usually done by adding some kind of
pattern (shading, crosshatching etc) on top of the actual color palette.

Examples can be seen here in this image and in the left
images of this panel:
In my case, I ran a statistical test for detecting a trend in a time-series raster, and I now have one raster
with the trend for rainfall (in degrees C per year) and one with the p-values associated with the test.

My data looks roughly like this:


library(raster)

## create scratch raster objects and fill them with some random values
r.trend <- raster(nrows=50, ncols=50, xmn=-58, xmx=-48, ymn=-33, ymx=-22)
r.trend[] <- round(runif(50 * 50, min=0, max=3), digits=2)

r.pvalue <- raster(nrows=50, ncols=50, xmn=-58, xmx=-48, ymn=-33, ymx=-22)
r.pvalue[] <- round(runif(50 * 50, min=0.0001, max=0.1), digits=5)

What I would like to do is plot r.trend, and on top of it plot r.pvalue (as a pattern) wherever r.pvalue < 0.01
(corresponding to a 99% confidence level).
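A minimal sketch of one way to do this with base raster plotting (assuming the scratch r.trend / r.pvalue objects above; the point-overlay approach is one option, not necessarily the prettiest): draw the trend map, then overlay symbols at the centres of cells whose p-value falls below the threshold.

```r
library(raster)
set.seed(42)

# Scratch data as in the question
r.trend <- raster(nrows=50, ncols=50, xmn=-58, xmx=-48, ymn=-33, ymx=-22)
r.trend[] <- round(runif(50 * 50, min=0, max=3), digits=2)
r.pvalue <- raster(nrows=50, ncols=50, xmn=-58, xmx=-48, ymn=-33, ymx=-22)
r.pvalue[] <- round(runif(50 * 50, min=0.0001, max=0.1), digits=5)

# Coordinates of cell centres where p < 0.01
sig.xy <- xyFromCell(r.pvalue, which(r.pvalue[] < 0.01))

plot(r.trend)                    # colour palette for the trend
points(sig.xy, pch=4, cex=0.4)   # crosses mark the significant cells
```

For crosshatching rather than point symbols, rasterVis's levelplot plus a layer() call is the usual route, but the idea of masking on `r.pvalue[] < 0.01` is the same.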

Has anyone here ever tried to do a similar plot and can point me to any direction?

I usually use rasterVis and ggplot2 to plot maps, but I would be open to try some other package and even other

Juta Kawalerowicz | 3 May 10:30 2016

Re: spatialpoints: each dot represents 100 individuals?

Thanks for all the suggestions! I was thinking that in the end a plausible
approach would be to sample points: say, if I have 100,000 points and sample
1,000 of them, then I can say that each point represents 100 people.
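That idea is essentially a one-liner with base R's sample(); a sketch with stand-in coordinates (the object names are illustrative, not from the original thread):

```r
set.seed(42)

# Stand-in for 100,000 point coordinates
n.total  <- 100000
n.sample <- 1000
xy <- cbind(x = runif(n.total), y = runif(n.total))

# Keep a 1-in-100 random sample; each retained point then "represents" 100 people
keep <- sample(n.total, n.sample)
xy.sampled <- xy[keep, ]

nrow(xy.sampled)  # 1000
```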


On Wed, Apr 27, 2016 at 11:13 PM, rubenfcasal <rubenfcasal <at>> wrote:

> Alternatively, you might also consider data binning (implemented in
> several packages: KernSmooth, ks, sm, npsp ,...). This technique is
> commonly used in nonparametric statistics to reduce the computational
> time (see e.g. Wand, M. P. (1994), Fast Computation of Multivariate
> Kernel Estimators, Journal of Computational and Graphical Statistics, 3,
> 433-445).
> For instance, using the npsp package (maintained by me...), you could do
> something like this:
> library(npsp)
> bin <- binning(earthquakes[, c("lon", "lat")], nbin = c(50,50))
> # ‘bin$binw’ will contain the binning weights (aggregations) at locations ‘coords(bin)’
> simage(bin)
> Additionally, you could estimate (nonparametrically) the spatial density:
> h <-, ncv = 2)$h

Hakim Abdi | 2 May 23:11 2016

MODIS package's runGdal() returns error: dataFormat='GTiff', format not supported

Hello everyone,

I'm having problems with the MODIS package, specifically, the runGdal()
command after the recent update to R 3.2.5. The error I get is this:

Error in runGdal(product = product, begin = dates$beginDOY, end =
dates$endDOY, : in argument dataFormat='GTiff', format not supported by
GDAL type: 'gdalWriteDriver()' (column 'name') to list available inputs

I'm not quite sure why this is happening, and I haven't found a solution
online. Does anyone have an idea? The package was working fine before I
upgraded R from 3.1.1 to 3.2.5, QGIS from 2.12.3 to 2.14.1, and RStudio to
version 0.99.467.
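The error text suggests checking which formats the local GDAL build actually supports (the gdalWriteDriver() it mentions is a MODIS-package helper). As a sketch, rgdal's gdalDrivers() lists the drivers, so you can verify 'GTiff' is among them:

```r
library(rgdal)

# List the drivers this GDAL build supports
drv <- gdalDrivers()
head(drv$name)

# Is GTiff among them?
"GTiff" %in% drv$name
```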

My details are below, thanks for the assistance:

R version 3.2.5 (2016-04-14) -- "Very, Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.

Andy Bunn | 2 May 21:52 2016

Teaching example of autocorrelated errors affecting interpretation of OLS

Hi all, thanks to all those who pointed me towards good environmental data sets for teaching spatial stats in
R. We are plugging along on point patterns this week.

This next query might be a bit of stretch but here goes.

This class I'm teaching is made up of master's students who are from a variety of environmental fields
(oceanography to toxicology to plant ecology). It's a fun group. A few of them get the gospel of thinking
about space in terms of how pattern drives process and some learn to appreciate a spatial perspective
because it is just a worthwhile thing in and of itself.

However, a lot of the students just want to make sure that spatial autocorrelation isn't breaking their
regressions. Many of them are doing some kind of regression analysis in their thesis work and are worried
about spatial autocorrelation violating the regression assumptions (via non-i.i.d. errors). I have them
read (in order):

  1.  Hawkins et al. 2007 (DOI: 10.1111/j.0906-7590.2007.05117.x)
  2.  Hawkins 2012 (DOI: 10.1111/j.1365-2699.2011.02637.x)
  3.  Kuhn & Dormann 2012 (DOI: 10.1111/j.1365-2699.2012.02716.x)

This both ameliorates some of their worries and worries them more. I also show them via simulation where
autocorrelation can lead to trouble. E.g., I have an example where I simulate a SAR process with varying
levels of autocorrelation and show them how an OLS model of y ~ x with spatially autocorrelated residuals
gives an inefficient estimate of beta. (I note that you do need very high levels of autocorrelation to show this.)
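A demonstration of that kind can be sketched in base R (this is an illustrative reconstruction, not Andy's actual example): build a row-standardized rook-contiguity matrix on a lattice, generate SAR errors e = (I - rho W)^(-1) u, and compare the sampling variance of the OLS slope against the i.i.d. case.

```r
set.seed(1)
n.side <- 10
n <- n.side^2

# Rook-contiguity weights on an n.side x n.side lattice, row-standardized
xy <- expand.grid(row = 1:n.side, col = 1:n.side)
W <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n)
  if (sum(abs(xy[i, ] - xy[j, ])) == 1) W[i, j] <- 1
W <- W / rowSums(W)

rho <- 0.9                      # needs to be high to show the effect clearly
M <- solve(diag(n) - rho * W)   # SAR transform: e = (I - rho W)^{-1} u

ols.slope <- function(err) {
  x <- rnorm(n)
  y <- 2 + x + err              # true beta = 1
  coef(lm(y ~ x))["x"]
}

b.iid <- replicate(200, ols.slope(rnorm(n)))
b.sar <- replicate(200, ols.slope(as.vector(M %*% rnorm(n))))

c(var.iid = var(b.iid), var.sar = var(b.sar))  # SAR case is noticeably larger
```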

What would be better would be to show them a worked example where autocorrelation has led to an incorrect
interpretation of some ecological process. Do any of you know of a case study like this? Something along
the lines of "Smith et al. thought Y was modeled well by X, but when you consider the spatial structure of the
residuals it turns out that their model was interpreted incorrectly."

By the end of the course I want to push more of them over to appreciating spatial analysis for its own sake but

Kaushik Jana | 2 May 10:03 2016

How does "gSymdifference" from the rgeos package calculate the symmetric distance between two polygons?

Dear all,

I want to calculate the symmetric distance between two polygons. I learned
that "gSymdifference" and "gArea" from "rgeos" can be used for that purpose.
But I do not see how this difference is actually calculated (what the
underlying R code is).

Simply typing "gSymdifference" at the prompt gives me:
function (spgeom1, spgeom2, byid = FALSE, id = NULL, drop_lower_td = FALSE,
    unaryUnion_if_byid_false = TRUE, checkValidity = FALSE)
{
    if (checkValidity) {
        val1 <- gIsValid(spgeom1)
        val2 <- gIsValid(spgeom2)
        if (!val1)
            message(deparse(substitute(spgeom1)), " is invalid")
        if (!val2)
            message(deparse(substitute(spgeom2)), " is invalid")
        if (!all(c(val1, val2)))
            stop("Invalid objects found")
    }
    return(RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td,
        unaryUnion_if_byid_false, "rgeos_symdifference"))
}
<environment: namespace:rgeos>

The above does not help for this purpose. Where can I get the source code
of these functions, so I can see how "gSymdifference" and "gArea" calculate
the difference?
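As a sketch of the typical usage (with two illustrative unit squares, not the poster's data): the geometry work happens in compiled GEOS code reached through RGEOSBinTopoFunc, which is why no further R source is visible, but the combination of the two functions gives the area of the symmetric difference directly.

```r
library(sp)
library(rgeos)

# Two overlapping unit squares, offset by 0.5 in x (illustrative data)
p1 <- readWKT("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))")
p2 <- readWKT("POLYGON((0.5 0, 1.5 0, 1.5 1, 0.5 1, 0.5 0))")

sym <- gSymdifference(p1, p2)  # parts of p1 and p2 that are not shared
gArea(sym)                     # area of the symmetric difference: 1
```

For the underlying algorithms, the place to look is the GEOS C++ library that rgeos wraps, rather than the R sources.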


Miluji Sb | 1 May 17:30 2016

Aggregate FIPS data to State and Census divisions

Dear all,

I have the following data by US FIPS code. Is there a package to aggregate
the data by State and Census divisions?

temp <- dput(head(pop1,5))
structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009"
), death_2050A1 = c(18.19158, 101.63088, 13.18896, 10.30068,
131.91798), death_2050A2 = c(22.16349, 116.58387, 15.85324, 12.78564,
155.20506), death_2050B1 = c(21.38906, 76.23018, 21.38218, 17.14269,
151.64466), death_2050B2 = c(23.43543, 81.39378, 22.96802, 18.76926,
161.86404), death_2050BC = c(21.89947, 93.88002, 18.60352, 15.1032,
152.43414)), .Names = c("FIPS", "death_2050A1", "death_2050A2",
"death_2050B1", "death_2050B2", "death_2050BC"), row.names = c(NA,
5L), class = "data.frame")
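One simple route that needs no special package (a sketch; the state-to-division lookup is only illustrated for Alabama): the first two digits of a county FIPS code identify the state, so a derived state code can drive aggregate(), and a state-to-Census-division lookup table handles the divisions.

```r
pop1 <- data.frame(
  FIPS = c("01001", "01003", "01005", "01007", "01009"),
  death_2050A1 = c(18.19158, 101.63088, 13.18896, 10.30068, 131.91798),
  stringsAsFactors = FALSE
)

# State FIPS = first two digits of the county FIPS
pop1$state_fips <- substr(pop1$FIPS, 1, 2)
by_state <- aggregate(death_2050A1 ~ state_fips, data = pop1, FUN = sum)

# Census divisions need a state -> division lookup table (one entry shown)
division <- c("01" = "East South Central")
pop1$division <- division[pop1$state_fips]

by_state
```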

Thank you!



Gonzalez-Mirelis Genoveva | 30 Apr 12:32 2016

problem with predict() in package raster and factor variables

Dear list,
I am trying to use the function predict() (in package raster), where I supply the new data as a RasterBrick,
the model (as fitted in previous steps using a different dataset), and a few more arguments, including the
levels of my only categorical variable. Here is the code I'm using:

 r1 <- predict(subbrick,
              type="response", OOB=T, factors=f)

But I keep getting the following error:

Error in checkData(oldData, RET) :

  Classes of new data do not match original data

Here are more details:


         Random Forest using Conditional Inference Trees

Number of trees:  1000

Response:  PA
Inputs:  bathy20_1, TerClass, Smax_ann, Smean_ann, Smin_ann, SPDmax_ann, SPDmean_ann, Tmax_ann,
Tmean_ann, Tmin_ann
Number of observations:  986

Where 'TerClass' is a categorical variable.
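A common cause of this error (sketched below with illustrative names, not Genoveva's data) is that the categorical layer loses its factor levels, so the data handed to the model does not match the classes seen at fitting time. With the raster package you can declare the layer categorical with ratify() and build the list passed to the `factors` argument:

```r
library(raster)
set.seed(1)

# Illustrative categorical layer with class codes 1..4
TerClass <- raster(nrows=10, ncols=10)
TerClass[] <- sample(1:4, ncell(TerClass), replace = TRUE)

# Declare it categorical and record its levels
TerClass <- ratify(TerClass)

# A named list of levels, as expected by the 'factors' argument of raster::predict
f <- list(TerClass = levels(TerClass)[[1]]$ID)
str(f)
```

The key check is that the levels in `f` are exactly those present in the data the model was fitted on.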


Isaque Daniel | 28 Apr 18:34 2016

How can I apply a function using stackApply or calc, and save the results as a raster object in R?

Hi everyone,
I have been working on a set of functions to process MODIS vegetation
index time series and return crop-cycle parameters. It's for my PhD thesis
and I'll process all of South America, so I need every available means of
reducing the processing time.

I found the rasterEngine function for parallel processing to speed things
up, but before that I prepared some functions that compute my variables
pixel by pixel over the raster stack.

I developed functions that produce 7 different outputs. I tried to use my
function "CropAnalysis" on each pixel; in the code in this post I try to
save a raster brick with 2 layers (each one holding one of the variables
produced by "CropAnalysis").

My code does not save the results in two raster layers.

Attached are the data (one small portion) and the code, any idea? 

My data:
Modis stack

# My code

# loading the data 
limit <- 3000 # minimum value between maximum and minimum to be crop
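On the two-layer question: if the per-pixel function returns a vector, raster::calc returns a multi-layer brick, which writeRaster can then save. A minimal sketch with a stand-in function (CropAnalysis itself is not shown in the excerpt, so a simple two-statistic function is used here):

```r
library(raster)
set.seed(1)

# Stand-in time-series stack: 12 layers of fake vegetation-index values
s <- stack(lapply(1:12, function(i) raster(matrix(runif(100), 10, 10))))

# A function returning two values per pixel makes calc() produce 2 layers
two.stats <- function(x) c(max(x) - min(x), mean(x))

out <- calc(s, fun = two.stats)
nlayers(out)  # 2
# writeRaster(out, "crop_params.tif", overwrite = TRUE)
```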

Vinh Nguyen | 26 Apr 20:18 2016

best practice for reading large shapefiles?


I have a very large shapefile that I would like to read into R
(dbf=5.6gb and shp=2.3gb).

For reference, I downloaded the 30 shapefiles of the [Public Land
Survey System](
and combined them into a single national file via gdal (ogr2ogr) as
described [here](;
I originally attempted to combine the files in R as described,
but ran out of memory about 80% of the way in; luckily I discovered ogr2ogr.

I'm reading the combined file into R via readOGR; it's been over an hour
and R appears to hang. When I check the task manager, the R session is
currently consuming <10% CPU and 245MB of memory. I'm not sure whether any
productive activity is going on, so I'm just waiting it out. This thread
describes that readOGR can be slow for large shapefiles, and suggests
saving the SpatialDataFrame in an R format. My problem is getting the
entire shapefile read in the first place, before I can save it as an R
object.

Does anyone have any suggestions for reading this large shapefile into
R?  Thank you for your help.
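One commonly suggested workaround (a sketch; it only helps once readOGR has finished at least one read, so it does not solve the first-read problem itself) is to cache the loaded object in R's serialized format, which loads much faster in later sessions. The readOGR call is commented out and a stand-in object used, since the real shapefile is several GB and the path is illustrative:

```r
# library(rgdal)
# plss <- readOGR(dsn = "plss_national", layer = "plss_national")  # slow first read
plss <- data.frame(id = 1:3)        # stand-in for the SpatialPolygonsDataFrame

saveRDS(plss, "plss_cache.rds")     # cache once...
plss2 <- readRDS("plss_cache.rds")  # ...then reload quickly in later sessions
identical(plss, plss2)  # TRUE
```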

-- Vinh
Adrian Baddeley | 26 Apr 04:36 2016

Re: [R-sig-geo] Data sets for teaching spatial analysis in R

Andy Bunn <Andy.Bunn <at>> writes:

> I'd love to hear about data sets that some of you might use to teach
>spatial in R. 

For spatial point patterns, the 'spatstat' package contains over 50
datasets, with documentation and examples. They are also presented and
discussed in the 'spatstat book' <

Adrian Baddeley