James Hirschorn | 26 May 19:01 2016

data.table exact matching numeric keys

I have looked over the documentation and did not see an answer to this 
seemingly basic question:

I noticed that if I have a data.table with a key containing numeric 
values, then a close number is considered a match when searching. For 
example:

 > dt[J(a)]
#     certainty probability
# 1: 0.8596491          -1
# 2: 0.8596491          -1

But in fact the two certainty values differ by 5.797585e-13.

Is there a way to only return exact matches, or set the epsilon value 
(to something like machine epsilon)?
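
For exact matching, data.table provides setNumericRounding(): older versions round the last two bytes of numeric keys when joining, so values within roughly 1e-13 of each other can match; setNumericRounding(0) compares all eight bytes. A base-R sketch of the underlying comparison (the data.table calls are left commented out, since the table setup above is not reproduced):

```r
a <- 0.8596491
b <- a + 5.797585e-13   # the difference reported above

identical(a, b)         # FALSE: the two doubles differ in their low-order bits
abs(a - b) < 1e-8       # TRUE: they agree to any coarse tolerance

# data.table remedy (assumes the data.table package; not run here):
# library(data.table)
# setNumericRounding(0)   # join on all 8 bytes, i.e. exact binary equality
```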

Duncan Murdoch | 26 May 17:12 2016

Re: Shaded areas in R

On 26/05/2016 11:03 AM, Óscar Jiménez wrote:
> Hi Duncan,
> Thanks for the quick reply  :)
> Does the function Sys.Date return a time series (created with the 
> function POSIXct), with numerical values?

It returns a Date object.  The str() function will show you that. But 
Date, POSIXct, and even POSIXlt are supported by many of the graphics 
functions, so you don't need to worry about the conversions.

> I mean... I think the best option is to convert the time series 
> (plotted as characters), into numerical values, right?

No, that is not needed.

> Or is there any other function that allows me to draw the shade under 
> the curve using time series as the "x" variable?

polygon() does.

You should try things; R won't break.
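
For example, a minimal sketch with made-up data: Date values can be passed to plot() and polygon() directly, with no numeric conversion:

```r
set.seed(1)
x <- seq(as.Date("2016-01-01"), by = "month", length.out = 12)  # a Date axis
y <- runif(12, 1, 5)                                            # toy parameter values

plot(x, y, type = "l")
# close the region down to y = 0 and shade it; Date x-coordinates are fine
polygon(c(x, rev(x)), c(y, rep(0, length(y))), col = "grey80", border = NA)
```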
> If you need more info I can share my script and my database and you 
> can see it properly.

You should always do that when you pose your question; you should also 
send your questions to R-help, not privately.  I've cc'd my response there.

Duncan Murdoch

Óscar Jiménez | 26 May 11:37 2016

Shaded areas in R


I'm working with the R language, and plotting some parameters over time. I need
to draw a shaded area under the curve of each parameter.

For that, I might use the polygon(x, y) function, assigning coordinates
(x, y) to each vertex of my polygon. To do so, "x" and "y" must be vectors
of numerical values, but since my x-axis is a time series, I cannot
assign a numerical value to my "x" coordinate, because the time variable is a
"character" variable.

Is there any option to use the function polygon(x, y) in this case, or any
other function that allows me to draw a shaded area under the curve on a
time-series basis?

Thank you in advance for your help

Best regards



R-help <at> r-project.org mailing list -- To UNSUBSCRIBE and more, see
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Sebastian Salentin | 26 May 11:49 2016

Segmentation Fault with large dataframes and packages using rJava

Dear all,

I have been trying to perform machine learning/feature selection tasks 
in R using various packages (e.g. mlr and FSelector).
However, when giving larger data frames as input for the functions, I 
get a segmentation fault (memory not mapped).

This happened first when using the mlr benchmark function with 
dataframes in the order of 200 rows x 10,000 columns (all integer values).

I prepared a minimal working example where I get a segmentation fault 
trying to calculate the information gain with FSelector package.

library(FSelector)
# Random dataframe: 200 rows * 25,000 cols of 0/1 integers
large.df <- data.frame(replicate(25000, sample(0:1, 200, rep = TRUE)))
weights <- information.gain(X24978 ~ ., large.df)

I am using R version 3.3.0 64-bit on Ubuntu 14.04.4 LTS with FSelector 
v0.20 and rJava v0.9.8, on a machine with a 32-core Intel i7 and 250 GB 
RAM. Java is OpenJDK 1.7, 64-bit.

I would highly appreciate any hint on how to solve the problem.
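
One possible cause (a guess, not a confirmed diagnosis): rJava starts the JVM with its default heap, which a 200 x 25,000 frame can exhaust inside the Java-based entropy code. The heap size can only be set before the JVM is loaded:

```r
# Must run before library(rJava) / library(FSelector) starts the JVM;
# once the JVM is up, the heap size cannot be changed in the session.
options(java.parameters = "-Xmx8g")   # 8 GB heap here; size to your data

# library(FSelector)   # would now load rJava with the larger heap (not run)
getOption("java.parameters")
```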




Bert Gunter | 26 May 13:24 2016

Re: R help- fit distribution "fitdistr"


On Thursday, May 26, 2016, Jessica Wang <25695076 <at> qq.com> wrote:

> Hello, I just start using R. I want to use “fitdistr” to fit distribution
> of the data. Then how can I verify if the data really fit the distribution?
> Thanks [data is attached]
You can't. There are many ways to judge/quantify the **quality** of the
fit, but your question indicates a complete misunderstanding of basic
statistical concepts (imo of course). I suggest you spend time with a
local statistical resource, take some courses, do some reading, etc.
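
For instance, one standard way to judge the quality of a Poisson fit is a chi-squared goodness-of-fit comparison of observed and expected counts (a sketch on simulated data, since the attached data is not reproduced here; for a Poisson, the ML estimate from fitdistr is just the sample mean):

```r
set.seed(1)
x <- rpois(200, lambda = 4)            # stand-in for the real count data
lambda_hat <- mean(x)                  # same estimate fitdistr(x, "Poisson") returns

obs <- table(factor(x, levels = 0:max(x)))       # observed counts per value
exp_p <- dpois(0:max(x), lambda_hat)             # fitted cell probabilities
# fold the upper tail into the last cell so the probabilities sum to 1
exp_p[length(exp_p)] <- exp_p[length(exp_p)] +
  ppois(max(x), lambda_hat, lower.tail = FALSE)

chisq.test(obs, p = exp_p)             # a small p-value suggests a poor fit
```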


> res<-fitdistr(data$Report.delay, "Poisson")
> h<-hist(data$Report.delay)
> xfit<-floor(seq(0, 250, 50))
> yfit<-dpois(xfit,res[[1]][1])
> yfit<-yfit*diff(h$mids[1:2])*length(xfit)
> lines(xfit, yfit, col="blue", lwd=2)

KMNanus | 26 May 00:37 2016

Computing means of multiple variables based on a condition

I have a large dataset, a sample of which is:

a <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
b <- c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20)
c <- c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179)
d <- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)

df <- data.frame(a,b,c,d)

I’m trying to construct a data frame that shows the means of c & b based on the condition of d and grouped by a.

I want to create the data frame below, then use ggplot2 to create a line plot of b at various conditions of d.

I can compute the grouped means (d>=2, d>=4, etc.) one at a time using dplyr but haven’t figured out how to
put them all together or put them in one data frame.

I’d rather not use a loop and am relatively new to R.  Is there a way I can use tapply and set it to the
conditions above so that I can create the df below?

a    condition    mean(b)    mean(c)
A    d>=2         ____       _____
B    d>=2         ____       _____
A    d>=4         ____       _____
B    d>=4         ____       _____
A    d>=6         ____       _____
B    d>=6         ____       _____
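
A base-R sketch that loops only over the three thresholds, not the rows, using aggregate(); the same pattern works with dplyr's group_by()/summarise() inside lapply():

```r
df <- data.frame(
  a = rep(c("A", "B"), 5),
  b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
  c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
  d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
)

res <- do.call(rbind, lapply(c(2, 4, 6), function(cut) {
  # group means of b and c by a, restricted to rows meeting the condition
  m <- aggregate(cbind(b, c) ~ a, data = df[df$d >= cut, ], FUN = mean)
  cbind(condition = paste0("d>=", cut), m)
}))

res   # one row per a-by-condition combination, ready for ggplot2
```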

kmnanus <at> gmail.com
914-450-0816 (tel)

Jeff Newmiller | 26 May 01:23 2016

Re: mixed models

Please keep the mailing list in the loop by using reply-all.

I don't think there is a requirement that the number of levels is equal, but there may be problems if you don't
have the minimum number of records corresponding to each combination of levels specified in your model. 

You can change the csv extension to txt and attach for the mailing list. Or, better yet, you can use the dput
function to embed the data directly in your sample code. 
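
For example, dput() prints any object as code that can be pasted straight back into a session (toy data, for illustration):

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))
dput(d)   # prints a structure(...) call; pasting it back recreates d exactly
```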

Also,  please learn to post plain text email to avoid corruption of R code by the HTML formatting. 
Sent from my phone. Please excuse my brevity.

On May 25, 2016 2:26:54 PM PDT, James Henson <jfhenson1 <at> gmail.com> wrote:
>Good afternoon Jeff,
>The sample sizes for levels of the factor "Irrigation" are not equal.
>'nlme' requires equal sample sizes; this may be the problem. The same
>data frame runs in 'lme4' without a problem.
>Best regards,
>On Wed, May 25, 2016 at 3:41 PM, James Henson <jfhenson1 <at> gmail.com>
>> Good afternoon Jeff,
>> When working with this data frame, I just open the .csv file in R

Diego Cuellar | 25 May 22:32 2016

svymean in multistage desing

Thanks for your attention. I have been using your R library, survey. I
made an example of two-stage sampling (SI - SI) and estimated the total and
the mean (the point estimates and their SEs),

and I also reproduced the estimation by hand.

For the total, the point estimate and the estimated variance agree
exactly. But for the mean, I found a different number for the estimated
sample variance.

I have been using formula 8.6.6 of Särndal, -Model Assisted Survey
Sampling-, which for the case of simple random sampling without replacement
equals the Taylor approximation for a two-stage sample (Example 5.6.3
of the same book).

Can you please tell me which formula the functions
svymean and svyratio are using?
I copy the code.

mydata <- read.table( text =

                               "id UPS str_UPS USS str_USS hou85 ue91 lab91
clu_1 clu_2 uno

             3  1 1 1a 1 9230  1623  13727 62 95  1

             4  1 1 2a 1 4896  760   5919  62 95  1

             5  1 1 3a 1 4264  767   5823  62 95  1

alicekalkuhl | 25 May 18:56 2016

strange error

Hello everyone,
almost every time I try to plot something, R gives me the following error:
Error in plot.new() : figure margins too large
One example: it happened when I tried to run a function somebody published to create a Lorenz attractor:
parameters <- c(s = 10, r = 28, b = 8/3)
state <- c(X = 0, Y = 1, Z = 1)
Lorenz <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    dX <- s * (Y - X)
    dY <- X * (r - Z) - Y
    dZ <- X * Y - b * Z
    list(c(dX, dY, dZ))
  })
}
times <- seq(0, 50, by = 0.01)
library(deSolve)
out <- ode(y = state, times = times, func = Lorenz, parms = parameters)
par(oma = c(0, 0, 3, 0))
plot(out, xlab = "time", ylab = "-")
plot(out[, "Y"], out[, "Z"], pch = ".", type = "l")
mtext(outer = TRUE, side = 3, "Lorenz model", cex = 1.5)
It turns out to be really problematic, because there is barely anything I can plot.
I am using RStudio with R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" and my computer uses
Windows 8.1.
Would it be possible to avoid the problem by using Windows 10?
Or is there anything else I can do?
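
One workaround worth trying before changing operating systems (a sketch, assuming the plot pane is simply too small for the requested margins): send the plot to an off-screen device with an explicit size, and keep the margins modest:

```r
f <- "plots.pdf"
pdf(f, width = 8, height = 6)   # a device with a known, adequate size
par(mar = c(4, 4, 2, 1))        # modest margins: bottom, left, top, right
plot(1:10)                      # any plot that failed in the small pane
dev.off()
```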
Thank you in advance,
Alice de Sampaio Kalkuhl




James Henson | 25 May 20:59 2016

mixed models

Greetings R community,

My aim is to analyze a mixed-effects model with temporal pseudo-replication
(repeated measures on the same experimental unit) using ‘nlme’.  However,
my code returns the error message "Error in na.fail.default", even though
the data frame does not contain missing values. My code is below, and the
data file is attached as 'Eboni2.txt'.




model1 <- lme(preDawn ~ Irrigation, random=~season_order|treeNo,

I am genuinely confused.  Hope someone can help.

Best regards,

James F. Henson
number	Location	Season	season_order	Month	treeID	treeNo	preDawn	midday	Irrigation	Pnet	Gs	E	WUE	d15N	d13C 	Nper	Cper	include2
1	UCC	November	5	Nov	UCCLO 1	60	1.4	1.3	N	9	0.290700373	3.766207481	2.38967185					no
2	UCC	November	5	Nov	UCCLO 2	72	1.2	1.3	N	11	0.326258186	3.120573618	3.524992949					no
3	UCC	November	5	Nov	UCCLO 3	78	1.1	1.2	N	8	0.287095701	1.693820753	4.723049937	3	-27.44	2.12	52.12	yes
4	UCC	November	5	Nov	UCCLO 4	79	1.1	2.1	N	10	0.247517983	1.83934285	5.436724317	3.61	-29.5	1.42	51.97	yes
5	UCC	November	5	Nov	UCCLO 5	80	1.4	1.3	N	13	0.300922817	3.082277827	4.217660032					no
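
A hedged guess at the cause, based on the blank d15N/d13C/Nper/Cper cells in the rows above: blanks become NA on import, and lme() defaults to na.action = na.fail, which errors even when the NAs sit in columns the model never uses. A base-R sketch of the check and two possible fixes (the lme call itself is hypothetical and not run):

```r
# toy frame mimicking the attachment: NAs only in a column the model ignores
eb <- data.frame(preDawn    = c(1.4, 1.2, 1.1),
                 Irrigation = c("N", "N", "Y"),
                 d15N       = c(NA, NA, 3))

colSums(is.na(eb))            # locates the offending columns (d15N here)

# fix 1: drop unused columns before fitting
eb_used <- eb[, c("preDawn", "Irrigation")]
anyNA(eb_used)                # FALSE

# fix 2 (hypothetical, requires nlme): override the default na.action
# model1 <- lme(preDawn ~ Irrigation, random = ~season_order | treeNo,
#               data = eb, na.action = na.omit)
```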

C W | 25 May 18:33 2016

What are some toy models I can use in R?

Hi everyone,

I am searching for some toy models in R.  My goal is to do model checking.

For example,

My data come from statistical model N(5, 2), with n=100, call this model_1
Then, I add bias to that data with N(3, 1), with n=100, call this model_2

Ultimately, I want to see whether model_1 + model_2 gives good predictions, or
perhaps good parameter estimates.
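
A base-R sketch of that setup (assuming N(mean, sd) notation; if N(5, 2) means variance 2, keep sd = sqrt(2) as below — all names here are illustrative):

```r
set.seed(42)
n <- 100
model_1 <- rnorm(n, mean = 5, sd = sqrt(2))  # the "true" data, N(5, var = 2)
model_2 <- rnorm(n, mean = 3, sd = 1)        # the additive bias, N(3, var = 1)
y <- model_1 + model_2                       # the observed, biased data

mean(y)                       # should sit near 5 + 3 = 8
t.test(y, mu = 8)$p.value     # a large p-value is consistent with mean 8
```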

I think this is a pretty standard statistical analysis problem?

How do people on this list deal with it?  Any suggestions?

Thank you,

