sezgin ozcan | 10 Feb 04:20

Importing a CSV file

I have been trying to import a CSV file into R, but I get the same message every time. The message is:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such file or directory

I use a Mac.
I also tried this command:
a<-read.table("clipboard",sep=”\t”,row.names=1,header=T)
Error: unexpected input in "a<-read.table("clipboard",sep=‚"

I would appreciate your help before I go crazy.
Thank you.
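
A minimal sketch of the import, under some assumptions. The path in the warning, 'Users:/...', does not look like a valid macOS path (absolute paths there begin with "/", for example "/Users/..."), and the "unexpected input" error most likely comes from the curly quotes around \t, which R does not accept as string delimiters. The file contents and column names below are made up so the example runs anywhere; the temporary path stands in for the real file.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# (Hypothetical data; the real beer.data.csv is not shown in the post.)
csv_path <- file.path(tempdir(), "beer.data.csv")
writeLines(c("name,abv", "lager,4.5", "stout,6.1"), csv_path)

# Confirm that R can see the file before trying to read it.
stopifnot(file.exists(csv_path))

# Read it with plain (straight) quotes around every string argument.
beer <- read.csv(csv_path, header = TRUE)
print(beer)

# Interactive alternative that avoids typing the path by hand:
# beer <- read.csv(file.choose(), header = TRUE)
```

The "clipboard" connection is not available on macOS; R's connections documentation suggests pipe("pbpaste") there, e.g. read.table(pipe("pbpaste"), sep = "\t", row.names = 1, header = TRUE).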

Abraham Mathew | 10 Feb 00:58

Finding all the coefficients for a logit model

Let's say I have a variable, day, which is saved as a factor with 7 levels,
and I use it in a logistic regression model. I ran the model using the car
package in R and printed out the results.

mod1 = glm(factor(status1) ~ factor(day), data=mydat,
family=binomial(link="logit"))
print(summary(mod1))

The result I get is:

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)           -0.4350     0.0379  -11.48   <2e-16 ***
factor(day)Monday     -0.6072     0.0479  -12.69   <2e-16 ***
factor(day)Saturday    0.5964     0.0559   10.67   <2e-16 ***
factor(day)Sunday      1.1140     0.0627   17.78   <2e-16 ***
factor(day)Thursday   -0.4492     0.0516   -8.71   <2e-16 ***
factor(day)Tuesday    -0.9331     0.0496  -18.82   <2e-16 ***
factor(day)Wednesday  -0.8575     0.0486  -17.63   <2e-16 ***

It seems that Friday is being used as the baseline, but I want to know
how I can obtain the coefficient for the baseline (Friday).

I ran mod1$coefficients, but that didn't do the trick.

Can anyone help?
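
A sketch of where the Friday effect lives, assuming R's default treatment contrasts. Under that coding the first factor level has no coefficient of its own: its log-odds is the intercept, and every other coefficient is a difference from it. Dropping the intercept with "- 1" gives one coefficient per day instead. The data below are simulated, since mydat is not shown in the post.

```r
set.seed(1)
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")

# Simulated stand-in for the poster's data frame (an assumption).
mydat <- data.frame(day = factor(sample(days, 700, replace = TRUE)))
mydat$status1 <- rbinom(700, 1, 0.4)

# Default coding: Friday (first level alphabetically) is the baseline.
mod1 <- glm(status1 ~ day, data = mydat, family = binomial(link = "logit"))

# Same model without an intercept: one log-odds coefficient per day.
mod0 <- glm(status1 ~ day - 1, data = mydat, family = binomial(link = "logit"))

# The baseline's log-odds is the intercept of mod1 ...
coef(mod1)["(Intercept)"]
# ... and matches the explicit Friday coefficient in mod0.
coef(mod0)["dayFriday"]
```

For any other day, the log-odds is the intercept plus that day's coefficient from mod1, which equals the corresponding coefficient in mod0.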


FU-WEN LIANG | 9 Feb 23:56

Constraint on one of the parameters

Dear all,

I have a function to optimize for a set of parameters and want to set a
constraint on only one parameter. Here is my function. What I want to do is
estimate the parameters of a bivariate normal distribution where the
correlation has to be between -1 and 1. Would you please advise how to
revise it?

ex=function(s,prob,theta1,theta,xa,xb,xc,xd,t,delta) {

expo1= exp(s[3]*xa+s[4]*xb+s[5]*xc+s[6]*xd)
expo2= exp(s[9]*xa+s[10]*xb+s[11]*xc+s[12]*xd)
expo3= exp(s[15]*xa+s[16]*xb+s[17]*xc+s[18]*xd)
expo4= exp(s[21]*xa+s[22]*xb+s[23]*xc+s[24]*xd)
expo5= exp(s[27]*xa+s[28]*xb+s[29]*xc+s[30]*xd)

nume1=prob[1]*(s[2]^-s[1]*s[1]*t^(s[1]-1)*expo1)^delta*exp(-s[2]^-s[1]*t^s[1]*expo1)*
theta1[1]^xa*(1-theta1[1])^(1-xa)*theta1[2]^xb*(1-theta1[2])^(1-xb)*(1+theta1[11]*(xa-theta1[1])*(xb-theta1[2])/sqrt(theta1[1]*(1-theta1[1]))/sqrt(theta1[2]*(1-theta1[2])))/
(2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4]))

nume2=prob[2]*(s[8]^-s[7]*s[7]*t^(s[7]-1)*expo2)^delta*exp(-s[8]^-s[7]*t^s[7]*expo2)*
theta1[3]^xa*(1-theta1[3])^(1-xa)*theta1[4]^xb*(1-theta1[4])^(1-xb)*(1+theta1[11]*(xa-theta1[3])*(xb-theta1[4])/sqrt(theta1[3]*(1-theta1[3]))/sqrt(theta1[4]*(1-theta1[4])))/
(2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8]))

nume3=prob[3]*(s[14]^-s[13]*s[13]*t^(s[13]-1)*expo3)^delta*exp(-s[14]^-s[13]*t^s[13]*expo3)*
theta1[5]^xa*(1-theta1[5])^(1-xa)*theta1[6]^xb*(1-theta1[6])^(1-xb)*(1+theta1[11]*(xa-theta1[5])*(xb-theta1[6])/sqrt(theta1[5]*(1-theta1[5]))/sqrt(theta1[6]*(1-theta1[6])))/
(2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12]))

nume4=prob[4]*(s[20]^-s[19]*s[19]*t^(s[19]-1)*expo4)^delta*exp(-s[20]^-s[19]*t^s[19]*expo4)*
theta1[7]^xa*(1-theta1[7])^(1-xa)*theta1[8]^xb*(1-theta1[8])^(1-xb)*(1+theta1[11]*(xa-theta1[7])*(xb-theta1[8])/sqrt(theta1[7]*(1-theta1[7]))/sqrt(theta1[8]*(1-theta1[8])))/
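
One common way to keep a single parameter inside (-1, 1) is to optimise over an unconstrained value z and map it through tanh(). The sketch below is not the function above; it uses a much simpler, assumed likelihood (a standard bivariate normal with unknown correlation and simulated data) purely to show the reparameterisation.

```r
set.seed(42)
n <- 500
rho_true <- 0.6

# Simulate standard bivariate normal data with correlation rho_true.
x <- rnorm(n)
y <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)

# Negative log-likelihood in terms of z; rho = tanh(z) is always in (-1, 1),
# so the optimiser can move freely over the whole real line.
negloglik <- function(z) {
  rho <- tanh(z)
  q <- (x^2 - 2 * rho * x * y + y^2) / (1 - rho^2)
  sum(log(2 * pi) + 0.5 * log(1 - rho^2) + 0.5 * q)
}

fit <- optim(par = 0, fn = negloglik, method = "BFGS")

# Map the unconstrained estimate back to the correlation scale.
rho_hat <- tanh(fit$par)
rho_hat
```

An alternative is to optimise the correlation directly with box constraints, using optim(..., method = "L-BFGS-B", lower = ..., upper = ...) with bounds slightly inside -1 and 1 for that one parameter and -Inf/Inf for the others.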

Melrose2012 | 10 Feb 05:00

making multiple lines using qqplot

Hi Everyone,

I want to make 3 lines on the same graph (not as subplots, all in the same
window, one on top of each other) and I want them to be quantile-quantile
plots (qqplot).  Essentially, I am looking for the equivalent of Matlab's
"hold on" command to use with qqplot.  I know I can use 'points' or 'lines',
but these do not give me a qqplot (only appear to work as scatter plots).  I
found the syntax 'par(new=TRUE)' but that only seems to work for two lines,
not for three.

My script currently looks like:

qqplot(nq.n5,tq.n5,col="red",xlab="Normal Distribution Quantiles",
       ylab="t Distribution Quantiles",
       main="Quantile-Quantile Plot of Normal vs t-Distribution for Various Sample Sizes",
       pch=20)
par(new=TRUE)
qqplot(nq.n50,tq.n50,col="blue",xlab="",ylab="",pch=20)
par(new=TRUE)
qqplot(nq.n500,tq.n500,col="green",xlab="",ylab="",pch=20)
legend("topleft",c("n=5","n=50","n=500"),fill=c("red","blue","green"))

I realize that this only plots the first and the third qqplot because by
doing par(new=TRUE) again, it gets rid of the middle one.  I don't know how
to get around this and get all 3 lines on the same plot.

Can anyone please help me with this syntax?

Thank you very much for your time and advice!

Cheers,
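
A possible approach, sketched with simulated samples: qqplot() accepts plot.it = FALSE and then just returns the sorted quantile pairs, which can be drawn with plot() once and added to with points(), so all three series share one set of axes. The contents of nq.* and tq.* below are assumptions (the post does not show how they were created).

```r
set.seed(1)
# Simulated stand-ins for the poster's samples (assumed; df = 4 is arbitrary).
nq.n5   <- rnorm(5);   tq.n5   <- rt(5,   df = 4)
nq.n50  <- rnorm(50);  tq.n50  <- rt(50,  df = 4)
nq.n500 <- rnorm(500); tq.n500 <- rt(500, df = 4)

# Compute the quantile pairs without drawing anything.
q5   <- qqplot(nq.n5,   tq.n5,   plot.it = FALSE)
q50  <- qqplot(nq.n50,  tq.n50,  plot.it = FALSE)
q500 <- qqplot(nq.n500, tq.n500, plot.it = FALSE)

# Common axis limits so that no series is clipped.
xlim <- range(q5$x, q50$x, q500$x)
ylim <- range(q5$y, q50$y, q500$y)

# Draw the first series, then add the other two to the same plot.
plot(q5, col = "red", pch = 20, xlim = xlim, ylim = ylim,
     xlab = "Normal Distribution Quantiles",
     ylab = "t Distribution Quantiles",
     main = "Quantile-Quantile Plot of Normal vs t-Distribution")
points(q50,  col = "blue",  pch = 20)
points(q500, col = "green", pch = 20)
legend("topleft", c("n=5", "n=50", "n=500"),
       col = c("red", "blue", "green"), pch = 20)
```

Because every series is drawn in the same coordinate system, the axes stay consistent, which repeated par(new=TRUE) calls do not guarantee.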

R. Michael Weylandt | 10 Feb 04:53

colnames documentation

Consider the following in R 2.14.1 (seems to still be the case in Rdevel):

x <- matrix(1:9, 3)
colnames(x) # NULL as expected
colnames(x, do.NULL = TRUE) # NULL -- since we didn't change the default
colnames(x, do.NULL = FALSE) # "col1" "col2" "col3"

This doesn't really seem to square with the documentation which reads:

do.NULL: logical.  Should this create names if they are ‘NULL’?

The details section expounds and says:

If ‘do.NULL’ is ‘FALSE’, a character vector (of length ‘NROW(x)’
     or ‘NCOL(x)’) is returned in any case, prepending ‘prefix’ to
     simple numbers, if there are no dimnames or the corresponding
     component of the dimnames is ‘NULL’.

But I have to admit that I don't really get it. (I mean the interpretation
of the docs; I understand the functionality.) Could someone enlighten me?
Given what the details section says (and how the function behaves),
I'd expect something more like:

do.NULL: logical.  Is NULL an acceptable return value? If FALSE,
column names derived from prefix are returned.

Michael

PS -- In my searching, I think the link to the svn on the developer
page (http://developer.r-project.org/) is wrong: clicking it takes one

Sam Steingold | 10 Feb 03:44

the value of the last expression

Is there an analogue of the Common Lisp "*" variable, which contains the
value of the last expression?
E.g., in Lisp:
> (+ 1 2)
3
> *
3

I wish I could recover the value of the last expression without
re-evaluating it.

thanks

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://camera.org http://ffii.org
http://truepeace.org http://memri.org http://americancensorship.org
The early bird may get the worm, but the second mouse gets the cheese.
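
For reference, base R keeps the value of the most recent top-level expression in .Last.value, which plays a role similar to "*" at the Lisp prompt. A small sketch follows; slow_sum() is a hypothetical stand-in for an expensive computation, and the interactive() guard is only there so the snippet also runs unchanged as a script.

```r
# Hypothetical stand-in for a computation you do not want to repeat.
slow_sum <- function(a, b) a + b

# Evaluated at the prompt, this prints the result but does not assign it.
slow_sum(1, 2)

# At the top-level prompt, the value just printed is held in `.Last.value`,
# so it can be captured afterwards without calling slow_sum() again.
if (interactive()) {
  result <- .Last.value
  print(result)
}
```

Note that .Last.value is overwritten by every subsequent top-level evaluation, so it has to be captured immediately.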

Sam Steingold | 10 Feb 03:43

naiveBayes: slow predict, weird results

I did this:
nb <- naiveBayes(users, platform)
pl <- predict(nb,users)
nrow(users) ==> 314781
ncol(users) ==> 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
(tens of minutes).  Why?

2. The predict results were completely off the mark (quite the opposite
of the expected overfitting).  Suffice it to show the tables:

pl:

   android blackberry       ipad     iphone         lg      linux        mac 
         3          5         11         14     312723          5         11 
    mobile      nokia    samsung    symbian    unknown    windows 
      1864         17         16        112          0          0 

platform:
   android blackberry       ipad     iphone         lg      linux        mac 
     18013       1221       2647       1328          4       2936      34336 
    mobile      nokia    samsung    symbian    unknown    windows 
        18         88         39        103       2660     251388 

i.e., nb classified nearly everything as "lg" while in the actual data
"lg" is virtually nonexistent.

3. when I print "nb", I see "A-priori probabilities" (which are what I
expected) and "Conditional probabilities" which are confusing because

Sebastián Daza | 10 Feb 03:25

calculations combining values from different rows

Hi everyone,
I am looking for functions or systematic ways to do calculations between
different columns and rows in R, as one can do easily in Excel. For
example, I have two variables, a and b, where a1 represents an a value
in row 1, and b2 represents a b value in row 2, etc.

a  <- c(4,3,5,5,6,7,3,2,1,4)
b  <- c(2,4,1,2,5,3,1,8,7,5)
data  <- cbind(a,b)

I have to calculate something like this:

x1  = NA
x2  = -b1/24 * a1 + b2/2 * a2 + b3/24 * a3
x3  = -b2/24 * a2 + b3/2 * a3 + b4/24 * a4
x4  = -b3/24 * a3 + b4/2 * a4 + b5/24 * a5
...
x9  = -b8/24 * a8 + b9/2 * a9 + b10/24 * a10
x10 = NA

For example, x2 would be equal to: -2/24*4 + 4/2*3 + 1/24*5

Any ideas?
Thank you in advance.

-- 
Sebastián Daza
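
One way to express the calculation above is with shifted index vectors, so that each row can refer to its neighbours without a loop. This is a sketch using the a and b vectors from the post; the helper names ab, i and x are introduced here for illustration.

```r
a <- c(4, 3, 5, 5, 6, 7, 3, 2, 1, 4)
b <- c(2, 4, 1, 2, 5, 3, 1, 8, 7, 5)
n <- length(a)

# Every term in the formula is a product a_k * b_k, so compute it once.
ab <- a * b

# First and last rows have no neighbour on one side, so they stay NA.
x <- rep(NA_real_, n)

# Interior rows: previous row (i - 1), current row (i), next row (i + 1).
i <- 2:(n - 1)
x[i] <- -ab[i - 1] / 24 + ab[i] / 2 + ab[i + 1] / 24

cbind(a, b, x)
```

Here x[2] reproduces the worked example, -2/24*4 + 4/2*3 + 1/24*5 = 5.875.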

Janko Thyson | 10 Feb 01:56

Bug with memory allocation when loading Rdata files iteratively?

Dear list,

When iterating over a set of Rdata files that are loaded, analyzed and
then removed from memory again, I experience a *significant* increase in
an R process' memory consumption (killing the process eventually).

It seems that removing the object via rm() and calling gc() do
not have any effect, so the memory consumption of each loaded R object
accumulates until there's no more memory left :-/

Possibly, this is also related to XML package functionality (mainly
htmlTreeParse and getNodeSet), but I also experience the described
behavior when simply iteratively loading and removing Rdata files.

I've put together a little example that illustrates the memory 
ballooning mentioned above which you can find here: 
http://stackoverflow.com/questions/9220849/significant-memory-issue-in-r-when-iteratively-loading-rdata-files-killing-the

Is this a bug? Any chance of working around this?

Thanks a lot and best regards,
Janko


Yang Zhang | 10 Feb 01:30

Custom caret metric based on prob-predictions/rankings

I'm dealing with classification problems, and I'm trying to specify a
custom scoring metric (recall@p, ROC, etc.) that depends on not just
the class output but the probability estimates, so that caret::train
can choose the optimal tuning parameters based on this metric.

However, when I supply a trainControl summaryFunction, the data given
to it contains only class predictions, so the only metrics possible
are things like accuracy, kappa, etc.

Is there any way to do what I'm looking for?  If not, could I put
this in as a feature request?  Thanks!

-- 
Yang Zhang
http://yz.mit.edu/

Yang Zhang | 10 Feb 01:00

Choosing glmnet lambda values via caret

Usually when using raw glmnet I let the implementation choose the
lambdas.  However when training via caret::train the lambda values are
predetermined.  Is there any way to have caret defer the lambda
choices to glmnet and thus choose the optimal lambda
dynamically?

-- 
Yang Zhang
http://yz.mit.edu/

