Kristi Glover | 26 Nov 01:57 2014

covariate or predictor

Hi, 
I am wondering how I can tell whether a variable is a covariate or a predictor in an ANOVA analysis. For example:
 A<-structure(list(Machine = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L), Diameter = c(20L, 25L, 24L, 25L, 32L, 
22L, 28L, 22L, 30L, 28L, 21L, 23L, 26L, 21L, 15L), Strength = c(36L, 
41L, 39L, 42L, 49L, 40L, 48L, 39L, 45L, 44L, 35L, 37L, 42L, 34L, 
32L)), .Names = c("Machine", "Diameter", "Strength"), class = "data.frame", row.names = c(NA, 
-15L))
attach(A)
b<-aov(Strength~Diameter)
summary(b)
c<-aov(Strength~Diameter+as.factor(Machine))
summary(c)
I am confused here about whether "Machine" is a covariate or a predictor. How do I know which one is the covariate and which is the predictor?
If Machine is a predictor (just like Diameter), how am I supposed to write it in the model?
Is the equation below right for the case where Machine is a predictor?
c1<-aov(Strength~Diameter+Machine)
If it is, does that mean the covariate is a dummy variable?
Your help will really help me clear up the concept.
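In ANCOVA terms, a continuous variable such as Diameter is usually called the covariate, and a categorical variable such as Machine is a (factor) predictor; R only treats Machine categorically if it is wrapped in factor(). A minimal sketch of the difference, using the data frame A from above (the Df column in the ANOVA table tells the two treatments apart):

```r
# Same data as the dput() structure above, rebuilt for a self-contained example.
A <- data.frame(
  Machine  = rep(1:3, each = 5),
  Diameter = c(20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15),
  Strength = c(36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32)
)

# Machine left numeric: fitted as a second continuous covariate (one slope)
m_num <- aov(Strength ~ Diameter + Machine, data = A)

# Machine as a factor: fitted as a categorical predictor (separate levels)
m_fac <- aov(Strength ~ Diameter + factor(Machine), data = A)

anova(m_num)["Machine", "Df"]          # 1 df: a single slope for the numeric code
anova(m_fac)["factor(Machine)", "Df"]  # 2 df: 3 machine levels - 1
```

So the covariate is not a dummy variable; rather, factor() makes R build the dummy variables for Machine behind the scenes.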

Thanks 

KG


Rolf Turner | 25 Nov 21:18 2014

Re: plot.hclust points to an older version

On 26/11/14 08:53, Michael Mason wrote:
> Here you are. I expect most folks won't get the error.
>
> N   = 100; M = 1000
> mat = matrix(1:(N*M) + rnorm(N*M,0,.5),N,M)
> h   = hclust(as.dist(1-cor(mat)))
> plot(h)
>
> Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
>    there is no .Internal function 'dend.window'
>
>
>
> Thanks again
>
>
> On 11/25/14 11:29 AM, "Rolf Turner" <r.turner <at> auckland.ac.nz> wrote:
>
>>
>>
>> Reproducible example???
>>
>> (I know from noddink about hclust, but I tried the example from the help
>> page and it plotted without any problem.)
>>
>> cheers,
>>
>> Rolf Turner
>>
>> On 26/11/14 06:13, Michael Mason wrote:

~Stack~ | 25 Nov 19:44 2014

Re: RStudio seg faults

On 11/25/2014 12:01 PM, Mark Sharp wrote:
> Have you tried the current version of R, 3.1.2?

I have not. I haven't had many issues in the past using what was in the
EPEL repos. Let me take one of my dev boxes and give it a try.

I will post back what I find.

Thanks!

Tom Wright | 25 Nov 21:12 2014

Presentation tables in R (knitr)

Hi,
This problem has me stumped so I thought I'd ask the experts. I'm trying
to create a pretty summary table of some data (which patients have had
what tests at what times). Ideally I'd like to knitr this into a pretty
PDF for presentation.
If anyone has pointers I'll be grateful.

require(tables)
require(reshape2)

data<-data.frame('ID'=paste0('pat',c(rep(1,8),rep(2,8))),
                 'Time'=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                 'Eye'=rep(c('OS','OS','OD','OD'),4),
                 'Measure'=rep(c('Height','Weight'),8))

tabular(Measure~factor(ID)*factor(Time)*factor(Eye),data)
#All levels of Time are repeated for all IDs; I'd prefer to show only
#the relevant times.

tabular(Measure~factor(ID)*Time*factor(Eye),data)
#Time is getting collapsed by ID

data$value=1
dcast(data,Measure~ID+Time+Eye)
#close but not very pretty
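One possible tidy-up of the dcast route (a sketch, not a full tables/tabular solution): cast a text marker instead of a number and blank-fill the absent cells, so only the times actually observed for each patient appear as columns; knitr::kable() can then render the result in the PDF.

```r
library(reshape2)

# Same toy data as above.
data <- data.frame(
  ID      = paste0("pat", c(rep(1, 8), rep(2, 8))),
  Time    = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  Eye     = rep(c("OS", "OS", "OD", "OD"), 4),
  Measure = rep(c("Height", "Weight"), 8)
)

data$value <- "x"  # mark which patient/time/eye/measure combinations were observed
wide <- dcast(data, Measure ~ ID + Time + Eye, fill = "", value.var = "value")

# Columns only exist for observed ID/Time/Eye combinations,
# e.g. pat1_1_OD ... pat2_4_OS; absent cells are blank, not NA.
names(wide)

# In the knitr document: knitr::kable(wide) renders it as a presentation table.
```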

ravi | 25 Nov 18:41 2014

porting an access database to sqlite

Hi,
All my data is presently locked in a Microsoft Access database, which holds huge amounts of data in a number of large tables. Using RODBC and connecting to it takes too long, sometimes making the system hang.

To make things more manageable, I have tried to transfer the data to .RData or .csv files, but I am not able to do this with some of the larger tables. I am currently stuck on one of the preliminary steps: I am not able to find the number of rows in a table. If I knew this, I could transfer the tables in chunks to an SQLite database.
I am able to connect to the Access database with:

library(RODBC)
con <- odbcConnect("TestDB")
d1 <- sqlFetch(con, "table1", max = 1e5, as.is = TRUE)
d2 <- sqlFetchMore(con, max = 1e5, as.is = TRUE)
d3 <- rbind(d1, d2)

I wanted to develop this into a loop that builds a concatenated data frame, which I would then save either as a binary file or transfer to an SQLite database. I would like some help on the simplest route to follow. But first, my immediate problem: in some of the tables I find that sqlFetchMore returns -1L, meaning that the end has been reached. In RODBC I find no command for getting the row count of a table. I have found one in DBI, but I have not been able to figure out how I should specify the connection to an Access database (con in the following).

while (!dbHasCompleted(con)) {
  print(dbGetRowCount(con))
}

I would appreciate help on the following points:
1. How can I get the row count (and size) of a table in an Access database, with RODBC, DBI or any other way?
2. I have found that saving the tables as .RData files reduces the file size and the reading time. Is there some way of appending to an already saved data frame with this method?
3. I have come across alternative ways of saving data, using writeBin and packages like saves, rhdf5 etc. Would they be useful alternatives?
4. Is there an advantage in combining binary files and databases like SQLite? Or are files already stored in a binary format in databases like SQLite?
5. What is the simplest method of porting from the Access database to SQLite? With RSQLite and RODBC, can I have connections to the Access and SQLite databases open at the same time, or should I close one and then open the other? It would help if I could get a detailed bit of code for doing this in a simple way.

I would appreciate all the help that I can get.
Thanks,
Ravi
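On point 1, RODBC can get a row count by sending SQL directly with sqlQuery(); on point 5, nothing prevents an RODBC connection and an RSQLite connection being open at the same time, since they are independent objects. A rough sketch under those assumptions ("TestDB" and "table1" taken from the post; untested here, since it needs a live Access DSN):

```r
library(RODBC)
library(RSQLite)

con <- odbcConnect("TestDB")               # source: Access, via ODBC
db  <- dbConnect(SQLite(), "copy.sqlite")  # target: an SQLite file

# Point 1: ask the database itself for the row count
n <- sqlQuery(con, "SELECT COUNT(*) AS n FROM table1")$n

# Point 5: chunked transfer with both connections open at once
chunk <- sqlFetch(con, "table1", max = 1e5, as.is = TRUE)
while (is.data.frame(chunk)) {             # sqlFetchMore returns -1 at the end
  dbWriteTable(db, "table1", chunk, append = TRUE)
  chunk <- sqlFetchMore(con, max = 1e5, as.is = TRUE)
}

dbDisconnect(db)
odbcClose(con)
```

With this route there is no need to hold the whole table in R at once, so the .RData intermediate step can be skipped entirely.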


Michael Mason | 25 Nov 18:13 2014

plot.hclust points to an older version

Hello fellow R users,

I have recently updated to R 3.1.2. When trying to plot an hclust object to generate the dendrogram I get the
following error:

Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
  there is no .Internal function 'dend.window'

I am indeed using R3.1.2 but my understanding is that the .Internal API to the C code is no longer used. I have
tried detaching the stats package and restarting R to no avail.
I would love any help from any wiser guRus.
Thanks in advance,
Mike


Charlotte Whitham | 25 Nov 17:21 2014

Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

Dear list,

I have used the ‘polr’ function in the MASS package to run an ordinal logistic regression for an ordinal
categorical response variable with 15 continuous explanatory variables.
I have used the code (shown below) to check that my model meets the proportional odds assumption following
advice provided at (http://www.ats.ucla.edu/stat/r/dae/ologit.htm) – which has been extremely
helpful, thank you to the authors! However, I’m a little worried about the output implying that not only
are the coefficients across various cutpoints similar, but they are exactly the same (see graphic below).

Here is the code I used (and see attached for the output graphic)

FGV1b <- data.frame(
  FG1_val_cat = factor(FGV1b[, "FG1_val_cat"]),
  scale(FGV1[, c("X", "Y", "Slope", "Ele", "Aspect", "Prox_to_for_FG",
                 "Prox_to_for_mL", "Prox_to_nat_border", "Prox_to_village",
                 "Prox_to_roads", "Prox_to_rivers", "Prox_to_waterFG",
                 "Prox_to_watermL", "Prox_to_core", "Prox_to_NR",
                 "PCA1", "PCA2", "PCA3")]))

b<-polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect +
FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village +
FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL +
FGV1b$Prox_to_core + FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)

#Checking the assumption. The following code estimates the values to be
#graphed. First it shows the logit transformations of the probabilities
#of being greater than or equal to each value of the target variable.

FGV1b$FG1_val_cat<-as.numeric(FGV1b$FG1_val_cat) 

sf <- function(y) {

  c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),

    'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),

    'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),

jeeth ghambole | 25 Nov 14:06 2014

Packages for Handling Large Data Set

Hello All,

I am working on backtesting strategies for stocks using daily prices.

Initially the size of the data was very limited and could be easily handled
using R and SQL, but now my analysis is extending to a large set of data. Can
anyone suggest the best packages available for handling large datasets?

Thank you.

With Regards,
Jeeth G.


Massimiliano Tripoli | 25 Nov 13:07 2014

Converting list to character


Dear all,

I can't convert the result of the aggregate function into a data frame. My
data looks like:

mydata <- structure(list(ID = c(11, 11, 460, 460, 986, 986, 986, 986, 1251,
1251, 1251, 1251, 1251, 1251, 1251, 1251, 1801, 1801, 1801, 1801
), YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011, 2008,
2008, 2009, 2009, 2010, 2010, 2011, 2011, 2008, 2009, 2010, 2011
), Y = c(158126, 153015, 3701, 5880, 718663, 661112, 527233,
558281, 450, 131714, 427, 124648, 425, 116500, 434, 123853, 17400,
16493, 8057, 8329), CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1",
"GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1",
"GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8",
"GR.3.8", "GR.3.8")), .Names = c("ID", "YEAR", "Y", "CODE"), row.names = c(NA,
20L), class = "data.frame")

and by using aggregate function

TAB <- aggregate(mydata$CODE, by = list(ID = mydata$ID, YEAR = mydata$YEAR), FUN = paste0)

What I want is a data frame that prints like TAB below:
> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
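The list column comes from FUN = paste0 returning a character vector per group; collapsing the codes into one string inside FUN yields an ordinary character column instead. A sketch on the same data (the Y column is omitted for brevity):

```r
# Subset of the mydata posted above (ID, YEAR, CODE columns only).
mydata <- data.frame(
  ID   = c(11, 11, 460, 460, 986, 986, 986, 986, rep(1251, 8), rep(1801, 4)),
  YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011,
           2008, 2008, 2009, 2009, 2010, 2010, 2011, 2011,
           2008, 2009, 2010, 2011),
  CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1",
           "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8",
           "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8",
           "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8",
           "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8")
)

# paste(..., collapse = ", ") returns one string per group, so the
# aggregated column is plain character, not a list.
TAB <- aggregate(CODE ~ ID + YEAR, data = mydata,
                 FUN = function(z) paste(sort(unique(z)), collapse = ", "))
```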

Stack Kororā | 25 Nov 00:58 2014

RStudio seg faults

Greetings,
I am having a big issue with RStudio segfaulting recently. It is
becoming a very big problem for me. I have attached most of the
information to the support site but no one has responded there. Can
someone please help me fix RStudio?

https://support.rstudio.com/hc/communities/public/questions/203655446-Core-Crash-on-SL6-6-and-rstudio-0-98-1091-1

Thank you!

Jack Luo | 25 Nov 17:36 2014

question regarding rmvDAG in package pcalg

Hi,

I am trying to use rmvDAG in the pcalg package to generate data from a DAG
structure. One thing I found is that when the number of variables gets
large, there can be really large numbers in the data matrix. I played
around with different parameters and it behaves the same way.

library(pcalg)
> p = 20
> n = 100
> rDAG <- randomDAG(p, prob = 0.2, lB=0.1, uB=1)
> d.normMat <- rmvDAG(n, rDAG, errDist="normal")
> max(d.normMat)
[1] 5.763518
> p = 200
> n = 100
> rDAG <- randomDAG(p, prob = 0.2, lB=0.1, uB=1)
> d.normMat <- rmvDAG(n, rDAG, errDist="normal")
> max(d.normMat)
[1] 365099508
> p = 2000
> n = 100
> rDAG <- randomDAG(p, prob = 0.2, lB=0.1, uB=1)
> d.normMat <- rmvDAG(n, rDAG, errDist="normal")
> max(d.normMat)
[1] 3.880373e+90
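This looks like a property of the simulation rather than a bug: as I understand it, rmvDAG generates each variable as a weighted sum of its parents plus noise, so with prob = 0.2 a late node in a 2000-node graph has hundreds of ancestors and the contributions compound along directed paths. A base-R sketch of the same mechanism, without pcalg (parameters mimic the call above; numbers are illustrative):

```r
set.seed(1)
p <- 200; prob <- 0.2                    # edge probability, as in randomDAG
x <- numeric(p)
for (j in seq_len(p)) {
  parents <- which(runif(j - 1) < prob)  # earlier nodes kept with prob 0.2
  w <- runif(length(parents), 0.1, 1)    # edge weights in [lB, uB] = [0.1, 1]
  x[j] <- sum(w * x[parents]) + rnorm(1) # linear SEM: weighted parents + noise
}
max(abs(x))  # grows roughly geometrically in p; huge long before p = 2000
```

If that is the mechanism, shrinking uB (or prob) so the expected total parent weight per node stays below 1 should keep the generated values bounded.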

Does anyone know how to fix this?

Thanks a lot!