Matt Pickard | 1 Sep 06:14 2015

reshape: melt and cast

Hi,

I have data that looks like this:

> head(ratings)
  QCode  PID  RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
1 GUILT 1123 cwormhoudt   2   2   3   1   1   1   3   3   3    2    1
2  LOVE 1123 cwormhoudt   1   2   3   2   1   1   1   1   1    1    3
3 GUILT 1136 cwormhoudt   1   2   3   1   1   1   2   3   2    2    1
4  LOVE 1136 cwormhoudt   1   2   3   1   1   1   1   1   1    1    2
5 GUILT 1137 cwormhoudt   2   2   2   1   1   1   2   3   1    2    1
6  LOVE 1137 cwormhoudt   1   3   4   1   1   1   1   1   1    1    4

> tail(ratings)
      QCode  PID RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
2456    FUN 1555  zspeidel   1   3   3   1   1   1   1   1   1    1    1
2457    FUN 1556  zspeidel   1   2   2   1   1   1   1   1   1    1    1
2458    FUN 1558  zspeidel   1   2   3   1   1   1   1   1   1    1    1
2459 APPEAR 1558  zspeidel   1   3   3   1   1   1   2   1   1    2    1
2460 APPEAR 1559  zspeidel   1   3   3   1   1   1   2   1   1    2    1
2461    FUN 1559  zspeidel   1   2   2   1   1   1   1   1   1    1    1

I am trying to use the melt and cast functions to re-arrange it to look
like this:

   QCode  PID sItem cwormhoudt zspeidel
1 APPEAR 1123   SI1          1        1
2 APPEAR 1123   SI2          4        1
3 APPEAR 1123   SI3          1        2
4 APPEAR 1123   SI4          3        1
5 APPEAR 1123   SI5          1        1
6 APPEAR 1123   SI6          1        3

So, I melt the data like this:

mratings = melt(ratings, variable_name="sItem")
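A minimal sketch of the whole round trip. It swaps in the reshape2 package (the successor to reshape, where melt() takes id.vars/variable.name and the casting function is dcast()) instead of the reshape functions used above. The toy ratings data frame copies a few rows from the head()/tail() output and keeps only SI1 to SI3. One thing to watch: by default melt() guesses the id variables from the non-numeric columns, so the numeric PID would be melted as if it were one more item unless it is listed explicitly.

```r
library(reshape2)

# A few rows copied from head()/tail() above; only SI1-SI3 kept for brevity
ratings <- data.frame(
  QCode     = c("GUILT", "LOVE", "FUN", "APPEAR"),
  PID       = c(1123, 1123, 1555, 1558),
  RaterName = c("cwormhoudt", "cwormhoudt", "zspeidel", "zspeidel"),
  SI1 = c(2, 1, 1, 1),
  SI2 = c(2, 2, 3, 3),
  SI3 = c(3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# PID is numeric, so name it as an id variable; otherwise melt() treats it
# as a measured column alongside SI1-SI3
mratings <- melt(ratings, id.vars = c("QCode", "PID", "RaterName"),
                 variable.name = "sItem")

# One row per question/person/item, one column per rater;
# combinations a rater never scored come out as NA
wide <- dcast(mratings, QCode + PID + sItem ~ RaterName, value.var = "value")
wide
```

With the full data, where both raters scored the same PID/QCode pairs, the NA cells are filled in and the result has the layout shown above.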

iclozm | 31 Aug 22:22 2015

PLS Regression: Non-Metric

Hello,

I was wondering whether anyone has successfully implemented the non-metric PLS
regression algorithm presented by Giorgio Russolillo in the appendix of his
PhD dissertation.

I am having issues calling the function with his example matrices (tea data
set), and I think there may be a few mistakes in the R code given in the
appendix.

The function is defined as:

myPLSQQ(Y = NA, Yc = NA, X = NA, Xc = NA, ncomp)

However, nowhere does he specify what Xc and Yc refer to (I am assuming data
containing the non-metric variables). Furthermore, I do not understand how
to call a function whose arguments default to NA.

Can I simply write:

Y <- Y_Matrix
Xc <- X_Matrix
ncomp <- 3

data <- myPLSQQ(Y = Y, Yc = NA, X = NA, Xc = Xc, ncomp)
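An argument with a default of NA is simply optional: you can leave it out of the call (it then takes the value NA) or pass NA explicitly, and R treats the two calls the same. The sketch uses a hypothetical function f() with the same signature as myPLSQQ(); what myPLSQQ() actually does when an argument is NA, and what Xc and Yc are meant to hold, cannot be determined from the post. Y_Matrix and X_Matrix here are placeholder matrices.

```r
# Hypothetical stand-in with myPLSQQ()'s signature; it only reports which
# data arguments were supplied (i.e. are not the default NA)
f <- function(Y = NA, Yc = NA, X = NA, Xc = NA, ncomp) {
  supplied <- c(Y  = !identical(Y, NA),  Yc = !identical(Yc, NA),
                X  = !identical(X, NA),  Xc = !identical(Xc, NA))
  list(supplied = supplied, ncomp = ncomp)
}

Y_Matrix <- matrix(1:6, nrow = 3)   # placeholder data
X_Matrix <- matrix(1:9, nrow = 3)   # placeholder data

# Arguments left out of the call take their NA default ...
a <- f(Y = Y_Matrix, Xc = X_Matrix, ncomp = 3)
# ... so this call, which passes NA explicitly and ncomp by position, is the same
b <- f(Y = Y_Matrix, Yc = NA, X = NA, Xc = X_Matrix, 3)

identical(a, b)
```

As far as R's argument matching goes, the call in the post is therefore equivalent to myPLSQQ(Y = Y, Xc = Xc, ncomp = 3). Naming ncomp is the safer habit: positional matching only works here because ncomp is the one formal argument left unmatched.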

Kind Regards,
Chris

--

Dominic Roye | 31 Aug 22:11 2015

Non zero padding dates

Hello,

How can I convert date-time strings in which the month and day are not zero-padded?

For example: "2015119_06"  ("2015-01-19 06:00")
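Without zero padding the string is ambiguous: "2015119" could mean 2015-01-19 or 2015-11-09, so a fixed strptime() format cannot decide it and some rule has to be supplied. A minimal sketch, assuming (hypothetically) that a three-digit month-day part always means a one-digit month followed by a two-digit day, as in the example above. The helper name parse_unpadded() is made up, and UTC is an assumed time zone.

```r
# Hypothetical helper: parses "YYYY<month><day>_HH" where month and day are
# not zero-padded. ASSUMPTION: a 3-digit month-day part is read as a 1-digit
# month plus a 2-digit day ("119" -> January 19, never November 9).
parse_unpadded <- function(x, tz = "UTC") {
  date_part <- sub("_.*$", "", x)               # e.g. "2015119"
  hour      <- as.integer(sub("^.*_", "", x))   # e.g. 6
  year      <- as.integer(substr(date_part, 1, 4))
  md        <- substr(date_part, 5, nchar(date_part))  # e.g. "119"
  n         <- nchar(md)
  # 2 digits: m + d; 3 digits: m + dd (assumed); 4 digits: mm + dd
  month_len <- ifelse(n == 4, 2, 1)
  month     <- as.integer(substr(md, 1, month_len))
  day       <- as.integer(substr(md, month_len + 1, n))
  ISOdatetime(year, month, day, hour, 0, 0, tz = tz)
}

parse_unpadded(c("2015119_06", "20151231_23", "201519_00"))
```

If the real data also contain strings like "2015111" (January 11 or November 1), no rule based on the string alone is safe; the ambiguity has to be resolved from elsewhere, for example from the ordering of the records.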

Thanks

Dominic


shawin | 31 Aug 21:46 2015

Re: Median on second group of CSV file produce Na

I have an issue and I posted it, so I would like to receive a solution,
please.

On Mon, Aug 31, 2015 at 8:35 PM, shawin [via R] <
ml-node+s789695n4711689h58 <at> n4.nabble.com> wrote:

> I have a data frame (from a CSV file) and I'm trying to calculate the median
> for each group separately, row by row. When I split the data frame into two
> groups and calculate the median for each one, I get an NA result for the
> second group:
> the data
>   x1  x2  x3  x4  x5  x6  x7  y1  y2  y3  y4  y5  y6  y7  y8
> 9.488404158 9.470895414 9.282433728 9.366707445 9.955383045 9.640816474 9.606262272 9.329651027 9.434541611 9.473922432 9.311412966 9.3154885 9.434977488 9.470895414 9.764258059
> 8.630629966 8.55831075 8.788391003 8.576231135 8.671587906 8.842979993 8.861958856 8.58330436 8.603596508 8.570129609 8.59798922 8.572686772 8.679751791 8.663950953 8.432875347
> 9.354748885 9.367668838 9.259952558 9.421538213 9.554635162 9.603744578 9.452197983 9.284228877 9.404607878 9.317737979 9.343115301 9.310644266 9.27227486 9.360337823 9.44706281
> 9.944863964 9.950427516 10.19101759 10.07350804 10.03269879 10.1307908 10.03487287 9.74609383 9.886379007 9.775472567 10.036596 9.544738458 9.699611598 9.911962567 9.625804277
>
>
>                                    Code:
>
>        rowN <- nrow(AT1)
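A minimal sketch of a row-wise median per group, assuming the data frame is called AT1 (as in the code above) and that the two groups are exactly the columns whose names start with x and y; only the first two data rows are used. One plausible cause of the NA, not verified against the poster's file: if any column was read as character (for example because of a stray non-numeric entry in the CSV), apply() turns every row into a character vector, and for an even number of values (the y group has eight) median() averages the two middle values with mean(), which returns NA for character input. Converting the columns to numeric first avoids that, and na.rm = TRUE skips genuinely missing cells.

```r
# First two data rows from the post; x1-x7 form one group, y1-y8 the other
vals <- rbind(
  c(9.488404158, 9.470895414, 9.282433728, 9.366707445, 9.955383045,
    9.640816474, 9.606262272, 9.329651027, 9.434541611, 9.473922432,
    9.311412966, 9.3154885,   9.434977488, 9.470895414, 9.764258059),
  c(8.630629966, 8.55831075,  8.788391003, 8.576231135, 8.671587906,
    8.842979993, 8.861958856, 8.58330436,  8.603596508, 8.570129609,
    8.59798922,  8.572686772, 8.679751791, 8.663950953, 8.432875347)
)
AT1 <- as.data.frame(vals)
names(AT1) <- c(paste0("x", 1:7), paste0("y", 1:8))

# Make sure every column is numeric before taking medians
# (as.character() first so factor columns are converted by label, not code)
AT1[] <- lapply(AT1, function(col) as.numeric(as.character(col)))

x_cols <- grep("^x", names(AT1))
y_cols <- grep("^y", names(AT1))

# Row-wise medians, one per group
AT1$median_x <- apply(AT1[, x_cols], 1, median, na.rm = TRUE)
AT1$median_y <- apply(AT1[, y_cols], 1, median, na.rm = TRUE)
AT1[, c("median_x", "median_y")]
```

Running str(AT1) on the real data is a quick way to check whether any column came in as character or factor.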

Joaquín Aldabe | 31 Aug 23:06 2015

Interpreting interaction terms in a glmm

Hi, I would appreciate it if someone could send me, or point me to,
information on how to interpret second-order interactions (especially when
both variables are continuous) in the output of a GLMM fitted in R.
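For two continuous predictors, the main-effect coefficient of each predictor is its slope when the other predictor equals zero, and the interaction coefficient says how that slope changes per unit of the other predictor. Writing the linear predictor of a generic GLMM with link g (x, z and the β's are placeholder symbols, u is the random effect):

```latex
\eta = g(\mu) = \beta_0 + \beta_1 x + \beta_2 z + \beta_3\, x z + u,
\qquad
\frac{\partial \eta}{\partial x} = \beta_1 + \beta_3 z,
\qquad
\frac{\partial \eta}{\partial z} = \beta_2 + \beta_3 x .
```

So β1 alone is the effect of x only at z = 0; centring x and z before fitting makes β1 and β2 the effects at the mean of the other variable. These are slopes on the link scale (for example log-odds in a binomial GLMM), so evaluating β1 + β3 z at a few meaningful values of z is a common way to report the interaction.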

Thanks in advance,
Joaquín

-- 
*Joaquín Aldabe*

*Grupo Biodiversidad, Ambiente y Sociedad*
Centro Universitario de la Región Este, Universidad de la República
Ruta 15 (y Ruta 9), Km 28.500, Departamento de Rocha

*Departamento de Conservación*
Aves Uruguay
BirdLife International
Canelones 1164, Montevideo

https://sites.google.com/site/joaquin.aldabe
<https://sites.google.com/site/perfilprofesionaljoaquinaldabe>


______________________________________________
R-help <at> r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Luigi Marongiu | 31 Aug 22:49 2015

Conditional replacement and removal of data frame values

Dear all,
I have a data frame and I would like to do the following:
a) replace value of one variable "a" according to the value of another one "b"
b) remove all the instances of the variable "b"

For the sake of argument, let's say I have the following data frame:
test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix",
"Sapovirus"), 3)
res <- c(0, 1, 0, 0, 1,
         1, 0, 1, 1, 0,
         0, 1, 0, 1, 0)
samp <- c(rep(1, 5), rep(2, 5), rep(3, 5))
df <- data.frame(test, res, samp, stringsAsFactors = FALSE)

The task is to set the "Rotavirus" result to negative (0) if and only if
"Rotarix" is positive (1) in the same sample. In this example, the results
below show that for "samp" 3 "Rotavirus" should become 0:
        test res samp
2  Rotavirus   1    1
4    Rotarix   0    1
7  Rotavirus   0    2
9    Rotarix   1    2
12 Rotavirus   1    3
14   Rotarix   1    3

I can't use the subset function because then I would work on a
separate object and I don't know how to implement the conditions for
the replacements.
Finally, all the "Rotarix" entries should be removed from the data frame.
Thank you.
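A minimal base-R sketch using the data frame defined above: find the samples in which Rotarix is positive, zero the Rotavirus result in those samples by logical indexing on df itself (so no separate subset object is needed), then drop the Rotarix rows.

```r
test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix", "Sapovirus"), 3)
res <- c(0, 1, 0, 0, 1,
         1, 0, 1, 1, 0,
         0, 1, 0, 1, 0)
samp <- c(rep(1, 5), rep(2, 5), rep(3, 5))
df <- data.frame(test, res, samp, stringsAsFactors = FALSE)

# Samples in which Rotarix is positive
rotarix_pos <- df$samp[df$test == "Rotarix" & df$res == 1]

# Set Rotavirus to negative in exactly those samples (modifies df in place)
df$res[df$test == "Rotavirus" & df$samp %in% rotarix_pos] <- 0

# Finally remove all Rotarix rows
df <- df[df$test != "Rotarix", ]
df
```

Because the replacement matches samples with %in%, it works however many samples have a positive Rotarix, and samples whose Rotarix is negative keep their Rotavirus result.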

Luigi Marongiu | 31 Aug 22:17 2015

modify strip labels with given text using lattice package

Dear all,
I am drawing a barchart plot with lattice and the resulting strips are
taking the value of the variable being compared (in this example
"assay"). However I would like to write myself the value to place into
the strips, let's say I want to call the variables as "molecular test"
and "serological test" the values "a" and "b" respectively within
"assay". I have tried different approaches taken from the web but
nothing worked.
Would you have any tip?
Best regards
Luigi

>>>
test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix",
"Sapovirus"), 2)
res <- c(0, 1, 0, 1,0, 1,0, 1,0, 1, 0, 1, 0, 1,0, 1,0, 1,0, 1)
count <- rnorm(20)
assay <- c(rep("a", 10), rep("b", 10))

df <- data.frame(test, res, count, assay, stringsAsFactors = FALSE)

library(lattice)
barchart(
    test ~ count|assay,
    df,
    groups = res,
    stack = TRUE,
    main = "Comparison of test results",
    xlab = "Count",
    col = c("yellow", "blue"),
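One approach, as a sketch: relabel the factor that defines the panels, since lattice prints the levels of the conditioning factor in the strips. It reuses the data from the post (set.seed() is added so the random counts are reproducible) and keeps only the barchart() arguments shown before the message is cut off. Passing strip = strip.custom(factor.levels = c("molecular test", "serological test")) to barchart() is an alternative that leaves the data unchanged.

```r
library(lattice)

set.seed(1)
test  <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix", "Sapovirus"), 2)
res   <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
count <- rnorm(20)
assay <- c(rep("a", 10), rep("b", 10))
df <- data.frame(test, res, count, assay, stringsAsFactors = FALSE)

# Strip text comes from the levels of the conditioning factor,
# so map "a" and "b" to the labels that should appear
df$assay <- factor(df$assay, levels = c("a", "b"),
                   labels = c("molecular test", "serological test"))

p <- barchart(test ~ count | assay, data = df,
              groups = res, stack = TRUE,
              main = "Comparison of test results", xlab = "Count",
              col = c("yellow", "blue"))
# print(p), or just p at the console, draws the plot
```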

Lorenzo Isella | 31 Aug 21:25 2015

Trouble with Caret and C5.0

Dear All,
I am trying to mine a small dataset.
Admittedly, it is a bit odd, since it is a multi-class classification task
with more than 300 different classes for about 600 observations.
Having said that, the problem is not the output of my script, but the
fact that it gets stuck, without an error message, when I use C5.0
through caret.
I recycled another script of mine which never gave me any headaches, so
I do not know what is going on.
The small training set can be downloaded from

https://www.dropbox.com/s/4yseukqqvssvh63/training.csv?dl=0

whereas I paste my script at the end of the email.
C5.0 without caret completes in seconds, so I must be making some
mistake with caret.
Any suggestion is appreciated.

Lorenzo

####################################################

library(caret)
library(readr)
library(C50)
library(doMC)
library(digest)

train <- read_csv("training.csv")
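Since the script is cut off here, the following is only a diagnostic sketch, not a fix. It runs caret's "C5.0" method on the built-in iris data (a stand-in for training.csv) with verboseIter = TRUE, so each resampling fit is reported as it starts and finishes, with allowParallel = FALSE, so progress messages are not hidden inside doMC worker processes, and with a single-row tuneGrid. By default caret tries several combinations of the C5.0 tuning parameters trials, model and winnow over many resamples, so one combination is a fairer comparison with a plain C5.0() run. Running the real data this way should show whether train() hangs on one fit or is just working through many of them.

```r
library(caret)
library(C50)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 3,
                     verboseIter = TRUE,     # print progress for each fit
                     allowParallel = FALSE)  # keep all output in this session

# A single tuning combination for the C5.0 method
grid <- expand.grid(trials = 1, model = "tree", winnow = FALSE)

fit <- train(Species ~ ., data = iris, method = "C5.0",
             trControl = ctrl, tuneGrid = grid)
fit$results
```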

Mark Edmondson | 30 Aug 20:29 2015

[R-pkgs] New packages on CRAN: googleAuthR and searchConsoleR

Hi R package users,

You may be interested in these two new packages now available on CRAN:

*googleAuthR* lets you easily authenticate with Google OAuth2 APIs and build
your own packages on top of them.  It is also compatible with multi-user
Shiny apps, so you can publish your apps and users can work with their own
data.  Since the Google APIs include Gmail, Google Predict and Google Drive,
this offers access to some nice resources.

*searchConsoleR* is the first package released using googleAuthR, working
with the Google Search Console.  This data includes what keywords people
have used to find your website.

Hope they are of interest, do let me know what you build with them!
Yours sincerely,
Mark


_______________________________________________
R-packages mailing list
R-packages <at> r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

Fox, John | 31 Aug 17:09 2015

Re: using survreg() in survival package with "long" data

Dear Terry,

Thank you for the extended explanation -- it's helpful. 

Best,
 John

________________________________________
From: Therneau, Terry M., Ph.D. [therneau <at> mayo.edu]
Sent: August 31, 2015 9:56 AM
To: r-help <at> r-project.org; Fox, John; Göran Broström
Subject: Re: using survreg() in survival package with "long" data


Therneau, Terry M., Ph.D. | 31 Aug 15:56 2015

Re: using survreg() in survival package with "long" data


On 08/30/2015 05:00 AM, r-help-request <at> r-project.org wrote:
> I'm unable to fit a parametric survival regression using survreg() in the survival package with data in
> "counting-process" ("long") form.
>
> To illustrate using a scaled-down problem with 10 subjects (with data placed on the web):
>

As usual I'm a day late since I read digests, and Goran has already clarified things.  A
discussion of this is badly needed in my as yet unwritten book on using the survival
package.  From a higher level view:
   If an observation is interval censored (a,b) then one knows that the event happened
between time "a" and time "b", but not when.  The survreg routine can handle interval
censored data since it is parametric (you need to integrate over the interval).  The
interval (-infinity, b) is called 'left censored' and the interval (a, infinity) is 'right
censored'.  Left censored data is rare in medical work; an example might be a chronic
disease like rheumatoid arthritis, where we know that the true disease onset was some time
before the date it was first detected, and one is trying to deduce the duration of disease.
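As a small illustration of the interval-censored case (the values below are made up for the example, not real data): Surv() with type = "interval2" takes the lower and upper limit of each interval, with NA as the lower limit for left censoring, NA as the upper limit for right censoring, and equal limits for an exactly observed event. survreg() then fits a parametric model to these intervals.

```r
library(survival)

# Toy values, invented for illustration only
lower <- c(2.0, NA,  4.0, 1.5, 3.0, 5.0, 2.5, NA,  6.0, 0.5)
upper <- c(3.0, 2.0, NA,  1.5, 4.5, NA,  2.5, 1.0, 8.0, 1.0)
# (a, b):  event somewhere between a and b (interval censored)
# (NA, b): event before b (left censored)
# (a, NA): event after a (right censored)
# a == b:  event observed exactly at a
y <- Surv(lower, upper, type = "interval2")

fit <- survreg(y ~ 1, dist = "weibull")
summary(fit)
```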

   Left truncation at time 'a' means that any events before time "a" are not in the data 
set.  In a referral center like mine this includes any subjects who die before they come 
to us.  The coxph model handles left truncation naturally via its counting process 
formulation.  That same formulation also allows it to deal with time dependent 
covariates.   Accelerated failure time models like survreg can handle left truncation in 
principle, but they require that the values of any covariates are known from time 0 -- 
even for a truncated subject.   I have never added left-truncation to the survreg code, 
mostly because I have never needed it myself, but also because users would immediately 
think that they could accomplish time-dependent covariates by simply using a long format 
data set. Rather, each subject needs to be linked to a full covariate history, which is a 
bit more work.
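For comparison, this is what the counting-process ("long") form looks like in coxph(), using the Stanford heart transplant data (heart) shipped with the survival package. Each row is an interval (start, stop], event marks a death at the end of that interval, and transplant is a covariate whose value changes between rows of the same subject (id). A subject whose first row starts after time 0 would be left truncated, and coxph() handles that with the same three-argument Surv(). Per the explanation above, survreg() does not accept this form.

```r
library(survival)

# One row per (start, stop] interval; 'id' identifies the subject
head(heart)

fit <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
             data = heart)
summary(fit)
```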
(Continue reading)

