Noah Silverman | 1 Aug 2009 01:04

Re: scale subset of data

That works perfectly.

Thanks!

-N

On 7/31/09 2:04 PM, Steve Lianoglou wrote:
> Hi,
>
> On Jul 31, 2009, at 4:13 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> This should be an easy one, but I have some trouble formatting the data
>> right
>>
>> I'm trying to replace the column of a subset of a dataframe with the
>> scaled data for that column of the subset
>>
>> subset(rawdata, code== "foo", select = a) <- scale( subset(rawdata,
>> code== "foo", select = a) )
>>
>> It returns:   could not find function "subset<-"
>>
>> The scale command works individually and the subset command works
>> individually.
>>
>> Can someone help me format this command line correctly?
>
> How about:
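The suggested answer is cut off in this archive. For reference, here is a minimal sketch of one common way to do it, which is not necessarily what was proposed in the reply. subset() has no replacement form (hence the missing "subset<-"), so the assignment has to go through ordinary indexing with a logical row selector. The data frame is made up for illustration; only the names rawdata, code and a come from the question.

```r
# Hypothetical data; only the names rawdata, code and a come from the question.
rawdata <- data.frame(code = c("foo", "foo", "foo", "bar", "bar"),
                      a    = c(1, 2, 3, 10, 20))

# Logical row selector used in place of subset(), which cannot appear on the
# left-hand side of an assignment.
rows <- rawdata$code == "foo"

# scale() returns a one-column matrix; as.numeric() turns it back into a vector.
rawdata$a[rows] <- as.numeric(scale(rawdata$a[rows]))

rawdata
```

Rows with any other code are left untouched, so only the "foo" rows are centered and scaled.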

Noah Silverman | 1 Aug 2009 01:17

scale subsets of grouped data in data frame

Hello,

I'm trying to duplicate what's an easy process in RapidMiner.

In RM, we can simply use two operators:
     subgroup iteration
     attribute value selection (Can use a regex for the attribute name.)

I can do this in R with a lot of code and manual steps.  It would be 
really nice to find a more automated way.

My data looks like this

group 	group_height 	group_weight 	height 	weight
g22 	3.2 	8.896 	3.2 	8.896
g22 	2.5 	6.95 	2.5 	6.95
g22 	3.1 	8.618 	3.1 	8.618
g49 	2.4 	6.672 	2.4 	6.672
g49 	4.2 	11.676 	4.2 	11.676
g49 	2.5 	6.95 	2.5 	6.95
g55 	2.6 	7.228 	2.6 	7.228
g55 	3.4 	9.452 	3.4 	9.452
g55 	3.3 	9.174 	3.3 	9.174

What I want to do is scale the data by each group
So in pseudo-code
     for(group in groups){
         if(column_name = regex(group_.*)){
             data[column_name] = scale(data[group,column_name])
         }
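The pseudo-code is truncated here. A minimal R sketch of the same idea, assuming the goal is to center and scale every column whose name starts with "group_" separately within each level of group, could look like this. The data are copied from the table in the message; the names dat and cols are made up for the example.

```r
# Data copied from the table in the message; `dat` is a made-up name.
dat <- data.frame(
  group        = rep(c("g22", "g49", "g55"), each = 3),
  group_height = c(3.2, 2.5, 3.1, 2.4, 4.2, 2.5, 2.6, 3.4, 3.3),
  group_weight = c(8.896, 6.95, 8.618, 6.672, 11.676, 6.95, 7.228, 9.452, 9.174),
  height       = c(3.2, 2.5, 3.1, 2.4, 4.2, 2.5, 2.6, 3.4, 3.3),
  weight       = c(8.896, 6.95, 8.618, 6.672, 11.676, 6.95, 7.228, 9.452, 9.174)
)

# Columns whose names match the regex from the pseudo-code.
cols <- grep("^group_", names(dat), value = TRUE)

# ave() applies FUN within each level of dat$group and returns the results
# in the original row order, so they can be assigned straight back.
for (col in cols) {
  dat[[col]] <- ave(dat[[col]], dat$group,
                    FUN = function(x) as.numeric(scale(x)))
}

dat
```

The height and weight columns do not match the pattern, so they keep their raw values and can be compared against the scaled ones.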

Mark Na | 1 Aug 2009 01:41

Compare lm() to glm(family=poisson)

Dear R-helpers,
I would like to compare the fit of two models, one of which I fit using lm()
and the other using glm(family=poisson). The latter doesn't provide
r-squared, so I wonder how to go about comparing these
models (they have the same formula).

Thanks very much,

Mark Na


ws | 1 Aug 2009 03:25

Re: pyramid.plot: x-axis scale

> Hi ws,
> You could tweak pyramid.plot in the plotrix package to do this.

I guess I will live without...

Unless you can spell out the process for doing that: where is the source,
where would the package download be on my machine (Mac OS X), and who would I
send a working patch to?  If they (the maintainer, right?) think it matters I can do
that, but I probably need a little handholding.
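A rough sketch of how those three questions can be answered from within R, assuming plotrix is installed. All functions used here ship with base R or utils; the guard only keeps the snippet from failing when the package is missing.

```r
pkg <- "plotrix"   # package named in the quoted reply

if (requireNamespace(pkg, quietly = TRUE)) {
  # Printing a function object shows its R source.
  print(getExportedValue(pkg, "pyramid.plot"))

  # Directory where the installed package lives on this machine.
  print(find.package(pkg))

  # Maintainer name and address as recorded in the package DESCRIPTION.
  print(maintainer(pkg))
} else {
  message("Package ", pkg, " is not installed.")
}
```

The installed copy contains the parsed code rather than the original source files, so for a real patch it is usually easier to work from the source tarball on CRAN.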

Wensui Liu | 1 Aug 2009 03:29

Re: Compare lm() to glm(family=poisson)

I don't understand how you can fit a Poisson model with the lm() function.
otherwise, how could you compare lm() with glm(...family=poisson)?

On Fri, Jul 31, 2009 at 7:41 PM, Mark Na<mtb954 <at> gmail.com> wrote:
> Dear R-helpers,
> I would like to compare the fit of two models, one of which I fit using lm()
> and the other using glm(family=poisson). The latter doesn't provide
> r-squared, so I wonder how to go about comparing these
> models (they have the same formula).
>
> Thanks very much,
>
> Mark Na
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help <at> r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
==============================
WenSui Liu
Blog   : statcompute.spaces.live.com
Tough Times Never Last. But Tough People Do.  - Robert Schuller
==============================


Steve Lianoglou | 1 Aug 2009 03:38

Re: scale subsets of grouped data in data frame

Hi,

On Jul 31, 2009, at 7:17 PM, Noah Silverman wrote:

> Hello,
>
> I'm trying to duplicate what's an easy process in RapidMiner.
>
> In RM, we can simply use two operators:
>     subgroup iteration
>     attribute value selection (Can use a regex for the attribute
> name.)
>
> I can do this in R with a lot of code and manual steps.  It would be
> really nice to find a more automated way.
>
> My data looks like this
>
> group 	group_height 	group_weight 	height 	weight
> g22 	3.2 	8.896 	3.2 	8.896
> g22 	2.5 	6.95 	2.5 	6.95
> g22 	3.1 	8.618 	3.1 	8.618
> g49 	2.4 	6.672 	2.4 	6.672
> g49 	4.2 	11.676 	4.2 	11.676
> g49 	2.5 	6.95 	2.5 	6.95
> g55 	2.6 	7.228 	2.6 	7.228
> g55 	3.4 	9.452 	3.4 	9.452
> g55 	3.3 	9.174 	3.3 	9.174
>
> What I want to do is scale the data by each group

zhu yao | 1 Aug 2009 05:24

about the summary(cph.object)

Could someone explain the summary(cph.object)?

The example is in the help file of cph.

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
Srv <- Surv(dt,e)

f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
summary(f)

                                         Effects              Response : Srv

 Factor            Low    High   Diff.  Effect S.E. Lower 0.95 Upper 0.95
 age               40.872 57.385 16.513 1.21   0.21 0.80       1.62
  Hazard Ratio     40.872 57.385 16.513 3.35     NA 2.22       5.06
 sex - Female:Male  2.000  1.000     NA 0.64   0.15 0.35       0.94
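For readers puzzling over the same table, a short arithmetic sketch of how the columns relate, using only the rounded numbers printed above. The interpretation assumes the usual summary behaviour for these fits: for a continuous predictor, Low and High are the quartiles stored by datadist(), Effect is the difference in log hazard between them, and the "Hazard Ratio" row is the exponentiated Effect and confidence limits.

```r
# Numbers copied from the printed summary above (already rounded).
age_effect <- 1.21            # log hazard ratio, age 57.385 vs 40.872
age_ci     <- c(0.80, 1.62)   # 0.95 confidence limits on the log scale

57.385 - 40.872               # the Diff. column: High minus Low
exp(age_effect)               # about 3.35, the Hazard Ratio row for age
exp(age_ci)                   # about 2.2 and 5.1, its confidence limits
exp(0.64)                     # about 1.9, hazard ratio for Female vs Male
```

Small differences from the printed limits (2.22 and 5.06) come from exponentiating values that were already rounded to two decimals.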

Ken Knoblauch | 1 Aug 2009 09:40

Re: Compare lm() to glm(family=poisson)

Mark Na <mtb954 <at> gmail.com> writes:
> 
> Dear R-helpers,
> I would like to compare the fit of two models, one of which I fit using lm()
> and the other using glm(family=poisson). The latter doesn't provide
> r-squared, so I wonder how to go about comparing these
> models (they have the same formula).
> 
> Thanks very much,
> 
> Mark Na
> 
I'm not sure what you are trying to do, but it might be
informative to compare the diagnostic plots from the
fits.  Remember that Poisson-distributed data are
heteroscedastic (the variance equals the mean), which isn't
what lm assumes by default.  Also, the
default link function with the poisson family is log.
So, these are things to take into account in any potential
comparison.
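A minimal sketch of such a comparison on simulated counts. The data, the coefficients and the names fit_lm and fit_glm are made up for illustration. The last line computes a deviance-based analogue of R-squared for the GLM; this is only one of several pseudo-R-squared definitions and is not directly comparable to the R-squared reported by lm.

```r
# Simulated count data; the coefficients 0.5 and 1.0 are arbitrary.
set.seed(42)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 1.0 * x))

fit_lm  <- lm(y ~ x)                     # identity link, constant variance
fit_glm <- glm(y ~ x, family = poisson)  # log link, variance equal to the mean

family(fit_glm)$link                     # "log"

# Residuals against fitted values for both fits, side by side.
op <- par(mfrow = c(1, 2))
plot(fitted(fit_lm), resid(fit_lm), main = "lm",
     xlab = "Fitted", ylab = "Residual")
plot(fitted(fit_glm), resid(fit_glm, type = "pearson"), main = "glm (poisson)",
     xlab = "Fitted", ylab = "Pearson residual")
par(op)

# Deviance-based pseudo-R-squared for the GLM.
1 - deviance(fit_glm) / fit_glm$null.deviance
```

With count data like these, the lm residuals typically fan out as the fitted value grows, while the Pearson residuals from the Poisson fit stay roughly even.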

Ken

-- 
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron

Gabor Grothendieck | 1 Aug 2009 12:12

Re: superpose 2 time series with different time intervals

Try this:

# two simulated series
set.seed(123)
ts.sim <- arima.sim(list(order = c(1,1,0), ar = 0.7), n = 70)
ts.sim <- ts(c(ts.sim), start = 1940)
ts.sim2 <- arima.sim(list(order = c(1,1,0), ar = 0.7), n = 12*70)
ts.sim2 <- ts(c(ts.sim2), start = 1940, frequency = 12)

# plot
plot(ts.sim2, type = "l", col = grey(0.5))
lines(ts.sim)
axis(1, time(ts.sim), labels = FALSE)

On Fri, Jul 31, 2009 at 3:15 PM, Gary Lewis<gary.m.lewis <at> gmail.com> wrote:
> I  could use some advice.
>
> I've got 2 time series. Both cover approximately the same period of
> time (ie, 1940 to 2009). But one series has annual data and the other
> has monthly data. One refers to university enrollment; the other to
> unemployment rates. Both are currently in the same data frame.
>
> I'd like to use the monthly times series as a light grayscale
> background for a plot of the annual time series, showing both series
> as type "l" (line). Naturally with all the NA's in the annual series,
> that plot disappears because points are not connected across missing
> values.
>
> I suppose I could make both series annual, but a lot of interesting
> detail would get lost this way. Or I guess I could interpolate values

baptiste auguie | 1 Aug 2009 12:33

Re: removing initial numerals

Try this,

formatC(d %% 1e5, width=5, flag = "0", mode="integer")

 [1] "00735" "02019" "04131" "04217" "04629" "04822" "10115" "11605" "14477"
[10] "15314" "15438" "19040" "19603" "22735" "22853" "23415" "24227" "24423"
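In the call above, d is presumably the vector called data in the quoted question. Two equivalent ways to get the same zero-padded strings are sketched below on the first three values; the names ids, last5 and last5_txt are made up for the example.

```r
# First three values from the quoted question; `ids` is a made-up name.
ids <- c(2005000000735, 2005000002019, 2005000004131)

# Option 1: keep the remainder modulo 1e5, then zero-pad to width 5.
last5 <- sprintf("%05d", as.integer(ids %% 1e5))

# Option 2: treat the numbers as text and keep the last five characters.
# "%.0f" prints all digits and avoids exponent notation such as 2.005e+12.
txt <- sprintf("%.0f", ids)
last5_txt <- substr(txt, nchar(txt) - 4, nchar(txt))

last5
last5_txt
```

Both options return character vectors, which is what preserves the leading zeros; converting back to numeric would drop them.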

HTH,

baptiste

2009/7/31 PDXRugger <J_R_36 <at> hotmail.com>:
>
>
> I would like to recreate "data" so that only the last 5 digits of the below
> data are included, so 2005000002019 would become 02019.  Any ideas?
>
> data=c(2005000000735,
> 2005000002019,
> 2005000004131,
> 2005000004217,
> 2005000004629,
> 2005000004822,
> 2005000010115,
> 2005000011605,
> 2005000014477,
> 2005000015314,
> 2005000015438,
> 2005000019040,
> 2005000019603,

