Overstreet, Jason (FL51) | 1 Mar 2006 20:29

[S] Date column

I am trying to use the date given in a column listed in a data frame to create a factor in another column.  In this case, I have the following:

Date1

05/01/1990

06/02/1995

07/03/2001

08/03/2003


I would like to create a corresponding column that tests the Date1 column and creates a variable in another column that states either "Twentieth Century" or "Twenty-First Century" based on the value in Date1.

Desired output:

 Date1                  Period

05/01/1990              Twentieth Century

06/02/1995              Twentieth Century

07/03/2001              Twenty-First Century

08/03/2003              Twenty-First Century


Thanks in advance

Jason Overstreet

Statistician

Honeywell DSES

(727) 539-3964

Doing what it takes to be our Customers' 1st Choice.


Sundar Dorai-Raj | 1 Mar 2006 20:36

Re: [S] Date column


Overstreet, Jason (FL51) wrote:
> I am trying to use the date given in a column listed in a data frame to 
> create a factor in another column.  In this case, I have the following:
> 
> Date1
> 
> 05/01/1990
> 
> 06/02/1995
> 
> 07/03/2001
> 
> 08/03/2003
> 
> 
> I would like to create a corresponding column that tests the Date1 
> column and creates a variable in another column that states either 
> “Twentieth Century” or “Twenty-First Century” based on the value in Date1.
> 
> Desired output:
> 
>  Date1                  Period
> 
> 05/01/1990              Twentieth Century
> 
> 06/02/1995              Twentieth Century
> 
> 07/03/2001              Twenty-First Century
> 
> 08/03/2003              Twenty-First Century
> 
> 
> Thanks in advance…

Hi, Jason,

How about:

td <- timeDate(c("05/01/1990", "06/02/1995",
                  "07/03/2001", "08/03/2003"))
period <- ifelse(td < timeDate("01/01/2001"),
                  "Twentieth Century",
                  "Twenty-First Century")
data.frame(Date1 = td, Period = period)
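
For readers without S-PLUS, the same idea can be sketched in R. This is only an illustration under stated assumptions: as.Date() stands in for timeDate(), the dates are assumed to be month/day/year, and the names dates, cutoff and df are made up for the example.

```r
# R sketch (not S-PLUS): as.Date() replaces timeDate().
# Assumption: the strings are month/day/year.
dates  <- c("05/01/1990", "06/02/1995", "07/03/2001", "08/03/2003")
d      <- as.Date(dates, format = "%m/%d/%Y")

# The twenty-first century starts on 1 January 2001.
cutoff <- as.Date("2001-01-01")

# Build the label as a factor with a fixed level order.
period <- factor(ifelse(d < cutoff, "Twentieth Century", "Twenty-First Century"),
                 levels = c("Twentieth Century", "Twenty-First Century"))

df <- data.frame(Date1 = dates, Period = period)
print(df)
```

Storing Period as a factor with explicit levels keeps both categories available even if a subset of the data contains only one of them.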

--sundar
--------------------------------------------------------------------
This message was distributed by s-news <at> lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request <at> lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news

Thompson, David (MNR) | 1 Mar 2006 21:13

[S] trellis line plots with error bars

Hello,

I am wondering how to convert a tedious plotting routine to a trellis method.
The data has six columns of treatment means, six columns of standard errors,
one integer column 'year' (4 levels) and one factor column 'group' (4 levels).
(Data 'dump' below sig.)

At this time I, unfortunately, work mostly through the GUI and would use a
routine such as:

1) with a single species group subset,
   individually select (control+click) year, then mean1, then stderr1,
2) select 'Error Bar' plot from the '2D Plots' toolbar,
3) select the graph generated,
4) individually select year, then mean2, then stderr2,
5) shift+click on 'Error Bar' plot button.
6) repeat 4-5 until all six mean-stderr pairs have been plotted.

7) repeat 1-6 until all four species group subsets have been processed.

8) arrange the four plots on a single page.

'AARRGH' I *know* there have to be better ways!

Any and all help will be *much* appreciated.

Thanks, DaveT.
**********************************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
Sault Ste. Marie, Ontario, Canada
david.thompson <at> mnr.gov.on.ca
**********************************************************

P.S. Why does the output of data.dump go to my original user directory
rather than my current working directory (chapter)?

The contents of data.dump:
## Dump S Version 4 Dump ##
test
.named_I
structure
2
data.frame
structure
5
.named_I
structure
2
list
list
14
numeric
numeric
16
54.666666666666664
55.444444444444443
57.444444444444443
57.888888888888886
24
24.444444444444443
25.666666666666668
26
16.222222222222221
16.333333333333332
16.444444444444443
16.444444444444443
14.444444444444445
14.666666666666666
15.333333333333334
15.444444444444445
numeric
numeric
16
40
40.200000000000003
44.600000000000001
45.200000000000003
14.6
14.6
17
17
12.800000000000001
13
13.6
13.6
12.6
12.6
14
14.6
numeric
numeric
16
39.399999999999999
39.799999999999997
37
37.399999999999999
15.800000000000001
15.800000000000001
14.199999999999999
14.6
12.6
13.199999999999999
11.4
11.4
11
10.800000000000001
11.4
11.4
numeric
numeric
16
34.111111111111114
34.666666666666664
37.777777777777779
38.333333333333336
11.111111111111111
11
12.555555555555555
13
12.666666666666666
13.333333333333334
14.222222222222221
14.222222222222221
10.333333333333334
10.333333333333334
11
11.111111111111111
numeric
numeric
16
22.222222222222221
22.111111111111111
25.666666666666668
26.888888888888889
7
6.8888888888888893
8.4444444444444446
8.8888888888888893
7.8888888888888893
7.8888888888888893
8.8888888888888893
9.5555555555555554
7.333333333333333
7.333333333333333
8.3333333333333339
8.4444444444444446
numeric
numeric
16
30
30.222222222222221
32.444444444444443
33.333333333333336
12
12
12.888888888888889
13.222222222222221
9
9.1111111111111107
9.4444444444444446
9.5555555555555554
9
9.1111111111111107
10.111111111111111
10.555555555555555
integer
integer
16
1996
1997
1998
2000
1996
1997
1998
2000
1996
1997
1998
2000
1996
1997
1998
2000
numeric
numeric
16
2.1343747458109497
2.0419973277210119
2.2919612605262523
2.2939803135625088
1.6749792701868149
1.625415426480866
1.4433756729740645
1.4907119849998598
1.4021588293471072
1.4142135623730949
1.405456737852613
1.3955396318978297
0.83518313213182827
0.74535599249992979
0.8498365855987976
0.86780551954518381
numeric
numeric
16
2.9495762407505253
3.1208973068654471
3.0099833886584824
3.2310988842807022
2.1587033144922905
2.1587033144922905
1.7606816861659009
1.7606816861659009
0.37416573867739411
0.54772255750516607
1.1224972160321824
1.1224972160321824
1.0295630140987
1.0295630140987
0.63245553203367588
0.97979589711327109
numeric
numeric
16
2.9427877939124323
3.2771939216347876
1.51657508881031
1.16619037896906
1.6552945357246849
1.6552945357246849
1.3190905958272918
1.2083045973594571
0.92736184954957024
1.1575836902790224
0.74833147735478822
0.74833147735478822
1.2247448713915889
1.1575836902790224
0.6782329983125267
0.92736184954957024
numeric
numeric
16
1.3170299545699664
1.3437096247164249
1.4508405396889552
1.6666666666666667
1.2521586299538494
1.2360330811826106
0.91455986850857374
0.89752746785575066
0.57735026918962562
0.57735026918962573
0.61864048475889133
0.66202084932294369
1
1
1.0408329997330663
1.0467468711898207
numeric
numeric
16
0.982878111830785
0.99225394997176408
1.2247448713915892
0.82402205412174023
0.81649658092772592
0.82402205412174057
0.66897747659957207
0.65499034014175528
0.48432210483785254
0.48432210483785254
0.78959280019732747
0.60348780506667865
0.57735026918962584
0.57735026918962584
0.23570226039551587
0.29397236789606557
numeric
numeric
16
2.6977356760397746
2.7017369812861838
2.2798093920759102
2.2360679774997898
1.7400510848184252
1.7400510848184252
1.5041095967423201
1.4886401118860293
0.78173595997057166
0.77180244385832253
0.72860428047800008
0.76578048622723449
0.81649658092772592
0.80699105812966387
0.58794473579213113
0.50307695211874537
.named_I
structure
2
factor
structure
3
integer
integer
16
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
character
character
4
all
cryp
herb
wood
character
character
1
factor
character
character
3
.Data
.Label
class
character
character
14
value
value
value
value
value
value
value
value
value
value
value
value
value
value
character
character
14
CN
WR
SM
BM
SC2
SC1
yr
CNse
WRse
SMse
BMse
SC2se
SC1se
grp
character
character
16
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
list
list
14
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-7
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
3
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
5
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Right
TXPROP_ObjectNote<-
TXPROP_ColFloatFormat<-Decimal
TXPROP_ColPrecision<-2
character
character
3
TXPROP_ColWidth<-6
TXPROP_ColJustification<-Left
TXPROP_ObjectNote<-
character
character
1
data.frame
character
character
5
.Data
.Names
row.names
.guiColInfo
class

Overstreet, Jason (FL51) | 1 Mar 2006 21:14

Re: [S] Date column

I would like to take this moment to thank my new buddy Sundar the Great
for his valuable help!  Thank you Sundar!

Jason Overstreet
Statistician
Honeywell DSES
(727) 539-3964 
Doing what it takes to be our Customers' 1st Choice.


Thompson, David (MNR) | 3 Mar 2006 15:39

[S] using xYplot (was: trellis line plots with error bars)

Thanks to Matt Austin, Madeline Bauer and Henrik Aalborg Nielsen.
All suggested the Hmisc library xYplot function.

I am having increasing levels of success using xYplot and have
successfully resolved my previous post.
(Quite a daunting set of options available there.)
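
A rough sketch of how such a call might look in R with the Hmisc package follows. It is an illustration, not the code actually used: the data are a rounded subset of the posted dump (group 'all', treatments CN and WR only), and the names wide, long, trt, lower and upper are invented here. Cbind() is the Hmisc helper that bundles a response with its lower and upper limits.

```r
# Rounded subset of the posted data: group "all", treatments CN and WR.
wide <- data.frame(
  yr   = c(1996, 1997, 1998, 2000),
  grp  = factor(rep("all", 4)),
  CN   = c(54.67, 55.44, 57.44, 57.89),
  WR   = c(40.00, 40.20, 44.60, 45.20),
  CNse = c(2.13, 2.04, 2.29, 2.29),
  WRse = c(2.95, 3.12, 3.01, 3.23)
)

# Stack the mean/stderr column pairs into one long data frame.
trts <- c("CN", "WR")
long <- do.call(rbind, lapply(trts, function(tr) {
  data.frame(yr = wide$yr, grp = wide$grp, trt = tr,
             mean = wide[[tr]], se = wide[[paste0(tr, "se")]])
}))
long$trt   <- factor(long$trt, levels = trts)
long$lower <- long$mean - long$se   # bar limits: mean minus one standard error
long$upper <- long$mean + long$se   # bar limits: mean plus one standard error

# Build the plot only if Hmisc is installed (assumption: R, not S-PLUS).
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
  p <- xYplot(Cbind(mean, lower, upper) ~ yr | grp, groups = trt,
              data = long, type = "b")
  print(p)
}
```

With the full data set, the conditioning on grp would give one panel per species group, and groups = trt would overlay the six treatments within each panel.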

A new question:
Is there a way to generate different y-axis ranges for each panel?

I *know* this defies conventional wisdom but, this is what I am being
requested to produce.

Thanks again, DaveT.
**********************************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
Sault Ste. Marie, Ontario, Canada
david.thompson <at> mnr.gov.on.ca
**********************************************************

Lisa Solomon | 3 Mar 2006 15:48

[S] Schedule and Conference Highlights for San Diego Data Mining Conference, March 30-31, San Diego, CA

DATA MINING 2006 CONFERENCE sponsored by Salford Systems

San Diego, California: March 30 - March 31, 2006 (with a Welcome Reception on March 29th at 6PM)

Two Full Days of Real-World Case Study Presentations
Schedule: www.salforddatamining.com/docs/schedule06.pdf

Conference Highlights:

  • State-of-the-Art Research from Leading Academic Institutions: Keynote Presentations by Adele Cutler, co-creator of Random Forests, and
    Richard Olshen, Chief, Division of Biostatistics, Stanford Medical School and CART co-creator.
  • Networking Opportunities with Business and Technology Peers: Attendees at Prior Conferences included The International Monetary Fund, Barnes and Noble, Pfizer, Union Bank, Wells Fargo, Ciphergen, Stanford Linear Accelerator, Johns Hopkins Medical School, UC Berkeley, Cold Spring Harbor Laboratory, Novartis, Columbia University School of Public Health, Harvard Medical School, HSBC, International Steel Group (Bethlehem Steel), Cap Gemini, AT&T Labs-Research, and PricewaterhouseCoopers.
  • Pre-Conference Training (March 27-29): In-depth courses available for attendees who are new to data mining. Advanced Data Miners: Sharpen your expertise! Expect to pick up tips and advice you will never find in books or articles; bring your own data if you can.
  • Conference Sessions Include Mini-Tutorials:
    • Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds
    • Overcoming Obstacles of Publishing in the Best Journals: Explaining the Value of Data Mining to a Resistant Audience
    • Competition Wins: Novel and Creative Use of Data Mining to Win Three Top Data Mining Competitions
  • Real-World Case Study Presentations by Daily Practitioners of Business, Environmental and Bioinformatic Applications.
For full schedule, see:
www.salforddatamining.com/docs/schedule06.pdf

We invite you to join us from 6:00 PM to 7:00 PM on Wednesday, March 29th for a Welcome Reception preceding the Opening Session

Registration: www.salforddatamining.com/docs/regform06.pdf

To be placed on the conference mailing list, please click here: www.salforddatamining.com/2006InfoRequest.php


Amy Baldwin
abaldwin <at> salford-systems.com
(619) 543-8880
http://www.salforddatamining.com

Thompson, David (MNR) | 3 Mar 2006 16:09

[S] Specifying model in article

Hello,

This is not an SPlus specific question but, I value the opinions of
the subscribers here and hope I do not offend by getting off topic.

Say, I have an experimental design with: response Y, treatment T and blocks B.
I have thus far been writing this model in the article text as:
	Y_ij = mu + T_i + B_j + epsilon_ij, i = 1:6, j = 1:3,

where the TB interaction is implicit in the epsilon error term.

However, the senior author insists on a form such as:
	Y_ij = mu + T_i + B_j + TB_ij + epsilon, i = 1:6, j = 1:3

Which form is correct, or *more* correct if you will?
I fear that explicitly stating the TB interaction is duplicating this term.

The other opinion is that the epsilon (no subscripts) is the residual error
not captured by the design and the TB_ij interaction term is the experimental
error *manipulated* by the design.

Extending this to a split-plot design, does one declare two different epsilons
(with different subscripts) in the first form?
Or one epsilon (no subscripts) in the second form?
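
For reference, the two candidate forms can be typeset side by side. The degrees-of-freedom line is an added illustration, not part of the original question. It assumes one observation per treatment-block cell, which is what the subscripts on epsilon_ij suggest.

```latex
% Form 1: additive model, treatment-by-block interaction absorbed in the error
Y_{ij} = \mu + T_i + B_j + \varepsilon_{ij},
  \qquad i = 1,\dots,6,\; j = 1,\dots,3

% Form 2: interaction written out explicitly
Y_{ij} = \mu + T_i + B_j + (TB)_{ij} + \varepsilon

% Degrees of freedom for 6 x 3 = 18 observations (assumed one per cell)
\underbrace{17}_{\text{total}}
  = \underbrace{5}_{T} + \underbrace{2}_{B} + \underbrace{10}_{\text{remainder}},
  \qquad 10 = (6-1)(3-1)
```

Under that assumption the 10 remaining degrees of freedom are all that is left after treatments and blocks, so a (TB)_ij term and a separate error term cannot both be estimated from the same data.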

Many thanks in advance, DaveT.
**********************************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
Sault Ste. Marie, Ontario, Canada
david.thompson <at> mnr.gov.on.ca
**********************************************************

Austin, Matt | 3 Mar 2006 16:37

Re: [S] using xYplot (was: trellis line plots with error bars)

The trellis functions accept a 'scales' argument, so xYplot should also.

In the scales argument, you want to create a list for the y-components of the
scales and you want to set the relationship between the y-axes to be 'free'
between the different panels.

xYplot(y~x|z,
       groups=k,
       scales=list(y=list(relation='free')))
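
A small self-contained sketch of the same setting, using xyplot from R's lattice package rather than xYplot in S-PLUS, is given below. The data frame d holds made-up toy values chosen only so that the two panels need very different y-ranges.

```r
library(lattice)

# Toy data (hypothetical): panel "a" spans 1-4, panel "b" spans 100-400.
d <- data.frame(
  x = rep(1:4, times = 2),
  y = c(1, 2, 3, 4, 100, 200, 300, 400),
  z = factor(rep(c("a", "b"), each = 4))
)

# relation = "free" lets each panel compute its own y-axis limits.
p <- xyplot(y ~ x | z, data = d, type = "b",
            scales = list(y = list(relation = "free")))
print(p)
```

Other accepted values for relation are "same" (the default, one shared axis) and "sliced" (different limits but equal axis length in data units).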

--Matt

Matt Austin
Statistician
Amgen, Inc


Henrik Parn | 3 Mar 2006 19:20

[S] multiplication problem - what happens at 46341?

Dear all,

Why can't I calculate 46341*46341 using '*'? I must have missed something 
very basic...

 > 46340.99999*46340.99999
[1] 2147488280
 > 46341*46341
[1] NA
 > 4.6341E4
[1] 46341
 > 4.6341E4*4.6341E4
[1] 2147488281
 > 46341^2
[1] 2147488281

My function for unbiased skew (Sokal & Rohlf 3ed p 115)  involves a 
multiplication, and when my samples are 'big' the answer is just 'NA' 
due to the problem described above:

skew.sokal <- function(x){
    m3 <- length(x) * sum((x-mean(x))^3)/((length(x)-1)*(length(x)-2))
    s3 <- sqrt(var(x))^3
    m3/s3}

 > x <-  rnorm(46342)
 > skew.sokal(x)
[1] 0.0101318
 > x <-  rnorm(46342)
 > skew.sokal(x)
[1] -0.00613662
 > x <-  rnorm(46343)
 > skew.sokal(x)
[1] NA

Thanks in advance for any suggestion! And sorry if I have overlooked 
something basic...
S-plus 6.2 Build 6713
WinXP

--

-- 
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)
************************


Sundar Dorai-Raj | 3 Mar 2006 21:03

Re: [S] multiplication problem - what happens at 46341?


Henrik Parn wrote:
> Dear all,
> 
> Why can't I calculate 46341*46341 using '*'? I must have missed something 
> very basic...
> 
>  > 46340.99999*46340.99999
> [1] 2147488280
>  > 46341*46341
> [1] NA
>  > 4.6341E4
> [1] 46341
>  > 4.6341E4*4.6341E4
> [1] 2147488281
>  > 46341^2
> [1] 2147488281
> 
> My function for unbiased skew (Sokal & Rohlf 3ed p 115)  involves a 
> multiplication, and when my samples are 'big' the answer is just 'NA' 
> due to the problem described above:
> 
> skew.sokal <- function(x){
>    m3 <- length(x) * sum((x-mean(x))^3)/((length(x)-1)*(length(x)-2))
>    s3 <- sqrt(var(x))^3
>    m3/s3}
> 
>  > x <-  rnorm(46342)
>  > skew.sokal(x)
> [1] 0.0101318
>  > x <-  rnorm(46342)
>  > skew.sokal(x)
> [1] -0.00613662
>  > x <-  rnorm(46343)
>  > skew.sokal(x)
> [1] NA
> 
> Thanks in advance for any suggestion! And sorry if I have overlooked 
> something basic...
> S-plus 6.2 Build 6713
> WinXP
> 

Hi, Henrik,

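You've overlooked the largest value S-PLUS can store as an integer. 
Integers are 32-bit, so the limit is 2^31 - 1 = 2147483647, and 
46341*46341 = 2147488281 is just over it. That's why:

46342.*46342

will work, but

46342*46342

won't. This is because the trailing decimal point in the first forces 
the storage to double.

As for your function, length(x) returns an integer, so 
(length(x)-1)*(length(x)-2) overflows once length(x) reaches 46343. So, try: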

skew.sokal <- function(x) {
   n <- as.numeric(length(x))
   m3 <- n * sum((x - mean(x))^3)/((n - 1) * (n - 2))
   s3 <- sqrt(var(x))^3
   m3/s3
}

set.seed(42)
x <- rnorm(46343)
skew.sokal(x)
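
The overflow can also be reproduced in R, with one caveat stated as an assumption of this sketch: in R a bare 46341 is already a double, so the L suffix is needed to get an integer literal, and length(x) - 1 is promoted to double there. The names int_max, prod_int, prod_dbl and s are made up for the illustration.

```r
# R sketch (assumption: R rather than S-PLUS; L marks an integer literal).
int_max  <- .Machine$integer.max                # largest 32-bit integer, 2^31 - 1
prod_int <- suppressWarnings(46341L * 46341L)   # integer overflow yields NA
prod_dbl <- as.numeric(46341L) * 46341L         # promoted to double, no overflow

skew.sokal <- function(x) {
  n  <- as.numeric(length(x))   # length() returns an integer; convert first
  m3 <- n * sum((x - mean(x))^3) / ((n - 1) * (n - 2))
  s3 <- sqrt(var(x))^3
  m3 / s3
}

set.seed(42)
x <- rnorm(46343)
s <- skew.sokal(x)

print(is.na(prod_int))      # integer product overflowed
print(prod_dbl > int_max)   # double product exceeds the integer limit
print(is.finite(s))         # the converted function returns a number
```

Converting n once at the top of the function keeps every later product in double precision, which is the same fix as in the S-PLUS version above.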