David L Lorenz | 3 Jan 15:52 2011

Re: working with groups in a data frame


Winnie,
  I've used the by() function and the "by" object it returns to manage and analyze data like this.
  I don't know whether you need to do exactly the same thing to each group, in which case I'd suggest doing it directly with by(), or whether the analysis might change from group to group, in which case I'd organize the data with by() and then process each group according to its own characteristics.
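
A minimal sketch of both approaches, written in R syntax on the assumption (not verified here) that S-PLUS 8.0 accepts the same calls. The data frame `dat`, its `value` column, and the index `grp` are hypothetical stand-ins for the real 44,640-record data; only by() comes from the suggestion above.

```r
# Hypothetical data: 12 records forming 2 groups of 6
# (a stand-in for the real 44,640 records / 7,440 groups).
dat <- data.frame(Year  = rep(2010, 12),
                  Day   = rep(1, 12),
                  Hour  = rep(0, 12),
                  Min   = rep(c(0, 10), each = 6),
                  value = 1:12)

# Group index built from record position: 6 consecutive records per group.
grp <- rep(seq(length = nrow(dat) / 6), each = 6)

# Same analysis on every group: by() applies the function to each 6-row piece.
means <- by(dat, grp, function(d) mean(d$value))

# Analysis that may vary by group: split into a list of small data frames,
# then process each piece separately.
pieces <- split(dat, grp)
ranges <- sapply(pieces, function(d) diff(range(d$value)))
```

Grouping on the defining columns instead of on position would mean passing `dat[, c("Year", "Day", "Hour", "Min")]` as the second argument to by(); the positional index is shown because the question only requires groups of 6 starting from the first record.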
Dave


From: Crawford.Winnie <crawford.winnie <at> ensco.com>
To: S-PLUS Newsgroup <s-news <at> lists.biostat.wustl.edu>
Date: 12/30/2010 02:24 PM
Subject: [S] working with groups in a data frame
Sent by: s-news-owner <at> lists.biostat.wustl.edu




I’m using S-PLUS 8.0 for Windows in the Windows 7 operating system.
 
I have a data frame with 44,640 records. I need to operate on these data in groups of 6 (7,440 groups). Each 6-record group is defined by Year, Day, Hour, and Min, although that may not be important. I just need to work with the data in certain columns in groups of 6 beginning with the first record. If anyone can tell me a good function or two to check out, I’d appreciate it.
 
Happy New Year to all.
 
*****************************************************************
Winifred C. Crawford  Staff Scientist/Senior Meteorologist
ENSCO, Inc.
Aerospace Sciences and Engineering Division
1980 N. Atlantic Ave., Suite 830
Cocoa Beach, FL  32931
VOICE:  321.853.8130  FAX:  321.853.8415
EMAIL:  crawford.winnie <at> ensco.com
 
AMU Quarterly Reports are available online:
http://science.ksc.nasa.gov/amu
****************************************************************
 



gerald.jean | 25 Jan 22:49 2011

Merging mystery?


Hello,

I have two data frames I'd like to merge based on five variables.  The
combinations of those five variables are unique in both data frames.

sum(duplicated(TxSoum.traite.SNR.fac[, c("no.ind", "auto.police", "transit",
                                         "numero", "Rejoint")]))
[1] 0
sum(duplicated(TxSoum.traite.fac.geo[, c("no.ind", "auto.police", "transit",
                                         "numero", "Rejoint")]))
[1] 0

"Most" of the combinations of the merging variables of the smaller data
frame are elements of the larger data frame.

sum(!(TxSoum.traite.SNR.fac[, c("no.ind", "auto.police", "transit",
                                "numero", "Rejoint")] %in%
      TxSoum.traite.fac.geo[, c("no.ind", "auto.police", "transit",
                                "numero", "Rejoint")]))

[1] 1058  ## Hence, we should lose only 1058 observations from the smaller
          ## data frame.

TxSoum.traite.SNR.fac.geo <- merge(x = TxSoum.traite.fac.geo,
                                   y = TxSoum.traite.SNR.fac,
                                   by = c("no.ind", "auto.police", "transit",
                                          "numero", "Rejoint"))
dim(TxSoum.traite.fac.geo)
[1] 314885    208             ## The larger data frame; I only care about a
                              ## few of its variables.
dim(TxSoum.traite.SNR.fac)
[1] 160773    310             ## That's the one I'm interested in, plus, of
                              ## course, the variables of the other.
dim(TxSoum.traite.SNR.fac.geo)
[1] 130979    513  ## We lose nearly 30K records???

What am I missing?
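
One way to cross-check the two counts, sketched in R syntax (S-PLUS 8.0 compatibility is assumed, not verified). In R at least, `%in%` applied to two data frames compares them column by column as lists rather than row by row, so a row-wise membership count needs a single key per record. The data frames `small` and `large`, the helper `make.key`, and the trailing blank in one key are all hypothetical illustrations, not a diagnosis of the real data.

```r
# Hypothetical stand-ins: two key columns instead of five.
small <- data.frame(id = c(1, 2, 3, 4), code = c("a", "b", "c", "d"),
                    x = 1:4, stringsAsFactors = FALSE)
large <- data.frame(id = c(1, 2, 3, 5), code = c("a", "b ", "c", "e"),
                    y = 11:14, stringsAsFactors = FALSE)
keys <- c("id", "code")

# Paste the key columns into one character string per row,
# so that %in% compares whole records.
make.key <- function(df, cols)
  do.call("paste", c(unname(as.list(df[, cols])), list(sep = "|")))

k.small <- make.key(small, keys)
k.large <- make.key(large, keys)

# Rows of `small` that have a partner in `large`.
n.match <- sum(k.small %in% k.large)

# The inner merge should keep exactly that many rows when keys are unique.
merged <- merge(x = large, y = small, by = keys)

# Records of `small` with no partner, for inspection
# (here one of them differs only by a trailing blank).
unmatched <- small[!(k.small %in% k.large), keys]
```

If `n.match` agrees with `nrow(merged)`, the merge itself is behaving consistently and the earlier count of non-matching records is the figure to question; printing `unmatched` then shows which key values fail to line up.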

Thanks,

Gérald Jean
Conseiller senior en statistiques,
VP Actuariat et Solutions d'assurances,
Desjardins Groupe d'Assurances Générales
télephone            : (418) 835-4900 poste (7639)
télecopieur          : (418) 835-6657
courrier électronique: gerald.jean <at> dgag.ca

"We believe in God, others must bring Data."

W. Edwards Deming

--------------------------------------------------------------------
This message was distributed by s-news <at> lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request <at> lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news

