Suranga Kasthurirathne | 30 Oct 17:26 2014

Setting a class / outcome variable for Weka Principal Components Analysis

Hi everyone,

I'm relatively new to R, and I'm trying to use it to perform a Principal
Components Analysis (PCA). I've done this using WEKA previously, and now I'm
trying to do so using R's prcomp and princomp (either option would work for
me).

One problem I've found is that while WEKA's PCA allows us to specify a class /
outcome variable / column for the dataset, apparently R (both prcomp and
princomp) does not.

I've read through a number of documents, including this one
<http://cran.r-project.org/web/packages/HSAUR/vignettes/Ch_principal_components_analysis.pdf>,
with limited success, so I wanted to raise the question here. How does one set
the class variable when performing a PCA? Any advice would be greatly
appreciated!
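
A minimal sketch of the usual workaround, assuming the data sit in a data
frame "dat" with the outcome in a column named "class" (both names are
placeholders): prcomp and princomp are unsupervised and take no class
argument, so run the PCA on the predictor columns only and reattach the class
to the scores afterwards.

predictors <- dat[, setdiff(names(dat), "class")]        # numeric predictors only
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)  # PCA without the outcome
scores <- data.frame(pca$x, class = dat$class)           # reattach the outcome to the scores
plot(scores$PC1, scores$PC2, col = as.integer(factor(scores$class)))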

-- 
Best Regards,
Suranga


John Wasige | 30 Oct 17:06 2014

PCA on stacked raster (multiple bands/ layers) in R

Hello community, I need help on how I can perform PCA on a stacked raster
(multiple bands/layers) in R. Does anybody have an idea or a script?

Thanks,
John
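
A hedged sketch of one common approach with the raster package (the file name,
sample size, and number of components kept below are assumptions): fit a PCA
on a random sample of cells, then predict the component scores back onto the
stack.

library(raster)

s    <- stack("bands.tif")                          # multi-band raster (placeholder file)
samp <- sampleRandom(s, 10000)                      # matrix of sampled cell values
pca  <- prcomp(samp, center = TRUE, scale. = TRUE)  # PCA fitted on the sample
pcs  <- predict(s, pca, index = 1:3)                # rasters of the first three components
plot(pcs)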


Kuma Raj | 30 Oct 17:03 2014

How can I merge data with differing length?

How can I merge the data frame "df" and the vector "tem" shown below by
filling the head of "tem" with missing values?

a <- rnorm(1825, 20)
b <- rnorm(1825, 30)
date <- seq(as.Date("2000/1/1"), by = "day", length.out = 1825)

df <- data.frame(date, a, b)

tem <- rpois(1095, lambda = 21)

Thanks
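
A minimal sketch of one way to do it, assuming "tem" is meant to line up with
the last 1095 rows of df: pad the front of tem with NA up to the length of df
and attach it as a column.

df$tem <- c(rep(NA, nrow(df) - length(tem)), tem)  # 730 NAs, then the 1095 values
head(df)  # earliest dates have NA in tem
tail(df)  # latest dates carry the tem values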

Ryszard Czermiński | 30 Oct 14:10 2014

RForcecom VERY SLOW on large data sets

I started using the RForcecom package to extract data from SalesForce. It
works very well for data sets with fewer than ~100K records; however, for a
data set with ~400K records it becomes very slow and the export takes ~14
hours.

For comparison, exporting the same table as CSV using "report export" in
SalesForce takes about one minute.

Any ideas why, and how to make it faster?

Best regards,
Ryszard


Thomas Nyberg | 30 Oct 16:17 2014

How to speed up list access in R?

Hello,

I want to do the following: given a set of (number, value) pairs, I want
to create a list l so that l[[toString(number)]] returns the vector of
values associated with that number. My R version is hundreds of times
slower than the equivalent I would write in Python. I'm pretty new to R,
so I bet I'm using its data structures inefficiently, but I've tried more
or less everything I can think of and can't really speed it up. I have
done some profiling, which helped me find problem areas, but I couldn't
speed things up even with that information. I'm thinking I'm just
fundamentally using R in a silly way.

I've included code for the different versions. I wrote the Python code in
a style that should be as clear as possible to R programmers. Thanks a
lot! Any help would be greatly appreciated!

Cheers,
Thomas

R code (with two versions depending on commenting):

-----

numbers <- numeric(0)
for (i in 1:5) {
     numbers <- c(numbers, sample(1:30000, 10000))
}

values <- numeric(0)
for (i in 1:length(numbers)) {
(Continue reading)
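
A hedged sketch of the usual vectorized idiom, assuming "numbers" and "values"
are parallel vectors as described above: split() builds the whole lookup list
in one pass and avoids growing vectors inside a loop, which is typically the
main cost in the element-by-element version.

l <- split(values, numbers)   # named list: one vector of values per distinct number
l[[toString(12345)]]          # values associated with the number 12345 (example key)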

Patrick Connolly | 30 Oct 09:50 2014

Oddity using multcompView package

The multcompView package has some useful features, but I'm sure this isn't
intentional. Excuse the size; this is about the smallest reproducible
example I can manage:

require("multcompView")

> mm <- structure(list(TempNom = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L), .Label = c("15", "18", "21", "22", "24", "27"), class = "factor"), 
    Days = c(14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
(Continue reading)

Alemu Tadesse | 29 Oct 20:20 2014

Reading data from the web

Dear All,

I have data in the format shown at
http://www.data.jma.go.jp/gmd/env/data/radiation/data/geppo/201004/DR201004_sap.txt
that I need to read. I have downloaded all the data from the link and have it
on my computer. I used the following script (taken from the web) and was able
to read the data, but it is not in the format I wanted: I want a data frame
with clean numbers.
asNumeric <- function(x) as.numeric(as.character(x))
factorsNumeric <- function(data)
    modifyList(data, lapply(data[, sapply(data, is.logical)], asNumeric))

data <- read.fwf(filename, widths = c(c1), skip = 18, header = FALSE)
data$V2 <- as.numeric(gsub(" ", "", as.character(data$V2), fixed = TRUE))
f <- factorsNumeric(data)
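
A minimal sketch of an alternative, with the column widths, skip count and
missing-value handling all assumed rather than taken from the actual JMA file:
read the fixed-width table, then coerce every column to numeric in one step.

widths <- c(4, 8, 8, 8, 8, 8, 8)   # placeholder widths; adjust to the real layout
dat <- read.fwf(filename, widths = widths, skip = 18, header = FALSE,
                strip.white = TRUE, stringsAsFactors = FALSE)
dat[] <- lapply(dat, function(x) as.numeric(gsub(" ", "", as.character(x))))
str(dat)   # all columns numeric; cells that cannot be parsed become NA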

Any help is appreciated.

Best,

Alemu


Pedro Segurado | 29 Oct 18:20 2014

problem in loop using windows executable

Dear all,

I am trying to develop an R script that basically uses a loop with 5 main
steps: (1) it runs a Windows executable file outside R that requires a set of
*.txt files, using the shell function (note: I have tried system() and
system(shQuote()) and the problem remains), (2) it imports the output txt file
of the executable, (3) it deletes the existing input txt files from the
Windows folder, (4) it uses values from the imported output file to help
produce new tables to be used as input files for the executable, and (5) it
exports those tables in txt format to the Windows folder.

Now the problem. I am running this script on a laptop. I have changed the
energy-saving settings to the highest performance possible, i.e. no
energy-saving options at all. If I move the mouse frequently there is no
problem and the whole process is not interrupted. However, if I do not move
the mouse, the process stops (not always in the same loop or with the same
file) with the following error:

Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'nodes_29.txt': Permission denied

The file "nodes_29.txt" is one of the executable input files that is
produced in the loop, which can have the same name as previous txt files
that were meanwhile deleted in previous loops (again, please note that the
error is not always in the same loop number and the same file). 

Any idea?
(Continue reading)
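
One workaround worth sketching, assuming the failure is a transient lock on
the file rather than a real permissions problem (the function and table names
below are placeholders): wrap each write in a small retry loop so that a
single refused attempt does not stop the whole run.

write_with_retry <- function(x, path, tries = 5, wait = 2) {
  for (i in seq_len(tries)) {
    ok <- tryCatch({
      write.table(x, path, sep = "\t", quote = FALSE, row.names = FALSE)
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(invisible(TRUE))
    Sys.sleep(wait)   # give the executable / OS time to release the file
  }
  stop("could not write ", path, " after ", tries, " attempts")
}

## e.g. write_with_retry(nodes_table, "nodes_29.txt")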

CJ Davies | 29 Oct 18:12 2014

Variance of multiple non-contiguous time periods?

I am trying to show that the red line ('yaw') in the upper of the two
plots here:

http://i.imgur.com/N4Xxb4f.png

varies more within the pink sections ('transition 1') than in the light
blue sections ('real').

I tried to use var.test(); however, this runs into a problem because
although the red line doesn't vary much *within* any particular light
blue section, it does vary a lot *between* light blue sections.

For example, in the light blue section around t=90 the red line doesn't
move much, and likewise in the light blue section around t=160 the red
line doesn't move much. But between these two sections the red line has
moved substantially.

So if I simply subset the data according to pink/light blue and then put
the resulting subsets into var.test(), the answer does not show the
relationship that I want it to.

Can anybody shed some light on a sensible method of solving this?
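
A minimal sketch of one sensible approach, assuming a data frame "d" with the
yaw values, a factor "section" identifying each contiguous coloured segment,
and a factor "phase" with levels "transition 1" and "real" (all names are
placeholders): centre yaw within each section so the offsets between sections
drop out, then compare the pooled within-section variation between the two
phases.

d$yaw_centred <- ave(d$yaw, d$section, FUN = function(x) x - mean(x))
var.test(yaw_centred ~ phase, data = d)   # F test comparing the two within-section variances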

Regards,
CJ Davies

Jan Vanvinkenroye | 29 Oct 17:56 2014

Fwd: Combining stacked bar charts for logfile analysis


Begin forwarded message:

From: Jan Vanvinkenroye <jan.vanvinkenroye <at> tik.uni-stuttgart.de>
Date: 29 October 2014 17:52:06 CET
Subject: Combining stacked bar charts for logfile analysis
To: r-help <at> r-project.org

Hello Everyone,

In order to assess webserver response time, I would like to combine some
information from an Apache logfile. [1] This is my first project using R, and
I would be very grateful if someone could help me or point me in the right
direction. :)

So far I have managed to read the file into a data frame and bin the response
time (duration_microseconds) into three discrete classes ("gut", "ok",
"schlecht"): <=50000, <=200000ms, <20000ms.

barplot(table(access_log$bewertung), beside = FALSE, width = 1, xlab="Response Time",
ylab="Percentage", col=c("green", "yellow", "red"))

gives me an aggregated percentage of the response times of every request in the logfile, and

qplot(time, duration_microseconds, data=access_log, shape=bewertung)

returns a plot of the response times over time.

How can I combine both plots into a stacked bar chart per hour or day? The
given example contains only 20 lines; the original log file has several
thousand, and a plot of all of it came out rather crowded (mostly black).
(Continue reading)
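
A hedged sketch with ggplot2, assuming access_log$time is already POSIXct and
bewertung holds the three classes: bin the timestamps by hour and draw one
stacked bar per hour, with position = "fill" turning the counts into
proportions.

library(ggplot2)

access_log$hour <- cut(access_log$time, breaks = "1 hour")
ggplot(access_log, aes(x = hour, fill = bewertung)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c(gut = "green", ok = "yellow", schlecht = "red")) +
  labs(x = "Hour", y = "Share of requests")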

Thomas Nyberg | 29 Oct 16:23 2014

Using readLines on a file without resetting internal file offset parameter?

Hi everyone,

I would like to read a file line by line, but I would rather not load
all lines into memory first. I've tried using readLines with n = 1, but
that seems to reset the internal file descriptor's offset after each
call. That is, this is the current behavior:

-------

bash $ echo 1 > testfile
bash $ echo 2 >> testfile
bash $ cat testfile
1
2

bash $ R
R > f <- file('testfile')
R > readLines(f, n = 1)
[1] "1"
R > readLines(f, n = 1)
[1] "1"

-------

I would like the behavior to be:

-------

bash $ R
R > f <- file('testfile')
(Continue reading)
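
A minimal sketch of the standard fix: readLines() opens and then closes an
unopened connection on every call, which is what resets the offset; opening
the connection explicitly keeps it open, so successive calls advance through
the file.

f <- file("testfile", open = "r")   # open the connection explicitly
readLines(f, n = 1)                 # [1] "1"
readLines(f, n = 1)                 # [1] "2"  (offset preserved)
close(f)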

