Stephen Weston | 3 Apr 02:14 2011

Re: .combine in nested foreach loops!

I'm not sure exactly what you're trying to do, but I'm guessing that each
of the 3 lists that you speak of is supposed to contain two 3x3xn arrays,
where "two" is the number of lists of matrices that you're processing
("a" and "b"), and "n" is the length of "a" and "b".

If that's the case, then you can fix your code with a little bit of
post-processing of the results of the inner foreach loop.  The inner
foreach loop returns a list containing two lists of n matrices.
You need to convert that into a list containing two 3x3xn arrays.
A simple way to do that is with a ".final" function.  The input to
the .final function is the final result from the .combine function.
The output from the .final function is returned by foreach.  That is
particularly useful with nested foreach loops.

So here's what I came up with, using the abind function from
the abind package to create the 3x3xn arrays:

library(abind)
library(foreach)
library(doMC)
registerDoMC()

cvec <- c(1, 2, 3)

n <- 2
s <- seq(length=n)
a <- lapply(s, function(i) matrix(rnorm(9), 3))
b <- lapply(s, function(i) matrix(rnorm(9), 3))

init <- list(a=list(), b=list())
(Continue reading)
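
Stephen's full code is cut off above, but the mechanism is easy to show
in isolation. Here is a minimal, self-contained sketch of the
.combine/.final technique he describes (using %do% so no parallel
backend is needed); the toy data are mine, and this is not his complete
two-list solution:

library(abind)
library(foreach)

n <- 2
mats <- lapply(seq_len(n), function(i) matrix(rnorm(9), 3))

# .combine gathers the n results into a single list (keep .maxcombine
# at least n so the list doesn't nest); .final is then applied once to
# that final list, and its output is what foreach returns.
arr <- foreach(m = mats, .combine = list, .multicombine = TRUE,
               .final = function(x) abind(x, along = 3)) %do% m

dim(arr)  # 3 3 2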

Brian Smith | 6 Apr 19:32 2011

Rsge

Hi,

I was trying some of the tests with Rsge. I get an error, and I'm not sure I
understand why. It works ok for a single node, but gives errors when
multiple nodes are involved. Any help/comments would be appreciated!

############ code block ###########
###### r hpc sig
library(Rsge)
library(nlme)

ncols <- 500
nrows <- 50
response <- runif(ncols,1,1000)
d <- matrix(data = runif(n=nrows*ncols,1,1000),
        ncol=ncols,
        nrow=nrows,
        dimnames=list(paste(sep="","R",1:nrows),paste(sep="","C",1:ncols)))

f2x <- as.factor(sample(1:300,500,replace=T))

f1lme <- function(x,y,f2x){
    lmex <- lme(y~x, random = ~1|factor(f2x))
    return(lmex)
}

### Single node and sge
re    <- apply(d,1,f1lme,y=response,f2x)
rep <- sge.parRapply(d, f1lme,y=response,f2x, njobs = 3, join.method=c)

(Continue reading)

Brian Smith | 6 Apr 20:46 2011

Re: Rsge

However, if the package is loaded within the function call, then everything
seems to be ok. ...?? Where am I going wrong?

###### code #########

library(Rsge)
library(nlme)

ncols <- 500
nrows <- 50
response <- runif(ncols,1,1000)
d <- matrix(data = runif(n=nrows*ncols,1,1000),
        ncol=ncols,
        nrow=nrows,
        dimnames=list(paste(sep="","R",1:nrows),paste(sep="","C",1:ncols)))

f2x <- sample(1:300,500,replace=T)

f1lme <- function(x,y,f2x){

### LOAD LIBRARY INSIDE FUNCTION ?!?
    library(nlme)
    lmex <- lme(y~x, random = ~1|factor(f2x))
    return(lmex)
}

### Single node and sge
re  <- apply(d,1,f1lme,y=response,f2x)
rep <- sge.parRapply(d, f1lme,y=response,f2x, njobs = 3, join.method=c)

(Continue reading)

Brian G. Peterson | 6 Apr 21:16 2011

Re: Rsge

On 04/06/2011 01:46 PM, Brian Smith wrote:
> However, if the package is loaded within the function call, then everything
> seems to be ok. ...?? Where am I going wrong?

did you try using the 'packages' argument to the sge.parRapply function?
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
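
For example, applied to the earlier script (a sketch reusing d, f1lme,
response, and f2x from the code above; the message confirms the argument
name, but the character-vector form is an assumption):

# Load nlme on each worker up front via the 'packages' argument,
# instead of calling library(nlme) inside f1lme.
rep <- sge.parRapply(d, f1lme, y=response, f2x, njobs = 3,
                     join.method=c, packages=c("nlme"))
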
Brian Smith | 7 Apr 13:56 2011

Re: Rsge - specify queue?

Thanks Brian. That worked.

Is there a way to specify the queue that the job will be submitted to? (e.g.
qsub -q long.q xx)

On Wed, Apr 6, 2011 at 3:16 PM, Brian G. Peterson <brian@...> wrote:

> On 04/06/2011 01:46 PM, Brian Smith wrote:
>
>> However, if the package is loaded within the function call, then
>> everything
>> seems to be ok. ...?? Where am I going wrong?
>>
>
> did you try using the 'packages' argument to the sge.parRapply function?
> --
> Brian G. Peterson
> http://braverock.com/brian/
> Ph: 773-459-4973
> IM: bgpbraverock
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc@...
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>

bart | 7 Apr 14:30 2011

Re: Rsge - specify queue?

Hi, you could look at sge.options; there is an option called
sge.qsub.options. I think using that you can select a queue.

Bart
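
A minimal sketch of that suggestion, assuming sge.options takes the
option as a named argument (the queue name long.q comes from the
question):

library(Rsge)

# Extra flags passed through to qsub; here, submit to the long.q queue.
sge.options(sge.qsub.options = "-q long.q")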

On 04/07/2011 01:56 PM, Brian Smith wrote:
> Thanks Brian. That worked.
>
> Is there a way to specify the queue that the job will be submitted to? (e.g.
> qsub -q long.q xx)
>
>
>
> On Wed, Apr 6, 2011 at 3:16 PM, Brian G. Peterson <brian@...> wrote:
>
>> On 04/06/2011 01:46 PM, Brian Smith wrote:
>>
>>> However, if the package is loaded within the function call, then
>>> everything
>>> seems to be ok. ...?? Where am I going wrong?
>>>
>>
>> did you try using the 'packages' argument to the sge.parRapply function?
>> --
>> Brian G. Peterson
>> http://braverock.com/brian/
>> Ph: 773-459-4973
>> IM: bgpbraverock
>>
>> _______________________________________________
(Continue reading)

Paul Johnson | 19 Apr 18:21 2011

creating many separate streams (more streams than nodes)

Hi, everybody:

I sat down to write a quick question, and it is turning into a very
complicated mess.  I will give you the blunt question. And then I'll
leave all those other words I had written so that future Googlers
might learn from my experience. Or  you might understand me better.

I want to spawn 8000 random generator streams and save their initial
states. Then I want to be able to re-run simulations against those
generators and get the same results, whether or not I'm on just one
machine or on a cluster.

The treatment of random seeds in packages like SNOW or Rmpi is aimed
at initializing the cluster nodes, rather than initializing the
separate streams.  My cluster has 60 nodes, and so the 60 separate
streams from SNOW will not suffice.  So I'm reading code in rlecuyer
and rstream to see how this might be done.  My background in
agent-based simulation gives me some experience with serialization of
PRNG objects, but R makes it very tough to see through the veil
because there are hidden global objects like the base generator, and
the interface to it requires considerable care.

Now the "blah blah blah". After hacking on this for a couple of hours,
I think it boils down to 4 points.

1. John Chambers seems to agree this would be a good idea.

I got the idea that my plan should be do-able from John Chambers'
book, Software for Data Analysis, where he discusses replication of
simulations. For replication, he says one should save the state of the
(Continue reading)

Hana Sevcikova | 19 Apr 18:43 2011

Re: creating many separate streams (more streams than nodes)

Paul,

The functionality that you're describing is implemented in the snowFT 
package. By default, the function
performParallel (and the underlying clusterApplyFT) initializes one 
stream per replicate, and thus provides
reproducible results. The function that does the RNG initialization is 
called clusterSetupRNGstreamRepli.

Regarding your comment about the names of the rlecuyer functions: Yes, 
our original intention was to
keep those functions internal and let snow and snowFT deal with them, 
but there is no reason why
you couldn't use them directly.

Hana
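
For illustration, a minimal sketch of that usage (the toy function,
replicate count, and worker count are mine; the seed is a vector of six
integers for the L'Ecuyer generator):

library(snowFT)

# 8000 replicates, one RNG stream per replicate, processed by 4 workers.
# Re-running with the same seed reproduces the results exactly,
# no matter how many workers are used.
res <- performParallel(count = 4, x = 1:8000,
                       fun = function(i) mean(rnorm(100)),
                       seed = rep(123, 6))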

On 4/19/11 9:21 AM, Paul Johnson wrote:
> Hi, everybody:
>
> I sat down to write a quick question, and it is turning into a very
> complicated mess.  I will give you the blunt question. And then I'll
> leave all those other words I had written so that future Googlers
> might learn from my experience. Or  you might understand me better.
>
> I want to spawn 8000 random generator streams and save their initial
> states. Then I want to be able to re-run simulations against those
> generators and get the same results, whether or not I'm on just one
> machine or on a cluster.
>
(Continue reading)

Paul Johnson | 19 Apr 18:54 2011

Re: creating many separate streams (more streams than nodes)

Dear Hana

WOW. Thanks very much.  Your code in rlecuyer is very clear and easy
to follow, but now I will read snowFT to see how you do it.

My experience recently has been that when things go wrong in the use
of the higher level packages like SNOW or NWS, it is very tough for me
to find what is wrong.  If I write things directly on top of Rmpi, it
cuts out one or two layers of possible trouble, so finding problems is
easier.

Am I alone in this experience?

pj

On Tue, Apr 19, 2011 at 11:43 AM, Hana Sevcikova <hanas@...> wrote:
> Paul,
>
> The functionality that you're describing is implemented in the snowFT
> package. By default, the function
> performParallel (and the underlying clusterApplyFT) initializes one stream
> per replicate, and thus provides
> reproducible results. The function that does the RNG initialization is
> called clusterSetupRNGstreamRepli.
>
> Regarding your comment about the names of the rlecuyer functions: Yes, our
> original intention was to
> keep those functions internal and let snow and snowFT to deal with them, but
> there is no reason why
> you couldn't use them directly.
(Continue reading)

Ross Boylan | 20 Apr 00:17 2011

Re: creating many separate streams (more streams than nodes)

On Tue, 2011-04-19 at 11:21 -0500, Paul Johnson wrote:
> Hi, everybody:
> 
> I sat down to write a quick question, and it is turning into a very
> complicated mess.  I will give you the blunt question. And then I'll
> leave all those other words I had written so that future Googlers
> might learn from my experience. Or  you might understand me better.
> 
> I want to spawn 8000 random generator streams and save their initial
> states. Then I want to be able to re-run simulations against those
> generators and get the same results, whether or not I'm on just one
> machine or on a cluster.
> 
> The treatment of random seeds in packages like SNOW or Rmpi is aimed
> at initializing the cluster nodes, rather than initializing the
> separate streams.  My cluster has 60 nodes, and so the 60 separate
> streams from SNOW will not suffice.  So I'm reading code in rlecuyer
> and rstream to see how this might be done.  My background in
> agent-based simulation gives me some experience with serialization of
> PRNG objects, but R makes it very tough to see through the veil
> because there are hidden global objects like the

I went down this path; the trick is to avoid thinking of initializing
the streams on a per-machine basis: do it for each chunk/job/stream.  I
used rsprng, but others said rlecuyer would be the same.  To be careful,
I freed the stream generator at the end of the job.

So, if you have 8,000 jobs, you use the job number as an index and
initialize each job's stream by saying it is rank (job #) out of 8,000
total streams.
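
A sketch of that per-job scheme, written here with rlecuyer's stream
functions (mentioned elsewhere in this thread); the stream names and toy
job function are illustrative:

library(rlecuyer)

# One named stream per job, all derived deterministically from one seed.
.lec.SetPackageSeed(rep(42, 6))
job.streams <- paste(sep="", "job", 1:8000)
.lec.CreateStream(job.streams)

run.job <- function(i) {
  # Make job i's stream the current RNG; remember the previous kind.
  old.kind <- .lec.CurrentStream(job.streams[i])
  x <- mean(rnorm(100))  # reproducible regardless of which node runs it
  .lec.CurrentStreamEnd(old.kind)  # restore the previous RNG
  x
}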

(Continue reading)

