mats pistol | 1 Dec 17:59 2007

Re: Unexpected behavior within a script ; returning "status" to the operating system

I have seen the same type of behaviour with S+ v6 and
v7 in loops; S+ does execute commands which should not
be executed.

Your summary example is very good.

I have had this problem occur in calculations which
affect the final answer!  Nowadays I festoon S+ code
with numerous checks to catch this type of problem.

You have raised a critical issue which has, for some
time, damaged my confidence in S+.
Repeated suggestions to Insightful have been met with
"it's difficult".

--- Dennis Fisher <fisher <at> plessthan.com> wrote:

> Colleagues
> 
> I have encountered unexpected behavior manifested through the quit
> command (version 8 in both Linux and Windows Vista).  Basically, I
> want to exit a script when certain conditions occur.  The file
> testfile does not exist so the loop is entered.  Having executed the
> q() command, I expected the final cat statement to not be executed
> (which is the case in R).  I have solved the

Marc Pelath | 3 Dec 16:50 2007

running out of dynamic memory when using predict.bdGlm

Hi everybody,

 

I’m estimating a binomial GLM on a large dataset (about 2.5M records, 100 variables).  The model itself has about 20 variables, many of which are categorical, so the model itself has (at the moment) just over 100 parameters.  SPLUS seems to estimate the model just fine, although of course it takes a while, and it produces the (sensible-looking) in-sample fits without complaints.  However, when I try to generate out-of-sample predictions using predict (where newdata = OutData has about 500k records), I get the dreaded “unable to obtain requested dynamic memory” error.  Traceback follows:

 

---
15: eval(action, sys.parent())
14: doErrorAction("Problem in bd.internal.exec.node(engine.class = \"com.insightful.miner.BDLManager$BDLSplusScri..: BDLManager$BDLSplusScriptEngineNode (0): Problem in model.matrix.default(args$terms.object, IM$in1, args$contrasts.arg, args$xlevels): Unable to obtain requested dynamic memory",
13: stop(ret$error)
12: bd.internal.exec.node(engine.class = "com.insightful.miner.BDLManager$BDLSplusScriptEngineNode", node.props = node.props, inputs = in.bdFrame.lst, num.outputs =
11: list(
10: NULL
9: bd.block.apply(data, FUN = bd.internal.model.matrix.script, test = F, one.block = F, sample = F)
8: bd.internal.model.matrix(terms(pform), mf, contrasts = object$contrasts, xlevels = object$xlevels)
7: predict.bdGlm(sub.glm, OutData, type = "response")
6: predict(sub.glm, OutData, type = "response")
5: eval(i, local)
4: source(auto.print = auto.print, exprs = substitute(exprs.literal))
3: script.run(exprs.literal = {
2: eval(expression(script.run(exprs.literal = {
1:
Message: Problem in bd.internal.exec.node(engine.class = "com.insightful.miner.BDLManager$BDLSplusScri..: BDLManager$BDLSplusScriptEngineNode (0): Problem in model.matrix.default(args$terms.object, IM$in1, args$contrasts.arg, args$xlevels): Unable to obtain requested dynamic memory
---

 

I’m at a loss to explain this, since it is using predict.bdGlm, and my understanding is that this is exactly the limitation that the bigdata library is supposed to address.  Clearly it’s able to produce such results on a larger data set (namely, the sample used to estimate the model), so why would it choke on a smaller data set?

 

I’m running SPLUS 8.0 under Windows XP.  My RAM is 2G, and page file is about 3G, although since it’s supposed to be using bigdata routines, I’m not sure how this matters.  I also have about 100G free disk space.

 

I’m going to try chopping down the number of variables in the dataset to see if that helps, but I feel like I shouldn’t have to.  Any ideas?  I’m hoping somebody has run into this problem before – it doesn’t seem like an unusual situation.  I’ve searched the archives but couldn’t find any guidance.

 

Thanks in advance, and hope I can return the favor someday,

Marc Pelath

 

Marc Pelath | 3 Dec 17:14 2007

Re: running out of dynamic memory when using predict.bdGlm

FYI, this was resolved by closing and reopening SPLUS, apparently to free up some memory used in the
original estimation.  Thanks to Wayne Thogmartin at the USGS.

--------------------------------------------------------------------
This message was distributed by s-news <at> lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request <at> lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news

Tony Plate | 3 Dec 19:06 2007

Re: Unexpected behavior within a script ; returning "status" to the operating system

q() does not seem to take effect until control passes back to the top level 
prompt.

I use this kind of pattern:

if (!file.exists("testfile"))
         {
         cat("This code should be executed\n")
         q(n=123)                   # requests the exit; takes effect only at top level
         stop("Need to stop now!")  # aborts the block and returns control to top level
         cat("This command should never be executed (and in fact won't be)\n")
         }

Then, after returning to the top level, sqpe will stop, and the return 
status will be 123.

For return status under Windows, in the DOS command prompt, use 
%ERRORLEVEL%.  I use cygwin bash, where I can access the return status of 
sqpe via $?, or something like ${PIPESTATUS[1]} if sqpe was in a piped command.

-- Tony Plate

mats pistol wrote:
> I have seen the same type of behaviour with S+ v6 and
> v7 in loops; S+ does execute commands which should not
> be executed.
> 
> Your summary example is very good;
> 
> I have had this problem occur in Calculations which
> affect the final answer!!  Nowadays I festoon S+ code
> with numerous checks to catch this type of problem.
> 
> You have raised a critical issue which has, for some
> time, damaged my confidence in S+. 
> Repeated suggestions to Insightful have been met with
> 'its difficult'.
> 
> 
> --- Dennis Fisher <fisher <at> plessthan.com> wrote:
> 
>> Colleagues
>>
>> I have encountered unexpected behavior manifested through the quit
>> command (version 8 in both Linux and Windows Vista).  Basically, I
>> want to exit a script when certain conditions occur.  The file
>> testfile does not exist so the loop is entered.  Having executed the
>> q() command, I expected the final cat statement to not be executed
>> (which is the case in R).  I have solved the particular situation in
>> which this occurred.  However, I am interested in understanding why
>> Splus behaves this way.  It appears that the problem resides in
>> how the contents of the loop are executed: if the three commands
>> presently in the loop appear outside of the loop, the quit command
>> is executed at the time that it is called.  So, it appears that Splus
>> executes all the contents of the loop before the quit is implemented
>> whereas R executes the commands in sequence.
>>
>> if (!file.exists("testfile"))
>>          {
>>          cat("This code should be executed\n")
>>          q(n=1)
>>          cat("This command should never be executed\n")
>>          }
>>
>> Also, n=1 in the q() command returns "status".  In Linux, this can be
>> recovered at the command line with echo $status.  However, I can't
>> get this to work in Windows (i.e., %status% does not contain the
>> value) and I have been led to believe that this command is not
>> implemented in Windows.  Has anyone been successful with this?
>>
>> Dennis
>>
>>
>> Dennis Fisher MD
>> P < (The "P Less Than" Company)
>> Phone: 1-866-PLessThan (1-866-753-7784)
>> Fax: 1-415-564-2220
>> www.PLessThan.com
>>
>>

Hunsicker, Lawrence | 3 Dec 23:21 2007

Logistic regression with random effects using glmmPQL

Good evening, all:

 

Back again.  I have now found, set up, and run glmmPQL.  I need some help in reading the output.  

 

I am trying to find whether there is meaningful residual center-to-center variability among dialysis centers in the odds that a patient will have a type of diagnostic test, corrected for the patients’ baseline characteristics.  The focus here is on whether there is real center-to-center variability, not on the baseline predictors.  I ran:

 

test.glmm <- glmmPQL(fixed = [two sided formula of fixed effects], random = ~1 | Center, family=binomial(link=logit), data=[data.frame], dispersion = NULL,…)

 

This ran and converged.

 

The output included a lot of stuff that I understand, but also the following:

 

Random effects:
  Formula:  ~1 | Center
          (Intercept)  (Residual)
StdDev:     0.4090162   0.9714077

 

This is the part that I don’t quite get.  

 

a.  The intercept, I assume, is the StdDev of the random effect.  How do I test whether the StdDev is significantly different from 0?  What is its scale?  (The naïve median of the chance of the test at each center is about 19%, with a range from 0 to 100%.)  

 

b.  What is the Residual?  Is this on the same scale as the Intercept?  I.e., can one say that 0.4090162/(0.4090162+0.9714077) of the variability is “explained” by including the random effect?  Does the fact that the Residual is very close to 1.0 mean that there is essentially no overdispersion once I include the random Center effect?

 

c.  Should I be testing the significance of the random effect by using a likelihood ratio test comparing with the same model above, but setting: 

            random = ~ 1 (without the “| Center”)?  This gives me a huge difference in log likelihood. 

 

d.  Is there somewhere that I can read up to understand this function?

 

Many thanks in advance for any help you can give me.

 

Larry Hunsicker

 

 

 

From: Jimenez-Leal William [mailto:william.jimenezleal <at> lsc.gov.uk]
Sent: Tuesday, April 10, 2007 10:32 AM
To: Hunsicker, Lawrence; s-news <at> lists.biostat.wustl.edu
Subject: RE: [S] Logistic regression with random effects

 

Did you try glmmPQL()?  It actually works by calling lme().

 

William

 

 

From: s-news-owner <at> lists.biostat.wustl.edu [mailto:s-news-owner <at> lists.biostat.wustl.edu] On Behalf Of Hunsicker, Lawrence
Sent: 10 April 2007 16:15
To: s-news <at> lists.biostat.wustl.edu
Subject: [S] Logistic regression with random effects

 

Good morning, all:

I suppose that this must be the thousandth time someone has asked this question, and I apologize that I don’t know how to look up past questions and answers.  I am trying to study whether there is a significant difference in outcomes among centers providing a kind of medical service.  The outcome is binary.  There are a batch of individually varying covariates of importance, but the real focus is on whether, after correcting for these covariates, there is a meaningful variability among centers in outcome.  I am not interested in the specific centers, but rather in the overall distribution of underlying center effect.  It would seem that the appropriate statistical method is logistic regression with a random center effect.  I can do this with Egret, but I am wondering whether it is possible to do this in S-Plus.  In S-Plus we have lme for linear random effects, and we have glm for estimating logistic regression.  But is there something that combines these two?

As always, thanks in advance to anyone that can help me.

Larry Hunsicker

Lisa Solomon | 4 Dec 18:37 2007

Free Online: Data Mining Intro for Beginners, Vendor-Neutral, December 13

ONLINE VENDOR NEUTRAL INTRO TO DATA MINING FOR ABSOLUTE BEGINNERS
(no charge)

A non-technical data mining introduction for beginners
December 13, 2007
US and European Timezones:
To register: http://salford.webex.com

This one-hour webinar is a perfect place to start if you are new to data mining and have little-to-no
background in statistics or machine learning. 

In one hour, we will discuss:

**Data basics: what kind of data is required for data mining and predictive analytics; In what format must
the data be; what steps are necessary to prepare data appropriately 

**What kinds of questions can we answer with data mining

**How data mining models work: the inputs, the outputs, and the nature of the predictive mechanism 

**Evaluation criteria: how predictive models can be assessed and their value measured 

**Specific background knowledge to prepare you to begin a data mining project.

To register: http://salford.webex.com

Contact me if the December 13th date/time is inconvenient and you wish to be put on our webinar notification list.

Sincerely,
Lisa Solomon
lisas <at> salford-systems.com

Data Analytics Corp. | 5 Dec 14:11 2007

Printing GraphSheet to a pdf file

Good morning,

I want to do something very simple.  I created a function to produce 
multiple graphs, say 25.  All 25 are visible as separate pages.  Using 
some previous suggestions for how to move about the pages 
programmatically, I can write another function to select specific pages, 
say page 15.  I now want to print that specific page to a pdf file.  The 
function I tried is:

fn.pdf <- function(){
   pdf.graph(file = "test.pdf")
   guiModify("GraphSheet", Name = guiGetGSName(), CurrentPage = "Page15")
   guiPrint("GraphSheet")
   dev.off()
}

The file test.pdf is created, but it's empty.  How do I print page 15 to 
the pdf file?  Any suggestions?

Thanks,

Walt Paczkowski

Tony Plate | 5 Dec 19:28 2007

Re: Printing GraphSheet to a pdf file

pdf.graph() creates a new graphics device instance, and plot() etc commands 
will draw to it.  You didn't plot anything on the newly created device, 
hence the empty file.

You can use export.graph() to export the current page of a graphsheet(). 
Look at its help page.  However, I don't think there's an option to produce 
pdf output.  You could produce eps output, and then use a program like gs 
to convert that to a pdf.

(You could try using dev.copy() to copy the current graph to a new device, 
but I can't get dev.copy() to copy anything but the last page in a 
multi-page graphsheet.)

-- Tony

Data Analytics Corp. wrote:
> Good morning,
> 
> I want to do something very simple.  I created a function to produce 
> multiple graphs, say 25.  All 25 are visible as separate pages.  Using 
> some previous suggestions for how to move about the pages 
> programmatically, I can write another function to select specific pages, 
> say page 15.  I now want to print that specific page to a pdf file.  The 
> function I tried is:
> 
> fn.pdf <- function(){
>   pdf.graph(file = "test.pdf")
>   guiModify("GraphSheet", Name = guiGetGSName(), CurrentPage = "Page15")
>   guiPrint("GraphSheet")
>   dev.off()
> }
> 
> The file test.pdf is created, but it's empty.  How do I print page 15 to 
> the pdf file?  Any suggestions?
> 
> Thanks,
> 
> Walt Paczkowski

Douglas Bates | 6 Dec 19:52 2007

Re: proportion of states life expectancy > 70

On Nov 27, 2007 10:38 PM, bhavin toprani <b_toprani <at> hotmail.com> wrote:

> Dear all,

>  Could you please suggest an easy command eg, if, for ,etc to find the
> proportion of states having life expectancy greater than 70?

>  Thanks for your help in advance.

>  example of dataframe
>              Life.Exp
>  Alabama   69.05
>  Alaska      69.31
>  Arizona     70.55
>  Arkansas   70.66
>  California  71.71
>  Colorado   72.06

I haven't seen a response to this.  The general way to approach this
is to sum the logical indicators.  When the TRUE/FALSE values are used
in an arithmetic expression they are converted to 1/0 values.
Suppose that your data frame is called mydf.  Then the expression

sum(mydf$Life.Exp > 70)/nrow(mydf)

should do it.  If you want to be careful about the possibility of
missing data you should use

sum(mydf$Life.Exp > 70, na.rm = TRUE)/sum(!is.na(mydf$Life.Exp))

instead.
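
As a quick check of the two expressions, here is a minimal sketch that uses only the six rows quoted in the question, so the result is the proportion among those six states and not among all fifty. The name mydf is the placeholder used above. The code is plain S and should run unchanged in R or S-PLUS. The denominator applies nrow() to the data frame itself, because a single column is a plain vector and has no rows to count.

```r
# Six rows copied from the example in the question
mydf <- data.frame(
    Life.Exp = c(69.05, 69.31, 70.55, 70.66, 71.71, 72.06),
    row.names = c("Alabama", "Alaska", "Arizona",
                  "Arkansas", "California", "Colorado"))

# Logical values are coerced to 1/0, so sum() counts the states above 70
n.above <- sum(mydf$Life.Exp > 70)      # 4 of the 6 rows exceed 70
prop <- n.above / nrow(mydf)            # 4/6

# Version that ignores missing values in numerator and denominator alike
prop.na <- sum(mydf$Life.Exp > 70, na.rm = TRUE) / sum(!is.na(mydf$Life.Exp))

print(prop)
print(prop.na)
```

With no missing values in these six rows, both expressions give the same proportion.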

Tim Hesterberg | 6 Dec 20:40 2007

Re: proportion of states life expectancy > 70

A minor improvement on Bates' answer is to use mean() instead of sum()
  mean(mydf$Life.Exp > 70)
  mean(mydf$Life.Exp > 70, na.rm = TRUE)
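
To see that mean() with na.rm = TRUE agrees with the sum()/sum(!is.na()) form when values are missing, here is a small sketch. The vector x is made up for illustration (its second value is deliberately missing) and is not taken from the original data.

```r
# Hypothetical values; the NA stands in for a state with no recorded value
x <- c(69.05, NA, 70.55, 72.06)

# mean() of a logical vector is the proportion of TRUE values;
# na.rm = TRUE drops the NA from both the count and the divisor
p.mean <- mean(x > 70, na.rm = TRUE)

# Equivalent long form: count of TRUE over count of non-missing values
p.sum <- sum(x > 70, na.rm = TRUE) / sum(!is.na(x))

print(p.mean)
print(p.sum)
```

Both forms divide the two values above 70 by the three non-missing values.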

>On Nov 27, 2007 10:38 PM, bhavin toprani <b_toprani <at> hotmail.com> wrote:
>
>> Dear all,
>
>>  Could you please suggest an easy command eg, if, for ,etc to find the
>> proportion of states having life expectancy greater than 70?
>
>>  Thanks for your help in advance.
>
>>  example of dataframe
>>              Life.Exp
>>  Alabama   69.05
>>  Alaska      69.31
>>  Arizona     70.55
>>  Arkansas   70.66
>>  California  71.71
>>  Colorado   72.06
>
>I haven't seen a response to this.  The general way to approach this
>is to sum the logical indicators.  When the TRUE/FALSE values are used
>in an arithmetic expression they are converted to 1/0 values.
>Suppose that your data frame is called mydf.  Then the expression
>
>sum(mydf$Life.Exp > 70)/nrow(mydf)
>
>should do it.  If you want to be careful about the possibility of
>missing data you should use
>
>sum(mydf$Life.Exp > 70, na.rm = TRUE)/sum(!is.na(mydf$Life.Exp))
>
>instead.