Ralph Richard Cook | 3 Jan 01:44 2006

Problems with asdf-install

I'm using OpenMCL 1.0 with Slime. If I load asdf-install and, for
example, portable aserve, then save with ccl:save-application and try
to load that image via Slime, I get

Welcome to OpenMCL Version 1.0 (DarwinPPC32)!
? ;Loading #P"/Users/rrc/slime/swank-loader.lisp"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank-backend.dfsl"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/nregex.dfsl"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/metering.dfsl"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank-openmcl.dfsl"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank-gray.dfsl"...
;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank.dfsl"...Bug in OpenMCL system code:
Error reporting error
? for help
[829] OpenMCL kernel debugger:

Anyone else seen anything like this?

Thanks,
Richard
Gary Byers | 3 Jan 02:39 2006

Re: Problems with asdf-install

When you get thrown into the kernel debugger, what does the backtrace
look like?  (As a wild guess, I wonder if anything's trying to do
I/O to a defunct stream early in the startup process.  That would
error, try to print an error message on a defunct stream, error,
try to print a message on a defunct stream ... and getting thrown
into the kernel debugger after this goes on for a while is arguably
slightly better than stack-overflowing in relative silence.)  Whether
it's because of what I'm calling a defunct stream or for some other
reason, the "error reporting error" message generally means that
the error system has concluded that something is very, very wrong.
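
The loop described above (an error whose report fails, which raises
another error, and so on) can be sketched in portable Common Lisp.
This is only an illustration of the idea of re-entry detection, not
OpenMCL's actual implementation; the names *ERROR-DEPTH*,
*MAX-ERROR-DEPTH* and REPORT-WITH-REENTRY-GUARD are hypothetical.

```lisp
;; Hypothetical names throughout; this is not OpenMCL's error system.
(defvar *error-depth* 0
  "Number of nested attempts to report an error.")

(defparameter *max-error-depth* 3
  "Give up once reporting has failed this many times in a row.")

(defun report-with-reentry-guard (report-fn)
  "Call REPORT-FN.  If reporting itself signals an error, try again,
but return :GAVE-UP after *MAX-ERROR-DEPTH* nested attempts instead of
recursing until the stack overflows."
  (let ((*error-depth* (1+ *error-depth*)))
    (if (> *error-depth* *max-error-depth*)
        :gave-up
        (handler-case (funcall report-fn)
          ;; The handler runs inside the LET above, so the depth
          ;; keeps growing with each failed attempt.
          (error () (report-with-reentry-guard report-fn))))))
```

A reporter that always fails, such as one writing to a stream that no
longer works, ends in :GAVE-UP rather than in unbounded recursion.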

I don't use Slime, and may be misremembering and/or misunderstanding
what I remember about it, but I wonder what it does when it sets up the
standard streams (*TERMINAL-IO*, etc.) for the thread that it creates
and how pervasive that is.  I don't know whether even the vague
suspicion/nervousness I'm expressing about that is at all justified;
it's probably a good idea to first see if this is a "defunct stream"
issue or something else entirely.

On Mon, 2 Jan 2006, Ralph Richard Cook wrote:

> I'm using OpenMCL 1.0 with Slime. If I load asdf-install and, for
> example portable aserve, and save with ccl:save-application, then try
> to load that image via Slime, I'll get
>
>
> Welcome to OpenMCL Version 1.0 (DarwinPPC32)!
> ? ;Loading #P"/Users/rrc/slime/swank-loader.lisp"...
> ;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank-backend.dfsl"...

Ralph Richard Cook | 3 Jan 05:43 2006

Re: Problems with asdf-install

Backtrace follows. When starting OpenMCL standalone I don't have a
problem; it only happens when going through Emacs/Slime.

;Loading #P"/Users/rrc/.slime/fasl/openmcl-1.0-darwin-powerpc/swank.dfsl"...Bug in OpenMCL system code:
Error reporting error
? for help
[938] OpenMCL kernel debugger: ?
(S)  Find and describe symbol matching specified name
(B)  Show backtrace
(X)  Exit from this debugger, asserting that any exception was handled
(K)  Kill OpenMCL process
(?)  Show this help
[938] OpenMCL kernel debugger: b

(#xf01338c0) #x00015558 : _debug_backtrace + 36
(#xf0133900) #x00015884 : _lisp_Debugger + 256
(#xf0133970) #x00010CB0 : _Bug + 88
(#xf0133bd0) #x000063E8 : _SPpoweropen_ffcall + 200
(#xF0133C10) #x0402DFE8 : #<Function BUG #x0801adae> + 244
(#xF0133C20) #x00005E90 : (subprimitive _ret1valn)
(#xF0133C30) #x041276C0 : #<Function FUNCALL-WITH-ERROR-REENTRY-DETECTION #x08105e0e> + 140
(#xF0133C40) #x00005E90 : (subprimitive _ret1valn)
(#xF0133C50) #x0412697C : #<Anonymous Function #x081058fe> + 208
(#xF0133C60) #x00005E90 : (subprimitive _ret1valn)
(#xF0133C70) #x041272EC : #<Function FUNCALL-WITH-XP-STACK-FRAMES #x08105c2e> + 496
(#xF0133C80) #x04126FCC : #<Function XCMAIN #x081059ae> + 1600
(#xF0133C90) #x00005E90 : (subprimitive _ret1valn)
[...]

Gary Byers | 3 Jan 06:55 2006

Re: Problems with asdf-install


On Mon, 2 Jan 2006, Ralph Richard Cook wrote:

> Backtrace follows. When starting OpenMCL standalone I don't have a problem, 
> it's just when going through Emacs/Slime

The trouble seems to start with a call to WARN-UNIMPLEMENTED-INTERFACES;
after the warning is printed (buffered) to *ERROR-OUTPUT*, the function
CCL::%BREAK-MESSAGE tries to do (FORCE-OUTPUT *ERROR-OUTPUT*), and
things go downhill from there.

One of the first things that happens when a saved image starts up
(the slightly simplified version) is that *TERMINAL-IO* gets SETQed
to a two-way stream made from an input stream to file descriptor 0
and an output stream to file descriptor 1; the other standard streams
are ordinarily made to be synonym streams to *TERMINAL-IO*.  (This
happens in the initial thread, and there isn't too much that happens
before this happens.)
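
As a rough illustration of why the other standard streams are synonym
streams, here is a minimal portable sketch. It uses string streams as
stand-ins for the descriptor-backed streams (an assumption made so the
example runs anywhere), and SYNONYM-STREAM-DEMO is a hypothetical
name. A synonym stream holds the symbol *TERMINAL-IO*, not a stream
object, so it follows whatever value *TERMINAL-IO* has at the moment
of the I/O.

```lisp
(defun synonym-stream-demo ()
  "Write through one synonym stream while *TERMINAL-IO* has two
different values; return the text each underlying stream received."
  (let* ((old-out (make-string-output-stream)) ; stand-in for the old fd 1 stream
         (new-out (make-string-output-stream)) ; stand-in for the new fd 1 stream
         (in (make-string-input-stream ""))    ; stand-in for fd 0
         ;; Created once; it looks up *TERMINAL-IO* on every operation.
         (err (make-synonym-stream '*terminal-io*)))
    (let ((*terminal-io* (make-two-way-stream in old-out)))
      (write-string "before" err)
      (finish-output err))
    (let ((*terminal-io* (make-two-way-stream in new-out)))
      (write-string "after" err)
      (finish-output err))
    (list (get-output-stream-string old-out)
          (get-output-stream-string new-out))))
```

The same synonym stream writes to whichever two-way stream is current,
so code holding the synonym never needs to be told that *TERMINAL-IO*
was replaced. Code that caches the stream object itself gets no such
protection.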

Before this happens, the value of *TERMINAL-IO* is the value that
it had in the old (pre-SAVE-APPLICATION) image.  That value usually
becomes garbage as soon as the "new" *TERMINAL-IO* is created, but
(even though it's also a two-way stream to file descriptors 0 and 1)
it probably won't work to do I/O to it: pointers to buffers will
have been turned into "dead macptrs", since the #_malloc'ed memory
they use doesn't really exist in the new image.  (The buffering
is actually implemented by #_malloc'ing some memory and making it
look enough like a character or byte vector that things like SCHAR
and AREF work on it.  The lisp string or vector will still be in
the "defunct" stream, but its address will be "where the #_malloc'ed
[...]

Ralph Richard Cook | 4 Jan 02:04 2006

Re: Problems with asdf-install

Thanks, changing the defvar's to defparameter's got it.

On Jan 3, 2006, at 12:55 AM, Gary Byers wrote:

>
> The short version of the story is that if you save an image with
> Slime/Swank loaded into it in OpenMCL, Swank's notion of the values
> of the "real" standard stream values is yesterday's news (established
> before the SAVE-APPLICATION and never reset.)  Around the time that
> SLIME finishes compiling/loading itself, it apparently sets *TERMINAL-IO*
> to its (stale) notion of what the "real" *TERMINAL-IO* should be,
> and that can't work.  (It's not necessary to use ASDF-INSTALL or to
> try to load PASERVE or to do much of anything else in order to trigger
> this.)
>
> I can think of a few ways around this; which way or ways would work
> best depend on whether invoking SLIME always tries to load the swank
> stuff even if it's already present in the image.  (If so, a very
> simple fix would be to change the DEFVARs in swank.lisp's
> SETUP-STREAM-INDIRECTION to DEFPARAMETERs.)
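
For readers wondering why that change helps: DEFVAR leaves a variable
alone if it already has a value, while DEFPARAMETER assigns every time
the form is evaluated. A minimal sketch, with hypothetical variable
names (these are not the variables in swank.lisp) and keywords
standing in for stream objects:

```lisp
;; First load, i.e. what the image contains before SAVE-APPLICATION.
(defvar *demo-var* :stale-stream)
(defparameter *demo-param* :stale-stream)

;; Second load of the same definitions in the restarted image.
(defvar *demo-var* :fresh-stream)         ; already bound: value is kept
(defparameter *demo-param* :fresh-stream) ; always assigned
```

After the second pair of forms, *DEMO-VAR* is still :STALE-STREAM and
*DEMO-PARAM* is :FRESH-STREAM, which is why reloading with
DEFPARAMETER replaces the stale stream values.
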
Gary Byers | 10 Jan 03:15 2006

Re: FFI stuff, C++, wxWidgets

Part of Hamilton's message (of several weeks ago) had to do with the
desirability/practicality of a wxWidgets interface.

If anyone's interested in this, they might find the wxCL project

<http://www.wxcl-project.org>

of interest.

I just stumbled on this a few minutes ago; at first glance, it -looks-
like they solve the C++-interface issues by avoiding them (by using
a C wrapper library.)  If that guess is correct, it'd offer a way
of using at least some large subset of wxWidgets from OpenMCL.  (I
have no idea what the thread/event issues might be, and have no
idea about many of the other things that one would want/need to know
about.)

On Wed, 21 Dec 2005, Hamilton Link wrote:

> I'm sending a shout out to Larry Pells, but if anyone else has a dime's
> worth of opinion on this feel free to chime in.
>
> A while back Larry was having some UFFI stuff lock up when he was
> interfacing with C++.  In the last few years, several things have
> changed to make me think about an openmcl/C++ interface: (a) the
> lintel/macintel port of openmcl looming on the horizon, (b) my
> discovering that wxWidgets is bundled with Tiger and is nicely
> cross-platform on linux, (c) me having to learn MFC doc/view stuff and
> use VC++ to do stuff professionally (harsh, I know), and (d) Gary
> saying there's a standard C++ ABI now.

rs | 12 Jan 16:52 2006

odd behaviour of #_malloc and length?

i know there is ccl::malloc, but...
when i use length on a local variable containing a string in a  
#_malloc call it errors out with:
   value NIL is not of the expected type (UNSIGNED-BYTE 32).

(let* ((a "a string")
       (mem (#_malloc (+ (length a) 1))))
  (format t "mem is ~s" mem)
  (when mem
    (#_free mem)))

-> value NIL is not of the expected type (UNSIGNED-BYTE 32).

all of the following forms are ok (no error)

(let* ((mem (#_malloc (+ (length "a string") 1))))

(let* ((mem (#_malloc (+ (length "a string") 1))))

(let* ((a "a string")
       (mem (#_malloc (+ (array-total-size a) 1))))

(let* ((a "a string")
       (mem (#_malloc (+ (ccl::uv-size a) 1))))

(let* ((a '(1 2 3 4))
       (mem (#_malloc (+ (length a) 1))))

(let* ((a "a string")
       (len (+ (length a) 1))
[...]

rs | 12 Jan 17:51 2006

find reason for crashes and prevent mcl from going into kernel debugger?

Hi,
sometimes i see openmcl 1.0 crash during gc (osx 10.4.3, G4/2x500  
MHz, 1GB RAM 600GB RAID):

...Unhandled exception 11 at 0xfd30, context->regs at #xf0600a38
Write operation to unmapped address 0x82087000
In foreign code at address 0x0000fd30

The code ran fine in previous releases (14.2 and 14.4) (example of
stacktrace appended).
The app is a simple backup server which runs 24/7 serving up to 12
clients. During backup it uses several hundred MB of RAM because the
clients ask the server about the files already on the server and the
server returns a list of file descriptions.

questions:
1.) how to find the problem (maybe from backtrace)
       (does "In foreign code at address 0x0000fd30" mean that the  
error is in a lib using dangling ptrs?)
2.) Since i *have* to solve the problem: Is it possible to tell mcl  
to exit instead of going into the kernel debugger? I could make an  
appropriate entry for launchd to start it up again when it quits.

An alternative would be to fork mcl for each client but i don't  
really like this idea.

Maybe the problem reveals itself in this backtrace?:

Last login: Tue Jan  3 14:15:54 on console
Welcome to Darwin!
[...]

Gary Byers | 12 Jan 17:56 2006

Re: odd behaviour of #_malloc and length?


On Thu, 12 Jan 2006, rs wrote:

> i know there is ccl::malloc, but...
> when i use length on a local variable containing a string in a
> #_malloc call it errors out with:
>   value NIL is not of the expected type (UNSIGNED-BYTE 32).
>
> (let* ((a "a string")
>           (mem (#_malloc (+ (length a) 1))))
>   (format t "mem is ~s" mem)
>   (when mem
>     (#_free mem)))
>
> -> value NIL is not of the expected type (UNSIGNED-BYTE 32).
>

It's a compiler bug; if the value of a foreign argument of certain
types (including :UNSIGNED-FULLWORD) is obtained from a fixnum
arithmetic operation that may overflow (and the overflow handling
is done out of line), the compiler neglects to store the result
of that operation in the right register before trying to initialize
the foreign argument from that register.  (Yes, that's the short
version ...)

> all of the following forms are ok (no error)

Some of the cases below avoid the bug because the LENGTH operation
gets constant-folded.  Some of the other cases may not error,
but may still suffer from the same bug: the register that doesn't
[...]

Gary Byers | 12 Jan 18:24 2006

Re: find reason for crashes and prevent mcl from going into kernel debugger?


On Thu, 12 Jan 2006, rs wrote:

> Hi,
> sometimes i see openmcl 1.0 crash during gc (osx 10.4.3, G4/2x500
> MHz, 1GB RAM 600GB RAID):
>
> ...Unhandled exception 11 at 0xfd30, context->regs at #xf0600a38
> Write operation to unmapped address 0x82087000
> In foreign code at address 0x0000fd30
>
> The code ran fine in previous releases (14.2 and 14.4) (example of
> stacktrace appended).
> The app is a simple backup server which runs 24/7 serving up to 12
> clients. During backup it uses several hundred MB of RAM because the
> clients ask the server about the files already on the server and the
> server returns a list of file descriptions.
>
> questions:
> 1.) how to find the problem (maybe from backtrace)
>       (does "In foreign code at address 0x0000fd30" mean that the
> error is in a lib using dangling ptrs?)

It seems to be dying in the GC (which is foreign code ...), and I'd
guess that address 0x0000fd30 is somewhere in the GC.  (Some nearby
addresses show up in the backtrace below as being in/near the function
"gc()".)

The backtrace seems to suggest that lisp code was doing some pathname-
related consing when the GC was triggered.  That doesn't tell us too
[...]

