Marco Antoniotti | 21 Apr 20:45 2011

Counter-intuitive API for cxml:parse on LWM?

Hi

here is the example on LWM:

CL-USER 17 > (cxml:parse #P"CellCycle-1991Tys-2.xml" (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil)
NIL

CL-USER 18 > (with-open-file (f #P"CellCycle-1991Tys-2.xml" :direction :input)
              (cxml:parse f (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil))

Error: Binary operation READ-BYTE attempted on character stream #<STREAM::LATIN-1-FILE-STREAM /Users/marcoxa/Projects/Genomics/biolosyst/sbmlambda/tests/CellCycle-1991Tys-2.xml>.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.

Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.

CL-USER 19 : 1 > 

And here is the doc string for cxml:parse

 "@arg[input]{A string, pathname, octet vector, or stream.}
   @arg[handler]{A @class{SAX handler}}
   @arg[validate]{Boolean.  Defaults to @code{nil}.  If true, parse in
     validating mode, i.e. assert that the document contains a DOCTYPE
     declaration and conforms to the DTD declared.}
...
   @arg[recode]{Boolean.  (Ignored on Lisps with Unicode
     support.)  Recode rods to UTF-8 strings.  Defaults to true.
     Make sure to use @fun{utf8-dom:make-dom-builder} if this
     option is enabled and @fun{rune-dom:make-dom-builder}
     otherwise.}
   @return{The value returned by @fun{sax:end-document} on @var{handler}.}

   Parse an XML document from @var{input}, which can be a string, pathname,
   octet vector, or stream.
...
   All SAX parsing functions share the same keyword arguments.  Refer to
   @fun{parse} for details on keyword arguments."

It appears that you need a 'xstream', but this is against the doc string content (and IMHO, against common sense).

Cheers

--
Marco Antoniotti


Marco Antoniotti | 21 Apr 21:24 2011

... and another

Always on LWM.  I assume this has to do with whether UNICODE is supported or not, but the example in the docs is misleading.


CL-USER 37 > (defparameter *source* (cxml:make-source "<example>text</example>"))

Error: In = of (#\< 254) arguments should be of type NUMBER.
  1 (continue) Return a value to use.
  2 Supply a new first argument.
  3 (abort) Return to level 0.
  4 Return to top loop level 0.

Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.



Cheers


--
Marco Antoniotti


David Lichteblau | 21 Apr 21:06 2011

Re: Counter-intuitive API for cxml:parse on LWM?

Hi,

Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
> here is the example on LWM:

what is LWM?

> CL-USER 17 > (cxml:parse #P"CellCycle-1991Tys-2.xml" (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil)
> NIL
> 
> CL-USER 18 > (with-open-file (f #P"CellCycle-1991Tys-2.xml" :direction :input)
>               (cxml:parse f (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil))
> 
> Error: Binary operation READ-BYTE attempted on character stream #<STREAM::LATIN-1-FILE-STREAM /Users/marcoxa/Projects/Genomics/biolosyst/sbmlambda/tests/CellCycle-1991Tys-2.xml>.
>   1 (abort) Return to level 0.
>   2 Return to top loop level 0.
[...]
>    Parse an XML document from @var{input}, which can be a string, pathname,
>    octet vector, or stream.
[...]
> It appears that you need a 'xstream', but this is against the doc string content (and IMHO, against common sense).

Not an xstream, no (those are internal to cxml and would only be used
very rarely by user code).

What you need is a _binary_ stream though (and the docstring could
indeed be improved to say "octet vector, or octet stream" rather than
just "octet vector, or stream").

In other words, add :element-type '(unsigned-byte 8) to the
with-open-file.

d.
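
[Editor's example] A minimal sketch of the advice above: open the file with :element-type '(unsigned-byte 8) so that READ-BYTE works on the stream. CALL-WITH-OCTET-INPUT is a hypothetical helper name, not part of cxml. It is written in plain Common Lisp so it loads without cxml, and the cxml call itself is shown only in a comment.

```lisp
;; Sketch only: CALL-WITH-OCTET-INPUT is a hypothetical helper, not a cxml function.
(defun call-with-octet-input (pathname function)
  "Open PATHNAME as a binary (octet) input stream and call FUNCTION on it."
  (with-open-file (in pathname
                      :direction :input
                      :element-type '(unsigned-byte 8))
    (funcall function in)))

;; With cxml loaded, the failing call from the first message would become:
;;
;; (call-with-octet-input
;;  #P"CellCycle-1991Tys-2.xml"
;;  (lambda (in)
;;    (cxml:parse in
;;                (cxml:make-whitespace-normalizer
;;                 (make-instance 'sax:default-handler))
;;                :validate nil)))
```

Passing the pathname directly, as in the CL-USER 17 form, sidesteps the question because cxml then opens the file itself.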

Marco Antoniotti | 21 Apr 21:52 2011

Re: Counter-intuitive API for cxml:parse on LWM?


On Apr 21, 2011, at 15:06 , David Lichteblau wrote:

> Hi,
> 
> Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
>> here is the example on LWM:
> 
> what is LWM?

LispWorks Mac

> 
>> CL-USER 17 > (cxml:parse #P"CellCycle-1991Tys-2.xml" (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil)
>> NIL
>> 
>> CL-USER 18 > (with-open-file (f #P"CellCycle-1991Tys-2.xml" :direction :input)
>>              (cxml:parse f (cxml:make-whitespace-normalizer (make-instance 'sax:default-handler)) :validate nil))
>> 
>> Error: Binary operation READ-BYTE attempted on character stream #<STREAM::LATIN-1-FILE-STREAM /Users/marcoxa/Projects/Genomics/biolosyst/sbmlambda/tests/CellCycle-1991Tys-2.xml>.
>>  1 (abort) Return to level 0.
>>  2 Return to top loop level 0.
> [...]
>>   Parse an XML document from @var{input}, which can be a string, pathname,
>>   octet vector, or stream.
> [...]
>> It appears that you need a 'xstream', but this is against the doc string content (and IMHO, against common sense).
> 
> Not an xstream, no (those are internal to cxml and would only be used
> very rarely by user code).
> 
> What you need is a _binary_ stream though (and the docstring could
> indeed be improved to say "octet vector, or octet stream" rather than
> just "octet vector, or stream".
> 
> In other words, add :element-type '(unsigned-byte 8) to the
> with-open-file.

Ok.  But the file is not a binary file.

Marco

> 
> 
> d.
> 

--
Marco Antoniotti

David Lichteblau | 21 Apr 22:51 2011

Re: Counter-intuitive API for cxml:parse on LWM?

Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
> Ok.  But the file is not a binary file.

The fundamental reason XML files need to be processed using binary
streams instead of character streams in (portable) Common Lisp is that
an XML parser needs to read the first few bytes before it can know which
external format to parse the file as.

Implemented using a Common Lisp character stream, that would require a
switch of the external format in the middle, using (setf
stream-external-format).  But that function doesn't exist in portable
Lisp.  (One of the few Lisps that supports it is Allegro with its
simple-streams, but we aim to support more Lisps than just Allegro.  We
also aim for more speed than typical gray streams-based implementations
of this idea offer.)

d.
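
[Editor's example] To illustrate why the first few bytes matter, here is a rough sketch of byte-level encoding detection in the spirit of XML 1.0 Appendix F. It is not cxml's actual code; GUESS-XML-ENCODING is a hypothetical name, and the sketch ignores UCS-4, EBCDIC and the encoding declaration, which a real parser would read next. Incidentally, the 254 in the "In = of (#\< 254)" error reported elsewhere in this thread is #xFE, the first byte of a UTF-16 byte-order mark, which would be consistent with a check of this kind being handed a character where it expected an octet.

```lisp
;; Sketch only: a simplified guess based on the leading octets of a document.
(defun guess-xml-encoding (octets)
  "Guess an encoding keyword from OCTETS, a vector of (unsigned-byte 8)."
  (flet ((starts-with (&rest prefix)
           ;; True when OCTETS begins with the octets listed in PREFIX.
           (and (>= (length octets) (length prefix))
                (every #'= prefix octets))))
    (cond ((starts-with #xEF #xBB #xBF) :utf-8)          ; UTF-8 byte-order mark
          ((starts-with #xFE #xFF) :utf-16be)            ; UTF-16 BOM, big endian
          ((starts-with #xFF #xFE) :utf-16le)            ; UTF-16 BOM, little endian
          ((starts-with #x00 #x3C #x00 #x3F) :utf-16be)  ; "<?" without a BOM
          ((starts-with #x3C #x00 #x3F #x00) :utf-16le)  ; "<?" without a BOM
          (t :utf-8))))                                  ; default, pending the declaration
```

A character stream has already committed to one external format by the time these octets are available, which is the problem described above.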

David Lichteblau | 21 Apr 22:54 2011

Re: ... and another

Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
> Always on LWM.  I assume this has to do with UNICODE or not support, but the example in the docs is misleading.
> 
> 
> CL-USER 37 > (defparameter *source* (cxml:make-source "<example>text</example>"))

That example should work in any implementation with Unicode support.

Backtrace?
Value of *features*?

d.

Marco Antoniotti | 22 Apr 01:28 2011

Re: ... and another


On Apr 21, 2011, at 16:54 , David Lichteblau wrote:

> Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
>> Always on LWM.  I assume this has to do with UNICODE or not support, but the example in the docs is misleading.
>>
>> CL-USER 37 > (defparameter *source* (cxml:make-source "<example>text</example>"))
>
> That example should work in any implementation with Unicode support.
>
> Backtrace?
> Value of *features*?


Here they are.  I suspect that LWM does not support UNICODE.

CL-USER 2 > (defparameter *source* (cxml:make-source "<example>text</example>"))

Error: In = of (#\< 254) arguments should be of type NUMBER.
  1 (continue) Return a value to use.
  2 Supply a new first argument.
  3 (abort) Return to level 0.
  4 Return to top loop level 0.

Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.

CL-USER 3 : 1 > :b
Call to ERROR
Call to (METHOD RUNES::FIGURE-ENCODING (STREAM))
Call to RUNES:MAKE-XSTREAM
Call to CXML:MAKE-SOURCE
Call to LET
Call to LET
Call to EVAL
Call to CAPI::CAPI-TOP-LEVEL-FUNCTION
Call to CAPI::INTERACTIVE-PANE-TOP-LOOP
Call to MP::PROCESS-SG-FUNCTION

CL-USER 4 : 1 > (pprint *features*)

(BABEL::UCS-2-CHARS
 :RUNE-IS-CHARACTER
 :RUNE-IS-UTF-16
 :ASDF2
 :ASDF
 :MK-DEFSYSTEM
 :COMMON-LISPWORKS
 :LW-EDITOR
 :CAPI-COCOA-LIB
 :CAPI-TOOLKIT
 :CAPI
 :DBCS-ENV
 :COCOA
 :UNIX-WITHOUT-MOTIF
 :COMMON-FFI
 :NEW-PATCH-SYSTEM
 :BYTE-INSTRUCTIONS
 :COMPILER
 :SHALLOW-BINDING
 :ANSI-CL
 :COMMON-LISP
 :IEEE-FLOATING-POINT
 :LISPWORKS
 :CLASS-SHAKE-USING-GATES
 :COMMON-DEFSYSTEM
 :CLOS
 :DBCS
 :UNICODE
 :NATIVE-THREADS
 :UNIX
 :HARLEQUIN-COMMON-LISP
 :LISPWORKS-32BIT
 :LATIN-1
 :LISPWORKS6
 :LISPWORKS6.0
 :PTHREADS
 :DARWIN
 :MAC
 :MACOSX
 :APPLE
 ...)

CL-USER 5 : 1 > 

Any idea about how to fix this?

--
Marco Antoniotti


David Lichteblau | 22 Apr 11:08 2011

Re: ... and another

Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
> 
> On Apr 21, 2011, at 16:54 , David Lichteblau wrote:
> 
> > Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
> >> Always on LWM.  I assume this has to do with UNICODE or not support, but the example in the docs is misleading.
> >> 
> >> 
> >> CL-USER 37 > (defparameter *source* (cxml:make-source "<example>text</example>"))
> > 
> > That example should work in any implementation with Unicode support.
> > 
> > Backtrace?
> > Value of *features*?
> > 
> 
> 
> Here they are.  I suspect that LWM does not support UNICODE.

It supports Unicode, but LispWorks has slightly weird subtypes of
CHARACTER, and sometimes code insists on one subtype over the other.

In this case, I'm afraid the argument to MAKE-SOURCE needs to be a
string made up of LW:SIMPLE-CHAR rather than CHARACTER.

Things to try:

;; returns T on other Lisps, but might go wrong on LispWorks:
(typep "<example>text</example>" '(vector runes:rune))

;; possible workaround
(cxml:make-source (coerce "<example>text</example>"
                          '(simple-array runes:rune)))

Sorry about that, but I gave up on trying to fix all of these little
issues related to lw:simple-char a long time ago.  We have to assume
lw:simple-char at some point (for a presumably good reason which I can't
recall any more), and then that assumption trickles down.

So: "the example works in any implementation with Unicode support [IF
the object behind the printed representation of the string has the right
shape]".  Not ideal, I admit that.

d.

Marco Antoniotti | 22 Apr 16:04 2011

Re: ... and another

Thanks...


On Apr 22, 2011, at 05:08 , David Lichteblau wrote:

> Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
>
>> On Apr 21, 2011, at 16:54 , David Lichteblau wrote:
>>
>>> Quoting Marco Antoniotti (marcoxa <at> cs.nyu.edu):
>>>> Always on LWM.  I assume this has to do with UNICODE or not support, but the example in the docs is misleading.
>>>>
>>>> CL-USER 37 > (defparameter *source* (cxml:make-source "<example>text</example>"))
>>>
>>> That example should work in any implementation with Unicode support.
>>>
>>> Backtrace?
>>> Value of *features*?
>>
>> Here they are.  I suspect that LWM does not support UNICODE.
>
> It supports Unicode, but LispWorks has slightly weird subtypes of
> CHARACTER, and sometimes code insists on one subtype over the other.
>
> In this case, I'm afraid the argument to MAKE-SOURCE needs to be a
> string made up of LW:SIMPLE-CHAR rather than CHARACTER.
>
> Things to try:
>
> ;; returns T on other Lisps, but might go wrong on LispWorks:
> (typep "<example>text</example>" '(vector runes:rune))

NIL




> ;; possible workaround
> (cxml:make-source (coerce "<example>text</example>"
>                           '(simple-array runes:rune)))

CL-USER 14 > (cxml:make-source (coerce "<example>text</example>"
                                       '(simple-array runes:rune)))

Error: Cannot coerce "<example>text</example>" to type (SIMPLE-ARRAY RUNES:RUNE).
  1 (abort) Return to level 0.
  2 Return to top loop level 0.

Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.

CL-USER 15 : 1 > :b
Call to ERROR
Call to COERCE
Call to EVAL
Call to CAPI::CAPI-TOP-LEVEL-FUNCTION
Call to CAPI::INTERACTIVE-PANE-TOP-LOOP
Call to MP::PROCESS-SG-FUNCTION

CL-USER 16 : 1 > 
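
[Editor's example] One possible reason for the COERCE error, offered as a guess and not confirmed anywhere in this thread, is that (SIMPLE-ARRAY RUNES:RUNE) leaves the array rank unspecified and so does not name a sequence type. A sketch that avoids COERCE altogether is to build a fresh rank-one array with an explicit element type. COPY-AS-STRING-OF is a hypothetical helper; the element type is a parameter so the code loads without cxml, and with cxml loaded one would pass 'runes:rune. Whether cxml:make-source then accepts the result on LispWorks is untested.

```lisp
;; Sketch only: COPY-AS-STRING-OF is a hypothetical helper, not part of cxml.
(defun copy-as-string-of (element-type string)
  "Return a fresh simple one-dimensional array of ELEMENT-TYPE holding STRING's characters."
  (make-array (length string)
              :element-type element-type
              :initial-contents string))

;; Assumed usage with cxml loaded (untested on LispWorks):
;;
;; (cxml:make-source
;;  (copy-as-string-of 'runes:rune "<example>text</example>"))
```

Evaluating (array-element-type "<example>text</example>") at the listener may also help show which character subtype the literal string actually has.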



> Sorry about that, but I gave up on trying to fix all of these little
> issues related to lw:simple-char a long time ago.  We have to assume
> lw:simple-char at some point (for a presumably good reason which I can't
> recall any more), and then that assumption trickles down.
>
> So: "the example works in any implementation with Unicode support [IF
> the object behind the printed representation of the string has the right
> shape]".  Not ideal, I admit that.

Ok.

What about adding something along the lines of the following in "closure-common/characters.lisp"?

(eval-when (:load-toplevel :compile-toplevel :execute)
  (unless (eq *default-character-element-type* 'lw:simple-char)
    (cerror "Set the default string character element type to LW:SIMPLE-CHAR."
            "The current default character element type is ~A."
            *default-character-element-type*)
    (set-default-character-element-type 'lw:simple-char)))

Going over the LW mailing list archives it appears that the above would work, provided that the set-default-character-element-type call is the only one in the image.
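
[Editor's example] For readers unfamiliar with the CERROR idiom used in that proposal, here is a portable sketch of the same control flow. *DEMO-ELEMENT-TYPE* and ENSURE-DEMO-ELEMENT-TYPE are hypothetical stand-ins for the LispWorks variable and setter, so the code runs on any Common Lisp; it does not touch the real LispWorks setting.

```lisp
;; Sketch only: a stand-in for the LispWorks default character element type.
(defvar *demo-element-type* 'character
  "Hypothetical stand-in for the implementation's default character element type.")

(defun ensure-demo-element-type (wanted)
  "Signal a continuable error unless the default is WANTED; on CONTINUE, set it."
  (unless (eq *demo-element-type* wanted)
    ;; The first string describes the CONTINUE restart, the second the error.
    (cerror "Set the default character element type to the requested type."
            "The current default character element type is ~S, not ~S."
            *demo-element-type* wanted)
    ;; Reached only if the user (or a handler) chose the CONTINUE restart.
    (setf *demo-element-type* wanted))
  *demo-element-type*)
```

At the listener the user is offered a "continue" restart; a program can choose it automatically by wrapping the call in (handler-bind ((error #'continue)) ...).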

I am also bugging the LW folks on this.  

Cheers
--
Marco



--
Marco Antoniotti


