Wolfgang Meier | 1 Jan 20:17 2005

Re: XUpdate bug

Hi,

> I'm using the 1.0 beta 1 version and there's a bug in the update methods
> of the org.xmldb.api.modules.XUpdateQueryService implementation. 
> Every update returns always "1", instead of the right number of updated
> nodes.

I think I fixed this some time ago. Anyway, updating to beta2 is highly 
recommended if you use XUpdate.

Wolfgang

-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
Wolfgang Meier | 1 Jan 20:41 2005

Re: Namespace declarations not stored in DB?

Hi,

> I am creating an XMLResource and setting its content using the string 
> "<foo:bar xmlns:foo='blah'/>":
> 
>     XMLResource r =
>         (XMLResource) collection.createResource("foo.xml", "XMLResource");
>     r.setContent("<foo:bar xmlns:foo='blah'/>");
>     collection.storeResource(r);
> 
> Later on I pull the contents back, and I find all the namespace 
> declarations are gone:
> 
>     XMLResource r =
>         (XMLResource) collection.getResource("foo.xml");
>     DOMSource s = new DOMSource(r.getContentAsDOM());
>     StreamResult out = new StreamResult(System.out);
>     TransformerFactory.newInstance().newTransformer().transform(s, out);
> 
> output I receive is just "<foo:bar/>", without the namespace 
> declaration. Have I lost my namespaces? How can I get them back?

The namespace declaration is stored along with <foo:bar>. You should see
it if you call r.getContent().toString(). The identity transformation
should also preserve the namespace declaration (?), but this is out of
eXist's control.

Wolfgang

Ronak Patel | 2 Jan 05:34 2005

Re: Xpath and XQuery

Hi,
Thanks for your replies, guys.
I was wondering whether the graphical user interface provided with eXist only allows querying with XPath.
If so, how can we use XQuery with eXist?
 


Ronakkumar P Patel
Graduate Student(Computer Science)
University of Southern California
Email: ronak_patel <at> ieee.org
Phone: 213 804 1146(work)
_______________________________________________

"Nothing is Impossible in this world the word
Impossible it self says I'm Possible."
_______________________________________________

r.brefort | 2 Jan 11:52 2005

Xquery and ISO-8859-1

Hi,

I built a database using eXist on my intranet and it 
works very well, except for a problem with French 
accented characters.
Both my database and my request form are in French 
and encoded in ISO-8859-1. I'm using the XQuery servlet.
Requests work well and return results with correct 
accented characters. But there is no result when 
a request parameter contains an accented character. 
I modified my query to return the request parameter 
itself, and I saw a small square in place of the 
accented character.
Then I modified the "WEB-INF\web.xml" file, replacing 
all "UTF-8" references with "ISO-8859-1". The request 
parameter is now returned with a question mark in 
place of the accented character.

To reproduce the problem, I made a very simple xql file :

xquery version "1.0";
let $a := "é"
return
<a>{$a}</a>

(in which é is an accented character). It still returns a 
question mark.

Can anybody tell me what to do?

Remy

Michael Beddow | 2 Jan 13:11 2005

Re: Xquery and ISO-8859-1

> Can anybody tell me what to do.

Well the first thing I'd advise, since it sounds as though you have control
of your own data, is to transcode your data into utf-8 and use it in your
queries and output. Internally, eXist uses utf-8 throughout, so anything you
can do to eliminate transcoding on input and output helps remove possible
sources of encoding corruption. Any software you may have that doesn't
support utf-8 should be discarded, since its developers are plainly
troglodytes. This solution, of course, doesn't work for people who simply
have to work with iso-8859-N data because their boss says so.

The second thing is: if you hit encoding problems, don't as a first move
mess with encoding settings in the innards of eXist or any application
framework that hosts it. All these settings have been scrutinised many times
by people concerned to get eXist's support for multiple encodings as solid
as it can be. That doesn't mean there may not be some sort of problem there
still to be located; but it definitely isn't the *first* place to look.

Thirdly, if you want to, or have to, work in an iso-8859-N encoding in an
XML environment, you will sooner or later hit encoding-related problems, and
you won't be able to trouble-shoot them (or report them in a way that makes
it easy for others to help you do so), unless you find out a little more
about how encodings in general, and utf-8 and utf-16 in particular, work.
For example, you report on one case seeing rectangles and in another
question marks. These indicate different things. A rectangle generally means
(especially in Windows) that the underlying data is correct, but that the
software you are using to render it can't find a glyph that corresponds to
the code-point in the file. In such cases it is usually safe to continue
processing, because the internal representation isn't corrupted and the
rendering issue can be sorted out as a separate problem. Whereas a question
mark more often means that the byte, or byte sequence, at that point in the
data doesn't map to a valid character in the current character set and that
you have therefore suffered information loss in your underlying data, a much
more serious thing.
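
To make the difference concrete, here is a minimal Python sketch (Python
and the sample word are assumptions chosen for illustration; none of this
is eXist code). A lossless transcoding keeps the characters intact, while
converting to a character set that lacks the character substitutes a
question mark and loses information. ASCII stands in here for "a
character set without e-acute".

```python
# Illustrative only: ASCII stands in for a character set that lacks e-acute.
text = "\u00e9tablissement"  # the word "etablissement" with an initial e-acute

# Lossless: the bytes differ between encodings, the characters do not.
round_trip = text.encode("utf-8").decode("utf-8")

# Lossy: the codec cannot represent e-acute and substitutes "?".
lossy = text.encode("ascii", errors="replace")

print(round_trip == text)  # True
print(lossy)               # b'?tablissement'
```

Once the question mark is in the data, the original character cannot be
recovered from it, which is why it is the more serious of the two symptoms.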

>To reproduce the problem, I made a very simple xql file :

> xquery version "1.0";
> let $a := "é"
> return
> <a>{$a}</a>

> (in which é is an accented character). It still returns a
> question mark.

How did you "make" it? In a text editor set to your default locale and
encoding (which your email suggests is iso-8859-1)?? If so that editor will
have inserted the binary value for the eacute as an iso-8859-1 codepoint,
namely 0xE9.  Without an encoding declaration (which your query appears not
to have), that will be seen by any conformant XML processor as an illegal
value, since it cannot occur in a utf-8 encoded text stream. If that's the
case I'd say you were lucky to get a question mark (signifying "not a valid
code-point in this character set") rather than a fatal "invalid utf-8"
error.
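
A small Python sketch of why that byte is a problem, assuming the query
file was saved as iso-8859-1 and is then read as utf-8. The helper name
decode_as_utf8 is hypothetical, and Python's codec is only a stand-in for
whatever decoder actually reads the query:

```python
def decode_as_utf8(data: bytes):
    """Return the decoded text, or None if the bytes are not valid utf-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

query = 'let $a := "\u00e9"'

# The query literal as an editor set to iso-8859-1 would save it:
# the e-acute becomes the single byte 0xE9.
latin1_query = query.encode("iso-8859-1")

# A strict utf-8 reader rejects the file outright.
strict = decode_as_utf8(latin1_query)

# A lenient reader substitutes U+FFFD, so the character is lost.
lenient = latin1_query.decode("utf-8", errors="replace")

# The same query saved as utf-8 decodes cleanly.
saved_as_utf8 = decode_as_utf8(query.encode("utf-8"))
```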

But this example doesn't really test your problem, which as far as I
understand it is that you are submitting query terms which aren't being
matched, even though they are present in your data. You say you are seeing
the correct encoding in retrieved data, which indicates that the parser and
the serialiser are correctly and transparently transcoding from and to utf-8
for you where your documents are concerned. So that suggests that for one
reason or another the accented characters in your query terms are either
being incorrectly encoded (by you or your software) on input, or being
incorrectly transcoded by the query parsing mechanism in eXist. The latter
has indeed happened occasionally in the past, and it is possible that this
bug has re-surfaced in recent updates.

To try to narrow down the possibilities, I suggest you try some tests using
the command-line client and simple XPath queries that attempt matches on
strings containing an accented character. If these fail, please show us some
sample data and your logs for these queries (which echo what eXist thinks
the query terms are internally). Thanks to a bug in GNU readline, the
command line parser actually only works with query terms input in iso-8859-1
encoding, so you are well catered for there. If you get successful matches
via this basic technique, then it's time to look into what is happening with
your XQueries, including precisely how you are generating them.

Michael Beddow

r.brefort | 3 Jan 05:16 2005

Re:Re:Xquery and ISO-8859-1

Thanks to Michael for his very complete response.

So, I tried requests (without and with accented 
characters) from the command-line client, and it works 
perfectly:

   exist:/db/ents/rte> find //ent[statut="Etat (service)"]
   //ent[statut="Etat (service)"]
   found 5 hits in 330ms.
   exist:/db/ents/rte> find //ent[statut="Etat (établissement)"]
   //ent[statut="Etat (établissement)"]
   found 10 hits in 341ms. 

When I try an XQuery request, the following request works:

   xquery version "1.0";
   declare namespace util="http://exist-db.org/xquery/util";
   declare namespace fn="http://exist-db.org/local-functions";
   for $a in collection("/db/ents/rte")//statut
   where match-all($a,"service")
   return
   <p>{$a}</p>

But the same request with "établissement" in place 
of "service" doesn't match anything (I get no error 
message, just a blank result page).

My request is written in a text editor (jEdit), and I run my 
xql file from an HTML form in Internet Explorer. My 
computer is running Windows XP.

I also tried to encode all my data in UTF-8 as Michael 
suggested, replacing accented characters with codes (for 
example "&eacute;" in place of "é"). But I got error 
messages when I tried to import the data into the eXist 
database.

Rémy

Grzegorz Chrupała | 2 Jan 22:11 2005

XQuery bug?

Hi,
I'm a new user and let me say eXist is a cool project. I've been
playing with it for a few days as I am considering using it in a
corpus linguistics project.
I've run into a couple of issues with eXist's XQuery implementation.
The following query, when run on the attached sample doc, should
return one hit, but it doesn't in eXist (latest snapshot version).
(: Find a sequence of tokens: [adjective, "piece", "of"] :)
for $s in //sentence
where some $token in $s//token[tags/morpho[starts-with(.,'A ')]] satisfies
   let $following := $token/../token[. >> $token]
   where
       $following[1][lemma[. = "piece"]] and 
       $following[2][lemma[. = "of"]]    
   return true()
return
   <item>{string-join($s/variant[@xml:lang='en']//text/text(), ' ')}</item>

It works as expected in Saxon. Is that a bug in eXist?

Best,
--

-- 
Grzegorz Chrupała | http://pithekos.net
Attachment (test-doc.xml): text/xml, 7879 bytes
Gil Tayar | 3 Jan 12:11 2005

$p/node()

From what I understand of XPath, $p/node() should return everything but
attributes. This is because $p/node() is a short version of
$p/child::node() and attributes are not in the child axis but rather in
the attribute axis (unfortunate, since "node()" is expected to return
everything. It's one of those quirks which you get in any language...)

But it seems eXist is returning everything. I tried this:

for $client in //client[@id = $clientIdParameter]
return
  element {name($client)}
  {
    $client/node()
  }

Where $client is a node that looks something like:

<client id="…">
    <foo>foo</foo>
    <bar>bar</bar>
</client>

The result was (the same as the node):

<client id="…">
    <foo>foo</foo>
    <bar>bar</bar>
</client>

But what it should have returned was:

<client>
    <foo>foo</foo>
    <bar>bar</bar>
</client>

i.e. without the "id" attribute.
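
The expected result can be sketched outside eXist. The Python snippet
below uses xml.etree.ElementTree purely as a stand-in for the XPath data
model, where an element's children and its attributes are held
separately. The id value "c1" is a made-up placeholder, since the real
value is elided above.

```python
import xml.etree.ElementTree as ET

# "c1" is a hypothetical id; the original value is not given.
client = ET.fromstring('<client id="c1"><foo>foo</foo><bar>bar</bar></client>')

# Rebuild the element from its name and its child nodes only,
# mirroring: element {name($client)} { $client/node() }
copy = ET.Element(client.tag)
copy.text = client.text          # leading text node, if any
copy.extend(list(client))        # child elements; attributes are not children

rebuilt = ET.tostring(copy, encoding="unicode")
print(rebuilt)  # <client><foo>foo</foo><bar>bar</bar></client>
```

Copying the attributes as well would need an explicit step, which in
XPath terms corresponds to selecting the attribute axis ($client/@*).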

 

Bug?

BTW, I am building a project management tool using your database, and so
far have found no significant problems. Thanks!

Gil Tayar
WebCollage
Chief Technology Officer
Business: +972 (3) 766 1806
Mobile:    +972 (54) 634 4457

P.S. Obviously, easy to work around it (do $client/* instead).

Michael Beddow | 3 Jan 12:40 2005

Re: Re:Re:Xquery and ISO-8859-1

> So, I tried requests (without and with accented
> characters) from the command-line client, and it
> works perfectly:

That's good news. It confirms that everything is working fine at the core as
far as iso-8859-1 encoded data is concerned. Your problem is arising
somewhere on the periphery and should be relatively easy to isolate and
solve.

> I also tried to encode all my data in UTF-8 as Michael
> suggested, replacing accented characters with codes (for
> example "&eacute;" in place of "é"). But I got error
> messages when I tried to import the data into the eXist
> database.

Ah, there's a misunderstanding here, with, I think, two components:

1) You can't use character entity references (CER) like &eacute; in XML
documents, unless your document has a DTD (or a fragmentary internal subset
of such a DTD if you have no "real" DTD or are using a schema for
validation) in which those entities are declared and defined so that the
parser knows how to resolve them (which the parser in eXist won't do anyway
unless it is explicitly told to validate against a DTD). Without such
declarations in a DTD, a conformant XML parser can resolve only &amp; &lt;
&gt; &quot; and &apos;. People who use XHTML or the TEI-LITE DTD sometimes
don't realise this, because in both cases the relevant DTD defines a wide
range of character entity references. So, for what you are trying to do, you
would need to use a numeric character reference (NCR), in this case &#233; or
&#xe9;. These look like entity references, in that they begin with an
ampersand and end with a semi-colon, but they are not entity references and
shouldn't be called such. An XML parser handles NCRs at a much lower level
than entity references (character or otherwise). As soon as an NCR appears
in the parser's input stream, it is translated from a string to a binary
value corresponding to the number represented in ASCII between the & and the
; in the body of the NCR. The parser proper never sees the NCR at all, and
so doesn't report its presence or its processing. It just disappears,
leaving the binary value behind. Whereas the presence of a CER triggers the
parser's entity  resolution mechanism, which may involve a callback to the
application the parser is servicing and so be visible to that application if
required.
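
The difference can be observed with any conformant parser. The sketch
below uses Python's xml.etree.ElementTree, which is an assumption made
for illustration and not the parser eXist uses; parse_text is a
hypothetical helper. The numeric references resolve without any DTD, the
undeclared &eacute; is rejected as not well-formed, and declaring the
entity in an internal subset makes it resolvable.

```python
import xml.etree.ElementTree as ET

def parse_text(xml: str):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError:
        return None

# Numeric character references need no declarations at all.
decimal_ncr = parse_text("<a>&#233;</a>")
hex_ncr = parse_text("<a>&#xe9;</a>")

# The five predefined entities are always available.
predefined = parse_text("<a>&amp;&lt;&gt;&quot;&apos;</a>")

# A character entity reference with no DTD behind it is an error.
undeclared = parse_text("<a>&eacute;</a>")

# Declared in an internal subset, the same reference resolves.
declared = parse_text(
    '<!DOCTYPE a [<!ENTITY eacute "&#233;">]><a>&eacute;</a>'
)
```

With these inputs decimal_ncr, hex_ncr and declared all hold the single
character e-acute, while undeclared is None.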

2) Use of the NCR &#233; to represent an e acute hasn't anything to do with
using utf-8. We have to distinguish between the code-point assigned to a
character in a character set and the internal representation of that
code-point. The character Unicode names as  LATIN SMALL LETTER E WITH ACUTE
happens to have the same assigned code-point in both ISO-8859-1 and in
Unicode, namely hex E9. And what the NCR &#xe9; tells the parser is "I want
to insert the character whose Unicode code-point is U+00E9 into the text
stream at this point".  How that character is internally represented is a
different matter, and needn't concern us unless things go wrong and we have
to pick over the wreckage, but in an iso-8859-1 encoded document it is
represented as a single byte with value hex E9, whereas in a utf-8 encoded
document the same code-point is represented internally as a two byte
sequence, hex C3 A9. Properly-configured software should always be able to
hide this internal representation from us, but things don't always work out
that way.
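
The distinction between a code-point and its internal representation can
be checked in a few lines of Python (used here only as a convenient
calculator, not as anything eXist-specific):

```python
import unicodedata

e_acute = "\u00e9"

# One code-point, the same number in ISO-8859-1 and in Unicode.
code_point = ord(e_acute)
name = unicodedata.name(e_acute)

# Two different internal representations of that one code-point.
latin1_bytes = e_acute.encode("iso-8859-1")  # a single byte
utf8_bytes = e_acute.encode("utf-8")         # a two-byte sequence

print(f"U+{code_point:04X} {name}")
print(latin1_bytes.hex(), utf8_bytes.hex())
```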

So to follow my initial advice and convert your data to utf-8 you would need
to run all your iso-8859-1 documents through a transcoder. The one most
people rely on is iconv, which is in all Linux and most Unix distributions,
and for Windows can be obtained from
http://gnuwin32.sourceforge.net/packages/libiconv.htm
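
Where iconv is not to hand, the same transcoding can be sketched in
Python. This is an illustration under stated assumptions rather than a
recommended tool: latin1_to_utf8 is a hypothetical helper, the input is
assumed to really be iso-8859-1, and the helper also rewrites the
encoding named in the XML declaration, which a byte-level transcoder
such as iconv leaves untouched.

```python
import re

def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode an iso-8859-1 XML document to utf-8."""
    text = data.decode("iso-8859-1")
    # Keep the XML declaration truthful: replace the declared encoding,
    # if there is one, so it matches the new byte representation.
    text = re.sub(
        r'^(<\?xml[^>]*encoding=["\'])[^"\']+',
        r"\g<1>utf-8",
        text,
        count=1,
    )
    return text.encode("utf-8")

source = b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'
converted = latin1_to_utf8(source)
print(converted)
```

A document without an XML declaration passes through with only its bytes
transcoded.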

> My request is written in a text editor (JEdit)

That in itself doesn't tell us what internal encoding the editor is using. I
am (by choice) very ignorant about the interaction of Java and Windows, but
I wouldn't be surprised if the underlying JVM defaulted to the system
locale, which in your case would be iso-8859-1, as its internal
representation for character data. That should mean that when you press your
key for eacute when composing your query, the iso-8859-1 internal
representation of that character goes into the data buffer and gets saved in
the file. And if the data you're querying is encoded in iso-8859-1, then
that's what you want to happen. Are you providing your XQuery with an
encoding declaration, though? Your example doesn't have one, but if it is
iso-8859-1 encoded it needs one. [Q to Wolfgang: I take it the eXist XQuery
parser recognises and handles encoding declarations ?]

This matter can be a bit confusing.  Although XQuery documents are
emphatically not XML documents and so can't and don't have an XML
declaration, they can have an encoding declaration, and indeed must have one
if their encoding is not utf-8. See http://www.w3.org/TR/xquery sections
===========
H5: XQuery documents use the Unicode character set and, by default, the
UTF-8 encoding.
===========
and
===========
H3 An XQuery document may contain an encoding declaration as part of its
version declaration :
xquery version "1.0" encoding "utf-8";
===========
There is a hidden dependency between those two statements, which is hidden
all the more by the order in which they appear in the WD.

SO ... if you still have your collection in eXist encoded in iso-8859-1 and
correctly declared as such (which seems to be the case, because the XPath
test with the cl client succeeds), I suggest you try heading up your
XQueries with
xquery version "1.0" encoding "iso-8859-1";
and then submitting them, again via the command line client, but this time
using its -F argument (NB upper case in that switch) to pass in the name of
the file which contains your Xquery. If your editor is indeed encoding the
query using iso-8859-1 and if eXist correctly supports encoding
declarations on XQueries, this should then work. If it doesn't we will have
to do some more head scratching.

> and I run my  xql file from an html form in Internet explorer.

I'm not clear what "run from" exactly means here and how it relates to the
creation of the query using Jedit, but encoding issues with html form data
add a further layer of possible errors, which I'd prefer to leave aside at
the moment, which is why I suggest delivering the XQuery via the
command-line client's -F parameter.

Michael Beddow

_______________________________________________
Exist-open mailing list
Exist-open <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/exist-open

Wolfgang Meier | 3 Jan 13:04 2005

Re: Re:Re:Xquery and ISO-8859-1

Hi Michael,

> ===========
> H3 An XQuery document may contain an encoding declaration as part of its
> version declaration :
> xquery version "1.0" encoding "utf-8";
> ===========

The encoding declaration was added in the October 29 draft and was 
not present in previous versions. I only noticed this after reading 
your message.

eXist currently expects the XQuery to be UTF-8 encoded in all cases! But 
now that it's part of the syntax, I will add support for the encoding 
declaration soon.

Wolfgang

