Re: [trunk regression]: util:expand() throws *start offset* out of bounds exception after recent commits
Ron Van den Branden <ron.vandenbranden <at> kantl.be>
2011-02-01 12:35:04 GMT
Thanks for your description; it helped me isolate the problem with
util:expand(). I'll focus on that function, as a) I'm not too familiar
with the kwic functions, and b) I suspect the latter depend on
util:expand(). Please feel free to add to or comment on these findings.
On 1/02/2011 12:17, Hungerburg wrote:
> the error only happens, when the document, where the hit occurs, has
> empty elements above the location of the hit, and only if the string
> that produces the hit should have the hilight start at the beginning
> of the xml-element, that contains the string. This works reliably.
Attached is an XQuery unit test file illustrating the problem
(startOffsetTest.xml). In my tests, the problem seems rather to be
related to the complexity of the elements that precede an ft:query()
hit. I only see problems when a hit is preceded by a complex element
containing another element (either empty or with text content). Both
cases clearly produce mismatched start offsets (see failing tests
#7-#10). When the query matches the first string after such a complex
node, an exception is thrown (failing tests #7, #9); when the second
word is matched, the offset appears to be one position early (failing
tests #8, #10). Note that the exception differs depending on whether
the match is immediately preceded by a complex element with text
content (failing test #7: "start offset out of bounds") or by a complex
element with an empty element (failing test #9: just "Compilation
error: -1").
OTOH, there are no problems:
- when the matching node does not contain any elements preceding the
match (succeeding tests #1-#2)
- when the elements immediately preceding a match do not contain
nested elements, be they empty or having just text content (succeeding
I can confirm that eXist-1.4.x and trunk behave identically on this test file.
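
For readers without the attachment, the document shapes described above
can be sketched roughly as follows. This is not the attached
startOffsetTest.xml: the document path, the element names, and the
presence of a Lucene index on <p> in the collection configuration are
all assumptions made for illustration, and whether these exact shapes
trigger the errors has not been verified.

```xquery
xquery version "1.0";

(: Assumptions (hypothetical, not taken from startOffsetTest.xml):
   - /db/test/startOffset.xml is a stored document
   - the collection configuration defines a Lucene index on <p>
   - the document looks like this:

   <test>
     <p xml:id="simple"><hi>one</hi> match here</p>
     <p xml:id="nested-text"><note><hi>one</hi></note>match here</p>
     <p xml:id="nested-empty"><note><lb/></note>match here</p>
   </test>
:)

declare namespace exist = "http://exist.sourceforge.net/NS/exist";

(: Querying "match" targets the first string after the preceding
   element; querying "here" would target the second word instead. :)
for $hit in doc("/db/test/startOffset.xml")//p[ft:query(., "match")]
return
    (: util:expand() makes an in-memory copy of the node in which
       matches are wrapped in exist:match elements :)
    <result id="{$hit/@xml:id}">{
        util:expand($hit)//exist:match
    }</result>
```

In this sketch, "simple" corresponds to a match preceded by an element
without nested elements, "nested-text" to a preceding complex element
with text content, and "nested-empty" to a preceding complex element
containing an empty element.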
Ron Van den Branden
Wetenschappelijk attaché / Senior Researcher
Centrum voor Teksteditie en Bronnenstudie - CTB (KANTL)
Centre for Scholarly Editing and Document Studies
Koninklijke Academie voor Nederlandse Taal- en Letterkunde
Royal Academy of Dutch Language and Literature
Koningstraat 18 / b-9000 Gent / Belgium
tel: +32 9 265 93 51 / fax: +32 9 265 93 49
E-mail : ron.vandenbranden <at> kantl.be
Exist-open mailing list
Exist-open <at> lists.sourceforge.net