XmlSlurper performance aside, I have to wonder why you are shipping 25MB SOAP messages around instead of moving the big blobs of data into MTOM attachments.
--
Schalk W. Cronjé
-------- Original message --------
From: jochen <at> eddelbuettel.net
Date: 21/05/2013 22:56 (GMT+00:00)
To: user <at> groovy.codehaus.org,youknowwho <at> heroicefforts.net
Subject: Re: [groovy-user] XmlSlurper no GC
Jess,
I've had my hands on the XmlSlurper sources just recently. Its categorization as "lazy evaluation" only extends to GPath queries (where it builds a hierarchy of Iterators instead of collecting nodes into new lists) and node structure manipulation (where new node content is represented by Builder closures until the XML is streamed out again). But the parse method still creates a full in-memory representation of the document as (groovy.util.slurpersupport.)Nodes, built from SAX events. So we end up with something that is closer to DOM than StAX. It's all implemented in Java, not compiled Groovy. Anyway, given the dimensions of your typical XMLs, it probably isn't a wise choice.
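To illustrate the contrast with a tree-building parser, here is a minimal sketch of pull-style StAX processing in plain Java. It is not taken from the Groovy sources; the class name StaxElementCounter and the method countElements are hypothetical names chosen for this example. It only shows that a StAX loop looks at one event at a time and keeps nothing but a counter, so memory use should not grow with the size of the document the way a full Node tree does.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Groovy or the JDK.
final class StaxElementCounter {

    // Counts start tags with the given local name while streaming through
    // the input. No tree is built: each event is inspected and then dropped.
    static int countElements(Reader input, String localName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(input);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    // next() advances the cursor and returns the event type
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers need not handle a checked exception
            throw new IllegalStateException("Malformed XML", e);
        }
    }
}
```

Calling StaxElementCounter.countElements with a FileReader over the 25MB message and the name of a repeated element would walk the whole file this way. The price is that you lose GPath: any state you need across elements has to be tracked by hand.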
What I find interesting is your observation that it makes a difference whether any queries are carried out or not. Could you send me a full sample (code and data)? I might find some time early next week to investigate. Note that a recently accepted pull request of mine will, come Groovy 2.2.x, make the network of Node references that the GC has to deal with slightly more complicated still, because each node will keep a reference to its parent.
I still believe that Groovy XML processing is utterly brilliant, but the JVM should sure be able to recapture the memory utilized along the way.
Cheers
Jochen (eddel+)
youknowwho <at> heroicefforts.net wrote on 21 May 2013 at 16:34:
Background: I'm contemplating upwards of 25MB SOAP responses coming across the wire to me, and vainly holding on to the illusion that the syntactically beautiful XmlSlurper would ever scale, memory-wise, to support concurrent processing of such requests in a JEE server environment. Oh, I can hear the I told you so's echoing from my teammates already. :)
I have a Groovy script that simply reads a 25MB message from disk and does some light processing using XmlSlurper. With a 128MB heap, it just manages to run (oh StAX how I envy you now). However, if I run this block in a loop, then I get an OOME. Once the root GPathResult returned by the parse invocation is queried, then it appears that memory is never GC'd. I see no explicit cleanup methods on any of these classes. I've tried scoping in a closure, explicitly assigning nulls, etc. and nothing seems to work. Other allocations, big arrays, etc. are GC'd as expected. Are there known memory leak issues in this implementation or maybe Sun's underlying SAX implementation (JDK 1.6)?
Also, has anyone calculated the overhead added by XmlSlurper to XML of size X?
thanks,
-Jess