scala.xml.parsing.ConstructingParser.fromSource OutOfMemoryError
I am trying to write some code that will validate a String as valid XML and then convert it to some sort of XML object so that I can process the contents.
I have been trying to use the Scala XML classes (specifically scala.xml.XML.loadString(...)), but calling toString on the result does not necessarily reproduce the full input string, because loadString removes comments and converts CDATA sections to text. Research on the Internet suggested using scala.xml.parsing.ConstructingParser.fromSource(...) instead (or another XML library: Anti-XML, Scales Xml, or a Java XML library).
I chose to use the ConstructingParser as it seemed simplest. It works fine when I provide a valid XML string. When I provide an invalid XML string (in this case I have inserted random text as an attribute without a value), I expect an exception to be thrown. Instead, it displays a message informing me of the invalid content, keeps processing for a while, and then fails with a 'java.lang.OutOfMemoryError: Java heap space' error.
I cannot find anything in the documentation informing me that I must pass valid XML to this method, and I have not found any method to just validate the XML String. Please let me know if I have missed something.
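The closest thing I can think of is to check well-formedness with the JDK's own SAX parser before handing the string to ConstructingParser. The following is only a sketch of that idea: WellFormedCheck and wellFormednessError are names I made up, it uses only the standard javax.xml.parsers and org.xml.sax classes, and I have not confirmed that everything the SAX parser accepts is also handled safely by ConstructingParser.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper (not part of scala.xml): a well-formedness pre-check.
object WellFormedCheck {
  // Returns None when the string parses as well-formed XML,
  // or Some(message) describing the first fatal parse error.
  def wellFormednessError(xml: String): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      // DefaultHandler ignores all content events; we only care whether
      // the parse completes without a fatal error.
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      None
    } catch {
      // SAXParseException (a subclass of SAXException) is thrown for malformed input.
      case e: SAXException => Some(e.getMessage)
    }
  }
}
```

If this returns None, I would then pass the same string to ConstructingParser.fromSource; if it returns Some(message), I would reject the input without ever constructing the parser. Note that this only checks that the XML is well-formed, not that it is valid against a DTD or schema.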
I am developing code that will not be deployed for a while, so I have been using Scala 2.10, developing in Eclipse Juno, with Scala IDE Milestones for 2.10 (2.1.0.m2-2_10-201210191132-2563545 (which uses SBT 0.13.0.SNAPSHOT-2_10-20121019-1331)) and Scala Worksheet Nightly Updates for Scala IDE 2.1 and Scala 2.10 (0.2.0.nightly-2_10-201211070431-be42feb). I start Eclipse with '-vmargs -Xss8m -Xms700m -Xmx2048m'.
I have also downloaded Scala 2.10.0-RC1 and Scala 2.9.2, and run the code through the command line interpreter, with the same results.
This seems to me to be similar to an existing Scala Issue: OutOfMemoryErrors and infinite loops in scala.xml.parsing.ConstructingParser - https://issues.scala-lang.org/browse/SI-4520
I have considered using Anti-XML, but have not found anywhere that tells me how to use it with Scala 2.10. (I have not looked very hard or tried anything myself, as I found the ConstructingParser.)
I have attached a Scala Worksheet file (temp.sc.txt, as I could not upload it with the .sc extension) with my test code, and a text file with the command line interpreter session for Scala 2.10.0-RC1. If I remove 'strtdohso ' from the XML String everything works.
Any advice on what I should do would be greatly appreciated, for example: what I am doing wrong, how to validate the XML before using ConstructingParser, whether to submit a bug report, etc.
Thanks,
Evan Bennett
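object temp {
  // The token 'strtdohso' is an attribute name without a value, so this string is not well-formed XML.
  val xmlString = """<foo><!-- Comment --><bar strtdohso attr="test"><![CDATA[a > b]]></bar><bar>b</bar><bar>c</bar></foo>"""
  val xmlSource = scala.io.Source.fromString(xmlString)
  // The second argument is preserveWS (preserve whitespace).
  val xmlParser = scala.xml.parsing.ConstructingParser.fromSource(xmlSource, true)
  // With the malformed input, this call prints the "'=' expected" message and then runs out of heap space.
  val xmlDocument = xmlParser.document
  val xmlToString = xmlDocument.toString
  // Worksheet expression: does the round trip reproduce the input exactly?
  xmlString == xmlToString
  println(xmlToString)
}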
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\development\scala-2.10.0-RC1\bin>scala
Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_09).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val xmlString = """<foo><!-- Comment --><bar strtdohso attr="test"><![CDATA[a > b]]></bar><bar>b</bar><bar>c</bar></foo>"""
xmlString: String = <foo><!-- Comment --><bar strtdohso attr="test"><![CDATA[a > b]]></bar><bar>b</bar><bar>c</bar></foo>
scala> val xmlSource = scala.io.Source.fromString(xmlString)
xmlSource: scala.io.Source = non-empty iterator
scala> val xmlParser = scala.xml.parsing.ConstructingParser.fromSource(xmlSource, true)
xmlParser: scala.xml.parsing.ConstructingParser = scala.xml.parsing.ConstructingParser@59c81460
scala> val xmlDocument = xmlParser.document
:1:42: '=' expected instead of 'a'ttr="test"><![CDATA[a > b]]></bar><bar>b</bar>
<bar>c</bar></foo> ^
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:264)
at scala.xml.parsing.MarkupParserCommon$class.xAttributeValue(MarkupParserCommon.scala:66)
at scala.xml.parsing.ConstructingParser.xAttributeValue(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParserCommon$class.xAttributeValue(MarkupParserCommon.scala:74)
at scala.xml.parsing.ConstructingParser.xAttributeValue(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.xAttributes(MarkupParser.scala:310)
at scala.xml.parsing.ConstructingParser.xAttributes(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.mkAttributes(MarkupParser.scala:282)
at scala.xml.parsing.ConstructingParser.mkAttributes(ConstructingParser.scala:47)
at scala.xml.parsing.ConstructingParser.mkAttributes(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParserCommon$class.xTag(MarkupParserCommon.scala:44)
at scala.xml.parsing.ConstructingParser.xTag(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:553)
at scala.xml.parsing.ConstructingParser.element1(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:418)
at scala.xml.parsing.ConstructingParser.content1(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:442)
at scala.xml.parsing.ConstructingParser.content(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:567)
at scala.xml.parsing.ConstructingParser.element1(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:418)
at scala.xml.parsing.ConstructingParser.content1(ConstructingParser.scala:47)
at scala.xml.parsing.MarkupParser$class.document(MarkupParser.scala:239)
at scala.xml.parsing.ConstructingParser.document(ConstructingParser.scala:47)
at .<init>(<console>:10)
at .<clinit>(<console>)
at .<init>(<console>:7)
scala>
C:\development\scala-2.10.0-RC1\bin>