Re: carriage return in attribute
2009-03-02 16:39:53 GMT
Thanks Michael,
I understand the XML rules for processing carriage returns. I am dealing with an XML document that is being imported into my application. I am not sure whether it has been serialized correctly, but if I read through this document byte by byte I see carriage return (13) and newline (10) as line-termination characters in an attribute that is a String. I know it's probably wrong to put these characters in an attribute, and that this should have been the value of an element inside a CDATA section, but this is the document that I need to work with.
So once I parse this document, all CRLFs are converted to LFs, which changes how this attribute is displayed: the string is shown on a single line instead of with its line breaks visible.
Now, I guess I can read through the document before it is imported (without a parser) and replace all CRLFs with character references such as &#13;&#10; to make it correct. However, this would be ugly, and I was wondering if there is an easier way to deal with this.
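For what it's worth, the pre-pass described above could be sketched roughly as follows. This is only a minimal illustration, not a Xerces utility: the class name `AttrLineBreakEscaper` and the method `escape` are made up, and the scanner assumes the input has no DOCTYPE with an internal subset and no unquoted `<` or `>` inside tags. It copies comments, CDATA sections and processing instructions through untouched, and rewrites CR and LF only inside quoted attribute values.

```java
public class AttrLineBreakEscaper {

    // Constructs that are copied verbatim: {opening delimiter, closing delimiter}.
    private static final String[][] VERBATIM = {
        {"<!--", "-->"}, {"<![CDATA[", "]]>"}, {"<?", "?>"}
    };

    // Returns a copy of xml in which CR and LF characters inside quoted
    // attribute values are replaced with numeric character references.
    // Hypothetical helper; assumes no DOCTYPE internal subset in the input.
    static String escape(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 16);
        boolean inTag = false; // true between '<' and '>' of a tag
        char quote = 0;        // active quote character, or 0 if none
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (!inTag && c == '<') {
                // Skip over comments, CDATA sections and PIs unchanged.
                boolean copied = false;
                for (String[] v : VERBATIM) {
                    if (xml.startsWith(v[0], i)) {
                        int stop = xml.indexOf(v[1], i + v[0].length());
                        int next = (stop < 0) ? xml.length() : stop + v[1].length();
                        out.append(xml, i, next);
                        i = next;
                        copied = true;
                        break;
                    }
                }
                if (copied) {
                    continue;
                }
                inTag = true;
                out.append(c);
            } else if (!inTag) {
                out.append(c); // character data: leave as is
            } else if (quote == 0) {
                if (c == '"' || c == '\'') {
                    quote = c;       // attribute value starts
                } else if (c == '>') {
                    inTag = false;   // tag ends
                }
                out.append(c);
            } else if (c == quote) {
                quote = 0;           // attribute value ends
                out.append(c);
            } else if (c == '\r') {
                out.append("&#13;");
            } else if (c == '\n') {
                out.append("&#10;");
            } else {
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<a t=\"x\r\ny\">p\r\nq</a>";
        System.out.println(escape(input));
    }
}
```

The output of `escape` would then be handed to the parser in place of the original text. Line breaks in element content are deliberately left alone, since only attribute values lose them to normalization.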
Hope I am being clear in what I am trying to achieve.
thanks,
Alex
I'm not sure what you're asking for. Attribute value normalization [1] is part of the parsing process. It occurs before the data is presented to an application through any of the standard APIs.
[1] http://www.w3.org/TR/2006/REC-xml-20060816/#AVNormalize
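To make the effect of attribute value normalization concrete, here is a small Java sketch using the standard JAXP DOM API (which Xerces implements). The class name, the `item` element and the `text` attribute are invented for the demo. It parses the same attribute twice: once with a literal CR LF in the value, and once with the break written as character references.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttrNormalizationDemo {

    // Parses a small XML string and returns the root element's "text"
    // attribute (the attribute name is made up for this demo).
    static String attrValue(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("text");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF inside the attribute value.
        String literal = attrValue("<item text=\"line1\r\nline2\"/>");
        // The same break written as numeric character references.
        String escaped = attrValue("<item text=\"line1&#13;&#10;line2\"/>");

        System.out.println("literal keeps CR LF: " + literal.contains("\r\n"));
        System.out.println("escaped keeps CR LF: " + escaped.contains("\r\n"));
    }
}
```

Per the XML recommendation, the literal CR LF is first reduced to a single LF by end-of-line handling and then turned into a space by attribute value normalization, whereas characters written as references are appended to the value as is. The application therefore only sees the line break in the second case.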
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas <at> ca.ibm.com
E-mail: mrglavas <at> apache.org
Aleksandr Kravets <akravets.work <at> gmail.com> wrote on 02/27/2009 10:07:08 AM:
> Thanks.
> Are there utilities in Xerces that allow carriage returns
> normalization easier than let's say parsing the whole document and
> doing it manually?
> On Thu, Feb 26, 2009 at 6:39 PM, <keshlam <at> us.ibm.com> wrote:
> Carriage return is ASCII 13, so &#13; or &#xD; will represent that character.
>
> However, be sure you understand XML's rules for whitespace
> normalization in attribute values. Depending on what you're trying
> to do, you may want to replace that attribute with a child
> element... or replace the offending character with some notation
> that your application, rather than XML, will process appropriately.
>
> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
> -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
Really, anyone generating faulty XML like this needs to be instructed
in the error of their ways. I mean, what are they creating the XML
for? Is there some parser out there that is currently handling these
faulty documents for them?
Paul
On Mon, Mar 2, 2009 at 12:39 PM, Aleksandr Kravets
<akravets.work <at> gmail.com> wrote:
> So it would need to be replaced in place of carriage return manually?
>
> On Mon, Mar 2, 2009 at 1:36 PM, Paul Gearon <gearon <at> ieee.org> wrote:
>>
>> I'm not saying that this is the answer to your problem, but the entity
>> referred to here is:
>> &#13;
>>
>> Paul
>>
>> On Mon, Mar 2, 2009 at 12:20 PM, Aleksandr Kravets
>> <akravets.work <at> gmail.com> wrote:
>> > Ok, I think I found an issue similar to mine, it is in this thread:
>> >