Johannes Lichtenberger | 1 Nov 13:51 2009

Re: Shredding XML

On Sat, 2009-10-31 at 01:05 +0000, Fraser Goffin wrote:
> Thanks for the great comments thus far from everyone.
> 
> Several people have mentioned using BLOB or CLOB and indeed this is
> something we have done in the recent past. However, one of the key
> issues is that at least some of the applications that will access the
> data are not XML capable and/or the programmers using them are
> not really that familiar with XML. Whilst it's possible to process XML
> data natively in Cobol, most of the time this is not the approach that's
> taken, and resource constraints and project deadlines often militate
> towards existing skills, technologies and practices.
> 
> So I'm really interested in experience of shredding moderately complex
> XML content models into relational tables (for example structures that
> might produce 30-50, possibly even more, tables when decomposed). And
> also some arguments for and against that approach (I would like to be
> able to make a compelling case for moving towards treating XML as a
> first class type system rather than one which just provides a format
> for data exchange).
> 
> One of the suggestions from one of our solution designers was to
> 'flatten' the XML structure and represent relationships using
> keys/ids, that is, make the XML more like the database. Personally I
> like the contextual relationships implicit in the hierarchical content
> model and am not really keen to navigate around the document using ID
> values as opposed to simply walking the tree ... but maybe other
> people's experience could provide some use cases where that approach
> has merit?

You definitely should have a look at the XPath Accelerator Scheme
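
For readers unfamiliar with the scheme: the XPath Accelerator idea is to store each node with its preorder and postorder rank, so that axes such as "descendant" become simple range predicates a relational engine can evaluate. A minimal sketch in Python follows; the function names and the row layout are assumptions made for illustration and are not taken from this thread.

```python
import xml.etree.ElementTree as ET


def pre_post_rows(xml_text):
    """Return one (pre, post, parent_pre, tag, text) row per element."""
    root = ET.fromstring(xml_text)
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(elem, parent_pre):
        pre = counters["pre"]
        counters["pre"] += 1
        slot = len(rows)
        rows.append(None)  # reserve the slot so rows stay in document order
        for child in elem:
            visit(child, pre)
        # The postorder rank is assigned only after all children are done.
        post = counters["post"]
        counters["post"] += 1
        rows[slot] = (pre, post, parent_pre, elem.tag, (elem.text or "").strip())

    visit(root, None)
    return rows


def descendants(rows, pre):
    """Descendant axis: later in preorder AND earlier in postorder."""
    ctx_pre, ctx_post = rows[pre][0], rows[pre][1]
    return [r for r in rows if r[0] > ctx_pre and r[1] < ctx_post]
```

In a database the same test would be a WHERE clause over two integer columns, which is what makes this encoding attractive for generic storage of arbitrary documents.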

Petite Abeille | 1 Nov 18:56 2009

Re: Shredding XML


On Oct 29, 2009, at 10:20 PM, Fraser Goffin wrote:

> opinions on the subject of decomposing XML into relational databases

Outside of the most trivial case, this is a major PITA of the same  
epic proportion as the object-relational one:

http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

Good luck.

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe <at> lists.xml.org
subscribe: xml-dev-subscribe <at> lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

Jim Tivy | 1 Nov 20:55 2009

RE: Shredding XML

Interesting post, but I am not sure that "now is the time to talk of many
things".

Let me try to focus: 

Proper software execution comes from the choice of appropriate
actions/technologies to match the driving requirements.  But more
importantly, the greatest wisdom is to frame the driving requirements
correctly before "going off half cocked" or doing something that is
unnecessary and unwarranted.

So let's start by framing the requirements again:

Fraser Goffin wrote:

"
The basics are we receive XML messages from an external trading partner and
process those messages, enriching and routing to a number of internal
subscriber applications. One of these applications is MI and the deal here
is that they want the data to be put into a relational database so that
they can create a number of interface 'files' which are sent to still more
applications.
"

OR

"
I am mainly interested in the process of LOADING XML data to a database
rather than extracting (at least for the purposes of this discussion).
"

Andrew Welch | 2 Nov 11:28 2009

Re: Shredding XML

> It is possible that the "mother persistent application datamodel" is
> contained in the relational database in all its normalized glory.  If so,
> then, "processing the messages" is simply a "data import" operation.  So the
> question is, how do I get XML X* to tables T*.  It would strike me that lots
> of people are doing this.  Are there common techniques and technologies for
> doing this import?

SAX parse XML into Hibernate pojos, or sometimes it's easier to parse
into plain old pojos that mirror the XML structure more closely (to
avoid too much complexity in the SAX parser) and then have another
class that copies the data into the Hibernate pojos.

If you need to get the XML back out again, then you'll need a custom
XML writer (serialiser) to go over the pojos and create the XML (and
possibly another class to copy the data in the other direction).

That's quite a lot of work if the XML is large and varied, and every
time the XML changes there are quite a few code changes needed, but
it's not too bad. I much prefer doing this to using data binding.
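
A minimal sketch of the "SAX into plain objects that mirror the XML" step, written in Python with the standard xml.sax module rather than Java and Hibernate. The element names (order, item, sku, quantity) and the classes are hypothetical, chosen only to illustrate the shape of the handler.

```python
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Item:
    sku: str = ""
    quantity: int = 0


@dataclass
class Order:
    order_id: str = ""
    items: list = field(default_factory=list)


class OrderHandler(xml.sax.ContentHandler):
    """Builds plain objects that mirror the (hypothetical) XML structure."""

    def __init__(self):
        super().__init__()
        self.order = None
        self._item = None
        self._text = []

    def startElement(self, name, attrs):
        self._text = []  # start collecting character data afresh
        if name == "order":
            self.order = Order(order_id=attrs.get("id", ""))
        elif name == "item":
            self._item = Item()

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "sku":
            self._item.sku = value
        elif name == "quantity":
            self._item.quantity = int(value)
        elif name == "item":
            self.order.items.append(self._item)
            self._item = None
        self._text = []


def parse_order(xml_text):
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.order
```

A second class would then copy these objects into whatever persistent classes the mapping layer expects, which keeps the handler free of database concerns.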

-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/


Fraser Goffin | 2 Nov 21:22 2009

Re: Shredding XML

Yes Jim, that is spot on.

Whilst there has been much discussion thus far on the technologies and
techniques of getting data out of the database (and that has been
interesting), the programming for doing so is 'bread and butter' to
our mainframe Cobol and Sapiens guys, so that's not really my problem.

Mine is the task of getting the data from a fairly complex XML content
model into an appropriately factored relational database. The design
of that database is 'green field' but (and thanks to many on this
thread who have posted related papers) this may not be as easy as it
might at first appear, what with impedance mismatches here, there and
everywhere ;-)

It's also the case that the XML data doesn't contain enough data
inherently to represent primary or foreign key values for all of the
relationships that are likely to arise. In some cases I MAY be
permitted to generate them myself (say using a UUID) as I 'walk' the
XML; in other cases I MAY be required to get the database to provide
the value(s), not sure yet. The latter may increase the complexity
somewhat (sidenote: our DBAs don't allow stored procs (don't ask), so
I'm going to be doing whole bunches of INSERTs as part of the
tree-walk, I suspect).
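
A minimal sketch of that tree-walk, assuming keys are generated client-side as UUIDs and rows are written with plain parameterised INSERTs (no stored procedures). It uses Python with an in-memory SQLite database purely for illustration; the policy/driver element and table names are hypothetical.

```python
import sqlite3
import uuid
import xml.etree.ElementTree as ET

# Hypothetical two-table target schema for the sketch.
SCHEMA = """
CREATE TABLE policy (id TEXT PRIMARY KEY, reference TEXT);
CREATE TABLE driver (
    id TEXT PRIMARY KEY,
    policy_id TEXT REFERENCES policy(id),
    name TEXT
);
"""


def load(conn, xml_text):
    """Walk the document, minting a UUID per row and passing the
    parent's key down so children can store it as a foreign key."""
    root = ET.fromstring(xml_text)
    with conn:  # one transaction per message: all rows or none
        for policy in root.iter("policy"):
            policy_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO policy (id, reference) VALUES (?, ?)",
                (policy_id, policy.findtext("reference")),
            )
            for driver in policy.findall("driver"):
                conn.execute(
                    "INSERT INTO driver (id, policy_id, name) VALUES (?, ?, ?)",
                    (str(uuid.uuid4()), policy_id, driver.findtext("name")),
                )
```

If the database has to supply the keys instead, each parent INSERT must complete and its generated value be read back before the children can be written, which is where the extra complexity comes from.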

I'm really interested in the gotchas and best practices. Some have
already been mentioned like the fact that the XML schema may define
optional items and unrestricted length facets and such like. Others
I've seen in my reading concern the mismatch of identity approaches
(although this was primarily about OO/relational mapping, the idea
is similar, I suspect). This could be important, since some

Michael Kay | 2 Nov 21:35 2009

RE: Shredding XML

> The PM is very nervous about using any new 
> tech, perhaps justifiably, but my sense of unease ...

That's the essence of the problem: the PM is a Luddite. 

Now, there are plenty of projects that fail because they're over-ambitious
in using new technology.

But there are also lots of projects that cost 10 times what they should
through failing to use it.

I've certainly been in environments where a PM's career was advanced through
delivering a successful project because no-one in higher management ever
knew that it could have been done for a tenth the cost. Indeed, a $1m
project looked better on his CV than a $100K project. Perhaps that's the
environment you're working in. 

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 


Jim Tivy | 2 Nov 22:57 2009

RE: Shredding XML

Fraser

I am not hearing a firm commitment that you plan to establish an RDB
schema and make it the driving schema.  In other words, what this would mean
is that data elements cannot be put into the RDB unless they exist in the
RDB schema.  For example, if some new data elements show up in some external
XML to be imported then the DBA decides whether to allow them into the
appropriate RDB column or not, or drop them for the time being.

Another option (from the infinite number) would be to let the XML schema
generate the RDB schema and the mapping code.  For your application
programmers using SQL on the RDB this would likely lead to gagging and
hacking and an "out of body experience".  This is not something I would
recommend and if this is what you want then get a database that supports
XQuery and retrain your developers.

But I think you have to choose between these two - the first being what it
sounds like you want - then work backwards from that decision.

Jim


Liam Quin | 2 Nov 23:45 2009

Re: Shredding XML

On Mon, Nov 02, 2009 at 01:57:32PM -0800, Jim Tivy wrote:
> I am not entirely hearing firm commitment that you plan to establish an RDB
> schema and make it the driving schema.  In other words, what this would mean
> is that data elements cannot be put into the RDB unless they exist in the
> RDB schema.
[...]

> Another option (from the infinite number) would be to let the XML schema
> generate the RDB schema and the mapping code.

Another is to have a column called "elementname"...

It's even worse in practice, though
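
To make the "elementname" option concrete, here is a minimal sketch of such a generic table, in Python with SQLite. The table layout and function name are assumptions for illustration only; the point is that every structural question then turns into a self-join.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_generic(conn, xml_text):
    """Store every element as a row in one generic table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "elementname TEXT, value TEXT)"
    )

    def walk(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO node (parent_id, elementname, value) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid  # key generated by the database
        for child in elem:
            walk(child, node_id)

    with conn:
        walk(ET.fromstring(xml_text), None)


# Even "the sku of each item" needs the table joined to itself.
SKU_OF_ITEMS = """
SELECT c.value
FROM node p JOIN node c ON c.parent_id = p.id
WHERE p.elementname = 'item' AND c.elementname = 'sku'
ORDER BY c.id
"""
```

Loading is trivial with this layout, but the schema no longer tells an SQL programmer anything about the data, and each extra level of nesting in a query costs another join.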

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/


Andrew Eisenberg | 3 Nov 00:23 2009

Implementations of XQuery Update Facility requested


Back in August, the XML Query Working Group announced the availability of version 1.0.0 of the XQuery Update Facility Test Suite [1]. This test suite reflects the XQuery Update Facility 1.0 Candidate Recommendation [2] that was published on June 9.

We are pleased to have received results from Saxonica. We'd like to encourage other implementers to submit their results to us, so that we can advance XQuery Update Facility to W3C Recommendation.

                                               -- Andrew

[1] XQuery Update Facility Test Suite
http://dev.w3.org/2007/xquery-update-10-test-suite/

[2] XQuery Update Facility 1.0
http://www.w3.org/TR/2009/CR-xquery-update-10-20090609/

--------------------
Andrew Eisenberg
IBM
4 Technology Park Drive
Westford, MA  01886

andrew.eisenberg <at> us.ibm.com

Fraser Goffin | 3 Nov 00:30 2009

Re: Shredding XML

Hi Jim,

That's interesting ... which should be the 'driving' schema, XML or DB?

I guess I've been somewhat tiptoeing around this one.

I should admit my bias if it's not already apparent. I work mainly in
the SOA integration space, and since XML is the primary exchange format
and XML Schema does a reasonable job as the type system, I favour
processing XML as ... well, XML. Whilst I understand the argument
around leveraging existing technologies and skillsets, oftentimes
this is little more than protectionism, and continually [de]composing
from XML to objects, then to CopyLibs and then to relational just seems
unnecessary a lot of the time (sorry, soap box over). But of course
the whole world isn't XML and, just like most other large organisations,
the vast majority of our processing capability and data isn't and
probably never will be ... I have no issue with that.

On the one hand it is the end product that drives the design (even if
that design has a relatively short shelf-life ... but hey, we all do
agile, right?). In that case it is definitely the database schema that
prevails from the pure delivery point of view, since this is the
desired source for the staging area from which to produce interface
files for upstream applications. At present there appears to be no
possibility of revisiting that choice. At the same time, I don't want
to 'paint myself into a corner' or promote this as an exemplar for all
future approaches (unless it turns out that way :-)

My unease is around the brittleness of the database schema in the face
of change, but I suppose that situation is almost inevitable since I
can't crystal-ball what changes might be coming along next week and
it's probably folly to try. XML changes that dynamic, but not
completely.

I have been having this internal debate: if I concede I'm
going to have a relational database, then should its design be derived
from the XML schema, or should the XML schema change to accommodate the
database? Indeed, one of the Solution Designers on this has already
indicated a desire to 'flatten' the XML schema (although I have to say
I disagree that it is necessary). I have some degree of opportunity to
change the XML schema (although messages are received from external
sources, within reason I can transform them into any 'shape' I like so
long as that's a lossless exchange). The database is green field so it
can be any shape, but clearly some designs are going to lend themselves
better than others to XML mapping, I would have thought?

Surely there are some structures in XML that don't map
straightforwardly. Ted Neward called this the 'last mile' (a familiar
term to us all I'm sure), where the illusion of a high fidelity
solution draws us in, and indeed 80%+ appears to go quite well, but
that last few % holds a disproportionate cost and increasing complexity
(but you don't realise that until late on, at which point some are
going to object to a rethink). I want to know where that 'last mile'
lives so I can try to avoid it!

Fraser.

2009/11/2 Jim Tivy <jimt <at> bluestream.com>:
> Fraser
>
> I am not hearing a firm commitment that you plan to establish an RDB
> schema and make it the driving schema.  In other words, what this would mean
> is that data elements cannot be put into the RDB unless they exist in the
> RDB schema.  For example, if some new data elements show up in some external
> XML to be imported then the DBA decides whether to allow them into the
> appropriate RDB column or not, or drop them for the time being.
>
> Another option (from the infinite number) would be to let the XML schema
> generate the RDB schema and the mapping code.  For your application
> programmers using SQL on the RDB this would likely lead to gagging and
> hacking and an "out of body experience".  This is not something I would
> recommend and if this is what you want then get a database that supports
> XQuery and retrain your developers.
>
> But I think you have to choose between these two - the first being what it
> sounds like you want - then work backwards from that decision.
>
> Jim
>
> -----Original Message-----
> From: Fraser Goffin [mailto:goffinf <at> googlemail.com]
> Sent: Monday, November 02, 2009 12:22 PM
> To: xml-dev <at> lists.xml.org
> Subject: Re: [xml-dev] Shredding XML
>
> Yes Jim, that is spot on.
>
> Whilst there has been much discussion thus far on the technologies and
> techniques of getting data out of the database (and that has been
> interesting), the programming for doing so is 'bread and butter' to
> our mainframe Cobol and Sapiens guys, so that's not really my problem.
>
> Mine is the task of getting the data from a fairly complex XML content
> model into an appropriately factored relational database. The design
> of that database is 'green field' but (and thanks to many on this
> thread who have posted related papers) this may not be as easy as it
> might at first appear, what with impedance mismatches here, there and
> everywhere ;-)
>
> It's also the case that the XML data doesn't contain enough data
> inherently to represent primary or foreign key values for all of the
> relationships that are likely to arise. In some cases I MAY be
> permitted to generate them myself (say using a UUID) as I 'walk' the
> XML; in other cases I MAY be required to get the database to provide
> the value(s), not sure yet. The latter may increase the complexity
> somewhat (sidenote: our DBAs don't allow stored procs (don't ask), so
> I'm going to be doing whole bunches of INSERTs as part of the
> tree-walk, I suspect).
>
> I'm really interested in the gotchas and best practices. Some have
> already been mentioned like the fact that the XML schema may define
> optional items and unrestricted length facets and such like. Others
> I've seen in reading talk about the mis-match of identity approaches
> (although this was talking primarily about OO/Relational mapping but
> the idea is similar I suspect). This could be important, since some
> messages received may 'relate' to others already loaded and, given
> what I said about not having all of the data in the XML to form all of
> the keys, this might be a significant problem.
>
> It is my intention to look into other options (we have recently
> acquired DB2 v9 which includes pureXML) but as is so often the case,
> the immediate project delivery pressures won't allow it. The PM is
> very nervous about using any new tech, perhaps justifiably, but my
> sense of unease is more to do with the perhaps misplaced assumption
> that 'tried and tested' tech like relational databases will always
> provide a workable solution, imho sometimes they actually represent
> the most significant constraint.
>
> So yes, back to the actual problem: how to come up with a database
> design that provides the capability of staging the shredded XML in a
> reasonably efficient manner and enables it to be loaded from XML
> instances received, again efficiently (ideally without 100s of tables
> and joins to negotiate). As far as efficiency of storage goes, well,
> that MAY be a concern although perhaps not a huge one so long as the Db
> doesn't bloat up too much if normalisation is preferred over extra
> tables.
>
> Please add your thoughts and suggestions and experiences as you are
> able. Nothing is too trivial (or rude) to mention (i.e. if you want to
> say don't do this if you want to keep your sanity, that's OK).
>
> regards
>
> Fraser.
>
>
> 2009/11/1 Jim Tivy <jimt <at> bluestream.com>:
>> Interesting post, but I am not sure that "now is the time to talk of many
>> things".
>>
>> Let me try to focus:
>>
>> Proper software execution comes from the choice of appropriate
>> actions/technologies to match the driving requirements.  But more
>> importantly, the greatest wisdom is to frame the driving requirements
>> correctly before "going off half cocked" or doing something that is
>> unnecessary and unwarranted.
>>
>> So let's start by framing the requirements again:
>>
>> Fraser Goffin wrote:
>>
>> "
>> The basics are we receive XML messages from an external trading partner
>> and process those messages, enriching and routing to a number of internal
>> subscriber applications. One of these applications is MI and the deal here
>> is that they want the data to be put into a relational database so that
>> they can create a number of interface 'files' which are sent to still
>> more applications.
>> "
>>
>> OR
>>
>> "
>> I am mainly interested in the process of LOADING XML data to a database
>> rather than extracting (at least for the purposes of this discussion).
>> "
>>
>> It is possible that the "mother persistent application datamodel" is
>> contained in the relational database in all its normalized glory.  If so,
>> then, "processing the messages" is simply a "data import" operation.  So
>> the question is, how do I get XML X* to tables T*.  It would strike me
>> that lots of people are doing this.  Are there common techniques and
>> technologies for doing this import?
>>
>> Fraser, is that a proper framing of the question/requirements?
>>
>> Jim
