Re: Performance question
Wolfgang Meier <wolfgangmm <at> gmail.com>
2005-09-01 08:15:32 GMT
Hi,
> I've been working with eXist for almost two years, and I'm using it in
> bioinformatics. We are starting a project where we are going to deal
> with huge volumes of XML content (my test documents have 3GB!), and we
> are going to publish small query results on request using a web
> service. Most of the XML schemas we are using were designed like a list
> of records, so content can be put in a single file or it can be
> scattered over thousands of files (10000~20000). So, my question is:
> taking into account current eXist implementation, is it better to store
> in an eXist database a single huge XML document or to store lots of
> them, but smaller? I'm mainly thinking about query speed, which is very
> important for us.
Many small documents usually have advantages if you need to update
data. Also, there's still an internal limit to the size of a single
document, but it depends on the structure of the data. For query
speed, it doesn't really make a difference if you have many small or
one large doc.
If your documents have a record-like structure, your query probably
needs to process huge node sets (maybe a few million nodes). eXist
currently needs quite a lot of memory to hold these node sets, so
it would definitely be an advantage if you can restrict your queries
to certain logical parts of the data set and organize your files and
collections accordingly. For the future, we hope to have a
query-rewriting optimizer that can optimize selections on huge node
sets automatically.
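As a rough illustration of organizing files and collections around logical
parts of the data, here is a minimal Python sketch that streams a record-list
document and groups its records by an attribute, so each group could be stored
in its own collection. The element name "record", the attribute "organism",
and the paths under /db/records are assumptions made up for this example, not
anything taken from the original data. The sketch only does the grouping; it
does not talk to eXist.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def partition_records(source, record_tag, key_attr):
    """Stream a record-list document and group records by an attribute value.

    Returns {group_name: [record_xml_string, ...]}.
    """
    groups = defaultdict(list)
    # iterparse hands over each element once its end tag has been read, so
    # records can be serialized and released one at a time.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            key = elem.get(key_attr, "unknown")
            elem.tail = None  # drop whitespace that follows the record
            groups[key].append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # release the record's children after serializing
    return dict(groups)


# Hypothetical sample data standing in for one large record-list document.
SAMPLE = b"""<records>
  <record organism="human"><id>P1</id></record>
  <record organism="mouse"><id>P2</id></record>
  <record organism="human"><id>P3</id></record>
</records>"""

collections = partition_records(io.BytesIO(SAMPLE), "record", "organism")
for name, docs in sorted(collections.items()):
    # /db/records/<name> is an illustrative collection path only.
    print(f"/db/records/{name}: {len(docs)} document(s)")
```

With a layout like this, a query that only concerns one group can be pointed
at that group's collection instead of the whole data set, which keeps the node
sets it has to hold in memory smaller.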
Wolfgang