Re: Exist-open Digest, Vol 85, Issue 41
<claudius.teodorescu <at> gmail.com>
2013-05-20 18:40:25 GMT
Hi,
If there is a good LGPL-licensed Java library for this kind of conversion, I can
add it to the EXPath module for digital publishing that I am developing.
Claudius
Sent from my iPhone
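
For illustration only, here is a minimal sketch of the last step discussed in the quoted thread below: turning already-extracted spreadsheet rows into an HTML table. It assumes the cell values have been read into plain strings by some library (Apache POI and Tika are the ones mentioned in the thread), so no spreadsheet API is called here. The names HtmlTableSketch, toHtmlTable and escape are hypothetical and not part of eXist or any EXPath module.

```java
import java.util.Arrays;
import java.util.List;

public class HtmlTableSketch {

    // Escape the characters that are significant in HTML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render rows of already-extracted cell strings as a simple HTML table.
    // A null cell is rendered as an empty <td>.
    public static String toHtmlTable(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (List<String> row : rows) {
            sb.append("  <tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell == null ? "" : cell)).append("</td>");
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for cells read from a sheet.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Qty"),
                Arrays.asList("A & B", "3"));
        System.out.println(toHtmlTable(rows));
    }
}
```

Keeping the rendering separate from the spreadsheet reader means the same method could be fed from whichever extraction library ends up being used.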
On May 20, 2013, at 19:38, exist-open-request <at> lists.sourceforge.net
wrote:
> Send Exist-open mailing list submissions to
> exist-open <at> lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/exist-open
> or, via email, send a message with subject or body 'help' to
> exist-open-request <at> lists.sourceforge.net
>
> You can reach the person managing the list at
> exist-open-owner <at> lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Exist-open digest..."
>
>
> Today's Topics:
>
> 1. Convert MS Excel to XML (Casey Jordan)
> 2. Re: Convert MS Excel to XML (Casey Jordan)
> 3. Re: Convert MS Excel to XML (Dmitriy Shabanov)
> 4. Re: Convert MS Excel to XML (Jochen Graf)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 20 May 2013 11:30:37 -0400
> From: Casey Jordan <casey.jordan <at> jorsek.com>
> Subject: [Exist-open] Convert MS Excel to XML
> To: "exist-open <at> lists.sourceforge.net ml"
> <exist-open <at> lists.sourceforge.net>
> Message-ID:
>
> <CAPAQPqc6fY=XgP4XaMLFKSJDTEOTzLrMXOiUG7YXgPQE5LektA <at> mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I seem to remember a long time ago someone mentioning that eXist had the
> ability to extract XML from MS Office formats like Excel. I am essentially
> looking to convert an Excel file into an HTML table using an automated
> process.
>
> Thanks,
>
> Casey
>
> --
> --
> Casey Jordan
> easyDITA a product of Jorsek LLC
> "CaseyDJordan" on LinkedIn, Twitter & Facebook
> (585) 348 7399
> easydita.com
>
>
> This message is intended only for the use of the Addressee(s) and may
> contain information that is privileged, confidential, and/or exempt
> from
> disclosure under applicable law. If you are not the intended
> recipient,
> please be advised that any disclosure copying, distribution, or use
> of
> the information contained herein is prohibited. If you have received
> this communication in error, please destroy all copies of the message,
> whether in electronic or hard copy format, as well as attachments, and
> immediately contact the sender by replying to this e-mail or by phone.
> Thank you.
> -------------- next part --------------
> An HTML attachment was scrubbed...
>
> ------------------------------
>
> Message: 2
> Date: Mon, 20 May 2013 12:13:30 -0400
> From: Casey Jordan <casey.jordan <at> jorsek.com>
> Subject: Re: [Exist-open] Convert MS Excel to XML
> To: Dannes Wessels <dannes <at> exist-db.org>,
> "exist-open <at> lists.sourceforge.net ml"
> <exist-open <at> lists.sourceforge.net>
> Message-ID:
>
> <CAPAQPqdz=2vnLdJB=XeWzP3x_XuRGTJiO2T1AZQVktudw6Kq6g <at> mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I found something to do it via Tika and ANT:
> http://en.wikibooks.org/wiki/Apache_Ant/Converting_Excel_to_XML
>
> I also read through the content extraction blog article for eXist-db:
> http://atomic.exist-db.org/blogs/eXist/ContentExtraction
>
> but I am not sure how Tika has been implemented in eXist and how to
> accomplish the same thing as above. Ideally I would want to do exactly what
> the above wiki-book does.
>
> How does contentextraction:stream-content() map to the underlying Tika
> processor? How do I know what node sequence is going to be passed? Is there
> a schema somewhere?
>
> I could sift through the Java code for the content extraction module, but
> I'd rather keep that as a last resort.
>
> Any pointers or more documentation?
>
> Thanks,
>
> Casey
>
>
> On Mon, May 20, 2013 at 12:03 PM, Dannes Wessels <dannes <at> exist-db.org> wrote:
>
>> Did you check with Apache Tika?
>>
>> --
>> Dannes Wessels
>> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>>
>> On Monday 20 May 2013 at 17:30, Casey Jordan wrote:
>>
>> I seem to remember a long time ago someone mentioning that eXist had the
>> ability to extract XML from MS Office formats like Excel. I am essentially
>> looking to convert an Excel file into an HTML table using an automated
>> process.
>>
>> Thanks,
>>
>> Casey
>>
>> --
>> --
>> Casey Jordan
>> easyDITA a product of Jorsek LLC
>> "CaseyDJordan" on LinkedIn, Twitter & Facebook
>> (585) 348 7399
>> easydita.com
>>
>> ------------------------------------------------------------------------------
>> AlienVault Unified Security Management (USM) platform delivers complete
>> security visibility with the essential security capabilities. Easily and
>> efficiently configure, manage, and operate all of your security controls
>> from a single console and one unified framework. Download a free trial.
>> http://p.sf.net/sfu/alienvault_d2d
>> _______________________________________________
>> Exist-open mailing list
>> Exist-open <at> lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/exist-open
>>
>>
>>
>
>
> --
> --
> Casey Jordan
> easyDITA a product of Jorsek LLC
> "CaseyDJordan" on LinkedIn, Twitter & Facebook
> (585) 348 7399
> easydita.com
>
> ------------------------------
>
> Message: 3
> Date: Mon, 20 May 2013 21:24:16 +0500
> From: Dmitriy Shabanov <shabanovd <at> gmail.com>
> Subject: Re: [Exist-open] Convert MS Excel to XML
> To: Casey Jordan <casey.jordan <at> jorsek.com>
> Cc: "exist-open <at> lists.sourceforge.net ml"
> <exist-open <at> lists.sourceforge.net>, Dannes Wessels
> <dannes <at> exist-db.org>
> Message-ID:
>
> <CADD4p=7Ms2tjS_J8728k8H4a3FU=O9XfpyTeyg1LB1YTY5XsYw <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> It should be simple: the parameter for the function is the same as
> input="${input-file}", and the output is the result.
>
> On Mon, May 20, 2013 at 9:13 PM, Casey Jordan <casey.jordan <at> jorsek.com> wrote:
>
>> I found something to do it via Tika and ANT:
>> http://en.wikibooks.org/wiki/Apache_Ant/Converting_Excel_to_XML
>>
>> I also read through the content extraction blog article for eXist-db:
>> http://atomic.exist-db.org/blogs/eXist/ContentExtraction
>>
>> but I am not sure how Tika has been implemented in eXist and how to
>> accomplish the same thing as above. Ideally I would want to do exactly what
>> the above wiki-book does.
>>
>> How does contentextraction:stream-content() map to the underlying Tika
>> processor? How do I know what node sequence is going to be passed? Is there
>> a schema somewhere?
>>
>> I could sift through the Java code for the content extraction module, but
>> I'd rather keep that as a last resort.
>>
>> Any pointers or more documentation?
>>
>
> --
> Dmitriy Shabanov
>
> ------------------------------
>
> Message: 4
> Date: Mon, 20 May 2013 18:38:12 +0200
> From: Jochen Graf <jochen.graf <at> uni-koeln.de>
> Subject: Re: [Exist-open] Convert MS Excel to XML
> To: exist-open <at> lists.sourceforge.net
> Message-ID: <519A5174.4090404 <at> uni-koeln.de>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Casey,
>
> we use a small eXist XQuery Module for this. Just
> a few lines of Code:
> https://subversion.rrz.uni-koeln.de/trac/eXist-A/browser/trunk/my/eXist/extensions/modules/src/org/exist/xquery/modules/excel
> (The Code is in line with the new eXist 2.0 release.)
> It only supports *.xls files, no *.xlsx and no *.ods
> yet. You need the latest Apache POI *.jar under
> EXIST_HOME/lib/user to make it work:
> https://subversion.rrz.uni-koeln.de/trac/eXist-A/browser/trunk/my/eXist/lib/user
>
> We have used it in production for a few years. It works well
> as long as sheet cells contain simple values. The conversion
> sometimes fails if a sheet contains unusual formatting or
> functions.
>
> Feel free to use it.
> Best
> Jochen
>
>> I seem to remember a long time ago someone mentioning that eXist had
>> the ability to extract XML from MS Office formats like Excel. I am
>> essentially looking to convert an Excel file into an HTML table using
>> an automated process.
>>
>> Thanks,
>>
>> Casey
>>
>> --
>> --
>> Casey Jordan
>> easyDITA a product of Jorsek LLC
>> "CaseyDJordan" on LinkedIn, Twitter & Facebook
>> (585) 348 7399
>> easydita.com <http://easydita.com>
>
> ------------------------------
>
> End of Exist-open Digest, Vol 85, Issue 41
> ******************************************