Import into LyX
Rob Oakes <Rob.Oakes <at> oak-tree.us>
2012-02-01 19:59:33 GMT
Dear Users and Developers,
Some time ago, I was experimenting with importing documents into LyX
(specifically, how to crack the MS Word to LyX import nut). In
the process, I got really excited about using OpenOffice to convert
the Word document to HTML, running tidy on the HTML, and then
importing the result. (The original blog article about this can be
found at
http://blog.oak-tree.us/index.php/2010/05/14/msword-lyx-import.)
Since I'm (re)writing a book chapter about this topic, I thought
that I would look at alternative strategies for importing Word (and
other file formats) into LyX. While doing research, I came across a
(potentially) much better solution.
Somewhat recently (in 2010), a group of Python libraries was
written that handle document conversions. They are part of the
epub-tools library (http://code.google.com/p/epub-tools/). (I've
been experimenting with ePub document creation from LyX, which is
how I found them.)
One of the tools in the library is able to parse Microsoft Word
documents and convert them to XHTML in preparation for generating an
ePub file. I think that the tool can be adapted for directly
converting Word docs to LyX. Not to LaTeX and then to LyX, but directly
to LyX.
I'm putting together a library to experiment with direct conversions.
(This is ostensibly being done for the never-ending book project,
but it will be released as open code.) Before getting too deep into
development, I wanted to poll the list:
- Is this a tool that would prove useful to yourselves, your
collaborators, and others?
- What features would you consider essential?
(Right now, style-based conversion looks pretty easy -- going
from Heading 1 in Word to Chapter in LyX, for example. But I'm not
sure how well it would convert maths. This is something I'll still
need to look at, and it may require writing an additional module.)
- What is the best tool to look at for guidance in creating a
new script for word2lyx? tex2lyx?
- Does the script need to support special cases, such as
importing Word "track changes"?
- Just how important do you consider "round-tripping" a
document, e.g., going from LyX to Word and back to LyX?
- Is there anyone who might be interested in collaborating on
this?
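To make the style-based conversion idea concrete (Heading 1 in Word becoming Chapter in LyX), here is a rough sketch. It is only an illustration under several assumptions: the mapping table and the names `STYLE_TO_LAYOUT`, `paragraph_to_lyx`, and `paragraphs_to_lyx_body` are hypothetical, the input is assumed to be already-extracted (style name, text) pairs rather than a real Word file, and only a body fragment is produced, not a complete .lyx file (no header, no escaping of backslashes or other special characters).

```python
# Hypothetical mapping from Word paragraph style names to LyX layout names.
# The layout names assume a book-like document class in LyX.
STYLE_TO_LAYOUT = {
    "Heading 1": "Chapter",
    "Heading 2": "Section",
    "Heading 3": "Subsection",
    "Normal": "Standard",
}


def paragraph_to_lyx(style, text, default="Standard"):
    """Wrap one paragraph in a LyX layout block chosen from its Word style.

    Styles that are not in the mapping fall back to `default`.
    The text is inserted as-is; no escaping is attempted in this sketch.
    """
    layout = STYLE_TO_LAYOUT.get(style, default)
    return "\\begin_layout {}\n{}\n\\end_layout\n".format(layout, text)


def paragraphs_to_lyx_body(paragraphs):
    """Convert an iterable of (style, text) pairs into a LyX body fragment."""
    return "\n".join(paragraph_to_lyx(style, text) for style, text in paragraphs)


if __name__ == "__main__":
    sample = [
        ("Heading 1", "Importing Documents"),
        ("Normal", "Some body text."),
        ("Quote", "An unmapped style falls back to Standard."),
    ]
    print(paragraphs_to_lyx_body(sample))
```

Keeping the mapping in a plain dictionary would let users adjust it per document class without touching the conversion code. Maths, track changes, and tables are not addressed here at all.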
Any thoughts would be greatly appreciated.
Cheers,
Rob Oakes