Note on current limitations of tex4ht for LyX --> Word conversion
stefano franchi <stefano.franchi <at> gmail.com>
2014-03-04 15:03:33 GMT
As further info on our current discussion of the LyX --> Word
conversion, here are some of the problems I encountered over the
weekend when converting a ~50,000-word LyX file that had the features
listed below.
I will eventually make a wiki page with the information contained
here, once I have a more stable and testable configuration. I was in a
rush and did not have time to take detailed note of all the problems I
ran into. The overall process took me about 12 hours, and it includes
all the dead ends I ran into. If I had to do it again tomorrow (God
forbid!), with the acquired knowledge, it would probably take me 1
hour for the conversions plus the time to redo the bibliography (see
below). One afternoon in total, I think.
Comments are welcome, and even more so reports on tex4ht use. I think
contributing to the tex4ht project (as Georg suggested) is a valid
option at this point, and I would like to put together a minimal set
of features we would like to see fully supported.
Bibliography: Biblatex plus biber with custom style (biblatex-philosophy)
Engine: XeTeX (originally it was LuaTeX, but I switched to XeTeX later)
Language: Italian, with some bib refs and a few fragments in other
European languages (French, English, German)
Encoding: UTF-8, both for the lyx/tex file and the bib file
Fonts: I did not really care (given the output), but I used the TeX
Gyre family for their wide coverage.
I struggled with different, often orthogonal problems, mostly focused
on fonts/encoding/language, as reported below. In the end, the final
but only partial solution was the following:
1. Switch to standard book class
2. Force the use of babel instead of polyglossia
3. Use biblatex standard styles
4. Do not print out the bibliography.
This meant that I was eventually successful with the body text
conversion (including the references in the text), but then I had to
cut and paste the properly organized bibliography from the pdf file,
and reformat it manually, as all semantic formatting is lost with such an
operation (e.g. emphasis and small caps). I tried both free and
proprietary pdf-->word conversions with poor results. Most failed due
to the encoding, I guess. Acrobat Pro succeeded, but the Word file it
produced was of such poor quality that I quickly realized it
would be faster to work from the text produced from the cut and paste.
A Word/LibreOffice guru (which I am certainly not) may disagree on
this.
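For the record, the four-step workaround above amounts to roughly the following preamble. This is only a minimal sketch, not my actual file: the authoryear style stands in for whichever standard biblatex style one picks, csquotes is added because biblatex recommends it alongside babel, and the bib file name and citation key are made up for illustration. Font setup is deliberately left out, since I did not record a combination that worked cleanly.

```latex
% Sketch of the workaround preamble (illustrative, not the original file).
\documentclass{book}               % step 1: standard book class instead of memoir

\usepackage[italian]{babel}        % step 2: babel instead of polyglossia
\usepackage{csquotes}              % assumption: added because biblatex recommends it with babel

% step 3: a standard biblatex style (authoryear is just one possible choice)
\usepackage[style=authoryear,backend=biber]{biblatex}
\addbibresource{references.bib}    % hypothetical file name

\begin{document}
Testo con una citazione \parencite{rossi2010}. % hypothetical citation key

% step 4: no \printbibliography, so only the in-text references are converted
\end{document}
```

With this setup the in-text references survive the conversion, while the bibliography itself still has to be brought over by hand from the pdf.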
Main problems encountered in conversion:
1. Fonts and encoding
tex4ht supports XeTeX, but not fontspec (an experimental version is
available, but it is not even at alpha stage). It needs TeX fonts,
not OTF. This limitation caused problems with non-ASCII Latin-script text.
Whereas XeTeX compilation with OTF version of TeX Gyre's family of
fonts was flawless, the TeX version had issues with the guillemets and
with the open single quote (apostrophe).
I tried most of the standard TeX fonts (cm, ecm, Latin Modern), and
they all had similar issues (although not the *same* issues; that is,
the missing character was not always the same one).
2. Language support
- I had issues with the language support for Italian in one biblatex
style (philosophy-modern, which is really the standard style for
Italian publishing in general, not just in philosophy)
- I could not get polyglossia to work. In the end I forced the use of babel
3. Limited support for memoir
I actually could not compile with memoir and had to switch to the book
standard class, which meant more reformatting in the final output.
Memoir support is still very limited; a more comprehensive .4ht file
should be provided.
4. All too generous use of section breaks in output: tex4ht inserts
block quotes in their own sections (in MS Word parlance). This causes
a lot of problems: typically a new page break is enforced after a
section break. I had to manually erase all the breaks. A custom tex4ht
configuration file should take care of this problem, but I did not
look into this any further.
Associate Research Professor
Department of Hispanic Studies Ph: +1 (979) 845-2125
Texas A&M University Fax: +1 (979) 845-6421
College Station, Texas, USA
stefano <at> tamu.edu