Webhelp: My adventures therein
Mary Tabasko <tabasko <at> telerama.com>
2014-09-09 00:45:06 GMT
Here is the promised write-up of my adventures with Webhelp. It is long,
so if you don't care, don't bother reading any further! But I hope some
of you find it helpful. I apologize for the length, but this way, at least
it's one big message for those who don't care, not a bunch of irrelevant
For one of our products, we have a doc set that currently consists of
three PDFs and two Microsoft Help CHM files.
Admin Guide: 679 pages (13.1 MB)
User Guide: 792 pages (61.6 MB)
What's New: 36 pages (2 MB)
Admin Helpset: 72.5 MB (includes content of
Admin and User Guides, and What's New)
User Helpset: 62.9 MB (includes content of User Guide)
We have had issues with the ancient MSHelp compiler
over the ages, and have been getting increasingly worried about
its continued viability. It does some strange things on 64-bit
systems. So we have been looking to replace it.
These documents (and many more) are all built using a homegrown
toolchain. The documents are mostly written in DocBook (v. 4.4) and
converted into various formats using the DocBook stylesheets and
customizations. (Some are written in other XML that we convert to
DocBook 4.4 using some combination of Perl and XSL.)
We use Ant, XSLTproc, XEP, Perl, and various other tools to build our
docs on both local "development" systems (desktops) and on our
build system, with nightly and on-demand builds. We have an entire
set of XSL stylesheets that customize the DocBook stylesheets for
our "corporate" and "product" styles, and then each project may have
a project-specific stylesheet that tweaks the corporate ones. So a
project's stylesheet may import a corporate stylesheet, which in turn
imports the DocBook ones. Or a project sheet may go straight to DocBook XSL.
Due to corporate restrictions, it is generally not easy to upgrade
things, so we tend to not bother unless we really have to. As a
result, we had been using DocBook 4.4 and DocBook XSL 1.74.3 for
While researching options to replace the MSHelp format, we found
nothing that was both suitable and corporately allowable until we
noticed that Oxygen (one of the XML editors we have in-house)
had a "help" format that looked intriguing. After digging into
it, we discovered that it was based on the webhelp transforms
in DocBook XSL 1.76.0. Based on some experiments with the stylesheets
in Oxygen, we bit the bullet to get the latest and greatest
DocBook XSL release. The format looked like it would do a lot
of what we wanted, and it was based on the already-established
toolchain, so we wouldn't have corporate issues. Could we
make it do what we wanted?
We were eventually able to create webhelp docsets that we could
use to replace our CHM archives, but it was non-trivial. The
rest of this describes some of the issues we encountered and how
we addressed, or didn't address, them. But without the DocBook XSL,
we would have been SOL. :) So thank you all again for this wonderful
DocBook 4.4 DTD
DocBook XSL 1.78.1
XSLTProc using libxml2 2.7.3; libxslt 1.1.24
Xalan (for indexing): Xalan-J 2.7.1
Perl, Ant, homegrown XSL, and other supporting players
Issue with the "Content-Type" meta element.
A "meta" element for "Content-Type" is written into each
of our HTML documents; it has the form of an "open tag":
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">,
but it is stand-alone.
The search indexer balks at this (and any other unclosed tags), and
indexing fails. Changing the element to <meta ... /> solves
the problem. I haven't been able to figure out where
this comes from in the XSL transforms, so I was not able
to use XSLT to fix it. (This may be an artifact of some of our
I ended up writing a trivial Perl script that would be
run on all the generated HTML files before the search-indexing
step, to change <meta ...> into <meta .../>. Inelegant, but
effective. This turned out to be really useful later....
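A minimal sketch of this kind of fix-up pass, written in Python rather than
the Perl actually used. The function names and the regular expression are
illustrative assumptions, not the original script, and the pattern assumes
that no attribute value inside a "meta" tag contains a ">" character.

```python
import re
from pathlib import Path

# Matches any <meta ...> tag, whether or not it is already self-closed.
# Assumption: attribute values never contain a literal ">".
META_TAG = re.compile(r"<meta\b([^>]*?)\s*/?>", re.IGNORECASE)


def close_meta_tags(html: str) -> str:
    """Rewrite every <meta ...> tag as a self-closed <meta ... /> tag."""
    return META_TAG.sub(r"<meta\1 />", html)


def fix_tree(root: str) -> None:
    """Apply close_meta_tags to every .html file below root (assumes UTF-8)."""
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        fixed = close_meta_tags(text)
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")
```

Running the rewrite twice gives the same result as running it once, so it is
safe to leave in a build that may reprocess files that were already fixed.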
Issues with the sidebar TOC.
The generation of the sidebar TOC for each HTML page bogs down
the processing on large documents.
Generating the HTML for our old HTMLHelp format takes less than
2 minutes on our largest doc. When I ran that doc through the
Webhelp transform, it OOMed after 6 hours. I noticed that the
default chunking level was much higher than what we used for
HTMLHelp and wondered if that might be part of the problem. When
I changed it, the processing completed successfully in about 2 hours.
(That would still be a show-stopper for our nightly builds.)
But the fact that the processing time was so strongly related to
the number of files it was creating made me suspect the sidebar
TOC was the culprit. (I have to admit that it never occurred to
me to look for a bug report. I didn't find that until much later!)
It took some investigation to determine that the TOC generation
was indeed the problem, but once I narrowed it down, I split the
HTML generation into two steps. I lifted the template that
generates the sidebar TOC into a separate stylesheet, and
pre-generated a single file containing the sidebar TOC
(the <ul id="filetree"> list) as a preliminary step.
When generating the chunked HTML, instead of regenerating the
TOC for each file, we simply read in the pre-generated file.
Two issues with this:
1. I needed to use the "generate.consistent.ids"
parameter to keep the generated IDs in sync between
generating the sidebar TOC and the standard HTML. I had
never encountered that parameter before; I was worried I would
have to solve this myself, so yay again for the stylesheets!
(These generated IDs caused another issue, though, described later.)
2. Since the TOC was pre-generated once, we lost the insertion
of the "webhelp-currentid" attribute for each file. We were
willing to take that loss if necessary, especially given
that the TOC doesn't "stick" (bug 1226, which we did not attempt
to address). But it turned out not to be necessary.
Since I already had a Perl script that would be run on all the
generated HTML (to fix the "meta" element mentioned above),
it was trivial to add a step to reinstate the "webhelp-currentid"
attribute at the right place in each file.
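A rough sketch of that reinstatement step, in Python rather than Perl. It
rests on several assumptions that are not confirmed above: that the marker is
written as id="webhelp-currentid" on the "li" element of the current page's
entry, that each entry has the shape <li><span class="file"><a href="...">,
and that the href is exactly the page's file name. The function name is
hypothetical.

```python
import re


def mark_current_entry(html: str, page_name: str) -> str:
    """Tag the first sidebar TOC entry that links to page_name.

    Only an href that is exactly page_name matches; links with a
    "#fragment" suffix are left alone.
    """
    pattern = re.compile(
        r'<li>(\s*<span class="file">\s*<a href="' + re.escape(page_name) + r'")'
    )
    return pattern.sub(r'<li id="webhelp-currentid">\1', html, count=1)
```

Called once per generated file with that file's own name, this would tag at
most one entry and leave the page unchanged if no entry links to it.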
Handling the sidebar TOC this way kept the processing time to
under 2 minutes with no loss of functionality. I realize that this
is NOT a general solution and probably not suitable for everyone, but
given our build environment and the tools we have available, this was
expedient and fit into our "ecosystem" just fine.
This doesn't address the issue of embedding this TOC in every file.
(I hadn't seen the proposed solution noted in bug 1259 before implementing
my solution, and I'm not sure I'd be allowed to just download it
We are seeing some issues with the "expand/collapse" indicators on the
with values like "collapsable" and "expandable" to indicate the
state of the TOC entry (embedded lists). We often see expanded
lists given the attribute "expandable" rather than "collapsable",
which means that the "rollup" indicators are incorrect. This seems to
happen mostly with pointers to sections inside pages, so I suspect
that this is an interplay between the chunking level and the
goes to a separate page (or at least that no page contains more than
one level of expandable sections). I tried to run this down
to the source (the stylesheets only provide the minified JS library),
but it looks like this library went out of support in 2010 and is
no longer being maintained. (Because of corporate policies, I can't
casually download the original JS library.) Since this affects only
the visual collapse/expand indicators, not the functionality, we are
willing to live with it for now.
Issues with links to local (within-page) IDs.
We noted that within-page links did not work. We found the
messages on the docbook-apps list about this, and tried
commenting out the salient block in the "main.js" file. This
fixed the problem for most links within a page (those within "content").
(We tried using the fix in the later snapshot, but we didn't see any
We also noted another problem with generated links from the sidebar TOC.
If you were on a page like, say, "bk01.html" and tried to navigate
to "bk02ch01s04#id-184.108.40.206.6" (a totally made-up id value, but
the format is what we got), the correct page and local link would load
(that is, the new page would be scrolled to the local link), but the
sidebar disappeared, and the sidebar toggle would not bring it back.
(Clicking the Next link followed by the Previous link would restore
it, but the direct navigation from the sidebar TOC always clobbered the
The problem only occurred with generated IDs. Navigating from the
sidebar TOC on "bk01.html" to "bk02ch01s04#using-passwords" worked
fine. Looking at the gross structure of the links in the sidebar
TOC revealed no differences. The difference had to be in the structure
of the values of the IDs.
By default, the "object.id" template with "generate.consistent.ids"
set makes values like "id-220.127.116.11". I played around with these values
a bit and determined that changing the "dots" to "dashes" solved the
problem. That is, links with id values like "id-4-2-6-3" worked just
fine. (The original ids work fine within the content block; it's only
using them from the sidebar TOC that causes the problem.)
I could find no way to tell the "generate-id" function to alter this
structure, so I had to override "object.id" and do it myself. (The
attempted to find it. The browser follows the links fine.)
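The mapping itself is simple. The sketch below shows it in Python purely as
an illustration; the actual change was made in XSLT by overriding
"object.id", and both function names here are hypothetical. The one subtlety
is that only the fragment after "#" should be rewritten, so that the "." in
the file name survives.

```python
def dashed_id(value: str) -> str:
    """Replace the dots in an ID value with dashes."""
    return value.replace(".", "-")


def rewrite_href(href: str) -> str:
    """Apply dashed_id to the fragment of a link, leaving the page name alone."""
    page, sep, fragment = href.partition("#")
    return page + sep + dashed_id(fragment)
```

Whatever form the override takes, the same mapping has to be applied both to
the ID on the target element and to every link that points at it, or the
links will no longer resolve.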
For completeness, I put "." characters into a couple of our explicitly
provided IDs and the links to them. They then exhibit the same problem:
the sidebar does not appear when you traverse to such an ID. (This
was not a browser-specific problem, either.)
Note: If you do not generate the sidebar TOC separately as we did, this
issue would affect you only if you have "." in your explicit IDs or have
set "generate.consistent.ids" for some other reason.
Issues with styling and layout.
The webhelp XSL templates provide some customization mechanisms, but
we found that we often needed to override pieces that provided no
handy hooks. And having our CSS file as the first one in the doc
header meant that it was constantly fighting with the "built-in"
stylesheets ("positioning.css", the jQuery stylesheets, and the
CSS elements embedded right into the pages). There were some CSS
items we could not figure out how to override using just our stylesheet.
We spent a lot of hours simply trying to figure out where some bit of
styling was coming from, and then more time trying to figure out how
to override it. I eventually decided that trying to work around that
wasn't worth the effort.
In the end, I took apart the "user.head.content" template
in "webhelp-common.xsl" and refactoring it. I tried to use only the
customization hooks that were provided, but I just couldn't do it. :)
I broke "user.head.content" into several smaller templates (one to insert
original template simply to call the other templates. That way, I could
selectively override the parts I wanted/needed to. I could then easily
import our stylesheet last, which let me move all of the CSS elements that
were being embedded into each page into our CSS instead (and change
This made styling the documents MUCH easier. It also meant that
I didn't have a big blob of CSS repeated in every HTML page.
We wanted to change the layout of the items in the header, like the
nav bar, to be consistent with other collateral we have. More overrides.
I also found it necessary to parameterize some of the other templates
(like "user.header.content") called from "chunk-element-content".
I ended up overriding a LOT of stuff. Again, these changes are probably
not ideal as general approaches (though I think breaking up some of the
big templates and refactoring them, and maybe adding more parameterized
customization hooks, are), but most of my fixes were geared toward solving
my specific problem in my specific environment.
I also found that we had to alter some of the colors embedded into
"main.js" to get the effects we wanted. I really didn't want to have
to change "main.js", but we couldn't find any other way to
get the changes we wanted. (This was before we discovered the
local-link issue that required us to change the file anyway.)
There was no elegant way to override some of the jQuery styling,
particularly replacing images. We simply had to replace their image
is responsible for getting the images in. We created a "customization
template" (a directory with the same structure as the template, but with
our project-specific variants (images, main.js) in it), and we simply
slapped this on top of the template from the stylesheets when building
The one thing that really drove us crazy was the fact that we could not
figure out how to change the size of that header. We tried a bunch
of different things, and in the end, we just dealt with what we had.
I'm sure it's in that jQuery UI Layout stuff, but none of us was familiar
with that package, and we just didn't have the time to try to sort it out.
I would be more than happy to share the customizations we made to the
stylesheets, my Perl script and so on, if anyone is interested in seeing
them. Like I've said, my solutions are probably NOT general-purpose
solutions, but they worked for us and may be helpful to some of you.
I can also send a screen-shot of what our final output looks like.
I don't want to send this out generally, since I suspect most readers
of this list are not interested.
-- 30 --