Jonas Latt | 5 Jan 2009 23:50

Split a document into multiple HTML pages

I am playing around with docutils and like its straightforward interface and
ease of use. I am, however, stuck on the following problem. I would like to
generate fairly long HTML documentation from rst source. Because of the length
of the document, it would be good to start a new HTML file for each section
instead of writing everything into a single file. Although I understand that I
could simply split the rst source into multiple files and apply docutils
to each of them in turn, I do not think that this would be sufficient for my
purposes. I would like to have a consistent table of contents that
refers to all sections of the document, and I wouldn't know how to generate such
a table with this file-by-file approach.

Instead, I would guess that I need to parse the whole document at once, extract
the document tree and somehow split it up, section after section. I am however
not familiar with docutils and wonder how to do this best. Should I write a new
transformer? Or a new writer? Or something else? Any hint or suggestion on this
topic is very much appreciated.
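
A rough illustration of the "one pass, one shared table of contents" idea follows. It is only a sketch: it splits on `=`-underlined titles with plain string handling instead of going through the docutils parser, and the names `plan_pages` and `shared_toc` as well as the `sectionNN.html` file names are invented for the example.

```python
def plan_pages(rst_source):
    """Split reST text at top-level titles underlined with '=' (naive).

    Returns a list of (title, filename, body) tuples. Not a docutils API.
    """
    lines = rst_source.splitlines()
    starts = []
    for i in range(len(lines) - 1):
        title, underline = lines[i], lines[i + 1]
        # A title is a non-empty line followed by a line made only of '='.
        if (title.strip() and underline and set(underline) == {"="}
                and len(underline) >= len(title)):
            starts.append(i)
    pages = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        body = "\n".join(lines[start:end]).strip()
        pages.append((lines[start], "section%02d.html" % (n + 1), body))
    return pages


def shared_toc(pages):
    """Build one table of contents (reST bullet links) covering every page."""
    return "\n".join("* `%s <%s>`_" % (title, name) for title, name, _ in pages)
```

Each body could then be converted separately while every output page carries the same table of contents. A real solution would work on the parsed document tree, which is what the question is about.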

------------------------------------------------------------------------------
David Goodger | 6 Jan 2009 01:56

Re: Split a document into multiple HTML pages

There's rst2chunkedhtml in the sandbox:
http://docutils.sourceforge.net/sandbox/rst2chunkedhtml/
I don't know if it works with the current codebase though.

There may be other solutions I don't recall.

-- 
David Goodger <http://python.net/~goodger>

------------------------------------------------------------------------------
Michael Foord | 6 Jan 2009 12:00

Re: Split a document into multiple HTML pages

Hey Jonas,

You may want to look at Sphinx, which can build tables of contents for
multiple input reST documents. It can also build indexes.

http://sphinx.pocoo.org/

Michael Foord

------------------------------------------------------------------------------
Jonas Latt | 6 Jan 2009 16:46

Re: Split a document into multiple HTML pages

Hello David and Michael,

Thank you for the useful references to rst2chunkedhtml and to Sphinx. After careful reading of the documentation, I understand that Sphinx, with its template-based configuration mechanisms, is exactly the tool I was looking for.

Thanks again,
Jonas


_______________________________________________
Docutils-develop mailing list
Docutils-develop <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-develop

Please use "Reply All" to reply to the list.

------------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It is the best place to buy or sell services for
just about anything Open Source.
http://p.sf.net/sfu/Xq1LFB
------------------------------------------------------------------------------
Dave Kuhlman | 7 Jan 2009 20:21

Re: rst2odt.py/odtwriter added to main branch

> From: David Goodger <goodger <at> python.org>
> To: Dave Kuhlman <dkuhlman <at> pacbell.net>
> Cc: Docutils-develop <at> lists.sourceforge.net
> Sent: Saturday, December 27, 2008 2:31:00 PM
> Subject: Re: [Docutils-develop] rst2odt.py/odtwriter added to main branch

[snip]

I've made a few fixes in response to David's suggestions.  A few
notes about these changes are below.

> There are some issues though. While I think it was premature to add, I
> understand the reasons. You did propose the addition 10 days ago, and
> I apologize for not replying earlier.
>
> Most importantly, there's no copyright/license statement in
> docutils/writers/odf_odt/__init__.py. Please add a public domain
> statement as found in other files. If you don't intend to put the
> writer in the public domain, please remove it immediately and let's
> discuss first.

Added::

    # Copyright: This module has been placed in the public domain.

>
> I see two test failures:
>
> FAIL: test_odt_basic (test_writers.test_odt.DocutilsOdtTestCase)
> FAIL: test_odt_tables1 (test_writers.test_odt.DocutilsOdtTestCase)
>

I do not see these errors.  When I run alltests.py, I see::

    ....................................................................................
    ----------------------------------------------------------------------
    Ran 1095 tests in 14.630s

    OK
    Elapsed time: 15.348 seconds

The result of running rst2odt.py is a binary (zipped) file, from
which I extract an XML file.  The test is a comparison to see
if this is identical to a previously generated file.  Could this
be a platform dependency?  I'm on a 64-bit Ubuntu GNU/Linux system.
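
For readers who want to reproduce that kind of comparison by hand, here is a minimal sketch using only the standard library. The helper names (`read_member`, `same_content`, `make_odt`) are hypothetical and are not taken from test_odt.py; the member name `content.xml` is the one that appears in the failure messages reported for these tests.

```python
import io
import zipfile


def read_member(odt_bytes, member="content.xml"):
    """Return the raw bytes of one member of an ODF (Zip) archive."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        return archive.read(member)


def same_content(expected_odt, actual_odt, member="content.xml"):
    """Compare one archive member byte for byte, as a strict test would."""
    return read_member(expected_odt, member) == read_member(actual_odt, member)


def make_odt(xml_text):
    """Build a tiny in-memory archive, for demonstration purposes only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", xml_text)
    return buffer.getvalue()
```

A byte-for-byte comparison like this is sensitive to anything that changes how the XML is serialized, which is one way a platform or library version difference could show up.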

I just now did a fresh checkout of the docutils SVN tree.  When I
go to docutils/test and run alltests.py, it says "OK".

I also have a laptop on which I can boot MS Windows/Vista.  I'll
see if I can install and run the tests there.

> Documentation issues (docs/user/odt.txt):
>
> * Since it's now part of core Docutils, is there any need for section
> 2 ("How to Install It")? I suggest deleting it.

Done.  But, I left a section describing special requirements.

>
> * I'd rename "Command line flags" to "Command line options", since not
> all are flags (i.e. some take arguments).

Done.

>
> * Why do you have both --stylesheet and --stylesheet-path options? I
> would expect rst2odt to require a local file, and wouldn't expect it
> to do any URL resolution. Please don't perpetuate the HTML/LaTeX
> stylesheet confusion currently being debated. I suggest just
> --stylesheet, but with the semantics of the current --stylesheet-path.
> Except, isn't the stylesheet inserted into the .odt file? Why "The
> path is adjusted relative to the output ODF file"?

I'll bet that I did that originally because all the other guys
(rst2html.py, rst2latex.py) did it and I wanted to be cool like
them.  But, you are right.  Only one is needed.  I've removed
--stylesheet-path and modified the help/usage message and the doc.

>
> * Instead of "--no-add-syntax-highlighting", how about
> "--no-syntax-highlighting"? Ditto for "--no-create-sections"
> (--no-sections) & "--no-create-links" (--no-links).

Done.

>
> * Under "Styles", both "styles.xml" and "styles.odt" are mentioned.
> Which is actually used (and how)? Please clarify.

Actually, you can use either.  The default is styles.odt installed
under writers/odf_odt/.  styles.xml is packed inside of styles.odt
(which is a Zip archive).  A likely scenario is to make a copy of
styles.odt, then edit it with oowriter, then use the --stylesheet
option to use your modified copy.  A less likely scenario is to
extract styles.xml from styles.odt, then modify it with a text
editor, then use the --stylesheet option to use your modified copy.

I've made a few changes to the doc to attempt to make that more
clear.

>
> * Table of contents: can't the ODT writer insert an ODT table of
> contents (possibly via an option, as with LaTeX)?

rst2odt.py can generate a table of contents, but it's not a *real*
table of contents; it's just nested bullet/enum lists and has no
page numbers.  That's why in the doc, there is this:

    Table of contents
    -----------------

    ``odtwriter`` can generate an outline style table of contents.  
    However, if you want an ``oowriter`` style table of contents along
    with the formatting control that ``oowriter`` gives you, then you
    may want to omit the ``.. contents::`` directive and, after
    generating your document, open it in ``oowriter`` and insert a
    table of contents.   That feature is under menu item::

        Insert --> Indexes and Tables --> Indexes and Tables

I could generate an oowriter table of contents with no page
numbers, but you would still have to open the document in oowriter
and ask it to update the table of contents, so as to insert the
page numbers.  Either way requires opening the generated
document in oowriter and manually doing something to the table of
contents.

I'm open to suggestions.

rst2latex.py has a --use-latex-toc option.  Perhaps I could add a
--generate-oowriter-toc option, which would cause generation of an
oowriter table of contents instead of the bullet/enumerated list
toc.  But, a user would still need to open the document in
oowriter, right click on the toc, then pick "update".

OK. I implemented the --generate-oowriter-toc option, although it
still needs work.  It inserts an empty, but real table of contents. 
So, now, if you use that option, you would need to open the
generated document in oowriter, find that generated TOC, and update
it.  I'm not sure that having to find and update a toc is any
easier than inserting one, however.

>
> * Under "Syntax highlighting", the note doesn't apply any more, and
> should be removed. But see below; the whole section probably should be
> pulled.
>
> Code issues:
>
> * The code/sourcecode/code-block directive should not be defined by a
> writer. Directives are parser constructs. If we can define a
> code-block directive that works with all writers, fine, but no
> writer-defined directive please. This has been discussed many times
> but never implemented. Please remove it from the writer until it's
> globally implemented.

Removed.

>
> * "odtwriter_plugins"? How are they used? This is something that must
> be done Docutils-wide or not at all.

Removed.

>
> * load_plugins is called at top-level, meaning that the writer cannot
> be imported for static analysis without side-effects. It (and any
> other call) should be guarded by an 'if __name__ == "__main__":' clause,
> at the very least. But I think all plugin support should be pulled, at
> least for now.

Removed.

>
> * OdtPygmentsFormatter etc.: I have a problem with all those classes
> being defined inside conditional blocks. Please put everything within
> the "try/except ImportError" block into a separate module; import that
> within a try/except.

Done.
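
The pattern being asked for (keep the optional pieces in their own module and import that module inside a try/except) might look roughly like the sketch below. `load_optional` and `HAVE_PYGMENTS` are names made up for this illustration; they are not part of the odtwriter code.

```python
import importlib


def load_optional(module_name):
    """Import module_name if it is available, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# 'pygments' is optional here: a writer could simply skip syntax
# highlighting when the module is missing. Nothing else in this
# sketch depends on it being installed.
pygments = load_optional("pygments")
HAVE_PYGMENTS = pygments is not None
```

With this layout no class definition sits inside a conditional block; the only conditional part is the import itself.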

>
> * In tools/rst2odt.py, there's a BinaryFileOutput class and a
> publish_cmdline_to_binary function. If these are necessary (I haven't
> examined them closely yet), they should be moved to the right places
> (docutils.io and docutils.core modules, respectively).

Moved them to docutils.io and docutils.core modules, respectively.

>
> --
> David Goodger

Additional change -- Added rst2odt_prepstyles.py to setup.py.

- Dave K.

--

Dave Kuhlman
http://www.rexx.com/~dkuhlman

------------------------------------------------------------------------------
Guenter Milde | 8 Jan 2009 10:29

latex2e --stylesheet-path option

Dear docutils developers,

Following the discussion of the odtwriter, I wonder:

Does the latex2e writer need a --stylesheet-path option?

- it makes things complicated:

  Explaining the two options and their interaction to the user is
  not straightforward.

  This holds even more if you take into account the third related
  option, --embed-stylesheet.

Use case:

A project with rst documents sorted into a hierarchy of sub-directories
and a common style file in the base dir or a sub dir::

   . base.txt
     style.tex
     docutils.conf
     A/
       a.txt
       ...
     B/
       b.txt

With the line

  stylesheet-path: style.tex           

in docutils.conf, all documents will get a valid link to the style file,
if the conversion is started from the base dir.
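
The path adjustment implied here can be sketched in a few lines. This is an illustration of the idea only, not the writer's code: `stylesheet_reference` is a hypothetical helper, and the output names (`A/a.tex` and so on) are assumed for the example.

```python
import posixpath


def stylesheet_reference(stylesheet_path, output_path):
    """Rewrite a path given relative to the working directory so that it
    is relative to the directory of the output file instead."""
    output_dir = posixpath.dirname(output_path) or "."
    return posixpath.relpath(stylesheet_path, output_dir)
```

Run from the base directory, `stylesheet_reference("style.tex", "A/a.tex")` gives `../style.tex`, while for an output file in the base directory itself the path stays `style.tex`.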

Alternatively, the combination

  stylesheet: style.tex
  embed-stylesheet: yes

would result in the same end (PDF) documents.

Günter

------------------------------------------------------------------------------
David Goodger | 8 Jan 2009 15:35

Re: latex2e --stylesheet-path option

On Thu, Jan 8, 2009 at 04:29, Guenter Milde <milde <at> users.berlios.de> wrote:
> Following the discussion of the odtwriter, I wonder:
>
> Does the latex2e writer need a --stylesheet-path option?

I don't think so.

HTML needs both, because HTML docs often refer to stylesheets by URL.
As far as I know, LaTeX doesn't work that way, but requires
stylesheets in the local filesystem. A single --stylesheet option with
the semantics of --stylesheet-path (i.e. local filesystem path) seems
like the logical solution for me.

The current stylesheet situation for LaTeX makes little sense to me.

Keep it simple, sirs.

-- 
David Goodger <http://python.net/~goodger>

------------------------------------------------------------------------------
Guenter Milde | 8 Jan 2009 23:39

Re: latex2e --stylesheet-path option

On 2009-01-08, David Goodger wrote:
> On Thu, Jan 8, 2009 at 04:29, Guenter Milde <milde <at> users.berlios.de> wrote:
>> Following the discussion of the odtwriter, I wonder:

>> Does the latex2e writer need a --stylesheet-path option?

> I don't think so.

> HTML needs both, because HTML docs often refer to stylesheets by URL.
> As far as I know, LaTeX doesn't work that way, but requires
> stylesheets in the local filesystem. 

While this is true, LaTeX also has

* a very rich set of pre-defined styles (packages) in a directory
  hierarchy (the TEXMFPATH)

* a powerful search mechanism to find files in this TEXMFPATH.

> A single --stylesheet option with the semantics of --stylesheet-path
> (i.e. local filesystem path) seems like the logical solution for me.

For example, to use Times Roman fonts instead of the default
Computer Modern, one specifies 

  --stylesheet=mathptmx

which becomes 

  \usepackage{mathptmx} 

in the LaTeX file and includes (on my system) the file

  /usr/share/texmf-texlive/tex/latex/psnfss/mathptmx.sty 

during the latex run (i.e. the PDF generation).
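
The literal-inclusion case is simple enough to write down. The following is only a sketch of that one mapping: `stylesheet_to_latex` is a made-up name, not a function of the latex2e writer, and stripping a trailing `.sty` is an assumption of the sketch, not something stated in this thread.

```python
def stylesheet_to_latex(name):
    """Return the LaTeX line that loads an installed package by name."""
    if name.endswith(".sty"):
        # Assumption of this sketch: accept "foo.sty" and reduce it to
        # the bare package name that the usepackage command expects.
        name = name[:-len(".sty")]
    return "\\usepackage{%s}" % name
```

The point is that no path handling happens at all: the name is passed through and LaTeX's own search mechanism locates the file.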

> The current stylesheet situation for LaTeX makes little sense to me.

As with HTML, there are 2 use cases:

1. files in the TEXMFPATH ("installed", site-wide style files (standard
   or local))

   * give only the filename
   * include literally

   --stylesheet

2. files outside the TEXMFPATH (not installed, local style files)

   * a relative path should be rewritten if the output document is in a
     different dir than the pwd

   --stylesheet-path  

> Keep it simple, sirs.

I'd like to, but I'd rather drop the --stylesheet-path semantics than the
literal inclusion.

Günter

------------------------------------------------------------------------------
Dave Kuhlman | 9 Jan 2009 02:34

Comments on rst.el -- and a problem?

First, thanks much for rst.el.  It makes my work much
easier and more pleasant.

Next, I had a problem using the rst.el from latest SVN.
When editing a .rst/.txt file, I get this message:
"Stack overflow in regexp matcher"

In rst.el, I found this::

     ;; There seems to be a bug leading to error "Stack overflow in regexp
     ;; matcher" when "|" or "\\*" are the characters searched for
     (re-imendbeg
      (if (< emacs-major-version 21)
          "]"
        "\\]\\|\\\\."))

I changed that last line to the following::

        "\\][|]\\\\."))

which seemed to fix the problem.
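
One caveat may be worth checking: in Emacs regexps `\|` is the alternation operator, whereas `[|]` is a character class matching a literal bar, so the two strings probably do not describe the same pattern. The sketch below spells both out in Python syntax; the translation from Emacs to Python notation is my assumption, and the names `ORIGINAL`, `MODIFIED` and `matches` are invented for the example.

```python
import re

# Python spellings of the two Emacs patterns (assumed translation).
ORIGINAL = re.compile(r"\]|\\.")    # "]"  OR  a backslash plus any character
MODIFIED = re.compile(r"\][|]\\.")  # "]", then "|", then a backslash plus any character


def matches(pattern, text):
    """True if pattern matches somewhere in text."""
    return pattern.search(text) is not None
```

If the translation is right, the modified pattern no longer matches a lone `]` or a lone backslash escape, which could explain why the error goes away without the underlying matcher problem being fixed.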

Or, alternatively, adding the following to my .emacs file 
eliminated the problem::

    (setq font-lock-global-modes '(not rst-mode))

However, I'm guessing that doing this just disabled the
feature and eliminated the use of the regexp.

I'm using Emacs 22.2.1 on Ubuntu GNU/Linux.

Thanks again.  Hope this helps.  Let me know if there is
more testing I can do.

- Dave

 --

Dave Kuhlman
http://www.rexx.com/~dkuhlman

------------------------------------------------------------------------------
grubert | 12 Jan 2009 13:06

Re: rst2odt.py/odtwriter added to main branch

On Wed, 7 Jan 2009, Dave Kuhlman wrote:

>> FAIL: test_odt_basic (test_writers.test_odt.DocutilsOdtTestCase)
>> FAIL: test_odt_tables1 (test_writers.test_odt.DocutilsOdtTestCase)
>>
>
> I do not see these errors.  When I run alltests.py, I see::
>
>    ....................................................................................
>    ----------------------------------------------------------------------
>    Ran 1095 tests in 14.630s
>
>    OK
>    Elapsed time: 15.348 seconds

======================================================================
FAIL: test_odt_basic (test_writers.test_odt.DocutilsOdtTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 114, in test_odt_basic
    self.process_test('odt_basic.txt', 'odt_basic.odt')
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 80, in process_test
    self.assertEqual(content1, content2, msg)
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 103, in assertEqual
    first, second, msg2)
  File "/home/bert/projects/docutils/trunk/docutils/test/DocutilsTestSupport.py", line 116, in failUnlessEqual
    (msg or '%s != %s' % _format_str(first, second))
AssertionError: content.xml not equal: expected len: 1800  actual len: 1821

======================================================================
FAIL: test_odt_tables1 (test_writers.test_odt.DocutilsOdtTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 117, in test_odt_tables1
    self.process_test('odt_tables1.txt', 'odt_tables1.odt')
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 80, in process_test
    self.assertEqual(content1, content2, msg)
  File "/home/bert/projects/docutils/trunk/docutils/test/test_writers/test_odt.py", line 103, in assertEqual
    first, second, msg2)
  File "/home/bert/projects/docutils/trunk/docutils/test/DocutilsTestSupport.py", line 116, in failUnlessEqual
    (msg or '%s != %s' % _format_str(first, second))
AssertionError: content.xml not equal: expected len: 43272  actual len: 43293

----------------------------------------------------------------------
Ran 1095 tests

This is on Ubuntu 8.04 with Python 2.4+.

The files differ from the start. The actual output begins

   <?xml version="1.0" ?><office:document-content office:version="1.0" xmlns:chart=

and the expected one

   <?xml version="1.0" ?><office:document-content xmlns:
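
The reported numbers are consistent with that one extra attribute: the string ` office:version="1.0"` (with its leading space) is 21 characters long, and both failures report an actual length exactly 21 greater than the expected one. The short sketch below just checks that arithmetic on the two excerpts quoted in this message. That the attribute is the only difference between the files is an inference, not something confirmed in the thread.

```python
import os.path

ATTRIBUTE = ' office:version="1.0"'

actual_start = ('<?xml version="1.0" ?><office:document-content '
                'office:version="1.0" xmlns:chart=')
expected_start = '<?xml version="1.0" ?><office:document-content xmlns:'

# Longest common prefix: the point where the two serializations diverge.
common = os.path.commonprefix([actual_start, expected_start])

# Length differences reported by the two failing tests.
differences = [1821 - 1800, 43293 - 43272]
```

Both entries of `differences` equal `len(ATTRIBUTE)`, and `common` ends right after the element name.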

cheers
