Re: which version control system to take?
On Mar 1, 2008, at 6:20 AM, Christoph Held wrote:
> I am very serious about starting to use version control for academic
> writing. And it is not as if you had nothing to do with it. When
> doing some reading along these lines I stumbled upon some of your
> articles in the PracTeX Journal. Although you advocated using
> Subversion back then, the message for me was to use any kind of
> versioning system at all.
> The last time doing a manual merge of three documents from
> different authors was an absolute nightmare and took a lot of time.
> Even if I were using version control for myself only, I'd imagine I
> could still put it to good use by feeding it my coauthors' work
> after converting it to plain text myself. This is actually one of
> the reasons I am favouring MultiMarkdown over pure LaTeX at the
> moment, as it is human readable and easy to convert to *.rtf or even
> *.doc format.
I use SVN for all my coding projects, but also for academic writing.
WRT the latter, if I am the sole author, then I write in either LaTeX or
reStructuredText, and use SVN pretty much exactly like I would for a
coding project. In cases where I am not the sole author (which
happens to be most of my work at the moment), I have found that the
critical questions are (1) are my coauthors willing to work with
text files (e.g., LaTeX, reST, or some other markup), and (2) are
they willing to learn (or at least interested in learning)
Subversion? This yields 4 cases:
(i) If your coauthors are willing to work with text files, then
everything is still pretty straightforward. Even if they don't
access the repository themselves (or perhaps just have read-only
access), you can still distribute copies of exactly what's in the
repository to them, and handle the files you get back from them using
standard diff and merge tools.
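As a rough illustration of the diff step, here is a minimal Python sketch
that compares the text you distributed with the text a coauthor sends back,
using the standard library's difflib. The file labels and the sample
sentences are made up for the example; in practice a command-line diff or a
graphical merge tool does the same job.

```python
import difflib


def coauthor_diff(distributed: str, returned: str, name: str = "coauthor") -> str:
    """Return a unified diff from the distributed text to the returned text."""
    diff = difflib.unified_diff(
        distributed.splitlines(keepends=True),
        returned.splitlines(keepends=True),
        fromfile="distributed/paper.tex",  # hypothetical label, not a real path
        tofile=f"{name}/paper.tex",        # hypothetical label, not a real path
    )
    return "".join(diff)


# Made-up sample text standing in for the distributed and returned files.
sent = "Version control helps.\nMerging by hand is slow.\n"
back = "Version control helps.\nMerging by hand is slow and error-prone.\n"
print(coauthor_diff(sent, back, "alice"))
```

An empty result means the coauthor returned the file unchanged.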
(ii) Unfortunately, unless you are in a mathematical or computer
science-related field, it can be difficult to find coauthors who
are comfortable working with text files. Even if you're not asking
them to use LaTeX (i.e., you use some kind of simplified markup) and
are only asking them to enter their revisions into a master text file
that you have already set up, many people will still try to do this
in Word, and then want to use "track changes" to make their edits and
comments. If your coauthors insist on using Word to make their edits
and comments, then I've found the following approach works pretty
well. I still maintain the master document in LaTeX or reST (stored
in the repository), but then translate it into Word for distribution
to my coauthors (e.g., using LaTeX -> latex2rtf -> RTF -> open in
Word and save, but there are many other options for doing this).
When I get back comments, I check out a copy of the project at the
revision from which it was distributed, and then enter each
coauthor's changes by hand. If you use separate checkouts for each
coauthor, then you can use SVN's built-in features for resolving
conflicts between their different sets of changes, rather than having
to do it manually. In addition, when I do this, I always use Word's
"compare documents" feature to compare what they send me to the
document I distributed to them. That way, even if they forget to use
"track changes" (or use it inconsistently), I'm sure not to miss any
of their changes.
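The per-coauthor checkouts can be sketched as follows. This Python snippet
only builds and prints the `svn checkout -r` commands; it does not run them.
The repository URL, revision number, coauthor names, and directory names are
all hypothetical placeholders.

```python
import shlex


def checkout_commands(repo_url, revision, coauthors):
    """Build one `svn checkout` command per coauthor, pinned to one revision."""
    commands = []
    for name in coauthors:
        # Each coauthor gets a separate working copy at the distributed revision.
        commands.append(
            ["svn", "checkout", "-r", str(revision), repo_url, f"paper-{name}"]
        )
    return commands


# Hypothetical values: substitute your own repository URL and revision.
for command in checkout_commands("file:///tmp/repo/trunk", 42, ["alice", "bob"]):
    print(shlex.join(command))
```

After entering one coauthor's edits in each working copy, committing from the
first and then running `svn update` in the second lets Subversion merge the
two sets of changes and flag a conflict wherever they overlap.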
You can, if you want, check in the Word document(s) you distribute
and the ones you get back from your coauthors (with their changes),
just for completeness' sake. However, this will begin to fill up your
repository with a lot of binary junk, and you can't use standard diff
and merge tools with these files. Moreover, since MS Office files
are automatically modified every time you open them (even if you
don't save any changes), they're lousy for strict tracking of changes
(e.g., using diff or checksums).
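To make the checksum idea concrete, here is a small Python sketch that hashes
two files with SHA-256 and reports whether they are byte-for-byte identical.
The file names are hypothetical, and the choice of SHA-256 is an assumption
for the example; any checksum would do. A file whose bytes change on a
re-save gets a different digest even if its visible content looks the same.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if the two files are byte-for-byte identical (by digest)."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__":
    # Hypothetical demo files written to a temporary directory.
    with tempfile.TemporaryDirectory() as workdir:
        sent = os.path.join(workdir, "sent.tex")
        back = os.path.join(workdir, "returned.tex")
        for path, text in ((sent, "Draft one.\n"), (back, "Draft one, revised.\n")):
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)
        print(same_content(sent, sent))  # True
        print(same_content(sent, back))  # False
```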
(iii) While I've had little luck getting people who are used to Word
to use text files, I have had some luck getting non-technical people
to learn how to use Subversion. This is because it is easy to use
(at least for most standard operations), well-documented, and tools
like TortoiseSVN make it very easy for Windows users to pick up. If
your coauthors want to do this -- even if they are using Word -- it
can still be helpful, because it eliminates your having to serve as
the middle-man for exchanging files and provides you with an
automatic log of all versions. If you do this, one strategy is to
create a "Word" branch of your project, where your coauthors can
check out the latest Word version and check in their changes. You can
then occasionally merge the changes from this branch onto the trunk
and update the branch with your own changes (from the trunk), as
necessary. In fact, as long as you're careful not to copy files
between the branch and trunk, this also makes it easy to purge the
repository of all the Word files, once the paper has been published.
(iv) If your coauthors use LaTeX (or some other text-based markup)
*and* know or are willing to learn SVN, then you're in heaven.
Everything works just like a software project. Believe it or not,
this has occasionally happened to me.
Just a few of my own experiences -- YMMV.
P.S. IMHO, the simplicity of Subversion, its excellent documentation,
and the existence of graphical clients like TortoiseSVN (for those
coauthors who aren't comfortable working at the command line) should
not be overlooked, especially for projects like academic writing (or
any writing, for that matter). In addition, if you are going to
essentially have your own, private repository, then the benefits of
distributed version control become much less important, if they matter at all
(i.e., you can just keep a copy of your entire repository on your