Mergeinfo is not per node
Julian Foad <julianfoad <at> btopenworld.com>
2014-07-22 11:40:13 GMT
For those interested in merging etc., a note on a recent line of thought.
For some time now I've had this idea going round my head that mergeinfo theoretically belongs to each node
separately, and that we "elide" subtree mergeinfo only for convenience, compactness, and to make it less
obtrusive and more easily understandable to the user.
It seemed a nice idea, but it's wrong. Mergeinfo is not inherently "per node".
The content of two branches is usually *different* -- that's the point of branches.
In the per-file model of branching used by CVS, for example, each file is branched, and the content of each
branch of that file can differ. This means for each file in the source tree there is one obviously
corresponding file in the target tree.
In Subversion the intention is to version trees rather than just separate files, and so two branches can
differ in tree structure as well as in file content. The changes to one file on branch B1 can correspond to
changes in two files on branch B2, or in no particular file on branch B2, and so on. A merge cannot assume
there is a 1-to-1 mapping of nodes.
Imagine the change on branch B1 at revision 100 consists of renaming a function, and updating all calls to
it. The change affects files foo.c, foo.h and bar.c. When we merge this change to the target branch B2, we
have to adjust the result, manually and/or automatically, to fit the target branch. Perhaps foo and bar
have been combined into a single file foobar.c on branch B2, and so the change affects only foobar.c. This
does not mean foobar.c alone has received that change, as that would imply all other nodes are still
eligible to receive that change. Rather, the information we need to track is that the target branch as a
whole has received the change as a whole.
- The merge source changes may be a selection of changes from just one subtree (or more generally a subset of
the nodes) in the source branch;
- but the target is not inherently "the corresponding subtree", it's the whole tree;
- and other target nodes/subtrees are *not* still eligible to receive this change.
With nested branching, on the other hand, mergeinfo *does* belong to a subtree of the outer branch. The
intent is to track that a change was merged into a subtree B2/D1, but there may be another subtree B2/D2
where the same change is still eligible to be merged.
- The merge source is a selected subtree;
- the target is a "corresponding" subtree;
- other target subtrees are still eligible to receive this change.
Mergeinfo belongs to the target branch as a whole in the (common) case of a selective merge of only some of the
changes in the source branch.
Mergeinfo belongs to the target subtree (as a whole) when the intent is nested branching.
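To make the two cases concrete, here is a minimal Python sketch. It is a toy model, not Subversion's
implementation: the dict-based "mergeinfo", the helper names recorded_revisions and is_eligible, and the
paths B2/other.c, B2/D1/x.c and B2/D2/x.c are all made up for illustration. It assumes a node simply
inherits mergeinfo from its nearest ancestor that has any.

```python
import posixpath


def recorded_revisions(mergeinfo, path):
    """Return the merged revisions that apply to `path` in this toy model.

    `mergeinfo` maps a path to the set of source revisions recorded there.
    A path without its own entry inherits from its nearest ancestor that has one.
    """
    node = path
    while node not in ("", "/"):
        if node in mergeinfo:
            return mergeinfo[node]
        node = posixpath.dirname(node)
    return set()


def is_eligible(mergeinfo, path, revision):
    """True if `revision` would still be offered for merging at `path`."""
    return revision not in recorded_revisions(mergeinfo, path)


# Selective merge of r100 into B2, recorded only on the file that changed.
per_node = {"B2/foobar.c": {100}}
# The same merge, recorded on the target branch as a whole.
per_branch = {"B2": {100}}
# Nested branching: r100 was merged into subtree B2/D1 only.
nested = {"B2/D1": {100}}

other_eligible_per_node = is_eligible(per_node, "B2/other.c", 100)
other_eligible_per_branch = is_eligible(per_branch, "B2/other.c", 100)
d1_eligible_nested = is_eligible(nested, "B2/D1/x.c", 100)
d2_eligible_nested = is_eligible(nested, "B2/D2/x.c", 100)
```

With per_node, B2/other.c still looks eligible for r100, which is the wrong answer for a selective merge;
with per_branch it does not. With nested, B2/D2 stays eligible while B2/D1 does not, which is what nested
branching wants.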
In designing a revised repository model, we should not think of mergeinfo as an attribute that appears in
the model on every node and needs to be elided/normalized for storage efficiency.
On the client side, we should in future keep mergeinfo only on the branch root in most cases, more so than we do
today. We need to *distinguish* the two cases: whether the user intends to merge only a subset of the
changes in the whole branch, or to make a nested branch. To do so, we may consider heuristics (for example,
assume a subset merge is intended if there is no mergeinfo on the specified target but there is on a parent)
as well as explicit UI.
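The example heuristic could be sketched roughly as follows, in Python. This is only an illustration under
stated assumptions: the dict-based "mergeinfo" (path -> set of merged revisions), the function names, and
the "subset-merge" return value are hypothetical, and anything the heuristic cannot decide is returned as
None so that the caller can fall back to explicit UI.

```python
import posixpath


def has_mergeinfo_on_ancestor(mergeinfo, target):
    """True if any proper ancestor of `target` carries explicit mergeinfo."""
    node = posixpath.dirname(target)
    while node not in ("", "/"):
        if node in mergeinfo:
            return True
        node = posixpath.dirname(node)
    return False


def guess_merge_intent(mergeinfo, target):
    """Guess what the user intends by merging into `target`.

    Returns "subset-merge" when the target has no mergeinfo of its own but a
    parent does; returns None (no guess) in every other situation.
    """
    if target not in mergeinfo and has_mergeinfo_on_ancestor(mergeinfo, target):
        return "subset-merge"
    return None


guess_under_branch = guess_merge_intent({"B2": {90}}, "B2/D1")
guess_with_own_info = guess_merge_intent({"B2": {90}, "B2/D1": {95}}, "B2/D1")
guess_without_any = guess_merge_intent({}, "B2/D1")
```

Here only the first call yields a guess; the other two return None, since the heuristic as stated says
nothing about a target that already has its own mergeinfo or a tree that has none at all.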
What I have been calling "mergeinfo" here is only part of the information we need for merging. We also need a
way to map nodes in the source branch to nodes in the target branch, in order to apply most of the individual
per-node changes in the source branch to the "right places" on the target branch before falling back to
conflicts and user input where this automatic attempt fails. I am starting to see this mapping as an almost
completely separate problem with its own metadata rather than something that the mergeinfo should give
us for free.
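As a rough picture of what "its own metadata" might look like, here is a small Python sketch that keeps the
node mapping entirely separate from mergeinfo. Everything in it is an assumption made for illustration: the
node_map structure, the plan_merge name, and in particular the choice to leave foo.h unmapped so that the
fallback to a conflict is visible.

```python
def plan_merge(changed_paths, node_map):
    """Decide where each changed source node's change should be applied.

    `node_map` maps a source path to a list of target paths. A source path
    with no (or an empty) mapping cannot be placed automatically and is
    reported as a conflict needing user input.
    """
    placed = {}      # target path -> source paths whose changes land there
    conflicts = []   # source paths with no known place in the target
    for src in changed_paths:
        targets = node_map.get(src)
        if targets:
            for target in targets:
                placed.setdefault(target, []).append(src)
        else:
            conflicts.append(src)
    return placed, conflicts


# Hypothetical mapping for the r100 example: foo.c and bar.c were combined
# into foobar.c on B2; foo.h is deliberately left unmapped.
node_map = {
    "B1/foo.c": ["B2/foobar.c"],
    "B1/bar.c": ["B2/foobar.c"],
}
changed_in_r100 = ["B1/foo.c", "B1/foo.h", "B1/bar.c"]

placed, conflicts = plan_merge(changed_in_r100, node_map)
```

Note that nothing here says which revisions have been merged; that question stays with the mergeinfo on the
branch (or nested subtree) root.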