Merge History For CVS

Note: This document is an incomplete draft of a proposal for a design for adding "merge history" to CVS. This would allow you to do multiple merges between the same two branches without getting artificial conflicts.
I have not officially announced this idea anywhere. Maybe someday I will decide to commit the time to actually implement this, and as the first steps I will finish this document and announce it on the mailing list.
Update (Jan 2004): My current thoughts are:
Avoid changing the CVS server, client/server protocol, or RCS file format. Instead, the client would automatically create, store, look for, and use merge history information in tags that conform to special naming conventions.
Simplify the big writeup below by storing minimal info about each merge (possibly only from, to, and common ancestor), not arbitrary name/value pairs. But extend the below by tracking merges between files (or even repositories) (invent a URLish scheme to ID a particular version of a particular file in a particular repository...). The theoretical directory versioning issue is still a bit thorny, though.
Change the way the client/server protocol is used so that merges actually happen in the client instead of the server. (The protocol itself would not change.)
Adopt a convention that every version of every file should have its earlier ancestor merge history easily available, while it can get lax about storing later merge history. (If we merge from a third party read-only repository, we don't need to tell that repository about it. Only if they merge from us...)
Far future: support merging and merge history between repositories managed by different types of version control systems (CVS, perforce, arch, clearcase, ...).
I am also tentatively thinking it would be easier to build a new client mostly from scratch than to try to retrofit this support into the existing code base. I am very slowly working on this at this time.
July 2002:
START INTRODUCTION:
   One weakness with CVS is that it doesn't automatically keep track
of merges, so if you want to continue development on a branch
after it has already been merged into the main branch, it requires
a lot of careful manual tracking of what was merged when (using tags)
so that you can cleanly merge the new branch development later without
getting conflicts from the already merged changes.
   So I am seriously thinking about adding merge tracking capabilities
to CVS.

- This occasionally comes up in the mailing list.  It is also
  mentioned in the TODO list (item 39).
- Is there any chance this could make it into CVS proper?
  I probably won't bother unless there is some assurance that
  it will make it in as long as it is done well.
     CVS definitely seems to be in a "no new features" mode, with
  only a few people working on bug fixes part time.  It doesn't
  look particularly likely for any large change to make it in,
  no matter how useful it might be...

   The brief version of this proposal is to add a newphrase to
the RCS file format to record whenever a merge is done, and
modify the default single -j "update -j" common-ancestor search
algorithm to take into account previous merge history.

   This idea is expanded on in a fair amount of detail below.
If anyone has any additional thoughts or concerns, let me know.

END INTRODUCTION.
==========================
HERE: Update sourceforge, add links, .

================================================
Expanded Version:

Index:
  - Introduction.
  - Use cases.
  - Future extension ideas (not intended to be in original patch)
  - UI changes brainstorm.
  - A theoretical, extendable way to represent a merge.
  - Specific ways to encode merge info in real files and protocols.
  - Internal design of changes.
  - Development plan.

================================================
Use case A. Basic:

1. Developers develop some software on the main branch.
2. It is time for release, so they create a release branch.  They
   start fixing bugs and cutting releases off the branch.
3. New features continue to be developed on the main branch.
4. You think you are done with the release branch, so you merge
   the fixes into the main branch, and you intend to drop the
   old branch.
      [When you do the 'cvs update -j', it automatically
   remembers (in your sandbox) some details about what was merged.
   When you commit it, that info is automatically stored in a
   newphrase of the RCS file.]
5. Oops, you are still supporting the old release, and someone
   just found some more problems.  So you fix the problems on
   the release branch.
6. Now you want to merge the fixes into the main branch as well.
   All you do is a simple "cvs update -j RELEASE_BRANCH", and
   it automatically takes the earlier merge into account when
   calculating which version to use as the common ancestor.
     [Specifically, it uses what was the old head of the release
   branch as the new common ancestor.]

================================================
Use case B. Backport:

1. Developers develop some software on the main branch.
2. It is time for release, so they create a release branch.  They
   start fixing bugs and cutting releases off the branch.
3. New features continue to be developed on the main branch.
4. The main branch is currently unstable, but there is one relatively
   simple feature in it that customers really want now.  So you go
   in and carefully merge the few files needed for that feature
   from the main branch to the release branch.
      [Again, merge details are transiently remembered in your
   sandbox, and then later added to the RCS file.]
5. You continue development on the main branch.
6. Now you want to merge the release branch into the main branch.
   All you do is a simple "cvs update -j RELEASE_BRANCH", and
   it automatically takes the earlier merge into account when
   calculating which version to use as the common ancestor.


================================================
Use case C. External Tools

0. [This is kind of a generalization of the "vendor branch import"
   support in cvs already, but it should avoid some of the problems,
   and support merges in both directions without conflicts.
      No changes will be made to the existing vendor branch support.
   Instead, it just provides hooks that external tools can use.
      The external tools (not part of the proposed patch) would
   treat multiple different repositories as different branches in
   one big meta-repository.  The patch would support ways for
   those tools to inject extra merge info about the merges done
   in the meta-repository.
      So the external tool would use CVS as follows:]
1. The tool starts with 2 sandboxes, tied to the source and
   destination of the merge.
     [Perhaps an advanced tool could talk client/server protocol
   directly, but we'll ignore that possibility.]
2. It searches them to find all controlled files in either one.
   Then for each file it does the following:
3. The script pulls out the full version tree and merge history for
   the file from both repositories (using some cvs command(s) and parsing
   the output).  Then it combines the version trees and merge histories,
   and searches them to find what to use as the common ancestor.
4. It uses 'cvs update -p' to get a copy of the common ancestor, and
   performs the merge.
5. It uses some cvs commands to register the details of the merge
   in the target sandbox.  It uses an extended syntax to distinguish
   other-repository versions from local versions.
      (It might also add useful merge info from the target
   repository to the source repository.)
6. The user can test it, fix it, and commit as usual.
   [Some scripts may want the user to use a special tool
   to commit, particularly if the target repository is for
   a system that can't acknowledge external merges.]


================================================
Mini use cases:
D1. If a patched client asks an old server to do a merge,
   the new client should warn the user that it
   won't properly record the merge: the server should be
   upgraded and the merge redone.  But continue the merge
   anyways.
D2. If an old client asks a new server to do a merge, the new server
   should warn the user that it won't properly record the merge:
   the client should be upgraded and the merge redone.  But continue
   the merge anyways.
D3. If new client commits to an old server, and there are merge
   records scheduled to be added, recover as gracefully as
   possible:
     Maybe: abort the commit (unless -f is selected?)
     Or: Do the commit, output a warning, and leave the merge records
        in the sandbox to try again on the next commit (marked to
        the actual version instead of floating with the sandbox
        version...)?
D4. When the merge record is added to the sandboxes, refer to the
   target as "checkedout", not a specific version number.  Only
   tie to a specific version when you actually commit a specific version.
     This way the user can continue doing updates prior to committing,
   and get the right result.
D5. Fix Up Merge History: The user will sometimes want to manually
   add, remove, or change merge history information.
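
   Mini use case D4 can be sketched as follows.  This is only an
   illustration under stated assumptions: the record layout and the
   names PendingMerge and bind_on_commit are hypothetical, invented
   here, and are not existing CVS structures or functions.

```python
from dataclasses import dataclass, replace

# Placeholder meaning "whatever version the sandbox currently holds".
CHECKEDOUT = "CHECKEDOUT"


@dataclass(frozen=True)
class PendingMerge:
    """Hypothetical sandbox merge record (not an actual CVS structure)."""
    from_rev: str
    to_rev: str
    ancestor: str


def bind_on_commit(pending, committed_rev):
    """Return new records with the floating target tied to committed_rev."""
    return [
        replace(m, to_rev=committed_rev) if m.to_rev == CHECKEDOUT else m
        for m in pending
    ]


# The merge is recorded against CHECKEDOUT, so further updates before the
# commit do not invalidate it; the real version is filled in at commit time.
pending = [PendingMerge("1.2.2.3", CHECKEDOUT, "1.2")]
bound = bind_on_commit(pending, "1.5")
```

   Keeping the sandbox records unchanged and producing new bound
   records is just one possible design; it makes a retry after a
   failed commit straightforward.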


================================================
================================================
================================================
Future use case E: Rename a File

0. This will not be part of the patch I write.  Maybe it will be
   added in the future.
1. Developers develop some software on the main branch.
2. It is time for release, so they create a release branch.  They
   start fixing bugs and cutting releases off the branch.
3. On the main branch, you are doing massive cleanup, including
   renaming many of the files.  You use a special "cvs move"
   command to rename the files, and it remembers some specialized
   merge history information into both old and new files so
   that tools can automatically follow the rename.
4. You are done with the release branch, so you merge
   the fixes into the main branch.  You use some special
   option to 'cvs update' that causes it to follow the merge
   history back to the old file in order to find the changes
   made on the branch.  [Of course, it also records this new merge...]
5. Then you commit, and it remembers the merge so you could do it
   again.

================================================
Future use case F: Merge Holes

0. This will not be part of the patch.
      I intend to store merge history in enough detail that
   in the future CVS could be extended to handle this case
   correctly without changing the merge history
   schema, but for now I'm not going to try to deal
   with the multiple sub-merges for one user-requested merge,
   or the possibility of N-way merge conflicts.
      Instead (see below), it will simply treat "GoodChanges" as the
   common ancestor, having the effect of erroneously removing
   the UnrelatedChanges from the main branch.  (At least the
   changes are still in version history, so they could be
   manually re-merged...)

1. Developers develop some software on the main branch.
2. It is time for release, so they create a release branch.  They
   start fixing bugs and cutting releases off the branch.
3. New features continue to be developed on the main branch.
4. The main branch is currently unstable, but there is one relatively
   simple feature in it that customers really want now.  But
   the version tree currently looks like:

     Base ------ UnrelatedChanges -------- GoodChanges
      \
       \------ BugFixes

   So while working on the release branch, the user
   does something like 'cvs update -jUnrelatedChanges -jGoodChanges FILE'.
   Now the tree looks like:

     Base ------ UnrelatedChanges -------- GoodChanges ------ MoreChanges
      \                                     |
       \                                    |
        \                                  \|/
         \------ BugFixes -------- MergedGoodChanges ------ MoreFixes

   But not depicted in this picture is the fact that UnrelatedChanges
   were *not* merged into MergedGoodChanges.
     [These diagrams are kind of sloppy in distinguishing
   actual versions from deltas between versions, but you should be
   able to get the overall idea anyways...]
5. Now you want to merge the release branch into the main branch.
   All you do is a simple "cvs update -j RELEASE_BRANCH", and
   it automatically takes the earlier merge into account when
   calculating which version to use as the common ancestor.
     The merge history and merge algorithms are smart enough that cvs
   will merge "BugFixes" and "MoreFixes" into the main branch,
   but *not* merge MergedGoodChanges (since those changes
   are already applied...).


================================================
Future miscellaneous semi-related changes:

0. None of these will be part of the patch...
   See also .../WishList.txt (HERE), which expands on some of
   these ideas and also includes some unrelated ideas.
G1. Add a per-version "mergetype" newphrase to the RCS file.
    It would be used to look up (in some config file) plugin merge
    algorithms to use instead of the default one.  Give the config
    format some way to resolve different "mergetypes" in one merge.
    Have some mechanism to configure default mergetypes for new
    files (or old files missing "mergetype").
       Always do auto merge on server, but also support sending
    the "from" and "ancestor" files to client for use by an optional
    interactive client-side merge algorithm.  It should be possible
    to configure the client to automatically invoke such a thing
    when possible.
       I also have vague ideas of implementing a generic, bidirectional
    communications "tunnel" for the client/server protocol, that
    could be used by things like server-side interactive merge tools,
    or an rsync-based file transfer subsystem.
G2. Add a per-version "keywordexpansion" newphrase to the RCS file
    that takes precedence over the main file setting.
    If you delete and readd a particular file name with a
    totally different type, the old and new files can have different
    keywordexpansion modes...
G3. Add a "conditionally override -k" to checkout and/or update,
    that can be generically instructed to only override -k in
    certain ways (for example, never override -kb).
G4. Add a "meta-operation log" newphrase to the RCS format that
    records details about various meta-operations like adding and
    removing tags.  Theoretically, if a tag is erroneously removed,
    this could be used to figure out specifically how to fix it.
G5. Patch management.  I've got various ideas for making it so
    CVS could help automate tracking multiple independent
    "changesets", provide me with a combined view of
    all the "changesets", yet being able to "commit" additional
    changes to a specific changeset only.  I get the impression
    that BitKeeper may have some support for this, but I don't know
    a whole lot about how it does it.
G6. Alternate/better rename idea: The files in a directory would be
    controlled by a special "CVS/dir,v" file in the repository,
    where each version of CVS/dir is a list of which RCS files
    contribute to that version of the directory (ideally it would
    only reference local RCS files, but might sometimes reference
    ones in other directories).  Carefully maintain backwards
    compatibility in the absence of a CVS/dir,v file.  (I've
    got more ideas, like how to handle hard links cleanly; this is
    just a summary.  See the wishlist link above.)
      This is conceptually cleaner than recording "merge history"
    between different files, but the need to maintain backwards
    compatibility and the significant internal redesign of CVS
    that would likely be needed makes it so this probably isn't
    practical.  It might be best to drop client/server
    compatibility for this, and overhaul a large chunk of the
    CVS code.
G7. One specific instance of a specialized multi-system merge tool:
    Be able to merge among multiple sandboxes tied to the same
    repository (useful for things like trying changes on both
    UNIX and NT before committing).  See HERE for a tool
    that kind of supports this, but doesn't really manage
    merging...

================================================
================================================
================================================
Possible User Interface Changes:

Some of this is very tentative.

1. "cvs update -j [-j]" will record in the sandbox that a merge occurred.
2. "cvs update -j" will by default use the smarter algorithm for finding
   the common ancestor.  But you can still get the old algorithm
   with some new option (--ancestor-algorithm=treeOnly?).
3. "cvs update -C", or any time update replaces a missing file, it
   will clear out any merge history info stored in the sandbox that
   mentions the CHECKEDOUT version of that file.
4. What if 'cvs update' notices that a file is unmodified, but is
   marked as having been merged?  Should it warn the user?  Clear out
   the merge info automatically (probably not)?  Report it as modified
   (perhaps with a little m?)?
5. "cvs commit" will copy the sandbox merge history at the
   same time it is committing the file.
6. What if 'cvs commit' notices an unmodified file that is marked
   as having been merged?  Should it abort the commit (unless
   some kind of override option is set)?  Should it drop the
   merge info (probably not)?  Should it write a new (unchanged)
   version and point the merge history at that?
   Should it just add the merge info to the file, without adding
   new versions (probably best)?
7. "cvs status" should probably mention any sandbox merge history state
   somehow.
8. "cvs log" should probably add a section to report merge history.
   Should it be there by default?
9. Add a "cvs arrow" command with sub-commands for adding, removing,
   and listing.
      Include an option to control whether to save it in the sandbox
   (for a later commit), or send it directly to the RCS file.
      Possibly just go with "cvs log" changes for listing, but probably
   not.
      Possibly enhance "cvs admin" with new options for add/remove,
   but probably not.
10. Maybe or maybe not: A "cvs decode" command that reads the output of
    other commands (like "cvs log"), decodes the output, and then
    outputs a specifically asked for piece of information out of it.
       This could allow scripts to easily and robustly parse the
    output of a command.  It would often be faster than adding
    detailed queries to ("cvs log") [avoid repeated overhead of
    client/server and reading a large RCS file], but not as fast as
    a program internally parsing the output.
       It could also be used to convert "escaped" strings back into
    pure binary data (see 8-bit clean discussion lower down).
11. Maybe give "cvs tag" some options for tagging versions based
    on searching the version tree?  Perhaps "tag the common ancestor
    that would be used for a specified merge", or "find a merge
    entry that matches a query, and tag the from, to, or
    ancestor revision mentioned in that merge".
    (FUTURE, not part of initial patch.)

#HERE: Expand on the above?

================================================
================================================
================================================
Theoretical representation for merges.

  The representation I intend to use will store the version numbers
of the "from", "to" and "common ancestor" versions of the file for
every merge ever done involving that file.  It will also store arbitrary
name/value pairs (date, comment, etc.; see below).
   Names will match the RE "[A-Za-z_][A-Za-z_0-9]*" (like
C identifiers, or a subset of valid RCS "id"s).

#####
HERE: dropping default contributor to a version.  Alternate locators.
MergeName (if the user wants to give a big, important merge a name
for some reason?)

Required:
   from     - The version number of the source of the merge.
   to       - The version number of the result of the merge.
   ancestor - The common ancestor that was used for the merge.

Optional
   ignoreParent - Normally defaults to 0 (though maybe the
                  default should be 1 when it is dead?)
                    When 1, don't consider the parent of the "to"
                  version to be a direct ancestor of the "to"
                  version.
   comment - If the user wants a comment to go with the merge...
   id      - Some sort of name for the merge (might be queried for...)
HERE: Something to say "treat this as the direct ancestor".

A user supplied comment for the ", "id", "date", "user", ...
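
The required names and the name syntax above can be checked
mechanically.  The following is a minimal sketch, assuming a merge is
held as a plain name -> value mapping; the function name
validate_merge_record is hypothetical and the error strings are only
illustrative.

```python
import re

# The name syntax quoted in the text (like C identifiers).
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# The three names the text lists as required for every merge.
REQUIRED = ("from", "to", "ancestor")


def validate_merge_record(pairs):
    """Return a list of problems found in a name -> value mapping."""
    problems = []
    for name in pairs:
        if not NAME_RE.fullmatch(name):
            problems.append(f"bad name: {name!r}")
    for name in REQUIRED:
        if name not in pairs:
            problems.append(f"missing required name: {name}")
    return problems


good = {"from": "1.2.2.3", "to": "1.4", "ancestor": "1.2", "comment": "fixes"}
bad = {"from": "1.2.2.3", "to": "1.4", "2nd-id": "x"}
```

Here validate_merge_record(good) returns an empty list, while the
second mapping is reported for both its malformed name and its missing
"ancestor".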

#####
Version numbers deserve some clarification:
1. If it looks like an RCS version number, then it refers to a
   version of the same file.
2. If it is "CHECKEDOUT", it refers to the version currently
   checked out and will be bound to a specific version when
   the file is committed.
3. If it is anything else, then it refers to something external,
   in another file, another repository, or even a completely
   different version control system.
     Naming convention for external versions is something like
   "scheme:SCHEME_SPECIFIC_INFO".  Model them after URLs: access
   method/what kind of repository, where the repository is,
   what file within that repository, what version of that file.
     Issues:
    - Relative references.  If referring to another file in the
      same repository, robustness would suggest that a relative
      path is best (then you can relocate the repository without
      breaking internal references).  But that complicates checking
      if multiple things represent the same file.  Internally,
      it would probably need to transiently normalize all
      references to absolute references...
    - UUIDs vs "current location" vs both to find files...
      Clearcase at least uses UUIDs to refer to files.  But
      the ability to look things up by UUID may be missing
      from the UI (I'm not sure), and certainly trying to
      manually read a history using UUID references would be
      difficult.
        But if you try to use "current location", how would
      tools handle it if it moved?
        One desirable property of external version references
      is that they be normalized (or easily/generically
      normalizable) so that the string can uniquely ID a node
      in the version graph without needing to know how to decode
      the string.  That doesn't work so well if you use CCase
      rename capabilities.
        A possibility is to use UUIDs in the from and to properties,
      and have separate path_from/path_to settings to record
      the expected pathname at that location...
HERE: Expand on this some more.
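
The three kinds of version reference listed above could be told apart
roughly like this.  It is only a sketch: the regular expressions are
my assumptions about what "looks like an RCS version number" and what
a scheme prefix looks like, and classify_version is a hypothetical
helper name.

```python
import re

# Assumption: dotted decimal with at least two fields, e.g. 1.4 or 1.2.2.3.
RCS_NUM = re.compile(r"\d+(\.\d+)+")

# Assumption: a URL-like scheme prefix followed by scheme-specific info.
SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*):(.*)", re.DOTALL)


def classify_version(ref):
    """Return (kind, detail) for a version reference string."""
    if ref == "CHECKEDOUT":
        return ("checkedout", None)
    if RCS_NUM.fullmatch(ref):
        return ("local", ref)
    # Anything else is treated as external, per the text above.
    m = SCHEME.fullmatch(ref)
    if m:
        return ("external", (m.group(1), m.group(2)))
    return ("external", (None, ref))
```

For example, classify_version("1.2.2.3") yields ("local", "1.2.2.3")
and classify_version("scheme:SCHEME_SPECIFIC_INFO") yields
("external", ("scheme", "SCHEME_SPECIFIC_INFO")).  Normalizing
relative references and UUIDs is deliberately left out.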

#####
String representation also deserves some attention.
I think it is worth the time to carefully consider
how to encode strings so they can store any arbitrary 8 bit
data. (Although I don't think it is a good idea to actually use
that capability...)
1. Characters that need to be escaped probably include [\n\0\r"@],
   anything that is "non-printable", and whatever character is
   used for escaping.
2. Unicode: Essentially ignore it.  Just store a stream of
   bytes that is "typically" interpreted as IEEE (HERE) like the
   rest of CVS.  If someone wants to store unicode, the
   8-bit-cleanliness means they can use any encoding they want,
   though they may need to intercept and massage the output...
3. Three input encoding schemes are:
    - Fully encode a supplied, unencoded byte stream.
          When getting data directly from the user on a clean
      channel, have this as an option selectable by command line
      flags...
    - Partial encoding: Except for the escape character, escape
      all special characters.  The escape character is left as
      is, as long as it is part of a valid escape sequence.
        The idea here is to allow a defensive programming strategy
      where we always partially encode any string coming from an
      external source, even if it is *supposed* to already be
      encoded.  If it really is encoded, the partial encoding
      does nothing, but if it somehow is *not* encoded, partial
      encoding it can protect you...
    - Pre-encoded (no encoding): There should never
      be a reason to assume the string is properly pre-encoded.
      Use partial encoding instead.
4. Output encodings:
    - Internally (including in RCS files), the string is stored
      and manipulated in encoded format.
    - Normally (and especially if output as part of a larger construct),
      output strings in encoded format.
    - Perhaps a "cvs decode" command that takes an encoded
      string (either on standard input or on the command line)
      and outputs the original binary string on standard output
      would be useful?
5. Newlines are an issue:
    - When decoding, be able to ignore unescaped newlines.
    - For some forms of output, you don't want newlines at all.
      (easier parsing by scripts...)
    - For other forms of output, adding ignored real newlines next
      to the encoded newlines can make the output more readable
      but harder to parse.
    - Can't be relied upon anyways ["\n" vs "\r\n" are often hosed
      up by whatever tools you happen to be using].
6. Options:
    - C style escapes.  "\n\0"
    - quoted-printable style escapes. "=0a=00"
    - XML/HTML style escapes. "&#010;&#000;"  HERE: correct?
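
The full and partial encodings described above can be sketched as
follows.  This assumes the quoted-printable style option ("=0a") is
the one chosen, which the text leaves open; the function names and
the exact set of escaped bytes are my assumptions, not a decided
format.

```python
import re

ESC = ord("=")  # assumed escape character (quoted-printable style)

# An existing, well-formed escape sequence such as =0a or =3D.
_VALID_ESCAPE = re.compile(rb"=[0-9a-fA-F]{2}")


def _is_special(b):
    """Bytes that must always be escaped: \\n \\r \\0 " @ and non-printables."""
    return b in b'\n\r\0"@' or b < 0x20 or b > 0x7E


def _esc(b):
    return b"=%02x" % b


def encode_full(data):
    """Encode raw bytes, escaping the escape character itself too."""
    return b"".join(
        _esc(b) if (_is_special(b) or b == ESC) else bytes([b]) for b in data
    )


def encode_partial(data):
    """Escape special bytes but leave valid existing escapes untouched."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and _VALID_ESCAPE.match(data, i):
            out += data[i:i + 3]  # already a valid escape: keep as is
            i += 3
            continue
        out += _esc(b) if (_is_special(b) or b == ESC) else bytes([b])
        i += 1
    return bytes(out)


def decode(encoded):
    """Decode, ignoring any unescaped newlines as suggested above."""
    stripped = encoded.replace(b"\n", b"").replace(b"\r", b"")
    return _VALID_ESCAPE.sub(lambda m: bytes([int(m.group()[1:], 16)]), stripped)


raw = b'say "hi"\n=x @\x00'
enc = encode_full(raw)
```

Partially encoding an already encoded string leaves it unchanged,
which is what makes the defensive "always partially encode external
input" strategy safe to apply repeatedly.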


--------------------
Alternate representation:
(I'm not planning on using this representation.)

- Subversion gives each version a list of all version deltas that
  have been accounted for in that version.
    I suspect my representation can theoretically represent about
  the same information as the representation above (with reasonable
  conversions between the representations), but I kind of like
  a representation that closely resembles the input (the actual
  merges performed).
- I believe clearcase is more limited: I think it only remembers the
  from and to versions of a merge, not the common ancestor.
     On the other hand, the initial common ancestor search algorithm
  I have in mind is probably going to only use the "from" and
  "to" versions of past merges.  But storing more details
  would allow future enhancements to be smarter.

================================================
================================================
================================================
Specifics of storing merges in sandbox, repository, and client/server
protocol.  It might be useful to use the same format for
more than one thing (for example, output a fragment of an
RCS file in a CVS/mergeHistory file in the sandbox).

----------------
RCS:
  (For context, see "man rcsfile")
  Add a newphrase "merges" to the "admin" section of the RCS file,
  as shown in the following addition to the RCS file grammar:

     <admin> ::=    # [snip] ...
                  merges { <mergeInfo> { : <mergeInfo> }* } ;
                  { <newphrase> }*

     <mergeInfo> ::= { <infoItemName> <infoItemValue> }+

     <infoItemName> ::= <id>

     <infoItemValue> ::= <string> | <id> | <num>
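
  To make the grammar concrete, here is a rough sketch of rendering a
  list of merges as a "merges" newphrase.  It assumes values that look
  like numbers are written as a bare <num> and everything else as an
  @-delimited <string> with embedded @ doubled (the usual RCS string
  convention); format_merges and rcs_value are hypothetical names, and
  the spacing is arbitrary.

```python
import re

# Assumption: a bare <num> is made only of digits and dots.
NUM = re.compile(r"[0-9.]+")


def rcs_value(value):
    """Emit a bare <num> when possible, otherwise an @-quoted <string>."""
    if NUM.fullmatch(value):
        return value
    return "@" + value.replace("@", "@@") + "@"


def format_merges(merges):
    """Render a list of name -> value dicts as one 'merges' newphrase."""
    infos = [
        " ".join(f"{name} {rcs_value(val)}" for name, val in merge.items())
        for merge in merges
    ]
    # Individual <mergeInfo> entries are separated by ':' per the grammar.
    return "merges " + " : ".join(infos) + ";"


example = format_merges([
    {"from": "1.2.2.3", "to": "1.4", "ancestor": "1.2"},
    {"from": "1.2.2.5", "to": "1.6", "ancestor": "1.2.2.3",
     "comment": "see bug@tracker"},
])
```

  Values would first be passed through whatever string encoding is
  finally chosen; that step is omitted here.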

  Alternatives:
    - I considered storing the merges to (or from?) version X
      in the "delta" section for version X, but it doesn't seem
      like a properly normalized representation (possibilities of
      inconsistencies, etc).  It also doesn't
      eliminate the issue of storing multiple merges in one section
      of the file, with each merge needing multiple name/value
      pairs.
    - I also considered using one newphrase per merge, but that
      would require multiple instances of the same newphrase
      in one section of the RCS file, and:
       a. There are no provisions for that in the existing format.
          (See how locks and symbols (tags) are handled...)
       b. I wouldn't be surprised if some RCS file parsers
          can't manage multiple instances of a single newphrase.

  Questions:
    - Is there some semi-official registrar of newphrases that
      people have added to RCS, to make sure I don't conflict with
      anybody else's extensions?
    - Does anybody know of any other merge history extensions anyone
      has added to RCS that I should try to be able to interoperate
      with?

----------------
Sandbox:
  I'm leaning towards using a new admin file (CVS/mergeHistory) that
  just stores the relevant data fragment from one of the other
  representations.  I'll use whichever one seems easiest to factor
  out.
     Oops: Needs to store a fragment per file, with additional
  info to identify the file that each fragment is for.

----------------
Client/server protocol:
  - I'm thinking of making sure there is no line length limit and that
    the encoded form of the strings doesn't have any newlines,
    and then just passing each merge in a form like:
      "mergeHist ${var}=\"${value}\" ${var}=\"${value}\" ...\n"
    This will be recognized both client->server and server->client.
    The file it applies to will be derived from context.
    HERE: more details
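
    As a sketch of the line format above (not a settled protocol
    extension), building and reading one such line might look like
    this.  It assumes values are already encoded so that they contain
    no double quotes or newlines; format_merge_hist and
    parse_merge_hist are hypothetical names.

```python
import re

# name="value" with no quote or newline inside the (encoded) value.
PAIR = re.compile(r'([A-Za-z_][A-Za-z_0-9]*)="([^"\n]*)"')


def format_merge_hist(pairs):
    """Build one 'mergeHist' line from a name -> encoded value mapping."""
    for value in pairs.values():
        if '"' in value or "\n" in value:
            raise ValueError("value must be encoded before sending")
    body = " ".join(f'{name}="{value}"' for name, value in pairs.items())
    return "mergeHist " + body + "\n"


def parse_merge_hist(line):
    """Parse one 'mergeHist' line back into a name -> value mapping."""
    if not line.startswith("mergeHist "):
        raise ValueError("not a mergeHist line")
    return dict(PAIR.findall(line))


line = format_merge_hist({"from": "1.2.2.3", "to": "CHECKEDOUT", "ancestor": "1.2"})
```

    Rejecting unencoded values at the sending side keeps the
    one-merge-per-line property that the receiver relies on.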

----------------
Command line:
   When a name/value pair needs to be specified on the command line,
   each name/value pair will be specified as something like:
     --set    NAME=VALUE           # Fully encode value
     --setEnc NAME=ENCODED_VALUE   # Partially encode value.
     --setD   NAME=DATE            # Use version number that goes with DATE.
     --setR   NAME=TAG             # Use version number that goes with TAG.
     --setRD  NAME=BRANCH:DATE     # Use version number for date on branch.

      Of course, cvs update -j will automatically set all required names,
   but they can be overridden or more added if desired.

----------------
CVS log output:
HERE
   merge:
       name="value"
       name="value"
       name="value"
   merge:
       name="value"
       name="value"
       name="value"

   Values will appear in encoded format, no newlines.  It is intended to
   be easy to pull out the values with simple RE based parsers.  (Nothing
   tricky for embedded newlines, quotes, etc.)
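
   A script could pull the values out of that layout with a very
   small RE based parser, for instance as below.  The layout itself
   is still tentative, so this is only a sketch; parse_log_merges is
   a hypothetical helper and the sample revisions are made up.

```python
import re

# One name="value" pair per line; the value is in encoded form.
PAIR = re.compile(r'^\s*([A-Za-z_][A-Za-z_0-9]*)="(.*)"\s*$')


def parse_log_merges(text):
    """Return a list of dicts, one per 'merge:' block in the text."""
    merges = []
    for line in text.splitlines():
        if line.strip() == "merge:":
            merges.append({})  # start a new merge block
            continue
        m = PAIR.match(line)
        if m and merges:
            merges[-1][m.group(1)] = m.group(2)
    return merges


sample = '''   merge:
       from="1.2.2.2"
       to="1.4"
       ancestor="1.2"
   merge:
       from="1.2.2.5"
       to="1.6"
       ancestor="1.2.2.2"
'''
parsed = parse_log_merges(sample)
```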

================================================
================================================
================================================
Internal design:
- A new file will be created to house the common ancestor search
  algorithm, and other merge history related functions.
     In new code I'll try to aim for the reentrancy desire
  expressed in the HACKING file.
- Things like I/O routines for various formats may go
  in existing files (for those I/O formats) or in the new file,
  depending on where it seems to fit best.
- Obviously changes to specific cvs subcommands will go in the
  file for that subcommand.
- New cvs subcommands (cvs mergeHistory or something) will
  get their own files (for high level code); lower level code
  may go into the generic new file with the search algorithm...
- I think the RCS code can read and write newphrases already,
  so I don't need to do much to it (except verify that it can
  handle newphrases.)

Search Algorithm:
  - First, create a combined directed graph from the version tree
    and the "from" and "to" versions of all historical merges (both
    from the RCS file and from the sandbox admin files).
  - Now search the graph for the "closest" common predecessor.
        This will likely be a relatively small part of the patch,
    that will be easily changed or replaced as desired, but basic
    difficulties are:
  - Define "closest":
       - Number of transitions?
       - Sum of some kind of transition weights?
       - Consider version tree common ancestor A and
         a nearer ancestor B that happens to be common via a
         long series of merges and changes through several
         different branches.  Even though B likely has
         more transitions, it is still probably a better
         choice than A for the common ancestor.
       - Only consider direct ancestors as candidates; choose
         the closest one that is any kind of ancestor of
         the other version?
  - External versions:
       - It probably can't use an external version as any contributor
         to the merge (although future changes (not part of this initial
         proposal) might implement the ability to ask an
         external plugin for a copy of the indicated external
         version...).
       - The simplest thing is to ignore transitions involving
         external versions completely.  They are only for use
         by external tools.
       - On the other hand, perhaps it would be useful to be able
         to transition through external versions during the search
         for the best internal common ancestor.
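
  A minimal Python sketch of the search described above.  It makes two
  assumptions that the text leaves open: "closest" is taken to mean the
  fewest total transitions from both versions (only one of the candidate
  definitions listed), and external versions are not treated specially.
  All function names are hypothetical.

```python
from collections import deque


def build_graph(tree_edges, merges):
    """Map each version to the set of its direct predecessors.

    tree_edges: (parent, child) pairs taken from the version tree.
    merges: (from_version, to_version) pairs taken from merge history.
    """
    preds = {}
    for parent, child in list(tree_edges) + list(merges):
        preds.setdefault(child, set()).add(parent)
        preds.setdefault(parent, set())
    return preds


def ancestor_distances(preds, start):
    """Breadth-first search backwards: {ancestor: transitions from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        version = queue.popleft()
        for pred in preds.get(version, ()):
            if pred not in dist:
                dist[pred] = dist[version] + 1
                queue.append(pred)
    return dist


def closest_common_ancestor(preds, a, b):
    """Common predecessor with the fewest total transitions, or None."""
    da = ancestor_distances(preds, a)
    db = ancestor_distances(preds, b)
    common = set(da) & set(db)
    if not common:
        return None
    # Assumption: ties are broken by version string, only for determinism.
    return min(common, key=lambda v: (da[v] + db[v], v))


if __name__ == "__main__":
    tree = [("1.1", "1.2"), ("1.2", "1.3"), ("1.3", "1.4"),
            ("1.1", "1.1.2.1"), ("1.1.2.1", "1.1.2.2"), ("1.1.2.2", "1.1.2.3")]
    merges = [("1.1.2.2", "1.3")]  # branch was merged into the trunk at 1.3
    print(closest_common_ancestor(build_graph(tree, []), "1.4", "1.1.2.3"))
    print(closest_common_ancestor(build_graph(tree, merges), "1.4", "1.1.2.3"))
```

  In this example the version tree alone yields 1.1 as the common ancestor,
  while adding the recorded merge yields 1.1.2.2, so a second merge from the
  branch would only need to consider changes made after 1.1.2.2.  A weighted
  variant would replace the breadth-first search with a shortest-path search.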


================================================
================================================
================================================
Splitting Up the Patch Into Subpatches:
  - Smaller patches are often desired.
  - Prerequisite bugfixes found during development will be
    split off.

//HERE: Reorganize:
Application Options:
1. Build it as one big monolithic patch.  It will still be developed
   in the lean order, but trying to organize the patch in that way
   is not practical, unless pieces of the patch are regularly applied
   to the official source.
2. Build as one big patch, but prior to application, attempt to
   separate it as much as possible into distinct subpatches for
   easier chewing.
      //HERE: What subpatches might be possible.
3. Submit incremental patches regularly, which someone will apply
   nearly as often as I generate them.
      //HERE: incremental patches that might be submitted.

How to split up a megapatch:
  - The main patch would need to include the low level search
    algorithm; read/write routines for various protocols/file
    formats; and enhancements to update.c and commit.c.
    [This would be the bulk of the changes regardless...]
  - "cvs mergeHistory" and other new subcommands could probably
    be separate, if desired.  [Though it doesn't seem likely you
    would want to apply the main patch but not this one...]
  - Other loosely related changes would also be separate.
    [depending on how loosely and/or how embedded they are
    in the same parts of the code as the main patch...]


A tentative development strategy is listed below.
If regular small merges into the source tree can be arranged,
incremental patches might be applied regularly during
this plan:
  - Make any fixes needed to ensure that CVS can read/write RCS files
    with arbitrary newphrases without messing them up.  Also do some
    tests of RCS itself...  If problems are found, come up with
    transition strategies for cases where the files may be accessed
    by old versions of CVS and/or RCS...
  - Add an initial UI facility to add/remove merge history entries
    directly to the RCS file (local only, for now).
      Put some effort into a good interface up front, but it is
    subject to change during development.
  - Add in sandbox admin file capabilities to temporarily remember merge
    history entries, and make "commit" copy those entries into
    the RCS file.  (Initial testing will depend on manually adding
    those entries to the sandbox admin files.) [Initially only support
    direct repository access.]
  - Extend commit to handle merge history over the client/server
    protocol.
  - When a merge is done, have "cvs update" record a merge history
    entry in the sandbox admin files.
  - Have "cvs update -j" (single -j) use a merge history aware
    algorithm for deciding the common ancestor to use.
  - Possibly experiment with different algorithms, maybe
    even allowing plugin algorithms.
  - Go through the new UIs, clean them up, and finalize them.
    If not already done, support:
       - Directly changing merge history immediately.
       - Scheduling merge history changes for the next commit.
       - Arguments to update to support optional merge history
         attributes.
       - Arguments to update to control algorithm to use for searching
         for common ancestor: (at least 2 algorithms: the new default
         merge-history-aware algorithm, and the old (current) algorithm)
    Finish up the test cases and documentation.  And do any final
    cleanup of the code changes.
  - Look into optional, loosely related enhancements (other merge
    history mechanisms, UI for encoding/decoding strings, etc).

This is tentative.  Test cases should be developed fairly soon
(facilitate development).  Documentation may take a bit longer
than test cases (initially somewhat fluid interface, and docs aren't
much use during initial development), but should not be
delayed too long.

My impression is that some open source owners prefer to merge
things using this strategy.  But it definitely takes regular and consistent
attention by someone with commit access to the tree, plus at least
some degree of faith in the long term strategy:
merging things that aren't immediately useful (having commit copy
history entries from the sandbox to the RCS file is not useful
when there is not yet an automated way to get those entries into
the sandbox in the first place...)

HERE: somewhere:
HERE: If tag not found internally, search externally referenced (by
      merge history) items.
HERE: Extended tags reference external versions?
================================================
================================================
================================================
GARBAGE OBSOLETE
     ### I don't like this because it means either redundant info
     ### (a "to" value vs the delta number) and the corresponding
     ### potential for inconsistencies, or it means that one attribute
     ### of a merge record comes from outside the record.
     #<delta> ::= <num>
     #            # [snip] ...
     #            next { <num> };
     #            merges <mergeInfo> { : <mergeInfo> }* ;
     #            { <newphrase> }*

    #########################
    OR:

        # (One newphrase per merge)  But the existing grammar doesn't
        # use multiple same-named phrases at all...

      <admin> ::=
                  # [snip] ...
                  { merge { <nameValue> }+ ; }*
      <nameValue> ::= <id> : <value>
      <value> ::= <id> | <num> | <string>

     The reluctance to incorporate patches may be an overreaction to
  past problems: Burned by the PreservePermissions stuff, and as far
  as I can tell, nothing major has been merged in since then (just small
  bugfixes).  As direct experience, I've helped with the
  "advisory lock patch" (HERE: Link).  As far as I know it
  meets all the requirements mentioned in HACKING, but no
  one is jumping to incorporate it.