WishList.txt - mogilvie's personal wishlist for CVS features/ideas.

See index.html.

This file refers to some of the items in the TODO file in CVS 1.11.

NOTES:
- TODO item 52 can be simulated by using pserver, local passwords, and
  mapping everybody to a single 'cvsuser' account.

================================

1. Get some/all of the cvsEnhancement scripts inserted into the contrib
   directory. (Not really critical.)

3. Integrate cvsSync into CVS itself. See the FUTURE comments in the
   scripts for thoughts about this. Is TODO item 100 related? ("Checked
   out files should have revision control support. Maybe.")

4. Integrate cvsPasswd (at least the password-changing capability;
   probably not the general administration ability). This would
   eliminate any possible need for cvsPasswd.cgi.

5. [See cvsChangeLocation. It could probably use a bit more testing,
   but it works.]
   Either as a contrib script or built in, implement a "cvs reposmove"
   command. This command would recursively update "CVS/Root" files to
   the value set with the "-d" option. (A rough sketch appears after
   item 9 below.) It might only check for the existence of the
   directory, not the files (and certainly not the file versions), in
   order to be useful when you restore an old backup on a different
   machine because the original server got hit by lightning.
   It (or something similar) would also support an argument to cause it
   to adjust the "CVS/Repositories" file as well, to help end users
   synchronize with manual administrative reorganizations of top-level
   modules within a repository. ("[explicitArg]/CVS/Repositories" would
   be completely replaced, but in subdirectories, only the initial part
   of CVS/Repositories would be replaced.) Obviously this technique of
   moving things around should only be used if you never need to go
   back to the old directory structure. (See TODO items 149 and 189.)
   Perhaps it would be best to start with the "cvs_chroot" script from
   the VChacks project at:
   http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/vchacks/?cvsroot=vchacks

6. I haven't tested it, but I'm wondering how intelligently CVS behaves
   if you restore the repository from an old backup, and then someone
   does a "cvs update" in an up-to-date working copy. It doesn't lose
   what could be a relatively quick, easy way to "finish" recovery,
   does it?

7. Can loginfo be adapted to accept spaces in file names unambiguously?
   To allow multiple, separate %{?} substitutions on a line? Other
   kinds of substitutions (suggestions?)? Is it at all practical to
   have a scheme where loginfo is called once for an entire
   multidirectory commit?

8. cvsCheck-like options built into "cvs log", and improve locking in
   general. (See also wishlist item 12.)
   BUG: If you lock (cvs admin -l) a file that was just recently
   imported, it locks version 1.1.1.1. Then, when you go to commit
   changes to that file, it temporarily locks version 1.1, and then
   fails because it doesn't know which version to unlock (even though
   they are the same). I suspect there may be problems with branching
   and locking as well.

9. How efficient is -z? Is it nearly as efficient as a .tar.gz file
   when initially checking out a large repository? If not, would it be
   worth it to make some generic scripts so that a new user could do
   "tar -xvzf ...tar.gz ; cd ... ; ./addCVS ; cvs update" to get the
   latest version with minimal network traffic? Optionally with a "cvs
   reposmove" command thrown in (see wishlist item 5). [Implement both
   some scripts to *generate* such a tar file, plus the "addCVS"
   script.]
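   A rough sketch of the "cvs reposmove" idea from item 5 (in Python,
   purely illustrative: the script name, argument handling, and the
   decision to only verify that a local root directory exists are all
   assumptions, not an existing tool):

       #!/usr/bin/env python
       """reposmove.py NEWROOT DIR - hypothetical "cvs reposmove" sketch.

       Recursively rewrites CVS/Root files under DIR to NEWROOT. It only
       checks that a local NEWROOT directory exists; it does not verify
       individual files or versions, so it still works when an old
       backup is restored on a different machine.
       """
       import os, sys

       def reposmove(newroot, top):
           # For a local root, insist that the directory at least exists.
           # Remote roots (":pserver:...", ":ext:...") are not checked.
           if os.path.isabs(newroot) and not os.path.isdir(newroot):
               sys.exit("new root %s does not exist" % newroot)
           for dirpath, dirnames, filenames in os.walk(top):
               if os.path.basename(dirpath) == "CVS" and "Root" in filenames:
                   with open(os.path.join(dirpath, "Root"), "w") as f:
                       f.write(newroot + "\n")

       if __name__ == "__main__":
           reposmove(sys.argv[1], sys.argv[2])

   A real version would also need the CVS/Repositories adjustment
   described in item 5.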
10. If "cvs edit" notices someone is already editing a file, then send
    a message to that effect to the stdout of "cvs edit". This can't be
    done through the notify script because it currently screws up the
    remote protocol. (My test was putting "cat" in notify.) [The
    advisory lock patch eliminates most of the incentive for this one.]

11. Ability to check out a deep directory via absolute path, but
    locally leave out most (all but 1 or 2 directories) of the depth.
    Modules might kind of do this, but don't work well when you have to
    protect your $CVSROOT/CVSROOT directory. [The -d option is probably
    adequate.] [See also wishlist item 33.]

12. Ability to do a simple recursive add (or an import without vendor
    branches and required tags...). (TODO item 123?) (TODO item 145 is
    loosely related.) See also wishlist items 8, 12, 13, 14, 15, 28,
    and 31.

13. Multisite: (TODO 186) (Also related to some of the import issues.)
    [See also wishlist items 12, 13, 14, 15, 28, and 31.]
    [ClearCase MultiSite is based on exactly duplicating all revision
    history info at all sites. In contrast, the scheme described below
    is designed to treat the sites more like branches that are
    periodically merged into each other; we only merge in the latest
    version, not duplicate full history.]
    Low-level tools:
    - Patch mode: something like a cross between "patch" and "cvs
      update". The main advantage over plain "patch" is the ability to
      do the equivalent of "cvs add" and "cvs remove" automatically as
      appropriate. [Also handle -k mode, file execute bits, and other
      metadata somehow.] cvs diff and rdiff might (or might not
      [metadata]) be adequate for creating patches.
    - Overwrite mode: overwrite, add, and remove files in a checked-out
      area to make it become an exact duplicate of an alien tree. [Also
      handle -k mode and other metadata somehow.]
    - Generalize the tools above to work using a plugin that could
      (conceivably) implement an interface to any version control
      system (ClearCase, etc.), not just CVS. The interface would be
      used in the following way (a sketch follows this item):
      1. The plugin provides an interface to retrieve info about the
         current state of the tree.
      2. The program using the plugin decides what files need to be
         touched.
      3. The plugin provides a method to be told what is going to be
         done (in bulk), so that it can do "pre-change" operations
         (edit/lock/checkout type operations). It also remembers
         anything it needs to do after the files are changed (cvs add,
         for example).
      4. The program does whatever it needs to to existing files.
      5. The plugin provides a method to allow the system to do any
         "post-change" operations it might need (like cvs add).
         [Update: Perhaps just have both pre- and post-change functions
         that must both be called by the higher-level tool at the
         correct time.]
      6. The user tests, debugs, and finally commits.
    This is designed to allow maximum efficiency in the face of
    high-latency network connections. [Be able to go both directions:
    patch from foreign, or patch to foreign.]
    See also MergeHistory.html.
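    A minimal sketch of the plugin interface described in steps 1-5
    above (the class and method names are illustrative assumptions, not
    an existing API):

        # Hypothetical version-control plugin interface for the
        # multisite patch/overwrite tools.
        class VCPlugin:
            def tree_state(self, top):
                """Step 1: return {relative_path: version_info}."""
                raise NotImplementedError

            def pre_change(self, adds, removes, modifies):
                """Step 3: bulk 'about to change' notification, so the
                plugin can edit/lock/checkout and remember follow-ups."""
                raise NotImplementedError

            def post_change(self):
                """Step 5: perform whatever pre_change remembered
                (e.g. 'cvs add' for newly created files)."""
                raise NotImplementedError

        def apply_foreign_tree(plugin, top, wanted):
            """Steps 2 and 4: compute what to touch, then touch it."""
            have = plugin.tree_state(top)
            adds = sorted(set(wanted) - set(have))
            removes = sorted(set(have) - set(wanted))
            modifies = sorted(p for p in wanted
                              if p in have and wanted[p] != have[p])
            plugin.pre_change(adds, removes, modifies)
            # ... overwrite/create/delete working files here (step 4) ...
            plugin.post_change()

    Doing the pre-change calls in bulk is what keeps this efficient
    over a high-latency link: one round trip instead of one per file.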
14. Patch manager: a generic/automated scheme for tracking where
    various "mods" have been merged (branches, repositories, etc.).
    This can be done manually with tags now, but something a bit more
    automated might be possible. Perhaps be able to create named
    "patches" as the difference between two tags (or versions). When
    applying a large "patch", be able to ignore any contained small
    patch that has already been applied. Be able to create and track
    variations of a patch (as related to conflicting patches only
    applied to some branches). Be able to "unapply" a patch both on a
    "decided not stable enough for this branch yet" basis (consider it
    no longer applied) and on a "not applicable for this branch; define
    a new variation that is empty" basis (consider it applied via a
    variation). What to do about patch dependencies? How do we make
    sure committed changes are only a patch (or a variation of a
    patch), and don't also incorporate unrelated changes? Should we
    generalize the definition of a patch so it could have gaps in the
    definition? ("-r1.1 to -r1.2 plus -r1.3 to -r1.4" [1.2 to 1.3 would
    presumably be an unrelated change...])
    Streamline common cases:
    1. Repeatedly updating a parent branch with all bug fixes from a
       child branch.
    2. Applying selective bug fixes of a parent branch to a child
       branch.
    3. Multisite synchronization.
    4. Others?
    [See wishlist item 24 for a cleaner solution.]

15. Multisite high-level tool: [See also wishlist items 12, 13, 14, 15,
    28, and 31.] One-button synchronization of multiple sites,
    internally using the low-level multisite and patch manager tools.
    Callable in a cron job. But what about conflicts?

16. TODO 85 (symbolic links). Perhaps use a special "State" to mean the
    contents specify a symlink, instead of a normal file or a deleted
    file. I like this solution best. Alternately, have a special
    version-controlled file under CVS/symlinks,v that lists the
    symlinks (and targets) in the directory. See also wishlist item 22.

17. Revision-controlled directories (smart pruning). Use a
    "CVS/Directory,v" file in the repository, which possibly uses
    special states. [Low priority. Workarounds: use a dummy 0-byte
    file, or fix your makefile to create missing output directories.]
    [See wishlist item 22 for an extended form of this idea.]

18. TODO 150 (cvs message for individual files).

19. Is there any way to configure an arbitrary transport filter (for
    encryption, tunnelling, or specialized compression) around any of
    the existing remote protocols? Specifically, is an encrypted link
    combined with pserver user authentication supported in any way? How
    hard is it to get outside a firewall (using SOCKS or something
    custom)?

20. Generic -k handling: -kX1-kY1,-kX2-kY2,...,-kXn-kYn{,-kYn+1}
    If the current file is -kX, find a (comma-separated) entry of the
    form "-kX-kY", and use -kY on the file. The last entry can be of
    the form "-kY", meaning use -kY no matter what the current mode is.
    Common ones might include "-kb-kb,-kk", "-kb-kb,-kv",
    "-kb-kb,-ko-ko,-kk", etc.
    Related: What happens if you commit a file you checked out with a
    changed -k setting? Is the changed -k setting applied to the RCS
    file? Most likely we would want to support both methods. How well
    does update handle changed -k settings?
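    A minimal sketch of interpreting the mapping spec above (the spec
    syntax is exactly as proposed in item 20; the function itself is
    illustrative only):

        def map_kmode(spec, current):
            """Apply a generic -k mapping like "-kb-kb,-ko-ko,-kk".

            'current' is the file's current mode (e.g. "-kb"). Entries
            of the form "-kX-kY" map mode X to mode Y; a trailing lone
            "-kY" entry is a catch-all. Returns the mode to use, or
            None if no entry matches.
            """
            for entry in spec.split(","):
                if entry.count("-k") == 2:
                    split = entry.index("-k", 2)   # second "-k" starts Y
                    src, dst = entry[:split], entry[split:]
                    if src == current:
                        return dst
                else:
                    return entry                   # lone "-kY" catch-all
            return None

        # map_kmode("-kb-kb,-kk", "-kb") -> "-kb"  (binary stays binary)
        # map_kmode("-kb-kb,-kk", "-kv") -> "-kk"  (everything else -> -kk)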
21. Renaming files: based on a concept mentioned in a mailing list
    message from:
    >From: woods@weird.com (Greg A. Woods)
    >To: "Hondros, Constantine"
    >Cc: "CVS-II Discussion Mailing List (E-mail)"
    >Subject: RE: Newby : moving/renaming files loses version information?
    >Reply-To: info-cvs@gnu.org (CVS-II Discussion Mailing List)
    >Organization: Planix, Inc.; Toronto, Ontario; Canada
    >Date: Wed, 16 May 2001 15:41:56 -0400 (EDT)
    The basic idea is to tie extra information into the dead and
    initial log entries (or possibly some kind of extended RCS format
    node entry) that describes how to transition through the rename.
    Enough information would be kept in the reference to support
    renaming back and forth repeatedly.
    Generally, after a rename the new file's version numbers would
    start off where the old ones left off, but for robustness, you
    would also be able to refer to old versions by a new numbering
    scheme (perhaps prepend "0." to old version numbers every time a
    reference transitions backwards through a rename? see the sketch
    after this item). CVS commands would be enhanced to automatically
    follow this backwards chain when necessary. log, diff, rdiff, rlog,
    and update -j would probably all benefit from this. update -r needs
    a bit more thought. Perhaps if it refers to a version with a
    different name, it should merge the sandbox changes of the current
    name into the requested version (with the other name).
    With a little more effort, you could use a doesn't-really-rename
    version of this scheme to limit the size of RCS files for very
    large binary files with lots of versions. Perhaps give such a
    split-up file an extension of ,v$NUMBER.
    The biggest sticking point may be the issue of re-added files with
    old names. I have vague notions of by default making all -r/-j
    version references refer to other versions in a series of renames,
    only falling back on totally separate removed/added "lines" under
    special circumstances (the version doesn't exist on the current
    "line", or the version is a "fully qualified" version). Thinking
    about this is an interesting mind exercise, but is the added
    complexity (in both the code/implementation *and* in actually
    using it) worth it?
    See also MergeHistory.html, which attacks the same theoretical
    problem from a different angle.
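    A minimal worked sketch of the "prepend 0." numbering idea (the
    data structure and function are illustrative assumptions, not a
    real RCS extension):

        # Hypothetical rename chain, newest name first: foo.c (1.1-1.5)
        # was renamed to bar.c (continuing at 1.6). Old revisions stay
        # reachable from bar.c as "0.1.1" .. "0.1.5"; one more "0." is
        # prepended per additional step back through renames.
        chain = [("bar.c", ("1.6", "1.7")), ("foo.c", ("1.1", "1.5"))]

        def resolve(chain, version):
            """Map a possibly "0."-prefixed version reference back to
            (historic name, plain version)."""
            steps = 0
            while version.startswith("0."):
                steps += 1
                version = version[len("0."):]
            name, _versions = chain[steps]
            return name, version

        # resolve(chain, "1.7")   -> ("bar.c", "1.7")
        # resolve(chain, "0.1.5") -> ("foo.c", "1.5")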
22. Directory versions (empty dirs, renaming files, merging changes
    into renamed files, hard links, and symlinks). [This is less of a
    hack than 21, but still a massive change, especially if you want to
    keep cross-version compatibility to any extent at all.]
    a. The key thing to make directory versioning work is to use an
       extra level of indirection. In general, the name of the ,v file
       is no longer used to calculate the name of the corresponding
       working file. Instead, you have a known-format text file called
       "CVS/dir,v", which describes the current contents of the
       directory by giving the "to-use" name of a file, a file type,
       and a relative path to the directory or ,v file to use for that
       entry.
    b. When adding a new file, use the same name if it doesn't already
       exist. Otherwise, create a new unique name.
    c. When removing, go ahead and add the "dead" revision in addition
       to removing the entry from the ',v' file. (Maximize
       compatibility.) Never rename the actual ',v' file.
    d. When locking, you need to: 1. lock the current directory;
       2. read its contents; 3. lock any other directories that hold
       files you need. If step 3 fails, you need to unlock and start
       from the beginning. (A sketch follows this item.)
    e. Interacting with dead revisions: a non-existent directory is
       indicated by the lack of an entry in the *parent* directory. If
       a given tag doesn't exist in the CVS/dir,v file, assume the
       directory exists for that tag, and use
       dead/non-dead/labeled/unlabeled revisions to figure out the
       contents (as currently done). It may be desirable to have new
       files (especially mangled names) be stored in the CVS directory,
       but that probably isn't critical. (An old tag probably won't
       exist on new files...)
    f. Hard links introduce repository complications: you usually don't
       want a re-"add" to use the same ',v' file because it could hose
       up hard links. You may want the ',v' file to keep track of every
       CVS/dir,v file that references it, to make it easier to track
       hard links in sandboxes.
    g. You probably need a "cvsck" program, similar in spirit to fsck,
       especially if you want to track hard links.
    h. Support two different ways of tracking hard links:
       - All instances of a hard link known by a single checkout or
         update operation are hardlinked together. CVS/* metadata links
         them all together with relative paths.
       - If the hardlink fails, or if update doesn't see/can't find the
         other links, then just leave them separate. They can always be
         joined later, or just accept that a change to one instance of
         the hard link won't show up in the other part until a
         commit/update cycle.
       - Update of an unmodified file will break a hard link if the
         operation doesn't cover all links. But if it is *modified* and
         hard linked to something it can't find, then update will not
         touch it at all (temporary conflict).
    i. Do we drop support for the modules file? (Simulate it with
       multiple directories.) Related: do we support manually hacked-up
       directory organization? (In CVS/Entries, use special
       nomenclature to distinguish *real* added/removed directories,
       vs. auto-recursively checked-out directories, vs. hacked-in
       additions. The cases are all handled slightly differently on
       update or commit.)
    j. Perhaps implement a special "defragment" utility that makes the
       latest versions of files exist in the default location, and
       modifies old references to point to the new location? Need to be
       careful about using such a thing, though. You may need a policy
       of no sandboxes checked out when you use it (but that would make
       it almost useless...).
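    A minimal sketch of the lock/read/re-lock loop from 22.d
    (lock_dir, unlock_dir, and read_dir_v are hypothetical helpers
    standing in for the repository locking and CVS/dir,v parsing code):

        def lock_directory_set(start_dir, lock_dir, unlock_dir, read_dir_v):
            """Lock start_dir plus every directory its dir,v entries
            point at. If any lock fails, release everything and retry,
            since the dir,v contents may have changed while unlocked."""
            while True:
                if not lock_dir(start_dir):       # step 1: lock current dir
                    continue
                locked = [start_dir]
                entries = read_dir_v(start_dir)   # step 2: read contents
                needed = {e.target_dir for e in entries
                          if e.target_dir != start_dir}
                ok = True
                for d in needed:                  # step 3: lock referenced dirs
                    if not lock_dir(d):
                        ok = False
                        break
                    locked.append(d)
                if ok:
                    return locked
                for d in locked:                  # step 3 failed: unlock all,
                    unlock_dir(d)                 # start from the beginning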
23. Support symlinking directories in the repository for purposes of
    controlling partition/disk usage. (Have a caveat in the docs that
    says to expect undefined behavior if a given directory can be
    reached more than one way via symlinks, or if a symlink introduces
    a logical directory loop...) [Trick: use wrappers for chdir() and
    getcwd() that follow the logical directory structure (via symlinks)
    rather than the physical directory structure. Better (but much more
    work) is to fix CVS so it doesn't chdir() all over the place like
    it does.]

24. In the sandbox, remember merges in the CVS/* files, and then on
    commit inject special RCS extensions to remember the merge. When a
    user asks to merge two versions, be intelligent about using the
    last merge points, rather than just the common ancestor. Have "cvs
    admin" provide some options to insert/remove merge notes. This idea
    is fleshed out quite a bit in MergeHistory.html. [See TODO 39.]

25. Support reading/writing revision files using zlib. (Use a ",v.gz"
    extension.) This also gets you gzip's CRC algorithm to detect
    corruption closer to when it occurs! If both compressed and
    uncompressed copies of a revision file exist, it is a fatal error.
    Config: CompressedRCS = { never | alwaysUncompress | preserve |
    alwaysCompress } [Perhaps configure attic files separately from
    active files?] [Perhaps, for backwards compatibility if anyone hid
    files by gzipping them, a config option to ignore compressed
    files?]

26. Add a new -k expansion mode that allows merging but disables
    altering line endings and expansion of RCS keywords. (But perhaps
    in most places where this might be useful, -ko would be adequate?)

27. For automated testing of cross-version protocols, build a capture
    option around socket communications. Store it in a text file (with
    escapes for binary data) so that parts of it can be manually edited
    to use REs for validating expected input. Such a tool has been
    developed (but not isolated/integrated). See the first/non-dated
    version of my branch of the Advisory Locks patch (editCheck.diff)
    on the following page:
    http://sourceforge.net/tracker/index.php?func=detail&aid=471046&group_id=4680&atid=304680

28. Another multisite idea: [See also wishlist items 12, 13, 14, 15,
    28, and 31.] Support extended revision numbers of the form
    REPOSITORYID/number, where number would be the RCS (or some other
    system-specific) revision number of the file as specified by
    REPOSITORYID. Elsewhere in the file (or perhaps better: in
    CVSROOT/repository_map) would be a mapping of REPOSITORYID to info
    like:
    - What kind of repository it is. (In general, it would be nice to
      be able to plug in, for example, a ClearCase adapter, OR have a
      "transient repository" in a sandbox somewhere.)
    - Where it can typically be found (machine, directory, protocol).
    - If such a thing exists, a GUID of some kind that could be used to
      track down the repository if it moved.
    You would be able to ask update to do merges among any versions on
    any repositories. The "transient" idea could support transient
    update and commit. This would provide an alternative/cleaner
    solution than cvsSync to the issue of trying things out on multiple
    architectures before committing.

29. RCS extension ideas [RCS is supposed to be extendable without
    breaking backwards compatibility]:
    a. TODO 24: a file-level extension that just lists all merges that
       have been done.
    b. A revision-level extension that can remember a separate -kMode
       for each revision. The file-level -kMode provides a default that
       will probably usually be set to mimic the "latest" setting, to
       maximize backwards compatibility.
    c. Alternative to TODO 26: a revision-level extension
       "mergealgorithm" flag that can override the default merge
       algorithm. Predefined or defined in CVSROOT/merge_algorithms.
       Predefined:
       - default (what happens now, including the unmergability of -kb
         files)
       - unmergable (never mergable)
       - diff3 (always use the internal diff3 algorithm, even on -kb
         files)
       The CVSROOT/merge_algorithms file would specify some or all of
       the following (see the fallback sketch after this item):
       - A shared library that can be dlopen()ed and called in the
         client to do the merge (if it can be found). (Most general
         (can interact with the user) and fastest (on a lot of merges,
         no need to do expensive process startup on each file).)
       - A script to invoke on the client with three file names to do
         the merge (if it can be found).
       - A shared library that can be dlopen()ed and called on the
         server to do the merge (if it can be found).
       - A script to invoke on the server with three file names to do
         the merge (if it can be found).
       - Fallback procedures to pick the best available method of
         merging.
       - What (if anything) to do on commit of a conflict file to
         decide if the user has done *something* about the conflict
         (and abort the commit if nothing has been done).
    d. Meta-log (logs tag operations, admin operations, etc., and the
       dates they are done on). In the short term, it is just a log
       file that could be *manually* used to reconstruct things after a
       screwed-up merge operation. In the longer term, perhaps an "undo
       tag" command could be developed that works by looking at the
       meta-log.
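    A minimal sketch of the fallback selection described in 29.c (the
    entry keys and helper names are assumptions about what a
    CVSROOT/merge_algorithms entry might contain, not a real format):

        import os, subprocess

        def pick_merge_method(entry, side):
            """Pick the best available merge method from a hypothetical
            merge_algorithms entry, preferring a loadable library over
            a script and trying 'side' ("client" or "server") first.

            'entry' maps keys like "client_lib", "client_script",
            "server_lib", "server_script" to file names; missing or
            absent files are skipped.
            """
            other = "server" if side == "client" else "client"
            for key in (side + "_lib", side + "_script",
                        other + "_lib", other + "_script"):
                path = entry.get(key)
                if path and os.path.exists(path):
                    return key, path
            return None, None   # fall back to built-in default algorithm

        def run_script_merge(script, mine, older, yours):
            """Invoke a merge script with three file names, diff3-style."""
            return subprocess.call([script, mine, older, yours])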
30. Some thoughts about using the rsync algorithm to transfer files
    (see TODO 195.b1):
    a. It might be useful to add a bidirectional tunnelling facility to
       the client/server protocol. Example:
         Client:        ("RSyncTunnel %d\n", tunnelNumber)
         Server/Client: ("Tunnel %d %d\n%*s", tunnelNumber, byteCount,
                         byteCount, bytes)
         Server/Client: ("CloseTunnelSend %d\n", tunnelNumber)
       (A framing sketch follows this item.)
       NOTES:
       - Generally you would want to multiplex things as much as
         possible. A single instance of the rsync algorithm should
         handle at least a whole directory, possibly the entire tree.
         It would proceed partially independently of the main CVS
         conversation.
       - The first command allocates a tunnel number, binds it to
         something, and provides whatever additional data is needed for
         that something.
       - The two ends of the algorithm can send arbitrary data to each
         other using the "Tunnel" command. When no more data needs to
         be sent, they can use the "CloseTunnelSend" command.
       - It is probably simplest if you allow the tunneled protocol to
         duplicate information where convenient (e.g., use a "Syncing
         filename\n" message to say that a file is being synced via
         rsync, even though that same information is probably embedded
         in the rsync tunnel somehow). gzip is generally very good with
         this kind of redundancy...
       - It would be simplest to have the client refrain from sending
         "update" until rsync has finished, but (assuming there is one
         "update" command per directory, which I am not sure about) it
         could get better network utilization by having the server
         buffer commands after the update until rsync has finished
         everything needed for that particular update command.
       Other benefits:
       - This could also be used by other things in the future,
         possibly including dynamically loaded plugins/hooks that might
         be a new/faster alternative to the *info scripts currently
         supported.
       - This rsync tunnelling could also be used by internal server
         updates in a master/slaves distributed CVS server. (See item
         31 below.)
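    A minimal sketch of framing and parsing the "Tunnel" messages above
    (illustrative only; this is not part of the real client/server
    protocol):

        def frame_tunnel(tunnel_number, data):
            """Encode one "Tunnel %d %d\n%*s" message as bytes."""
            header = "Tunnel %d %d\n" % (tunnel_number, len(data))
            return header.encode() + data

        def read_tunnel(stream):
            """Read one message from a binary stream. Returns
            (tunnel_number, payload), or None on "CloseTunnelSend"."""
            header = b""
            while not header.endswith(b"\n"):
                b = stream.read(1)
                if not b:
                    raise EOFError("connection closed mid-header")
                header += b
            words = header.decode().split()
            if words[0] == "CloseTunnelSend":
                return None
            assert words[0] == "Tunnel"
            tunnel_number, byte_count = int(words[1]), int(words[2])
            return tunnel_number, stream.read(byte_count)

        # frame_tunnel(1, b"rsync block data")
        #   -> b"Tunnel 1 16\nrsync block data"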
31. Yet another multisite system. [See also wishlist items 12, 13, 14,
    15, 28, and 31.] [NOTE: The following could actually be implemented
    using some kind of CVS proxy server to implement the master/slave
    technique. No modifications are needed to CVS itself. It could even
    redo the way some things are done (like using rsync instead of
    whole files: receive the file on the slave, rsync it to the master,
    send the file to the server)...]
    This time the idea is that there is a single writable repository
    and multiple read-only slaves. So far this can be emulated on its
    own. Key distinguishing features:
    a. Clients always connect to one of the slaves, but if the client
       wants to do something that requires write access, the slave
       *automatically* forwards that request to the master.
    b. When a write-to-master operation is finished, the slave
       automatically syncs up with the master.
    c. On read operations, it can autosync using any of at least 3
       strategies:
       - Don't auto-sync at all.
       - Normally just do a lightweight file size and/or timestamp
         check. Only do a heavy sync if the light check indicates the
         necessity. Perhaps the client should cache the timestamp it
         last saw on the server?
       - Always do a heavy sync. (Use rsync to make sure slave ,v files
         match server ,v files.) Note that this doesn't necessarily
         defeat the purpose of the master/slave arrangement - only
         light rsync validations occur needlessly; bulky file content
         changes only transfer once for a whole set of sandboxes, not
         once per sandbox.
    d. There will be a configured-in default strategy, as well as the
       ability for the user to explicitly override it. [Perhaps using
       the -s global option.]
    e. It would be good if we perhaps used CVSup or something instead
       of raw rsync, so that we could avoid overwriting more recent
       ',v' files if the master was restored from an old backup.
       Although it should obviously not lose the newer data, I'm not
       sure what, if any, automatic recovery and/or built-in recovery
       commands should be programmed in. (If the server timestamp is
       observed to go backwards in time and the file isn't identical,
       set a persistent "inconsistent" state until recovery is
       accomplished...)
    f. Make sure we can talk between client and slave using old
       clients, but optional extensions may be allowed (an up-front
       hint about whether a client may need to forward to the master?).
    g. Authentication is a little hairy: pserver is easy, but getting
       all the keys and such set up for every user to allow ssh between
       client/slave and between slave/master could be awkward. Perhaps
       the ability to use a different method between slave/master than
       between client/slave would ease things. (Include the
       encrypted-pserver method??)
    h. Possible internal changes:
       - Update the main loop to support multiple sources of input
         (using select(), etc.). This could also benefit plugins (*info
         and/or a possible dynamic loading alternative) and the rsync
         protocol (wishlist item 30 above).
       - The recursion/locking utilities may need significant changes
         to work well. (Actually, I hear they already need work to make
         a better fix than the current "lock the whole tree" workaround
         to some kind of "cvs tag" race condition.)
       - Question: how closely should the slave monitor what it is
         forwarding between the client and the master? Perhaps once the
         slave realizes it needs to forward to the master, it should
         clean up whatever it started to do, open a connection to the
         master, and then tunnel the client's communications to the
         master. Non-tunneled slave/master communication would be
         reserved for slave/master-only data (like rsyncing the ',v'
         files).
       - What to do if the slave and master disagree about the correct
         "Valid-Responses" line to send to the client? Is it safe to
         send more than one "Valid-Responses"?
    i. Also be able to schedule master/slave syncs in a cron job,
       without involving sandboxes.

32. Mimic ClearCase "config specs":
    - Design a language that has most of the features of ClearCase
      config specs. (Specify what sticky tags/dates of files, whole
      file trees, and/or filename globs to check out.)
    - Be able to checkout/update a view to match a given config spec.
    - Be able to derive a config spec from the current "sticky"
      information in a sandbox.
    - Ideally, it would include the ability to handle rearranged
      directories or even multiple repositories in a single sandbox.
    - Consider how it might be merged with the modules facility?
    - The first cut should definitely be a separate script. If it
      proves useful, then it can be reimplemented as part of CVS itself
      in the future.

33. Support inserting extra whitespace into an RCS file to enable fast
    tagging (when possible, just overwrite some of the whitespace
    instead of rewriting the whole RCS file; see the sketch below).
    Notes:
    - Use variable amounts of whitespace (by file size and/or random
      numbers) to prevent a "herd" problem where every file needs the
      slow method at the same time.
    - Do file block and memory page alignment to minimize critical
      sections against anything trying to read the file without doing
      CVS locks.
    - This may conflict with wishlist item 25 (compressed RCS files),
      but maybe zlib could somehow support making part of the file
      uncompressed?
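    A minimal sketch of the overwrite-the-padding idea from item 33
    (purely illustrative: it assumes a run of spaces was reserved just
    before the ";" that ends the RCS "symbols" phrase, and it omits the
    locking and careful parsing a real implementation would need):

        def fast_tag(path, tag, revision):
            """Write "tag:revision" over reserved padding in the
            symbols list. Returns False if there isn't enough padding
            left, in which case the caller falls back to the slow path
            (a full rewrite of the RCS file)."""
            entry = ("\n\t%s:%s" % (tag, revision)).encode()
            with open(path, "r+b") as f:
                data = f.read()
                end = data.index(b"symbols")   # start of symbols phrase
                end = data.index(b";", end)    # ";" terminating it
                pad_start = end
                while pad_start > 0 and data[pad_start - 1:pad_start] == b" ":
                    pad_start -= 1
                if end - pad_start < len(entry):
                    return False               # not enough padding left
                f.seek(pad_start)
                f.write(entry)                 # remaining spaces still
                return True                    # end at the original ";"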