WishList.txt - mogilvie's personal wishlist for CVS features/ideas.

See index.html.

This file refers to some of the items in the TODO file in CVS 1.11.

NOTES:
- TODO item 52 can be simulated by using pserver, local passwords, and
  mapping everybody to a single 'cvsuser' account.

================================

1. Get some/all of the cvsEnhancement scripts inserted into the contrib
   directory. (Not really critical.)

3. Integrate cvsSync into CVS itself. See the FUTURE comments in the
   scripts for thoughts about this. Is TODO item 100 related? ("Checked
   out files should have revision control support. Maybe.")

4. Integrate cvsPasswd (at least the password-changing capability;
   probably not the general administration ability). This would
   eliminate any possible need for cvsPasswd.cgi.

5. [See cvsChangeLocation. It could probably use a bit more testing,
   but it works.]
   Either as a contrib script or built in, implement a "cvs reposmove"
   command. This command would recursively update "CVS/Root" files to
   the value set with the "-d" option. (A rough sketch appears after
   item 9 below.) It might only check for the existence of the
   directory, not the files (and certainly not the file versions), in
   order to be useful when you restore an old backup on a different
   machine because the original server got hit by lightning.
   It (or something similar) would also support an argument to cause it
   to adjust the "CVS/Repositories" file as well, to help end users
   synchronize with manual administrative reorganizations of top-level
   modules within a repository. ("[explicitArg]/CVS/Repositories" would
   be completely replaced, but in subdirectories, only the initial part
   of CVS/Repositories would be replaced.) Obviously this technique of
   moving things around should only be used if you never need to go
   back to the old directory structure. (See TODO items 149 and 189.)
   Perhaps it would be best to start with the "cvs_chroot" script from
   the VChacks project at:
   http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/vchacks/?cvsroot=vchacks

6. I haven't tested it, but I'm wondering how intelligently CVS behaves
   if you restore the repository from an old backup, and then someone
   does a "cvs update" in an up-to-date working copy. It doesn't lose
   what could be a relatively quick, easy way to "finish" recovery,
   does it?

7. Can loginfo be adapted to accept spaces in file names unambiguously?
   To allow multiple, separate %{?} substitutions on a line? Other
   kinds of substitutions (suggestions?)? Is it at all practical to
   have a scheme where loginfo is called once for an entire
   multidirectory commit?

8. cvsCheck-like options built into "cvs log", and improve locking in
   general. (See also wishlist item 12.)
   BUG: If you lock (cvs admin -l) a file that was just recently
   imported, it locks version 1.1.1.1. Then, when you go to commit
   changes to that file, it temporarily locks version 1.1, and then
   fails because it doesn't know which version to unlock (even though
   they are the same). I suspect there may be problems with branching
   and locking as well.

9. How efficient is -z? Is it nearly as efficient as a .tar.gz file
   when initially checking out a large repository? If not, would it be
   worth it to make some generic scripts so that a new user could do
   "tar -xvzf ...tar.gz ; cd ... ; ./addCVS ; cvs update" to get the
   latest version with minimal network traffic? Optionally with a "cvs
   reposmove" command thrown in (see wishlist item 5). [Implement both
   some scripts to *generate* such a tar file, plus the "addCVS"
   script.]
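   A rough sketch of the "cvs reposmove" idea from item 5 (in Python,
   purely illustrative: the script name, argument handling, and the
   decision to only verify that a local root directory exists are all
   assumptions, not an existing tool):

       #!/usr/bin/env python
       """reposmove.py NEWROOT DIR - hypothetical "cvs reposmove" sketch.

       Recursively rewrites CVS/Root files under DIR to NEWROOT. It only
       checks that a local NEWROOT directory exists; it does not verify
       individual files or versions, so it still works when an old
       backup is restored on a different machine.
       """
       import os, sys

       def reposmove(newroot, top):
           # For a local root, insist that the directory at least exists.
           # Remote roots (":pserver:...", ":ext:...") are not checked.
           if os.path.isabs(newroot) and not os.path.isdir(newroot):
               sys.exit("new root %s does not exist" % newroot)
           for dirpath, dirnames, filenames in os.walk(top):
               if os.path.basename(dirpath) == "CVS" and "Root" in filenames:
                   with open(os.path.join(dirpath, "Root"), "w") as f:
                       f.write(newroot + "\n")

       if __name__ == "__main__":
           reposmove(sys.argv[1], sys.argv[2])

   A real version would also need the CVS/Repositories adjustment
   described in item 5.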
10. If "cvs edit" notices someone is already editing a file, then send
    a message to that effect to the stdout of "cvs edit". This can't be
    done through the notify script because it currently screws up the
    remote protocol. (My test was putting "cat" in notify.) [The
    advisory lock patch eliminates most of the incentive for this one.]

11. Ability to check out a deep directory via absolute path, but
    locally leave out most (all but 1 or 2 directories) of the depth.
    Modules might kind of do this, but don't work well when you have to
    protect your $CVSROOT/CVSROOT directory. [The -d option is probably
    adequate.] [See also wishlist item 33.]

12. Ability to do a simple recursive add (or an import without vendor
    branches and required tags...). (TODO item 123?) (TODO item 145 is
    loosely related.) See also wishlist items 8, 12, 13, 14, 15, 28,
    and 31.

13. Multisite: (TODO 186) (Also related to some of the import issues.)
    [See also wishlist items 12, 13, 14, 15, 28, and 31.]
    [ClearCase MultiSite is based on exactly duplicating all revision
    history info at all sites. In contrast, the scheme described below
    is designed to treat the sites more like branches that are
    periodically merged into each other; we only merge in the latest
    version, not duplicate full history.]
    Low-level tools:
    - Patch mode: something like a cross between "patch" and "cvs
      update". The main advantage over plain "patch" is the ability to
      do the equivalent of "cvs add" and "cvs remove" automatically as
      appropriate. [Also handle -k mode, file execute bits, and other
      metadata somehow.] cvs diff and rdiff might (or might not
      [metadata]) be adequate for creating patches.
    - Overwrite mode: overwrite, add, and remove files in a checked-out
      area to make it become an exact duplicate of an alien tree. [Also
      handle -k mode and other metadata somehow.]
    - Generalize the tools above to work using a plugin that could
      (conceivably) implement an interface to any version control
      system (ClearCase, etc.), not just CVS. The interface would be
      used in the following way (a sketch follows this item):
      1. The plugin provides an interface to retrieve info about the
         current state of the tree.
      2. The program using the plugin decides what files need to be
         touched.
      3. The plugin provides a method to be told what is going to be
         done (in bulk), so that it can do "pre-change" operations
         (edit/lock/checkout type operations). It also remembers
         anything it needs to do after the files are changed (cvs add,
         for example).
      4. The program does whatever it needs to to existing files.
      5. The plugin provides a method to allow the system to do any
         "post-change" operations it might need (like cvs add).
         [Update: Perhaps just have both pre- and post-change functions
         that must both be called by the higher-level tool at the
         correct time.]
      6. The user tests, debugs, and finally commits.
    This is designed to allow maximum efficiency in the face of
    high-latency network connections. [Be able to go both directions:
    patch from foreign, or patch to foreign.]
    See also MergeHistory.html.
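    A minimal sketch of the plugin interface described in steps 1-5
    above (the class and method names are illustrative assumptions, not
    an existing API):

        # Hypothetical version-control plugin interface for the
        # multisite patch/overwrite tools.
        class VCPlugin:
            def tree_state(self, top):
                """Step 1: return {relative_path: version_info}."""
                raise NotImplementedError

            def pre_change(self, adds, removes, modifies):
                """Step 3: bulk 'about to change' notification, so the
                plugin can edit/lock/checkout and remember follow-ups."""
                raise NotImplementedError

            def post_change(self):
                """Step 5: perform whatever pre_change remembered
                (e.g. 'cvs add' for newly created files)."""
                raise NotImplementedError

        def apply_foreign_tree(plugin, top, wanted):
            """Steps 2 and 4: compute what to touch, then touch it."""
            have = plugin.tree_state(top)
            adds = sorted(set(wanted) - set(have))
            removes = sorted(set(have) - set(wanted))
            modifies = sorted(p for p in wanted
                              if p in have and wanted[p] != have[p])
            plugin.pre_change(adds, removes, modifies)
            # ... overwrite/create/delete working files here (step 4) ...
            plugin.post_change()

    Doing the pre-change calls in bulk is what keeps this efficient
    over a high-latency link: one round trip instead of one per file.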
14. Patch manager: a generic/automated scheme for tracking where
    various "mods" have been merged (branches, repositories, etc.).
    This can be done manually with tags now, but something a bit more
    automated might be possible. Perhaps be able to create named
    "patches" as the difference between two tags (or versions). When
    applying a large "patch", be able to ignore any contained small
    patch that has already been applied. Be able to create and track
    variations of a patch (as related to conflicting patches only
    applied to some branches). Be able to "unapply" a patch both on a
    "decided not stable enough for this branch yet" basis (consider it
    no longer applied) and on a "not applicable for this branch; define
    a new variation that is empty" basis (consider it applied via a
    variation). What to do about patch dependencies? How do we make
    sure committed changes are only a patch (or a variation of a
    patch), and don't also incorporate unrelated changes? Should we
    generalize the definition of a patch so it could have gaps in the
    definition? ("-r1.1 to -r1.2 plus -r1.3 to -r1.4" [1.2 to 1.3 would
    presumably be an unrelated change...])
    Streamline common cases:
    1. Repeatedly updating a parent branch with all bug fixes from a
       child branch.
    2. Applying selective bug fixes of a parent branch to a child
       branch.
    3. Multisite synchronization.
    4. Others?
    [See wishlist item 24 for a cleaner solution.]

15. Multisite high-level tool: [See also wishlist items 12, 13, 14, 15,
    28, and 31.] One-button synchronization of multiple sites,
    internally using the low-level multisite and patch manager tools.
    Callable in a cron job. But what about conflicts?

16. TODO 85 (symbolic links). Perhaps use a special "State" to mean the
    contents specify a symlink, instead of a normal file or a deleted
    file. I like this solution best. Alternately, have a special
    version-controlled file under CVS/symlinks,v that lists the
    symlinks (and targets) in the directory. See also wishlist item 22.

17. Revision-controlled directories (smart pruning). Use a
    "CVS/Directory,v" file in the repository, which possibly uses
    special states. [Low priority. Workarounds: use a dummy 0-byte
    file, or fix your makefile to create missing output directories.]
    [See wishlist item 22 for an extended form of this idea.]

18. TODO 150 (cvs message for individual files).

19. Is there any way to configure an arbitrary transport filter (for
    encryption, tunnelling, or specialized compression) around any of
    the existing remote protocols? Specifically, is an encrypted link
    combined with pserver user authentication supported in any way? How
    hard is it to get outside a firewall (using SOCKS or something
    custom)?

20. Generic -k handling: -kX1-kY1,-kX2-kY2,...,-kXn-kYn{,-kYn+1}
    If the current file is -kX, find a (comma-separated) entry of the
    form "-kX-kY", and use -kY on the file. The last entry can be of
    the form "-kY", meaning use -kY no matter what the current mode is.
    Common ones might include "-kb-kb,-kk", "-kb-kb,-kv",
    "-kb-kb,-ko-ko,-kk", etc.
    Related: What happens if you commit a file you checked out with a
    changed -k setting? Is the changed -k setting applied to the RCS
    file? Most likely we would want to support both methods. How well
    does update handle changed -k settings?
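    A minimal sketch of interpreting the mapping spec above (the spec
    syntax is exactly as proposed in item 20; the function itself is
    illustrative only):

        def map_kmode(spec, current):
            """Apply a generic -k mapping like "-kb-kb,-ko-ko,-kk".

            'current' is the file's current mode (e.g. "-kb"). Entries
            of the form "-kX-kY" map mode X to mode Y; a trailing lone
            "-kY" entry is a catch-all. Returns the mode to use, or
            None if no entry matches.
            """
            for entry in spec.split(","):
                if entry.count("-k") == 2:
                    split = entry.index("-k", 2)   # second "-k" starts Y
                    src, dst = entry[:split], entry[split:]
                    if src == current:
                        return dst
                else:
                    return entry                   # lone "-kY" catch-all
            return None

        # map_kmode("-kb-kb,-kk", "-kb") -> "-kb"  (binary stays binary)
        # map_kmode("-kb-kb,-kk", "-kv") -> "-kk"  (everything else -> -kk)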
21. Renaming files: based on a concept mentioned in a mailing list
    message from:
    >From: woods@weird.com (Greg A. Woods)
    >To: "Hondros, Constantine"
    >Cc: "CVS-II Discussion Mailing List (E-mail)"
    >Subject: RE: Newby : moving/renaming files loses version information?
    >Reply-To: info-cvs@gnu.org (CVS-II Discussion Mailing List)
    >Organization: Planix, Inc.; Toronto, Ontario; Canada
    >Date: Wed, 16 May 2001 15:41:56 -0400 (EDT)
    The basic idea is to tie extra information into the dead and
    initial log entries (or possibly some kind of extended RCS format
    node entry) that describes how to transition through the rename.
    Enough information would be kept in the reference to support
    renaming back and forth repeatedly.
    Generally, after a rename the new file's version numbers would
    start off where the old ones left off, but for robustness, you
    would also be able to refer to old versions by a new numbering
    scheme (perhaps prepend "0." to old version numbers every time a
    reference transitions backwards through a rename? see the sketch
    after this item). CVS commands would be enhanced to automatically
    follow this backwards chain when necessary. log, diff, rdiff, rlog,
    and update -j would probably all benefit from this. update -r needs
    a bit more thought. Perhaps if it refers to a version with a
    different name, it should merge the sandbox changes of the current
    name into the requested version (with the other name).
    With a little more effort, you could use a doesn't-really-rename
    version of this scheme to limit the size of RCS files for very
    large binary files with lots of versions. Perhaps give such a
    split-up file an extension of ,v$NUMBER.
    The biggest sticking point may be the issue of re-added files with
    old names. I have vague notions of by default making all -r/-j
    version references refer to other versions in a series of renames,
    only falling back on totally separate removed/added "lines" under
    special circumstances (the version doesn't exist on the current
    "line", or the version is a "fully qualified" version). Thinking
    about this is an interesting mind exercise, but is the added
    complexity (in both the code/implementation *and* in actually
    using it) worth it?
    See also MergeHistory.html, which attacks the same theoretical
    problem from a different angle.
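    A minimal worked sketch of the "prepend 0." numbering idea (the
    data structure and function are illustrative assumptions, not a
    real RCS extension):

        # Hypothetical rename chain, newest name first: foo.c (1.1-1.5)
        # was renamed to bar.c (continuing at 1.6). Old revisions stay
        # reachable from bar.c as "0.1.1" .. "0.1.5"; one more "0." is
        # prepended per additional step back through renames.
        chain = [("bar.c", ("1.6", "1.7")), ("foo.c", ("1.1", "1.5"))]

        def resolve(chain, version):
            """Map a possibly "0."-prefixed version reference back to
            (historic name, plain version)."""
            steps = 0
            while version.startswith("0."):
                steps += 1
                version = version[len("0."):]
            name, _versions = chain[steps]
            return name, version

        # resolve(chain, "1.7")   -> ("bar.c", "1.7")
        # resolve(chain, "0.1.5") -> ("foo.c", "1.5")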
22. Directory versions (empty dirs, renaming files, merging changes
    into renamed files, hard links, and symlinks). [This is less of a
    hack than 21, but still a massive change, especially if you want to
    keep cross-version compatibility to any extent at all.]
    a. The key thing to make directory versioning work is to use an
       extra level of indirection. In general, the name of the ,v file
       is no longer used to calculate the name of the corresponding
       working file. Instead, you have a known-format text file called
       "CVS/dir,v", which describes the current contents of the
       directory by giving the "to-use" name of a file, a file type,
       and a relative path to the directory or ,v file to use for that
       entry.
    b. When adding a new file, use the same name if it doesn't already
       exist. Otherwise, create a new unique name.
    c. When removing, go ahead and add the "dead" revision in addition
       to removing the entry from the ',v' file. (Maximize
       compatibility.) Never rename the actual ',v' file.
    d. When locking, you need to: 1. lock the current directory;
       2. read its contents; 3. lock any other directories that hold
       files you need. If step 3 fails, you need to unlock and start
       from the beginning. (A sketch follows this item.)
    e. Interacting with dead revisions: a non-existent directory is
       indicated by the lack of an entry in the *parent* directory. If
       a given tag doesn't exist in the CVS/dir,v file, assume the
       directory exists for that tag, and use
       dead/non-dead/labeled/unlabeled revisions to figure out the
       contents (as currently done). It may be desirable to have new
       files (especially mangled names) be stored in the CVS directory,
       but that probably isn't critical. (An old tag probably won't
       exist on new files...)
    f. Hard links introduce repository complications: you usually don't
       want a re-"add" to use the same ',v' file because it could hose
       up hard links. You may want the ',v' file to keep track of every
       CVS/dir,v file that references it, to make it easier to track
       hard links in sandboxes.
    g. You probably need a "cvsck" program, similar in spirit to fsck,
       especially if you want to track hard links.
    h. Support two different ways of tracking hard links:
       - All instances of a hard link known by a single checkout or
         update operation are hardlinked together. CVS/* metadata links
         them all together with relative paths.
       - If the hardlink fails, or if update doesn't see/can't find the
         other links, then just leave them separate. They can always be
         joined later, or just accept that a change to one instance of
         the hard link won't show up in the other part until a
         commit/update cycle.
       - Update of an unmodified file will break a hard link if the
         operation doesn't cover all links. But if it is *modified* and
         hard linked to something it can't find, then update will not
         touch it at all (temporary conflict).
    i. Do we drop support for the modules file? (Simulate it with
       multiple directories.) Related: do we support manually hacked-up
       directory organization? (In CVS/Entries, use special
       nomenclature to distinguish *real* added/removed directories,
       vs. auto-recursively checked-out directories, vs. hacked-in
       additions. The cases are all handled slightly differently on
       update or commit.)
    j. Perhaps implement a special "defragment" utility that makes the
       latest versions of files exist in the default location, and
       modifies old references to point to the new location? Need to be
       careful about using such a thing, though. You may need a policy
       of no sandboxes checked out when you use it (but that would make
       it almost useless...).
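    A minimal sketch of the lock/read/re-lock loop from 22.d
    (lock_dir, unlock_dir, and read_dir_v are hypothetical helpers
    standing in for the repository locking and CVS/dir,v parsing code):

        def lock_directory_set(start_dir, lock_dir, unlock_dir, read_dir_v):
            """Lock start_dir plus every directory its dir,v entries
            point at. If any lock fails, release everything and retry,
            since the dir,v contents may have changed while unlocked."""
            while True:
                if not lock_dir(start_dir):       # step 1: lock current dir
                    continue
                locked = [start_dir]
                entries = read_dir_v(start_dir)   # step 2: read contents
                needed = {e.target_dir for e in entries
                          if e.target_dir != start_dir}
                ok = True
                for d in needed:                  # step 3: lock referenced dirs
                    if not lock_dir(d):
                        ok = False
                        break
                    locked.append(d)
                if ok:
                    return locked
                for d in locked:                  # step 3 failed: unlock all,
                    unlock_dir(d)                 # start from the beginning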
23. Support symlinking directories in the repository for purposes of
    controlling partition/disk usage. (Have a caveat in the docs that
    says to expect undefined behavior if a given directory can be
    reached more than one way via symlinks, or if a symlink introduces
    a logical directory loop...) [Trick: use wrappers for chdir() and
    getcwd() that follow the logical directory structure (via symlinks)
    rather than the physical directory structure. Better (but much more
    work) is to fix CVS so it doesn't chdir() all over the place like
    it does.]

24. In the sandbox, remember merges in the CVS/* files, and then on
    commit inject special RCS extensions to remember the merge. When a
    user asks to merge two versions, be intelligent about using the
    last merge points, rather than just the common ancestor. Have "cvs
    admin" provide some options to insert/remove merge notes. This idea
    is fleshed out quite a bit in MergeHistory.html. [See TODO 39.]

25. Support reading/writing revision files using zlib. (Use a ",v.gz"
    extension.) This also gets you gzip's CRC algorithm to detect
    corruption closer to when it occurs! If both compressed and
    uncompressed copies of a revision file exist, it is a fatal error.
    Config: CompressedRCS = { never | alwaysUncompress | preserve |
    alwaysCompress } [Perhaps configure attic files separately from
    active files?] [Perhaps, for backwards compatibility if anyone hid
    files by gzipping them, a config option to ignore compressed
    files?]

26. Add a new -k expansion mode that allows merging but disables
    altering line endings and expansion of RCS keywords. (But perhaps
    in most places where this might be useful, -ko would be adequate?)

27. For automated testing of cross-version protocols, build a capture
    option around socket communications. Store it in a text file (with
    escapes for binary data) so that parts of it can be manually edited
    to use REs for validating expected input. Such a tool has been
    developed (but not isolated/integrated). See the first/non-dated
    version of my branch of the Advisory Locks patch (editCheck.diff)
    on the following page:
    http://sourceforge.net/tracker/index.php?func=detail&aid=471046&group_id=4680&atid=304680

28. Another multisite idea: [See also wishlist items 12, 13, 14, 15,
    28, and 31.] Support extended revision numbers of the form
    REPOSITORYID/number, where number would be the RCS (or some other
    system-specific) revision number of the file as specified by
    REPOSITORYID. Elsewhere in the file (or perhaps better: in
    CVSROOT/repository_map) would be a mapping of REPOSITORYID to info
    like:
    - What kind of repository it is. (In general, it would be nice to
      be able to plug in, for example, a ClearCase adapter, OR have a
      "transient repository" in a sandbox somewhere.)
    - Where it can typically be found (machine, directory, protocol).
    - If such a thing exists, a GUID of some kind that could be used to
      track down the repository if it moved.
    You would be able to ask update to do merges among any versions on
    any repositories. The "transient" idea could support transient
    update and commit. This would provide an alternative/cleaner
    solution than cvsSync to the issue of trying things out on multiple
    architectures before committing.

29. RCS extension ideas [RCS is supposed to be extendable without
    breaking backwards compatibility]:
    a. TODO 24: a file-level extension that just lists all merges that
       have been done.
    b. A revision-level extension that can remember a separate -kMode
       for each revision. The file-level -kMode provides a default that
       will probably usually be set to mimic the "latest" setting, to
       maximize backwards compatibility.
    c. Alternative to TODO 26: a revision-level extension
       "mergealgorithm" flag that can override the default merge
       algorithm. Predefined or defined in CVSROOT/merge_algorithms.
       Predefined:
       - default (what happens now, including the unmergability of -kb
         files)
       - unmergable (never mergable)
       - diff3 (always use the internal diff3 algorithm, even on -kb
         files)
       The CVSROOT/merge_algorithms file would specify some or all of
       the following (see the fallback sketch after this item):
       - A shared library that can be dlopen()ed and called in the
         client to do the merge (if it can be found). (Most general
         (can interact with the user) and fastest (on a lot of merges,
         no need to do expensive process startup on each file).)
       - A script to invoke on the client with three file names to do
         the merge (if it can be found).
       - A shared library that can be dlopen()ed and called on the
         server to do the merge (if it can be found).
       - A script to invoke on the server with three file names to do
         the merge (if it can be found).
       - Fallback procedures to pick the best available method of
         merging.
       - What (if anything) to do on commit of a conflict file to
         decide if the user has done *something* about the conflict
         (and abort the commit if nothing has been done).
    d. Meta-log (logs tag operations, admin operations, etc., and the
       dates they are done on). In the short term, it is just a log
       file that could be *manually* used to reconstruct things after a
       screwed-up merge operation. In the longer term, perhaps an "undo
       tag" command could be developed that works by looking at the
       meta-log.
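    A minimal sketch of the fallback selection described in 29.c (the
    entry keys and helper names are assumptions about what a
    CVSROOT/merge_algorithms entry might contain, not a real format):

        import os, subprocess

        def pick_merge_method(entry, side):
            """Pick the best available merge method from a hypothetical
            merge_algorithms entry, preferring a loadable library over
            a script and trying 'side' ("client" or "server") first.

            'entry' maps keys like "client_lib", "client_script",
            "server_lib", "server_script" to file names; missing or
            absent files are skipped.
            """
            other = "server" if side == "client" else "client"
            for key in (side + "_lib", side + "_script",
                        other + "_lib", other + "_script"):
                path = entry.get(key)
                if path and os.path.exists(path):
                    return key, path
            return None, None   # fall back to built-in default algorithm

        def run_script_merge(script, mine, older, yours):
            """Invoke a merge script with three file names, diff3-style."""
            return subprocess.call([script, mine, older, yours])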
30. Some thoughts about using the rsync algorithm to transfer files
    (see TODO 195.b1):
    a. It might be useful to add a bidirectional tunnelling facility to
       the client/server protocol. Example:
         Client:        ("RSyncTunnel %d\n", tunnelNumber)
         Server/Client: ("Tunnel %d %d\n%*s", tunnelNumber, byteCount,
                         byteCount, bytes)
         Server/Client: ("CloseTunnelSend %d\n", tunnelNumber)
       (A framing sketch follows this item.)
       NOTES:
       - Generally you would want to multiplex things as much as
         possible. A single instance of the rsync algorithm should
         handle at least a whole directory, possibly the entire tree.
         It would proceed partially independently of the main CVS
         conversation.
       - The first command allocates a tunnel number, binds it to
         something, and provides whatever additional data is needed for
         that something.
       - The two ends of the algorithm can send arbitrary data to each
         other using the "Tunnel" command. When no more data needs to
         be sent, they can use the "CloseTunnelSend" command.
       - It is probably simplest if you allow the tunneled protocol to
         duplicate information where convenient (e.g., use a "Syncing
         filename\n" message to say that a file is being synced via
         rsync, even though that same information is probably embedded
         in the rsync tunnel somehow). gzip is generally very good with
         this kind of redundancy...
       - It would be simplest to have the client refrain from sending
         "update" until rsync has finished, but (assuming there is one
         "update" command per directory, which I am not sure about) it
         could get better network utilization by having the server
         buffer commands after the update until rsync has finished
         everything needed for that particular update command.
       Other benefits:
       - This could also be used by other things in the future,
         possibly including dynamically loaded plugins/hooks that might
         be a new/faster alternative to the *info scripts currently
         supported.
       - This rsync tunnelling could also be used by internal server
         updates in a master/slaves distributed CVS server. (See item
         31 below.)
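    A minimal sketch of framing and parsing the "Tunnel" messages above
    (illustrative only; this is not part of the real client/server
    protocol):

        def frame_tunnel(tunnel_number, data):
            """Encode one "Tunnel %d %d\n%*s" message as bytes."""
            header = "Tunnel %d %d\n" % (tunnel_number, len(data))
            return header.encode() + data

        def read_tunnel(stream):
            """Read one message from a binary stream. Returns
            (tunnel_number, payload), or None on "CloseTunnelSend"."""
            header = b""
            while not header.endswith(b"\n"):
                b = stream.read(1)
                if not b:
                    raise EOFError("connection closed mid-header")
                header += b
            words = header.decode().split()
            if words[0] == "CloseTunnelSend":
                return None
            assert words[0] == "Tunnel"
            tunnel_number, byte_count = int(words[1]), int(words[2])
            return tunnel_number, stream.read(byte_count)

        # frame_tunnel(1, b"rsync block data")
        #   -> b"Tunnel 1 16\nrsync block data"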
31. Yet another multisite system. [See also wishlist items 12, 13, 14,
    15, 28, and 31.] [NOTE: The following could actually be implemented
    using some kind of CVS proxy server to implement the master/slave
    technique. No modifications are needed to CVS itself. It could even
    redo the way some things are done (like using rsync instead of
    whole files: receive the file on the slave, rsync it to the master,
    send the file to the server)...]
    This time the idea is that there is a single writable repository
    and multiple read-only slaves. So far this can be emulated on its
    own. Key distinguishing features:
    a. Clients always connect to one of the slaves, but if the client
       wants to do something that requires write access, the slave
       *automatically* forwards that request to the master.
    b. When a write-to-master operation is finished, the slave
       automatically syncs up with the master.
    c. On read operations, it can autosync using any of at least 3
       strategies:
       - Don't auto-sync at all.
       - Normally just do a lightweight file size and/or timestamp
         check. Only do a heavy sync if the light check indicates the
         necessity. Perhaps the client should cache the timestamp it
         last saw on the server?
       - Always do a heavy sync. (Use rsync to make sure slave ,v files
         match server ,v files.) Note that this doesn't necessarily
         defeat the purpose of the master/slave arrangement - only
         light rsync validations occur needlessly; bulky file content
         changes only transfer once for a whole set of sandboxes, not
         once per sandbox.
    d. There will be a configured-in default strategy, as well as the
       ability for the user to explicitly override it. [Perhaps using
       the -s global option.]
    e. It would be good if we perhaps used CVSup or something instead
       of raw rsync, so that we could avoid overwriting more recent
       ',v' files if the master was restored from an old backup.
       Although it should obviously not lose the newer data, I'm not
       sure what, if any, automatic recovery and/or built-in recovery
       commands should be programmed in. (If the server timestamp is
       observed to go backwards in time and the file isn't identical,
       set a persistent "inconsistent" state until recovery is
       accomplished...)
    f. Make sure we can talk between client and slave using old
       clients, but optional extensions may be allowed (an up-front
       hint about whether a client may need to forward to the master?).
    g. Authentication is a little hairy: pserver is easy, but getting
       all the keys and such set up for every user to allow ssh between
       client/slave and between slave/master could be awkward. Perhaps
       the ability to use a different method between slave/master than
       between client/slave would ease things. (Include the
       encrypted-pserver method??)
    h. Possible internal changes:
       - Update the main loop to support multiple sources of input
         (using select(), etc.). This could also benefit plugins (*info
         and/or a possible dynamic loading alternative) and the rsync
         protocol (wishlist item 30 above).
       - The recursion/locking utilities may need significant changes
         to work well. (Actually, I hear they already need work to make
         a better fix than the current "lock the whole tree" workaround
         to some kind of "cvs tag" race condition.)
       - Question: how closely should the slave monitor what it is
         forwarding between the client and the master? Perhaps once the
         slave realizes it needs to forward to the master, it should
         clean up whatever it started to do, open a connection to the
         master, and then tunnel the client's communications to the
         master. Non-tunneled slave/master communication would be
         reserved for slave/master-only data (like rsyncing the ',v'
         files).
       - What to do if the slave and master disagree about the correct
         "Valid-Responses" line to send to the client? Is it safe to
         send more than one "Valid-Responses"?
    i. Also be able to schedule master/slave syncs in a cron job,
       without involving sandboxes.

32. Mimic ClearCase "config specs":
    - Design a language that has most of the features of ClearCase
      config specs. (Specify what sticky tags/dates of files, whole
      file trees, and/or filename globs to check out.)
    - Be able to checkout/update a view to match a given config spec.
    - Be able to derive a config spec from the current "sticky"
      information in a sandbox.
    - Ideally, it would include the ability to handle rearranged
      directories or even multiple repositories in a single sandbox.
    - Consider how it might be merged with the modules facility?
    - The first cut should definitely be a separate script. If it
      proves useful, then it can be reimplemented as part of CVS itself
      in the future.

33. Support inserting extra whitespace into an RCS file to enable fast
    tagging (when possible, just overwrite some of the whitespace
    instead of rewriting the whole RCS file; see the sketch below).
    Notes:
    - Use variable amounts of whitespace (by file size and/or random
      numbers) to prevent a "herd" problem where every file needs the
      slow method at the same time.
    - Do file block and memory page alignment to minimize critical
      sections against anything trying to read the file without doing
      CVS locks.
    - This may conflict with wishlist item 25 (compressed RCS files),
      but maybe zlib could somehow support making part of the file
      uncompressed?
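    A minimal sketch of the overwrite-the-padding idea from item 33
    (purely illustrative: it assumes a run of spaces was reserved just
    before the ";" that ends the RCS "symbols" phrase, and it omits the
    locking and careful parsing a real implementation would need):

        def fast_tag(path, tag, revision):
            """Write "tag:revision" over reserved padding in the
            symbols list. Returns False if there isn't enough padding
            left, in which case the caller falls back to the slow path
            (a full rewrite of the RCS file)."""
            entry = ("\n\t%s:%s" % (tag, revision)).encode()
            with open(path, "r+b") as f:
                data = f.read()
                end = data.index(b"symbols")   # start of symbols phrase
                end = data.index(b";", end)    # ";" terminating it
                pad_start = end
                while pad_start > 0 and data[pad_start - 1:pad_start] == b" ":
                    pad_start -= 1
                if end - pad_start < len(entry):
                    return False               # not enough padding left
                f.seek(pad_start)
                f.write(entry)                 # remaining spaces still
                return True                    # end at the original ";"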