ShortIdSetDiff - compute the set difference of two ShortId files
ShortIdSetDiff file1 file2
The repository reads two separate files to decide which files to keep when it does a weed: one for sources, and one for deriveds. These files can be used with the CountShortIds(1) program to compute the number and sizes of both kinds of files in the repository. However, the list of derived files may also contain what are traditionally considered source files if those source files were named in the result of any function call. The ShortIdSetDiff program can be used to subtract out any source files in the source keep list from the derived keep list. To determine the source and derived keep lists used by the most recent repository weed, use the TestShortId program's "g" command.
ShortIdSetDiff writes to the standard output the ShortIds that appear in file1 but not in file2 (the set difference file1 minus file2). Both files may be unsorted, and both may contain duplicates. The output, however, contains no duplicates.
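The semantics described above can be approximated with standard tools. The following is a minimal sketch, not the ShortIdSetDiff implementation: the function name shortid_set_diff is hypothetical, and it assumes each input file holds one ShortId per line and that ShortIds are compared as exact text. The order of the lines it prints is a property of this sketch only.

```shell
# Hypothetical helper, not part of Vesta.
# Prints every line of file1 ($1) that does not occur in file2 ($2),
# each at most once. Assumes one ShortId per line, compared as exact text.
shortid_set_diff() {
    # awk reads file2 first and records its lines, then filters file1.
    # FILENAME == ARGV[1] is used instead of NR == FNR so that an empty
    # file2 is handled correctly.
    awk 'FILENAME == ARGV[1] { exclude[$0] = 1; next }
         !($0 in exclude) && !(written[$0]++)' "$2" "$1"
}
```

Neither input needs to be sorted, and duplicates in either input are tolerated, which mirrors the behavior described for ShortIdSetDiff.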
Each argument may name a file either by filename or by ShortId. An argument that does not start with "0x" is taken to be a literal filename. An argument that starts with "0x" is taken to be a hexadecimal value denoting the ShortId of a file in the repository; in that case, ShortIdSetDiff reads the repository file denoted by that ShortId rather than a file with that literal name.
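The argument rule can be sketched as a small shell function. This is an illustration under stated assumptions, not the program's actual lookup code: the name resolve_shortid_arg and the SID_ROOT variable are hypothetical, the default directory is simply the one that appears in this page's example output, and the 3/3/2 split of the hexadecimal digits is inferred from that same output for eight-digit ShortIds only.

```shell
# Assumption: the root of the ShortId directory tree. The default is the
# directory shown in this page's example output and is site-specific.
SID_ROOT=${SID_ROOT:-/rafael/vesta-srv/sid}

# Hypothetical helper, not part of Vesta.
# Prints the path that an argument would refer to under the rule above.
resolve_shortid_arg() {
    case $1 in
        0x*)
            # Starts with "0x": treat the rest as a hexadecimal ShortId.
            hex=${1#0x}
            # Assumed layout (inferred from the example): 3 + 3 + 2 digits.
            a=$(printf '%s' "$hex" | cut -c1-3)
            b=$(printf '%s' "$hex" | cut -c4-6)
            c=$(printf '%s' "$hex" | cut -c7-)
            printf '%s/%s/%s/%s\n' "$SID_ROOT" "$a" "$b" "$c"
            ;;
        *)
            # Anything else is a literal filename.
            printf '%s\n' "$1"
            ;;
    esac
}
```

With the default SID_ROOT, resolve_shortid_arg 0xab081304 prints /rafael/vesta-srv/sid/ab0/813/04, which matches the path reported in the example on this page.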
First, the TestShortId program is used to determine the derived and source keep lists of the most recent repository weed:
    $ ~mann/tmp/TestShortId
    (q)uit, (c)reate, (o)pen, (l)eafShortId, (s)hortIdToName, (k)eepDerived, check(p)oint, (g)etWeedingInfo, to(u)ch: g
    ds = 8fcbc594, dt = 872043831, ss = ab081304, st = 872216605
    sourceWeedInProgress = 0, deletionsInProgress = 0, deletionsDone = 1, checkpointInProgress = 0
    (q)uit, (c)reate, (o)pen, (l)eafShortId, (s)hortIdToName, (k)eepDerived, check(p)oint, (g)etWeedingInfo, to(u)ch: q

The derived and source keep lists are named by the ds and ss values, respectively.

Next, we run ShortIdSetDiff on these two files, saving the results in a temporary file:
    $ ~/vesta/bin/ShortIdSetDiff 0x8fcbc594 0xab081304 > /tmp/deriveds
    Reading /rafael/vesta-srv/sid/ab0/813/04...
      Total lines = 40155; unique lines = 9499
    Reading /rafael/vesta-srv/sid/8fc/bc5/94...
      Total lines = 95952; unique lines = 6942; written lines = 5258

This output indicates that the source keep list (the second argument) contains 9,499 unique ShortIds and that the derived keep list (the first argument) contains 6,942 unique ShortIds. Of these 6,942, 5,258 were not in the source keep list, and were written to the file /tmp/deriveds.
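The figures in this example can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported above; the count_unique helper is hypothetical and assumes one ShortId per line in the file it is given.

```shell
# Figures taken from the example output above.
unique_derived=6942
written=5258

# ShortIds in the derived keep list that also appear in the source keep list.
echo "derived ShortIds also in the source keep list: $((unique_derived - written))"
# prints: derived ShortIds also in the source keep list: 1684

# Hypothetical helper: count the distinct lines in a file.
# Assumes one ShortId per line.
count_unique() {
    sort -u "$1" | wc -l | tr -d ' '
}
```

Because the output contains no duplicates, count_unique /tmp/deriveds would be expected to report the same value as "written lines" (5258 in this example), under the one-ShortId-per-line assumption.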
Allan Heydon (heydon@src.dec.com)
Created on Thu Aug 21 23:59:17 PDT 1997 by heydon
Last modified on Fri Nov 9 14:31:15 EST 2001 by ken@xorian.net
     modified on Fri Aug 22 00:09:50 PDT 1997 by heydon