CMP -- Compare Text or Binary Files
User Guide

release 5.13, 17 December 2001
Copyright © 1994-2001 by Stan Brown, Oak Road Systems
 

CMP will compare text or binary files (or groups of files) and report any differences. Output is suitable for piping, or processing by other programs. A value returned in ERRORLEVEL lets batch files take action based on whether files are the same or differ.

This user guide is sometimes revised between software releases. You may want to check for revisions at http://oakroadsystems.com/sharware/cmp.htm.
              
 Why CMP?
 Getting Started
  System Requirements
  Installation and Demo
  Evaluation, License, and Warranty
 User Instructions
  Comparing Single Files
  Comparing Groups of Files
 Overview: How CMP Compares Files
  File Selection
  The Input Stage
  Difference Blocks, Look-Ahead, and Resync
  Reporting Difference Blocks
  Summary Report
 Options
  How to Specify Options
  Options by Category
  Environment Variable
  Alphabetical List of Options:   0 1 2 ? A B C D E F I L M N Q R S U W Z
 Return Values (ERRORLEVEL)
 What's New?

 


Why CMP?


CMP works your way, much more than the DOS utilities COMP.COM and FC.EXE.


 

Getting Started



System Requirements

The 16-bit program CMP16 runs under plain DOS, or in a DOS box under Windows. The 32-bit program CMP32 requires a DOS box under Windows 98, Win95, or Win NT 4.0. (I fully expect it to run in Windows 2000 and Windows ME, but have not tested it.)

CMP16 and CMP32 operate the same and have the same features, with two exceptions:

If you typically run CMP in a DOS box under Windows 9x or NT, CMP32 is the one you want.


Installation and Demo

There is no special installation procedure. Simply move CMP16.EXE, CMP32.EXE, or both to any convenient directory in your path. Each executable is completely self-contained.

You may wish to rename the executable you use more often, CMP32.EXE or CMP16.EXE, to the simpler CMP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute CMP16 or CMP32 wherever you see CMP in the examples.

If you'd like to see a demo of CMP, change to the subdirectory containing the demo files and execute the batch program:

        cd demo
        demo

You need to have 100 KB or so free on your hard drive for temporary files that will be created. The files are automatically deleted when the demo finishes.


Evaluation, License, and Warranty

CMP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

The registered version offers some enhancements over the evaluation version:


 

User Instructions


For a quick summary of operating instructions and options, type

        cmp /? | more

The basic command form is

        cmp options file(s) otherfile_or_directory

Differences are normally listed on the screen, but you can send them to a file with normal DOS redirection (>reportfile). If you don't want to see the differences at all, but just want CMP to test whether the files are the same, >NUL will do that, or you can use the /Q3 option to get a one-line "report".

Options can actually be specified anywhere on the command line, not just before the first file spec, and they can be stored in an environment variable. All options will be scanned and will apply to all files, no matter where they appear on the command line.

File specs may contain wild cards; see Comparing Groups of Files below.

There are two special rules for file specs that contain certain characters:


Comparing Single Files

To compare one file to another:

        cmp [options] filespec1 filespec2 [>reportfile]

Example 1: Compare two files in the current directory.

        cmp /w100 mywords.txt herwords.txt

Example 2: Compare MYWORDS.TXT in directory D:\ORDINAL to OURWORDS.TXT in the current directory.

        cmp /w100 d:\ordinal\mywords.txt ourwords.txt

Example 3: Compare FERNS.TXT in the ORDINAL directory of disk D to a file of the same name in the BACKUP directory of the current disk. As you can see, when the two files have the same name, you need only type the file name once.

        cmp /w100 d:\ordinal\ferns.txt \backup
        cmp /w100 d:\ordinal\ferns.txt \backup\ferns.txt

Example 4: Compare LIZARD.CPP in the ORDINAL directory of disk D to a file of the same name in the current directory of the current disk. Remember that "." means "current directory" in DOS commands.

        cmp /w100 d:\ordinal\lizard.cpp .
        cmp /w100 d:\ordinal\lizard.cpp .\lizard.cpp

Comparing Groups of Files

To compare groups of files, specify only a disk and/or directory as the last filespec:

        cmp [options] [path\]files path [>reportfile]

CMP will tell you at the end how many comparisons it tried, broken down to file pairs the same, file pairs different, and files missing.

Example 1: Compare all files in the current directory with extension .TXT to files of the same names in directory D:\OTHER.

        cmp *.txt d:\other

Example 2: Compare three named files in the current directory to files of the same names in directory D:\PLACATE.

        cmp sheep.txt goat.txt eland.txt d:\placate

Example 3: Compare the three named files in directory F:\FIRST to files of the same name in directory SECOND, a subdirectory of the current directory.

        cmp f:\first\sheep.txt goat.txt eland.txt second

When comparing multiple files, only the first file may have a disk or directory indicated. The same path will be applied to the other files automatically.

Example 4: Compare all the .DOC files in the current directory of drive A, plus XX.HTM in the current directory of drive A, to files of the same names in the current directory of drive B.

        cmp a:*.doc xx.htm b:

 

Overview: How CMP Compares Files


You can customize almost every aspect of CMP's operation, but all those choices can be bewildering. This section gives you an overview of how CMP operates, so that you can understand the various options in context. There is a lot here, so by all means feel free to skip right to the options when you're first getting started.

Try playing around with the sample files ALICE1, ALICE2, ALICE3 that are included in the DEMO subdirectory. Try comparing files with various options, and then look at the differences in results. The included demo.bat program shows you the effects of some options.

Hint: When you are comparing the output of CMP with different options, you can redirect the two runs to two different files and then use CMP itself to compare them. For example, to compare two files with and without the /E option, and see what difference it makes, you could do this:

        cmp alice1 alice2    >without
        cmp alice1 alice2 /e >with
        cmp with without | more

File Selection

        Wild Card Expansion

Please be aware that CMP16 and CMP32 expand wild cards slightly differently because CMP32 supports long filenames. Thus CMP32 would expand abc* to include all files, with any extension or none, whose names start with abc; with CMP16 you need abc*.* to get the same result. This matches the way DOS commands like DIR operate.

When expanding wild cards, CMP will consider hidden and system files for possible matches.

        Single Compare

If you are comparing one specified file to another specified file -- no wild cards, no subdirectory search -- then CMP considers it an error if either one can't be found. In that case, CMP displays an error message and stops.

        Multiple Compare and Missing Files

If you are comparing multiple files, CMP will process the file specs in order, comparing each one against any matching files in the other directory. If you used the /S option to search subdirectories, CMP will match all file specs from the command line against each directory before moving on to the next subdirectory.

The question then arises, what to do about missing files? Consider this example:

        cmp d:\a\*.htm d:\b

There are two kinds of missing files to consider in multiple compare:

  1. File doesn't exist in the first directory. In the above example, this problem occurs if there are no .HTM files in directory D:\A.

    If you specify *.htm, you probably want to know if there are no .HTM files at all. But suppose you specified the /S option for subdirectory search? Then maybe you know perfectly well that there are no .HTM files in directory D:\A, because you're interested in comparing the HTM files in several subdirectories of D:\A.

  2. File exists in the first directory but not the second. In the above example, suppose CMP finds files P.HTM, Q.HTM, and R.HTM in directory D:\A but only R.HTM in directory D:\B. Then P.HTM and Q.HTM are considered missing files from directory D:\B.

    Maybe you want a message for each .HTM file that's in A but not in B, but maybe you really mean only to compare the few that exist in both directories and you don't care about the others.

Here's how CMP resolves these issues: Ordinarily it will not tell you about either sort of missing files. But you can request these warnings by means of the /C3 option, telling CMP to display messages for both types of missing files. If you want finer-grained control, see the description of the /C option.
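
The two classes of missing files and the four /C levels can be sketched in a few lines of Python. This is only an illustration, not CMP's own code: the function name, its arguments, and the wording of the warnings are all made up for the example, and the name lists stand in for whatever CMP actually finds on disk.

```python
def missing_file_warnings(spec, first_names, second_names, c_level=0):
    """Return warning strings for one file spec in a multiple compare.

    spec         -- the file spec as typed, e.g. "*.htm"
    first_names  -- names in the first directory that matched the spec
    second_names -- names present in the second directory
    c_level      -- value of the /C option: 0, 1, 2, or 3
    (All names and message texts here are hypothetical.)
    """
    warnings = []
    # Class 1: the spec matched nothing in the first directory.
    if not first_names and c_level in (1, 3):
        warnings.append(f"no files match {spec} in first directory")
    # Class 2: a file in the first directory has no counterpart in the second.
    if c_level in (2, 3):
        present = {name.upper() for name in second_names}
        for name in first_names:
            if name.upper() not in present:
                warnings.append(f"{name} missing from second directory")
    return warnings


# The P.HTM / Q.HTM / R.HTM case from the text, with /C3 in effect.
print(missing_file_warnings("*.htm", ["P.HTM", "Q.HTM", "R.HTM"], ["R.HTM"], c_level=3))
```

With c_level left at 0 the function returns an empty list, which corresponds to CMP's default of saying nothing about missing files.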

For even more information about every possible file match and mismatch, use the /D option and look for the lines beginning "cmp CF:".


The Input Stage

After scanning the environment variable and the command line, CMP reserves computer memory as needed for the look-ahead buffer, according to the values set by the /L option and /W option. Then CMP begins reading lines from the (first) two files to be compared.

The /Wwidth option is important in reading files. For binary files, CMP reads chunks of width characters at a time. For text files, CMP reads a line at a time, but ignores the excess on any line that is longer than width characters. CMP will let you know about every such line; use the /Q1 option to suppress such warnings.

After reading each line of a text file, CMP immediately discards any spaces and tabs at the end. Therefore, if two lines are the same except that one of them has some trailing spaces or tabs and the other does not, CMP considers them to be the same. Those trailing spaces and tabs do count against the maximum width, however.

Empty lines are normally treated the same as any other lines, but you can use the /E option to tell CMP to discard empty or blank lines. They will not be used in comparison and will not appear in difference reports. (They will still be counted, so that line numbers in the output will be correct.)
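
If it helps to see the input rules in one place, here is a rough Python sketch of them. It is an illustration under assumptions, not CMP's actual code: the function name and parameters are invented, and it assumes truncation to the /W width happens before trailing blanks are discarded (which is what "count against the maximum width" suggests).

```python
def read_text_lines(raw_lines, width, skip_blank=False):
    """Yield (line_number, text) pairs roughly as the input stage describes.

    width      -- the /W value; characters beyond it are ignored
    skip_blank -- True models the /E option
    (Hypothetical helper, for illustration only.)
    """
    for number, line in enumerate(raw_lines, start=1):
        line = line.rstrip("\r\n")
        text = line[:width]          # excess beyond the width is ignored
        text = text.rstrip(" \t")    # trailing spaces and tabs are discarded
        if skip_blank and text == "":
            continue                 # dropped, but the line was still counted
        yield number, text


lines = ["alpha   \n", "\n", "   \n", "beta\tgamma\n"]
print(list(read_text_lines(lines, width=100, skip_blank=True)))
```

Notice that the second surviving line keeps its original number, 4, even though two blank lines before it were dropped.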

        "Massaged" Lines

Upon reading each line, CMP may store it in memory as it was read from the file, in a "massaged" form, or both. If the /I option is set, CMP will massage the line by changing all letters A-Z to lower case. If the /B option is set (for text files), CMP will massage the line by changing all runs of spaces and/or tabs to a single space. (If the /B option and /I option are not set, there is no massaging of lines.) It is the massaged lines that CMP will compare between files.

When lines are massaged, CMP normally stores both forms, so that the difference reports will show the lines exactly as they were in the files. But if the /M option is set then CMP will not save the original lines for display. In that case the difference reports will show the massaged lines, but you will have more memory available for look-ahead.
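
The massaging rules are simple enough to sketch in Python. Again this is just an illustration: the function and its keyword names are invented here, and the real program may well do the work differently.

```python
import re


def massage(line, ignore_case=False, compress_blanks=False):
    """Return the form of a text line that would be used for comparison.

    ignore_case     -- models /I: letters A-Z become lower case
    compress_blanks -- models /B: each run of spaces/tabs becomes one space
    (Hypothetical helper, for illustration only.)
    """
    if ignore_case:
        # Lower-case only A-Z, as the text describes; other characters are untouched.
        line = re.sub(r"[A-Z]+", lambda m: m.group(0).lower(), line)
    if compress_blanks:
        line = re.sub(r"[ \t]+", " ", line)
    return line


original = "Four  \t Score"
print(repr(massage(original, ignore_case=True, compress_blanks=True)))
```

Keeping both `original` and the massaged result corresponds to the normal case; keeping only the massaged result corresponds to the /M option.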

As long as corresponding lines from the two files are the same (after any massaging), CMP discards them and keeps reading. But when they don't match, CMP recognizes the beginning of a difference block.


Difference Blocks, Look-Ahead, and Resync

When the next line from the first file doesn't match the next line from the second file, CMP recognizes the beginning of a difference block. Now CMP keeps reading lines trying to resynchronize the files.

The /Llook-ahead,resync option limits this look-ahead procedure. CMP may accumulate up to look-ahead lines from each file, trying to find lines that match again. CMP considers that it has resynchronized the files if resync consecutive lines are the same between the two.

An example may help clarify the significance of look-ahead and resync. Suppose CMP finds, after the first 31 lines of the two files match, that line 32 of file 1 doesn't match line 32 of file 2. In this case, CMP has to look ahead at line 33 of file 1 and line 33 of file 2.

              file 1               file 2
        ------------------   ------------------
        (two files identical to this point)
        (31) line d          (31) line d
        (32) line e          (32) something different

Maybe the two lines 33 will match, or maybe line 32 of file 1 will match line 33 of file 2 (meaning that line 32 of file 2 is new in that file and doesn't exist in file 1). Maybe there are 25 new lines in file 2, and line 32 of file 1 will match line 57 of file 2. CMP needs to keep looking ahead until it does find a match.

If CMP can't resynchronize files within the specified look-ahead depth (/L option), it will display the message

        cmp warning: look-ahead lines from both files with no match **

and then report the differing lines. Then it will proceed to the next files to be compared, if there are any. To avoid this message, run the comparison again with a higher look-ahead value in the /L option.

        The Resync Value

After a difference, finding just one line from file 1 that matches a line from file 2 may not be enough. This is where the resync value of the /L option comes in. CMP will not consider the two files resynchronized until that number of lines from the two are the same.

The resync value has an effect on the quality of difference reports, meaning how differences are grouped. You don't have to worry about missing any differences, no matter what value of resync you select.

This is quite a long section. If you find yourself getting bored, you may prefer to skip it and simply try some compares, experimenting with different values for both numbers on the /L option. The supplied ALICE* files in the DEMO directory are designed for this.

(Note: Difference blocks can be reported in UNIX diff format or traditional format, depending on the /U option. Example 1 shows traditional format, and Example 2 shows UNIX diff format.)

Example 1. Consider the lilies of the field, and this scenario:

             file 1                 file 2
        -----------------    -------------------
        (80) Four score       (90) Four score
        (81) and seven        (91) and seven
                              (92) (i.e., 87)
        (82) years ago,       (93) years ago,
        (83) our fathers      (94) somebody
        (84) brought forth    (95) brought forth
        (85) upon this        (96) on this
        (86) continent a      (97) continent a
        (87) new nation,      (98) mighty nation,
        (88) conceived in     (99) born in
        (89) liberty and     (100) liberty and
        (90) dedicated to    (101) dedicated to

As you can see, a number of edits have been made in this one paragraph of Lincoln's famous speech. Do you want each edit reported as a separate difference, or would you prefer to see the differences in this paragraph as one connected change? Pre-5.0 releases of CMP had a fixed resync of 1 and reported each of the above changes separately, like this:

        2.92>(i.e., 87)

        1.83>our fathers
        2.94>somebody

        1.85>upon this
        2.96>on this

        1.87>new nation,
        1.88>conceived in
        2.98>mighty nation,
        2.99>born in

With resync = 1, as shown above, CMP sees that file 1 line 82 matches file 2 line 93, so it considers that difference block at an end; then one line later it starts a new difference block because file 1 line 83 is different from file 2 line 94.

But you may prefer that CMP not consider the files matched up again until it finds two consecutive matching lines, like lines 89-90 and 100-101 in the above example -- in other words, longer difference blocks but fewer of them. With resync set to 2, CMP reports a single connected series of edits on the above passage:

        1.82>years ago,
        1.83>our fathers
        1.84>brought forth
        1.85>upon this
        1.86>continent a
        1.87>new nation,
        1.88>conceived in
        2.92>(i.e., 87)
        2.93>years ago,
        2.94>somebody
        2.95>brought forth
        2.96>on this
        2.97>continent a
        2.98>mighty nation,
        2.99>born in

Higher values of resync are allowed, but for the above passage higher values would have the same effect as resync = 2.

Which resync value is right? CMP sets resync = 2 by default, but which resync value is "right" depends on the specifics of the data. If CMP seems to be reporting a lot of differences for files that have cluster changes like the above, you may find the report is more useful if you set a higher resync value with the /L option. But don't make resync too large: you'll slow CMP down without making the reports better. Probably you'll never want to set resync greater than about 5.

CMP must be able to find resync identical lines within the look-ahead limit. For instance, suppose you specify the /L option as /L10,2. You are telling CMP to look ahead ten lines at a time, but of those ten lines two must resynchronize. That means that, with /L10,2, any difference block longer than 10-2 = 8 lines will cause CMP to give up on those two files and move on to the next files, if any.

If you need to reduce resync to 1 for some reason, you probably want to turn on the /E option as well. That will prevent CMP from synchronizing on blank lines.

Example 2. Here's a clearer example of how resync can make a difference. The problem is especially likely to crop up if you have sections of text separated by blank lines, because then CMP resynchronizes on the blank lines instead of actual matching text. (Blank lines can be ignored by setting the /E option, but the same problem also applies to lines of dashes, the closing */ and } in C or C++ code, and so forth.)

Consider this excerpt:

             file 1                 file 2
        -----------------    -------------------
        (20) line A           (20) line A
        (21) end.             (21) end.
        (22) line B
        (23) end.
        (24) line D           (22) line D
        (25) line E           (23) line E
        (26) end.             (24) end.
        (27) line G           (25) line G
        (28) line H           (26) line H
        (29) end.             (27) end.
        (30) same after this  (28) same after this

As you see, file 1 lines 22-23 do not exist in file 2; otherwise the files are identical.

Suppose resync = 1. CMP sees that the two lines 22 don't match and reads ahead to file 1 line 23 and file 2 line 24, which do match; so CMP reports a difference block of file 1 line 22 and file 2 lines 22-23. But now the next sections don't match: file 1 lines 24-25 are different from file 2 lines 25-26. So CMP reports another difference, which ends at file 1 line 26 and file 2 line 27. Then there's another mismatch, and so on right down the file. CMP never does resynchronize:

        22c22,23
        < line B
        ---
        > line D
        > line E
        24,25c25,26
        < line D
        < line E
        ---
        > line G
        > line H
              (etc.)

and so on for a very long report from just one real difference.

With resync = 2, CMP recognizes this and reports just one difference, file 1 lines 22-23:

        22,23d21
        < line B
        < end.

For this particular pair of files, a resync value of 2 is definitely better. Other pairs of files may work best with a different resync, but 2 is usually a good first choice, which is why it's CMP's default.
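
For readers who like to see the idea as code, here is a small Python sketch of a look-ahead search with a resync count. It is not CMP's actual algorithm. In particular, the order in which candidate lines are tried is an assumption, chosen so that the sketch reproduces the results described for Example 2; the function name and arguments are invented, and end-of-file handling is left out.

```python
def find_resync(a, b, look_ahead, resync):
    """Return (i, j): how many lines of a and of b fall in the difference block.

    a, b -- remaining lines of file 1 and file 2, starting at the first mismatch.
    Returns None when no resync point lies within the look-ahead limit.
    (Hypothetical helper; the search order is an assumption.)
    """
    def run_matches(i, j):
        # True if `resync` consecutive lines match starting at a[i] and b[j].
        if i + resync > len(a) or j + resync > len(b):
            return False
        return all(a[i + n] == b[j + n] for n in range(resync))

    # The matching run itself must fit inside the look-ahead buffer.
    for depth in range(look_ahead - resync + 1):
        # Assumed order: newest line of file 2 against buffered file 1 lines...
        for i in range(depth + 1):
            if run_matches(i, depth):
                return i, depth
        # ...then newest line of file 1 against buffered file 2 lines.
        for j in range(depth + 1):
            if run_matches(depth, j):
                return depth, j
    return None


# Example 2, starting at line 22 of each file.
file1 = ["line B", "end.", "line D", "line E", "end.",
         "line G", "line H", "end.", "same after this"]
file2 = ["line D", "line E", "end.", "line G", "line H",
         "end.", "same after this"]
print(find_resync(file1, file2, look_ahead=20, resync=1))  # (1, 2)
print(find_resync(file1, file2, look_ahead=20, resync=2))  # (2, 0)
```

The first result is the misleading block (one line of file 1 against two lines of file 2); the second is the true difference, two lines of file 1 and none of file 2. The loop bound also shows why a block longer than look-ahead minus resync lines can't be resynchronized.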

        Look-Ahead and Memory Use

The look-ahead buffer uses your computer's memory. CMP32 can use all memory including virtual memory, but CMP16 can use only DOS memory.

The look-ahead buffer uses, in bytes, either roughly the look-ahead value in the /L option times the width in the /W option, or double that product. Why double? If you set an option that causes input lines to be massaged, CMP stores two copies of each line, one massaged for comparison and one original for display in difference reports. In that case, you can free up memory for the look-ahead buffer by setting the /M option.

You don't need to remember all this. If you exceed the available memory with the combined options, CMP will display a message suggesting you try lower values for /L or /W, or turn on the /M option if that would help.
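
If you do want a rough figure before you run, the arithmetic is easy to sketch in Python. The function below is only an estimate following the description above; its name and parameters are invented, and CMP's real memory use will include overhead not counted here.

```python
def lookahead_buffer_bytes(look_ahead, width, massaging=False, m_option=False):
    """Rough size of the look-ahead buffer in bytes.

    massaging -- True if /B or /I is set, so lines get massaged
    m_option  -- True if /M is set, so original lines are not kept
    (Hypothetical helper, for illustration only.)
    """
    size = look_ahead * width
    if massaging and not m_option:
        size *= 2   # both the massaged and the original line are stored
    return size


print(lookahead_buffer_bytes(2500, 100))                                  # 250000
print(lookahead_buffer_bytes(2500, 100, massaging=True))                  # 500000
print(lookahead_buffer_bytes(2500, 100, massaging=True, m_option=True))   # 250000
```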

        Look-Ahead and Program Run Times

In a difference block, CMP has to compare each new line from each file with all the non-matching lines from the other file. This means that the number of compares grows as the square of the number of different lines, so the program may run rather slowly on files that have very long difference blocks. For instance, if you set /L2500, you are telling CMP that whenever it finds a difference between the two files, it should look ahead as far as 2500 lines in each file to try to resynchronize. If in fact the next 2499 lines of the two files are different, CMP will be doing roughly 2499² = over 6 million comparisons on that block alone. (The number of lines in the file is not an issue, just the number of consecutive lines that are actually different.) If you have files with long runs of differing lines, you can make CMP run faster by using a smaller look-ahead value.
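
To get a feel for how quickly the work grows, here is the same estimate as a tiny Python sketch. It simply squares the length of the difference block, as the paragraph above does; the function name is made up and the true count in CMP will differ in detail.

```python
def worst_case_compares(differing_lines):
    """Approximate line comparisons for one difference block of this length."""
    return differing_lines ** 2


for n in (25, 250, 2499):
    print(n, worst_case_compares(n))
```

Making the block ten times longer makes the estimate a hundred times larger, which is why a smaller look-ahead value can speed things up so much.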


Reporting Difference Blocks

CMP normally reports each difference block to the screen; you can add >reportfile on the command line to send this output to a file instead. You can use the /A option to limit CMP to reporting a certain number of difference blocks. You can completely prevent CMP from reporting difference blocks with the /Q3 option; then CMP will report just one line for each pair of files, to tell whether they were the same or different.

CMP gives you significant control over how difference blocks are reported. The biggest choice is between UNIX diff format and traditional format; if you choose traditional format there are additional options for line numbers and separators.

For either format, if you have compressed runs of white space with the /B option or chosen to ignore case with the /I option, the original lines will ordinarily be displayed. To reduce use of computer memory, use the /M option. This tells CMP to display the "massaged" lines in difference reports, and frees up extra memory for a larger look-ahead buffer.

UNIX diff format (the /U option) shows the lines without line numbers, but precedes each difference block with the numbers of the lines added, changed, or deleted, like this:

        1a2,5
        >                  SHERLOCK HOLMES
        >         THE ADVENTURE OF THE SPECKLED BAND
        >             by Sir Arthur Conan Doyle
        >
        8,10c12,14
        < the acquirement of wealth, he refused to associate
        < with any investigation which did not tend
        < towards the unusual, and even the fantastic. Of
        ---
        > the acquisition of wealth, he refused to associate
        > himself with any investigation which did not tend
        > toward the unusual, and even the fantastic. Of
        52,54d59
        <   "My dear fellow, I would not miss it for
        < anything."
        <

By contrast, the traditional CMP report form shows the differing lines from file 1 and file 2 with their line numbers, like this:

        2.2>                 SHERLOCK HOLMES
        2.3>        THE ADVENTURE OF THE SPECKLED BAND
        2.4>            by Sir Arthur Conan Doyle
        2.5>

        1.8>the acquirement of wealth, he refused to associate
        1.9>with any investigation which did not tend
        1.10>towards the unusual, and even the fantastic. Of
        2.12>the acquisition of wealth, he refused to associate
        2.13>himself with any investigation which did not tend
        2.14>toward the unusual, and even the fantastic. Of

        1.53>  "My dear fellow, I would not miss it for
        1.54>anything."
        1.55>

With the traditional report form, a block of added lines is shown by leading 2s with no leading 1s, changed lines have some of each, and deleted lines have leading 1s with no corresponding leading 2s.
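
Both report forms carry the same information about where a block starts and how long it is. As a sketch (not CMP's own code, and with invented function names), here is how the UNIX diff header and the traditional numbered lines could be built in Python from a block's starting line and line count in each file. For a block of added lines, first1 is the file 1 line that would come next, so the header names the line before it; deleted lines work the same way for file 2.

```python
def diff_header(first1, count1, first2, count2):
    """Build a UNIX diff-style header such as '8,10c12,14'.

    first1, first2 -- 1-based number of the first line of the block in each file
    count1, count2 -- how many lines of each file are in the block
    (Hypothetical helper; at least one count is assumed to be nonzero.)
    """
    def span(first, count):
        last = first + count - 1
        return str(first) if count == 1 else f"{first},{last}"

    if count1 == 0:                      # lines added in file 2
        return f"{first1 - 1}a{span(first2, count2)}"
    if count2 == 0:                      # lines deleted from file 1
        return f"{span(first1, count1)}d{first2 - 1}"
    return f"{span(first1, count1)}c{span(first2, count2)}"


def traditional_lines(file_no, first, texts):
    """Number each line as file.line>text, e.g. '2.98>mighty nation,'."""
    return [f"{file_no}.{first + k}>{text}" for k, text in enumerate(texts)]


print(diff_header(2, 0, 2, 4))     # 1a2,5
print(diff_header(8, 3, 12, 3))    # 8,10c12,14
print(traditional_lines(2, 98, ["mighty nation,", "born in"]))
```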

You can customize the traditional report form in several ways:


Summary Report

For each pair of files, CMP will normally report the number of lines in each file and the number of difference blocks found:

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 125
        ** The files are significantly different.  Blocks reported: 8

If the files compare the same, you will see a message like this one:

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 120
        ** The files are identical.

If you have the /B option, /E option, or /I option set, you have indicated that some actual differences are not significant. In this case, if the files compare equal the message will say

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 124
        ** The files are effectively identical for the options chosen.

Note that the files compare equal even though they have different numbers of lines. This can happen when empty lines are suppressed with the /E option.

Finally, if some lines have been truncated according to the /W option, the message will say "effectively identical within the /W width".

Though CMP normally reports the number of lines in each file, the /A option or the /Q2 or /Q3 option tells CMP not to display that line of the summary report.

        Final Truncation Warning

CMP reports truncation as it reads individual lines, but does not summarize truncation for each pair of files. However, because the individual truncation messages may be overlooked or suppressed, CMP also reports a final truncation message at the very end:
 
      cmp warning: lines were truncated -- use /W265 for complete comparison
 
You can see that CMP tells you the longest line it read in any file. If you want to re-run the comparison and have each file compared to the very end of each line, use the suggested value for the /W option. (Even this message is suppressed by the /Q3 option.)


 

Options


CMP's operation can be modified by quite a number of options, either on the command line or in an environment variable.

Because there are a great many options, they are presented below both by category and alphabetically. Here is a quick index of the options:

0   1   2   ?   A   B   C   D   E   F   I   L   M   N   Q   R   S   U   W   Z


How to Specify Options

You have a lot of freedom about how you enter options:

For instance, the following are just some of the different ways of turning on the W100 and B options:
 
      /w100 /b    /w100-b    /w100/b    /w100B    -W100-B    -W100 /b


Options by Category

(Some options are listed in multiple categories to make them easier to find.)

These options affect file input:

These options affect the comparison process:

These options affect output:

Finally, here are the general program options:


Environment Variable

If you use certain options frequently, with the registered version of CMP you can put them in the ORS_CMP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

CMP processes the environment variable before any command-line options, which means that an option on the command line will override the corresponding option in the environment variable.

The toggles, /2 /B /E /I /M /R /S /U, reverse their state every time you specify them. So if you usually want case-blind comparisons, put /I in the environment variable. Then, if you want case-sensitive comparisons for a particular run, simply put /I on the command line and that will reverse the setting from the environment variable. To alter the settings of other options, like /L and /F, simply put the option on the command line with the new desired setting.
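
The toggle rule can be pictured with a short Python sketch. It is an illustration only: the function is invented, it assumes every toggle starts out switched off, and it takes option letters that have already been split apart rather than parsing a real command line.

```python
TOGGLES = set("2BEIMRSU")


def resolve_toggles(env_options, cmdline_options):
    """Return the set of toggle letters left switched on.

    Each argument is a sequence of single option letters, already split out
    (slashes, hyphens, and options with values are not modelled here).
    Assumption: every toggle starts in the off state.
    """
    state = set()
    # The environment variable is processed first, then the command line.
    for letter in list(env_options) + list(cmdline_options):
        letter = letter.upper()
        if letter in TOGGLES:
            state ^= {letter}    # specifying a toggle again reverses it
    return state


print(resolve_toggles("I", ""))    # /I from the environment: case-blind
print(resolve_toggles("I", "I"))   # /I again on the command line: case-sensitive
```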

You may want to specify options without regard to what might be in the environment variable -- when running CMP in a batch file, for instance. To ensure this, put the /Z option first on the command line.

If you have any question which options are in effect, simply use /D on the command line to display all option values.


Alphabetical List of Options

/?
Display a help message and option summary, then exit with no further processing. You can redirect or pipe this information. For instance, you can display the help text one screen at a time by typing
        cmp /? | more
or print the help text with the command
        cmp /? >prn
/0 and /1
These options let you control the values that CMP returns to DOS.
 
/0   Return 0 in ERRORLEVEL if there are any differences in any files, or 1 if every pair of files compares equal.
 
/1   Return 1 in ERRORLEVEL if there are any differences in any files, or 0 if every pair of files compares equal.
 
neither   Return 0 in ERRORLEVEL.

Regardless of these options, CMP will return a higher value in ERRORLEVEL for premature termination. For more details, see Return Values later in this user guide.
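
The three cases can be summed up in a few lines of Python. This sketch covers only runs that finish normally; the function name and arguments are invented for the example, and the higher values for premature termination are not modelled.

```python
def errorlevel(any_differences, option=None):
    """ERRORLEVEL for a run that finished normally.

    any_differences -- True if any pair of files differed
    option          -- "0" for /0, "1" for /1, or None when neither was given
    (Hypothetical helper, for illustration only.)
    """
    if option == "0":
        return 0 if any_differences else 1
    if option == "1":
        return 1 if any_differences else 0
    return 0


for opt in (None, "0", "1"):
    print(opt, errorlevel(True, opt), errorlevel(False, opt))
```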
 
/2
Arrange every comparison so that the older file is file 1 and the newer file is file 2.
 
/An
Stop comparing after reporting n difference blocks. If you expect some files to have lots of differences, you can use this option to limit the output and make CMP run faster.
 
The default is to read every file to the end and report all difference blocks; that's equivalent to /A0. If you just want to know whether files are the same or different without seeing the actual differences, see the /Q3 option rather than the /A option.
 
The parameter n limits the number of difference blocks reported, not the number of different lines. And it applies to each pair of files. Example:
        cmp code\*.cpp \bkup /A4
compares all .CPP files in the CODE subdirectory to files of the same names in the BKUP root directory. No more than 4 difference blocks between any one pair of files will be reported. This would be a good choice when you think most files are the same or nearly the same, but a few have lots of differences.
 
Dependencies: When the /Q3 option is set, the /A option is ignored and /A1 is implied.
 
/B
Compress all runs of blanks and/or tabs in text files to a single blank, for purposes of comparison and display. With the /B option, CMP considers "a    b", "a b", "a{tab}b", and "a  {tab} b" identical.
 
Runs of spaces and/or tabs are compressed to a single space, not completely removed. Thus CMP will always consider "ab" (with no space between "a" and "b") different from "a b" (any spaces or tabs between "a" and "b").
 
Regardless of this option, CMP will always ignore spaces and tabs at the ends of lines in text files. Some more details are given above in "Overview: The Input Stage".
 
Dependencies: The /B option is ignored when the /R option (binary files) is set.
 
/C
Complain (display a warning message) when expected files are not found during multiple compare. The option may have any of four values:
 
/C0   (default) Don't display any warnings for missing files.
 
/C1   Display a warning for class 1 missing files, when a file spec on the command line does not match any files in the first directory.
 
/C2   Display a warning for class 2 missing files, when an existing file in the first directory has no counterpart with the same name in the second directory.
 
/C3   Display warnings for both classes of missing files.
 
Please see Overview: Multiple Compare and Missing Files for more information about kinds of missing files -- or just specify /C3.
 
Ordinarily, warnings about missing files are written to the standard error stream, which is usually your screen. However, if you have redirected standard output to a report file with >, and you have not limited output with the /Q2 or /Q3 option, then CMP will both display these warnings on stderr and write them to the report file.
 
The /C option is meaningful only in multiple compare. In single compare, a missing file is always an error since there is nothing for CMP to do.
 
/Dfile   or   /D   or   /D-
Display debugging information. This includes whether you're running CMP16 or CMP32, whether this program is registered, the contents of the environment variable, the values of all options specified or implied, the files specified, and details of every file scanned. This information is normally suppressed, but you may find it helpful if CMP seems to behave in a way you don't expect.
 
Since the debugging information can be voluminous, if you want to see it at all you will usually want to specify an output file. The file must follow the D with no intervening space, and the filename ends at the next space. CMP will append to the file if it already exists.
 
A plain /D sends debugging information to the standard error output (normally the screen). Be careful not to specify any other options between /D and the next space, or they'll be taken as a filename. Finally, /D- sends debugging information to the standard output, which you can redirect (>) or pipe (|). This intersperses debug information with the actual output of CMP.
 
You can weed through the debugging output to some extent. CMP writes the following unique strings on most lines of output, so you can send debug output to a file and then grep the file for  
/E
Ignore any empty lines, or lines that contain only blanks and tabs. Without the /E option, CMP will keep track of blank lines and report added or deleted blank lines as differences.
 
The /E option can make CMP do a much better job on some text files, because it keeps CMP from resynchronizing on a blank line. Please see Example 2 in the overview.
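As a rough picture of what /E discards, the following Python sketch filters a list of lines. It assumes the lines have already had their line terminators removed, and the function name drop_empty_lines is invented for the example.

```python
def drop_empty_lines(lines):
    """Keep only lines that contain something besides blanks and tabs."""
    return [line for line in lines if line.strip(" \t") != ""]
```

Because the empty lines never reach the comparison, they can neither be reported as differences nor be used as a resync point.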
 
Dependencies: The /E option is ignored when the /R option (binary files) is set.
 
/Fn   or   /F0n   or   /F-n
Format line numbers in a field of n columns when reporting difference blocks in traditional format.
 
The /F option lets you ensure that reported difference lines all line up visually. (You might wonder why CMP doesn't just figure the necessary width on its own. To do that, CMP would have to read each file an extra time, just to count lines. That would slow the program down significantly.)
 
n may be as large as 10. /Fn and /F0n right justify the line numbers, filling unused positions with spaces and zeroes respectively. /F-n left justifies the line numbers. The default is the same as /F0, which displays each line number with no padding, as shown in the sample difference report.
 
n is a minimum field width, but the whole line number is always displayed. For example, if you specify /F4, /F04, or /F-4, line numbers for any differences in lines 1 through 9999 will be justified in a four-character field. Any larger line numbers will expand to the right, like this:
         /F4 output         /F04 output         /F-4 output
        1.  98>text1a      1.0098>text1a       1.98  >text1a
        2.  99>text1b      2.0099>text1b       2.99  >text1b
        2. 100>text1c      2.0100>text1c       2.100 >text1c

        1.2398>text2a      1.2398>text2a       1.2398>text2a
        2.2399>text2b      2.2399>text2b       2.2399>text2b

        1.234168>text3a    1.234168>text3a     1.234168>text3a
        1.234169>text3b    1.234169>text3b     1.234169>text3b
        2.234170>text3c    2.234170>text3c     2.234170>text3c
Dependencies: The /F option is ignored when the /Q3 option (don't display difference blocks) is set.
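The three /F forms shown in the sample columns above can be mimicked with a few lines of Python. This is a sketch of the documented layout only; the function name format_line_number and its parameters are assumptions for illustration.

```python
def format_line_number(file_no, line_no, spec, sep=">"):
    """Build the prefix of one reported line (illustrative only).

    spec is the text that follows /F on the command line:
    "4" (pad with spaces), "04" (pad with zeroes), or "-4" (left justify).
    """
    digits = str(line_no)
    if spec.startswith("-"):
        field = digits.ljust(int(spec[1:]))  # /F-n: left justify
    elif spec.startswith("0"):
        field = digits.zfill(int(spec))      # /F0n: fill with zeroes
    else:
        field = digits.rjust(int(spec))      # /Fn: fill with spaces
    # n is only a minimum width, so longer numbers are never cut off.
    return f"{file_no}.{field}{sep}"
```

For instance, format_line_number(1, 98, "4") gives "1.  98>", and a six-digit line number simply widens the field under any of the three forms.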
 
/I
Ignore case; treat letters A-Z the same as a-z for comparison.
 
Because of limitations in the MSVC library, the /I option affects only the English letters A through Z. Non-English lower-case letters are always considered different from the corresponding upper-case letters.
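The restriction to English letters can be illustrated in Python. This sketch folds only A through Z, which is an approximation of the behavior described above and not CMP's actual code; fold_case is a made-up name.

```python
import string

# Translation table that maps only the 26 English capitals to lower case.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def fold_case(line):
    """Fold A-Z to a-z and leave every other character untouched."""
    return line.translate(_FOLD)
```

Under this rule "Hello" and "HELLO" compare equal, but an accented capital letter still differs from its lower-case form.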
 
/Llookahead,resync   or   /Llookahead   or   /L,resync
When the files are different, CMP will look ahead as many as lookahead lines in each file to find where the files become the same again, and will consider that the files are the same again only when resync lines from the two files are the same. (If the /E option is set, empty lines will not count against either resync or look-ahead.) Please see the explanation and examples in Overview: Difference Blocks and Look-Ahead.
 
The default is /L20,2 in CMP16 and /L100,2 in CMP32. As the option forms above show, you can specify either resync or lookahead without changing the other.
 
resync can be 1 or greater; lookahead must be at least 2 greater than resync. lookahead may not exceed 32000, but other factors may restrict that.
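The three forms of the option and the limits just mentioned can be summarized in a small Python sketch. The function name parse_l_option is hypothetical, the defaults shown are the CMP32 ones, and the sketch does not model the "other factors" (such as memory) that may restrict lookahead further.

```python
def parse_l_option(arg, default_lookahead=100, default_resync=2):
    """Parse "/L50,3", "/L50", or "/L,3" into (lookahead, resync)."""
    if not arg.upper().startswith("/L"):
        raise ValueError("not an /L option: " + arg)
    # Whatever precedes the comma is lookahead; whatever follows is resync.
    look_text, _, resync_text = arg[2:].partition(",")
    lookahead = int(look_text) if look_text else default_lookahead
    resync = int(resync_text) if resync_text else default_resync
    if resync < 1:
        raise ValueError("resync must be 1 or greater")
    if lookahead < resync + 2:
        raise ValueError("lookahead must be at least 2 greater than resync")
    if lookahead > 32000:
        raise ValueError("lookahead may not exceed 32000")
    return lookahead, resync
```

For CMP16 you would pass default_lookahead=20 to match its default of /L20,2.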
 
Even if CMP and available memory will let you set lookahead as large as 32000, values greater than a few hundred are not recommended. Please see the note on run times in the Overview.
 
/M
Display lines as massaged according to the /B option or the /I option, not as they appear in the files.
 
CMP normally retains copies of the original lines from each file for display in reporting difference blocks. But this roughly doubles the computer memory needed for the look-ahead buffer. If you're willing to see approximate versions of the original lines in the difference reports, set the /M option and you increase the space available for look-ahead.
 
Dependencies: The /M option has effect only if you have turned on the /B option, the /I option, or both.
 
/Nstr
Separate line numbers from lines by str instead of the default > character when reporting difference blocks in traditional form.
 
You can specify a string of up to six characters; the string is terminated by the next space or tab. Don't use quotes with this option unless you want them in the output.
 
If you want certain characters like =, |, <, or space in your separator, you can't simply type them because DOS gives them special meanings. Use special "numeric escape sequences" to represent those characters in the /N option. For example, to make your output look like this:
        1. 98 : text1a
        2. 99 : text1b
        2.100 : text1c

        1.398 : text2a
        2.399 : text2b
use the sequence \32 to represent the space character, like this:
        cmp /N\32:\32 /F3 file1 file2
The numeric escape sequences are a backslash (\) followed by the numeric value of the character, up to three decimal digits. A leading 0 denotes octal; a leading 0x or 0X denotes hexadecimal. Here are some sample sequences:
 
        instead of          use any of
        (space)             \32   \0x20  \040
        (tab)               \9    \0x09  \011
        < (less)            \60   \0x3C  \074
        = (equal)           \61   \0x3D  \075
        > (greater)         \62   \0x3E  \076
        | (vertical bar)    \124  \0x7C  \0174
        " (double quote)    \34   \0x22  \042
 
The above are only examples: you can enter any character as a numeric sequence. For example, capital A would be \65, \0x41, or \0101.
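If you want to check what a separator string will turn into, the decoding rule can be sketched in Python. This is an illustration under assumptions, not CMP's parser: the name decode_separator is invented, and the guide does not say how CMP treats a digit that immediately follows an escape sequence, so the sketch simply takes as many digits as the pattern allows.

```python
import re

# A backslash followed by a hex (0x..), octal (0..), or decimal number.
_ESCAPE = re.compile(r"\\(0[xX][0-9A-Fa-f]{1,2}|0[0-7]{0,3}|[1-9][0-9]{0,2})")

def _decode_one(match):
    token = match.group(1)
    if token[:2].lower() == "0x":
        return chr(int(token[2:], 16))  # leading 0x or 0X: hexadecimal
    if token.startswith("0"):
        return chr(int(token, 8))       # leading 0: octal
    return chr(int(token, 10))          # otherwise: decimal

def decode_separator(text):
    """Replace each numeric escape sequence in text by its character."""
    return _ESCAPE.sub(_decode_one, text)
```

For example, decode_separator(r"\32:\32") yields a colon with one space on each side, the separator used in the /N example above.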
 
Dependencies: The /N option is ignored when either the /U option (UNIX diff-style output) or the /Q3 option (don't display difference blocks) is set.
 
/Qlevel       (registered program only)
Set the quietness level, to suppress some output that you may not want. Please see the Overview for discussion of the normal output from CMP.
 
/Q0   (default) Display all normal messages and warnings.
 
/Q1 Suppress the program logo, any warning messages about individual truncated lines, and the final display of line counts for the two files. If any lines were truncated, a single message will still appear at the end of processing.
 
/Q2 Suppress the items mentioned for /Q1 plus the blank lines between difference blocks. Also, send the headers (file names) and footers (count of difference blocks or message that files are equal) to stderr (the error output, normally your screen) rather than stdout (standard output, which can be redirected with > or piped with |).
 
This lets you redirect the output of CMP and get only the difference lines from the two files. You still get line numbers, but by using the /F option you can force them to a fixed format that is easily stripped away. Example:
            cmp /Q2 /F6 file1 file2 >report
will send just the different lines to the file called REPORT, suppressing all non-essential messages. Essential messages will appear on your screen because they are written to stderr and are not redirected. Assuming each file has fewer than a million lines, each line redirected to the REPORT file will have a 9-character prefix: file number (1 or 2), a period, a six-digit line number field, and the separator character >.
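Stripping that fixed prefix afterward takes only a few lines in most languages. Here is a minimal Python sketch, assuming the report was produced with /Q2 /F6, the default > separator, and fewer than a million lines per file; the name split_report_line is hypothetical.

```python
# File number (1) + period (1) + six-column /F6 field (6) + separator (1).
PREFIX_WIDTH = 1 + 1 + 6 + 1

def split_report_line(line):
    """Return (file_no, line_no, text) for one line of the report."""
    prefix, text = line[:PREFIX_WIDTH], line[PREFIX_WIDTH:]
    file_no = int(prefix[0])     # 1 or 2
    line_no = int(prefix[2:8])   # int() ignores the padding spaces
    return file_no, line_no, text
```

If you change the field width with /F or the separator with /N, PREFIX_WIDTH has to change to match.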
 
/Q3 Suppress the program logo and all output, even the summary truncation warning and warnings about questionable use of options. Error messages will still be displayed, and so will the one-line summary showing numbers of files same, different, and missing.
 
For each pair of files compared, CMP will display just one line of output consisting of the file names and the comparison status, "identical", "identical/massaged" (if the /B option, /E option, or /I option was set), "identical/truncated" (if lines were truncated because of the /W option width setting), or "different".
 
This is handy when you have two sets of files to compare and don't care about the actual differences, only which files are different between the two sets.
 
/Q without a following number is normally the same as /Q1. The old /QQ option still works and is about the same as /Q2. For historical reasons, a plain /Q after any previous /Q option will reset the quietness level to 0.
 
Dependencies: When /Q2 or /Q3 is set, the /A option and the /U option are ignored.
 
/R
Compare files as binary. This is useful for non-text files such as some word-processing files, spreadsheets, databases, and executable programs.
 
A text file has lines ending with carriage return (ASCII 13), line feed (ASCII 10), or both; and the first Control-Z (ASCII 26) marks the end of file. Also, a text file doesn't contain any NUL characters (ASCII 0). Binary files, on the other hand, may have NUL and Control-Z characters in the middle, and often don't have "lines" separated by anything.
 
DOS doesn't mark files as binary or text, and therefore CMP has no way to know which a given file may be. By default it reads all files as text, but if you specify the /R option then CMP will read all files as binary.
 
When CMP reads files in binary mode, there's no such thing as a line, so CMP reads files in blocks of characters. The block size is given by the /W option.

The choice of text or binary mode also affects how CMP displays lines in difference blocks. In normal text mode, any differing lines are displayed as simple strings. Non-printing characters, like tab (ASCII 9) or Control-X (ASCII 24), are given no special treatment and appear just as DOS displays them; thus screen output may appear strange if a text file contains non-printing characters. But in binary mode, non-printing characters are displayed using their numeric values in hex, such as <09> or <18>.
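The binary-mode display convention can be approximated in Python as follows. The sketch assumes that "non-printing" means any byte outside the ASCII range 32 through 126 and that the hex digits are upper case; the guide does not state either detail, and render_binary is a made-up name.

```python
def render_binary(block):
    """Show a block of bytes with non-printing bytes as <hh> (illustrative)."""
    out = []
    for byte in block:
        if 32 <= byte <= 126:             # assumed printable range
            out.append(chr(byte))
        else:
            out.append(f"<{byte:02X}>")   # tab -> <09>, Control-X -> <18>
    return "".join(out)
```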
 
If all you care about is whether two binary files are the same or different, you can use the /Q3 option to suppress the display of difference blocks, or the /A1 option to stop reporting after the first difference.
 
Dependencies: When the /R option is set, the /B option and the /E option are ignored.
 

/S
After comparing the indicated files, work down the subdirectory trees to compare matching files in subdirectories, including directories marked hidden or system.
 
The /S option is most useful with wild cards. Consider this example:
        cmp /s *.htm d:\new
Here CMP will compare all .HTM files in the current directory to files with the same names in directory D:\NEW. Then CMP will work its way down all subdirectories below the current directory, and whenever it finds a corresponding file in a corresponding subdirectory under D:\NEW it will compare them.
 
The first set of files need not be in the current directory. For example, suppose that you made a backup a couple of days ago and since then have edited a lot of files, and you now want to list all the changes you made. If the backup is rooted at directory JANBKUP on drive E, and the current files are rooted at directory WORKING on drive C, you could use this command:
        cmp /s e:\janbkup\*.h *.cpp c:\working
Wherever there's a .H or .CPP file in E:\JANBKUP or a subdirectory, such as E:\JANBKUP\WESTREGN, CMP will try to compare it to a file of the same name in the corresponding subdirectory (C:\WORKING\WESTREGN). Please see Multiple Compare and Missing Files for details of how CMP will diagnose missing files.
 
The /S option is active in both the registered and the evaluation version of CMP. But in the evaluation version, CMP will search only two levels, the initial level and one level of subdirectories below that.
 
/U
Display UNIX-style output, putting line numbers above each difference block with a letter for added, changed, or deleted. Traditional CMP output displays the line number with each line. Please see Overview: Reporting Difference Blocks for sample outputs.
 
When selecting UNIX-style output, you probably want the /2 option also. The reports list added or deleted lines, and "added" and "deleted" make sense only if the first file is actually older.
 
The freeware Vim editor will color-code UNIX-style difference reports, if your terminal can display colors.
 
Dependencies: When the /Q2 or /Q3 option is set, the /U option is ignored. When the /U option is set, the /N option is ignored.
 
/Wwidth
Compare lines only up to width characters or in width-character blocks. The default width is 254.
 
width can be 2 to 32764 in CMP16 and 2 to 2147483644 in CMP32. But your computer probably doesn't have enough memory for lines that wide; see Look-Ahead and Memory Use in the Overview.
 
Comparing text files
 
CMP will examine each line only up to the specified width, and will display a warning message for any lines that exceed it. You can suppress these warnings by using the /Q1, /Q2, or /Q3 option.
 
In addition to the warning for each line, if any lines were truncated then CMP will display a single warning at the end of execution to tell you the longest line that was read from any file. Then you know the exact value to use with /W if you want to run CMP again and have it compare all lines to the end.
 
If you want to predict the needed width for a given file, simply compare the file to itself with a small width value and the /Q2 option to suppress messages, like this:
        cmp /Q2W10 file1 file1
Comparing binary files
 
CMP will read the files in chunks of width bytes and compare them. There is no question of truncation.
 
/Z
Reset all options to their default values.

If you use the /Z option on the command line, any options in the environment variable will be disregarded, and so will any preceding options on the command line. This can be useful in batch files, to make sure that the action of CMP is controlled only by the options on the command line, and not by any settings in the environment variable.
 
The /Z option is the only one whose effect can't be reversed. If you use /Z more than once, CMP disregards the environment variable and all command-line options up through the last /Z.
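One way to picture the effect of /Z is as a cut in the list of options. The Python sketch below assumes that options from the environment variable are processed before those on the command line and that a /Z inside the environment variable would be treated like one on the command line; the guide implies the first point but does not state the second. The name effective_options is invented for the example.

```python
def effective_options(env_options, command_line_options):
    """Return the options still in force after any /Z (illustrative only)."""
    combined = list(env_options) + list(command_line_options)
    last_z = -1
    for index, option in enumerate(combined):
        if option.upper() == "/Z":
            last_z = index
    # Everything up through the last /Z is disregarded.
    return combined[last_z + 1:]
```

So with /B /I in the environment variable and /E /Z /Q1 on the command line, only /Q1 remains in force.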


 

Return Values (ERRORLEVEL)


By default, CMP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.

0   program ran to completion (whether the files are the same or different)
2   help message displayed (/? option, or no files specified on the command line)
128   program self-check failed
253   not enough memory for look-ahead or other program requirements
254   specified file not available in single-file compare
255   bad option, or other error on the command line
 

You might want to use CMP in a batch file or a makefile and take different actions depending on whether two files are the same or different. To do this, use the /0 or /1 option. The /1 option emulates UNIX diff by returning an error level of 1 if the files are different or 0 if they're the same. /0 is the opposite: it returns 0 if the files are different or 1 if they're the same. In other words, the /0 or /1 option gives the value CMP should return if differences are found.

When comparing multiple files, the /1 option tells CMP to return an error level of 1 if any files compare as different, or 0 if all files compare as identical. The /0 option returns 0 if any files compare different, or 1 if all files compare identical. Missing files do not count as different, whether or not you have turned on warning messages for them.
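If you call CMP from a script rather than a batch file, the values above can be interpreted with a small lookup. This Python sketch only restates the table and the /0 and /1 rules; the name describe_return is hypothetical, and the sketch assumes the error values (2 and 128 through 255) keep their meaning when /0 or /1 is in effect, which the guide does not spell out.

```python
ERRORS = {
    2: "help message displayed",
    128: "program self-check failed",
    253: "not enough memory",
    254: "specified file not available",
    255: "bad option or other command-line error",
}

def describe_return(code, diff_value=None):
    """Describe a CMP return value.

    diff_value is 1 if /1 was used, 0 if /0 was used, or None for neither.
    """
    if code in ERRORS:
        return ERRORS[code]
    if diff_value is None:
        # By default 0 means only that the program ran to completion.
        return "ran to completion" if code == 0 else "unknown"
    if code == diff_value:
        return "different"
    if code == 1 - diff_value:
        return "same"
    return "unknown"
```

Note that without /0 or /1 a return value of 0 tells you nothing about whether the files matched.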


 

What's New?


Only the more important changes are listed here. As always, the complete revision history is available as a separate document.

CMP 5.13, 17 December 2001

Another minor release. By user request, "file not found" messages (/C option) are now written to both standard output and standard error, if you redirected standard output to a report file. Also by user request, when you append debug output to an existing file, a prominent header separates new output from previous output. Other minor improvements to debug output include a clearer list of filespec arguments, for help in diagnosing possible user errors on the command line.

CMP 5.12, 2 July 2001

This is a minor release. It adds one feature: when comparing multiple files, CMP now tells you at the end how many pairs were the same, how many pairs were different, and how many sought files were missing.

One line of the help message was 82 characters long; now it isn't.

CMP 5.11, 5 June 2001

This is a very minor release, for one new feature: CMP now tells you not only how many blocks of lines were different between the two files, but also how many actual lines were different.

CMP 5.1, 8 April 2001

new feature:

bugs fixed:

CMP 5.0, 4 March 2001

Release 5.0 was a complete rewrite of the program. This section lists only highlights of the changes, but in case you're interested the complete revision history is available as a separate document. It includes an important Transitional Note for users who are upgrading from CMP 4.3 or earlier.

Major new features in release 5.0 included:

Release 5.0 also gave you much more control over reporting with a new /A option to limit the number of difference blocks reported, new settings on the /Q option for super-compressed reports, and more formatting choices with the /F option.

Other enhancements in release 5.0 included a demo, the new /2 option to compare files in date order, and optional debug output to a file. Options and file specs can now be freely mixed on the command line. CMP16 now uses all of DOS memory if needed; and both versions use a faster algorithm.