RCSChecker

RCSChecker is a syntax checker for RCS (CVS) files.

Its main purpose is to find out

  1. whether the *,v files in a CVS repository are all syntactically valid and
  2. which newphrase extensions are actually present in that repository.

To this end I reviewed the original rcsfile(5) grammar. Failing to find a current grammar for this file format, I took a look at JavaCC and was able to come up with a working grammar in a very short time. JavaCC has a very nice and expressive syntax.

The closest matches for an existing RCS file format parser were some Perl modules and a grammar for ANTLR (see jrcs.g). Unfortunately, the ANTLR grammar is a bit outdated and does not work with the current ANTLR v3.2. An attempt at rewriting it to match v3.2 syntax worked up to the point where lookahead became a problem.

Using JavaCC was easier thanks to its built-in support for lexical states, with no code blocks required. Having to use code blocks for lexical state management in ANTLR (really?) appears to be a drawback compared with JavaCC.

Download
Grammar and auxiliary files are available here: RCSChecker.jar. The JAR contains the grammar (RCS.jj) and compiled class files (JDK 6). A MANIFEST is included in the JAR as well, so that one can run the JAR directly.
How to use it?
Make sure a Java runtime is available, then run:
java -jar RCSChecker.jar < <input file,v>

Of course, it's possible (and expected) to take the grammar and tweak it to suit whatever other uses you need it for, or to subclass the parser class and change things that way.

In case binary data in your CVS files cause problems, it may be necessary to specify a Java file encoding other than the default, such as

java -Dfile.encoding=ISO-8859-1 -jar RCSChecker.jar
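
Why ISO-8859-1 is a reasonable choice here can be illustrated with a small sketch. It assumes that the checker decodes its input with the JVM default charset, which is not confirmed by the grammar itself; the class name EncodingRoundTrip is hypothetical and not part of RCSChecker.jar. ISO-8859-1 maps every byte value to exactly one character, so arbitrary binary data survives decoding, whereas a multi-byte charset such as UTF-8 replaces byte sequences it considers invalid.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of RCSChecker.jar.
class EncodingRoundTrip {
    // Decode the bytes with the given charset, then encode them again.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // '@', two bytes that are not valid UTF-8 on their own, and a newline.
        byte[] raw = {(byte) 0x40, (byte) 0xFF, (byte) 0x80, (byte) 0x0A};

        // ISO-8859-1 is lossless for any byte sequence.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1)));

        // UTF-8 substitutes a replacement character for the invalid bytes.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));
    }
}
```

The first line printed is true and the second is false, which is the kind of damage a non-default encoding setting is meant to avoid.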
Examples
A sample CVS file is included in the JAR as "sample,v".
jar xf RCSChecker.jar sample,v
java -jar RCSChecker.jar < sample,v
Output
The program outputs to stdout the newphrase IDs it detects along with the words that follow. If the input file syntax verifies OK there will be no further output, and the return code (%ERRORLEVEL%) will be zero. In case of a syntax error, the JavaCC exception is output and the return code will be 1. Output for the included sample file "sample,v" is:
ID: deltatype
    text
ID: kopt
    kv
ID: permissions
    666
ID: commitid
    1158467fa5ef0686
ID: filename
    sample
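
Since the return code distinguishes valid from invalid files, checking a whole repository can be scripted. The following is a minimal sketch only, assuming Java 8 or later for the driver and a java executable on the PATH; the class CheckRepository and its method names are hypothetical and not part of RCSChecker.jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical driver, not part of RCSChecker.jar.
class CheckRepository {
    // Collect all regular files below root whose name ends in ",v".
    static List<Path> findRcsFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(",v"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Run the checker with the file on stdin and return its exit code.
    static int check(Path jar, Path rcsFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", jar.toString());
        pb.inheritIO();                      // pass newphrase output and errors through
        pb.redirectInput(rcsFile.toFile());  // same effect as "< file,v" on the command line
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path jar = Paths.get(args[0]);   // path to RCSChecker.jar
        Path root = Paths.get(args[1]);  // root of the CVS repository
        int failures = 0;
        for (Path f : findRcsFiles(root)) {
            if (check(jar, f) != 0) {    // non-zero return code: file did not verify
                System.err.println("Syntax error in " + f);
                failures++;
            }
        }
        System.exit(failures == 0 ? 0 : 1);
    }
}
```

It would be invoked with the path to the JAR and the repository root as its two arguments.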
Notes

1) I think the original grammar is not perfect. Theoretically, newphrase IDs may look like revisions (1.2.3.4). This complicates parsing when a delta production (which starts with a revision) needs to be told apart from a newphrase that happens to start with such a "revision". In practice this does not happen, as all newphrase IDs (that I have seen) look like nouns in the style of the other RCS file format tokens (head, strict and so on). So this grammar assumes that newphrase IDs start with a letter or an underscore, not with a dot or a digit.

2) The JAR file was generated with

mkdir src-generated
javacc -OUTPUT_DIRECTORY:src-generated RCS.jj
mkdir classes
javac -d classes src-generated\*.java
jar cvfm RCSChecker.jar MANIFEST.MF RCS.jj sample,v -C classes .
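
The simplifying assumption from note 1 can be sketched in plain Java. This is an illustration only and is not taken from RCS.jj: the class TokenKind, its method names, the restriction to ASCII letters and the sample token "_ext" are all assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from RCS.jj.
class TokenKind {
    // A revision as in the example "1.2.3.4": digit groups separated by dots.
    static final Pattern REVISION = Pattern.compile("[0-9]+(\\.[0-9]+)*");

    static boolean looksLikeRevision(String token) {
        return REVISION.matcher(token).matches();
    }

    // The assumption from note 1: a newphrase ID starts with a letter
    // or an underscore (ASCII letters only in this sketch).
    static boolean startsLikeNewphraseId(String token) {
        if (token.isEmpty()) {
            return false;
        }
        char c = token.charAt(0);
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        for (String t : new String[] {"1.2.3.4", "commitid", "_ext"}) {
            System.out.println(t + " revision=" + looksLikeRevision(t)
                    + " newphraseStart=" + startsLikeNewphraseId(t));
        }
    }
}
```

With this rule a single character of lookahead is enough to tell the two cases apart, at the cost of rejecting IDs that begin with a digit.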
Copyright

This code is licensed under the Apache License, Version 2.0.

This page was last changed on July 29th, 2010. © Matthias Gärtner 2010