34 TEX File Filter: Add Type Information

This Perl script takes an ALLPROSE Noweb form of an Aldor source as standard input and embellishes the documentation part with some information from the code part. Additionally, for types it adds a %def at the end of the code chunk and a showexports command at the beginning of the following line.

In fact, the script distinguishes three kinds of code chunks:

  1. chunks that contain a category, domain, or package definition,
  2. export code chunks, and
  3. all other code chunks.

The script transforms only the first two kinds of chunks and their corresponding API documentation. For the first kind, it expects input of the following form.

\begin{+++}  
line 1  
...  
line n  
\end{+++}  
% empty lines or LaTeX comment lines  
<<some code chunk>>=  
define CategoryName(...): Category == with {  
    <<exports: CategoryName>>  
}  
@

The script adds the following information after \begin{+++}. All of it is issued on the same line as the \begin{+++}, i.e., no additional line breaks are added (it is broken into several lines here only for readability).

\addefinetype{CategoryName}  
\begin{adsection}{Type \useterm[Constructor]{constructor}}  
\adtype{CategoryName}  
\end{adsection}

The code chunk looks after transformation as follows.

<<some code chunk>>=  
define CategoryName(...): Category == with {  
    <<exports: CategoryName>>  
}  
@ %def CategoryName  
\showexports{CategoryName}

Domain and package definitions are treated in a similar way (no define keyword has to appear), except that the showexports command is omitted if no <<exports: ...>> line appears in the code chunk.
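The end-of-chunk rewriting described above can be sketched as a small stand-alone Perl function. This is only an illustration under stated assumptions: the helper name close_type_chunk and its argument list are hypothetical and do not appear in the script, which performs the corresponding steps inline on @LineBuffer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): finish a code chunk whose
# closing "@" is at index $i of the array referenced by $lines.
sub close_type_chunk {
    my ($lines, $i, $typeName, $hasExports) = @_;
    # Add "%def" only if the closing line is a bare "@".
    $lines->[$i] = "\@ %def $typeName" if $lines->[$i] =~ /^\@\s*$/;
    # The showexports command is prepended to the line after the chunk.
    if ($hasExports) {
        $lines->[$i + 1] = "\\showexports{$typeName}" . ($lines->[$i + 1] // '');
    }
    return;
}

my @lines = (
    '<<some code chunk>>=',
    'define CategoryName(...): Category == with {',
    '    <<exports: CategoryName>>',
    '}',
    '@',
    'Text after the chunk.',
);
close_type_chunk(\@lines, 4, 'CategoryName', 1);
print "$_\n" for @lines;
```

In this sketch the closing line becomes "@ %def CategoryName" and the showexports command ends up at the beginning of the line that follows the chunk, without introducing a new line.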

For the second case, an export code chunk, the input looks as follows.

\begin{+++}  
line 1  
...  
line n  
\end{+++}  
% empty lines or LaTeX comment lines  
<<exports: TypeConstructorName>>=  
functionOrConstantName: Signature;  
@

The script adds the following information after \begin{+++}. All of it is issued on the same line as the \begin{+++}, i.e., no additional line breaks are added (it is broken into several lines here only for readability).

\addefinename{functionOrConstantName: Signature}  
\begin{adsection}{Export of \adtype{TypeConstructorName}}  
\adname{functionOrConstantName: Signature}  
\verb;: Signature;  
\end{adsection}

The code chunk will not be modified.

The program basically reads in the whole file from standard input into the variable LineBuffer, transforms some lines, and finally writes the LineBuffer lines to standard output.

474a <<global variables 294a>>+=   (293 474b 503 536b)  296b  475b
@LineBuffer=();
474b <<* 21>>+=   458  486
#------------------------------------------------------------------
#---
#--- ALLPROSE
#--- Copyright (C) Ralf Hemmecke (ralf@hemmecke.de)
#--- http://www.hemmecke.de/aldor
#---
#------------------------------------------------------------------

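<<global variables 294a>>
my($lineNumber, $line);
# Read the whole input into @LineBuffer, one entry per line (newline removed).
while (<>) {chomp; push @LineBuffer, $_;}
# Transform the buffered lines in place.
for($lineNumber = 0; $lineNumber < scalar(@LineBuffer); $lineNumber++) {
    <<do something for one line 475a>>
}
# Write the transformed lines to standard output.
for $line (@LineBuffer) {print $line, "\n";}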

Now, depending on the line that is currently considered, several things have to be done. First of all, to shorten the following code, the current line is remembered.

475a <<do something for one line 475a>>=   (474b 536b)  475c
$line = $LineBuffer[$lineNumber];

The script also remembers its current state, in particular, whether it is inside a +++ environment or inside a code chunk. The line number of the \begin{+++} is remembered in the variable beginDocIndex so that information found in the lines that follow can be added there.

475b <<global variables 294a>>+=   (293 474b 503 536b)  474a  476a
my($beginDocIndex)=-1; # -1 means invalid
my($inDocEnvironment)=0;
475c <<do something for one line 475a>>+=   (474b 536b)  475a  536c
if ($line =~ /^\\begin\{\+\+\+\}/) {
    $beginDocIndex=$lineNumber;
    $inDocEnvironment=1;
} elsif ($line =~ /^\\end\{\+\+\+\}/) {
    $inDocEnvironment=0;
} elsif ($line =~ /^<<.*>>=/) {
    <<begin code chunk 476b>>
} elsif ($inCodeChunk) {
    <<in code chunk 480b>>
} else { # we are not in a code chunk: if ($inCodeChunk==0) {
    <<not in code chunk 483>>
}

Uses code 432.

At the beginning of a code chunk we set the variable inCodeChunk to some appropriate value. It is reset to 0 if a closing @ is detected.

476a <<global variables 294a>>+=   (293 474b 503 536b)  475b  479a
my($inCodeChunk)=0;

Its value is 0 outside a code chunk, 1 inside a non-export code chunk, and 2 inside an export code chunk.

476b <<begin code chunk 476b>>=   (475c)  479b
if ($line =~ /^<<exports:.*>>=/) {# exports chunk?
    $inCodeChunk=2; #exports chunk
} else {
    $inCodeChunk=1; #normal code chunk
}

Uses code 432.
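As a minimal sketch of the three states, the classification of a chunk header can be isolated in a function. The name chunk_kind is hypothetical; moreover, returning 0 for a line that is not a chunk header is an assumption of this sketch, whereas the script itself leaves inCodeChunk unchanged on such lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): classify a line in the
# same way the chunk above sets $inCodeChunk at a chunk header.
sub chunk_kind {
    my ($line) = @_;
    return 0 unless $line =~ /^<<.*>>=/;          # assumption: 0 for non-headers
    return $line =~ /^<<exports:.*>>=/ ? 2 : 1;   # 2 = export chunk, 1 = other chunk
}

for my $line ('<<exports: CategoryName>>=', '<<some code chunk>>=', 'plain text') {
    printf "%-28s -> %d\n", $line, chunk_kind($line);
}
```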
Convention 23 By convention there is a special type of code chunk that is written as
<<exports: TYPENAME>>=  
...  
@

where TYPENAME is the name of the type (category, domain, or package) to which these exports belong. Such code chunks are called export code chunks. ALLPROSE looks for function definitions in export code chunks and tries to extract type definitions from all other code chunks. By convention there should be at most one type (category, domain, or package) per code chunk, but there can be several per file.

In fact, this Perl script does not check what appears after the colon of <<exports: TYPE>>. Thus, an export code chunk is detected solely by the string "exports:". However, the type name is checked when the argument of the showexports command is generated. It is, therefore, considered good practice to put the corresponding type name into the name of the export code chunk.

An export code chunk is supposed to contain one function or constant definition, i.e., a name followed by a colon followed by its type. Examples are

<<: (TextWriter, %) -> TextWriter;  
1: %;  
add!: (%, %) -> %;

In fact, there can be any code inside an export code chunk, but only for those lines having the special format

functionOrConstantName: Signature;

this script adds an addefinename command to the corresponding \begin{+++} line.
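To illustrate how such a line is split into a name and a signature, the following sketch applies the regular expression used in the export chunk handler to the three examples above. The helper name parse_export is hypothetical and introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): split an export line into
# name and signature with the pattern used in the export chunk handler.
sub parse_export {
    my ($line) = @_;
    return $line =~ /^([^:]+)\s*:\s*(.*);/ ? ($1, $2) : ();
}

for my $line ('<<: (TextWriter, %) -> TextWriter;', '1: %;', 'add!: (%, %) -> %;') {
    my ($name, $signature) = parse_export($line);
    print "name=[$name] signature=[$signature]\n";
}
```

The name is everything before the first colon, so operator-like names such as << and constants such as 1 are handled in the same way as ordinary identifiers.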

Furthermore, at the beginning of a code chunk, the script resets certain global variables to their initial state.

479a <<global variables 294a>>+=   (293 474b 503 536b)  476a  480a
my($functionName)="";
my($functionSignature)="";
my($showexports)="";
my($typeNameDefinedInCodeChunk)=0; #false

Defines:
functionName, used in chunks 479b and 481.
functionSignature, used in chunks 479b and 481.

Uses showexports 370.
479b <<begin code chunk 476b>>+=   (475c)  476b
$functionName="";
$functionSignature="";
$showexports="";
$typeNameDefinedInCodeChunk=0; #false

Uses functionName 479a, functionSignature 479a, and showexports 370.

There is, in fact, another variable that is set inside a (non-export) code chunk and used globally. However, this variable should not be reset at the beginning of a code chunk; it stays valid over a wider region until it is overridden. The variable thisTypeName contains the name of the most recently defined category, domain, or package.

480a <<global variables 294a>>+=   (293 474b 503 536b)  479a  504b
my($thisTypeName)="";

Defines:
thisTypeName, used in chunks 481–83.

Now let us consider what happens when the script treats a line inside a code chunk. What should be done within a code chunk depends on its name. There are export code chunks and others.

480b <<in code chunk 480b>>=   (475c)
if ($inCodeChunk == 2) {
    <<in exports code chunk 481>>
} else {
    <<in non-exports code chunk 482>>
}

An export code chunk is easy to handle, since it usually contains just one line and the closing @. If there are several lines, only the first line of the form functionOrConstantName: Signature; is taken into account. At the end of the export code chunk the script checks whether there is a corresponding +++ environment. If so, an addefinename command and some further information are appended to the corresponding \begin{+++} line.

Note that the function or constant definition must appear on a single line and must be terminated by a semicolon. In fact, only one semicolon per line is allowed.
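The reason for these restrictions can be seen by applying the pattern of the export chunk handler to lines that violate them. The following sketch uses made-up input lines (f, g, A, B, and C are placeholders, not taken from any ALLPROSE source).

```perl
use strict;
use warnings;

# Same pattern as in the export chunk handler.
my $pattern = qr/^([^:]+)\s*:\s*(.*);/;

my @results;
for my $line ('f: A; g: B;', 'f: (A,', '      B) -> C;') {
    if ($line =~ $pattern) {
        push @results, "[$1] [$2]";   # name and signature as captured
    } else {
        push @results, 'no match';
    }
}
print "$_\n" for @results;
```

With two semicolons on one line, the greedy (.*) runs up to the last semicolon, so the captured signature is "A; g: B". A definition that is split over two lines is not recognized at all: the first part lacks the terminating semicolon and the second part lacks the colon.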

481 <<in exports code chunk 481>>=   (480b)
if ($line =~ /^([^:]+)\s*:\s*(.*);/) {
    if ($functionName eq '') {
        $functionName=$1;
        $functionSignature=$2;
    }
} elsif ($line =~ /^\@/) {#end of code chunk
    if ($functionName ne "" && $beginDocIndex>=0) {
        $LineBuffer[$beginDocIndex] .=
          "\\addefinename{$functionName:$functionSignature}"
          . "\\begin{adsection}{Export of \\adtype{$thisTypeName}}"
          . "\\adname{$functionName:$functionSignature}"
          . "\\verb;: $functionSignature;"
          . '\end{adsection}';
    }
    $inCodeChunk=0;
    $beginDocIndex=-1;
}

Uses adsection 250, code 432, functionName 479a, functionSignature 479a, and thisTypeName 480a.

In non-exports code chunks we look for type definitions (categories, domains, or packages). If a type name is found it will be put into the variable thisTypeName. In this case, if the end of the code chunk is reached, the following piece of code puts an appropriate addefinetype command right after the opening of the most recent +++ environment (together with a section that just contains the name of the type constructor).

If a code chunk contained a type definition and there is not already a %def keyword after the closing @, then the definition of the name of the type will be added on the line containing the closing @ of the chunk. Furthermore, a showexports statement will be added right after the code chunk, if the code chunk contained a type definition and there was a line that contained something of the form <<exports: TYPE>>. The variable showexports is used to hold data that should be added right after the code chunk.

ToDo 22 Note that extended domains are treated in a special way. See also showexports and generateexports.
482 <<in non-exports code chunk 482>>=   (480b)
if ($line =~ /^define\s+([A-Z]\w*)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=1; #true
} elsif ($line =~ /^([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=1; #true
} elsif ($line =~ /^extend\s+([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=2; #true extend typename
} elsif ($line =~ /<<exports: $thisTypeName>>/) {
    if ($typeNameDefinedInCodeChunk==2) {
        $showexports="\\showexports{extend.$thisTypeName}";
    } else {
        $showexports="\\showexports{$thisTypeName}";
    }
} elsif ($line =~ /^\@(\s*$|\s+%def)/) {#end of code chunk
    if ($typeNameDefinedInCodeChunk) {
        if ($beginDocIndex>=0) {
            # add after \begin{+++}
            $LineBuffer[$beginDocIndex] .=
              "\\addefinetype{$thisTypeName}"
              . '\begin{adsection}{Type \useterm[Constructor]{constructor}}'
              . "\\adtype{$thisTypeName}"
              . '\end{adsection}';
            print STDERR "SETTYPE [$thisTypeName]\n";
        }

        if ($line =~ /^\@\s*$/) {#end of code chunk (no %def)
            $LineBuffer[$lineNumber] = "\@ %def $thisTypeName";
        }
        if ($showexports ne '') {
            my($i)=$lineNumber+1;
            $LineBuffer[$i] = $showexports . $LineBuffer[$i];
        }
    }
    $inCodeChunk=0;
    $beginDocIndex=-1;
}

Uses adsection 250, code 432, showexports 370, and thisTypeName 480a.
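The three patterns that recognize a type definition can be tried out in isolation. The following is a sketch only: the helper detect_type is hypothetical, and the input lines with MyDomain, MyCategory, and the extension of Integer are invented for illustration. The returned kind mirrors the values stored in typeNameDefinedInCodeChunk (1 for a definition, 2 for an extension).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): return (name, kind) for a
# line that starts a type definition, or an empty list otherwise.
sub detect_type {
    my ($line) = @_;
    return ($1, 1) if $line =~ /^define\s+([A-Z]\w*)/;
    return ($1, 1) if $line =~ /^([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/;
    return ($1, 2) if $line =~ /^extend\s+([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/;
    return;
}

for my $line (
    'define CategoryName(...): Category == with {',
    'MyDomain(R: Ring): MyCategory == add {',
    'extend Integer: with {',
    'import from Integer;',
) {
    my ($name, $kind) = detect_type($line);
    print(defined($name) ? "$name (kind $kind)\n" : "no type definition\n");
}
```

All three patterns are anchored at the beginning of the line and require a capitalized name, so indented lines and lines starting with a lowercase keyword other than define or extend are not taken as type definitions.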

If we are neither in a code chunk nor in a +++ environment, we must remove the reference to the last begin of a +++ environment.

483 <<not in code chunk 483>>=   (475c)
if ($inDocEnvironment==0) {
    if ( ! ($line =~ /^(\s*(%.*)?|#line \d+ ".*")$/)) {
        $beginDocIndex=-1;
    }
    if ($line =~ /\\addescribetype\{(.+)\}/) {
        $thisTypeName=$1;
    }
}

Uses thisTypeName 480a.
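Which lines may appear between \end{+++} and the code chunk without invalidating beginDocIndex can be checked with the pattern from the chunk above. In this sketch the helper name keeps_doc_reference and the file name in the #line directive are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): true if the line is empty,
# a LaTeX comment, or a #line directive, i.e. if it leaves the reference
# to the last \begin{+++} intact.
sub keeps_doc_reference {
    my ($line) = @_;
    return $line =~ /^(\s*(%.*)?|#line \d+ ".*")$/ ? 1 : 0;
}

for my $line ('', '   % a LaTeX comment', '#line 12 "file.as.nw"', 'Some prose.') {
    printf "%-26s %s\n", "[$line]", keeps_doc_reference($line) ? 'keep' : 'reset';
}
```

Only the last line, ordinary documentation text, would cause beginDocIndex to be reset to -1.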