34 TeX File Filter: Add Type Information

This Perl script takes an ALLPROSE Noweb form of an Aldor source as standard input and embellishes the documentation part with some information from the code part. Additionally, for types it adds a %def at the end of the code chunk and a showexports command at the beginning of the following line.

In fact, the script distinguishes three kinds of code chunks:

  1. chunks that contain a category, domain, or package definition,
  2. export code chunks, and
  3. all other code chunks.

The script transforms only the first two kinds of chunks and their corresponding API documentation. For the first kind, it expects input of the following form.

\begin{+++}  
line 1  
...  
line n  
\end{+++}  
% empty lines or LaTeX comment lines  
<<some code chunk>>=  
define CategoryName(...): Category == with {  
    <<exports: CategoryName>>  
}  
@

The script appends the following information to the \begin{+++} line. Everything is emitted on that same line, i.e., no additional line breaks are added.

\addefinetype{CategoryName}  
\begin{adsection}{Type Constructor}  
\adtype{CategoryName}  
\end{adsection}

The code chunk looks after transformation as follows.

<<some code chunk>>=  
define CategoryName(...): Category == with {  
    <<exports: CategoryName>>  
}  
@ %def CategoryName  
\showexports{CategoryName}

Domain and package definitions are treated similarly (no define keyword has to appear). The showexports command is omitted, however, if no <<exports: ...>> appears in the code chunk.
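The transformation of the code chunk can be sketched in a few lines of Perl. This is a simplified illustration, not the script itself: it only handles a `define` line, quotes the type name with `\Q...\E` (which the script does not do), and works on a made-up six-line buffer whose last line is empty.

```perl
use strict;
use warnings;

# Made-up input: a type-defining chunk followed by one empty line.
my @LineBuffer = (
    '<<some code chunk>>=',
    'define CategoryName(R: Ring): Category == with {',
    '    <<exports: CategoryName>>',
    '}',
    '@',
    '',
);

my ($thisTypeName, $showexports) = ('', '');
for my $i (0 .. $#LineBuffer) {
    my $line = $LineBuffer[$i];
    if ($line =~ /^define\s+([A-Z]\w*)/) {
        # remember the name of the defined type
        $thisTypeName = $1;
    } elsif ($thisTypeName ne '' && $line =~ /<<exports: \Q$thisTypeName\E>>/) {
        # the chunk refers to its export chunks
        $showexports = "\\showexports{$thisTypeName}";
    } elsif ($thisTypeName ne '' && $line =~ /^\@\s*$/) {
        # closing @ without %def: add the definition and the showexports
        $LineBuffer[$i] = "\@ %def $thisTypeName";
        $LineBuffer[$i + 1] = $showexports . $LineBuffer[$i + 1]
            if $showexports ne '' && $i < $#LineBuffer;
    }
}
print "$_\n" for @LineBuffer;
```

Because the showexports command is prepended to whatever follows the closing @, it ends up on a line of its own only if that following line was empty.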

For the second case, an export code chunk, the input looks as follows.

\begin{+++}  
line 1  
...  
line n  
\end{+++}  
% empty lines or LaTeX comment lines  
<<exports: TypeConstructorName>>=  
functionOrConstantName: Signature;  
@

The script appends the following information to the \begin{+++} line. Everything is emitted on that same line, i.e., no additional line breaks are added.

\addefinename{functionOrConstantName: Signature}  
\begin{adsection}{Export of \adtype{TypeConstructorName}}  
\adname{functionOrConstantName: Signature}  
\verb;: Signature;  
\end{adsection}

The code chunk will not be modified.

The program reads all of standard input into the array LineBuffer, transforms some of the lines, and finally writes the lines of LineBuffer to standard output.

478a ⟨global variables 298a⟩+≡ (297 478b 507 539b) ◁300b 479b▷
@LineBuffer=();
478b ⟨* 21⟩+≡ ◁462 490▷
#------------------------------------------------------------------
#---
#--- ALLPROSE
#--- Copyright (C) Ralf Hemmecke (ralf@hemmecke.de)
#--- http://www.hemmecke.de/aldor
#---
#------------------------------------------------------------------

⟨global variables 298a⟩
my($lineNumber, $line);
while (<>) {chomp; push @LineBuffer, $_;}
for($lineNumber = 0; $lineNumber < scalar(@LineBuffer); $lineNumber++) {
    ⟨do something for one line 479a⟩
}
for $line (@LineBuffer) {print $line, "\n";}

Now, depending on the line that is currently considered, several things have to be done. First of all, to shorten the following code, the current line is remembered.

479a ⟨do something for one line 479a⟩≡ (478b 539b) 479c▷
$line = $LineBuffer[$lineNumber];

The script also remembers which state it is currently in, in particular, whether it is inside a +++ environment or inside a code chunk. The line number of the \begin{+++} is remembered in the variable beginDocIndex so that information found in the lines that follow can be added there.

479b ⟨global variables 298a⟩+≡ (297 478b 507 539b) ◁478a 480a▷
my($beginDocIndex)=-1; # -1 means invalid
my($inDocEnvironment)=0;
479c ⟨do something for one line 479a⟩+≡ (478b 539b) ◁479a 539c▷
if ($line =~ /^\\begin\{\+\+\+\}/) {
    $beginDocIndex=$lineNumber;
    $inDocEnvironment=1;
} elsif ($line =~ /^\\end\{\+\+\+\}/) {
    $inDocEnvironment=0;
} elsif ($line =~ /^<<.*>>=/) {
    ⟨begin code chunk 480b⟩
} elsif ($inCodeChunk) {
    ⟨in code chunk 484b⟩
} else { # we are not in a code chunk, i.e., $inCodeChunk==0
    ⟨not in code chunk 487⟩
}

Uses code 436.

At the beginning of a code chunk we set the variable inCodeChunk to some appropriate value. It is reset to 0 if a closing @ is detected.

480a ⟨global variables 298a⟩+≡ (297 478b 507 539b) ◁479b 483a▷
my($inCodeChunk)=0;

Its value is 0 outside a code chunk, 1 inside a non-export code chunk, and 2 inside an export code chunk.

480b ⟨begin code chunk 480b⟩≡ (479c) 483b▷
if ($line =~ /^<<exports:.*>>=/) {# exports chunk?
    $inCodeChunk=2; #exports chunk
} else {
    $inCodeChunk=1; #normal code chunk
}

Uses code 436.
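The three states can be illustrated with a small stand-alone sketch. The helper `chunk_state` is hypothetical (it is not part of the script); it merely applies the two patterns used above to a single line and returns the value that inCodeChunk would receive, or 0 if the line does not open a code chunk.

```perl
use strict;
use warnings;

# Hypothetical helper: state assigned on a chunk header line.
#   0 = line does not open a chunk
#   1 = ordinary (non-export) code chunk
#   2 = export code chunk
sub chunk_state {
    my ($line) = @_;
    return 0 unless $line =~ /^<<.*>>=/;
    return $line =~ /^<<exports:.*>>=/ ? 2 : 1;
}

for my $header ('<<exports: MyCategory>>=', '<<some code chunk>>=', 'plain text') {
    printf "%-26s -> %d\n", $header, chunk_state($header);
}
```

Note that an indented reference such as `    <<exports: MyCategory>>` inside another chunk does not open a chunk, since it neither starts at the beginning of the line nor ends in `=`.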
Convention 23. There is a special type of code chunk that is written as
<<exports: TYPENAME>>=  
...  
@

where TYPENAME is the name of the type (category, domain, or package) to which these exports belong. Such code chunks are called export code chunks. ALLPROSE looks for function definitions in export code chunks and tries to extract type definitions from all other code chunks. By convention, there should be at most one type (category, domain, or package) per code chunk, but there can be several per file.

In fact, this Perl script does not check what appears after the colon of <<exports: TYPE>>. Thus, an export code chunk is detected solely by the string "exports:" at the beginning of its name. The type name does matter, however, when the script generates the argument of the showexports command. It is therefore considered good practice to put the corresponding type name into the name of the export code chunk.

An export code chunk is supposed to contain one function or constant definition, i.e., a name followed by a colon followed by its type. Examples are

<<: (TextWriter, %) -> TextWriter;  
1: %;  
add!: (%, %) -> %;

In fact, there can be any code inside an export code chunk, but only for those lines having the special format

functionOrConstantName: Signature;

this script adds an addefinename to the corresponding \begin{+++} line.
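To see how such lines are split into a name and a signature, the following sketch applies the pattern that the script uses inside export code chunks to the three example lines. The wrapper `parse_export` is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern used for export lines.
# Returns (name, signature), or an empty list if the line does not match.
sub parse_export {
    my ($line) = @_;
    return $line =~ /^([^:]+)\s*:\s*(.*);/ ? ($1, $2) : ();
}

for my $line ('<<: (TextWriter, %) -> TextWriter;', '1: %;', 'add!: (%, %) -> %;') {
    my ($name, $signature) = parse_export($line);
    print "name=[$name] signature=[$signature]\n";
}
```

The name is everything before the first colon, so operator names such as `<<` and constants such as `1` are handled in the same way as ordinary identifiers.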

Furthermore, at the beginning of a code chunk, the script resets certain global variables to their initial state.

483a ⟨global variables 298a⟩+≡ (297 478b 507 539b) ◁480a 484a▷
my($functionName)="";
my($functionSignature)="";
my($showexports)="";
my($typeNameDefinedInCodeChunk)=0; #false

Defines:
functionName, used in chunks 483b and 485.
functionSignature, used in chunks 483b and 485.

Uses showexports 375.
483b ⟨begin code chunk 480b⟩+≡ (479c) ◁480b
$functionName="";
$functionSignature="";
$showexports="";
$typeNameDefinedInCodeChunk=0; #false

Uses functionName 483a, functionSignature 483a, and showexports 375.

There is, in fact, another variable that is set inside a (non-export) code chunk and used globally. However, this variable should not be reset at the beginning of a code chunk, but should rather remain valid over a wider region until it is overridden. The variable thisTypeName contains the name of the most recently defined category, domain, or package.

484a ⟨global variables 298a⟩+≡ (297 478b 507 539b) ◁483a 508b▷
my($thisTypeName)="";

Defines:
thisTypeName, used in chunks 485–87.

Now let us consider what happens when the script treats a line inside a code chunk. What has to be done depends on the name of the chunk: there are export code chunks and all others.

484b ⟨in code chunk 484b⟩≡ (479c)
if ($inCodeChunk == 2) {
    ⟨in exports code chunk 485⟩
} else {
    ⟨in non-exports code chunk 486⟩
}

An export code chunk is easy to handle, since it usually contains just one line and the closing @. If there are several lines, only the first line of the expected format is taken into account. At the end of the export code chunk it is checked whether there is a corresponding +++ environment. If so, an addefinename and some further information are appended to the corresponding \begin{+++} line.

Note that the line containing the function or constant definition must appear on one line and must be terminated by a semicolon. In fact, only one semicolon per line is allowed.

485 ⟨in exports code chunk 485⟩≡ (484b)
if ($line =~ /^([^:]+)\s*:\s*(.*);/) {
    if ($functionName eq '') {
        $functionName=$1;
        $functionSignature=$2;
    }
} elsif ($line =~ /^\@/) {#end of code chunk
    if ($functionName ne "" && $beginDocIndex>=0) {
        $LineBuffer[$beginDocIndex] .=
          "\\addefinename{$functionName:$functionSignature}"
          . "\\begin{adsection}{Export of \\adtype{$thisTypeName}}"
          . "\\adname{$functionName:$functionSignature}"
          . "\\verb;: $functionSignature;"
          . '\end{adsection}';
    }
    $inCodeChunk=0;
    $beginDocIndex=-1;
}

Uses adsection 254, code 436, functionName 483a, functionSignature 483a, and thisTypeName 484a.
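The restrictions on export lines follow directly from the pattern. The sketch below feeds a few made-up lines through the same pattern (wrapped in a hypothetical helper `parse_export`) to show that only the first matching line of a chunk is kept, that a line without a semicolon is ignored, and that everything up to the last semicolon ends up in the signature.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern used in export code chunks.
sub parse_export {
    my ($line) = @_;
    return $line =~ /^([^:]+)\s*:\s*(.*);/ ? ($1, $2) : ();
}

# A made-up export chunk body with two matching lines:
# as in the script, only the first match is remembered.
my ($functionName, $functionSignature) = ('', '');
for my $line ('zero: %;', 'one: %;') {
    my @parts = parse_export($line);
    next unless @parts;
    ($functionName, $functionSignature) = @parts if $functionName eq '';
}
print "kept: name=[$functionName] signature=[$functionSignature]\n";

# A line without a terminating semicolon does not match at all.
my @none = parse_export('zero: %');
print "no semicolon: ", scalar(@none), " fields\n";

# With two semicolons, the signature extends to the last one.
my @two = parse_export('f: A; g: B;');
print "two semicolons: name=[$two[0]] signature=[$two[1]]\n";
```

The last case shows why a second semicolon on the line is a problem: the text of the second declaration becomes part of the first signature.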

In non-exports code chunks we look for type definitions (categories, domains, or packages). If a type name is found it will be put into the variable thisTypeName. In this case, if the end of the code chunk is reached, the following piece of code puts an appropriate addefinetype command right after the opening of the most recent +++ environment (together with a section that just contains the name of the type constructor).

If a code chunk contained a type definition and there is not already a %def keyword after the closing @, then the definition of the name of the type will be added on the line containing the closing @ of the chunk. Furthermore, a showexports statement will be added right after the code chunk, if the code chunk contained a type definition and there was a line that contained something of the form <<exports: TYPE>>. The variable showexports is used to hold data that should be added right after the code chunk.

ToDo 23 Note that extended domains are treated in a special way. See also showexports and generateexports.
486 ⟨in non-exports code chunk 486⟩≡ (484b)
if ($line =~ /^define\s+([A-Z]\w*)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=1; #true
} elsif ($line =~ /^([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=1; #true
} elsif ($line =~ /^extend\s+([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/) {
    $thisTypeName=$1;
    $typeNameDefinedInCodeChunk=2; #true extend typename
} elsif ($line =~ /<<exports: $thisTypeName>>/) {
    if ($typeNameDefinedInCodeChunk==2) {
        $showexports="\\showexports{extend.$thisTypeName}";
    } else {
        $showexports="\\showexports{$thisTypeName}";
    }
} elsif ($line =~ /^\@(\s*$|\s+%def)/) {#end of code chunk
    if ($typeNameDefinedInCodeChunk) {
        if ($beginDocIndex>=0) {
            # add after \begin{+++}
            $LineBuffer[$beginDocIndex] .=
              "\\addefinetype{$thisTypeName}"
              . '\begin{adsection}{Type \useterm[Constructor]{constructor}}'
              . "\\adtype{$thisTypeName}"
              . '\end{adsection}';
            print STDERR "SETTYPE [$thisTypeName]\n";
        }

        if ($line =~ /^\@\s*$/) {#end of code chunk (no %def)
            $LineBuffer[$lineNumber] = "\@ %def $thisTypeName";
        }
        if ($showexports ne '') {
            my($i)=$lineNumber+1;
            $LineBuffer[$i] = $showexports . $LineBuffer[$i];
        }
    }
    $inCodeChunk=0;
    $beginDocIndex=-1;
}

Uses adsection 254, code 436, showexports 375, and thisTypeName 484a.
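How the three type-definition patterns classify a line can be tried out in isolation. In the sketch below, `detect_type` and `showexports_for` are hypothetical helpers that mirror the patterns and the showexports argument from the chunk above. The sample lines are made up for illustration and are not taken from an actual Aldor library.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three type-definition patterns.
# Returns (name, kind) with kind 1 for a definition and 2 for an
# extension, or an empty list if the line defines no type.
sub detect_type {
    my ($line) = @_;
    return ($1, 1) if $line =~ /^define\s+([A-Z]\w*)/;
    return ($1, 1) if $line =~ /^([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/;
    return ($1, 2) if $line =~ /^extend\s+([A-Z]\w*)\s*(\(\s*$|:|\(.*\)\s*:)/;
    return ();
}

# Hypothetical helper: the command generated for a detected type.
sub showexports_for {
    my ($name, $kind) = @_;
    return $kind == 2 ? "\\showexports{extend.$name}" : "\\showexports{$name}";
}

for my $line (
    'define CategoryName(R: Ring): Category == with {',
    'MyDomain(R: Ring): MyCat == add {',
    'MyPackage: with {',
    'extend Integer: MyCat == add {',
    '    foo(x: %): % == x;',
) {
    my ($name, $kind) = detect_type($line);
    print defined $name ? showexports_for($name, $kind) : 'no type', "\n";
}
```

All three patterns are anchored at the beginning of the line and require a capital first letter, so indented code and lower-case function definitions are never mistaken for type definitions.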

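If we are neither in a code chunk nor in a +++ environment, we must discard the remembered position of the most recent \begin{+++}, unless the current line is empty, a LaTeX comment line, or a #line directive. In addition, an \addescribetype command found in such a line sets thisTypeName.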

487 ⟨not in code chunk 487⟩≡ (479c)
if ($inDocEnvironment==0) {
    if ($line !~ /^(\s*(%.*)?|#line \d+ ".*")$/) {
        $beginDocIndex=-1;
    }
    if ($line =~ /\\addescribetype\{(.+)\}/) {
        $thisTypeName=$1;
    }
}

Uses thisTypeName 484a.
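Which lines may stand between \end{+++} and the code chunk without breaking the connection can be checked with the following sketch. The predicate `keeps_doc_link` is a hypothetical name; it applies the pattern from the chunk above to a few made-up lines.

```perl
use strict;
use warnings;

# Hypothetical predicate: true if a line between \end{+++} and the code
# chunk leaves the remembered position of \begin{+++} intact.
sub keeps_doc_link {
    my ($line) = @_;
    return $line =~ /^(\s*(%.*)?|#line \d+ ".*")$/ ? 1 : 0;
}

for my $line ('', '   ', '% a LaTeX comment', '#line 12 "file.as.nw"', 'Some prose.') {
    printf "[%s] -> %s\n", $line, keeps_doc_link($line) ? 'kept' : 'reset';
}
```

Any other text, such as ordinary prose, resets beginDocIndex, so that a later code chunk is no longer associated with the earlier +++ environment.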