26.5 Split on Opening and Closing Regular Expression

The function splitBalancedParentheses can be used as follows.

while (<>) {
    chomp;
    my ($before, $oparen, $text, $cparen, $after) =
        splitBalancedParentheses($_, '\(', '\)');
    print "[$before] [$oparen] [$text] [$cparen] [$after]\n";
}

It looks in the given string for the first match of the opening regular expression and the corresponding match of the closing regular expression, and returns five strings: the text before, the text that matched the opening regular expression, the text in between, the text that matched the closing regular expression, and the text that is left over. Nested pairs inside the text in between are handled recursively, so the closing match returned is the one that balances the first opening match. Informally speaking, the function splitBalancedParentheses splits off the first top-level LISP expression.

With the call

splitBalancedParentheses($str, '"', '"');

it is even possible to extract text when the opening and closing regular expressions are identical. In that case the function does not recurse, because every delimiter found after the opening one is treated as the closing match.

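The function aborts with an error message (via die) if an opening match has no corresponding closing match. Surplus closing matches are not detected; they simply end up in the text before or in the text that is left over.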

Note that the opening and closing regular expressions may match several characters, and that the input string may contain newlines between the parentheses.

The function splitBalancedParentheses works as follows. It looks for the first match of the opening regular expression. If there is none, it signals this by leaving the first four return values empty and returning the whole input string as the fifth. Otherwise it saves the text that comes before the match and the matched string itself, and then searches the rest of the string for the corresponding closing match by means of the auxiliary function findClosingParenthesis. That function repeatedly looks for the next opening or closing match: a closing match ends the search, while an opening match triggers a recursive call that consumes the nested pair and appends it to the text in between. It is a fatal error if no closing match can be found.
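The recursion can also be replaced by an explicit nesting counter. The following is a minimal sketch under that assumption, not part of the original program: the name splitBalancedIterative is hypothetical, and the sketch relies on Perl's scalar-context m//g matching (which remembers pos) and on the match-offset arrays @- and @+. It scans left to right, increments a depth counter on each opening match, decrements it on each closing match, and returns the same five parts once the depth drops back to zero.

```perl
use strict;
use warnings;

# Hypothetical iterative variant (not part of the original program):
# returns the same five parts as splitBalancedParentheses, but counts
# the nesting depth instead of recursing.
sub splitBalancedIterative {
    my ($s, $open, $close) = @_;

    # No opening match: first four parts empty, whole input returned last.
    return ('', '', '', '', $s) unless $s =~ /($open)/g;
    my $before = substr($s, 0, $-[0]);   # text before the opening match
    my $oparen = $1;                      # the opening match itself
    my $start  = $+[0];                   # the enclosed text starts here
    my $depth  = 1;

    # Resume scanning where the opening match ended (pos($s) set by /g).
    while ($s =~ /((?:$open)|(?:$close))/g) {
        # Save the match data before the next regex overwrites @- and @+.
        my ($tok, $from, $to) = ($1, $-[0], $+[0]);
        if ($tok =~ /\A(?:$close)\z/) {
            # Closing match: leave one level; at depth 0 the pair is complete.
            if (--$depth == 0) {
                return ($before, $oparen,
                        substr($s, $start, $from - $start),
                        $tok, substr($s, $to));
            }
        } else {
            $depth++;                     # opening match: one level deeper
        }
    }
    die "Closing parenthesis not found";
}

my @parts = splitBalancedIterative('x (a (b) c) y', '\(', '\)');
print join(' ', map("[$_]", @parts)), "\n";   # prints: [x ] [(] [a (b) c] [)] [ y]
```

Because the closing test is made first, identical opening and closing expressions never increase the depth, which mirrors the non-recursive behavior described for the call with '"'.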

317 ⟨split on opening and closing regular expression 317⟩≡ (326) 318▷
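sub splitBalancedParentheses {
    my ($s, $openparen, $closingparen) = @_;
    my ($before, $oparen, $text, $cparen) = ('', '', '', '');
    my ($after) = $s;    # without an opening match, the whole input is returned last
    if ($s =~ /($openparen)/) {
        $before = $`;    # the part before the opening match
        $oparen = $1;    # the opening match itself
        # search the rest of the string ($') for the corresponding closing match
        ($text, $cparen, $after) =
          findClosingParenthesis($', $openparen, $closingparen);
    }
    return ($before, $oparen, $text, $cparen, $after);
}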

Defines:
splitBalancedParentheses, used in chunk 304.

Uses findClosingParenthesis 318.

318 ⟨split on opening and closing regular expression 317⟩+≡ (326) ◁317
sub findClosingParenthesis {
    my ($s, $openparen, $closingparen) = @_;
    my ($paren, $text) = ('', '');
    my ($closingParenFound) = 0;
    my ($t, $p);
    # Find the next opening or closing match; (?:...) keeps each regular
    # expression intact even if it contains an alternation.
    while ($s =~ /((?:$openparen)|(?:$closingparen))/) {
        $paren = $1;     # the delimiter that matched
        $text .= $`;     # the part before the match
        $s = $';         # the part after the match
        if ($paren =~ /\A(?:$closingparen)\z/) {$closingParenFound = 1; last;}
        # Nested opening match: consume its balanced contents recursively.
        ($t, $p, $s) =
          findClosingParenthesis($s, $openparen, $closingparen);
        $text .= "$paren$t$p";
    }
    if ($closingParenFound == 0) {die "Closing parenthesis not found";}
    return ($text, $paren, $s);
}

Defines:
findClosingParenthesis, used in chunk 317.
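A small self-contained driver can check the behavior described at the start of this section. It is a sketch, not part of the original program: it copies the two functions (grouping each regular expression with (?:...) and anchoring the closing test as a precaution) and runs them on nested parentheses, identical quote delimiters, an input without an opening match, multi-character delimiters with a newline inside, and an unbalanced input whose die is caught with eval. The helper show is hypothetical and only formats the five return values.

```perl
use strict;
use warnings;

# Self-contained copy of the two functions, so the driver runs on its own.
sub splitBalancedParentheses {
    my ($s, $openparen, $closingparen) = @_;
    my ($before, $oparen, $text, $cparen) = ('', '', '', '');
    my $after = $s;
    if ($s =~ /($openparen)/) {
        $before = $`;
        $oparen = $1;
        ($text, $cparen, $after) =
          findClosingParenthesis($', $openparen, $closingparen);
    }
    return ($before, $oparen, $text, $cparen, $after);
}

sub findClosingParenthesis {
    my ($s, $openparen, $closingparen) = @_;
    my ($paren, $text) = ('', '');
    my $closingParenFound = 0;
    my ($t, $p);
    while ($s =~ /((?:$openparen)|(?:$closingparen))/) {
        $paren = $1;
        $text .= $`;
        $s = $';
        if ($paren =~ /\A(?:$closingparen)\z/) { $closingParenFound = 1; last; }
        ($t, $p, $s) = findClosingParenthesis($s, $openparen, $closingparen);
        $text .= "$paren$t$p";
    }
    die "Closing parenthesis not found" unless $closingParenFound;
    return ($text, $paren, $s);
}

# Hypothetical helper: print the five parts in brackets.
sub show { print join(' ', map("[$_]", @_)), "\n" }

show(splitBalancedParentheses('x (a (b) c) y', '\(', '\)'));   # [x ] [(] [a (b) c] [)] [ y]
show(splitBalancedParentheses('say "hi" now', '"', '"'));       # [say ] ["] [hi] ["] [ now]
show(splitBalancedParentheses('no parens here', '\(', '\)'));  # [] [] [] [] [no parens here]
# Multi-character delimiters (* and *), with a newline in the enclosed text.
show(splitBalancedParentheses("f (* a\n(* b *) c *) g", '\(\*', '\*\)'));
# Unbalanced input: the die inside findClosingParenthesis is caught here.
eval { splitBalancedParentheses('(a (b)', '\(', '\)'); 1 }
    or print "error: $@";
```

An extra closing parenthesis after a complete pair, as in "(a)) b", does not trigger the error; it is returned as part of the text that is left over.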