ANTLR allows us to add actions to the grammar file to store or retrieve information, generate output, and make semantic checks. These actions can be added within the grammar rules or at the top level. All actions are enclosed in a << ... >> pair, and may contain any legal C code (or code in whatever base language ANTLR is generating).
A special top-level action, called the header, is included in all the C
source files generated by ANTLR. It is useful for inserting file inclusions,
external declarations of variables and function prototypes, and type, struct,
and macro definitions in multiple files. All other top-level actions are
placed in the parser source file only, so variable definitions (i.e. space
allocations) and other non-shared code (such as the definition of main())
may be placed in non-header top-level actions.
The header is written as #header followed by an action to be inserted into
every source file generated by ANTLR.
Let's look at a typical header:
#header
<<
#include "charptr.h"
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#define DEBUG 1
#define SINGLE 1
#define PLURAL 2
#define NONARRAY 3
#define CALL 4
#define sym struct _sym
sym {
    char *text;
    int type;
    int class;
    int base;
    int size;
};
extern sym symtab[];
#define ZZCOL
>>
This section of code starts by including the file "charptr.h". This
file includes definitions for pre-defined code which allows character pointers
to be used for handling lexeme texts. The next three lines include some of the
standard header files for dealing with output, type definitions, etc. that may
be used in later actions. After these, there is a definition for DEBUG which I
use to turn on some debugging output in the rest of the actions. Then, there are
some definitions which are used in the symbol table, the definition of the
structure of the symbol table elements, and an extern definition of the symbol
table itself. Notice that we don't actually allocate (i.e. create) the symbol
table here, because we want to ensure that it is not allocated as a separate
entity in each source file. Finally, ZZCOL is defined. This tells
DLG to track line and column information as it scans the input character stream.
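The difference between declaring the symbol table in the header and allocating it elsewhere can be sketched in plain C. This is only an illustration under my own assumptions: the helpers symtab_capacity() and symtab_slot() are hypothetical names that do not appear in the grammar, and both the declaration and the definition are placed in one file here so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define sym struct _sym
sym {
    char *text;
    int type;
    int class;
    int base;
    int size;
};

/* Declaration only (the header's job): no storage is reserved, so
   every generated source file can see this line safely. */
extern sym symtab[];

/* Definition (the job of one non-header action): storage is reserved
   exactly once, in the parser source file. */
sym symtab[513];

/* Hypothetical helper: number of entries the table can hold. */
size_t symtab_capacity(void)
{
    return sizeof(symtab) / sizeof(symtab[0]);
}

/* Hypothetical helper: bounds-checked access to one entry. */
sym *symtab_slot(size_t i)
{
    assert(i < symtab_capacity());
    return &symtab[i];
}
```

If the definition were placed in the header instead, each generated source file would try to allocate its own table, which is exactly what the extern declaration avoids.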
A main() function which calls the ANTLR macro may be included in a top-level
action to define a complete program.
Some PCCTS startup code may have to be included in a top-level action to initialize the token text storage system.
The length of any given action is limited, so you may have to split a long action into two or more shorter ones. Because all actions are inserted directly into the source, a long action can be split into consecutive actions with no effect on functionality.
Here is an example of another top-level action:
<<
sym symtab[513];
sym *symptr = &(symtab[0]);
int indent = 0; /* current output indent */
int offset = 0; /* stack offset for var */
int reg = -1; /* next register to use */
char buf[513]; /* Used by rval_array and fact to pass a string */
#include "charptr.c"
main(argc, argv)
int argc;
char **argv;
{
    if (argc != 1) {
        error("no command-line args; use redirection for file I/O");
    }
    ANTLR (prog(), stdin);
    return(0);
}
warn(fmt, a, b, c)
char *fmt;
int a, b, c;
{
    fprintf(stderr, "line %d: ", zzline);
    fprintf(stderr, fmt, a, b, c);
    fprintf(stderr, "\n");
}
error(fmt, a, b, c)
char *fmt;
int a, b, c;
{
    warn(fmt, a, b, c);
    exit(1);
}
>>
This action declares the actual symbol table, and some other global
variables. It also includes the file charptr.c, which contains
startup code for the lexeme text system. Following this is the main()
function for a compiler, which starts the parsing process by
calling ANTLR with the name of the starting rule and the character stream that
the lexer should read input from. After main(), the definitions for
a few support functions are given. These can be called within other actions as
needed.
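The warn() function above forwards three int arguments to fprintf(), which only works when the format really expects ints. As a hedged alternative, here is a sketch of the same "line N: message" formatting written with ANSI C variable arguments. The name format_warning() is hypothetical, and the line number is passed in as a parameter rather than read from zzline so that the sketch does not depend on the generated scanner.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: writes "line N: <formatted message>" into out.
   Returns the number of characters the full message needs, or a
   negative value if formatting fails. */
int format_warning(char *out, size_t n, int line, const char *fmt, ...)
{
    va_list ap;
    int used, rest;

    assert(out != NULL && n > 0);

    /* The prefix plays the role of the first fprintf() in warn(). */
    used = snprintf(out, n, "line %d: ", line);
    if (used < 0 || (size_t)used >= n)
        return used;

    /* The caller's format and arguments are forwarded unchanged. */
    va_start(ap, fmt);
    rest = vsnprintf(out + used, n - (size_t)used, fmt, ap);
    va_end(ap);

    return rest < 0 ? rest : used + rest;
}
```

A warn() built on this could format into a local buffer and then print that buffer to stderr, with error() calling warn() followed by exit(1) as before.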
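Actions can be placed inside grammar rules. For example, an action may appear at the start of a rule:
prog : << printf("#include \"nempl.h\"\n\n"); >>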
decls
funcs
;
Here, when "prog" is executed, it starts by outputting a line to
stdout which will include the header file "nempl.h" in the output code for the
compiler. After this, it calls the parsing functions decls() and funcs().
Actions can also be placed after or between the parts of a description:
stc :
K_INT <<printf ( "SINGLE\n" );>>
| K_PLURAL <<printf ( "PLURAL\n" );>>
;
This rule searches for either a K_INT token or a K_PLURAL token. If
it finds a K_INT, it outputs the string "SINGLE\n" to stdout. If it finds a
K_PLURAL, it outputs the string "PLURAL\n" to stdout. We'll see more examples
later.
Here is an example of how locals can be declared and used in a grammar rule:
paragraph: <<
{
int count = 0;
>>
( sentence
<<
count++;
>>
)+
<<
printf ( "%d sentences found\n", count );
}
>>
;
In this rule, the first action contains a left brace, {. This is used to
open a new scope in the output C code. The closing action has the
corresponding right brace, }, which is used to close this scope. We do this
because C allows new local variables to be defined whenever a new scope is
opened (this is why you can have local variables in a function). The first
action opens a new scope, declares the local variable count, and initializes
it to 0. Each time the subrule containing "sentence" is executed (which
happens only while the next token is in the first set of the non-terminal
sentence), the variable count will be incremented. In the final action, we
print out the number of sentences found in the paragraph, then close the
scope opened in the first action.
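To see why the braces matter, it can help to picture the three actions pasted into one C function. The following is a hand-written sketch, not actual ANTLR output: the function paragraph() and its string input are my own stand-ins, with each 's' character representing one sentence.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the rule "paragraph". Each 's' at the
   start of input represents one sentence. */
int paragraph(const char *input)
{
    int found;

    /* ( sentence )+ requires at least one sentence. */
    assert(input != NULL && *input == 's');

    {                           /* brace from the first action */
        int count = 0;          /* legal only because a scope was opened */

        do {                    /* the ( sentence )+ subrule */
            input++;            /* "match" one sentence */
            count++;            /* action inside the subrule */
        } while (*input == 's');

        printf("%d sentences found\n", count);
        found = count;
    }                           /* brace from the final action */

    return found;
}
```

Because count is declared inside the inner braces, it is not visible after the closing brace, which is why the sketch copies it into found before leaving the scope.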
We can modify the rule to indicate that arguments are expected when the rule is called in other rules. Suppose we have a rule for handling declarations that must put all global definitions in a global symbol table, and all local definitions in a local symbol table. We might define this rule as follows:
decls[int scope]:
<<
{
int type;
>>
( FLOAT << type = 0; >>
| INT << type = 1; >>
)
NAME
<<
if ( $scope == GLOBAL )
enter ( $2, type, gtable);
else
enter ( $2, type, ltable);
}
>>
;
This rule takes the int "scope" as an argument, and uses it to
choose which table to use. The dollar sign before "scope" in the final action
indicates that "scope" is an argument to the rule, and has not been declared in
an action. Note that we have also declared the local variable "type" here. If
we had not, the braces would not have been necessary.
Actually, they are not necessary at all because ANTLR automatically opens a new scope at the beginning of each rule, and closes it at the end. Thus, this particular pair of braces is redundant. Also note that there is no dollar sign in front of "type" when it is referenced in the actions. This is because it is declared within an action, rather than being an argument or return value for an ANTLR rule.
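The rule above uses enter(), gtable, ltable and GLOBAL without showing them. Here is one way they might look in C, purely as an assumption for illustration: the struct table layout, TABLE_MAX, LOCAL, and the pointer-based signature of enter() are all hypothetical, and decls() is a hand-written analog of the rule rather than generated code.

```c
#include <assert.h>
#include <stddef.h>

#define GLOBAL 0
#define LOCAL 1                 /* hypothetical counterpart to GLOBAL */
#define TABLE_MAX 16            /* hypothetical capacity */

/* Hypothetical symbol table: parallel arrays of names and types. */
struct table {
    const char *name[TABLE_MAX];
    int type[TABLE_MAX];
    int used;
};

struct table gtable, ltable;

/* Hypothetical enter(): append one name and type to the given table. */
void enter(const char *name, int type, struct table *t)
{
    assert(t != NULL && t->used < TABLE_MAX);
    t->name[t->used] = name;
    t->type[t->used] = type;
    t->used++;
}

/* Analog of decls[int scope]: the argument selects the table. */
void decls(int scope, const char *name, int type)
{
    if (scope == GLOBAL)
        enter(name, type, &gtable);
    else
        enter(name, type, &ltable);
}
```

In this sketch, scope is an ordinary function parameter, which is the role $scope plays inside the rule's actions.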
In a rule which calls "decls", the notation looks like this:
decls[n]
where n is a number or variable whose value will be
passed to the function decls(). For example, our rule for "prog"
could be changed to read:
prog : << printf("#include \"nempl.h\"\n\n"); >>
decls[0]
funcs
;
Here, GLOBAL would be defined as the value 0 so that top-level
declarations in the source code will be placed in the global symbol table by
decls().
Multiple values can be passed and accepted by rules by using a comma-separated list of values and argument declarations, respectively.
Rules can also return values to the rules that call them:
stc > [int class] :
K_INT <<$class = SINGLE;>>
| K_PLURAL <<$class = PLURAL;>>
;
Here, we look for the tokens K_INT and K_PLURAL, and set
class equal to a value which indicates which token we found. The
dollar sign before class in the action indicates that the variable is
a return value for the rule, and not defined in an action. The value is returned
to the calling function when the rule completes. Multiple values can be returned
to the caller by separating the values with commas.
In a rule which calls "stc", the notation looks like this:
stc > [v]
where v is a variable whose value will be set when the
function stc() returns. For example, we might see:
decls : <<
{
register int type;
>>
stc > [type]
<<
printf ( "type is %d\n", type );
}
>>
;
Note that here "type" was a return value from "stc", but it is not
an argument or return value of "decls". Thus, we need to allocate space for it
with a local declaration, and access it without using a dollar sign.
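One way to picture a rule return value in C is as a result written through a pointer into a variable that the caller owns. This is a sketch of the idea only; I am not claiming it is how ANTLR generates the code. The token enum and the function names stc() and decls_caller() are hypothetical analogs of the rules above.

```c
#include <assert.h>
#include <stddef.h>

#define SINGLE 1
#define PLURAL 2

enum token { K_INT, K_PLURAL }; /* hypothetical token codes */

/* Analog of "stc > [int class]": assigning through class_out plays
   the role of assigning to $class. */
void stc(enum token tok, int *class_out)
{
    assert(class_out != NULL);
    *class_out = (tok == K_INT) ? SINGLE : PLURAL;
}

/* Analog of the calling rule "decls". */
int decls_caller(enum token tok)
{
    int type;           /* the caller allocates space for the result */

    stc(tok, &type);    /* corresponds to: stc > [type] */
    return type;        /* plain local, so no dollar sign is involved */
}
```

The caller's variable is an ordinary local, which matches the point made above: space for it must be declared in the calling rule.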
We have introduced several concepts since first talking about the layout of a grammar rule, so let's see how they all fit together. The general format for a rule is:
name [type1 arg1, ..., typeN argN] > [type1 rval1, ..., typeM rvalM]:
alternate 1
| ...
| alternate X
;
There are other things (such as error actions), but I'm not going
to go into them here.
Consider the following rule:
intdecl: INT
VARNAME
;
We would like to add an action to this rule to store the newly
declared variable in the symbol table, but how do we get the name of the
variable? The answer is dollar attributes. The elements of a rule are numbered
starting at one, and the attribute of element i is referenced as $i. Actions
do not count as elements, and subrules count as one element.
Let's look at an example to solidify this:
arule: << printf ( "hello\n" ); >>
WORD /* This is $1 */
NUM /* This is $2 */
( alt1 | alt2 )* /* This is $3 */
<< printf ( "First WORD: %s\n", $1 );
printf ( "First NUM: %s\n", $2 );
>>
WORD /* This is $4 */
NUM /* This is $5 */
<< printf ( "Second WORD: %s\n", $4 );
printf ( "Second NUM: %s\n", $5 );
>>
;
This should be sufficient to collect and move data to wherever you
need it to generate the proper output. If you need some text from "alt1" or
"alt2" to generate output from this rule, you can separate the ( alt1 | alt2 )
into a new rule, then declare a local character buffer in this
rule, and pass its address into the new rule, which copies the text you need
into the array.
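The buffer-passing idea in the last paragraph can be sketched in plain C. The names alts() and arule_text_length() are hypothetical analogs of the new rule and the calling rule, and the lexeme is passed in as a string instead of coming from the scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analog of the new rule: copies the text it matched
   into the caller's buffer, truncating if the buffer is too small. */
void alts(const char *lexeme, char *buf, size_t size)
{
    assert(lexeme != NULL && buf != NULL && size > 0);
    strncpy(buf, lexeme, size - 1);
    buf[size - 1] = '\0';       /* strncpy may not terminate the string */
}

/* Hypothetical analog of the calling rule: owns the buffer and
   passes its address down. */
size_t arule_text_length(const char *lexeme)
{
    char text[32];              /* local buffer declared in the caller */

    alts(lexeme, text, sizeof text);
    return strlen(text);        /* the copied text is usable here */
}
```

Passing the size along with the address keeps the copy from running past the end of the caller's array when the matched text is longer than expected.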