SML# Document Version 4.0.0
30 A parser generator smlyacc and smllex

30.2 The structure of a smlyacc input file

An input file of smlyacc has the following general structure.

SML# code for user declarations
%%
YACC declarations for grammar rule interpretation
%%
descriptions of grammar rules and their attributes
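As an illustration, a minimal input file with this three-section layout might look like the sketch below. The parser name Calc, the tokens NUM and DOUBLE, and the helper function double are hypothetical. The directives follow the SML# template given for the second section below.

```sml
(* Section 1: user declarations.
   Hypothetical helper function, used in an action in section 3 *)
fun double (n : int) = 2 * n

%%
(* Section 2: declarations and smlyacc directives *)
%name Calc
%header (structure Calc)
%eop EOF
%pos int
%term EOF
   | NUM of int
   | DOUBLE
%nonterm start of int
   | exp of int

%%
(* Section 3: grammar rules; each action computes the attribute
   of the left-hand-side non-terminal *)
start : exp            (exp)
exp   : NUM            (NUM)
      | DOUBLE exp     (double exp)
```

With %pos int, the description of %term in the second section suggests that Calc_TOKENS would contain token functions such as EOF : pos * pos -> token and NUM : int * pos * pos -> token. An smllex specification would call these functions to build tokens.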

  1.

    The SML# code for user declarations section contains any user-level SML# code that is used in the attribute specifications in the rules section.

  2.

    The YACC declarations for grammar rule interpretation section includes meta-level declarations for grammar rules, such as operator precedence and associativity declarations, and directives for smlyacc. The meta-level declarations for grammar rules are the same as in the standard YACC system. The reader is referred to src/ml-yacc/doc/mlyacc.pdf for the details.

    When using the generated parser source in SML#, the following directives should be specified.

        %name <Name>
        %header (structure <Name>)
        %eop EOF SEMICOLON
        %pos int
        %term EOF
           | CHAR of char
            ...
        %nonterm id of Symbol.symbol
           | longid of Symbol.longsymbol
            ...
    
    • %name specification. It specifies the name of the parser. The name <Name>_TOKENS is used as the name of the generated token structure.

    • %header specification. smlyacc generates the parser program as a structure body in the following format.

          = struct
             ...
          end
      

      %header(...) specifies the code fragment that is placed before = struct. In order to use the generated parser in the SML# separate compilation mode, this fragment must be the preamble of a structure declaration, as in the %header (structure <Name>) directive above.

    • %pos specification. It specifies the type of the text positions used in smllex.

    • %eop specification. It defines the set of terminal symbols that end the parsing.

    • %term specification. The set of terminal symbol names is defined in the form of datatype constructors. For each terminal symbol name, smlyacc generates a token-forming function of the form EOF : pos * pos -> token (for a symbol without an argument) or CHAR : char * pos * pos -> token (for a symbol with an argument). The generated token functions are placed in the <Name>_TOKENS structure, which is shared with smllex.

    • %nonterm specification. The set of non-terminal symbols is defined in the form of datatype constructors, each carrying the type of its attribute. The attribute type is the type of the expression associated with the grammar rules of that non-terminal symbol.

  3.

    The descriptions of grammar rules and their attributes section defines the grammar rules over the terminal and non-terminal symbols declared in the previous section, together with their associated action rules. The syntax for the rules is the same as in the standard YACC system. The reader is referred to src/ml-yacc/doc/mlyacc.pdf for the details.
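To illustrate the rules section together with the precedence declarations of the second section, here is a sketch of a hypothetical arithmetic grammar. The names Arith, NUM, PLUS, TIMES, LPAREN and RPAREN are assumptions made for this example. The example follows ML-Yacc's conventions. Inside an action, the attribute of a right-hand-side symbol is referenced by that symbol's name. When a symbol occurs more than once, its occurrences are numbered from left to right (exp1, exp2).

```sml
(* Section 1: no user declarations are needed here *)
%%
%name Arith
%header (structure Arith)
%eop EOF
%pos int
%term EOF | NUM of int | PLUS | TIMES | LPAREN | RPAREN
%nonterm exp of int
(* Precedence declarations are listed in increasing order:
   TIMES binds tighter than PLUS; both are left-associative *)
%left PLUS
%left TIMES
%%
exp : NUM                  (NUM)
    | exp PLUS exp         (exp1 + exp2)
    | exp TIMES exp        (exp1 * exp2)
    | LPAREN exp RPAREN    (exp)
```

Because %left TIMES comes after %left PLUS, an input such as 1 + 2 * 3 is grouped as 1 + (2 * 3), and its attribute is 7.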