Grammar of Racc


updated for v0.9.1

This grammar is provisional.


global structure

blocks in file

There are two kinds of block at the top level: the 'class' block and the 'user code' block. A 'user code' block MUST come after the 'class' block (which contains the 'rule' block).

comment

Comments can be inserted almost anywhere. Two comment styles are available: Ruby style (#.....) and C style (/*......*/).

class block

A class block looks like this:

    class (class name)
      (rule block)
      [(precedence block)]
      [(start statement)]
      [(token conversion block)]
    end

(class name) is the name of the parser class. It must be a valid Ruby class name.

rule block

The 'rule block' describes the grammar that the parser understands. For example:

    rule
      (token): (token) (token) (token).... (action) ;

      (token): (token) (token) (token).... (action)
             | (token) (token) (token).... (action)
             | (token) (token) (token).... (action)
             ;
    end

This resembles yacc, but the semicolon MUST NOT be omitted.

(action) is Ruby code that is executed when its (token)s are matched. An (action) looks like this:

        { print val[0]
          puts val[1] }

Inside an (action), you cannot use '%' strings, here documents, '%r' regexps, or '=begin' comments.

You can omit the (action). In that case '' (the empty string) is used as the action.

To return from an action early, you MUST either write "return( result )" or not use return at all, because the (action) is embedded like this:

        result = val[0]       # added by racc
        print val[0]
        puts val[1]
        return result         # added by racc
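
The embedding above can be imitated in plain Ruby to see why a bare return is a problem. This is only a sketch: the method names are hypothetical, and the code racc really generates is laid out differently.

```ruby
# Hypothetical stand-ins for the code racc generates around an action.
# Only the lines marked "added by racc" mimic what racc inserts.

def action_without_return(val)
  result = val[0]                 # added by racc
  result = val[0] + val[1]        # user action: assign to result
  return result                   # added by racc
end

def action_with_return_result(val)
  result = val[0]                 # added by racc
  return(result) if val[1].nil?   # user action: early exit keeps the value
  result = val[0] + val[1]
  return result                   # added by racc
end

def action_with_bare_return(val)
  result = val[0]                 # added by racc
  return if val[1].nil?           # user action: bare return yields nil
  result = val[0] + val[1]
  return result                   # added by racc
end

p action_without_return([1, 2])          # 3
p action_with_return_result([1, nil])    # 1
p action_with_bare_return([1, nil])      # nil
```

The third method loses the value of the left-hand side, which is why "return( result )" is required whenever an action returns early.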

Here is a sample of a complete 'rule block':

    rule
      goal: definition rules source { result = val } ;  # don't forget the semicolon

      definition: /* none */   { result = [] }
        | definition startdesig  { result[0] = val[1] }  # alternatives continue with '|'
        | definition
                 precrule   # this line continues the previous one
          {
            result[1] = val[1]
          } ;

      startdesig: START TOKEN ;

    end

You can use these special local variables in an (action).

result ($$ in yacc)
The value of the left-hand side (lhs). "result = val[0]" is always executed before the action runs.
val ($1,$2,$3... in yacc)
An array of the values of the right-hand side (rhs).
tok
An array of the token symbols of the right-hand side. The default value of each element is decided as follows: a bare token name in the racc file (TOK, XFILE, this_is_token, ...) becomes the corresponding symbol (:TOK, :XFILE, :this_is_token, ...), and a quoted string (':', '.', '(', ...) becomes the same string (':', '.', '(', ...).

You can change this default with a 'token' block.

vstack ($0,$-1,$-2... in yacc)
The stack of values.
sstack
The stack of token symbols.
__state__
The stack of LALR states. DO NOT TOUCH this stack!
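
How val, result and vstack relate can be sketched in plain Ruby. This is a simplified model, not racc's actual driver: the method name reduce and its arguments are assumptions made for illustration.

```ruby
# Simplified model of one reduction step (hypothetical, not racc's driver).
#   vstack : stack of semantic values
#   len    : number of right-hand-side symbols in the rule being reduced
# The block plays the role of the (action).
def reduce(vstack, len)
  val = vstack.pop(len)        # values of the right-hand side
  result = val[0]              # default, set before the action runs
  result = yield(val, result)  # the action computes the new result
  vstack.push(result)          # value of the left-hand side
  vstack
end

# exp: exp '+' exp  { result = val[0] + val[2] }
stack = [10, '+', 32]
reduce(stack, 3) { |val, _result| val[0] + val[2] }
p stack   # [42]

# An action that does nothing leaves result as val[0].
stack = [7, 'ignored']
reduce(stack, 2) { |_val, result| result }
p stack   # [7]
```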

Finally, you can use the special token symbol '$end'.
If this token is placed where the token input ends, the parser finishes parsing. In Ruby code, '$end' is represented by 'false'.
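
As a small illustration of the end marker, a token source can hand out false once its input is exhausted. The [symbol, value] pair shape and the method name below are assumptions made for this sketch; check the driver you use for the interface it actually expects.

```ruby
# Hypothetical token source: returns [symbol, value] pairs and, once the
# queue is empty, the end marker whose symbol is false ('$end').
def next_token(queue)
  queue.shift || [false, '$end']
end

queue = [[:NUM, 1], ['+', '+'], [:NUM, 2]]
seen = []
loop do
  sym, value = next_token(queue)
  break if sym == false        # end of input reached
  seen << value
end
p seen   # [1, "+", 2]
```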

Operator precedence

This feature corresponds to the operator precedence declarations in yacc. The block is written like this:

    prechigh
      nonassoc '++'
      left     '*' '/'
      left     '+' '-'
      right    '='
    preclow

'right' corresponds to '%right' and 'left' to '%left'. This example lists the highest precedence first ('prechigh' at the top, 'preclow' at the bottom), but the reverse form 'preclow ... prechigh' is also permitted.

An equivalent of yacc's '%prec' is also available. The format is like this:

  prechigh
    nonassoc UMINUS
    left '*' '/'
    left '+' '-'
  preclow

  rule
    exp: exp '*' exp
       | exp '-' exp
       | '-' exp       = UMINUS   # this!!!
           :
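
The effect of such a table on a shift/reduce conflict can be sketched in Ruby. The sketch follows the usual yacc convention: compare the precedence of the rule with that of the lookahead token, then fall back on associativity. It is assumed, not confirmed by this document, that racc resolves conflicts the same way. All names are hypothetical.

```ruby
# Turn a 'prechigh ... preclow' list (highest first) into a lookup table
# token => [level, associativity]; a larger level binds tighter.
def build_levels(high_to_low)
  levels = {}
  high_to_low.reverse.each_with_index do |(assoc, tokens), level|
    tokens.each { |t| levels[t] = [level, assoc] }
  end
  levels
end

# Decide what to do when a rule could be reduced but 'lookahead' could
# also be shifted. rule_token is the token that gives the rule its
# precedence: its last terminal, or the one named after '='.
def resolve(levels, rule_token, lookahead)
  rule_level, = levels.fetch(rule_token)
  look_level, assoc = levels.fetch(lookahead)
  return :reduce if rule_level > look_level
  return :shift  if rule_level < look_level
  { left: :reduce, right: :shift, nonassoc: :error }.fetch(assoc)
end

levels = build_levels([
  [:nonassoc, ['UMINUS']],
  [:left,     ['*', '/']],
  [:left,     ['+', '-']],
])

p resolve(levels, '+', '*')        # :shift   1 + 2 * 3 groups as 1 + (2 * 3)
p resolve(levels, '-', '-')        # :reduce  1 - 2 - 3 groups as (1 - 2) - 3
p resolve(levels, 'UMINUS', '*')   # :reduce  -2 * 3 groups as (-2) * 3
p resolve(levels, '-', '*')        # :shift   "'-' exp" without '= UMINUS'
```

The last two calls show what '= UMINUS' buys: without it, the unary minus rule would take the low precedence of binary '-'.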

start statement

This corresponds to '%start' in yacc.

      start real_target

I think this statement will probably never be needed.

Converting Token Symbols

By default, token symbols are decided as follows: a bare token name in the racc file (TOK, XFILE, this_is_token, ...) becomes the corresponding symbol (:TOK, :XFILE, :this_is_token, ...), and a quoted string (':', '.', '(', ...) becomes the same string (':', '.', '(', ...).

You can change this with a 'token' block. For example:

    token
      PLUS 'PlusClass'      # use PlusClass instead of :PLUS
      MIN  'MinusClass'     # use MinusClass instead of :MIN
    end

Almost any Ruby value can be used as a token symbol. The only exceptions are 'false' and 'nil', which cause unexpected parse errors.
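
The default rule and the effect of a 'token' block can be summarized in a few lines of Ruby. The helper below is hypothetical and only models the mapping described here; it is not racc's implementation.

```ruby
# Hypothetical model of how a grammar token is turned into the Ruby
# expression used as its token symbol.
#   name      : token text as written in the grammar, e.g. "PLUS" or "'+'"
#   overrides : entries of a 'token' block, e.g. { "PLUS" => "PlusClass" }
def token_symbol_code(name, overrides = {})
  return overrides[name] if overrides.key?(name)   # 'token' block wins
  return name if name.start_with?("'")             # quoted string: same string
  ":#{name}"                                       # bare name: its symbol
end

overrides = { "PLUS" => "PlusClass", "MIN" => "MinusClass" }
puts token_symbol_code("TOK")               # :TOK
puts token_symbol_code("'('")               # '('
puts token_symbol_code("PLUS", overrides)   # PlusClass
```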

If you want to use a String as a token symbol, special care is needed. For example:

    token
      class '"cls"'            # in code, "cls"
      PLUS '"plus\n"'          # in code, "plus\n"
      MIN  "\"minus#{val}\""   # in code, \"minus#{val}\"
    end

This is not a bug but a feature. Please don't send me bug reports about it... :-)

User Code

A 'user code' block is Ruby source code that is copied to the output. In racc.rb, three special user code blocks are used: 'driver', 'prepare', and 'inner'.

The format of user code is like this:

---- name_of_user_code
  ruby statement
  ruby statement
  ruby statement

---- name_of_other_user_code
  ruby statement
     :

If four or more '-' characters appear at the beginning of a line, racc treats that line as the start of a user code block. The name of a user code block must be a single word.

You can include other files as user code like this:

---- driver = file_name other_file and_other_file ....

print "this code is used as driver, too!!"

This statement makes racc use 'file_name', 'other_file', and 'and_other_file' as the 'driver' user code. For example:

---- driver = init.rb err.rb run.rb

print "this line is added, too\n"
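
The header rules above are simple enough to model with a regular expression. This is a sketch under the assumption that the name and the file list are separated by whitespace and '='; racc's own scanner may be stricter or looser. The function name is hypothetical.

```ruby
# Hypothetical parser for a user code header line.
# Returns [name, files] for a header, or nil for an ordinary line.
def parse_user_code_header(line)
  # four or more '-' at the line head, a one-word name, optional "= files"
  m = /\A-{4,}\s*(\w+)\s*(?:=\s*(.*?))?\s*\z/.match(line)
  return nil unless m
  files = m[2] ? m[2].split : []
  [m[1], files]
end

p parse_user_code_header("---- inner")
# ["inner", []]
p parse_user_code_header("---- driver = init.rb err.rb run.rb")
# ["driver", ["init.rb", "err.rb", "run.rb"]]
p parse_user_code_header("--- not a header")
# nil
```

Lines that are not headers belong to the body of the most recent block, so a caller would append them to whichever name was seen last.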

Copyright(c) 1998-1999 Minero Aoki.