RaccによるC言語パーサ

2009-01-22

C言語のソースをパースしたいと思ってたときがあったな、と思い出した。 RubyでRaccを使って、K&Rに載ってるBNFを食わせてみたが、コンフリクトが発生する。

$ racc c.y
1 shift/reduce conflicts
2 reduce/reduce conflicts

本によれば「唯一ぶつかりがあるのは、if-elseのあいまいさである。」と書いてあるんだけど。「-v」オプションをつけて詳細を表示するとぶつかってるのは

state 2

7) declaration : declaration_specifiers _ init_declarator_list_opt ";"
13) declaration_specifiers_opt : declaration_specifiers _

"*" shift, and go to state 36
"*" [reduce using rule 13 (declaration_specifiers_opt)]
";" reduce using rule 41 (init_declarator_list_opt)
"(" [reduce using rule 66 (pointer_opt)]
SYMBOL [reduce using rule 66 (pointer_opt)]
$default reduce using rule 13 (declaration_specifiers_opt)

pointer_opt go to state 33
init_declarator_list_opt go to state 34
init_declarator_list go to state 35
init_declarator go to state 37
pointer go to state 38
declarator go to state 39

なかなか解消できなくて辛かったんだけど、 function_definitiondeclaration_specifiers から opt をはずすとコンフリクトが解消された。 これだと関数の戻り値の型宣言を省略できなくなるけど、別にいいよね。

で構文木生成のアクションを記入。 Ruby/RaccだとC/Yaccと比べてメモリの解放に悩まなくていいし型の異なる値も入れ放題なので全然楽だわ~。

Raccでアクションを省略すると自動的に1番目の値が返ることになってるけど、右辺がひとつだけだったら val[0] を、複数だったら val 全体を返すようにすれば自動的に構文木が生成できて楽かなーと思ったのでRaccを改造しようとしたんだけどうまくいかず…。

BNFとか見て初めてわかったんだけど、 TYPEDEFstorage_class_specifier になってるので extern とかを書けるところすべてに typedef が書けちゃうんだね。 なんか不思議な…。 あと declarator は配列の宣言とかも含むので、 function_definition

void test[] { }

とかわけのわからんものも有効になってしまう。 なんだかなぁ。

以下ソース:

  • function_definitiondeclaration_specifiers から opt を除去
  • 演算子に優先を振って expr にまとめた
#!/usr/bin/ruby

# Cパーサ
class CParser
prechigh
left '*' '/' '%'
left '+' '-'
left '<<' '>>'
left '<' '>' '<=' '>='
left '==' '!='
left '&'
left '^'
left '|'
left '&&'
left '||'

nonassoc ELSE
nonassoc LOWER
preclow
options no_result_var
rule
target
: translation_unit

translation_unit
: external_declaration { val }
| translation_unit external_declaration { val[0].push(val[1]) }

external_declaration
: function_definition
| declaration

function_definition
# こっちだとconflictが起こる
# : declaration_specifiers_opt declarator declaration_list_opt compound_statement
: declaration_specifiers declarator declaration_list_opt compound_statement { DefunNode.new(val[0], val[1], val[2], val[3]) }

declaration
: declaration_specifiers init_declarator_list_opt ';' { DeclNode.new(val[0], val[1]) }

declaration_list_opt : {nil} | declaration_list
declaration_list
: declaration { val }
| declaration_list declaration { val[0].push(val[1]) }

declaration_specifiers_opt : {nil} | declaration_specifiers
declaration_specifiers
: storage_class_specifier declaration_specifiers_opt { val[1] ? val : val[0] }
| type_specifier declaration_specifiers_opt { val[1] ? val : val[0] }
| type_qualifier declaration_specifiers_opt { val[1] ? val : val[0] }

storage_class_specifier
: AUTO | REGISTER | STATIC | EXTERN | TYPEDEF

type_specifier
: VOID | CHAR | SHORT | INT | LONG | FLOAT | DOUBLE | SIGNED | UNSIGNED
| struct_or_union_specifier
| enum_specifier
# | typedef_name

type_qualifier
: CONST | VOLATILE

struct_or_union_specifier
: struct_or_union identifier_opt '{' struct_declaration_list '}' { val }
| struct_or_union identifier { val }

struct_or_union
: STRUCT | UNION

struct_declaration_list
: struct_declaration { val }
| struct_declaration_list struct_declaration { val[0].push(val[1]) }

init_declarator_list_opt : {nil} | init_declarator_list
init_declarator_list
: init_declarator { val }
| init_declarator_list ',' init_declarator { val[0].push(val[2]) }

init_declarator
: declarator { InitDecl.new(val[0], nil) }
| declarator '=' initializer { InitDecl.new(val[0], val[2]) }

struct_declaration
: specifier_qualifier_list struct_declarator_list ';' { val }

specifier_qualifier_list_opt : {nil} | specifier_qualifier_list
specifier_qualifier_list
: type_specifier specifier_qualifier_list_opt { val }
| type_qualifier specifier_qualifier_list_opt { val }

struct_declarator_list
: struct_declarator
| struct_declarator_list ',' struct_declarator { val }

struct_declarator
: declarator
| declarator_opt ':' constant_expression { val }

enum_specifier
: ENUM identifier_opt '{' enumerator_list '}' { val }
| ENUM identifier { val }

declarator_opt : {nil} | declarator
declarator
: pointer_opt direct_declarator { DirectDecl.new(val[1], val[0]) }

direct_declarator
: identifier
| '(' declarator ')' { val }
| direct_declarator '[' constant_expression_opt ']' { val }
| direct_declarator '(' parameter_type_list ')' { val }
| direct_declarator '(' identifier_list_opt ')' { val }

pointer_opt : {nil} | pointer
pointer
: '*' type_qualifier_list_opt { val[1] ? val : val[0] }
| '*' type_qualifier_list_opt pointer { val[1] ? [[val[0], val[1]], val[2]] : [val[0], val[2]] }

type_qualifier_list_opt : {nil} | type_qualifier_list
type_qualifier_list
: type_qualifier
| type_qualifier_list type_qualifier { val }

parameter_type_list_opt : {nil} | parameter_type_list
parameter_type_list
: parameter_list
| parameter_list ',' '...' { val[0].push('...') }

parameter_list
: parameter_declaration { val }
| parameter_list ',' parameter_declaration { val[0].push(val[2]) }

parameter_declaration
: declaration_specifiers declarator { val }
| declaration_specifiers abstract_declarator_opt { val }

identifier_list_opt : {nil} | identifier_list
identifier_list
: identifier { val }
| identifier_list ',' identifier { val[0].push(val[2]) }

initializer
: assignment_expression
| '{' initializer_list_opt '}' { val[1] }
| '{' initializer_list ',' '}' { val[1] }

initializer_list_opt : {nil} | initializer_list
initializer_list
: initializer { val }
| initializer_list ',' initializer { val[0].push(val[2]) }

type_name
: specifier_qualifier_list abstract_declarator_opt { val }

abstract_declarator_opt : {nil} | abstract_declarator
abstract_declarator
: pointer
| pointer_opt direct_abstract_declarator { val }

direct_abstract_declarator_opt : {nil} | direct_abstract_declarator
direct_abstract_declarator
: '(' abstract_declarator ')'
| direct_abstract_declarator_opt '[' constant_expression_opt ']' { val }
| direct_abstract_declarator '(' parameter_type_list_opt ')' { val }

# typedef_name
# : identifier

statement
: labeled_statement
| expression_statement
| compound_statement
| selection_statement
| iteration_statement
| jump_statement

labeled_statement
: identifier ':' statement { LabelNode.new(val[0], val[2]) }
| CASE constant_expression ':' statement { CaseNode.new(val[1], val[3]) }
| DEFAULT ':' statement { DefaultNode.new(val[2]) }

expression_statement
: expression_opt ';'

compound_statement
: '{' declaration_list_opt statement_list_opt '}' { [val[1], val[2]] }

statement_list_opt : {nil} | statement_list
statement_list
: statement { val }
| statement_list statement { val[0].push(val[1]) }

selection_statement
: IF '(' expression ')' statement = LOWER { IfNode.new(val[2], val[4]) }
| IF '(' expression ')' statement ELSE statement { IfNode.new(val[2], val[4], val[6]) }
| SWITCH '(' expression ')' statement { SwitchNode.new(val[2], val[4]) }

iteration_statement
: WHILE '(' expression ')' statement { WhileNode.new(val[2], val[4]) }
| DO statement WHILE '(' expression ')' ';' { DoWhileNode.new(val[1], val[4]) }
| FOR '(' expression_opt ';' expression_opt ';' expression_opt ')' statement { ForNode.new(val[2], val[4], val[6], val[8])}

jump_statement
: GOTO identifier ';' { GotoNode.new(val[1]) }
| CONTINUE ';' { ContinueNode.new() }
| BREAK ';' { BreakNode.new() }
| RETURN expression_opt ';' { ReturnNode.new(val[1]) }

expression_opt : {nil} | expression
expression
: assignment_expression
| expression ',' assignment_expression { OpNode.new(val[1], [val[0], val[2]]) }

assignment_expression
: conditional_expression
| unary_expression assignment_operator assignment_expression { OpNode.new(val[1], [val[0], val[2]]) }

assignment_operator : '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='

conditional_expression
: expr
| expr '?' expression ':' conditional_expression { OpNode.new(val[1], [val[0], val[2], val[4]]) }

constant_expression_opt : {nil} | constant_expression
constant_expression
: conditional_expression

expr
: cast_expression
| expr '||' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '&&' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '|' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '^' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '&' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '==' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '!=' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '<' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '>' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '<=' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '>=' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '<<' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '>>' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '+' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '-' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '*' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '/' expr { OpNode.new(val[1], [val[0], val[2]]) }
| expr '%' expr { OpNode.new(val[1], [val[0], val[2]]) }

cast_expression
: unary_expression
| '(' type_name ')' cast_expression { val }

unary_expression
: postfix_expression
| '++' unary_expression { OpNode.new(:PREINC, val[1]) }
| '--' unary_expression { OpNode.new(:PREDEC, val[1]) }
| unary_operator cast_expression { OpNode.new(val[0], val[1]) }
| 'sizeof' unary_expression { OpNode.new(val[0], val[1]) }
| 'sizeof' '(' type_name ')' { OpNode.new(val[0], val[2]) }

unary_operator
: '&' { :ADDRESS }
| '~'
| '!'
| '*' { :ASTER }
| '+' { :UPLUS }
| '-' { :UMINUS }

postfix_expression
: primary_expression
| postfix_expression '[' expression ']' { val }
| postfix_expression '(' argument_expression_list_opt ')' { FuncallNode.new(val[0], val[2]) }
| postfix_expression '.' identifier { OpNode.new(val[1], [val[0], val[2]]) }
| postfix_expression '->' identifier { OpNode.new(val[1], [val[0], val[2]]) }
| postfix_expression '++' { OpNode.new(:POSTINC, val[0]) }
| postfix_expression '--' { OpNode.new(:POSTDEC, val[0]) }

primary_expression
: identifier
| constant
| string
| '(' expression ')' { val[1] }

argument_expression_list_opt : {[]} | argument_expression_list
argument_expression_list
: assignment_expression { val }
| argument_expression_list ',' assignment_expression { val[0].push(val[2]) }

constant
: integer_constant
| character_constant
| floating_constant
| enumeration_constant

identifier_opt : {nil} | identifier
identifier : SYMBOL { val[0].intern }
integer_constant : INTEGER_LITERAL
floating_constant : FLOAT_LITERAL
character_constant : CHAR_LITERAL
string : STRING_LITERAL
enumeration_constant : ENUM_CONST
end

---- header
DeclNode = Struct.new(:type, :vars)
DefunNode = Struct.new(:type, :name, :params, :stmts)
InitDecl = Struct.new(:var, :val)
DirectDecl = Struct.new(:name, :modify)

OpNode = Struct.new(:op, :oprand)
FuncallNode = Struct.new(:func, :args)
ReturnNode = Struct.new(:val)
class BreakNode; end
GotoNode = Struct.new(:label)
class ContinueNode; end
IfNode = Struct.new(:exp, :then, :else)
SwitchNode = Struct.new(:exp, :stmt)
ForNode = Struct.new(:beg, :cond, :end, :stmt)
WhileNode = Struct.new(:cond, :stmt)
DoWhileNode = Struct.new(:stmt, :cond)
LabelNode = Struct.new(:name, :stmt)
CaseNode = Struct.new(:val, :stmt)
DefaultNode = Struct.new(:stmt)

Reserved = %w(
void int short char long float double signed unsigned
if else switch while for do return goto continue break case default
struct union typedef enum sizeof
auto register extern static const volatile
)
Operator = %w(
* / % + - << >> < > <= >= == != & ^ | && ||
= += -= *= /= %= <<= >>= &= |= ^=
. -> ... ,
)
---- inner

def initialize
@@reserved = {}
Reserved.each {|w| @@reserved[w.intern()] = w.upcase.intern}

@@op_regex = /\A#{Regexp.union(Operator.sort_by {|x| -x.length})}/
end

def parse_file(f)
@f = f
@line_no = 0
@line = ""
do_parse
end

def next_token
loop do
@line = fetch_line(@line)
unless @line
return [false, :END_OF_FILE]
end

token = case @line
when /\A\s+/
nil
when %r!\A//.*\Z!
nil
when %r!\A/\*!
@line = block_comment($')
next
when /\A\d+\.\d+/ # てきとー
[:FLOAT_LITERAL, $&.to_f]
when /\A\d+/
[:INTEGER_LITERAL, $&.to_i]
when /\A"(.*?)"/ # てきとー
[:STRING_LITERAL, $1]
when /\A'(.*?)'/ # てきとー
[:CHAR_LITERAL, $1]
when /\A\w[\w\d_]*/
w = $&
s = w.intern
u = @@reserved[s]
u ? [u, u] : [:SYMBOL, w]
when @@op_regex
s = $&
[s, s]
when /\A./
s = $&
[s, s]
end

@line = $'
return token if token
end
end

def on_error(t, val, vstack)
raise ParseError, "\nunexpected token #{val.inspect}"
end

private
def fetch_line(line)
if line.empty?
line = @f.gets
@line_no += 1 if line
end
line
end

def block_comment(line)
until line =~ %r!\*/!
line = fetch_line(line)
end
$'
end

---- footer

if $0 == __FILE__
parser = CParser.new
p parser.parse_file(ARGF)
end

実行例:

$ racc c.y
$ echo 'int test() { return printf("%d\\n", 1 + 2 * 3); }' | ruby c.tab.rb
[#<struct DefunNode type=:INT, name=#<struct DirectDecl name=[:test, "(", nil, "
)"], modify=nil>, params=nil, stmts=[nil, [#<struct ReturnNode val=#<struct Func
allNode func=:printf, args=["%d\\n", #<struct OpNode op="+", oprand=[1, #<struct
OpNode op="*", oprand=[2, 3]>]>]>>]]>]