Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
Programming C, C++, Java, PHP, Ruby, Turing, VB  

Username:   Password: 
 Moose -- LR Parser Generator
Index -> Programming, C++ -> C++ Submissions
View previous topic Printable versionDownload TopicRate TopicSubscribe to this topicPrivate MessagesRefresh page View next topic
Author Message

PostPosted: Sun Mar 18, 2012 5:58 pm   Post subject: Moose -- LR Parser Generator

Moose -- LR Parser Generator in C++


Over the March break I had a lot of free time (and a copy of the dragon book). Therefore; I wrote a parser generator. It's called Moose, a play on words based on two other parser generators, bison and ANTLR (and has nothing to do with its capabilities).
It is capable of recognizing grammars of class LR(0). That might be upgraded to LR(1) or maybe GLR, as I make my way through the dragon book. It's currently very (VERY) minimalist, as you'll soon see.
Also, I don't understand how SDTs work yet so the generated parser might be a bit different from other parser generators. Basically [input].parse.cpp parses the input into an AST which, if successful, can be passed to [input].process.cpp, which walks through the tree (guaranteed to be deterministic by the LR parser) using recursive descent. This both allows you to use LR(0) grammars without having to first convert them to LL(k), and gives you the simplicity of parsing with recursion. This is just a way/workaround I came up with because I can't understand anything generated by DParser or other parser generators.
Offtopic: Random unimportant history lesson: I started writing a parser generator in Turing on January 2nd (according to my header comment), but that completely utterly failed. What happened was, whenever I encountered a need for a feature that Turing didn't have, I scratched up a quick-fix using the shortest possible code so I could get back to parser-generator-writing. This ended up in everything being in a wrong data structure (sets being stored as unsorted arrays, bimaps being stored as like 4 separate arrays, stacks being flexible arrays that were new'd a lot, queue's being arrays with index hacks, you get the idea). I gave up after almost half finishing, but always meant to finish it. Fast forward to March break Tuesday morning, I remembered it and decided it would be a good idea to restart, but this time, in C++. (spoiler: SUCCESS)

The Grammar Definition

The grammar definition is the main input to the parser generator. Much like almost any other parser generator in existence, it uses a modified form of EBNF for its grammar definition. Moose is not scannerless, and terminals are defined in the grammar itself. Before jumping into details, notice the following example grammar describing ways of transliterating Muammar Gaddafi's name into English:
        Grammar for English spellings of Muammar Gaddafi's name

name : first last | first middle last;
first : $M $OU $APPOSA $MM $AR;
middle : $AL $DASH;
last : $Q $A $D $A $F $I;

$M : "M";
$OU : "(o|u|ou)";
$APPOSA : "(\'?a?)";
$MM : "mm?";
$AR : "ar";
$AL : "([aAeE]l)?";
$DASH : "[ -]?";
$Q : "Q|G|Kh?";
$A : "a";
$D : "(d(d|h|hdh)?)|ht|zz";
$F : "ff?";
$I : "[iy]";

Note the following:
    C-style comments. (//-style comments are NOT VALID)
    The start symbol is automatically the left-hand-side of the first production in the file.
    Productions start with <symbol> : and end with ;
    Productions can use boolean OR with the vertical bar |
    Terminals start with $
    Terminals are defined as (Perl-styled) regular expressions.
    The regular expressions need to be escaped (see $APPOSA : "(\'?a?)";)
That's pretty much all you need to know. Fancy helpers like (), *, +, and ? are not provided, not only because you can write an equivalent grammar not requiring those, but also because it complicates things in the AST (and totally not because I CBA).
(There is, however, 1 other useful feature not shown above. You'll see it in the next example.)
(Be careful when writing your grammars -- the parser generator doesn't yet tell you your grammar's errors and thus may enter an infinite loop or crash if you mess up.)


The command line arguments are dead-simple. Here they are:
moose <input_file> [<prefix>]

    input_file = your grammar file. (it usually ends in .g)
    prefix = the prefix for all parser classes. (Default is "yy" if you don't specify anything. Suppose you wanted to write a Java parser; you might set this to "j" or something. Up to you.)

Example -- Calculator

Here is an example grammar you can input into the generator to get a math expression parser:
        Basic expression grammar
        By Tony Zou

        :       Term
        |       Expression $ADD Term
        |       Expression $SUB Term

        :       Factor
        |       Term $MULT Factor
        |       Term $DIV Factor

        :       $OPENBRACKET Expression $CLOSEBRACKET
        |       $NUMBER

$ADD            :   "\\+"                  ;
$SUB            :   "\\-"                  ;
$MULT         :  "\\*"               ;
$DIV            :   "\\/"                  ;
$OPENBRACKET    :   "\\("                  ;
$CLOSEBRACKET   :  "\\)"               ;
$NUMBER   :        "[0-9]+"                ;

%autowhitespace :        "[ \\r\\n\\t]*" ;

The %autowhitespace is a regex that the lexer optionally checks after each token and ignores. I wish I had more time to write a full tutorial, but March break is over. For now, enjoy!

parser generator output.png
 Filesize:  109.65 KB
 Viewed:  473 Time(s)

parser generator output.png
Moose download

 Filesize:  8.95 KB
 Downloaded:  296 Time(s)

Display posts from previous:   
   Index -> Programming, C++ -> C++ Submissions
View previous topic Tell A FriendPrintable versionDownload TopicRate TopicSubscribe to this topicPrivate MessagesRefresh page View next topic

Page 1 of 1  [ 1 Posts ]
Jump to: