Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
Programming C, C++, Java, PHP, Ruby, Turing, VB  

Username:   Password: 
 RegisterRegister   
 Don't Like Any Programming Languages? Write Your Own!
Index -> General Programming
View previous topic Printable versionDownload TopicSubscribe to this topicPrivate MessagesRefresh page View next topic
Author Message
rdrake




PostPosted: Tue Nov 14, 2006 12:31 pm   Post subject: Don't Like Any Programming Languages? Write Your Own!

Too often we hear people complaining about the current languages out there. They whine the languages are too large, they're too small, they're too complicated, or even too simple. Not often do people actually step up and write their own compiler for a language that is entirely their own.

There are three steps in writing your own little compiler:

  1. Scanner
  2. Parser
  3. Code Generator
The scanner uses a series of regular expressions to identify certain lines of the program. These lines may be things like simple commands, or full blown block structures. They could declare variables, constants, or anything else.

The parser does something a little different, it takes the output from the scanner along with a series of grammatical rules on order to generate a tree of the program.

The code generator does as its name implies, generates the code.

We could write all of these tools from scratch, or use pre-existing ones to speed up the process. For these examples, I will be using the tools (f)lex, yacc (bison), and gcc. Flex is an open source implementation of lex, bison is a free software foundation version of yacc, and gcc is... well gcc. Lex/flex and yacc/bison can typically be used interchagably.

Note: If you are on a *nix based system, these tools should already be installed. If not, use your favourite method to fetch them. If on Windows, you're going to have to google up locations of win32 binaries.

Let's move into the interesting stuff now...

The Lexer
Let's make this first bit simple. Say we wanted to write a compiler that turns a virtual light switch on and off. We would just need a very simple lexer file, dubbed "switch.l".

code:
%{
#include <stdio.h>
%}

%%
on      printf("Light switch turned on! :-)\n");
off     printf("Light switch turned off. :-(\n");
%%
We compile the above example as follows.
code:
lex switch.l
cc lex.yy.c -o switch -ll
Please note if you get errors from the above compile, chances are the switch "-ll" may need to be changed to "-lfl". If you still get complaints, change "lex" to "flex".

You may be wondering why we care? Well, this will be addressed in the next section.

Yacc
The example above is all find and dandy for controlling just one light switch, but we want to control many. Let's create a lexer file which allows for slightly more complicated statements.
code:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+                  return NUMBER;
light                   return TOKLIGHT;
on|off                  return STATE;
target                  return TOKTARGET;
switch                  return TOKSWITCH;
\n                      /* ignore end of line */;
[ \t]+                  /* ignore whitespace */;
%%
Seems simple enough. All the above example does is match patterns using regular expressions then return the type of the matched pattern.

Here's where the magic comes in, our switch.y file.
code:
%{
#include <stdio.h>
#include <string.h>

void yyerror(const char *str)
{
        fprintf(stderr,"error: %s\n",str);
}

int yywrap()
{
        return 1;
}

main()
{
        yyparse();
}

%}

%token NUMBER TOKLIGHT STATE TOKTARGET TOKSWITCH

%%

commands: /* empty */
        | commands command
        ;


command:
        light_switch
        |
        switch_set
        ;

light_switch:
        TOKSWITCH STATE
        {
                printf("\tSwitch turned on or off\n");
        }
        ;

switch_set:
        TOKTARGET TOKSWITCH NUMBER
        {
                printf("\tChanged switch.\n");
        }
        ;
Ok, bit more complicated this time. However, the file is simpler than it may seem. The top portion before the "%%" defines the C includes, as well as a few functions. The portion after the "%%" defines the grammatical rules. This is where the magic really comes into play.

You see, all the scanner does is scan the text for patterns. It then reports these findings to the parser. The magic comes in when the parser takes the results from the scanner and acts accordingly.

Enough of this, let's compile the above example.
code:
lex switch.l
yacc switch.y
cc lex.yy.c y.tab.c -o switch
Finally, let's run our example.
code:
$ ./switch
target switch 10
        Changed switch.
switch on
        Switch turned on or off
switch off
        Switch turned on or off
target light 10
error: syntax error
Pretty simple, eh? Now you may be wondering what that error is at the bottom. Well, take a look at our lexer and yacc files and tell me why. If you want, edit the yacc file to allow for that declaration.

Well, I think that's just about it for now. We've built a nice, simple language for turing our light switches on and off. Our work here is done.
Sponsor
Sponsor
Sponsor
sponsor
Tony




PostPosted: Tue Nov 14, 2006 1:42 pm   Post subject: Re: Don't Like Any Programming Languages? Write Your Own!

rdrake wrote:
We've built a nice, simple language for turing our light switches on and off.

Excellent.

This is pretty much the approach that will be taken towards the openTuring project.
Latest from compsci.ca/blog: Tony's programming blog. DWITE - a programming contest.
Display posts from previous:   
   Index -> General Programming
View previous topic Tell A FriendPrintable versionDownload TopicSubscribe to this topicPrivate MessagesRefresh page View next topic

Page 1 of 1  [ 2 Posts ]
Jump to:   


Style:  
Search: