Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Java Token Parser
pyrnight




Posted: Sun Mar 02, 2008 10:09 am   Post subject: Java Token Parser

Well, it's like 3 weeks or so, maybe 4, into class, and I've written this for computer science class. I'm in grade 11. Enjoy :D

PS: It's probably not perfect, but it's all I need for my PIC compiler project this semester ;)


code:


// The "TokenParse" class.
import java.io.*;

public class TokenParse
{

    public static final String filename = "YOURJAVAFILE.java";

    public static boolean isAlpha (char check)
    {
        // Ascii values... I should have cheated and used >= 'A'...
        // but I realized that after I coded this
        if (((int) check >= 65 && (int) check <= 90) || ((int) check >= 97 && (int) check <= 122))
            return true;
        else
            return false;
    }


    public static boolean isTokenSymb (char check)
    {
        // More to come?
        if (check == ';' || check == '=' || check == '{' || check == '}' || check == '[' || check == ']' || check == '(' || check == ')' || check == '!')
            return true;
        else
            return false;
    }


    public static int findWordEnd (int wcursor, String record)
    {
        // advance until a symbol defined in isTokenSymb, a whitespace character, or the end of the line is reached
        while (wcursor < record.length () && !isTokenSymb (record.charAt (wcursor)) && record.charAt (wcursor) != ' ' && record.charAt (wcursor) != '\t' && record.charAt (wcursor) != '\n')
            wcursor++;

        return wcursor; // index just past the last character of the word (token)
    }


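    public static int findStringEnd (int wcursor, String record)
    {
        wcursor++; // step past the opening quote so it doesn't match itself

        // stop at the closing quote, or at the end of the line if the string is unterminated
        // (escaped quotes inside the string are not handled)
        while (wcursor < record.length () && record.charAt (wcursor) != '"')
            wcursor++;

        return Math.min (wcursor + 1, record.length ()); // index just past the closing quote
    }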




    public static void main (String[] args) throws IOException
    {
        FileReader fr = new FileReader (filename);
        BufferedReader br = new BufferedReader (fr);

        int recCount = 0;
        String record; // assigned by readLine() in the loop condition below

        while ((record = br.readLine ()) != null)
        {
            recCount++;
            System.out.println (recCount + " (" + record.length () + "): " + record);

            // START TEH TOKENIZING

            if (record.length () > 0) // readLine() is quirky, and will return a zero length string if it read a newline only.
            {

                for (int cursor = 0 ; cursor < record.length () ; cursor++)
                {
                    if (record.charAt (cursor) == ' ' || record.charAt (cursor) == '\t' || record.charAt (cursor) == '\n') // ignore whitespace
                    {
                        // ITS QUIET IN HERE
                    }
                    else if (record.charAt (cursor) == '/' && cursor + 1 < record.length () && record.charAt (cursor + 1) == '/') // comments
                    {
                        // output a token from '/' to end of line
                        System.out.println (record.substring (cursor));
                        break;
                    }
                    else if (isAlpha (record.charAt (cursor))) // words
                    {
                        int wordEnd = findWordEnd (cursor, record);
                        System.out.println (record.substring (cursor, wordEnd));
                        cursor = wordEnd - 1; // the loop's cursor++ then moves past the word
                    }
                    else if (isTokenSymb (record.charAt (cursor))) // { } [ ] ( ) etc
                    {
                        // To do: isSpTokenSymb ++ -- += -= << >> >>> <= >= != || && ^= *= %=
                        System.out.println (record.charAt (cursor));
                    }
                    else if (record.charAt (cursor) == '"') // strings
                    {
                        int stringEnd = findStringEnd (cursor, record);
                        System.out.println (record.substring (cursor, stringEnd));
                        cursor = stringEnd - 1; // the loop's cursor++ then moves past the string
                    }
                    else if (Character.isDigit (record.charAt (cursor)))
                        // To do: findNumbEnd (with L)
                        System.out.println (record.charAt (cursor));
                }
            }

        }

        br.close (); // release the file handle once every line has been read
    }
}
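
The two "To do" comments in main (multi-character symbols and findNumbEnd) could be filled in roughly like this. This is only a sketch, not the original author's code: the class name TodoSketch and the method findOperatorEnd are made up for illustration, the operator list is taken from the to-do comment with >>> assumed where it says <<<, and findNumbEnd handles only plain digit runs with an optional L suffix.

```java
// Hypothetical helper class; neither method exists in the original TokenParse.
class TodoSketch
{
    // Operators from the to-do comment, longest first so ">>>" wins over ">>".
    private static final String[] OPERATORS = {
        ">>>", "++", "--", "+=", "-=", "<<", ">>", "<=", ">=", "!=", "||", "&&", "^=", "*=", "%="
    };

    // Returns the index just past the operator that starts at cursor.
    // Falls back to a single character when no listed operator matches.
    static int findOperatorEnd (int cursor, String record)
    {
        for (String op : OPERATORS)
        {
            if (record.startsWith (op, cursor))
                return cursor + op.length ();
        }
        return cursor + 1;
    }

    // Returns the index just past a run of digits, including an optional L/l suffix.
    // Decimal points, hex literals and exponents are deliberately left out.
    static int findNumbEnd (int cursor, String record)
    {
        while (cursor < record.length () && Character.isDigit (record.charAt (cursor)))
            cursor++;

        if (cursor < record.length () && (record.charAt (cursor) == 'L' || record.charAt (cursor) == 'l'))
            cursor++;

        return cursor;
    }

    public static void main (String[] args)
    {
        String record = "total += 100L;";
        System.out.println (record.substring (6, findOperatorEnd (6, record))); // prints +=
        System.out.println (record.substring (9, findNumbEnd (9, record)));     // prints 100L
    }
}
```

Both methods return the index just past the token, the same convention findWordEnd and findStringEnd use, so they could be called from main in the same substring-then-move-cursor way.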




HeavenAgain




Posted: Sun Mar 02, 2008 10:15 am   Post subject: RE:Java Token Parser

So what does this do?
Oh, and you do know there is a StringTokenizer class, right? By the looks of it, regular expressions would definitely help you out.
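
To make the regular expression suggestion concrete, here is a rough sketch using java.util.regex. The class name RegexTokenSketch and the pattern itself are assumptions for illustration only; the pattern covers roughly the same categories as the posted program (comments, strings, words, digits, single symbols) but is not a drop-in replacement, since words here stop at any non-alphanumeric character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; not part of the original TokenParse program.
class RegexTokenSketch
{
    // Alternatives are tried left to right at each position.
    private static final Pattern TOKEN = Pattern.compile (
        "//.*"                        // line comment, up to the end of the line
        + "|\"[^\"]*\""               // string literal (escaped quotes not handled)
        + "|[A-Za-z][A-Za-z0-9_]*"    // word
        + "|[0-9]+"                   // run of digits
        + "|\\S");                    // any other single non-whitespace character

    static List<String> tokenize (String line)
    {
        List<String> tokens = new ArrayList<String> ();
        Matcher m = TOKEN.matcher (line);
        while (m.find ()) // find() skips whitespace because no alternative matches it
            tokens.add (m.group ());
        return tokens;
    }

    public static void main (String[] args)
    {
        for (String token : tokenize ("int x = 42; // answer"))
            System.out.println (token);
    }
}
```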
pyrnight




Posted: Sun Mar 02, 2008 10:28 am   Post subject: RE:Java Token Parser

I know there is a StringTokenizer class. This was for a CS class, though; using a premade class would kind of defeat the purpose of learning to code, wouldn't it?
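
For comparison, this is roughly what the StringTokenizer route would look like. It is a sketch under assumptions: the class name StringTokenizerSketch is made up, and the delimiter string simply reuses the symbols from isTokenSymb plus space and tab. It does not recognize comments or string literals the way the hand-written loop does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, for illustration only.
class StringTokenizerSketch
{
    static List<String> tokenize (String line)
    {
        // The third argument (true) makes the tokenizer return each delimiter as its own token.
        StringTokenizer st = new StringTokenizer (line, " \t;={}[]()!", true);
        List<String> tokens = new ArrayList<String> ();

        while (st.hasMoreTokens ())
        {
            String token = st.nextToken ();
            if (!token.equals (" ") && !token.equals ("\t")) // keep symbols, drop whitespace
                tokens.add (token);
        }
        return tokens;
    }

    public static void main (String[] args)
    {
        for (String token : tokenize ("int[] a = b;"))
            System.out.println (token);
    }
}
```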

What this does is separate the input into tokens, one per line:

// The "TokenParse" class.
import
java.io.*
;
public
class
TokenParse
{
public
static
final
String
filename
=
"YOURJAVAFILE.java"
;
public
static
boolean
isAlpha
(
char
check
)
{
// Ascii values... I should have cheated and used >= 'A'...
// but I realized that after I coded this
if
(
(
(
int
)
check
>=
65
&&
(
int
)
check
<=
90
)
||
(
(
int
)
check
>=
97
&&
(
int
)
check
<=
122
)
)
OneOffDriveByPoster




Posted: Sun Mar 02, 2008 10:39 am   Post subject: Re: RE:Java Token Parser

pyrnight @ Sun Mar 02, 2008 10:28 am wrote:
java.io.*
Probably not one token. The parser may be more helpful if you could print the token types detected.
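
Printing the token type could look something like the sketch below. The class name TokenTypeSketch, the enum and its category names are all hypothetical; they just mirror the branches already present in TokenParse.main (comment, word, symbol, string, digit).

```java
// Hypothetical names throughout; the categories mirror the if/else branches in TokenParse.main.
class TokenTypeSketch
{
    enum Type { COMMENT, WORD, SYMBOL, STRING, NUMBER, UNKNOWN }

    // Classifies an already-extracted token by looking at how it starts.
    static Type classify (String token)
    {
        if (token.isEmpty ())
            return Type.UNKNOWN;

        char first = token.charAt (0);

        if (token.startsWith ("//"))
            return Type.COMMENT;
        if ((first >= 'A' && first <= 'Z') || (first >= 'a' && first <= 'z'))
            return Type.WORD;
        if (first == '"')
            return Type.STRING;
        if (Character.isDigit (first))
            return Type.NUMBER;
        if (";={}[]()!".indexOf (first) >= 0) // same symbols as isTokenSymb
            return Type.SYMBOL;

        return Type.UNKNOWN;
    }

    public static void main (String[] args)
    {
        String[] tokens = { "public", "(", "\"YOURJAVAFILE.java\"", "65", "// comment" };
        for (String token : tokens)
            System.out.println (classify (token) + "\t" + token);
    }
}
```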
pyrnight




Posted: Sun Mar 02, 2008 4:29 pm   Post subject: RE:Java Token Parser

We're just making a compiler for a simple PIC, so things like java.io.* or token types don't matter too much right now. And java.io.* is one token, I think. Prove me wrong.
OneOffDriveByPoster




Posted: Sun Mar 02, 2008 6:29 pm   Post subject: Re: RE:Java Token Parser

pyrnight @ Sun Mar 02, 2008 4:29 pm wrote:
We're just making a compiler for a simple PIC, so things like java.io.* or token types don't matter too much right now. And java.io.* is one token, I think. Prove me wrong.

JLS 3.0 subsection 18.1:
code:
ImportDeclaration:
    import [ static ] Identifier { . Identifier } [ . * ] ;

You are making a Java compiler?
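
Going by the grammar quoted above, the identifiers, the dots and the * are separate tokens. A scanner that stops words at the first non-identifier character would split the import line that way. The sketch below is an assumption-laden illustration (the class name ImportTokenSketch and method split are made up), and it treats every non-identifier, non-whitespace character as a one-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; shows words ending at '.' so that java.io.* becomes several tokens.
class ImportTokenSketch
{
    static List<String> split (String text)
    {
        List<String> tokens = new ArrayList<String> ();
        int i = 0;

        while (i < text.length ())
        {
            char c = text.charAt (i);

            if (Character.isWhitespace (c))
            {
                i++; // skip whitespace
            }
            else if (Character.isJavaIdentifierStart (c))
            {
                // consume the identifier; '.' and '*' are not identifier characters
                int end = i + 1;
                while (end < text.length () && Character.isJavaIdentifierPart (text.charAt (end)))
                    end++;
                tokens.add (text.substring (i, end));
                i = end;
            }
            else
            {
                tokens.add (String.valueOf (c)); // everything else is a one-character token
                i++;
            }
        }
        return tokens;
    }

    public static void main (String[] args)
    {
        // prints import, java, ., io, ., * and ; on separate lines
        for (String token : split ("import java.io.*;"))
            System.out.println (token);
    }
}
```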
pyrnight




Posted: Mon Mar 03, 2008 9:54 pm   Post subject: RE:Java Token Parser

No. As I said before, we're making a compiler for a PIC; that program was just to test the tokenization. And OK, you got me on the import thing, but it's not relevant to what I need to do, so I'm not going to fix it. Thanks for the heads-up, though.