Java Token Parser
pyrnight
|
Posted: Sun Mar 02, 2008 10:09 am Post subject: Java Token Parser |
|
|
Well, it's about 3 weeks or so, maybe 4, into class, and I've written this for computer science class. I'm in grade 11. Enjoy.
PS: It's probably not perfect, but it's all I need for my PIC compiler project this semester.
code: |
// The "TokenParse" class.
import java.io.*;
public class TokenParse
{
    public static final String filename = "YOURJAVAFILE.java";

    public static boolean isAlpha (char check)
    {
        // Ascii values... I should have cheated an used >= 'A'...
        // but I realized that after I coded this
        if (((int) check >= 65 && (int) check <= 90) || ((int) check >= 97 && (int) check <= 122))
            return true;
        else
            return false;
    }

    public static boolean isTokenSymb (char check)
    {
        // More to come?
        if (check == ';' || check == '=' || check == '{' || check == '}' || check == '[' || check == ']' || check == '(' || check == ')' || check == '!')
            return true;
        else
            return false;
    }

    public static int findWordEnd (int wcursor, String record)
    {
        // scan until a symbol defined in isTokenSymb, a whitespace character, or the end of the line is found
        while (wcursor < record.length () && !isTokenSymb (record.charAt (wcursor)) && record.charAt (wcursor) != ' ' && record.charAt (wcursor) != '\t' && record.charAt (wcursor) != '\n')
        {
            wcursor++;
        }
        return wcursor; // returns the position just past the last character of the word (token)
    }

    public static int findStringEnd (int wcursor, String record)
    {
        wcursor++; // step past the beginning quote so it doesn't pick it up as the closing one
        // note: escaped quotes (\") inside the string are not handled
        while (wcursor < record.length () && record.charAt (wcursor) != '"')
            wcursor++;
        // position just past the closing quote, or the end of the line if the string is unterminated
        return Math.min (wcursor + 1, record.length ());
    }

    public static void main (String[] args) throws IOException
    {
        FileReader fr = new FileReader (filename);
        BufferedReader br = new BufferedReader (fr);
        int recCount = 0;
        String record;
        while ((record = br.readLine ()) != null)
        {
            recCount++;
            System.out.println (recCount + " (" + record.length () + "): " + record);
            // START TEH TOKENIZING
            if (record.length () > 0) // readLine() is quirky, and will return a zero length string if it read a newline only.
            {
                for (int cursor = 0 ; cursor < record.length () ; cursor++)
                {
                    if (record.charAt (cursor) == ' ' || record.charAt (cursor) == '\t' || record.charAt (cursor) == '\n') // ignore whitespace
                    {
                        // ITS QUIET IN HERE
                    }
                    else if (record.charAt (cursor) == '/' && cursor + 1 < record.length () && record.charAt (cursor + 1) == '/') // comments
                    {
                        // output a token from '/' to end of line
                        System.out.println (record.substring (cursor));
                        break;
                    }
                    else if (isAlpha (record.charAt (cursor))) // words
                    {
                        int end = findWordEnd (cursor, record);
                        System.out.println (record.substring (cursor, end));
                        cursor = end - 1; // the for loop's cursor++ moves on to the next character
                    }
                    else if (isTokenSymb (record.charAt (cursor))) // { } [ ] ( ) etc
                    {
                        // To do: isSpTokenSymb ++ -- += -= << >> <<< <= >= != || && ^= *= %=
                        System.out.println (record.charAt (cursor));
                    }
                    else if (record.charAt (cursor) == '"') // strings
                    {
                        int end = findStringEnd (cursor, record);
                        System.out.println (record.substring (cursor, end));
                        cursor = end - 1;
                    }
                    else if (Character.isDigit (record.charAt (cursor)))
                    {
                        // To do: findNumbEnd (with L)
                        System.out.println (record.charAt (cursor));
                    }
                }
            }
        }
        br.close ();
    }
}
|
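The "To do: findNumbEnd (with L)" note in the code above could be filled in along these lines. This is only a sketch: the class name NumberEndSketch is made up for illustration, and the behaviour (a run of digits with an optional trailing L or l) is an assumption about what the to-do means, not the author's implementation.

```java
public class NumberEndSketch
{
    // Hypothetical helper in the style of findWordEnd: returns the position
    // just past the last character of the number that starts at wcursor.
    public static int findNumbEnd (int wcursor, String record)
    {
        // consume the run of digits
        while (wcursor < record.length () && Character.isDigit (record.charAt (wcursor)))
            wcursor++;
        // accept one optional long suffix
        if (wcursor < record.length () && (record.charAt (wcursor) == 'L' || record.charAt (wcursor) == 'l'))
            wcursor++;
        return wcursor;
    }

    public static void main (String[] args)
    {
        String record = "x = 122L;";
        int end = findNumbEnd (4, record);
        System.out.println (record.substring (4, end)); // prints 122L
    }
}
```

In the main loop it would be used the same way as findWordEnd: print the substring, then set cursor to the returned position minus one.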
|
|
|
|
|
HeavenAgain
|
Posted: Sun Mar 02, 2008 10:15 am Post subject: RE:Java Token Parser |
|
|
So what does this do?
Oh, and you do know there is a StringTokenizer class, right? By the looks of it, regular expressions would definitely help you out. |
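For illustration, a regular-expression approach might look roughly like the sketch below. The class name RegexTokenSketch and the exact pattern are assumptions made for this example, not anything from the original program. Note that, unlike the hand-written scanner, this pattern splits on '.', so java.io.* would come out as several tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenSketch
{
    // Alternatives are tried left to right, so comments and strings come first.
    private static final Pattern TOKEN = Pattern.compile (
        "//.*"                            // line comment, up to the end of the line
        + "|\"(?:\\\\.|[^\"\\\\])*\""     // string literal, allowing backslash escapes
        + "|[A-Za-z_][A-Za-z0-9_]*"       // word
        + "|[0-9]+"                       // run of digits
        + "|\\S");                        // any other single non-whitespace character

    public static List<String> tokenize (String line)
    {
        List<String> tokens = new ArrayList<String> ();
        Matcher m = TOKEN.matcher (line);
        while (m.find ()) // whitespace matches no alternative, so find() skips over it
            tokens.add (m.group ());
        return tokens;
    }

    public static void main (String[] args)
    {
        for (String token : tokenize ("int count = 42; // answer"))
            System.out.println (token);
    }
}
```

The trade-off is that the whole token grammar lives in one pattern, which is compact but harder to step through than the explicit cursor loop.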
|
|
|
|
|
pyrnight
|
Posted: Sun Mar 02, 2008 10:28 am Post subject: RE:Java Token Parser |
|
|
I know there is a StringTokenizer class, but this was for a CS class. Using a premade class would kind of defeat the purpose of learning to code, wouldn't it?
What this does is separate the input into tokens, one per line:
// The "TokenParse" class.
import
java.io.*
;
public
class
TokenParse
{
public
static
final
String
filename
=
"YOURJAVAFILE.java"
;
public
static
boolean
isAlpha
(
char
check
)
{
// Ascii values... I should have cheated an used >= 'A'...
// but I realized that after I coded this
if
(
(
(
int
)
check
>=
65
&&
(
int
)
check
<=
90
)
||
(
(
int
)
check
>=
97
&&
(
int
)
check
<=
122
)
) |
|
|
|
|
|
OneOffDriveByPoster
|
Posted: Sun Mar 02, 2008 10:39 am Post subject: Re: RE:Java Token Parser |
|
|
pyrnight @ Sun Mar 02, 2008 10:28 am wrote: java.io.*
Probably not one token. The parser may be more helpful if you could print the token types detected. |
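Printing token types could be as simple as the sketch below. The TokenType categories and the classify helper are hypothetical names invented for this example; the original program defines no token types, and the categories just mirror the branches of its main loop.

```java
public class TokenTypeSketch
{
    // Hypothetical categories, one per branch of the original tokenizer loop.
    public enum TokenType { COMMENT, STRING, WORD, NUMBER, SYMBOL }

    // Classifies an already-extracted, non-empty token by how it starts.
    public static TokenType classify (String token)
    {
        char first = token.charAt (0);
        if (token.startsWith ("//"))
            return TokenType.COMMENT;
        if (first == '"')
            return TokenType.STRING;
        if (Character.isLetter (first))
            return TokenType.WORD;
        if (Character.isDigit (first))
            return TokenType.NUMBER;
        return TokenType.SYMBOL;
    }

    public static void main (String[] args)
    {
        String[] tokens = { "public", "filename", "=", "\"YOURJAVAFILE.java\"", ";", "65", "// comment" };
        for (String token : tokens)
            System.out.println (classify (token) + "\t" + token);
    }
}
```

In the tokenizer itself the type is already known at the point each branch prints its token, so the label could simply be printed there instead of being recomputed.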
|
|
|
|
|
pyrnight
|
Posted: Sun Mar 02, 2008 4:29 pm Post subject: RE:Java Token Parser |
|
|
We're just making a compiler for a simple PIC, so things like java.io.* or token types don't matter too much right now. And java.io.* is one token, I think. Prove me wrong. |
|
|
|
|
|
OneOffDriveByPoster
|
Posted: Sun Mar 02, 2008 6:29 pm Post subject: Re: RE:Java Token Parser |
|
|
pyrnight @ Sun Mar 02, 2008 4:29 pm wrote: We're just making a compiler for a simple PIC, so things like java.io.* or token types don't matter too much right now. And java.io.* is one token, I think. Prove me wrong.
JLS 3.0 subsection 18.1:
code: |
ImportDeclaration:
import [ static ] Identifier { . Identifier } [ . * ] ;
|
You are making a Java compiler? |
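Read against that grammar rule, java.io.* is a sequence of identifiers separated by '.' with a trailing '*'. A minimal sketch of splitting such a word into those pieces follows; the class and method names are hypothetical and not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportTokenSketch
{
    // Splits one "word" such as java.io.* into identifiers, '.' and '*'.
    public static List<String> splitQualifiedName (String word)
    {
        List<String> tokens = new ArrayList<String> ();
        int start = 0; // start of the identifier currently being scanned
        for (int i = 0 ; i < word.length () ; i++)
        {
            char c = word.charAt (i);
            if (c == '.' || c == '*')
            {
                if (i > start)
                    tokens.add (word.substring (start, i)); // identifier before the separator
                tokens.add (String.valueOf (c));
                start = i + 1;
            }
        }
        if (start < word.length ())
            tokens.add (word.substring (start)); // trailing identifier, if any
        return tokens;
    }

    public static void main (String[] args)
    {
        System.out.println (splitQualifiedName ("java.io.*")); // prints [java, ., io, ., *]
    }
}
```

An alternative with the same effect would be to add '.' and '*' to isTokenSymb, so that findWordEnd stops at them in the first place.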
|
|
|
|
|
pyrnight
|
Posted: Mon Mar 03, 2008 9:54 pm Post subject: RE:Java Token Parser |
|
|
No. As I said before, we're making a compiler for a PIC; that program was just to test the tokenization. And OK, you got me on the import thing, but it's not relevant to what I need to do, so I'm not going to fix it. Thanks for the heads up, though. |
|
|
|
|
|
|
|