Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 [Tutorial] Regular Expressions (Basics)
rizzix




Posted: Tue Aug 26, 2003 3:21 pm   Post subject: [Tutorial] Regular Expressions (Basics)

You can describe regex as a language for describing patterns in strings.
For example, let's take the string: "Postal Code: A1B 2C3"

This string follows a regular pattern. "Postal Code: " is the title of the string and "A1B 2C3" is the actual postal code. The title is a constant series of characters (including the trailing space), while the actual postal code is a Character|Number|Character|Space|Number|Character|Number sequence.
As you can see, we have already described the pattern in the string using plain English. The regex language lets you do that a lot more concisely and in a lot more detail.

This is how the pattern would be described in regex:
"^Postal Code: (\D\d\D\s\d\D\d)$"

We can also describe the title, just for the fun of it. It would look like this:
"^\p{Upper}\p{Lower}{5}\s\p{Upper}\p{Lower}{3}\D{1}\s(\D\d\D\s\d\D\d)$"

The ^ at the beginning of the pattern indicates 'the beginning of the line' in the string we are matching the pattern against. Similarly, the $ at the end of the pattern indicates 'the end of the line'.

\p{Upper} indicates any character from [A-Z]
\p{Lower} indicates any character from [a-z]
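\D indicates any non-digit character (this includes punctuation and white space, not just letters)
\d indicates a digit character [0-9]
\s indicates any white-space character
. (a dot) indicates any character (except a line terminator, by default)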
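
To check the constructs listed above one at a time, a small sketch like the following may help. The class and method names are made up for illustration. Note that inside a Java string literal each backslash has to be doubled, so \d is written "\\d".

```java
import java.util.regex.Pattern;

// Illustrative helper; the class and method names are not from the tutorial.
class ConstructDemo {
    // Returns true when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\p{Upper}", "A")); // true
        System.out.println(matchesWhole("\\p{Upper}", "a")); // false
        System.out.println(matchesWhole("\\p{Lower}", "a")); // true
        System.out.println(matchesWhole("\\D", "x"));        // true
        System.out.println(matchesWhole("\\D", ":"));        // true, a colon is also a non-digit
        System.out.println(matchesWhole("\\D", "5"));        // false
        System.out.println(matchesWhole("\\d", "5"));        // true
        System.out.println(matchesWhole("\\s", " "));        // true
        System.out.println(matchesWhole(".", "#"));          // true
    }
}
```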

The first pattern uses just three of these constructs (\D, \d and \s). The second one also uses quantifiers to make it more precise.

x{n} (where x is a construct and n is a number) indicates that x should be matched exactly n times. In our case we matched a lowercase character 5 times. In other words: 5 lowercase characters.

x{n,} (where x is a construct and n is a number) indicates that x should be matched at least n times.

x{n,m} (where x is a construct, n and m are numbers) indicates that x should be matched at least n but not more than m times.
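
The three quantifier forms can be tried out with a quick sketch along these lines. The class and method names are hypothetical; String.matches() is used because it tests the whole string against the pattern.

```java
// Illustrative sketch; names are not from the tutorial.
class QuantifierDemo {
    // Returns true when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // x{n}: exactly n times
        System.out.println(matchesWhole("\\p{Lower}{5}", "ostal")); // true
        System.out.println(matchesWhole("\\p{Lower}{5}", "osta"));  // false

        // x{n,}: at least n times
        System.out.println(matchesWhole("a{2,}", "a"));     // false
        System.out.println(matchesWhole("a{2,}", "aaaaa")); // true

        // x{n,m}: at least n but not more than m times
        System.out.println(matchesWhole("a{2,3}", "aaa"));  // true
        System.out.println(matchesWhole("a{2,3}", "aaaa")); // false
    }
}
```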

() (parentheses) indicate a group. You may group parts of the pattern by enclosing them in parentheses. After matching the pattern against a string, we can extract the parts of the string that matched each group.

There are many more constructs and quantifiers for fine-tuning your pattern. Do take a look at the java.util.regex.Pattern documentation for more info.

Regex in Java.
To use regex in Java you will first need to import java.util.regex.*;
Java:

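   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   public class PostalCodeCheck {
       public static void main(String[] args) {
           String[] postalCode = {
               "Postal Code: A1B 2C3",
               "Postal Code: M15 2Z3",
               "Postal Code: aaaaaa"
           };

           // Backslashes are doubled because this is a Java string literal.
           Pattern p = Pattern.compile("^Postal Code: (\\D\\d\\D\\s\\d\\D\\d)$");

           for (int i = 0; i < postalCode.length; i++) {
               Matcher m = p.matcher(postalCode[i]);
               String result;
               if (m.matches()) {
                   result = m.group(1);   // valid postal code: print it
               } else {
                   result = "N/A";        // otherwise print "N/A"
               }
               System.out.println(result);
           }
       }
   }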
 


The group we extract is group 1. Groups are numbered by the position of their opening parenthesis, from left to right, counting from 1. Group 0 is the whole match.

For example: "(A)(B(C))"
0: (A)(B(C))
1: (A)
2: (B(C))
3: (C)
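
The numbering above can be confirmed by matching the pattern against the string "ABC" and listing every group. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; names are not from the tutorial.
class GroupNumberingDemo {
    // Returns group 0..n of a whole-string match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() does not include group 0, hence the + 1.
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("(A)(B(C))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // prints:
        // 0: ABC
        // 1: A
        // 2: BC
        // 3: C
    }
}
```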

Sometimes you only need to match one character out of a set of characters. For this purpose you create character classes in your pattern.

[] (square brackets) indicate a character class.

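For example:
"^[12][AB]$" will match "1A", "2A", "1B" and "2B". (Characters in a class are not separated by commas; a comma inside the brackets would itself be matched literally.)
"^[0-9]a$" will match "0a", "1a" .. "9a"
"^[a-z]{3}$" will match any three lowercase letters, e.g. "aaa", "abc" .. "zzz"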

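A short sketch to try character classes out (the class and method names are made up). One detail worth checking for yourself: inside square brackets a comma is an ordinary character, so a class written as [1,2] also accepts a literal comma.

```java
// Illustrative sketch; names are not from the tutorial.
class CharClassDemo {
    // Returns true when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("^[12][AB]$", "1A"));     // true
        System.out.println(matchesWhole("^[12][AB]$", "2B"));     // true
        System.out.println(matchesWhole("^[12][AB]$", "3A"));     // false

        // The comma is part of the class here, so ",A" matches too.
        System.out.println(matchesWhole("^[1,2][A,B]$", ",A"));   // true

        System.out.println(matchesWhole("^[0-9]a$", "7a"));       // true
        System.out.println(matchesWhole("^[a-z]{3}$", "abc"));    // true
        System.out.println(matchesWhole("^[a-z]{3}$", "abC"));    // false
    }
}
```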

It is important to understand the significance of ^ and $: they make the pattern more specific. For example, when searching within a string (Matcher.find() in Java), "a{3}" will match "aaa", "aaar", "baaa", "baaat", or even "ts dfhaaatsdf sdf ..". If we modify the pattern to "a{3}$", it will only match strings like "aaa" and "baaa". If we modify it to "^a{3}", it will only match strings like "aaa" and "aaar". To get the pattern to match "aaa" only, we need "^a{3}$". (Matcher.matches(), used in the example above, always requires the whole string to match, as if the pattern were wrapped in ^ and $.)
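
The effect of the anchors shows up when searching inside a string with Matcher.find(), which looks for the pattern anywhere in the input. Here is a minimal sketch with invented names; the last line shows that String.matches() behaves differently, because it always tests the whole string.

```java
import java.util.regex.Pattern;

// Illustrative sketch; names are not from the tutorial.
class AnchorDemo {
    // Returns true when the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("a{3}", "baaat"));   // true, found in the middle
        System.out.println(found("a{3}$", "baaa"));   // true, the string ends with aaa
        System.out.println(found("a{3}$", "aaar"));   // false
        System.out.println(found("^a{3}", "aaar"));   // true, the string starts with aaa
        System.out.println(found("^a{3}", "baaa"));   // false
        System.out.println(found("^a{3}$", "aaa"));   // true
        System.out.println(found("^a{3}$", "aaaa"));  // false

        // matches() requires the entire string to match, even without anchors.
        System.out.println("baaat".matches("a{3}"));  // false
    }
}
```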

There is a lot more you can do with regex, but these basics will help you in most cases.
Tony




Posted: Tue Aug 26, 2003 5:57 pm   Post subject: (No subject)

So is this thing only for input validation, or are there some other uses for string description?
octopi




Posted: Tue Aug 26, 2003 9:08 pm   Post subject: (No subject)

In Perl and PHP regexes (I am pretty sure Java got theirs from Perl, as did PHP) you can put something in ()'s; this will cause the language to put whatever is matched inside the ()'s into a variable, usually $1, $2, etc., depending on how many brackets you use.
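
Java has no $1 variables as such, but a similar notation exists in replacement strings: in String.replaceAll(), $1 and $2 refer to what the first and second groups matched. A small sketch, with a made-up class and method name, that swaps the two halves of a postal code:

```java
// Illustrative sketch; names are not from the thread.
class GroupReferenceDemo {
    // Swaps the two three-character halves of a postal code such as "A1B 2C3".
    static String swapHalves(String postalCode) {
        // $1 is the first group, $2 the second.
        return postalCode.replaceAll("(\\D\\d\\D)\\s(\\d\\D\\d)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(swapHalves("A1B 2C3")); // 2C3 A1B
    }
}
```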
rizzix




Posted: Wed Aug 27, 2003 10:46 am   Post subject: (No subject)

Well, technically you can only match patterns with regex. But as octopi said, you can group parts of the pattern and extract the part of a string (like substring on steroids) that matches that group.


For example: "Postal Code: A1B 2C3". Imagine this string in a web page or something that you have downloaded. Now, you know that the web page has a lot more than a postal code in it, but you only want that postal code. This is very difficult to do with just the basic string manipulation methods. If you use regex you can extract the postal code without much difficulty. All you need to do is create a regex like this: "Postal Code: (\D\d\D\s\d\D\d)"

Notice I grouped the postal code part, because I want to extract that part.

To extract that part, once m.find() has returned true, I do this:
code:
String pcode = m.group(1);

Now I've got just the postal code, out of all that gibberish.


Of course this is just an example; there are a lot of better, more useful things you can do with regex. All you need to know is how to create patterns and extract data from matching strings by grouping the part you want.
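
Put together, the extraction described in this post might look roughly like the sketch below. The class name, the method name and the sample page text are all invented for the example. Matcher.find() is used instead of matches() because the postal code is only a small part of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; the names and the sample text are assumptions.
class PostalCodeExtractor {
    // No ^ or $ here: the pattern may occur anywhere in the text.
    private static final Pattern POSTAL =
            Pattern.compile("Postal Code: (\\D\\d\\D\\s\\d\\D\\d)");

    // Returns the first postal code found in the text, or null if there is none.
    static String extract(String text) {
        Matcher m = POSTAL.matcher(text);
        // group(1) may only be read after find() has succeeded.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "Contact us! Our office: 1 Some Street, Postal Code: A1B 2C3, open daily.";
        System.out.println(extract(page)); // A1B 2C3
    }
}
```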