[Tutorial] Regular Expressions (Basics)
rizzix

Posted: Tue Aug 26, 2003 3:21 pm   Post subject: [Tutorial] Regular Expressions (Basics)

You can describe regex as a language for describing patterns in strings.
For example, let's take the string: "Postal Code: A1B 2C3"

A regular pattern can be observed in this string. "Postal Code: " is the title of the string and "A1B 2C3" is the actual postal code. The title is a constant series of characters (including the trailing space), while the actual postal code is Character|Number|Character|Space|Number|Character|Number.
As you can see, we have already described the pattern in the string using plain English. The regex language lets you do that a lot quicker and lets you add a lot more detail.

This is how the pattern would be described in regex:
"^Postal Code: (\D\d\D\s\d\D\d)$"

We can also describe the title, just for the fun of it. It would look like this (the \D{1} matches the colon and the \s after it matches the space that follows):
"^\p{Upper}\p{Lower}{5}\s\p{Upper}\p{Lower}{3}\D{1}\s(\D\d\D\s\d\D\d)$"

The ^ at the beginning of the pattern indicates 'the beginning of the line' in the string we are matching the pattern against. Similarly, the $ at the end of the pattern indicates 'the end of the line'.

\p{Upper} indicates any character from [A-Z]
\p{Lower} indicates any character from [a-z]
\D indicates any non-digit character (letters, but also punctuation and spaces)
\d indicates a digit character [0-9]
\s indicates any white-space character
. (a dot) indicates any character
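
To get a feel for these constructs, here is a small sketch that tests single characters against each one. The class and method names (ConstructDemo, matchesOne) are made up for this illustration; Pattern.matches is the standard java.util.regex call. Note the doubled backslashes, which a Java string literal requires.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial
class ConstructDemo {
    // True when the whole input matches the given construct
    static boolean matchesOne(String construct, String input) {
        return Pattern.matches(construct, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesOne("\\p{Upper}", "A")); // true
        System.out.println(matchesOne("\\p{Upper}", "a")); // false
        System.out.println(matchesOne("\\p{Lower}", "a")); // true
        System.out.println(matchesOne("\\D", ":"));        // true: a colon is a non-digit
        System.out.println(matchesOne("\\D", "7"));        // false
        System.out.println(matchesOne("\\d", "7"));        // true
        System.out.println(matchesOne("\\s", " "));        // true
        System.out.println(matchesOne(".", "x"));          // true
    }
}
```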

The postal code part uses just 3 constructs (\D, \d and \s). In the title pattern we also used quantifiers to make it more precise.

x{n} (where x is a construct and n is a number) indicates that x should be matched exactly n times. In our case we matched a lowercase character 5 times. In other words: 5 lowercase characters.

x{n,} (where x is a construct and n is a number) indicates that x should be matched at least n times.

x{n,m} (where x is a construct, n and m are numbers) indicates that x should be matched at least n but not more than m times.
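
The three quantifier forms can be tried out with a quick sketch like the one below. The class and method names (QuantifierDemo, wholeMatch) are invented for the example, and the letter a simply stands in for any construct.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the {n}, {n,} and {n,m} quantifiers
class QuantifierDemo {
    // True when the entire input matches the regex
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("a{3}", "aaa"));     // true: exactly 3
        System.out.println(wholeMatch("a{3}", "aa"));      // false: only 2
        System.out.println(wholeMatch("a{2,}", "aaaaa"));  // true: at least 2
        System.out.println(wholeMatch("a{2,}", "a"));      // false: fewer than 2
        System.out.println(wholeMatch("a{2,4}", "aaa"));   // true: between 2 and 4
        System.out.println(wholeMatch("a{2,4}", "aaaaa")); // false: more than 4
    }
}
```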

() (parentheses) indicate a group. You may group parts of the pattern by enclosing them in parentheses. After matching the pattern against a string, we can then extract the parts of the string that matched each group.

There are many more constructs and quantifiers to fine tune your pattern. Do take a look at the documentation for more info.

REGEX in Java.
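To use regex in Java you will first need to import java.util.regex.*; Note that inside a Java string literal every backslash in the pattern has to be doubled (\\D instead of \D).
 Java:
    import java.util.regex.*;

    // The class name is arbitrary; any name will do
    public class PostalCodes {
        public static void main(String[] args) {
            String[] postalCode = {
                "Postal Code: A1B 2C3",
                "Postal Code: M15 2Z3",
                "Postal Code: aaaaaa"
            };
            String result;
            // Backslashes are doubled because this is a Java string literal
            Pattern p = Pattern.compile("^Postal Code: (\\D\\d\\D\\s\\d\\D\\d)$");

            for (int i = 0; i < postalCode.length; i++) {
                Matcher m = p.matcher(postalCode[i]);
                if (m.matches()) {
                    result = m.group(1);   // valid postal code: print it
                } else {
                    result = "N/A";        // otherwise print "N/A"
                }
                System.out.println(result);
            }
        }
    }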

The group we extract is group 1. Groups are numbered by the position of their opening parenthesis, from left to right, counting from 1. Group 0 is the whole match.

For example: "(A)(B(C))"
0: (A)(B(C))
1: (A)
2: (B(C))
3: (C)
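
The numbering above can be checked with a short sketch. GroupDemo and its groups method are names invented for this example; matches(), group() and groupCount() are the standard Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how groups are numbered
class GroupDemo {
    // Returns group 0..n of a whole-string match, or an empty array if there is no match
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() does not count group 0, hence the + 1
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("(A)(B(C))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // prints:
        // 0: ABC
        // 1: A
        // 2: BC
        // 3: C
    }
}
```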

Sometimes you only need to match one character out of a set of allowed characters. For this purpose you create character classes in your pattern.

[] (square brackets) indicates a character class.
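
A rough sketch of character classes in action follows. The class and method names are made up, and the patterns are only illustrative. One detail worth seeing for yourself: a comma written inside the brackets is treated as one of the allowed characters, not as a separator.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for character classes
class CharClassDemo {
    // True when the entire input matches the regex
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("^[12][AB]$", "1A"));  // true
        System.out.println(wholeMatch("^[12][AB]$", "2B"));  // true
        System.out.println(wholeMatch("^[12][AB]$", "3A"));  // false: 3 is not in [12]
        System.out.println(wholeMatch("^[0-9]a$", "7a"));    // true
        System.out.println(wholeMatch("^[a-z]{3}$", "abc")); // true: any three lowercase letters
        System.out.println(wholeMatch("^[a-z]{3}$", "ABC")); // false: uppercase
        System.out.println(wholeMatch("^[1,2]$", ","));      // true: the comma is a literal member
    }
}
```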

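For example:
"^[12][AB]$" will match "1A", "2A", "1B" and "2B". (No commas are needed between the characters; a comma inside the brackets would itself be treated as an allowed character.)
"^[0-9]a$" will match "0a", "1a" .. "9a"
"^[a-z]{3}$" will match any three lowercase letters: "aaa", "abc" .. "zzz"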

It is important to understand the significance of ^ and $: they help make the pattern more specific. For example, when searching inside a string, "a{3}" will match "aaa", "aaar", "baaa", "baaat", or even "ts dfhaaatsdf sdf ..". If we modify the pattern to "a{3}$", it will now only match strings like "aaa" and "baaa". If we modify the pattern to "^a{3}", it will now only match strings like "aaa" and "aaar". To get the pattern to match "aaa" only, we need to modify it to "^a{3}$".
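
In Java the difference shows up when you search with find(), which looks for the pattern anywhere in the string (matches(), by contrast, always requires the whole string to match). Here is a small sketch of the four cases above; AnchorDemo and found are names made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the ^ and $ anchors
class AnchorDemo {
    // True when the pattern occurs somewhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("a{3}", "baaat"));   // true: found in the middle
        System.out.println(found("a{3}$", "baaa"));   // true: ends with aaa
        System.out.println(found("a{3}$", "aaar"));   // false: does not end with aaa
        System.out.println(found("^a{3}", "aaar"));   // true: starts with aaa
        System.out.println(found("^a{3}", "baaa"));   // false: does not start with aaa
        System.out.println(found("^a{3}$", "aaa"));   // true
        System.out.println(found("^a{3}$", "baaat")); // false
    }
}
```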

There is a lot more you can do with regex. But these basics will help you in most cases.

Tony

Posted: Tue Aug 26, 2003 5:57 pm   Post subject: (No subject)

So is this thing only for input validation, or are there other uses for string description?
octopi

Posted: Tue Aug 26, 2003 9:08 pm   Post subject: (No subject)

In Perl and PHP regexes (I am pretty sure Java got theirs from Perl, as did PHP) you can put something in ()'s; this will cause the language to put whatever is matched inside the ()'s into a variable. Usually $1, $2, etc., depending on how many brackets you use.
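
Java has the same $1, $2 notation, but only inside replacement strings (for example in String.replaceAll); elsewhere you call group(1) on the Matcher. A minimal sketch, with a made-up class name and a bracket format chosen purely for illustration:

```java
// Hypothetical demo class: $1 in a Java replacement string refers to group 1
class BackrefDemo {
    // Replaces "Postal Code: X" with "[X]", where X is the captured postal code
    static String bracketPostalCode(String text) {
        return text.replaceAll("Postal Code: (\\D\\d\\D\\s\\d\\D\\d)", "[$1]");
    }

    public static void main(String[] args) {
        System.out.println(bracketPostalCode("Postal Code: A1B 2C3")); // [A1B 2C3]
    }
}
```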
rizzix

Posted: Wed Aug 27, 2003 10:46 am   Post subject: (No subject)

Well, technically you can only match patterns with regex. But as octopi said, you can group parts of the pattern and extract the part of a string (like substring on steroids) that matches that group.

For example: "Postal Code: A1B 2C3". Imagine this string in a web page or something that you have downloaded. Now that web page has a lot more than a postal code in it, but you only want the postal code. This is very difficult to do with just the basic string manipulation methods. If you use regex you can extract the postal code without much difficulty. All you need to do is create a regex like this: "Postal Code: (\D\d\D\s\d\D\d)"

Notice I grouped the postal code part, because I want to extract that part.

To extract that part (once a Matcher m for this pattern has found a match with m.find()), I do this:
 code: String pcode = m.group(1);

Now I have only the postal code, from all that gibberish.
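
Put together, the extraction might look something like the sketch below. The page text is invented for the example, and ExtractDemo and extractPostalCode are hypothetical names; find() is used instead of matches() because the postal code is buried inside a longer string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: pulling a postal code out of a larger piece of text
class ExtractDemo {
    // Returns the first postal code found in the text, or null if there is none
    static String extractPostalCode(String page) {
        Matcher m = Pattern.compile("Postal Code: (\\D\\d\\D\\s\\d\\D\\d)").matcher(page);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up page content, standing in for a downloaded web page
        String page = "<p>Contact us</p><p>Postal Code: A1B 2C3</p><p>Thanks</p>";
        System.out.println(extractPostalCode(page)); // A1B 2C3
    }
}
```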

Of course this is just an example; there are a lot of better, more useful things you can do with regex. All you need to know is how to create patterns and extract data from matching strings by grouping the part you want.