Computer Science Canada

[Regex-tut] Capturing Groups: Inside the Regex

Author:  wtd [ Sat Nov 13, 2004 10:50 pm ]

Recap

Groups surrounded by parentheses remember what they matched and store it in global variables such as $1 and $2.
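As a quick refresher, here is a small sketch of that behaviour. The helper name date_parts and the date format are made up for illustration; they are not from the earlier tutorials.

```ruby
# Hypothetical helper for illustration: pulls the three numbers
# out of a string shaped like "2004-11-13".
def date_parts(text)
  # With /x the spaces in the pattern are ignored; each pair of
  # parentheses is a capturing group.
  if text =~ / (\d+) - (\d+) - (\d+) /x
    # After a successful match, $1, $2 and $3 hold the groups.
    [$1, $2, $3]
  end
end

p date_parts("2004-11-13")   # prints ["2004", "11", "13"]
p date_parts("no date here") # prints nil
```

Note that $1 and friends are only filled in once the whole match has finished.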

This is great, but...

What if I want to figure out what a group matched before I'm done matching?

Variables like $1, $2, $3, etc. reflect whatever the expression actually matched, but they are only available once the match is finished. It shouldn't be too surprising that someone figured out a way to do something similar inside the regex itself.

Let's say we want to match a string between two quotes. We want to allow either single or double quotes, but the key is that they have to match.

So, we create an expression to match a single or double quote, and then a bunch of characters.

Note: the "." character means essentially "any single character" (by default, anything except a newline).

code:
/ ['"] ( .* ) /x


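But, how do I figure out which of the two quotes was matched? Simple, put parentheses around that too. Keep the spaces out of the square brackets, though: even with the x flag, whitespace inside a character class is matched literally.

code:
/ ( ['"] ) ( .* ) /x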


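Now, I need to match that same character at the end of the expression. A backslash followed by the number of the group (counting opening parentheses from the left) will match whatever that group matched, so \1 means "the same quote that group 1 captured".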

code:
/ ( ['"] ) ( .* ) \1 /x
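To see the backreference doing its job, here is a minimal sketch that tries the pattern on matched and mismatched quotes. The constant QUOTED and the method quoted_text are names invented for this example. The character class is written without inner spaces, since Ruby treats whitespace inside square brackets as literal even with the x flag.

```ruby
# Hypothetical names for illustration: QUOTED and quoted_text.
# Group 1 captures the opening quote, group 2 the contents,
# and \1 demands the same quote character again at the end.
QUOTED = / ( ['"] ) ( .* ) \1 /x

def quoted_text(text)
  if text =~ QUOTED
    $2
  end
end

p quoted_text(%q{'hello'})          # prints "hello"
p quoted_text(%q{"hello"})          # prints "hello"
p quoted_text(%q{"hello'})          # prints nil, the quotes differ
p quoted_text(%q{'it said "hi"'})   # prints "it said \"hi\""
```

The last case shows why the backreference matters: the double quotes inside are left alone because the string was opened with a single quote.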


And putting it all together:

code:
input = gets.chomp

if input =~ / ( ['"] ) ( .* ) \1 /x
   puts "Matched the string: #{$2}."
end
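One thing worth knowing about this pattern: ".*" is greedy, so it grabs as much as it can and \1 ends up matching the last matching quote on the line, not the first. The sketch below shows that, along with one common variation (".*?", which is not part of the expression above) that stops at the nearest closing quote instead. The method names are made up for the example.

```ruby
# Hypothetical helpers for illustration.
# Greedy version: the expression from the tutorial.
def greedy_quoted(text)
  if text =~ / ( ['"] ) ( .* ) \1 /x
    $2
  end
end

# Variation (an assumption, not from the tutorial): ".*?" is the
# non-greedy form, so it takes as little as possible before \1.
def lazy_quoted(text)
  if text =~ / ( ['"] ) ( .*? ) \1 /x
    $2
  end
end

line = %q{"a" and "b"}
p greedy_quoted(line)  # prints "a\" and \"b"
p lazy_quoted(line)    # prints "a"
```

Which one you want depends on whether a line can hold more than one quoted string.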

