Computer Science Canada

[Regex-tut] Non-Greedy Matches

Author:  wtd [ Sun Nov 14, 2004 12:03 am ]

Recap

In the previous installments, dealing with negative sets, I used:

code:
/ ( [ ' " ] ) ( [^ \1 ]* ) \1 /x


To match the contents of a quoted string, where those contents didn't include the quote character itself. I used the above rather than:

code:
/ ( [ ' " ] ) ( .* ) \1 /x


Because:

code:
.*


Matches any character zero or more times, and would have matched the quote as well.

Why was that necessary?

After all, I had specified that the string to be matched should end with a quote. The match should have been complete when it found a quote to match the first one.

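But it didn't stop there, because * is "greedy" (+ behaves the same way): it matches as much as it can, and gives characters back only when the rest of the pattern would otherwise fail. If there's another matching quote later in the string, it'll fly right past the one where we would have expected it to stop.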
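
To see the greedy behavior in action, here is a minimal sketch in Python (my choice of language for illustration; the sample text is made up). The set is written without inner spaces because Python's re.VERBOSE mode, its rough equivalent of /x, keeps whitespace that appears inside [...].

```python
import re

# Greedy version of the quoted-string pattern.
# re.VERBOSE ignores the spaces between tokens, much like /x.
GREEDY = re.compile(r"""( ['"] ) ( .* ) \1""", re.VERBOSE)

text = 'say "hello" and "bye"'  # hypothetical sample input
match = GREEDY.search(text)

print(match.group(1))  # the opening quote character: "
print(match.group(2))  # hello" and "bye
```

The captured contents run all the way to the last quote in the string, not the first one after the opening quote.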

A quick fix

To fix this, we have two options. We can specify that the string being matched cannot contain a matching quote, which we did, quite successfully, but at the cost of making the regex more complex. Or we can simply change the behavior of the * and make it "non-greedy".
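
As a rough sketch of the first option in Python, here is the negated set with the quote character fixed to a double quote. That simplification is my assumption for the example: in Python's re module a numeric escape such as \1 inside [...] is read as a character code rather than as a backreference, so the set has to name the quote literally.

```python
import re

# Contents may be anything except the double quote itself.
# Assumption: only double-quoted strings are of interest here.
NEGATED = re.compile(r'" ( [^"]* ) "', re.VERBOSE)

text = 'say "hello" and "bye"'  # hypothetical sample input
print(NEGATED.search(text).group(1))  # hello
print(NEGATED.findall(text))          # ['hello', 'bye']
```

It works, but the pattern now has to spell out which character is excluded.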

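Following the * (or +) with a question mark will do the trick: the quantifier then matches as little as it can, so the match ends at the first quote that matches the opening one. In keeping with regex tradition, it's very short.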

code:
/ ( [ ' " ] ) ( .*? ) \1 /x
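
A small Python sketch of the non-greedy version, again with made-up sample text and the set written compactly for re.VERBOSE:

```python
import re

# Same pattern as the greedy one, with ? added after the *.
NON_GREEDY = re.compile(r"""( ['"] ) ( .*? ) \1""", re.VERBOSE)

text = 'say "hello" and "bye"'  # hypothetical sample input
contents = [m.group(2) for m in NON_GREEDY.finditer(text)]
print(contents)  # ['hello', 'bye']

# The backreference still insists on the same kind of quote,
# so a different quote character inside the string is left alone.
mixed = NON_GREEDY.search("""'it "works" fine'""")
print(mixed.group(2))  # it "works" fine
```

Each quoted string is now matched on its own, and the pattern never has to spell out which character the contents must avoid.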

