
-----------------------------------
wtd
Sun Nov 14, 2004 12:03 am

[Regex-tut] Non-Greedy Matches
-----------------------------------
Recap

In the previous installments, dealing with negative sets, I used:

/ ( [ ' " ] ) ( [^ \1 ]* ) \1 /x

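to match the contents of a quoted string, where those contents didn't include the quote character itself. (One caveat: in several common engines, Perl and Python for example, \1 inside a character class is read as an octal escape rather than a backreference, so check how your engine treats it.) I used the above rather than: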

/ ( [ ' " ] ) ( .* ) \1 /x

Because:

.*

Matches any character zero or more times, and would have matched the quote as well.

Why was that necessary?

After all, I had specified that the string to be matched should end with a quote.  The match should have been complete when it found a quote to match the first one.

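But it didn't stop there, because * is "greedy" (+ behaves the same way): it matches as many characters as it can while still letting the rest of the pattern match. If there's another quote later in the string, it'll fly right past the one where we would have expected it to stop.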
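
To see the greedy behavior in action, here is a small sketch. It assumes Python's re module, with re.VERBOSE standing in for the /x flag; the sample sentence and variable names are made up for illustration.

```python
import re

# The greedy pattern from the text. re.VERBOSE lets us space it out
# and comment it, much like /x does.
greedy = re.compile(r"""
    (['"])   # group 1: the opening quote
    (.*)     # group 2: greedy, grabs as much as it can
    \1       # the same quote character again
""", re.VERBOSE)

text = 'say "hello" and "goodbye" now'  # made-up sample input
match = greedy.search(text)

# The match runs from the first quote to the LAST one, not the next one.
print(match.group(2))  # hello" and "goodbye
```

The .* first swallows the rest of the line, then backs up only as far as it must to find a closing quote, which is why it lands on the last one.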

A quick fix

To fix this, we have two options. We can specify that the string being matched cannot contain a matching quote, which we did, quite successfully, but at the cost of making the regex more complex. Or we can simply change the behavior of the * and make it "non-greedy".

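Following the * (or +) with a question mark will do the trick: the quantifier then matches as few characters as it can while still letting the rest of the pattern match. In keeping with regex tradition, it's very short.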

/ ( [ ' " ] ) ( .*? ) \1 /x
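
Here is the non-greedy version as a runnable sketch, again assuming Python's re module with re.VERBOSE in place of /x. The sample sentence and variable names are invented for the example.

```python
import re

# Same pattern as above, with .*? instead of .*
lazy = re.compile(r"""
    (['"])   # group 1: the opening quote
    (.*?)    # group 2: non-greedy, takes as little as it can
    \1       # the same quote character again
""", re.VERBOSE)

text = 'say "hello" and "goodbye" now'  # made-up sample input

# The first match now stops at the nearest closing quote.
first = lazy.search(text).group(2)
print(first)  # hello

# Scanning the whole string finds each quoted piece separately.
all_quoted = [m.group(2) for m in lazy.finditer(text)]
print(all_quoted)  # ['hello', 'goodbye']
```

Because the closing \1 still has to be the same character as the opening quote, an apostrophe inside a double-quoted string doesn't end the match early.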
