Posted: Tue Feb 03, 2009 9:51 pm Post subject: Pygments; need a regex to replace \b
Hello,
I'm coding up a Pygments lexer for OpenCOBOL and need a solid replacement for the word boundary \b regex operator. COBOL allows hyphens, along with underscores, in valid identifiers, so \b sees a boundary in the middle of a name like WS-TOTAL.
I'm feeling lazy today, and don't feel like experimenting or thinking.
code:
(?<=(\w|[-])) # start of word?
(?=(\w|[-])) # end of word?
These aren't right. I need something that doesn't consume any characters but detects transitions, à la \b.
Any gurus in the crowd?
Cheers
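For reference, a minimal sketch of why the two attempts above fall short, using Python's re module (the engine Pygments runs on). The sample text and the variable names are made up for illustration. The positive lookarounds succeed anywhere a neighbouring character is an identifier character, including the middle of a word; pairing a negative lookaround with a positive one is one possible way to fire only at the edges.

```python
import re

text = "ADD WS-TOTAL"

# The two attempts from the post: they only assert that a neighbouring
# character IS an identifier character, so they also succeed mid-word.
behind_hits = [m.start() for m in re.finditer(r"(?<=(\w|[-]))", text)]
ahead_hits = [m.start() for m in re.finditer(r"(?=(\w|[-]))", text)]

# Hypothetical alternative: an edge is where one side is an identifier
# character and the other side is not. Both patterns are zero-width.
word_start = r"(?<![\w-])(?=[\w-])"
word_end = r"(?<=[\w-])(?![\w-])"

starts = [m.start() for m in re.finditer(word_start, text)]
ends = [m.start() for m in re.finditer(word_end, text)]

print(behind_hits)  # offsets inside words show up here as well
print(starts, ends)  # only the edges of ADD and WS-TOTAL
```

The hyphen inside WS-TOTAL produces no start or end, which is the transition behaviour asked for above.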
OneOffDriveByPoster
Posted: Tue Feb 03, 2009 10:55 pm Post subject: Re: Pygments; need a regex to replace \b
I think it may be a bad idea, but is it possible to modify the LOCALE so that it is like the C locale but with '-' in the character class you want?
btiffin
Posted: Thu Feb 12, 2009 10:43 pm Post subject: Re: Pygments; need a regex to replace \b
Update: Being lazy just didn't work out. Ignoring the problem eventually got to me.
(and thanks for the one option OneOffDriveByPoster)
Pygments lexers usually use
code:
r'\b(keyword1|keyword2|otherword)\s*\b'
to scan for special highlight words. I ended up with
Posted: Sat Aug 25, 2012 9:25 am Post subject: Re: Pygments; need a regex to replace \b
I finally got round to making a pull request for Pygments that adds the COBOL highlighting. Georg was helpful. Turns out adding dash to a word boundary is as easy as
code:
r"\b(?!-)"
Yayy for smart people that see trees AND forests. No need for the potentially (and needlessly) CPU-sucking (?<=...) lookbehind operator.
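A small sketch of how that guard behaves in Python's re module. The keyword list, the sample lines, and the way the guard is wrapped around the alternation are assumptions for illustration, not the pattern from the actual pull request.

```python
import re

# Hypothetical keyword subset, for illustration only.
KEYWORDS = r"(MOVE|ADD|DISPLAY)"

plain = re.compile(r"\b" + KEYWORDS + r"\b")
guarded = re.compile(r"\b(?!-)" + KEYWORDS + r"\b(?!-)")

line = "MOVE MOVE-COUNT TO ADD-TOTAL"

# Plain \b treats the hyphen as a boundary, so it also picks the keyword
# out of MOVE-COUNT and ADD-TOTAL.
plain_hits = plain.findall(line)

# The trailing \b(?!-) refuses a boundary that is followed by a hyphen.
guarded_hits = guarded.findall(line)

# The (?!-) guard only looks forward. A keyword that FOLLOWS a hyphen
# still matches unless a one-character (?<!-) is added in front.
trailing_only = guarded.findall("WS-ADD")
both_sides = re.compile(r"(?<!-)\b" + KEYWORDS + r"\b(?!-)").findall("WS-ADD")

print(plain_hits, guarded_hits, trailing_only, both_sides)
```

In this sketch the guard handles the keyword-then-hyphen case without any lookbehind. Whether the hyphen-then-keyword case matters depends on how the rest of the lexer tokenizes identifiers, so treat the (?<!-) variant as an option rather than a requirement.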