Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Pygments; need a regex to replace \b
btiffin




Posted: Tue Feb 03, 2009 9:51 pm   Post subject: Pygments; need a regex to replace \b

Hello,

I'm coding up a Pygments lexer for OpenCOBOL and need a solid replacement for the \b word-boundary regex operator. COBOL allows the hyphen, along with the underscore, in valid identifiers, so \b reports a boundary in the middle of a name.

I'm feeling lazy today, and don't feel like experimenting or thinking.
code:
(?<=(\w|[-]))   # start of word?
(?=(\w|[-]))   # end of word?
These aren't right. I need something that doesn't consume any characters but detects transitions, à la \b.
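
For anyone following along, here is a minimal sketch of the problem being described. The keyword MOVE and the sample line are made up for illustration; they are not taken from the lexer.

```python
import re

# Hypothetical keyword and sample line, chosen only for illustration.
keyword = re.compile(r'\bMOVE\b', re.IGNORECASE)
line = "MOVE MOVE-COUNT TO WS-MOVE"

# \b sees a boundary between a letter and '-', so the keyword is also
# found inside the hyphenated identifiers MOVE-COUNT and WS-MOVE.
starts = [m.start() for m in keyword.finditer(line)]
print(starts)  # [0, 5, 22]
```

Only the match at offset 0 is a real keyword; the other two are pieces of identifiers.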

Any gurus in the crowd?
Cheers
OneOffDriveByPoster




Posted: Tue Feb 03, 2009 10:55 pm   Post subject: Re: Pygments; need a regex to replace \b

I think it may be a bad idea, but is it possible to modify the LOCALE so that it is like the C locale but with '-' in the character class you want?
btiffin




Posted: Thu Feb 12, 2009 10:43 pm   Post subject: Re: Pygments; need a regex to replace \b

Update: Being lazy just didn't work out. Ignoring the problem eventually got to me.

(and thanks for the one option OneOffDriveByPoster)

Pygments lexers usually use
code:
r'\b(keyword1|keyword2|otherword)\s*\b'

to scan for special highlight words. I ended up with
code:
r'(^|(?<=[^0-9a-z_\-]))(keyword1|keyword2|otherword)\s*($|(?=[^0-9a-z_\-]))'

for a zero-length word-boundary match (with trailing-space chewing) that treats the hyphen as part of a word.
re.IGNORECASE | re.MULTILINE are in effect.
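
A quick way to check that pattern outside Pygments might look like the sketch below. The three-word keyword list and the sample line are assumptions for illustration; the real lexer's list is much longer.

```python
import re

# Hypothetical stand-ins for keyword1|keyword2|otherword.
WORDS = r'(move|perform|to)'
# Any character that cannot be part of a COBOL-style identifier.
NON_IDENT = r'[^0-9a-z_\-]'

pattern = re.compile(
    r'(^|(?<=' + NON_IDENT + r'))' + WORDS + r'\s*($|(?=' + NON_IDENT + r'))',
    re.IGNORECASE | re.MULTILINE,
)

line = "MOVE MOVE-COUNT TO WS-MOVE"
# group(2) is the keyword itself; group(1) and group(3) are the edge guards.
hits = [m.group(2) for m in pattern.finditer(line)]
print(hits)  # ['MOVE', 'TO']
```

The pieces of MOVE-COUNT and WS-MOVE are skipped because a hyphen on either side fails the lookaround.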

I'm more than open to criticism or belittling betterments; as Python regex is not really my thing.

And to see the fruits of the labour, overtly promoting OpenCOBOL again; be warned ;)
http://opencobol.add1tocobol.com/#what-is-ocdoc

Cheers
Edit; added the opencobol FAQ link
btiffin




Posted: Sat Aug 25, 2012 9:25 am   Post subject: Re: Pygments; need a regex to replace \b

I finally got round to making a pull request for Pygments that adds the COBOL highlighting. Georg was helpful. Turns out adding dash to a word boundary is as easy as

code:

r"\b(?!-)"


Yayy for smart people that see trees AND forests. No need for the potentially (and needlessly) CPU-sucking (?<=...) lookbehind operator.
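
A small sketch of how that behaves, again with a made-up keyword and sample line. The snippet above guards the trailing edge of a word; the mirrored guard on the leading edge shown here, (?<!-)\b, is an assumption for comparison and not something from the pull request.

```python
import re

line = "MOVE MOVE-COUNT TO WS-MOVE"

# Trailing edge only: a word boundary that is not followed by a hyphen.
trailing = re.compile(r'\bMOVE\b(?!-)', re.IGNORECASE)
trailing_hits = [m.start() for m in trailing.finditer(line)]
print(trailing_hits)  # [0, 22]  (MOVE-COUNT rejected, WS-MOVE still hit)

# Assumed mirror image for the leading edge: no hyphen just before.
both = re.compile(r'(?<!-)\bMOVE\b(?!-)', re.IGNORECASE)
both_hits = [m.start() for m in both.finditer(line)]
print(both_hits)  # [0]
```

So \b(?!-) on its own stops a keyword from matching the front of a hyphenated name; whether the leading edge needs its own guard depends on how the rest of the lexer's rules are ordered.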

Cheers