Regular Expressions Overdrive
Issue: 1.1 (August/September 2002)
Author: Didier Barbas
Author Bio: Didier has been a dilettante programmer and linguist for more than 20 years. Unusual for a Frenchman, he speaks 11 languages, including Korean and PowerPC machine-language; he manages the Korean branch of a Dutch company that doesn't do banking, chemicals, or consumer products. Go figure!
Article Description: Advanced regular expressions.
Article Length (in bytes): 18,156
Starting Page Number: 46
Article Number: 1016
1016.zip Updated: 2013-03-11 19:07:55
Excerpt of article text...
This article assumes that you have already covered the basics of regular expressions (RegExes), and have at least read Matt Neuburg's article on page ## of this issue. We will focus here on techniques that will make your coding (and your life) easier. These techniques are answers to real-life problems, some of them my own, others raised in questions on the REALbasic discussion lists. I will also show that regular expressions are not always the right tool: some tasks need extra help, and for others RegExes are just not a good fit.
Just don't bother.
A while ago, one of the REALbasic discussion lists debated how to suppress extra spaces in a text. The pattern that comes to mind first for most people is [\t ]+, to be replaced with a single space. In the discussion, it was argued that the correct pattern should be [\t ][\t ]+, since RB's RegEx engine should start matching only when there are at least two tabs or spaces. It was noted, however, that the speed difference on average-sized texts was negligible, at least from the standpoint of a human being (applied to this article, which has few double spaces, [\t ][\t ]+ is six times faster than [\t ]+). On the other hand, all this discussion, while fascinating, was quite academic, since a) we're talking microseconds or milliseconds, not seconds, and b) another fellow had come up with an example using replaceAll, which was much faster. I tweaked it a little further and made it even faster by changing inStr to inStrB, and by adding a line of code to first remove odd numbers of spaces:
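The listing itself falls outside this excerpt. As a stand-in, here is a minimal sketch of the three approaches discussed above, written in Python rather than REALbasic. It is not the author's code: the function names are hypothetical, Python's str.replace stands in for replaceAll, and the sketch makes no attempt to reproduce the inStrB change or the odd-number trick.

```python
import re

# Pattern 1: any run of one or more tabs/spaces (a lone space matches too).
ONE_OR_MORE = re.compile(r"[\t ]+")

# Pattern 2: only runs of at least two tabs/spaces.
TWO_OR_MORE = re.compile(r"[\t ][\t ]+")


def squeeze_one_or_more(text: str) -> str:
    """Replace every run of tabs/spaces, even a single one, with one space."""
    return ONE_OR_MORE.sub(" ", text)


def squeeze_two_or_more(text: str) -> str:
    """Replace runs of two or more tabs/spaces with one space.

    A lone tab is left untouched, so the result can differ from pattern 1.
    """
    return TWO_OR_MORE.sub(" ", text)


def squeeze_by_replace(text: str) -> str:
    """No regex at all: halve runs of spaces until no double space is left.

    Assumption: this sketch handles spaces only and leaves tabs alone.
    """
    while "  " in text:
        text = text.replace("  ", " ")
    return text
```

Note that the two patterns are not strictly interchangeable: the first turns a lone tab into a space, while the second leaves it in place. To compare speeds on your own text, each function can be wrapped in Python's timeit; the results depend on the engine and the input, so the sixfold figure quoted above should not be expected to carry over.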
...End of Excerpt. Please purchase the magazine to read the full article.