Article Preview


Issue 12.4 ('Game Center')

COLUMN

Lookie Here, Lookie There

Using Regular Expression Lookarounds

Issue: 12.4 (July/August 2014)
Author: Kem Tekinay
Author Bio: Kem Tekinay is a Macintosh consultant and programmer who started with Xojo when it was still REALbasic. He is the author of RegExRX (http://www.mactechnologies.com/index.php:i?page=downloads#regexrx), the popular regular expression editor for Mac and Windows.
Article Length (in bytes): 10,377
Starting Page Number: 79
Article Number: 12415
Related Web Link(s):

http://www.mactechnologies.com/index.php

Excerpt of article text...

The concept behind regular expressions is actually pretty simple, even if the language itself can be a bit dense. A pattern is a series of tokens, each representing one or more characters in your text, and if those tokens match something, you get a result that includes everything that was matched. Easy, right?

If that were all there was to it, regular expressions would be easy to use and easy to explain (well, easier, at least), but limited in usefulness. See, there are times when the same text will or won't match depending on what's around it. For example, suppose you wanted to match cat, but only if it came directly after the word female. Using subgroups (covered last time) can help, but there is another way: lookarounds.
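To make the cat-after-female example concrete, here is a minimal sketch contrasting the two approaches. It uses Python's re module rather than Xojo, on the assumption that the lookbehind syntax `(?<=...)` carries over to other PCRE-style engines; the sample text and variable names are made up for illustration.

```python
import re

text = "a male cat, a female cat, and a female dog"

# Approach 1: a subgroup. The overall match includes "female ",
# so the word we actually want has to be pulled out of group 1.
with_group = re.search(r"female (cat)", text)
print(with_group.group(0))  # the whole match, prefix included
print(with_group.group(1))  # only the subgroup

# Approach 2: a lookbehind. "female " must sit directly before "cat",
# but it is checked without becoming part of the match.
with_lookbehind = re.search(r"(?<=female )cat", text)
print(with_lookbehind.group(0))  # the whole match is just the word we wanted
print(with_lookbehind.start())   # the match begins at "cat", not at "female"
```

With the subgroup, the match begins at female and the wanted word has to be extracted afterwards. With the lookbehind, the match is the wanted word and nothing else.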

Pointing The Way

Lookarounds let the regex engine examine surrounding text without including it in the match, but to understand them, you first have to know what's going on internally.
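A small sketch of what "without including it in the match" means in practice, again in Python's re module as a stand-in for any PCRE-style engine. It compares an ordinary pattern with a lookahead, `(?=...)`; the sample text is invented for illustration.

```python
import re

text = "catfish catnip cat"

# Ordinary pattern: "nip" is consumed and becomes part of the match.
plain = re.search(r"catnip", text)

# Lookahead: the engine peeks ahead to confirm "nip" is there,
# but the match itself stops right after "cat".
peek = re.search(r"cat(?=nip)", text)

print(plain.group(0), plain.span())
print(peek.group(0), peek.span())
```

Both searches land on the same starting position, but only the first one includes nip in its result.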

When you create a pattern, you're telling the engine to use each token to examine your text one character at a time. If there is no match, the engine moves on to the next character, but if there is a match, it takes note and advances an internal pointer. For every subsequent character that matches your pattern, the pointer is advanced again and again until the engine either runs out of tokens, meaning the complete match has been found, or the match fails. In the latter case, it backtracks the pointer as far as it can (as defined by your pattern) and tries again. At each step, that pointer is advanced or rewound so the engine can keep track of the start of the match and all the text that should be included.

Imagine you were doing this manually. You would open your text in a word processor and position your cursor at the beginning of the document. If the first character doesn't fit your criteria, you'd press the right arrow key to advance the cursor until you got to a character that does fit. You'd note that starting position somewhere, then press the right arrow again, examining each character in turn. Eventually you would find the text you were looking for, or you'd start pressing the left arrow until you got back to a point where you could start again. The regex engine is doing essentially the same thing, keeping track of its internal pointer and start position.
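The cursor-and-arrow-keys procedure can be sketched as a toy matcher. This is not how Xojo's engine (or any production engine) is implemented; it is a hypothetical illustration that understands only literal characters and `*` (zero or more of the preceding character), and the function names `match_here` and `search` are invented for this example.

```python
def match_here(pattern, text, pos):
    """Try to match pattern starting at text[pos].

    Returns the position just past the match, or None if it fails.
    """
    if not pattern:
        return pos  # ran out of tokens: the complete match has been found
    token, rest = pattern[0], pattern[1:]
    if rest[:1] == "*":
        # "Right arrow": advance the pointer as far as the token allows...
        end = pos
        while end < len(text) and text[end] == token:
            end += 1
        # ..."left arrow": back up one character at a time until the
        # remainder of the pattern fits, or there is nowhere left to go.
        while end >= pos:
            result = match_here(rest[1:], text, end)
            if result is not None:
                return result
            end -= 1
        return None
    if pos < len(text) and text[pos] == token:
        return match_here(rest, text, pos + 1)  # token matched, advance
    return None  # the match fails at this position


def search(pattern, text):
    """Return (start, end) of the first match, or None."""
    # Try every starting position in turn, noting where the match began.
    for start in range(len(text) + 1):
        end = match_here(pattern, text, start)
        if end is not None:
            return start, end
    return None


print(search("cat", "a female cat"))
print(search("a*ab", "xaaab"))  # a* grabs "aaa", then gives one back for "ab"
print(search("dog", "a female cat"))
```

In the second call, `a*` first takes all three a characters, the following `a` token then has nothing to match, and the matcher backs up one character before the rest of the pattern succeeds.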

...End of Excerpt.