Special

Introducing the “Welcome to Xojo” Bundle!

New to Xojo and looking for guidance? We've put together a terrific bundle to welcome you! Xojo Bundle

This bundle includes six back issues of the magazine -- all of year 21 in printed book and digital formats -- plus a one-year subscription (beginning with 22.1) so you'll be learning all about Xojo for the next year. It's the perfect way to get started programming with Xojo. And you save as much as $35 over the non-bundle price!

This offer is only available for a limited time as supplies are limited, so hurry today and order this special bundle before the offer goes away!

Article Preview


Buy Now

Issue 2.1

COLUMN

Multiple String Searches

An example of a finite state machine

Issue: 2.1 (August/September 2003)
Author: Thomas Reed
Author Bio: Thomas Reed has been programming as a hobbyist for more than 20 years, and fell in love with the Mac in 1984.
Article Description: No description available.
Article Length (in bytes): 10,403
Starting Page Number: 30
Article Number: 2114
Resource File(s):

Download Icon 2114.zip Updated: 2013-03-10 14:51:11

Related Web Link(s):

http://home.earthlink.net/~thomasareed/realbasic/
http://www.monkeybreadsoftware.de/realbasic/plugins.html

Excerpt of article text...

A finite state machine is commonly used to analyze a "language": any system of codes used to represent some meaning. One obvious example of a language would be English, in which 26 different symbols (letters) are used to build words. Another example would be the tokenized representation of C source code, created as an intermediate step in compilation by a C compiler. An array of objects in a REALbasic program could also be considered a language.

The finite state machine analyzes a language by using a set of states and transitions. The machine begins in an initial state, and each symbol in the language being analyzed transitions the machine to a new state. Certain states are "accept" states, meaning the state represents some target condition. For example, when analyzing an English language sentence, a non-alphabetic character might put a finite state machine into an accept state which signifies a word end, and the machine would add that word to a list.

In this article, I will discuss an example of a finite state machine that searches a block of text in a single pass for any number of keywords. For example, suppose you want to search a lengthy table of entree prices at restaurants all over town for "cod", "tuna", and "trout". Figure 1 illustrates a state machine containing these keywords. The algorithm begins in the start state-- state zero. Each character in the search text will trigger a transition into another state. In the event it reaches one of the accept states (3, 7, and 11 in the figure), a match has been found.

The machine must recognize when multiple matches are possible for the same character. For example, when searching for keywords "the" and "bathe", a match for "bathe" also represents a match for "the". For this reason, searches will return the character position of the last matched character instead of the first. The client code must then decide which match is the desired one, and must determine the start position based on its length.

To implement this algorithm, we will use four tables (arrays), which we call the match table, goto table, fail table, and output table. These tables define all behaviors of the finite state machine. Each table will have an element for every possible state. (The maximum possible number of states is the sum of the length of each keyword.) Setting up these four tables is time consuming and, for a machine with many states, can consume significant resources. Further, these tables cannot be changed once they are initialized, so they will only search for the defined set of keywords. For this reason, this algorithm is best suited for searches in a very large block of text, or for searches that will be repeated many times on many blocks of text.

The match table defines which characters signal a transition from each state and the goto table defines the new state after the transition. Elements in the match table will contain one of three things: a character, a constant which indicates that multiple characters cause a transition from that state, or a constant indicating that there are no defined transitions from that state.

...End of Excerpt. Please purchase the magazine to read the full article.