Special

Introducing the “Welcome to Xojo” Bundle!

New to Xojo and looking for guidance? We've put together a terrific bundle to welcome you! Xojo Bundle

This bundle includes six back issues of the magazine -- all of year 21 in printed book and digital formats -- plus a one-year subscription (beginning with 22.1) so you'll be learning all about Xojo for the next year. It's the perfect way to get started programming with Xojo. And you save as much as $35 over the non-bundle price!

This offer is only available for a limited time as supplies are limited, so hurry today and order this special bundle before the offer goes away!

Article Preview


Buy Now

Issue 3.6

COLUMN

Levenshtein Distance

Calculating the degree of difference between two strings

Issue: 3.6 (July/August 2005)
Author: Thomas Reed
Author Bio: Thomas Reed has been programming as a hobbyist for more than 20 years, and fell in love with the Mac in 1984.
Article Description: No description available.
Article Length (in bytes): 9,566
Starting Page Number: 36
Article Number: 3617
Related Link(s): None

Excerpt of article text...

Imagine that you are creating a spell checker for your application. At first glance, everything seems pretty straightforward. Later, you happen to be searching Google for recipes for grilled halibut, only you mistype it "halivut". Up pops a nice little message at the top of the results window reading "Did you mean halibut?" Hey, that would be a nice feature to integrate into your spell checker! "But wait!" you say, "How do I do that?"

The solution to this problem involves calculating what is often called the string or edit distance. In other words, you need to know the degree of difference between two strings. One popular algorithm for evaluating this difference is the Levenshtein Distance algorithm.

The Levenshtein Distance is not only useful for spell checkers, but has many other applications. For example, suppose you are designing a database and want to guard against duplicate entries. It would be useful to be able to display a warning when entering a record for "Smith, Martin" when the database already contains an entry for "Smith, Marty".

An example from the sciences involves analysis of DNA sequences. Researchers often need to find similar sequences. This is another task well-suited for a string distance algorithm. Other applications of the Levenshtein algorithm have included dialect analysis, translation, and speech synthesis.

Now that we've established that it is a useful and worthwhile algorithm, let's move to the next step. How does it actually work? As with many of the algorithms I have discussed in this column lately, determining the similarity or difference of two strings is a trivial problem for a human, but seems like a very difficult problem for a computer to solve. Fortunately, again following recent trends, the Levenshtein Distance algorithm is very easy to understand and implement.

The Levenshtein algorithm defines three operations that may be performed on each character to transform one string into another: insertion, deletion, and substitution. For example, to transform "he" into "the" requires only the insertion of a "t", and the reverse requires only deletion of the "t". A transformation of "horse" into "house" requires only a substitution of a "u" character for the "r".

...End of Excerpt. Please purchase the magazine to read the full article.