Issue 2.4

COLUMN

Data Validation

Using checksums to verify data integrity

Issue: 2.4 (March/April 2004)
Author: Thomas Reed
Author Bio: Thomas Reed has been programming as a hobbyist for more than 20 years, and fell in love with the Mac in 1984.
Article Length (in bytes): 5,625
Starting Page Number: 34
Article Number: 2415

Excerpt of article text...

Important data can be damaged in a variety of ways. A crash or faulty hardware can corrupt data on a storage medium, such as a hard drive. Noise in telecommunications hardware can alter data streams. A data entry operator might make a mistake in transcription. Users might even do harm by manually modifying data that is not meant to be edited directly. Because of this, it is often desirable to validate data before using it. In this column, we'll explore different kinds of checksums.

A checksum is extra data transmitted or stored with the original data. The checksum is designed so that a calculation performed on the data produces a value that can be compared with the checksum. If the calculation result and the checksum do not match, then the data has been altered and is invalid.

The simplest example of a checksum is the parity bit, something anyone who used a modem in years past will have heard of. The parity bit is a ninth bit associated with every byte. There are two types of parity: even and odd. With even parity, the value of the parity bit is chosen so that, out of all nine bits, an even number have the value 1. For example, the byte 1111 0000 already contains four 1s, so its parity bit is 0, while the byte 1111 0001 contains five 1s, so its parity bit is 1. Odd parity reverses these rules, requiring the total number of bits set to 1 to be odd.
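To make the even/odd rule concrete, here is a minimal sketch. The excerpt shows no code, so this is written in Python rather than taken from the author, and the function name parity_bit is hypothetical. It counts the 1 bits in a byte and picks the ninth bit so the total comes out even (or odd):

```python
def parity_bit(byte: int, odd: bool = False) -> int:
    """Return the parity bit for an 8-bit value (hypothetical helper).

    Even parity: the bit is chosen so the total count of 1s across all
    nine bits is even. Odd parity: the total count is odd.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value from 0 to 255")
    ones = bin(byte).count("1")   # number of 1 bits in the byte
    even_bit = ones % 2           # 1 when the byte alone has an odd count
    return even_bit ^ 1 if odd else even_bit


# The two examples from the text, using even parity, then odd parity.
print(parity_bit(0b1111_0000))            # 0
print(parity_bit(0b1111_0001))            # 1
print(parity_bit(0b1111_0000, odd=True))  # 1
```

The first two results match the even-parity examples above; the third shows odd parity flipping the bit for the same byte.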

Parity is very fast to calculate. Unfortunately, that is probably its only advantage. One major problem is that parity detects only errors that change an odd number of bits in a byte, so it catches only about half of random errors. For example, if the byte 1111 0000 is corrupted into 1110 1000, the algorithm would not detect the transposition of two of the bits, because both bytes contain four 1s. Parity also adds one extra bit for every 8, a 12.5% increase in the amount of data (the parity bits make up about 11% of the bits actually transmitted).
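These weaknesses can be checked directly. This Python sketch (the helper name is hypothetical) confirms that the two-bit transposition from the example leaves the parity unchanged, then counts how many of the 255 possible nonzero bit-flip patterns for a single byte parity would miss, assuming every pattern is equally likely. It also works out the overhead of one extra bit per eight:

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit (hypothetical helper): 1 if the byte has an odd number of 1s."""
    return bin(byte).count("1") % 2


original = 0b1111_0000
corrupted = 0b1110_1000   # two bits transposed, as in the text
print(parity_bit(original) == parity_bit(corrupted))  # True: error missed

# Assumption: every nonzero XOR mask is an equally likely corruption pattern.
# Parity misses exactly the masks that flip an even number of bits.
missed = sum(1 for mask in range(1, 256) if bin(mask).count("1") % 2 == 0)
print(missed, "of 255 patterns missed")   # 127 of 255 patterns missed

# Overhead: one parity bit per eight data bits.
print(f"{1 / 8:.1%} more data")                       # 12.5% more data
print(f"{1 / 9:.1%} of transmitted bits are parity")  # 11.1% of transmitted bits are parity
```

Because flipping any even number of bits leaves the count of 1s just as odd or even as before, 127 of the 255 patterns slip through, which is where the "about half" figure comes from.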

Another type of checksum involves adding all the bytes in a string of data and throwing away any bits that overflow, which leaves an 8-bit result; treated as a signed byte, this is an integer between -128 and 127. The checksum can then be stored as a byte equal to the negative of this number, so that adding all the bytes, including the checksum byte (again discarding overflow), yields 0. If this calculation produces a non-zero value, the string is not valid. This type of checksum is fast to calculate and verify, and because it needs only a single byte for the whole string, it requires less extra data than parity for any string longer than eight bytes.
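Here is a minimal Python sketch of the additive checksum just described; the function names are hypothetical, not taken from the article. It sums the bytes, keeps only the low 8 bits (discarding overflow), and stores the two's-complement negative of that result as the checksum byte, so a valid string plus its checksum sums to 0:

```python
def checksum_byte(data: bytes) -> int:
    """Hypothetical helper: the byte that makes the 8-bit sum of data + checksum zero."""
    total = sum(data) & 0xFF   # add all bytes, discard bits that overflow
    return (-total) & 0xFF     # two's-complement negation, kept as a byte


def as_signed(byte: int) -> int:
    """Interpret an 8-bit value as a signed integer from -128 to 127."""
    return byte - 256 if byte >= 128 else byte


def is_valid(data_with_checksum: bytes) -> bool:
    """A string is valid when all its bytes, checksum included, sum to 0 (mod 256)."""
    return sum(data_with_checksum) & 0xFF == 0


message = b"Hello"
check = checksum_byte(message)
packet = message + bytes([check])

print(as_signed(sum(message) & 0xFF), as_signed(check))  # -12 12
print(is_valid(packet))                                  # True
print(is_valid(b"Jello" + bytes([check])))               # False
```

For the five bytes of "Hello" the 8-bit sum is 244, which reads as -12 when treated as a signed byte, so the checksum byte is 12. Changing a single byte, as in "Jello", makes the verification sum non-zero.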

This method is also significantly more reliable at verifying data than the parity bit. Parity misses any error that flips an even number of bits in a byte: two flipped bits in the same byte always cancel out, and a byte damaged at random escapes detection about 50% of the time. However, when summing a string of bytes, if two bytes are corrupted at random, there is only about a 1-in-256 (roughly 0.4%) chance that the error in one byte will cancel the error in the other so that the checksum fails to catch it.
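The 1-in-256 figure can be checked by enumeration. This sketch assumes a corruption model the article does not spell out: each corrupted byte is replaced by a different value chosen uniformly at random, so its error, taken mod 256, is equally likely to be any of 1 through 255. The corruption goes unnoticed only when the two errors cancel:

```python
from fractions import Fraction

# Assumed model (not from the article): each corrupted byte takes a
# different value chosen uniformly at random, so its error
# d = (new - old) mod 256 is equally likely to be any of 1..255.
errors = range(1, 256)

# The 8-bit sum is unchanged exactly when the two errors cancel mod 256.
undetected = sum(1 for d1 in errors for d2 in errors if (d1 + d2) % 256 == 0)
total = len(errors) ** 2

p = Fraction(undetected, total)
print(undetected, total, p, f"{float(p):.2%}")  # 255 65025 1/255 0.39%
```

Under this model exactly one of the 255 possible second errors cancels any given first error, giving 1/255, or about 0.39%, essentially the column's 1-in-256 (0.4%) figure.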
