Article 16102: Byte Order Marker

Article Preview

FEATURE

Byte Order Marker

How to Implement a Byte Order Marker (BOM) with Xojo

Issue: 16.1 (January/February 2018)
Author: Eugene Dakin
Author Bio: Eugene works as a Senior Oilfield Technical Specialist. He has university degrees in the disciplines of Engineering, Chemistry, Biology, Business, and a Ph.D. in Chemical Engineering. He is the author of dozens of books on Xojo available on the xdevlibrary.com website.
Article Length (in bytes): 11,481
Starting Page Number: 12
Article Number: 16102
Resource File(s): project16102.zip (Updated: 2018-01-01 22:32:50)

Related Link(s): None

Excerpt of article text...

Data in UTF form can be confusing, and adding endianness can make it overwhelming. I have great news: the Byte Order Marker can help remove this confusion when opening or receiving a file.
A byte order mark (BOM) is the hexadecimal value FE FF, placed at the beginning of a file or data stream, which a program can use to automatically determine the encoding of the data. Programs commonly handle text in many human languages, and characters outside the English ASCII range are represented using different encodings. The Byte Order Mark should be invisible to the user, and programs should automatically read it and decode the text appropriately.
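As a minimal sketch of a BOM staying invisible to the user, the snippet below builds two byte streams by hand and lets Python's standard `utf-16` codec choose the byte order from the BOM. Python is used here only for illustration (the article itself targets Xojo), and the variable names are hypothetical.

```python
import codecs

# Build the same text in both byte orders, each prefixed with its BOM.
big_endian_stream = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
little_endian_stream = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")

# The generic "utf-16" codec reads the BOM, picks the byte order,
# and removes the BOM from the decoded text.
text_from_be = big_endian_stream.decode("utf-16")
text_from_le = little_endian_stream.decode("utf-16")

# Naming the byte order explicitly leaves the BOM in the text as U+FEFF.
text_with_bom = big_endian_stream.decode("utf-16-be")

print(text_from_be, text_from_le, repr(text_with_bom))
```

Both streams decode to the same two letters even though their bytes differ, which is the behavior a user should see. The last line shows what happens when a program ignores the BOM: the marker leaks into the text as an extra character.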
There is an issue with just writing the text as UTF16LE, which means Unicode Transformation Format in 16-bit blocks in Little Endian format: nothing in the data itself tells the reader which format was used. UTF is the way that characters are converted to numbers and back to characters again by the computer.
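The conversion from characters to numbers and back can be sketched in a few lines. This is an illustrative Python example rather than Xojo code, and the names are hypothetical. It assumes the reader already knows the data is UTF-16LE, since this codec writes no marker.

```python
text = "Hi"

# Characters to numbers: each character has a Unicode code point.
code_points = [ord(ch) for ch in text]

# Numbers to bytes: UTF-16LE stores each of these code points in 16 bits,
# least significant byte first. This codec does not write a BOM.
raw = text.encode("utf-16-le")
raw_hex = " ".join(f"{b:02X}" for b in raw)

# Bytes back to characters: this only works because the encoding is known.
round_trip = raw.decode("utf-16-le")

print(code_points)  # [72, 105]
print(raw_hex)      # 48 00 69 00
print(round_trip)   # Hi
```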
In the early days of computers, most text was written in English, which required only about 128 characters (ASCII) to cover capitals, small letters, digits, and punctuation. As other languages started to appear on the internet, many more characters were quickly needed than just those for English. The character set was expanded with Unicode, stored in 16-bit units as UTF-16. When even more unique characters were needed (for example, the many characters of written Chinese), UTF-32 was created, which stores each character in 32 bits.
Another issue was that not all computers stored information the same way. Intel processors stored data in Little Endian (LE) format (least significant byte first), while older Mac computers stored data in Big Endian (BE) format (most significant byte first), and the byte order was added onto the end of the UTF name, as in UTF-16LE and UTF-16BE.
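To make the difference concrete, the following Python sketch (for illustration only; the names are hypothetical) encodes the letter "A" in both byte orders and then deliberately reads the Little Endian bytes as Big Endian.

```python
letter = "A"  # code point U+0041

le_bytes = letter.encode("utf-16-le")  # least significant byte first
be_bytes = letter.encode("utf-16-be")  # most significant byte first

# Reading LE bytes with the BE rule swaps the two bytes of the value,
# so U+0041 is misread as U+4100, a completely different character.
misread = le_bytes.decode("utf-16-be")

print(le_bytes.hex())      # 4100
print(be_bytes.hex())      # 0041
print(hex(ord(misread)))   # 0x4100
```

A wrong guess about byte order does not raise an error here. It quietly produces the wrong text, which is why a reliable marker is useful.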
With all of these different format types, there needed to be a way to detect the format of a text document or HTML page that was sent over the internet. This is why the Byte Order Mark (BOM) was created. When the hexadecimal value &hFEFF is written at the start of the data in a given encoding, the resulting bytes change depending on the UTF and Endian type. The following table shows the values of &hFEFF when the first 32 bits are read by the computer.
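A minimal Python sketch (not the author's Xojo code) shows how these byte patterns arise and how a reader might use them. It encodes the character U+FEFF with each codec to obtain the BOM bytes, then inspects the first 32 bits of some data. `BOM_TABLE` and `detect_bom` are hypothetical names, and UTF-8 is included as an assumption about which encodings are of interest.

```python
BOM_CHARACTER = "\ufeff"

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE
# BOM (FF FE), so the 32-bit forms are checked first.
ENCODINGS = ["utf-32-be", "utf-32-le", "utf-8", "utf-16-be", "utf-16-le"]
BOM_TABLE = [(name, BOM_CHARACTER.encode(name)) for name in ENCODINGS]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    head = data[:4]  # the first 32 bits of the data
    for name, bom in BOM_TABLE:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0


# Print the bytes that U+FEFF becomes in each encoding.
for name, bom in BOM_TABLE:
    hex_bytes = " ".join(f"{b:02X}" for b in bom)
    print(f"{name:10} {hex_bytes}")

# Example: detect the encoding, skip the BOM, and decode the rest.
sample = b"\xff\xfeH\x00i\x00"
encoding, bom_length = detect_bom(sample)
decoded = sample[bom_length:].decode(encoding)
print(encoding, decoded)  # utf-16-le Hi
```

One design caveat: the bytes FF FE 00 00 could also be a UTF-16LE BOM followed by a null character. This sketch treats them as UTF-32LE, which is the usual convention but remains a heuristic.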

...End of Excerpt. Please purchase the magazine to read the full article.