Explainer: Text formats

By: hoakley
28 November 2025 at 15:30

textutil, and its wrapper Textovert, convert between nine different formats, most of them in widespread use for documents that are largely based on text. This article explains a little about each of them, and its sequel tomorrow looks at how PDF differs. In each case, I give an example file size for a document containing the words
This is a test.
in a total of 15 characters.

Plain text

Plain text files in macOS are conventionally encoded using Unicode UTF-8, so the example requires just 15 bytes, in hex
54 68 69 73 20 69 73 20 61 20 74 65 73 74 2e. Of course that contains no font or layout information, just the raw content.
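Those figures are easy to check. This is a minimal sketch in Python, chosen here purely for illustration, that encodes the example text as UTF-8, counts the bytes, and lists them in hex.

```python
# Encode the example text as UTF-8 and inspect the result.
text = "This is a test."
data = text.encode("utf-8")

byte_count = len(data)       # number of bytes a plain text file would hold
hex_bytes = data.hex(" ")    # the same bytes, as space-separated hex pairs

print(byte_count)
print(hex_bytes)
```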

Rich Text (RTF)

This was introduced and its specification developed by Microsoft during the late 1980s and 90s, for cross-platform interchange, primarily between its own products. Support for this in Mac OS X came in Cocoa and its rich text editor TextEdit, inherited from NeXTSTEP. The format contains two main groups of features: styled text with fonts, and simple layout that has been extended to include the embedding of images and other non-text content.

RTF files consist of text, originally ASCII but now with Unicode support. Although RTF isn’t actually a mark-up language, its source code looks much like one.

Each RTF file opens with the ‘magic’ characters {\rtf introducing information about conformity of the code. Following that is a preamble that is likely to contain platform-specific information, a font table and colour tables. The latter should include an expanded colour table for macOS. Then follows content, typically setting the font and size, with the paragraph content. For the example file, size is 378 bytes.
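To give a feel for that structure, here is a rough sketch in Python using a hand-written RTF document. It is an assumption for illustration, far shorter than the 378-byte file that macOS writes, and it leaves out the colour tables. The helper name looks_like_rtf is hypothetical.

```python
# A hand-written, minimal RTF document (an assumption for illustration;
# files saved by macOS carry a longer preamble, including colour tables).
minimal_rtf = (
    r"{\rtf1\ansi\deff0"                   # 'magic' characters and version
    r"{\fonttbl{\f0\fswiss Helvetica;}}"   # font table with a single font
    r"\f0\fs24 This is a test.}"           # select font and size, then content
)

def looks_like_rtf(source: str) -> bool:
    """Return True if the text opens with the RTF 'magic' characters."""
    return source.startswith(r"{\rtf")

# Every group opened with { has to be closed with }.
balanced = minimal_rtf.count("{") == minimal_rtf.count("}")

print(looks_like_rtf(minimal_rtf), balanced)
```

Checking the opening characters only identifies the format; it says nothing about whether the rest of the file is valid RTF.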

RTFD

RTF has several shortcomings, particularly in handling embedded images, so NeXTSTEP extended it to a bundle format, Rich Text Format Directory, RTFD, that transferred to Mac OS X. RTF content of a document is stored in a file named TXT.rtf, alongside separate files containing scalable images that can include PDF, and the whole directory is treated as if it was a single file. Although this works well in macOS, it never caught on in Windows, so hasn’t achieved the popularity it deserved. As the example file doesn’t have any images, its size as RTFD is also 378 bytes.

Microsoft Word

From its inception in 1983 until it switched to docx, Microsoft Word’s native file format had the extension .doc. This is a binary format that has been successfully reverse-engineered for the open source OpenOffice and LibreOffice, so support for it has been incorporated into many products, including Cocoa and macOS.

From 2002, Microsoft Word has used a series of XML-based formats, since 2006 conforming to standards published first by Ecma then ISO/IEC, using the extension .docx, and known as Office Open XML. Support has been incorporated into macOS.

The .doc version of the example file requires 19 KB, while the .docx version takes only 4 KB.

HTML

This has evolved through a series of versions since its release in 1993, and is the markup language that dominates the web. Its structure should be well-known, and consists of an opening document type declaration followed by tagged elements containing metadata and content. Support for writing HTML is built into the Cocoa HTML Writer in macOS. This uses CSS to define styles in the header that are then applied to sections of the content, for example
<body>
<p class="p1">This is a test.</p>
</body>

The example requires only 538 bytes of HTML.
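Because the styles are attached by class name, it’s straightforward to recover both the content and the style it refers to. The following is only a sketch, using the parser in Python’s standard library on the body fragment quoted above; the class name ParagraphCollector is hypothetical.

```python
from html.parser import HTMLParser

# The body fragment quoted above; the stylesheet in the header is omitted.
fragment = '<body>\n<p class="p1">This is a test.</p>\n</body>'

class ParagraphCollector(HTMLParser):
    """Collect [class, text] pairs for each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            # Start a new entry holding the class name and empty text.
            self.paragraphs.append([dict(attrs).get("class"), ""])

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Only keep text that falls inside a paragraph.
        if self._in_p:
            self.paragraphs[-1][1] += data

collector = ParagraphCollector()
collector.feed(fragment)
collector.close()
print(collector.paragraphs)
```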

webarchive

This format is proprietary to Apple and its Safari browser, and when viewed in a capable text editor such as BBEdit, is shown as consisting of the serialised contents of a displayed web page, in XML format. In fact, as fds corrects me below, “a .webarchive is better described as a collection of web resources serialized via NSKeyedArchiver into binary plist format, bundled together into a single file in yet another property list, also saved in the binary property list format.”

When viewed in an editor such as BBEdit, after its opening XML and document type declaration as a property list, this consists of a dictionary of key-value pairs, themselves including sub-dictionaries of Web Resources. The content of each, its WebResourceData, is encoded in Base64, making it impossible to read in a text editor. Although these can be large, for the example only 778 bytes of storage are required, showing the efficiency of the binary property list format.
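The difference between the two views can be sketched in Python with its plistlib module. The dictionary below is a synthetic stand-in, not a file written by Safari: its key names follow those seen in web archives, but the URL and content are made up for illustration.

```python
import base64
import plistlib

# A synthetic stand-in for a web archive with a single main resource.
# The URL and data are hypothetical; this was not written by Safari.
archive = {
    "WebMainResource": {
        "WebResourceURL": "https://example.com/",
        "WebResourceMIMEType": "text/html",
        "WebResourceTextEncodingName": "UTF-8",
        "WebResourceData": b"<p>This is a test.</p>",
    }
}

binary_form = plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
xml_form = plistlib.dumps(archive, fmt=plistlib.FMT_XML)

# In the XML view, the resource data appears as Base64 text...
encoded = base64.b64encode(archive["WebMainResource"]["WebResourceData"])
print(encoded in xml_form)

# ...but loading the binary property list recovers the original bytes.
restored = plistlib.loads(binary_form)
print(restored["WebMainResource"]["WebResourceData"].decode("utf-8"))
```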

WordML

Between the original .doc and the Ecma .docx formats, Microsoft Word used an intermediate WordProcessingML (or WordML) format in XML. After a standard XML header, this declares
<?mso-application progid="Word.Document"?>
followed by a list of schemas. Although of largely historical interest now, some old Word documents may remain in this format. The example file requires 1 KB of storage.

ODT

This is OpenDocument Text, another XML-based format that was developed around the same time as WordML, and supported by many free apps and ‘office’ suites. Its opening structure is similar to that of WordML, but references oasis and OpenDocument sources. The example here requires 2 KB of storage.

Pages

One significant omission from the list of text formats supported by textutil is that used by Apple’s own Pages. This proprietary format changed significantly in 2009. Currently, a .pages document is a zipped bundle containing thumbnail JPEG previews of the document, and two folders of files. Content appears to be saved in Apple iWork Archive files with the .iwa extension, and is quite unlike RTFD.

Explainer: Preferences

By: hoakley
1 November 2025 at 16:00

When you run a command tool you invoke its options in the command you enter. Those options are supplied each time you run the tool, and don’t persist. Apps are different, in having a GUI that usually offers the user options, and in relying on information that persists until you next run that app. Those are its preferences, settings or defaults, depending on how you look at them.

In traditional Unix, persistent preferences may be implemented as configurations, defined in a plain text config file. In classic Mac OS, window settings and a great deal more were saved as resources, in the resource fork of the app or its documents. This resulted in one neat feature that’s seldom seen in macOS today, saving a document’s window settings to its file, so they will be reused the next time that document is opened.

One of the innovations in NeXTSTEP was the human-readable property list used to store serialised objects such as preferences. These consist of designated variables used by the app that are converted into a representation that can be expressed in text. For example, if an app lets the user decide whether to use US or metric units of measurement, that could be stored in memory as a Boolean variable, true or false, and serialised as the word true or false in a property list used to store the app’s preferences.

Contents

Accommodating all the preferences needed by an app usually requires a dictionary of those serialised values, each given a key for identification, and having an explicit or implicit data type. Thus, that user option might become
key: metricUnits
value: true, a Boolean with two possible values.

Mac OS X replaced the old NeXTSTEP format for property lists with two formatting schemes, XML and JSON, with XML the standard for app preferences. This is a file containing dictionaries of key-value pairs representing the serialised data:
<dict>
<key>metricUnits</key>
<true/>
<key>filePrefix</key>
<string>MyFile</string>
</dict>
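The same dictionary can be produced and read back outside macOS. This is a minimal sketch using Python’s plistlib module, not the API an app would use, to serialise the two example preferences as an XML property list.

```python
import plistlib

# The two preferences from the example above, as a Python dictionary.
prefs = {"metricUnits": True, "filePrefix": "MyFile"}

# Serialise to an XML property list; sort_keys=False keeps the order above.
xml_bytes = plistlib.dumps(prefs, fmt=plistlib.FMT_XML, sort_keys=False)
print(xml_bytes.decode("utf-8"))

# Reading it back restores both the values and their data types.
restored = plistlib.loads(xml_bytes)
print(type(restored["metricUnits"]).__name__, restored["filePrefix"])
```

The printed XML wraps the dictionary in the usual XML header, document type declaration and plist element.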

Initially, all property lists were stored as plain text, but that’s woefully inefficient, so between Mac OS X 10.2 and 10.4 a more compact binary format replaced that, and remains the standard today, as implemented in the UserDefaults API.
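The saving in space is easy to demonstrate, even for a tiny dictionary. This sketch, again using Python’s plistlib rather than anything in macOS, writes the same two preferences in both formats and compares their sizes.

```python
import plistlib

prefs = {"metricUnits": True, "filePrefix": "MyFile"}

# Serialise the same dictionary in both property list formats.
xml_bytes = plistlib.dumps(prefs, fmt=plistlib.FMT_XML)
binary_bytes = plistlib.dumps(prefs, fmt=plistlib.FMT_BINARY)

# Compare sizes, and show the signature that opens a binary property list.
print(len(xml_bytes), len(binary_bytes))
print(binary_bytes[:8])

# Both formats carry exactly the same information.
assert plistlib.loads(xml_bytes) == plistlib.loads(binary_bytes) == prefs
```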

cfprefsd

Although developers can handle their app’s defaults/preferences with their own code if they wish, macOS provides the defaults server cfprefsd, and the convenient UserDefaults API that is used by the great majority of apps. Under that, early in an app’s initialisation cfprefsd automatically opens that app’s preferences, then loads its key-value pairs to make them available to the app as it’s setting itself up.

cfprefsd is transparent to the developer, whose code simply accesses key-value pairs as they are required. cfprefsd may opt to keep the whole preference file in memory, and manage it however it sees fit. Thus the property list’s contents on disk may not represent those held in memory for the app, and any changes to the property list file may be overwritten when cfprefsd saves changed values from memory.
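Why a direct edit to the file can be lost is perhaps easiest to see in a toy model. The class below is a deliberately simplified stand-in written in Python. It is not how cfprefsd is implemented, and the class and file names are hypothetical, but it shows the same hazard: a cache that reads the file once and later writes its own copy back.

```python
import plistlib
import tempfile
from pathlib import Path

class CachedPrefs:
    """Toy stand-in for a preferences server. NOT how cfprefsd works inside."""

    def __init__(self, path: Path):
        self.path = path
        # Read the file once, then serve every request from memory.
        self.values = plistlib.loads(path.read_bytes())

    def set(self, key, value):
        self.values[key] = value    # changes the copy in memory only

    def flush(self):
        # Write the in-memory copy back, replacing whatever is on disk.
        self.path.write_bytes(plistlib.dumps(self.values))

with tempfile.TemporaryDirectory() as folder:
    plist_path = Path(folder) / "com.example.MyApp.plist"   # hypothetical name
    plist_path.write_bytes(plistlib.dumps({"metricUnits": True}))

    cache = CachedPrefs(plist_path)

    # Someone edits the file directly while the cache is still live.
    plist_path.write_bytes(plistlib.dumps({"metricUnits": False}))

    # The app then changes another setting, and the cache saves its copy.
    cache.set("filePrefix", "MyFile")
    cache.flush()

    on_disk = plistlib.loads(plist_path.read_bytes())

print(on_disk)
```

The direct edit setting metricUnits to false has gone, because the cache never saw it.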

For a simple app, working with cfprefsd should also be straightforward. The app’s preference property list is opened by cfprefsd shortly after the app is launched, and the app’s code works through UserDefaults to make any changes to key-value pairs while the app is running. As the app is shut down, cfprefsd updates the preference file, and the user is once again free to change or delete that property list as they wish. However, there’s ample scope for that to become more complicated, or to misuse it.

Problems

Many apps today aren’t that simple in their structure, and use helper apps and other executable code that may still be running with access to the app’s preferences even though the main app is shut down. When the user thinks it’s safe to modify the contents of that property list, it may still be in the care of cfprefsd. The preferred approach then is to use the defaults command tool, which should work with cfprefsd rather than competing with it.
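For scripted changes, that means calling defaults rather than rewriting the file. This is a sketch only, in Python: the bundle identifier and key are hypothetical, and the command is left unrun unless you set RUN_FOR_REAL to True on a Mac.

```python
import platform
import subprocess

DOMAIN = "com.example.MyApp"   # hypothetical bundle identifier
RUN_FOR_REAL = False           # set to True on a Mac to change the setting

def defaults_write_bool(domain: str, key: str, value: bool) -> list:
    """Build the arguments for: defaults write <domain> <key> -bool <value>."""
    return ["defaults", "write", domain, key, "-bool", "true" if value else "false"]

def defaults_read(domain: str, key: str) -> list:
    """Build the arguments for: defaults read <domain> <key>."""
    return ["defaults", "read", domain, key]

command = defaults_write_bool(DOMAIN, "metricUnits", True)
print(" ".join(command))

# The defaults tool only exists on macOS, so never try to run it elsewhere.
if RUN_FOR_REAL and platform.system() == "Darwin":
    subprocess.run(command, check=True)
    result = subprocess.run(
        defaults_read(DOMAIN, "metricUnits"),
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
```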

In the past, UserDefaults and cfprefsd weren’t always reliable, and some developers worked around their problems with a combination of the official API and performing their own direct manipulation of preference files. Those dangerous practices should have died out now.

Because an app’s preferences are accessed early as it’s being launched, any bugs or incompatibilities in those key-value pairs can have fatal effects before the app is fully open. For example, if a new version of an app reuses an existing preference key with a different data type and then reads an old version of its preferences, that will throw an error. If that’s not handled well, that can cause the new version of the app to crash when launched.
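Handling that well usually comes down to checking the data type before trusting the value. The sketch below makes that point in Python rather than through UserDefaults; the older version that stored a string under metricUnits is a made-up scenario, and read_bool is a hypothetical helper.

```python
import plistlib

# Preferences saved by a hypothetical older version, in which the key
# metricUnits held a string rather than a Boolean.
old_file = plistlib.dumps({"filePrefix": "MyFile", "metricUnits": "yes"})

def read_bool(prefs: dict, key: str, default: bool) -> bool:
    """Return the stored Boolean, or the default if it is missing or mistyped."""
    value = prefs.get(key, default)
    return value if isinstance(value, bool) else default

prefs = plistlib.loads(old_file)

# The mistyped value is ignored in favour of the default, rather than
# being passed on to code that expects a Boolean.
metric = read_bool(prefs, "metricUnits", default=True)
print(metric)
```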

Fortunately, all apps have to be able to create their own preference file for when they’re first run. There’s scope for further bugs there, when the file created isn’t updated to work with changed key-value pairs in a newer version of the app. That may result in an app that crashes when launched even when there’s no existing preference file saved, a problem for which there’s no workaround.

Finally, many apps have multiple preference files. If they run in a sandbox, the copy they use normally is in the Data/Library/Preferences folder in their container, in ~/Library/Containers. But they may also have a different property list in ~/Library/Preferences, and sometimes a master copy in /Library/Preferences as well. While I’m sure cfprefsd knows which to access, you may need to check by inspecting each file’s timestamps.
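Checking those timestamps can be scripted. This sketch, in Python, assumes the usual convention that both the container folder and the property list are named after the app’s bundle identifier; the identifier used here is hypothetical.

```python
from datetime import datetime
from pathlib import Path

def candidate_preference_files(bundle_id: str, home: Path) -> list:
    """List the places described above where an app's property list may live.

    Assumes the container and the file are both named after the bundle ID.
    """
    name = f"{bundle_id}.plist"
    return [
        home / "Library" / "Containers" / bundle_id / "Data" / "Library" / "Preferences" / name,
        home / "Library" / "Preferences" / name,
        Path("/Library/Preferences") / name,
    ]

# com.example.MyApp is a hypothetical bundle identifier.
for path in candidate_preference_files("com.example.MyApp", Path.home()):
    if path.exists():
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {path}")
    else:
        print(f"(not present)        {path}")
```

The file with the most recent modification time is a reasonable guess at the one in current use, though it isn’t proof.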

UserDefaults have improved significantly with SwiftUI, further integrating persistent storage of preferences. Although they can still trip up the unwary, provided you understand how they work and don’t try fighting the system, they should seldom cause substantial problems.

Further reading

UserDefaults (Apple)
Preferences and Settings Programming Guide (Apple) from 2013
Thomas Tempelmann’s Prefs Editor works with cfprefsd
