Explainer: Data and metadata

Files, documents and everything else we store on our Macs consist of data. For an image, that’s the pixels to be displayed; for an illustrated book, it’s the laid-out pages of text and pictures.

Associated with each of those is additional information about what’s in the data, such as the datestamp of its creation both as data and as a file, details of its creator, and how it was created, such as the camera used. That is data about the data, hence metadata.

Until 1984 and the first Mac, most metadata was almost universally stored in the same file as the data, although some, such as a file’s datestamps, was stored separately in that file’s record in the file system, in its attributes. The Mac tried to change that by introducing a second fork to files, their resource fork, intended to contain metadata. Unfortunately, while that became standard on Macs, dominant operating systems like MS-DOS didn’t change, and continued to embed data and metadata together in flat files.

A lot has changed over the nearly 42 years since the first Mac, and now macOS has multiple sources of metadata for its files.

File attributes

APFS file records contain an extensive set of attributes, including

  • time of creation
  • time of modification of data
  • time of attribute modification
  • time of access
  • file name
  • owner, group and permissions.

These are largely common to other modern file systems.
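As a rough illustration, these attributes can be read from a script without opening the file’s data at all. The sketch below uses Python’s standard os.stat; the function name describe_file and the dictionary keys are chosen here purely for illustration, and st_birthtime is only present on platforms that report a creation time, such as macOS.

```python
import os
import stat
from datetime import datetime, timezone


def describe_file(path):
    """Collect the attributes listed above from the file system record."""
    st = os.stat(path)

    def as_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    # st_birthtime exists on macOS and the BSDs; report None where it is absent.
    born = getattr(st, "st_birthtime", None)
    return {
        "name": os.path.basename(path),
        "created": as_utc(born) if born is not None else None,
        "data_modified": as_utc(st.st_mtime),
        # On Unix-like systems st_ctime is the time of the last attribute change.
        "attributes_modified": as_utc(st.st_ctime),
        "accessed": as_utc(st.st_atime),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
    }
```

Calling describe_file on a file, changing only its permissions, and calling it again should move attributes_modified while leaving data_modified alone.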

Extended attributes

Mac OS X extended the idea of the classic resource fork to other metadata objects as extended attributes (xattrs), named using a reverse-DNS scheme, such as com.apple.FinderInfo, which contains metadata for the Finder. In this scheme, the traditional resource fork becomes an xattr of type com.apple.ResourceFork. Many xattrs are now used by macOS for security and privacy protection, but the user can also add xattrs containing copyright information, names of creators, an arbitrary description, a text headline, and others. Anyone can define their own type of xattr, and some apps make good use of them for storing metadata.

Their main disadvantages are:

  • Xattrs rarely transfer to other platforms, making most Mac-only.
  • They’re commonly stripped when transferred even between Macs, or when shared in iCloud Drive. Apple has a system of tags to determine which xattrs should be stripped and which retained, but those aren’t as widely used as they deserve.
  • Most xattrs aren’t shown by the Finder, either in Preview panes or in Get Info dialogs.

For largely historical reasons, even Apple doesn’t take fullest advantage of xattrs. For example, Finder Comments, which are shown in Get Info, are primarily stored in a folder’s hidden .DS_Store file and only secondarily in a com.apple.metadata:kMDItemFinderComment xattr.
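To see which xattrs a file actually carries, including any com.apple.metadata:kMDItemFinderComment, one option is to call the xattr command-line tool that ships with macOS. This is only a sketch. Python’s own os.listxattr isn’t available on macOS, the helper names parse_names and list_xattrs are invented for this example, and it assumes that xattr, given a single file, prints one attribute name per line.

```python
import shutil
import subprocess


def parse_names(listing):
    """Split the output of `xattr <file>` (assumed one name per line)."""
    return [line for line in listing.splitlines() if line]


def list_xattrs(path):
    """Return the xattr names on path, or None if no xattr tool is found."""
    tool = shutil.which("xattr")
    if tool is None:  # not running on macOS, or the tool isn't on PATH
        return None
    done = subprocess.run([tool, path], capture_output=True, text=True, check=True)
    return parse_names(done.stdout)


if __name__ == "__main__":
    import sys
    for name in list_xattrs(sys.argv[1]) or []:
        print(name)
```

Running this against a file before and after copying it to another volume or service is a quick way to check which xattrs survived the transfer.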

Embedded metadata

Because so few file systems use extended attributes or their equivalents, most file-specific metadata is now embedded in file data. For some file formats, such as those widely used by word processors and spreadsheets, this is relatively straightforward, particularly when using XML-based formats.

It becomes more complicated and less reliable when used with images, as in the Exchangeable Image File Format (EXIF). Although usually treated as a metadata standard, EXIF in fact encompasses both data and metadata formats, embedded in a single file for convenience.

EXIF metadata can include camera settings such as aperture and shutter speed, image metrics such as colourspace, date and time of creation, location and copyright information, and a thumbnail version of the image (which is arguably data rather than metadata). How the metadata is embedded with data is determined by the format of the image data. For JPEG data, EXIF metadata is stored in an Application Segment of the image, but for TIFF data there’s a sub-image file directory that can spread the metadata anywhere within the data.
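For the JPEG case, a minimal sketch of how a reader might locate that Application Segment is given below. It assumes a well-formed file in which each header segment is a two-byte marker followed by a two-byte big-endian length, and it looks for an APP1 segment (marker 0xFFE1) whose payload starts with the identifier "Exif" and two zero bytes. The function name find_exif_segment is hypothetical, and the code makes no attempt to decode the metadata itself or to cope with padding bytes between segments.

```python
import struct

SOI = b"\xff\xd8"          # start of image
APP1 = 0xE1                # application segment 1, used for EXIF
SOS, EOI = 0xDA, 0xD9      # start of scan, end of image
EXIF_ID = b"Exif\x00\x00"  # identifier at the start of an EXIF payload


def find_exif_segment(jpeg):
    """Return (offset, length) of the EXIF APP1 payload, or None if absent."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return None                      # lost marker alignment, give up
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            return None                      # reached image data without EXIF
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if length < 2:
            return None                      # length must include its own 2 bytes
        start = pos + 4
        if marker == APP1 and jpeg[start:start + len(EXIF_ID)] == EXIF_ID:
            return start, length - 2
        pos += 2 + length                    # skip marker plus segment
    return None
```

Because the metadata lives inside the same byte stream as the image, any editor that rewrites the file has to copy this segment across deliberately, or it is lost.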

The danger is that apps that edit data can inadvertently damage or remove the EXIF metadata, and that’s all too common among image editors that may need to rewrite the whole of the data when saving an edited image. Fortunately macOS relies on its own QuickLook thumbnails rather than embedded EXIF thumbnails, as some image editors don’t update the latter reliably.

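There are more subtle disadvantages to embedding metadata with data. In Apple’s preferred model, there are separate datestamps in file attributes for saved changes to data and to extended attributes, allowing the two to be distinguished. When metadata is embedded in the data, any change to the metadata also updates the data’s modification time, so that distinction is lost.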

Summary

In macOS, metadata can be stored

  • in the file system as attributes,
  • as extended attributes,
  • embedded with the file data.

It’s hardly surprising how often it goes missing, or is overlooked.
