Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Planning complex Time Machine backups for efficiency

By: hoakley
28 October 2024 at 15:30

Time Machine (TM) has evolved to be a good general-purpose backup utility that makes best use of APFS backup storage. However, it does have some quirks, and offers limited controls, that can make it tricky to use with more complex setups. Over the last few weeks I’ve had several questions from those trying to use TM in more demanding circumstances. This article explains how you can design volume layout and backup exclusions for the most efficient backups in such cases.

How TM backs up

To decide how to solve these problems, it’s essential to understand how TM makes an automatic backup. In other articles here I have provided full details, so here I’ll outline the major steps and how they link to efficiency.

At the start of each automatic backup, TM checks to see if it’s rotating backups across more than one backup store. This is an unusual but potentially invaluable feature that can be used when you make backups in multiple locations, or want added redundancy with two or more backup stores.

Having selected the backup destination, it removes any local snapshots from the volumes to be backed up that were made more than 24 hours ago. It then creates a fresh snapshot on each of those volumes. I’ll consider these later.

Current versions of TM normally don’t use those local snapshots to work out what needs to be backed up from each volume, but (after the initial full backup) should rely on that volume’s record of changes to its file system, FSEvents. These observe two lists of exclusions: those fixed by TM and macOS, including the hidden version database on each volume and recognised temporary files, and those set by the user in TM settings. Among the latter should be any very large bundles and folders containing huge numbers of small files, such as the Xcode app, as they will back up exceedingly slowly even to fast local backup storage, and can tie up a network backup for many hours. It’s faster to reinstall Xcode rather than restore it from a backup.

Current TM backups are highly efficient, as TM can copy just the blocks that have changed; older versions of TM backing up to HFS+ could only copy whole files. However, that can be impaired by apps that rewrite the whole of each large file when saving. Because the backup is being made to APFS, TM ensures that any sparse files are preserved, and handles clone files as efficiently as possible.

Once the backup has been written, TM then maintains old backups, to retain:

  • hourly backups for the last 24 hours, to accompany hourly local snapshots,
  • daily backups over the previous month,
  • weekly backups stretching back to the start of the current backup series.

These are summarised in the diagram below.

tmseqoutline1

Local snapshots

TM makes two types of snapshot: on each volume it’s set to back up, it makes a local snapshot immediately before each backup, then deletes that after 24 hours; on the backup storage, it turns each backup into a snapshot from which you can restore backed up files, and those are retained as stated above.

APFS snapshots, including TM local snapshots, include the whole of a volume, without any exceptions or exclusions, which can have surprising effects. For example, a TM exclusion list might block backing up of large virtual machine files resulting in typical backups only requiring 1-2 GB of backup storage, but because those VMs change a lot, each local snapshot could require 25 GB or more of space on the volume being backed up. One way to assess this is to check through each volume’s TM exclusion list and assess whether items being excluded are likely to change much. If they are, then they should be moved to a separate volume that isn’t backed up by TM, thus won’t have hourly snapshots.

Some workflows and apps generate very large working files that you may not want to clutter up either TM backups or local snapshots. Many apps designed to work with such large files provide options to relocate the folders used to store static libraries and working files. If necessary, create a new volume that’s excluded completely from TM backups to ensure those libraries and working files aren’t included in snapshots or backups.

TM can’t run multiple backup configurations with different sets of exclusions, though. If you need to do that, for instance to make a single nightly backup of working files, then do so using a third-party utility in addition to your hourly TM backups.

This can make a huge difference to free space on volumes being backed up, as the size of each snapshot can be multiplied by 24 as TM will try to retain each hourly snapshot for the last 24 hours.

Macs that aren’t able to make backups every hour can also accrue large snapshots, as they may retain older snapshots, that will only grow larger over time as that volume changes from the time that snapshot was made.

While snapshots are a useful feature of TM, the user has no control over them, and can’t shorten their period of retention or turn them off altogether. Third-party backup utilities like Carbon Copy Cloner can, and may be more suitable when local snapshots can’t be managed more efficiently.

iCloud Drive

Like all backup utilities, TM can only back up files that are in iCloud Drive when they’re downloaded to local storage. Although some third-party utilities can work through your iCloud Drive files downloading them automatically as needed, TM can’t do that, and will only back up files that are downloaded at the time that it makes a backup.

There are two ways to ensure files stored in iCloud Drive will be backed up: either turn Optimise Mac Storage off (in Sonoma and later), or download the files you want backed up and ‘pin’ them to ensure they can’t be removed from local storage (in Sequoia). You can pin individual files or whole folders and their entire contents by selecting the item, Control-click for the contextual menu, and selecting the Keep Downloaded menu command.

Key points

  • Rotate through 2 or more backup stores to handle different locations, or for redundancy.
  • Back up APFS volumes to APFS backup storage.
  • Exclude all non-essential files, and bundles containing large numbers of small files, such as Xcode.
  • Watch for apps that make whole-file changes, thus increasing snapshot and backup size.
  • Store large files on volumes not being backed up to minimise local snapshot size.
  • If you need multiple backup settings, use a third-party utility in addition to TM.
  • To ensure iCloud Drive files are backed up, either turn off Optimise Mac Storage (Sonoma and later), or pin essential files (Sequoia).

Further reading

Time Machine in Sonoma: strengths and weaknesses
Time Machine in Sonoma: how to work around its weaknesses
Understand and check Time Machine backups to APFS
Excluding folders and files from Time Machine, Spotlight, and iCloud Drive

A brief history of FileVault

By: hoakley
19 October 2024 at 15:00

Encrypting all your data didn’t become a thing until well after the first release of Mac OS X. Even then, the system provided little support, and most of us who wanted to secure private data relied on third-party products like PGP (Pretty Good Privacy).

pgp2003

FileVault 1

Apple released the first version of FileVault, now normally referred to as FileVault 1 or Legacy FileVault, in Mac OS X 10.3 Panther in 2003. Initially, that only encrypted a user’s Home folder into a sparse disk image, then in 10.5 Leopard it started using sparse bundles instead. These caused problems with Time Machine backup when it too arrived in Leopard, and proved so easy to crack that in 2006 Jacob Appelbaum and Ralf-Philipp Weinmann released a tool, VileFault, to decrypt FileVault disk images.

filevault2004

FileVault 1 was controlled in the Security pane of System Preferences, shown here in 2004.

newuser2004

Each new user added in the Accounts pane could have their Home folder stored in an encrypted disk image. Encryption keys were based on the user’s password, with a master password set for all accounts on the same Mac.

FileVault 2

FileVault 2 was introduced in Mac OS X 10.7 Lion in 2011, and at last provided whole-volume encryption based on the user password. Encryption was performed using the XTS-AES mode of AES with a 256-bit key, by the CPU. At that time, more recent Intel processors had instructions to make this easier and quicker, but all data written to an encrypted volume had to be encrypted before it was written to disk, and all data read from it had to be decrypted before it could be used. This imposed significant overhead of around 3%, which was more noticeable on slower storage such as hard disks, and with slower Macs.

Apple didn’t implement this by modifying the HFS+ file system to add support for encryption, but by adding encryption support to CoreStorage, the logical volume manager. In theory this would have enabled it to encrypt other file systems, but I don’t think that was ever done.

Turning FileVault on and off was quite a pain, as the whole volume had to be encrypted or decrypted in the background, a process that could take many hours or even days. Most users tried to avoid doing this too often as a result so, while FileVault 2 was secure and effective, it wasn’t as widely used as it should have been.

These screenshots step through the process of enabling FileVault in 2017.

lockratsec

Control was in the FileVault tab in System Preferences.

filevault01

iCloud Recovery was added as an alternative to the original recovery key.

filevault02

Encryption began following a restart, and then proceeded in the background for however long it took. Shrewd users enabled FileVault when a minimum had been installed to the startup volume, to minimise time taken for encryption.

filevault03

With a minimal install, it was possible to complete initial encryption in less than an hour. With full systems, it could take days if you were unlucky.

Although FileVault has had a few security glitches, it has done its job well. Perhaps its greatest threat came in the early days of macOS Sierra, when Ulf Frisk developed a simple method for retrieving the FileVault password for any Mac with a Thunderbolt port. An attacker could connect a special Thunderbolt device to a sleeping or locked Mac, force a restart, then read the password off within 30 seconds. This exploited a vulnerability in the handling of DMA, and was addressed by enabling VT-d in EFI, in Sierra 10.12.2 and 10.12.4.

Hardware encryption

The next big leap forward came at the end of 2017, with the release of the first Macs with T2 chips, as intermediates on the road to Apple silicon. One of Apple’s goals in T2 and Apple Silicon chips was to make encrypted volumes the default. To achieve that, T2 and M-series chips incorporate secure enclaves and perform encryption and decryption in hardware, rather than using CPU cycles.

The Secure Enclave incorporate the storage controller for the internal SSD, so all data transferred between CPU and SSD passes through an encryption stage in the enclave. When FileVault is disabled, data on protected volumes is still encrypted using a volume encryption key (VEK), in turn protected by a hardware key and a xART key used to protect from replay attacks.

filevaultpasswords1

When FileVault is enabled, the same VEK is used, but it’s protected by a key encryption key (KEK), and the user password is required to unwrap that KEK, so protecting the VEK used to perform encryption/decryption. This means that the user can change their password without the volume having to be re-encrypted, and allows the use of special recovery keys in case the user password is lost or forgotten. Keys are only handled in the secure enclave.

Securely erasing an encrypted volume, also performed when ‘erasing all content and settings’, results in the secure enclave deleting its VEK and the xART key, rendering the residual volume data inaccessible even to the secure enclave itself. This ensures that there’s no need to delete or overwrite any residual data from an encrypted volume: once the volume’s encryption key has been deleted, its previous contents are immediately unrecoverable.

eacas

Coverage of boot volumes by encryption varies according to the version of macOS. Prior to macOS Catalina, where macOS has a single system volume, the whole of that is encrypted; in Catalina, both System and Data volumes are encrypted; in Big Sur and later, the Signed System Volume (SSV) isn’t encrypted, nor are Recovery volumes, but the Data volume is.

External disks

Hardware encryption and FileVault’s ingenious tricks aren’t available for external disks, but APFS was designed to incorporate software encryption from the outset. As with internal SSDs, the key used to encrypt the volume contents isn’t exposed, but accessed via a series of wrappers, enabling the use of recovery keys if the user password is lost or forgotten. This involves a KEK and VEK in a similar manner to FileVault on internal SSDs. As the file system on the volume is also encrypted, after the KEK and VEK have been unwrapped, the next task in accessing an encrypted volume is to decrypt the file system B-tree using the VEK.

Enabling FileVault has been streamlined in recent years, as shown here in System Settings last year, for an external SSD, thus not using hardware encryption.

filevault1

FileVault control has moved to Privacy & Security in System Settings.

filevault2

The choice of iCloud Recovery or a recovery key remains.

filevault3

Because only the Data volume is now encrypted, enabling FileVault before populating the Home folder allows encryption to be almost instantaneous, on an external disk.

Virtual machines

The most recent enhancement to FileVault protection extends support to Sequoia virtual machines running on Sequoia hosts. Apple hasn’t yet explained how that one works, although I suspect the word exclave is likely to appear in the answer.

If your Mac has a T2 or Apple silicon chip and you haven’t enabled FileVault, then you’re missing one of the Mac’s best features.

Who’s been accessing my storage? Reading a disk’s history

By: hoakley
17 October 2024 at 14:30

Have you ever wondered whether someone else has changed your Mac’s storage? Or which version of macOS formatted each of its volumes? As all good forensic investigators know, APFS keeps detailed records of the formatting and modification of each volume. This article explains how you can read and interpret them. As in the tale of Goldilocks and the Three Bears, you may be able to tell who has been eating your APFS porridge.

Information available

Each APFS volume stores details of its history in the volume superblock apfs_superblock_t. Those include information on how that volume was created in apfs_formatted_by, and up to the last 8 times the volume has been modified, in apfs_modified_by.

Although you’ll need a forensic disk analysis tool to get full details, some of that data is easy to access. Select a volume in the Finder, and Get Info will give a time and date that volume was last formatted.

Run First Aid in Disk Utility on that volume’s container, and there’s even more information given about each volume within the container, including those you can’t see. If you’d rather not run a full check and repair, then you should see the same information in Terminal by using
diskutil verifyVolume disk10
where disk10 is the device name for the container. If you prefer you can use fsck_apfs directly, but verifyVolume should use that command’s options most efficiently.

One lingering problem you may encounter in Disk Utility is that it still fails frequently because it can’t unmount volumes for checking. If you encounter that error when trying to run First Aid on a container, try manually unmounting each volume within that container. If all else fails, diskutil verifyVolume appears to be better at handling the problem.

Workthrough

diskfirstaid1

As shown above, when run on one of my external SSDs, information about two APFS volumes was returned, itself something of a surprise. The volume I expected gave
The volume ThunderBay3 was formatted by diskmanagementd (1412.81.1) and last modified by apfs_kext (2313.1.2).
and the surprise, which isn’t mounted, thus effectively hidden, gave
The volume Update was formatted by com.apple.Mobile (1677.50.1) and last modified by apfs_kext (1677.141.2).
The Finder’s Get Info dialog for ThunderBay3 gave a volume creation date of 11 February 2020, and last modification of 20 December 2020.

Taking the visible volume ThunderBay3 first, APFS says that it was formatted by its own formatting tool, diskmanagementd, in APFS version 1412.81.1, which came in macOS 10.15 Catalina (see the Appendix below). A look through details of versions released pins that down to 10.15.3, released on 28 January 2020, which tallies with the creation date from the Finder. Its last modification was performed by a general APFS function, in APFS version 2313.1.2, which is that current for macOS 15.0 and 15.0.1.

The hidden Update volume has had quite a different history, as it was created in APFS version 1677.50.1 in Big Sur, to be more precise in macOS 11.0.1 released on 12 November 2021. That wasn’t a conventional volume creation either, and was performed by com.apple.Mobile, part of the Big Sur installer. It was last modified using APFS version 1677.141.2, which came in macOS 11.6 on 13 September 2021. Since then it appears to have been left unmounted and unused.

The history of that container therefore reads:

  • ThunderBay3 created by the user on 11 February 2020 in macOS 10.15.3
  • Update created by a macOS installer after 12 November 2021 in macOS 11.0.1
  • Update last mounted after 13 September 2021 in macOS 11.6
  • ThunderBay3 currently in use.

Conclusions

The hidden Update volume contains a restore log apparently left behind after the 11.5.2 update, together with some empty folders. These demonstrate that it was a temporary volume created by Big Sur’s new macOS installer, but never cleaned up afterwards, and left abandoned for the last three years. As Big Sur was the first version of macOS to use Apple’s new installer that created a Signed System Volume, this is likely to be present on other external disks that were mounted when any version of Big Sur was installed. Although it takes little space, it’s a surprising omission that no subsequent installer has seen fit to clean this up by deleting the volume.

Otherwise, information about the visible and mounted volume appears consistent, and confirms what I recall of its history. No one has been eating this bear’s APFS porridge.

Appendix: APFS and macOS version details

APFS major version numbers change with major version of macOS:

  • APFS version 0.3 or 249.x.x in macOS 10.12
  • 748.x.x in 10.13
  • 945.x.x in 10.14
  • 1412.x.x in 10.15
  • 1677.x.x in macOS 11
  • 1933.x.x in 12.0-12.2.1
  • 1934.x.x 12.3 and later
  • 2142.x.x in 13
  • 2235.x.x in 14.0-14.3.1
  • 2236.x.x in 14.4 and later
  • 2313.x.x in 15.

Minor version numbers increment according to the minor version of macOS, and patch numbers wander without pattern. Those can be checked by looking at the changes given for each macOS update listed on this page.

Disk Images: How read-write disk images have gone sparse

By: hoakley
14 October 2024 at 14:30

Until about three years ago, most types of disk image had fixed size. When you created a read-write disk image (UDRW) of 10 GB, it occupied the same 10 GB on disk whether it was full or empty. The only two that could grow and shrink in size were sparse bundles and sparse disk images, with the former generally preferred for its better resizing and performance.

This changed silently in Monterey, since when read-write disk images have been automatically resized by macOS, and saved in APFS sparse file format. As this remains undocumented, this article explains how this works, and how and where you can use it to your advantage.

Sparse

In this context, the word sparse refers to two very different properties:

  • Sparse bundles and sparse (disk) images are types of disk image that can change in size, and can be stored on a wide range of file systems and storage, including on NAS and other networked storage.
  • Sparse files are stored in a highly efficient file format available in modern file systems, where only the data in a file is stored, and empty space within that file doesn’t waste storage space. It’s available in APFS, but not in HFS+.

Requirements

For read-write disk images to be sparse, the following are required:

  • The disk image must be saved to, and remain on, an APFS volume.
  • The file system within the disk image can be either APFS or HFS+, but not FAT or ExFAT.
  • The disk image must be created and unmounted first. For that initial mount, the disk image isn’t a sparse file, so occupies its full size on disk.
  • Whenever that disk image is mounted again, and has sufficient free space within its set limit, it will be saved in sparse file format, and occupy less than its full size on disk.

Demonstration

  1. Create a new read-write disk image of at least 1 GB size with an internal file system of APFS on an APFS volume, using DropDMG, Disk Utility, or an alternative.
  2. Once it has been created and mounted, unmount it in the Finder.
  3. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is the same as that set.
  4. (Optional) Open the disk image using Precize, and confirm that it’s not a sparse file.
  5. Double-click the disk image to mount it in the Finder, and wait at least 10 seconds before unmounting it.
  6. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is now significantly less than that set.
  7. (Optional) Open the disk image using Precize, and confirm that it has now become a sparse file.

How it works

When you create the disk image, macOS creates and attaches its container, and creates and mounts the file system within that. This is then saved to disk as a regular file occupying the full size of the disk image, plus the overhead incurred by the disk image container itself. No sparse files are involved at this stage.

When that disk image is mounted next, its container is attached through diskarbitrationd, then its file system is mounted. If that’s APFS (or HFS+), it undergoes Trimming, as with other mounts. That coalesces free storage blocks within the image to form one contiguous free space. The disk image is then saved in APFS sparse file format, skipping that contiguous free space. When the file system has been unmounted and the container detached, the space used to store the disk image has shrunk to the space actually used within the disk image, plus the container overhead. Unless the disk image is almost full, the amount of space required to store it on disk will be smaller than the full size of the disk image.

This is summarised in the diagram below.

SparseDiskImage1

The size of read-write disk images is therefore variable depending on the contents, the effectiveness of Trimming in coalescing free space, and the efficiency of APFS sparse file format.

Conversion to sparse file

When mounting an APFS file system in a read-write disk image, APFS tests whether the container backing store is a sparse format, or a flat file. In the case of a newly created read-write disk image that hasn’t yet been converted into a sparse file, that’s detected prior to Spaceman (the APFS Space Manager) scanning for free blocks within its file system. When free blocks are found, APFS sets the type of backing store to sparse, gathers the sparse bytes and ‘punches a hole’ in the disk image’s file extents to convert the container file into sparse format. That appears in the log as:
handle_apfs_set_backingstore:6172: disk5s1 Set backing store as sparse
handle_apfs_set_backingstore:6205: disk5 Backing storage is a raw file
_punch_hole_cb:37665: disk3s5 Accumulated 4294967296 sparse bytes for inode 30473932 in transaction 3246918, pausing hole punching

where disk5s1 is the disk image’s mounted volume, and disk3s5 is the volume in which the disk image container is stored.

Trimming for efficient use of space

That conversion to sparse format is normally only performed once, but from then on, each time that disk image is mounted it’s recognised as having a sparse backup store, and Spaceman performs a Trim to coalesce free blocks and optimise on-disk storage requirements. For an empty read-write disk image of 2,390,202 blocks of 4,096 bytes each, as created in a 10 GB disk image, log entries are:
spaceman_scan_free_blocks:4106: disk5 scan took 0.000722 s, trims took 0.000643 s
spaceman_scan_free_blocks:4110: disk5 2382929 blocks free in 7 extents, avg 340418.42
spaceman_scan_free_blocks:4119: disk5 2382929 blocks trimmed in 7 extents (91 us/trim, 10886 trims/s)
spaceman_scan_free_blocks:4122: disk5 trim distribution 1:0 2+:0 4+:0 16+:0 64+:0 256+:7

accounting for a total of 9.8 GB.

Changes made to the contents of the disk image lead to a gradual reduction in Trim yield. For example, after adding files to the disk image and deleting them, instead of yielding the full 9.8 GB, only 2,319,074 blocks remain free, yielding a total of 9.5 GB.

For comparison, initial Trimming on a matching empty sparse bundle yields the same 9.8 GB. After file copying and deletion, and compaction of the sparse bundle, Trimming performs slightly better, yielding 2,382,929 free blocks for a total of 9.6 GB. Note that Trimming of sparse bundles is performed by APFS Spaceman separately from management of bands in backing storage, which isn’t a function of the file system.

Size efficiency

Although read-write disk images stored as sparse files are efficient in their use of disk space, they’re still not as compact as sparse bundles. For an empty 10 GB image, the read-write type requires 240 MB on disk, but a sparse bundle only needs 13.9 MB. After light use storing files, then deleting the whole contents, a 10 GB read-write disk image grows to occupy 501 MB, but following compaction a sparse bundle only takes 150 MB. That difference may not remain consistent over more prolonged use, though, and ultimately compacting sparse bundles may cease freeing any space at all.

It’s also important to remember that sparse bundles need to be compacted periodically, if any of their contents are deleted, or they may not reduce in size after deletions. Read-write disk images can’t be compacted, and reclaim disk space automatically.

Benefits and penalties

Read-write disk images saved as sparse files are different from sparse bundles in many ways. Like any sparse file, the disk image still has the same nominal size as its full size, and differs in the space taken on disk. Sparse bundles should normally only have band files sufficient to accommodate their current size, so their nominal size remains similar to the space they take on disk. The result is that, while read-write disk images in sparse file format will help increase free disk space, their major benefit is in reducing ‘wear’ in SSDs by not wasting erase-write cycles storing empty data.

Unlike the band file structure in sparse bundles, which can be stored on almost any disk, APFS sparse files have to be treated carefully if they are to remain compact. Moving them to another file system or over a network is likely to result in their being exploded to full size, and I have explained those limitations recently.

Both read-write disk images and sparse bundles deliver good read performance, but write performance is significantly impaired in read-write disk images but not sparse files. Encryption of disk images and sparse bundles also has significant effects on their performance, and in some cases write performance is badly affected by encryption. I have previously documented their performance in macOS Monterey, and will be updating those figures for Sequoia shortly.

Summary

  • Read-write disk images saved on APFS storage in Monterey and later are no longer of fixed size, and should use significantly less disk space unless full, provided the disk image has been unmounted at least once since creation, and the file system in the disk image is APFS or HFS+.
  • This is because macOS now saves the disk image in sparse file format.
  • To retain the disk image in sparse file format, it needs to remain in APFS volumes, and normal precautions are required to maintain its efficient use of space. The disk image can’t be manually compacted, in the way that sparse bundles require to be.
  • The main benefit of this strategy is to minimise erase-write cycles, so reducing ‘wear’ on SSDs.
  • Sparse bundles are still more efficient in their use of disk space, and have higher write speeds, but read-write disk images are now closer in their efficiency and performance.

Previous articles

Introduction
Tools

APFS incompatibilities and how to live with them

By: hoakley
8 October 2024 at 14:30

APFS has many of the features of modern file systems that make most efficient use of space, and others that aren’t found in more traditional file systems like HFS+. While those should all work well when used locally with other APFS volumes, they can prove incompatible with other file systems, and may have untoward side effects. This article looks at those features that you could encounter problems with, and how best to work around them.

Sparse files

Most modern file systems store files that contain significant amounts of blank or absent data in a special sparse file format. In APFS, this is implemented by only allocating file extents to those blocks that do contain data. In many cases, this can save significant disk space. Initially, sparse files were unusual in APFS, but over time they have become increasingly common, and can now be found in some types of disk image, virtual machine storage, and in some databases. Note that disk image types whose name includes the word sparse, sparse bundles and sparse disk images, don’t use sparse file format at all, but are the victims of an unfortunate name collision.

Neither macOS nor APFS can simply convert regular files to sparse format; for a file to be written in sparse format, the code writing it must explicitly skip the empty space. As a result, sparse files are prone to explode to full size when they’re copied or moved unless that’s between two local volumes both using APFS. Examples of where you should expect them to explode include:

  • copy or move to HFS+, as that has no sparse file format
  • copy between Macs using AirDrop or file sharing
  • back up to network storage, although local Time Machine backups to APFS should preserve them
  • copy or move to any other local file system other than APFS, even if that file system has its own support for sparse files.

In each of those cases, sparse files explode in size as they’re copied from the source volume. If you have a 100 GB sparse file that only takes 20 GB of local disk space, when it’s copied over the network, or to a local HFS+ volume, the full 100 GB has to be transferred.

One potential workaround is to compress the sparse file before transferring it. All good compression algorithms will work efficiently on the blank space in the file, so when compressed its size could be as small as the original sparse file. However, when it’s decompressed, even on another APFS volume, it will explode to full size. For disk images, that can be corrected by mounting them, as APFS will then trim their contents, and the disk image should be saved back into sparse format.

Clone files

These are two distinct files that share common data, normally the result of duplicating the original within the same APFS volume. Those two cloned files then only require the storage of the whole file, plus those data blocks that differ between them. This only works within the same volume, and the moment that either of the clones is moved or copied to any other volume, it assumes full size, as it can no longer share data with the other clone.

However, most other file systems don’t support file cloning in this way. When you duplicate a file in an HFS+ volume, there’s no shared data between them, and the two require twice the amount of space as one does.

Snapshots

Snapshots consist of a complete copy of the volume at an instant in time, so require a copy of the file system metadata for that volume, and retain copies of storage blocks for each file as they are changed subsequent to the snapshot being made, so you could roll back that volume to its state at the moment of the snapshot.

Although Time Machine backups contain snapshots of the volume they back up, snapshots can’t normally be copied to another disk or volume. Some have been able to make a complete copy of a disk including its snapshots using the dd command tool, but that should be considered experimental. In all other circumstances, snapshots stay where they were made, but you can always copy from an existing snapshot to reconstitute a volume.

Directory hard links

These aren’t available in APFS, but are supported in HFS+, where they’re used extensively in Time Machine backups. They work like regular hard links, but act on directories rather than individual files. They can’t be copied in any way to an APFS volume, but can be used to reconstitute a volume.

Extended attributes

These are additional metadata associated with files and folders, and are fully supported in both APFS and HFS+. However, many are treated as being ephemeral, and may not be preserved during copying or other actions. The system of flags used to determine which are preserved is detailed in this article.

Several other file systems also support extended attributes, but copying between them is unlikely to transfer them between file systems. In some cases, extended attributes are preserved using AppleDouble format, in which each file can have a hidden shadow with its name prefixed with ._ (dot-underscore) characters. These are most often seen in FAT and ExFAT volumes, but are prone to confuse users of other computers.

Key points

  • APFS sparse files are only preserved when copying or moving files between local APFS volumes. In other circumstances they explode to full size.
  • Good compression methods can keep a sparse file to a similar size, but decompression explodes them to full size. Disk images may then be restored as sparse files after they have been mounted again from APFS.
  • Clone files aren’t preserved when copied or moved to any other volume.
  • Snapshots can’t be copied at all, although they can be reconstituted as volumes.
  • APFS doesn’t support the directory hard links used in Time Machine backups to HFS+, but a backup can be reconstituted as a volume.
  • Extended attributes are preserved between APFS and HFS+, but not with other file systems, except in the shadow files of AppleDouble format, seen in FAT and ExFAT volumes.

Disk Images: Introduction

By: hoakley
7 October 2024 at 14:30

A disk image is a file or a bundle containing what could otherwise be the contents of a disk. It’s a common way to store and move complete file systems in a neat package, for items that need to be separated from the physical storage provided by a drive. macOS uses disk images for some tasks of great importance, including:

  • Recovery and Hardware Diagnostics systems,
  • additions to macOS such as Safari, its supporting frameworks, and dyld caches, in cryptexes,
  • networked storage for Time Machine backups, in sparse bundles,
  • lightweight virtual machines on Apple silicon Macs.

You could use them to store encrypted data on unencrypted volumes, and they’re often used for delivering Apple and third-party software.

Disk images are poorly documented for both users and developers, and have changed significantly over the last few years. Articles in this series explain how to choose between different types of disk image, how to create and use them, and what to do when they go wrong.

Containers and file systems

Disk images consist of two distinct components: the file or bundle itself functioning as a container, and the file system contained inside it. When referred to in this context, disk image containers are completely unrelated to the sandbox containers found in ~/Library/Containers.

This distinction is important in several respects, although it isn’t apparent when you use disk images in the Finder. Preparing a disk image for access involves two separate functions: attaching its container, and mounting any file systems found inside it. When that’s performed by the Finder, perhaps by double-clicking the disk image, those two actions appear fused into one. Similarly, removing the disk image requires all its mounted file systems to be unmounted first, then the container is detached.

One feature that’s widely confused is the encrypted disk image. This involves encryption of the whole container, rather than using an encrypted file system within it. Now that Disk Utility no longer supports the creation of encrypted HFS+ volumes, one remaining alternative is to use an encrypted disk image containing an HFS+ volume.

If you want an analogy for disk images, attaching the container is like connecting an external disk, and once that has been performed, the file systems contained by that disk have to be mounted before you can access their contents.

Types

There are many different types of disk image in use, of which the two this series is most concerned with are plain UDIF read-write disk images (UDRW), and sparse bundles (UDSB). Others you may encounter include:

  • plain UDIF read-only (UDRO),
  • various compressed versions of UDRW,
  • CD/DVD master for export (UDTO),
  • sparse disk image (UDSP), a single file rather than a bundle.

Those specify the container format; within each, there’s the usual choice of file systems, although throughout these articles it will normally be assumed that APFS will be used unless otherwise specified.

The word sparse in sparse bundle and sparse disk image doesn’t refer to APFS sparse files, but to the fact that those types of disk image can grow and diminish in size, and normally try to occupy the minimum amount of disk space. This is an unfortunate name collision.

Structure

With the exception of sparse bundles, all disk images are contained within a single file of opaque structure.

Sparse bundles consist of a single bundle folder containing:

  • bands, a folder containing the contents of the disk image in band files
  • info.plist and its backup copy info.bckup, containing settings including band size
  • lock, an empty file
  • mapped, a folder containing small data files to match all of the band files except the first
  • token, an empty file.

Container size

Until this changed in Monterey (or thereabouts), non-sparse disk images had fixed container sizes. Create a UDIF read-write disk image (UDRW) of 10 GB, and the space occupied by it on disk was approximately 10 GB, whether it was empty or full. Although it remains undocumented, when stored on APFS volumes, UDRW disks can now take advantage of APFS sparse file format, and will normally only occupy the disk space required for the contents of their file system.

This is only true once the disk image has been mounted for the first time after it has been created, mounted and unmounted. To see this, create a test read-write disk image (for example, using Disk Utility) of 10 GB size. Then unmount it, and use the Finder’s Get Info command to inspect its size on disk. That will be 10 GB. Then mount the disk image again, pause a couple of seconds, unmount it, and Get Info will show its size on disk is now much smaller.

As I’ll explain in detail in a later article, this is because each time that disk image is mounted, if its internal file system is HFS+ or APFS, its contents will be trimmed, and saved to disk in sparse file format, which omits all its empty space. This only applies to read-write disk images when they’re stored in APFS file systems; copy them to HFS+ and they’ll explode to full size, as HFS+ doesn’t support the sparse file format.

Considering just the two leading types, empty sizes for a 100 GB disk image are:

  • a sparse bundle is 35.4 MB empty, 53.3 GB when containing about 51 GB files, stored across 6,359 band files.
  • a read-write disk image (UDRW) shrinks to 333.8 MB once stored as an empty sparse file, 53.6 GB when containing about 51 GB files, in a single file container.

Performance

Some types of disk image perform poorly, and can be very slow to write to. Recent versions of macOS have brought improvements, although some options such as encryption can still impair performance significantly. For the two leading types, when their container is stored on the internal SSD of an Apple silicon Mac, with native read and write speeds of around 6-7 GB/s:

  • an initially empty unencrypted sparse bundle reads at 5.1 GB/s, and writes at 4.8 GB/s.
  • an initially empty unencrypted read-write disk image (UDRW) reads at 5.3 GB/s, and writes at only 1 GB/s.

Tests were performed using my utility Stibium, across a range of 160 files of 2 MB to 2 GB size in randomised order, with macOS 15.0.1.

Key points

  • Disk images consist of a file or bundle containing one or more file systems; the container and its contents are distinct.
  • To access the contents of a disk image, the container is first attached, then the file system(s) within it are mounted. In the Finder, those two processes appear as a single action.
  • Encrypted disk images encrypt the container, and don’t necessarily contain encrypted file systems.
  • Disk images come in many different types, and can contain different file systems.
  • Sparse bundles have a file and folder structure inside their bundle folder, with their data saved in band files; all other disk images are single files.
  • Sparse bundles grow and shrink according to the size of files stored within them.
  • In recent macOS, and on APFS, read-write disk images (UDRW) are stored in APFS sparse file format, enabling them to grow and shrink as well.
  • In recent macOS, unencrypted sparse bundles have read and write performance close to that of the disk they’re stored on. Read-write disk images read at similar speeds, but write more slowly, at about 20% of read speed.

Which disk format?

By: hoakley
27 September 2024 at 14:30

macOS supports several different disk formats, each of which has its purposes. This article explains which to choose from those you can create by formatting a disk in Sequoia. macOS also supports some other formats like NTFS for reading only, which I won’t cover here.

Disk structure

Before any file system can be formatted on a disk, the storage in the disk must be partitioned using one of three standard schemes:

  • GUID Partition Map (GPT), standard for most disks and file systems in macOS;
  • Master Boot Record (MBR), formerly used for MS-DOS and Windows;
  • Apple Partition Map, an old format used by Macs, and worth avoiding unless you know it’s required.

Even if the whole of the disk is going have just one file system or volume, one of those is required, as it stores information about how the space on the disk is allocated.

Volumes and containers

Apart from APFS, most file systems you’re likely to use require a whole partition on the disk for each volume. For example, an HFS+ disk with two volumes is divided into two partitions, each of which contains one HFS+ volume. Because partitioning is intended to be static, that means those two volumes have fixed size, and don’t share free space between them. Once created, it’s possible to change partitioning, and Disk Utility will try to do so non-destructively, without losing any data in the volumes, but that isn’t always possible.

APFS volumes are different, and share free space within a partition, termed a container in APFS terminology. APFS containers are essentially similar to HFS+ volumes, as they’re static partitions, but APFS volumes are sized dynamically within their static container.

Thus, if you have a disk with two HFS+ volumes and two APFS volumes, that disk will have at least three partitions:

  • one for each of the two HFS+ volumes,
  • one as the container for the two APFS volumes, although they could instead each be given their own container.

You will see some claims that APFS volumes are more like directories or folders in HFS+, but those are confusing and should be ignored. Each APFS volume has its own file system, and dragging a file from one volume to another results in two completely separate copies of that file, using twice the storage space of one – that’s not how folders behave!

Available formats

Disk Utility version 22.7 in macOS Sequoia 15.0 can format the following file systems using a GUID Partition Map:

  • APFS, not encrypted and case-insensitive
  • APFS, encrypted and case-insensitive
  • APFS, unencrypted and case-sensitive
  • APFS, encrypted and case-sensitive
  • HFS+ journalled and case-insensitive (JHFS+)
  • HFS+ journalled and case-sensitive
  • ExFAT
  • MS-DOS (FAT32).

Using a Master Boot Record or Apple Partition Map, the following formats are available:

  • HFS+ journalled and case-insensitive (JHFS+)
  • HFS+ journalled and case-sensitive
  • ExFAT
  • MS-DOS (FAT32).

APFS is not compatible with Master Boot Record or Apple Partition Map.

The command tool diskutil additionally offers FAT, FAT12, FAT16, Free Space, and HFS+ without journalling, although those aren’t available in Disk Utility.

Note that, contrary to Disk Utility’s Help book, encrypted HFS+ formats are no longer available in Disk Utility, although with the availability of encrypted APFS formats, I can’t think why anyone would want to use encrypted HFS+. That’s because support for encryption was added later to HFS+, whereas it has been designed in from the start of APFS, and is far superior.

Which format for Macs?

The default format for disks to be accessed by Macs is now APFS, not encrypted and case-insensitive, and should be used with all Macs running Mojave and later.

Case-sensitivity can in some circumstances be important. The two situations where this is likely to be required is for the storage of Time Machine backups, and in volumes that might need to store native files from iOS or iPadOS, which use case-sensitive APFS.

Encrypted APFS is different from FileVault used on internal SSDs. However, if you enable FileVault on a macOS installation on external storage, that’s implemented as encrypted APFS. On the internal SSDs of Macs with T2 or Apple silicon chips, FileVault provides additional protection to the keys used for its hardware encryption performed in the Secure Enclave, and should always be enabled. Encrypted APFS is strongly recommended for volumes on external disks that contain private or sensitive data, such as Time Machine backups.

Thus, Time Machine backups should be made to case-sensitive encrypted or unencrypted APFS, and that’s what Time Machine will create for you.

APFS or HFS+?

APFS was designed for use primarily on SSDs rather than hard disks, and doesn’t have features intended to maintain performance when used on hard disks. Unless a volume on a hard disk is intended to store Time Machine backups, it may therefore still be preferable to use either of the supported HFS+ formats. Deciding which is better depends on how that volume will be used.

Because SSDs are largely unaffected by fragmentation of used or free space, APFS isn’t designed to minimise fragmentation, indeed some of its best features inevitably increase fragmentation. In particular, the file system metadata in APFS volumes may become badly fragmented, leading to poor performance on hard disks. Two extreme examples are boot disks and those used to store largely static media libraries.

Recent versions of macOS will happily boot from external hard disks, and despite their relatively poor transfer rates, are far from unusable. Problems come as you use that hard disk, and perform file operations in your Home folder. Over the course of a few weeks or months, fragmentation increases, particularly in file system metadata, and its performance declines.

Media libraries that are most frequently read from, and whose files undergo relatively little change, have fewer changes in their file system metadata, and may well never suffer from noticeable performance impairment. This is also true for Time Machine backups, although some users report deteriorating performance after many months or a year or two.

Time Machine in Sequoia will no longer start a new backup series on HFS+; if you try adding a disk with a single HFS+ volume on it as backup storage, it automatically converts the disk to an APFS container with a single APFS case-sensitive volume, and uses that to store its backups. However, you can still use third-party backup utilities including Carbon Copy Cloner to make backups to HFS+ volumes, although APFS is recommended now. That can have several significant disadvantages, among them:

  • Snapshots can’t be made of backup storage.
  • APFS special file types aren’t supported. For sparse files, this can lead to backups being substantially larger than the source files. Worse still, if you restore from a backed-up sparse file, it won’t be automatically converted back to sparse file format on APFS.
  • Block copying isn’t normally supported, again making backups larger, and increasing the time required to back up.
  • Where available, incremental backups may become unmanageable because of the number of files and folders.

As a general principle, both APFS and HFS+ can be backed up to APFS backup storage, while HFS+ storage is only suitable for backups from HFS+.

Which format for Windows compatibility?

Although other computers can read HFS+ and sometimes even APFS volumes, few of their users know how to access the Mac’s native file systems. If you need your storage to be accessible from other computers, one of the two supported MS-DOS/Windows formats is best.

Information given in the Disk Utility Help book is slightly misleading over the choice between FAT32 and ExFAT. FAT32 offers a maximum volume size of 2 TB, with files up to almost 4 GB each, but was primarily intended for magnetic media.

Unless the system intending to read the volume is limited to FAT32, it’s generally preferable to use ExFAT instead. That supports massive volumes and file sizes, and was optimised for use in flash memory. ExFAT is thus the more commonly encountered format for USB flash drives (thumb drives, memory sticks) and SD cards, where it’s the default format for SDXC and SDUC cards larger than 32 GB.

FAT32 and ExFAT are supported with both GUID Partition Map and Master Boot Record schemes. Unless the other computer or system is very old, prefer GUID Partition Map. Neither volume format supports encryption, so if you want to protect files you should encrypt them separately, or store them in an accessible encrypted archive format.

Summary

  • For general purposes, default disk format should use a GUID Partition Map with either plain or encrypted APFS.
  • Encrypted APFS should be used for volumes on external disks containing private or sensitive data. File Vault should be enabled on internal SSDs.
  • New Time Machine backups can now only be made to case-sensitive APFS, either plain or encrypted.
  • Hard disks with active file systems may suffer poor performance with APFS, and HFS+ with journalling may still be preferable, but it has significant limitations and disadvantages.
  • Hard disks for more static use, such as for media libraries and backups, should be safe with APFS, but may eventually suffer poor performance.
  • Unless there are good reasons for using FAT32, format USB flash drives, SDXC and SDUC cards that need to be used with non-Mac systems using ExFAT in a GUID Partition Map scheme.

From quarantine to provenance: how xattrs are copied

By: hoakley
13 September 2024 at 14:30

In the previous article, I outlined what extended attributes do, and how they work in macOS. I also started to explain how some are considered ephemeral, while others are persistent. This article continues from there, by documenting how macOS decides what to do with them when a file containing xattrs is copied.

Although Apple does now explain a little about this in the context of the FileProvider framework and syncing with cloud services, the only useful documentation is provided in man xattr_name_with_flags, and two source code files that are part of the open source copyfile component.

In 2013, as part of its enhancements for iCloud in particular, Apple added support for flags on xattrs to indicate how those xattrs should be handled when the file is copied in various ways. Rather than change the file system, Apple opted for what’s perhaps best seen as an elegant kludge: appending characters to the end of the xattr’s name.

If you work with xattrs, you’ve probably already seen this in those whose name ends with a hash # then one or more characters: that’s actually the flags, not part of the name, what Apple refers to as a ‘property list’. To avoid confusion I won’t use that term here, but refer to them as xattr flags. A common example of this is com.apple.lastuseddate#PS, which is seen quite widely. In recent years, Apple has added one flag, B, and there’s another to come with Sequoia.

Xattr flags

Flags can be upper or lower case letters C, N, P, S or B, and invariably follow the # separator, which is presumably otherwise forbidden from use in a xattr’s name. Upper case sets or enables that property, while lower case clears or disables that property. There are currently (macOS 14.6.1) five properties:

  • C: XATTR_FLAG_CONTENT_DEPENDENT, which ties the flag and the file contents, so the xattr is rewritten when the file data changes. This is normally used for checksums and hashes, text encoding, and position information. The xattr is preserved for copy and share, but not in a safe save.
  • P: XATTR_FLAG_NO_EXPORT, which doesn’t export or share the xattr, but normally preserves it during copying.
  • N: XATTR_FLAG_NEVER_PRESERVE, which ensures the xattr is never copied, even when copying the file.
  • S: XATTR_FLAG_SYNCABLE, which ensures the xattr is preserved during syncing with services such as iCloud Drive. Default behaviour is for xattrs to be stripped during syncing, to minimise the amount of data to be transferred, but this will override that.
  • B: XATTR_FLAG_ONLY_BACKUP, which keeps the xattr only in backups, including Time Machine (added recently).

These operate within another general restriction of xattrs: their name cannot exceed a maximum of 127 UTF-8 characters.

Defaults

macOS provides a standard ‘whitelist’ of default flag settings for different types of xattr. These aren’t contained in a configuration file, but are baked into the xattr flag code, where as of macOS 14.6.1 the following default flags are set for different types of xattr (* here represents the wild card):

  • com.apple.quarantinePCS
  • com.apple.TextEncodingCS
  • com.apple.metadata:kMDItemCollaborationIdentifierB
  • com.apple.metadata:kMDItemIsSharedB
  • com.apple.metadata:kMDItemSharedItemCurrentUserRoleB
  • com.apple.metadata:kMDItemOwnerNameB
  • com.apple.metadata:kMDItemFavoriteRankB
  • com.apple.metadata:* (except those above) – PS
  • com.apple.security.*S
  • com.apple.ResourceForkPCS
  • com.apple.FinderInfoPCS
  • com.apple.root.installedPC

Copy intents

Also contained in the source code is a table of intents, that explains how different types of copy are affected by different combinations of xattr flag. Currently, those are:

  • XATTR_OPERATION_INTENT_COPY – a simple copy, preserves xattrs that don’t have flag N or B
  • XATTR_OPERATION_INTENT_SAVE – save, where the content may be changing, preserves xattrs that don’t have flag C or N or B
  • XATTR_OPERATION_INTENT_SHARE – share or export, preserves xattrs that don’t have flag P or N or B
  • XATTR_OPERATION_INTENT_SYNC – sync to a service such as iCloud Drive, preserves xattrs if they have flag S, or have neither N nor B
  • XATTR_OPERATION_INTENT_BACKUP – back up, e.g. using Time Machine, preserves xattrs that don’t have flag N

Use

If you want a xattr preserved when it passes through iCloud, you therefore need to give it a name ending in the xattr flag S, such as co.eclecticlight.MyTest#S. Sure enough, when xattrs with that flag are passed through iCloud Drive, those xattrs are preserved even if the default rule would treat them differently. Similarly, to have a xattr that is stripped even when you just make a local copy of that file, append #N to its name.

There’s a further limit imposed on xattrs synced by FileProvider, including those for iCloud Drive, that strips all individual xattrs that are larger than a certain size. Apple gives that as “about 32KiB total for each item”, and my measurements performed in the recent past put that at about 32,650 bytes, slightly less than 32,767.

In itself, this information is valuable if you ever use any metadata stored in xattrs. It’s used in my intergrity-checking utilities Dintch, Fintch and cintch to ensure the xattr containing a file’s hash isn’t stripped by passage through iCloud Drive, for instance. On Tuesday morning next week, once Sequoia has been released, I’ll explain how Apple has extended this system to achieve something that many have been wishing for.

From quarantine to provenance: extended attributes

By: hoakley
12 September 2024 at 14:30

One of the innovative features in classic Mac OS was its use of resource forks, allowing structured metadata to be attached to any file. When Mac OS X merged that with the more traditional Unix approach adopted by NeXTSTEP, those were nearly lost. Classic Mac apps were restructured from storing most of their components, including their executable code, in their resource fork, when Mac OS X flattened those into an app bundle consisting of a hierarchy of separate files in folders, without any resources.

For the first four years of Mac OS X resource forks were reluctantly tolerated, until the solution came in 10.4 with the introduction of extended attributes, including one to contain what had previously been stored in the resource fork, which became an extended attribute or xattr with the name com.apple.ResourceFork.

All files in HFS+ and APFS (and other file systems) contain a fairly standard set of metadata known as attributes, information about a file such as its name, datestamps and permissions. Xattrs are extensions to those that contain almost any other type of metadata, the first notable xattr coming in Mac OS X 10.5, named com.apple.quarantine. That contains quarantine information for apps and other files downloaded from the internet, in a format so ancient that the quarantine flag is stored not in binary but as text.

The quarantine xattr provides a good demonstration of some of the valuable properties of xattrs: it can be attached to any file (or folder) without changing its data, and isn’t included when calculating CDHashes for code signatures. It can thus be added safely without any danger of altering the app or its code, although it does change the way in which macOS handles the code, by triggering security checks used to verify it isn’t malicious. Once those have been run, the flag inside the quarantine xattr can be changed to indicate it has been checked successfully.

Far from being a passing phase, or dying out as some had expected, xattrs have flourished since those early days. This has happened largely unseen by the user: few change anything revealed in the Finder’s Get Info dialog, although they’re used to store some forms of visible metadata such as Finder tags, and the URL used to download items from the internet. Editing xattrs is normally performed silently: you’re not made aware of changes in the quarantine xattr, and in most cases the only way to manage xattrs is to use the xattr command tool, or one of very few apps like xattred that can edit and manage them.

Examples

Among the well-known and important xattrs you can encounter are:

  • com.apple.quarantine the quarantine xattr, containing a quarantine flag
  • com.apple.rootless marks items individually protected by System Integrity Protection (SIP)
  • com.apple.provenance contains data about the origin of apps that have been quarantined
  • com.apple.metadata:kMDItemCopyright records copyright info
  • com.apple.metadata:kMDItemWhereFroms the origin of downloaded file as a URL
  • com.apple.metadata:_kMDItemUserTags Finder tags
  • com.apple.TextEncoding reveals text file encoding
  • com.apple.ResourceFork a classic Mac resource fork

Storage

In APFS and HFS+, xattrs aren’t stored with file data, nor with a file or folder’s normal attributes.

fileobjects

For smaller extended attributes up to 3,804 bytes, their data is stored with the xattr in the file system metadata. Larger extended attributes are stored as data streams, with separate records, but still separately from the file data. Apple doesn’t give a limit on the maximum size of xattrs, but they can certainly exceed 200 KB, and each file and folder can have an effectively unlimited number of them.

Persistence

Most file systems to which macOS can write either handle xattrs natively (HFS+, APFS), or macOS uses a scheme to preserve them, as in the hidden files written to FAT and ExFAT volumes. NFS is an important exception, and files copied to NFS will have all their xattrs stripped. Neither are extended attributes unique to Macs: most file systems used by Linux support them, and even Windows can at a push.

Because xattrs contain a wide range of metadata, some are treated as being ephemeral, others as persistent. Moving files with xattrs around within the same volume shouldn’t affect their xattrs, as that takes place within the same file system. Copying files to another volume, even if both use APFS, may leave some xattrs behind if they’re considered to be ephemeral.

iCloudDriveFileSummary4

The most complex situation is when a file with xattrs is moved to iCloud Drive. The Mac that originated that file is likely to retain most if not all of its xattrs, because the local copy remains within the same volume and file system. However, not all xattrs are copied up to iCloud storage, so other Macs accessing that file may only see a small selection of them. The rules for which xattrs are to be preserved during file copying, including in iCloud Drive, are baked into macOS, and the subject of the next article.

A brief history of Time Machine

By: hoakley
7 September 2024 at 15:00

In the days before Mac OS X, Apple didn’t provide a serious backup utility, and by the time we were starting to move up from Classic Mac OS the standard choice was normally Dantz Development’s Retrospect, first released in 1989 and still available today in version 19.

timelretrospect

idiskbackup2004

Time Machine wasn’t the first utility in Mac OS to back up local storage. In 2004, Apple’s first cloud subscription service .Mac included a Backup app that backed up local files to iDisk in the cloud, something that still isn’t supported today with iCloud.

In the following years, AirPort Wi-Fi systems flourished, and Apple decided to launch a consumer NAS incorporating an AirPort Extreme Base Station with a 500 GB or 1 TB hard disk. Software to support that was dubbed Time Machine, and was released in Mac OS X 10.5 Leopard on 26 October 2007, 17 years ago. The first Time Capsule was announced in January 2008, and shipped a month later.

timemachine1

timemachine2

Time Machine’s pane in System Preferences changed little until Ventura’s System Settings replaced it.

timemachine2

The application’s restore interface featured a single Finder-like window, much like today’s. Internally, Time Machine scheduled its backups using a system timer and launchd, making backups every hour regardless of what else the Mac might have been doing at the time.

The initial version of Time Machine was both praised and slated. Unlike Mike Bombich’s rival Carbon Copy Cloner, it couldn’t create bootable backups, and there were problems with FileVault encryption, which at that time could only encrypt Home folders, rather than whole volumes. Despite those, its introduction transformed the way that many used their Macs, and made it more usual for users to have backups.

TMbackup105

From its release, Time Machine was dependent on features of the HFS+ file system to create its Finder illusion. Every hour the backup service examined the record of changes made to the file system since the last backup was made, using its FSEvents database. It thus worked out what had changed and needed to be copied into the backup. During the backup phase itself, it only copied across those files that had been created or changed since the last backup was made.

TMbackuphardlinks

It did this by using hard links in the backup, and Apple added a new feature to its HFS+ file system to support this, directory hard links. Where an entire folder had remained unchanged since the last backup, Time Machine simply created a hard link to the existing folder in that backup. Where an existing file had been changed, though, the new file was written to the backup inside a changed folder, which in turn could contain hard links to its unchanged contents.

This preserved the illusion that each backup consisted of the complete contents of the source, while only requiring the copying of changed files, and creation of a great many hard links to files and folders. It was also completely dependent on the backup volume using the HFS+ file system, to support those directory hard links.

Without directory hard links, backups would quickly have become overwhelmed by hard links to files. If you had a million files and folders on the backup source volume, every hourly backup would have had to create a total of a million copied files or hard links. Directory hard links thus enabled the efficiency needed for this novel scheme to work.

timemachinefail

Apple later introduced what it termed Mobile Time Machine, intended for notebooks that could be away from their normal backup destination for some time. In around 10,000 lines of code, Mac OS X came to create something like a primitive snapshot, only on HFS+.

When macOS introduced its new DAS-CTS scheduling and dispatch system for background activities, in (about) Sierra, Time Machine’s backups were added to that. That proved unfortunate at the time because of a bug in that system, which failed on Macs left running continuously for several days, when backups could become infrequent and irregular.

When Apple released the first version of APFS on Mac OS X in High Sierra, its new snapshot feature was immediately incorporated into Time Machine to replace the earlier Mobile variant. Initially, APFS snapshots were also used instead of the FSEvents database to determine what should be backed up. Since then, making each backup of an APFS volume has involved creating a snapshot that’s stored locally on the APFS volume being backed up. In High Sierra and Mojave, the structure of backups themselves didn’t change, so they still required an HFS+ volume and relied on directory hard links.

TMbackup1015

Catalina introduced a more complicated scheme to replace snapshots as the usual means for determining what to back up. This was presumably because computing a snapshot delta had proved slow. As the backup destination remained in HFS+ format that could’t use snapshots, it continued to rely on directory hard links.

Big Sur and its successors with Signed System Volumes (SSVs) retained the option to continue backing up to HFS+ volumes, but added the ability to back up APFS volumes to APFS backup storage at last.

tmbackup14a

When backing up to APFS, Time Machine reverses the design used in High Sierra: instead of using snapshots to determine what needed to be backed up before creating a backup using traditional hard links, most of the time Time Machine determines what has changed using the original method with FSEvents, then creates each backup as a synthetic snapshot on the backup store. Unlike earlier versions, in Big Sur and later Time Machine can’t back up the System volume.

Once Time Machine has made a detailed assessment of the items to be backed up, it forecasts the total size to be copied. The local snapshot is copied to an .inprogress folder on the backup volume, and backup copying proceeds. Where possible, only changed blocks of files are copied, rather than having to copy the whole of every file’s data, an option termed delta-copying that can result in significant savings. Old backups are removed both according to age, and to maintain sufficient free space on the backup volume, in what Time Machine refers to as age-based and space-based thinning.

Data copied to assemble the backup on the backup volume is formed into a synthetic snapshot used to present the contents of that backup both in the Time Machine app and the Finder. Those snapshots are presented in /Volumes/.timemachine/ although they’re still stored on the backup volume.

Although modern Time Machine backups to APFS are both quicker and more space-efficient, the structure of backup storage poses problems. Copying backup stores on HFS+ was never easy, but there are currently no tools that can transfer those on APFS to another disk.

Behind the familiar interfaces of its app and settings, Time Machine has come a long way over the last few years, from building an illusion using huge numbers of hard links to creating synthetic snapshots.

What is Macintosh HD now?

By: hoakley
2 September 2024 at 14:30

Perhaps you just tried to save a document, only to be told you don’t have sufficient permissions to do so, or attempted to make another change to what’s on your Mac’s internal storage, with similar results. You then select the Macintosh HD disk in the Finder and Get Info. No wonder that didn’t work, as you only have read-only access to that disk. But if you unlock it and try to make any changes to permissions, you see

xpermserror

What’s going on?

Between macOS Mojave, with its single system volume, and Big Sur, the structure of the Mac system or boot volume has changed, with Catalina as an intermediate. Instead of Macintosh HD (or whatever you might have renamed it to) being one volume on your boot disk, it’s now two intertwined and joined together. What you see now as Macintosh HD isn’t even a regular APFS volume, but a read-only snapshot containing the current macOS. No wonder you can’t change it.

Root

Select the boot disk Macintosh HD in the Finder, and it appears to have four visible folders, Applications, Library, System and Users, just like it always did. Press Command-Shift-. to reveal hidden folders and all the usual suspects like bin, opt and usr are still where they should be. That’s the root of the combined System and Data volumes, and what’s shown there is a combination of folders on both volumes, with the top level or root on the Sealed System Volume (SSV).

The contents of those folders are also the result of both volumes being merged together using what Apple terms firmlinks:

  • Applications contains apps installed in your own Applications folder on the Data volume, and those bundled in macOS on the SSV. You can see just the latter in the path System/Applications, where they appear to be duplicated, but aren’t really.
  • Library comes only from the Data volume, and all its contents are on that volume. But inside it, in the path Library/Apple/System/Library are some components that should appear in the main System/Library.
  • System comes only from the SSV, although it has some contents merged into it using firmlinks, such as those folders in Library.
  • Users also comes only from the Data volume, and includes all Home folders for users.

So while the root of Macintosh HD might be in the SSV, much of its contents are on the Data volume, and can be written to, even though the root is a read-only snapshot, thanks to those firmlinks.

Data volume

There are two places that mounted volumes are listed in the Finder: the hidden top-level folder Volumes, where Macintosh HD is just a link back to the root complete with its merged volumes, and in System/Volumes, where what’s shown as Macintosh HD is in fact not the merged volumes, but only the Data volume. You can confirm that by looking at what’s in System/Volumes/Macintosh HD/System, where you only see the parts of the System folder that are stored on the Data volume, and not those stored on the SSV.

What is more confusing there is that System/Volumes/Macintosh HD/Applications is the same merged folder containing both user and bundled apps as in the top-level Applications folder. That’s an artefact resulting from the way that its firmlink works.

But if you open the Get Info dialog on System/Volumes/Macintosh HD, you’ll see the same as with the root Macintosh HD disk, information about the root and not the Data volume.

Mounted in System/Volumes are several other volumes like VM and Preboot, and (depending on whether this is an Intel or Apple silicon Mac) folders such as Recovery and xarts, that you really don’t want to mess with.

Permissions problems

Tackling problems that appear to be the result of incorrect permissions is best done at the lowest folder level. If you’re trying to save a document to the Documents folder inside your Home folder, select that and Get Info on it. Chances are that you are the owner and have both Read & Write permissions as you should. In that case, the problem most likely rests with privacy protection as in Privacy & Security settings. You then suffer Catch-22, as you can only effect changes to those by closing and opening the app, and as you can’t save your document before closing the app, you’re at risk of losing its contents. You may have better luck trying a different folder, creating a new one inside your Home folder, or using the Save As… command instead (which may be revealed by holding the Option key when opening the File menu).

Full layout

In case you’re wondering exactly which folders are merged into the hybrid Macintosh HD ‘volume’, those are shown below in increasing levels of detail, starting with the broad layout.

BootVolGpVentapfs

Then to a simplified version of the full layout.

BigSurIntSimple

Finally, in complete detail.

BigSurIntegrated

Happy navigating!

Links, aliases and the decline of paths

By: hoakley
26 August 2024 at 14:30

One of the fundamental requirements of any decent operating system and its primary file system is to provide objects that link to files and directories in storage. These allow code and users to access those files and folders indirectly, by referencing them from elsewhere. In most systems there are two methods of achieving this, by referencing the directory path and name of the file/folder, or using a reference to an identity number unique to that file system and part of that file’s attributes, normally the inode number. These form symbolic and hard links respectively.

Symlinks

linksetc3

A symlink is a separate file containing the path to the file or folder that it links to. It thus has its own File System Object, and its own tiny data containing the path to the original. This works well as long as that path is preserved in its entirety. The inode of the original file is unaware of any symlinks to it, so deleting the original file also breaks every symlink to that file.

Symlinks are easily created in Terminal using a command like
ln -s /Users/myname/Movies/myMovie.mov /Users/myname/Documents/Project1/myNewMovie.mov
which creates a tiny file myNewMovie.mov containing the path /Users/myname/Movies/myMovie.mov referring to the file it links to.

You can’t create symlinks using the Finder but can using third-party apps, and they should now work fully with QuickLook thumbnails and previews. They essentially take no space on disk. Paths used can be shorter and relative, allowing an enclosing folder to be moved within that volume, which would break a full absolute path. They can also specify files on other volumes, so long as the path to them remains correct.

Paths in symlinks work most reliably in the macOS SSV, as it’s both structured and static. Although the disciplined user can use them effectively in their Home folder, they are prone to unintended effects such as the renaming of an intermediate directory, moving any component in the path, and even changing the Unicode normalisation of a character in the path. As symlinks only contain a path, there’s no fallback to try resolving a broken path, making them inherently fragile, not robust.

Hard links

linksetc2

In APFS, a hard link is actually a single file, with one File System Object, that has two or more references in the form of Siblings that are associated with the file inode. That object has a single set of File Extents, so each of the siblings refers to exactly the same file and data, although siblings will normally have different paths. In the object, the reference count equals the number of siblings; when siblings are deleted, that count is decremented, and the object is only removed when the count reaches zero.

Hard links are easily created in Terminal using a command like
ln /Users/myname/Movies/myMovie.mov /Users/myname/Documents/Project1/myNewMovie.mov
to create a new hard link named myNewMovie.mov to the original myMovie.mov.

You can’t create hard links in the Finder, and because they use a single set of File Extents, they essentially take no space on disk.

Hard links look and work exactly like the original, and can be moved around freely within the same volume as they’re not dependent on paths. Copy one to another volume, though, and the copy will be a complete unlinked file. Hard links to files and to directories were one of the essential ingredients of Time Machine backups on HFS+, but as APFS doesn’t support directory hard links, Time Machine now has to use a different backup format for storage on APFS.

Aliases and bookmarks

Aliases originated in System 7, when Mac OS lacked the BSD/Unix features that were to come in Mac OS X. At that time, paths were seldom used, making symlinks implausible, so the original Finder alias was instead built on the equivalent of an inode number. They have evolved into a combination offering path information and inode number that should be able to resolve links that would break the path in symlinks, and can extend to other volumes, unlike hard links.

Despite their robustness and versatility, macOS provides no bundled command tools for the creation of aliases or their resolution into paths, making them of no use in shell scripts or the command line, although the free alisma addresses these. Neither have they been integrated into APFS as distinct objects in the file system, where they’re just another file.

Bookmarks are a variant of aliases intended to be used in files for similar purposes. For example, the list of files shown in the Open Recent menu command in apps is assembled from a file containing bookmarks to each of those documents. Most lists of apps and files built by Launch Services and other sub-systems are similarly reliant on bookmarks.

Demonstration

To demonstrate how these three different types of link behave, create a folder in your Home Documents folder (~/Documents), and copy a test document into that. Next to that file, create another folder named links, and within that create a symlink (using a full absolute path), a hard link, and an alias to the original file.

links

Copy the file containing the links to another volume, and note how the hard link becomes a separate local file instead of a link, but the symlink and alias continue to link to the original, whose path is unchanged.

Rename the folder enclosing the test document and links folder, and note how the symlink is then broken, although QuickLook may still show a cached thumbnail for it. Note how this also breaks the copy of the symlink on the other volume, as the path used by both of them no longer exists.

Finally, move the folder of links to be alongside that containing the original document, in the ~/Documents folder, and note how the symlink is broken there too.

Of the three types of link, the only one that proves robust to all these changes is the alias.

Dual-booting your Mac with multiple versions of APFS

By: hoakley
22 August 2024 at 14:30

Since its first public release seven years ago in High Sierra, APFS has changed greatly. Connect a bootable external disk with Sonoma installed to a Mac running macOS 10.13 and it won’t know what to make of it. That’s because many of the features used by APFS today didn’t exist until Catalina and Big Sur. This article explains how your Mac can cope with running such different versions of APFS, and how to avoid the problems that can arise when a Mac can start up in two or more different versions of macOS.

Fences

Disk structures, APFS and macOS fall into three main phases:

  • High Sierra and Mojave, which can run 32-bit code, boot from a single integrated volume, and don’t understand Signed System Volumes (SSVs); they run APFS versions 748.x.x to 945.x.x.
  • Catalina (macOS 10.15), which can’t run any 32-bit code, and boots from a group of volumes where the system is contained in its own read-only volume; it runs APFS version 1412.x.x.
  • Big Sur and later, which can’t run any 32-bit code, and boot from an SSV that’s a specially sealed snapshot on the System volume; they run APFS version 1677.x.x and later.

Fuller and finer details of the changes with different versions of macOS are given in the Appendix at the end.

If you need your Mac to run more than one version of macOS, it’s easiest and most compatible if the versions installed come from only one of those phases. Two phases are more of a challenge, and all three needs great care to ensure that one version doesn’t damage the file systems of another.

When you do need to run macOS from two or more phases, the cleanest and safest way is to only mount disks and volumes from one phase at a time. For example, when running Mojave, if you unmount disks and volumes for all later versions of macOS, then you won’t see any notifications warning you of Incompatible Disk. This disk uses features that are not supported on this version of macOS. Unfortunately, unless all your boot systems are on external disks, this isn’t easy to achieve.

Warnings

Generally speaking, APFS tools including those run by Disk Utility (including fsck_apfs, used by First Aid) are backward-compatible, but may not be as reliable when a tool from an older version of macOS is run on a newer version.

First Aid and fsck_apfs normally report versions of APFS tools used on the volumes and containers they’re checking. These draw attention to any potential for incompatibilities, such as:
The volume [volume name] was formatted by diskmanagementd (1412.141.1) and last modified by apfs_kext (945.250.134).
telling you that volume was created by macOS 10.15, and last changed by macOS 10.14.

Other messages you might see include warnings like: warning: container has been mounted by APFS version 2236.141.1, which is newer than 1412.141.3.7.2, which is less helpful.

If you’re going to run multiple versions of macOS on the same Mac, you’ll have to get used to those. For reference, APFS versions decode to macOS versions as:

  • 249.x.x is the beta release from macOS 10.12 Sierra
  • 748.x.x is 10.13 High Sierra
  • 945.x.x is 10.14 Mojave
  • 1412.x.x is 10.15 Catalina
  • 1677.x.x is 11 Big Sur
  • 1933.x.x is 12 Monterey up to 12.2
  • 1934.x.x is 12.3 Monterey and later
  • 2142.x.x is 13 Ventura
  • 2235.x.x is 14 Sonoma up to 14.3
  • 2236.x.x is 14.4 Sonoma and later
  • 2311.x.x and later is macOS 15 Sequoia beta.

Dangers

To minimise the risk of any problems arising, any manipulation of the file system should be performed by Disk Utility, the diskutil command tool or others, for the same or later version of macOS. Thus, if you have Mojave and Big Sur installed, you could use tools from either to maintain Mojave file systems, but should only use those for Big Sur when maintaining Big Sur’s file systems.

This becomes more difficult when file system maintenance needs to be performed in Recovery mode. Once the Mac has started up in Recovery, check the version of macOS before opening Disk Utility, using fsck_apfs, diskutil or anything similar.

This is one situation where versions of macOS with an SSV can become tricky, because of their different associations with Recovery systems. On Apple silicon Macs, Big Sur doesn’t use a paired Recovery volume, whereas Monterey and later do. To be confident that the Mac starts up in the correct version of Recovery, it should have been running that version normally before you start it up in Recovery. Once in Recovery, check the version of macOS before proceeding to work with its file systems.

Appendix: APFS and macOS version details

APFS major version numbers change with major version of macOS:

  • macOS 10.12 has APFS version 0.3 or 249.x.x, which shouldn’t be used at all.
  • 10.13 has 748.x.x, which doesn’t support Fusion Drives, but has basic support for volume roles.
  • 10.14 has 945.x.x, the first version to support Fusion Drives.
  • 10.15 has 1412.x.x, the first version to support the multi-volume boot group, and introduces extended support for volume roles, including Data, Backup and Prelogin.
  • 11 has 1677.x.x, the first version to support the SSV, and Apple silicon. On M1 Macs, it doesn’t support the paired Recovery volume, though.
  • 12 has 1933.x.x until 12.2, thereafter 1934.x.x, which support the paired Recovery volume on Apple silicon.
  • 13 has 2142.x.x, and is probably the first to support trimming of UDRW disk images and their storage as sparse files.
  • 14 has 2235.x.x, until 14.3, thereafter 2236.x.x.

Minor version numbers increment according to the minor version of macOS, and patch numbers wander without pattern.

❌
❌