Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

How do APFS volume roles work?

By: hoakley
21 November 2024 at 15:30

Since Catalina and Big Sur, macOS has started up not from a single volume, but a whole boot volume group. Among those are the System and Data volumes, intertwined by their firmlinks, a paired Recovery volume, and hidden volumes for virtual memory swap space and preboot firmware. To help macOS know which is which, each of those has a role assigned. This article explores how that works, how you can hand-craft your own Time Machine backup volume, and wonders what a Sidecar backup is.

Volume roles

Tucked away in the superblock of each APFS volume is an unsigned 16-bit integer setting that volume’s roles, chosen from 18 values ranging from None to Prelogin. Although I’m sure I’ve seen these disclosed in Disk Utility in the past, at the moment it appears the only way to read a volume’s set roles is in the command line, using the diskutil command tool, which can also create roles for a new volume, and change them for existing volumes.

The volume superblock of those that are part of a boot volume group also contains a UUID identifying that group.

Boot disk structures on Intel and Apple silicon Macs differ, as shown in the diagrams below.

BootDiskStructureIntelSeq

That on the internal storage of Intel Macs consists of two partitions, of which only one is an APFS container.

BootDiskStructureMSeq

Apple silicon Macs have three APFS containers, with their own volume groups.

According to Apple’s ageing APFS Reference, now over four years since its last update, roles found in Macs include:

  • System (S), for a bootable system,
  • Data (D), for mutable system components and mutable data,
  • Preboot (B), for boot loader ‘firmware’,
  • Recovery (R), for a Recovery system,
  • VM (V), for virtual memory swap space,
  • Update (E), whose purpose isn’t clear,
  • XART (X), for hardware security on Apple silicon,
  • Hardware (H), for firmware data in iOS, but also present on Apple silicon,
  • Backup (T), for Time Machine backup stores.

There are also some that may not be encountered on Macs:

  • Enterprise (Y), for enterprise-managed data in iOS,
  • Installer (I), for install logs etc.,
  • Sidecar (C), for Time Machine.

Finally, there are three that don’t currently have a documented character code:

  • User, for Home directories,
  • Prelogin, for system data used before login,
  • Baseband, for radio firmware in iOS.

Using volume roles

You can view, set and change volume roles in the diskutil command tool, using its apfs command set. Although not listed now by Disk Utility, the command
diskutil apfs list [containerReference]
displays role information about every volume in the container with the given reference. Omit that option and you’ll get information for containers on all mounted disks. Passing a container reference of disk9, for example, might reveal that volume disk9s1 has a Backup role, when it’s the current Time Machine backup store.

This is potentially useful information when you’re trying to understand some of the complex structures that can occur within containers. If you follow Apple’s advice when creating multiple boot volume groups, you’ll install two or more versions of macOS within the same container. If anything goes wrong with that, then it’s essential to be able to identify which are within each boot volume group, something that should be shown clearly by diskutil apfs list.

When adding a new APFS volume to an existing container using the addVolume command, you can pass an option -role to set its role using the single characters given in the lists above, such as T for a Time Machine backup store. If that option is omitted, then no role is assigned as a default.

You can change the role of an existing APFS volume using the changeVolumeRole (or chrole) verb
diskutil apfs chrole [volumeDevice] [role]
for example,
diskutil apfs chrole disk9s1 T
to set disk9s1 to a Time Machine backup role.

This enables you to investigate how volume roles work.

Investigating backup roles

There are enormous problems in trying to perform surgery on boot volume groups, as you’re unable to pair System and Data volumes with firmlinks, or set the volume group UUID in each volume’s superblock. But there are two roles that merit further investigation, Backup and Sidecar, both apparently for use with Time Machine backup stores. I have seen it suggested that older Time Machine backups are stored on volumes with a Backup role, while newer backups are on those with Sidecar roles. So I created two test volumes, both using case-sensitive APFS, as Time Machine likes.

The Finder displayed the Backup volume using its distinctive icon for Time Machine backup stores, and Time Machine appeared happy to add it as a store, although it would need to change the volume’s permissions to set the User to read-only access.

The Sidecar volume vanished from the Finder’s normal list of mounted volumes, although it remained accessible in /Volumes, which it was shown with the regular volume icon, not that for Time Machine backup stores. That’s very different behaviour from current Time Machine backup stores. There’s another problem with the explanation given for these two roles: older Time Machine backups are made to HFS+ not APFS volumes, which don’t have volume roles at all.

Apple’s other use for the name Sidecar refers to the use of an iPad as a secondary display for a Mac, and doesn’t involve APFS volumes at all. So I’m left wondering whether Sidecar volumes are the unused remains of an old now-abandoned backup project, or the promise of something in the future.

Key point

Discover and investigate APFS volume roles using the diskutil apfs list command, passing the reference to a container, e.g. disk9, if you wish to be more specific.

Planning complex Time Machine backups for efficiency

By: hoakley
28 October 2024 at 15:30

Time Machine (TM) has evolved to be a good general-purpose backup utility that makes best use of APFS backup storage. However, it does have some quirks, and offers limited controls, that can make it tricky to use with more complex setups. Over the last few weeks I’ve had several questions from those trying to use TM in more demanding circumstances. This article explains how you can design volume layout and backup exclusions for the most efficient backups in such cases.

How TM backs up

To decide how to solve these problems, it’s essential to understand how TM makes an automatic backup. In other articles here I have provided full details, so here I’ll outline the major steps and how they link to efficiency.

At the start of each automatic backup, TM checks to see if it’s rotating backups across more than one backup store. This is an unusual but potentially invaluable feature that can be used when you make backups in multiple locations, or want added redundancy with two or more backup stores.

Having selected the backup destination, it removes any local snapshots from the volumes to be backed up that were made more than 24 hours ago. It then creates a fresh snapshot on each of those volumes. I’ll consider these later.

Current versions of TM normally don’t use those local snapshots to work out what needs to be backed up from each volume, but (after the initial full backup) should rely on that volume’s record of changes to its file system, FSEvents. These observe two lists of exclusions: those fixed by TM and macOS, including the hidden version database on each volume and recognised temporary files, and those set by the user in TM settings. Among the latter should be any very large bundles and folders containing huge numbers of small files, such as the Xcode app, as they will back up exceedingly slowly even to fast local backup storage, and can tie up a network backup for many hours. It’s faster to reinstall Xcode rather than restore it from a backup.

Current TM backups are highly efficient, as TM can copy just the blocks that have changed; older versions of TM backing up to HFS+ could only copy whole files. However, that can be impaired by apps that rewrite the whole of each large file when saving. Because the backup is being made to APFS, TM ensures that any sparse files are preserved, and handles clone files as efficiently as possible.

Once the backup has been written, TM then maintains old backups, to retain:

  • hourly backups for the last 24 hours, to accompany hourly local snapshots,
  • daily backups over the previous month,
  • weekly backups stretching back to the start of the current backup series.

These are summarised in the diagram below.

tmseqoutline1

Local snapshots

TM makes two types of snapshot: on each volume it’s set to back up, it makes a local snapshot immediately before each backup, then deletes that after 24 hours; on the backup storage, it turns each backup into a snapshot from which you can restore backed up files, and those are retained as stated above.

APFS snapshots, including TM local snapshots, include the whole of a volume, without any exceptions or exclusions, which can have surprising effects. For example, a TM exclusion list might block backing up of large virtual machine files resulting in typical backups only requiring 1-2 GB of backup storage, but because those VMs change a lot, each local snapshot could require 25 GB or more of space on the volume being backed up. One way to assess this is to check through each volume’s TM exclusion list and assess whether items being excluded are likely to change much. If they are, then they should be moved to a separate volume that isn’t backed up by TM, thus won’t have hourly snapshots.

Some workflows and apps generate very large working files that you may not want to clutter up either TM backups or local snapshots. Many apps designed to work with such large files provide options to relocate the folders used to store static libraries and working files. If necessary, create a new volume that’s excluded completely from TM backups to ensure those libraries and working files aren’t included in snapshots or backups.

TM can’t run multiple backup configurations with different sets of exclusions, though. If you need to do that, for instance to make a single nightly backup of working files, then do so using a third-party utility in addition to your hourly TM backups.

This can make a huge difference to free space on volumes being backed up, as the size of each snapshot can be multiplied by 24 as TM will try to retain each hourly snapshot for the last 24 hours.

Macs that aren’t able to make backups every hour can also accrue large snapshots, as they may retain older snapshots, that will only grow larger over time as that volume changes from the time that snapshot was made.

While snapshots are a useful feature of TM, the user has no control over them, and can’t shorten their period of retention or turn them off altogether. Third-party backup utilities like Carbon Copy Cloner can, and may be more suitable when local snapshots can’t be managed more efficiently.

iCloud Drive

Like all backup utilities, TM can only back up files that are in iCloud Drive when they’re downloaded to local storage. Although some third-party utilities can work through your iCloud Drive files downloading them automatically as needed, TM can’t do that, and will only back up files that are downloaded at the time that it makes a backup.

There are two ways to ensure files stored in iCloud Drive will be backed up: either turn Optimise Mac Storage off (in Sonoma and later), or download the files you want backed up and ‘pin’ them to ensure they can’t be removed from local storage (in Sequoia). You can pin individual files or whole folders and their entire contents by selecting the item, Control-click for the contextual menu, and selecting the Keep Downloaded menu command.

Key points

  • Rotate through 2 or more backup stores to handle different locations, or for redundancy.
  • Back up APFS volumes to APFS backup storage.
  • Exclude all non-essential files, and bundles containing large numbers of small files, such as Xcode.
  • Watch for apps that make whole-file changes, thus increasing snapshot and backup size.
  • Store large files on volumes not being backed up to minimise local snapshot size.
  • If you need multiple backup settings, use a third-party utility in addition to TM.
  • To ensure iCloud Drive files are backed up, either turn off Optimise Mac Storage (Sonoma and later), or pin essential files (Sequoia).

Further reading

Time Machine in Sonoma: strengths and weaknesses
Time Machine in Sonoma: how to work around its weaknesses
Understand and check Time Machine backups to APFS
Excluding folders and files from Time Machine, Spotlight, and iCloud Drive

A brief history of FileVault

By: hoakley
19 October 2024 at 15:00

Encrypting all your data didn’t become a thing until well after the first release of Mac OS X. Even then, the system provided little support, and most of us who wanted to secure private data relied on third-party products like PGP (Pretty Good Privacy).

pgp2003

FileVault 1

Apple released the first version of FileVault, now normally referred to as FileVault 1 or Legacy FileVault, in Mac OS X 10.3 Panther in 2003. Initially, that only encrypted a user’s Home folder into a sparse disk image, then in 10.5 Leopard it started using sparse bundles instead. These caused problems with Time Machine backup when it too arrived in Leopard, and proved so easy to crack that in 2006 Jacob Appelbaum and Ralf-Philipp Weinmann released a tool, VileFault, to decrypt FileVault disk images.

filevault2004

FileVault 1 was controlled in the Security pane of System Preferences, shown here in 2004.

newuser2004

Each new user added in the Accounts pane could have their Home folder stored in an encrypted disk image. Encryption keys were based on the user’s password, with a master password set for all accounts on the same Mac.

FileVault 2

FileVault 2 was introduced in Mac OS X 10.7 Lion in 2011, and at last provided whole-volume encryption based on the user password. Encryption was performed using the XTS-AES mode of AES with a 256-bit key, by the CPU. At that time, more recent Intel processors had instructions to make this easier and quicker, but all data written to an encrypted volume had to be encrypted before it was written to disk, and all data read from it had to be decrypted before it could be used. This imposed significant overhead of around 3%, which was more noticeable on slower storage such as hard disks, and with slower Macs.

Apple didn’t implement this by modifying the HFS+ file system to add support for encryption, but by adding encryption support to CoreStorage, the logical volume manager. In theory this would have enabled it to encrypt other file systems, but I don’t think that was ever done.

Turning FileVault on and off was quite a pain, as the whole volume had to be encrypted or decrypted in the background, a process that could take many hours or even days. Most users tried to avoid doing this too often as a result so, while FileVault 2 was secure and effective, it wasn’t as widely used as it should have been.

These screenshots step through the process of enabling FileVault in 2017.

lockratsec

Control was in the FileVault tab in System Preferences.

filevault01

iCloud Recovery was added as an alternative to the original recovery key.

filevault02

Encryption began following a restart, and then proceeded in the background for however long it took. Shrewd users enabled FileVault when a minimum had been installed to the startup volume, to minimise time taken for encryption.

filevault03

With a minimal install, it was possible to complete initial encryption in less than an hour. With full systems, it could take days if you were unlucky.

Although FileVault has had a few security glitches, it has done its job well. Perhaps its greatest threat came in the early days of macOS Sierra, when Ulf Frisk developed a simple method for retrieving the FileVault password for any Mac with a Thunderbolt port. An attacker could connect a special Thunderbolt device to a sleeping or locked Mac, force a restart, then read the password off within 30 seconds. This exploited a vulnerability in the handling of DMA, and was addressed by enabling VT-d in EFI, in Sierra 10.12.2 and 10.12.4.

Hardware encryption

The next big leap forward came at the end of 2017, with the release of the first Macs with T2 chips, as intermediates on the road to Apple silicon. One of Apple’s goals in T2 and Apple Silicon chips was to make encrypted volumes the default. To achieve that, T2 and M-series chips incorporate secure enclaves and perform encryption and decryption in hardware, rather than using CPU cycles.

The Secure Enclave incorporate the storage controller for the internal SSD, so all data transferred between CPU and SSD passes through an encryption stage in the enclave. When FileVault is disabled, data on protected volumes is still encrypted using a volume encryption key (VEK), in turn protected by a hardware key and a xART key used to protect from replay attacks.

filevaultpasswords1

When FileVault is enabled, the same VEK is used, but it’s protected by a key encryption key (KEK), and the user password is required to unwrap that KEK, so protecting the VEK used to perform encryption/decryption. This means that the user can change their password without the volume having to be re-encrypted, and allows the use of special recovery keys in case the user password is lost or forgotten. Keys are only handled in the secure enclave.

Securely erasing an encrypted volume, also performed when ‘erasing all content and settings’, results in the secure enclave deleting its VEK and the xART key, rendering the residual volume data inaccessible even to the secure enclave itself. This ensures that there’s no need to delete or overwrite any residual data from an encrypted volume: once the volume’s encryption key has been deleted, its previous contents are immediately unrecoverable.

eacas

Coverage of boot volumes by encryption varies according to the version of macOS. Prior to macOS Catalina, where macOS has a single system volume, the whole of that is encrypted; in Catalina, both System and Data volumes are encrypted; in Big Sur and later, the Signed System Volume (SSV) isn’t encrypted, nor are Recovery volumes, but the Data volume is.

External disks

Hardware encryption and FileVault’s ingenious tricks aren’t available for external disks, but APFS was designed to incorporate software encryption from the outset. As with internal SSDs, the key used to encrypt the volume contents isn’t exposed, but accessed via a series of wrappers, enabling the use of recovery keys if the user password is lost or forgotten. This involves a KEK and VEK in a similar manner to FileVault on internal SSDs. As the file system on the volume is also encrypted, after the KEK and VEK have been unwrapped, the next task in accessing an encrypted volume is to decrypt the file system B-tree using the VEK.

Enabling FileVault has been streamlined in recent years, as shown here in System Settings last year, for an external SSD, thus not using hardware encryption.

filevault1

FileVault control has moved to Privacy & Security in System Settings.

filevault2

The choice of iCloud Recovery or a recovery key remains.

filevault3

Because only the Data volume is now encrypted, enabling FileVault before populating the Home folder allows encryption to be almost instantaneous, on an external disk.

Virtual machines

The most recent enhancement to FileVault protection extends support to Sequoia virtual machines running on Sequoia hosts. Apple hasn’t yet explained how that one works, although I suspect the word exclave is likely to appear in the answer.

If your Mac has a T2 or Apple silicon chip and you haven’t enabled FileVault, then you’re missing one of the Mac’s best features.

Who’s been accessing my storage? Reading a disk’s history

By: hoakley
17 October 2024 at 14:30

Have you ever wondered whether someone else has changed your Mac’s storage? Or which version of macOS formatted each of its volumes? As all good forensic investigators know, APFS keeps detailed records of the formatting and modification of each volume. This article explains how you can read and interpret them. As in the tale of Goldilocks and the Three Bears, you may be able to tell who has been eating your APFS porridge.

Information available

Each APFS volume stores details of its history in the volume superblock apfs_superblock_t. Those include information on how that volume was created in apfs_formatted_by, and up to the last 8 times the volume has been modified, in apfs_modified_by.

Although you’ll need a forensic disk analysis tool to get full details, some of that data is easy to access. Select a volume in the Finder, and Get Info will give a time and date that volume was last formatted.

Run First Aid in Disk Utility on that volume’s container, and there’s even more information given about each volume within the container, including those you can’t see. If you’d rather not run a full check and repair, then you should see the same information in Terminal by using
diskutil verifyVolume disk10
where disk10 is the device name for the container. If you prefer you can use fsck_apfs directly, but verifyVolume should use that command’s options most efficiently.

One lingering problem you may encounter in Disk Utility is that it still fails frequently because it can’t unmount volumes for checking. If you encounter that error when trying to run First Aid on a container, try manually unmounting each volume within that container. If all else fails, diskutil verifyVolume appears to be better at handling the problem.

Workthrough

diskfirstaid1

As shown above, when run on one of my external SSDs, information about two APFS volumes was returned, itself something of a surprise. The volume I expected gave
The volume ThunderBay3 was formatted by diskmanagementd (1412.81.1) and last modified by apfs_kext (2313.1.2).
and the surprise, which isn’t mounted, thus effectively hidden, gave
The volume Update was formatted by com.apple.Mobile (1677.50.1) and last modified by apfs_kext (1677.141.2).
The Finder’s Get Info dialog for ThunderBay3 gave a volume creation date of 11 February 2020, and last modification of 20 December 2020.

Taking the visible volume ThunderBay3 first, APFS says that it was formatted by its own formatting tool, diskmanagementd, in APFS version 1412.81.1, which came in macOS 10.15 Catalina (see the Appendix below). A look through details of versions released pins that down to 10.15.3, released on 28 January 2020, which tallies with the creation date from the Finder. Its last modification was performed by a general APFS function, in APFS version 2313.1.2, which is that current for macOS 15.0 and 15.0.1.

The hidden Update volume has had quite a different history, as it was created in APFS version 1677.50.1 in Big Sur, to be more precise in macOS 11.0.1 released on 12 November 2021. That wasn’t a conventional volume creation either, and was performed by com.apple.Mobile, part of the Big Sur installer. It was last modified using APFS version 1677.141.2, which came in macOS 11.6 on 13 September 2021. Since then it appears to have been left unmounted and unused.

The history of that container therefore reads:

  • ThunderBay3 created by the user on 11 February 2020 in macOS 10.15.3
  • Update created by a macOS installer after 12 November 2021 in macOS 11.0.1
  • Update last mounted after 13 September 2021 in macOS 11.6
  • ThunderBay3 currently in use.

Conclusions

The hidden Update volume contains a restore log apparently left behind after the 11.5.2 update, together with some empty folders. These demonstrate that it was a temporary volume created by Big Sur’s new macOS installer, but never cleaned up afterwards, and left abandoned for the last three years. As Big Sur was the first version of macOS to use Apple’s new installer that created a Signed System Volume, this is likely to be present on other external disks that were mounted when any version of Big Sur was installed. Although it takes little space, it’s a surprising omission that no subsequent installer has seen fit to clean this up by deleting the volume.

Otherwise, information about the visible and mounted volume appears consistent, and confirms what I recall of its history. No one has been eating this bear’s APFS porridge.

Appendix: APFS and macOS version details

APFS major version numbers change with major version of macOS:

  • APFS version 0.3 or 249.x.x in macOS 10.12
  • 748.x.x in 10.13
  • 945.x.x in 10.14
  • 1412.x.x in 10.15
  • 1677.x.x in macOS 11
  • 1933.x.x in 12.0-12.2.1
  • 1934.x.x 12.3 and later
  • 2142.x.x in 13
  • 2235.x.x in 14.0-14.3.1
  • 2236.x.x in 14.4 and later
  • 2313.x.x in 15.

Minor version numbers increment according to the minor version of macOS, and patch numbers wander without pattern. Those can be checked by looking at the changes given for each macOS update listed on this page.

Disk Images: How read-write disk images have gone sparse

By: hoakley
14 October 2024 at 14:30

Until about three years ago, most types of disk image had fixed size. When you created a read-write disk image (UDRW) of 10 GB, it occupied the same 10 GB on disk whether it was full or empty. The only two that could grow and shrink in size were sparse bundles and sparse disk images, with the former generally preferred for its better resizing and performance.

This changed silently in Monterey, since when read-write disk images have been automatically resized by macOS, and saved in APFS sparse file format. As this remains undocumented, this article explains how this works, and how and where you can use it to your advantage.

Sparse

In this context, the word sparse refers to two very different properties:

  • Sparse bundles and sparse (disk) images are types of disk image that can change in size, and can be stored on a wide range of file systems and storage, including on NAS and other networked storage.
  • Sparse files are stored in a highly efficient file format available in modern file systems, where only the data in a file is stored, and empty space within that file doesn’t waste storage space. It’s available in APFS, but not in HFS+.

Requirements

For read-write disk images to be sparse, the following are required:

  • The disk image must be saved to, and remain on, an APFS volume.
  • The file system within the disk image can be either APFS or HFS+, but not FAT or ExFAT.
  • The disk image must be created and unmounted first. For that initial mount, the disk image isn’t a sparse file, so occupies its full size on disk.
  • Whenever that disk image is mounted again, and has sufficient free space within its set limit, it will be saved in sparse file format, and occupy less than its full size on disk.

Demonstration

  1. Create a new read-write disk image of at least 1 GB size with an internal file system of APFS on an APFS volume, using DropDMG, Disk Utility, or an alternative.
  2. Once it has been created and mounted, unmount it in the Finder.
  3. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is the same as that set.
  4. (Optional) Open the disk image using Precize, and confirm that it’s not a sparse file.
  5. Double-click the disk image to mount it in the Finder, and wait at least 10 seconds before unmounting it.
  6. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is now significantly less than that set.
  7. (Optional) Open the disk image using Precize, and confirm that it has now become a sparse file.

How it works

When you create the disk image, macOS creates and attaches its container, and creates and mounts the file system within that. This is then saved to disk as a regular file occupying the full size of the disk image, plus the overhead incurred by the disk image container itself. No sparse files are involved at this stage.

When that disk image is mounted next, its container is attached through diskarbitrationd, then its file system is mounted. If that’s APFS (or HFS+), it undergoes Trimming, as with other mounts. That coalesces free storage blocks within the image to form one contiguous free space. The disk image is then saved in APFS sparse file format, skipping that contiguous free space. When the file system has been unmounted and the container detached, the space used to store the disk image has shrunk to the space actually used within the disk image, plus the container overhead. Unless the disk image is almost full, the amount of space required to store it on disk will be smaller than the full size of the disk image.

This is summarised in the diagram below.

SparseDiskImage1

The size of read-write disk images is therefore variable depending on the contents, the effectiveness of Trimming in coalescing free space, and the efficiency of APFS sparse file format.

Conversion to sparse file

When mounting an APFS file system in a read-write disk image, APFS tests whether the container backing store is a sparse format, or a flat file. In the case of a newly created read-write disk image that hasn’t yet been converted into a sparse file, that’s detected prior to Spaceman (the APFS Space Manager) scanning for free blocks within its file system. When free blocks are found, APFS sets the type of backing store to sparse, gathers the sparse bytes and ‘punches a hole’ in the disk image’s file extents to convert the container file into sparse format. That appears in the log as:
handle_apfs_set_backingstore:6172: disk5s1 Set backing store as sparse
handle_apfs_set_backingstore:6205: disk5 Backing storage is a raw file
_punch_hole_cb:37665: disk3s5 Accumulated 4294967296 sparse bytes for inode 30473932 in transaction 3246918, pausing hole punching

where disk5s1 is the disk image’s mounted volume, and disk3s5 is the volume in which the disk image container is stored.

Trimming for efficient use of space

That conversion to sparse format is normally only performed once, but from then on, each time that disk image is mounted it’s recognised as having a sparse backup store, and Spaceman performs a Trim to coalesce free blocks and optimise on-disk storage requirements. For an empty read-write disk image of 2,390,202 blocks of 4,096 bytes each, as created in a 10 GB disk image, log entries are:
spaceman_scan_free_blocks:4106: disk5 scan took 0.000722 s, trims took 0.000643 s
spaceman_scan_free_blocks:4110: disk5 2382929 blocks free in 7 extents, avg 340418.42
spaceman_scan_free_blocks:4119: disk5 2382929 blocks trimmed in 7 extents (91 us/trim, 10886 trims/s)
spaceman_scan_free_blocks:4122: disk5 trim distribution 1:0 2+:0 4+:0 16+:0 64+:0 256+:7

accounting for a total of 9.8 GB.

Changes made to the contents of the disk image lead to a gradual reduction in Trim yield. For example, after adding files to the disk image and deleting them, instead of yielding the full 9.8 GB, only 2,319,074 blocks remain free, yielding a total of 9.5 GB.

For comparison, initial Trimming on a matching empty sparse bundle yields the same 9.8 GB. After file copying and deletion, and compaction of the sparse bundle, Trimming performs slightly better, yielding 2,382,929 free blocks for a total of 9.6 GB. Note that Trimming of sparse bundles is performed by APFS Spaceman separately from management of bands in backing storage, which isn’t a function of the file system.

Size efficiency

Although read-write disk images stored as sparse files are efficient in their use of disk space, they’re still not as compact as sparse bundles. For an empty 10 GB image, the read-write type requires 240 MB on disk, but a sparse bundle only needs 13.9 MB. After light use storing files, then deleting the whole contents, a 10 GB read-write disk image grows to occupy 501 MB, but following compaction a sparse bundle only takes 150 MB. That difference may not remain consistent over more prolonged use, though, and ultimately compacting sparse bundles may cease freeing any space at all.

It’s also important to remember that sparse bundles need to be compacted periodically, if any of their contents are deleted, or they may not reduce in size after deletions. Read-write disk images can’t be compacted, and reclaim disk space automatically.

Benefits and penalties

Read-write disk images saved as sparse files are different from sparse bundles in many ways. Like any sparse file, the disk image still has the same nominal size as its full size, and differs in the space taken on disk. Sparse bundles should normally only have band files sufficient to accommodate their current size, so their nominal size remains similar to the space they take on disk. The result is that, while read-write disk images in sparse file format will help increase free disk space, their major benefit is in reducing ‘wear’ in SSDs by not wasting erase-write cycles storing empty data.

Unlike the band file structure in sparse bundles, which can be stored on almost any disk, APFS sparse files have to be treated carefully if they are to remain compact. Moving them to another file system or over a network is likely to result in their being exploded to full size, and I have explained those limitations recently.

Both read-write disk images and sparse bundles deliver good read performance, but write performance is significantly impaired in read-write disk images but not sparse files. Encryption of disk images and sparse bundles also has significant effects on their performance, and in some cases write performance is badly affected by encryption. I have previously documented their performance in macOS Monterey, and will be updating those figures for Sequoia shortly.

Summary

  • Read-write disk images saved on APFS storage in Monterey and later are no longer of fixed size, and should use significantly less disk space unless full, provided the disk image has been unmounted at least once since creation, and the file system in the disk image is APFS or HFS+.
  • This is because macOS now saves the disk image in sparse file format.
  • To retain the disk image in sparse file format, it needs to remain in APFS volumes, and normal precautions are required to maintain its efficient use of space. The disk image can’t be manually compacted, in the way that sparse bundles require to be.
  • The main benefit of this strategy is to minimise erase-write cycles, so reducing ‘wear’ on SSDs.
  • Sparse bundles are still more efficient in their use of disk space, and have higher write speeds, but read-write disk images are now closer in their efficiency and performance.

Previous articles

Introduction
Tools

APFS incompatibilities and how to live with them

By: hoakley
8 October 2024 at 14:30

APFS has many of the features of modern file systems that make most efficient use of space, and others that aren’t found in more traditional file systems like HFS+. While those should all work well when used locally with other APFS volumes, they can prove incompatible with other file systems, and may have untoward side effects. This article looks at those features that you could encounter problems with, and how best to work around them.

Sparse files

Most modern file systems store files that contain significant amounts of blank or absent data in a special sparse file format. In APFS, this is implemented by only allocating file extents to those blocks that do contain data. In many cases, this can save significant disk space. Initially, sparse files were unusual in APFS, but over time they have become increasingly common, and can now be found in some types of disk image, virtual machine storage, and in some databases. Note that disk image types whose name includes the word sparse, sparse bundles and sparse disk images, don’t use sparse file format at all, but are the victims of an unfortunate name collision.

Neither macOS nor APFS can simply convert regular files to sparse format; for a file to be written in sparse format, the code writing it must explicitly skip the empty space. As a result, sparse files are prone to explode to full size when they’re copied or moved unless that’s between two local volumes both using APFS. Examples of where you should expect them to explode include:

  • copy or move to HFS+, as that has no sparse file format
  • copy between Macs using AirDrop or file sharing
  • back up to network storage, although local Time Machine backups to APFS should preserve them
  • copy or move to any other local file system other than APFS, even if that file system has its own support for sparse files.

In each of those cases, sparse files explode in size as they’re copied from the source volume. If you have a 100 GB sparse file that only takes 20 GB of local disk space, when it’s copied over the network, or to a local HFS+ volume, the full 100 GB has to be transferred.

One potential workaround is to compress the sparse file before transferring it. All good compression algorithms will work efficiently on the blank space in the file, so when compressed its size could be as small as the original sparse file. However, when it’s decompressed, even on another APFS volume, it will explode to full size. For disk images, that can be corrected by mounting them, as APFS will then trim their contents, and the disk image should be saved back into sparse format.

Clone files

These are two distinct files that share common data, normally the result of duplicating the original within the same APFS volume. Those two cloned files then only require the storage of the whole file, plus those data blocks that differ between them. This only works within the same volume, and the moment that either of the clones is moved or copied to any other volume, it assumes full size, as it can no longer share data with the other clone.

However, most other file systems don’t support file cloning in this way. When you duplicate a file in an HFS+ volume, there’s no shared data between them, and the two require twice the amount of space as one does.

Snapshots

Snapshots consist of a complete copy of the volume at an instant in time, so require a copy of the file system metadata for that volume, and retain copies of storage blocks for each file as they are changed subsequent to the snapshot being made, so you could roll back that volume to its state at the moment of the snapshot.

Although Time Machine backups contain snapshots of the volume they back up, snapshots can’t normally be copied to another disk or volume. Some have been able to make a complete copy of a disk including its snapshots using the dd command tool, but that should be considered experimental. In all other circumstances, snapshots stay where they were made, but you can always copy from an existing snapshot to reconstitute a volume.

Directory hard links

These aren’t available in APFS, but are supported in HFS+, where they’re used extensively in Time Machine backups. They work like regular hard links, but act on directories rather than individual files. They can’t be copied in any way to an APFS volume, but can be used to reconstitute a volume.

Extended attributes

These are additional metadata associated with files and folders, and are fully supported in both APFS and HFS+. However, many are treated as being ephemeral, and may not be preserved during copying or other actions. The system of flags used to determine which are preserved is detailed in this article.

Several other file systems also support extended attributes, but copying between them is unlikely to transfer them between file systems. In some cases, extended attributes are preserved using AppleDouble format, in which each file can have a hidden shadow with its name prefixed with ._ (dot-underscore) characters. These are most often seen in FAT and ExFAT volumes, but are prone to confuse users of other computers.

Key points

  • APFS sparse files are only preserved when copying or moving files between local APFS volumes. In other circumstances they explode to full size.
  • Good compression methods can keep a sparse file to a similar size, but decompression explodes them to full size. Disk images may then be restored as sparse files after they have been mounted again from APFS.
  • Clone files aren’t preserved when copied or moved to any other volume.
  • Snapshots can’t be copied at all, although they can be reconstituted as volumes.
  • APFS doesn’t support the directory hard links used in Time Machine backups to HFS+, but a backup can be reconstituted as a volume.
  • Extended attributes are preserved between APFS and HFS+, but not with other file systems, except in the shadow files of AppleDouble format, seen in FAT and ExFAT volumes.

Disk Images: Introduction

By: hoakley
7 October 2024 at 14:30

A disk image is a file or a bundle containing what could otherwise be the contents of a disk. It’s a common way to store and move complete file systems in a neat package, for items that need to be separated from the physical storage provided by a drive. macOS uses disk images for some tasks of great importance, including:

  • Recovery and Hardware Diagnostics systems,
  • additions to macOS such as Safari, its supporting frameworks, and dyld caches, in cryptexes,
  • networked storage for Time Machine backups, in sparse bundles,
  • lightweight virtual machines on Apple silicon Macs.

You could use them to store encrypted data on unencrypted volumes, and they’re often used for delivering Apple and third-party software.

Disk images are poorly documented for both users and developers, and have changed significantly over the last few years. Articles in this series explain how to choose between different types of disk image, how to create and use them, and what to do when they go wrong.

Containers and file systems

Disk images consist of two distinct components: the file or bundle itself functioning as a container, and the file system contained inside it. When referred to in this context, disk image containers are completely unrelated to the sandbox containers found in ~/Library/Containers.

This distinction is important in several respects, although it isn’t apparent when you use disk images in the Finder. Preparing a disk image for access involves two separate functions: attaching its container, and mounting any file systems found inside it. When that’s performed by the Finder, perhaps by double-clicking the disk image, those two actions appear fused into one. Similarly, removing the disk image requires all its mounted file systems to be unmounted first, then the container is detached.

One feature that’s widely confused is the encrypted disk image. This involves encryption of the whole container, rather than using an encrypted file system within it. Now that Disk Utility no longer supports the creation of encrypted HFS+ volumes, one remaining alternative is to use an encrypted disk image containing an HFS+ volume.

If you want an analogy for disk images, attaching the container is like connecting an external disk, and once that has been performed, the file systems contained by that disk have to be mounted before you can access their contents.

Types

There are many different types of disk image in use, of which the two this series is most concerned with are plain UDIF read-write disk images (UDRW), and sparse bundles (UDSB). Others you may encounter include:

  • plain UDIF read-only (UDRO),
  • various compressed versions of UDRW,
  • CD/DVD master for export (UDTO),
  • sparse disk image (UDSP), a single file rather than a bundle.

Those specify the container format; within each, there’s the usual choice of file systems, although throughout these articles it will normally be assumed that APFS will be used unless otherwise specified.

The word sparse in sparse bundle and sparse disk image doesn’t refer to APFS sparse files, but to the fact that those types of disk image can grow and diminish in size, and normally try to occupy the minimum amount of disk space. This is an unfortunate name collision.

Structure

With the exception of sparse bundles, all disk images are contained within a single file of opaque structure.

Sparse bundles consist of a single bundle folder containing:

  • bands, a folder containing the contents of the disk image in band files
  • info.plist and its backup copy info.bckup, containing settings including band size
  • lock, an empty file
  • mapped, a folder containing small data files to match all of the band files except the first
  • token, an empty file.

Container size

Until this changed in Monterey (or thereabouts), non-sparse disk images had fixed container sizes. Create a UDIF read-write disk image (UDRW) of 10 GB, and the space occupied by it on disk was approximately 10 GB, whether it was empty or full. Although it remains undocumented, when stored on APFS volumes, UDRW disks can now take advantage of APFS sparse file format, and will normally only occupy the disk space required for the contents of their file system.

This is only true once the disk image has been mounted for the first time after it has been created, mounted and unmounted. To see this, create a test read-write disk image (for example, using Disk Utility) of 10 GB size. Then unmount it, and use the Finder’s Get Info command to inspect its size on disk. That will be 10 GB. Then mount the disk image again, pause a couple of seconds, unmount it, and Get Info will show its size on disk is now much smaller.

As I’ll explain in detail in a later article, this is because each time that disk image is mounted, if its internal file system is HFS+ or APFS, its contents will be trimmed, and saved to disk in sparse file format, which omits all its empty space. This only applies to read-write disk images when they’re stored in APFS file systems; copy them to HFS+ and they’ll explode to full size, as HFS+ doesn’t support the sparse file format.

Considering just the two leading types, empty sizes for a 100 GB disk image are:

  • a sparse bundle is 35.4 MB empty, 53.3 GB when containing about 51 GB files, stored across 6,359 band files.
  • a read-write disk image (UDRW) shrinks to 333.8 MB once stored as an empty sparse file, 53.6 GB when containing about 51 GB files, in a single file container.

Performance

Some types of disk image perform poorly, and can be very slow to write to. Recent versions of macOS have brought improvements, although some options such as encryption can still impair performance significantly. For the two leading types, when their container is stored on the internal SSD of an Apple silicon Mac, with native read and write speeds of around 6-7 GB/s:

  • an initially empty unencrypted sparse bundle reads at 5.1 GB/s, and writes at 4.8 GB/s.
  • an initially empty unencrypted read-write disk image (UDRW) reads at 5.3 GB/s, and writes at only 1 GB/s.

Tests were performed using my utility Stibium, across a range of 160 files of 2 MB to 2 GB size in randomised order, with macOS 15.0.1.

Key points

  • Disk images consist of a file or bundle containing one or more file systems; the container and its contents are distinct.
  • To access the contents of a disk image, the container is first attached, then the file system(s) within it are mounted. In the Finder, those two processes appear as a single action.
  • Encrypted disk images encrypt the container, and don’t necessarily contain encrypted file systems.
  • Disk images come in many different types, and can contain different file systems.
  • Sparse bundles have a file and folder structure inside their bundle folder, with their data saved in band files; all other disk images are single files.
  • Sparse bundles grow and shrink according to the size of files stored within them.
  • In recent macOS, and on APFS, read-write disk images (UDRW) are stored in APFS sparse file format, enabling them to grow and shrink as well.
  • In recent macOS, unencrypted sparse bundles have read and write performance close to that of the disk they’re stored on. Read-write disk images read at similar speeds, but write more slowly, at about 20% of read speed.

❌
❌