Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Disk Images: How read-write disk images have gone sparse

By: hoakley
14 October 2024 at 14:30

Until about three years ago, most types of disk image had fixed size. When you created a read-write disk image (UDRW) of 10 GB, it occupied the same 10 GB on disk whether it was full or empty. The only two that could grow and shrink in size were sparse bundles and sparse disk images, with the former generally preferred for its better resizing and performance.

This changed silently in Monterey, since when read-write disk images have been automatically resized by macOS, and saved in APFS sparse file format. As this remains undocumented, this article explains how this works, and how and where you can use it to your advantage.

Sparse

In this context, the word sparse refers to two very different properties:

  • Sparse bundles and sparse (disk) images are types of disk image that can change in size, and can be stored on a wide range of file systems and storage, including on NAS and other networked storage.
  • Sparse files are stored in a highly efficient file format available in modern file systems, where only the data in a file is stored, and empty space within that file doesn’t waste storage space. It’s available in APFS, but not in HFS+.

Requirements

For read-write disk images to be sparse, the following are required:

  • The disk image must be saved to, and remain on, an APFS volume.
  • The file system within the disk image can be either APFS or HFS+, but not FAT or ExFAT.
  • The disk image must be created and unmounted first. For that initial mount, the disk image isn’t a sparse file, so occupies its full size on disk.
  • Whenever that disk image is mounted again, and has sufficient free space within its set limit, it will be saved in sparse file format, and occupy less than its full size on disk.

Demonstration

  1. Create a new read-write disk image of at least 1 GB size with an internal file system of APFS on an APFS volume, using DropDMG, Disk Utility, or an alternative.
  2. Once it has been created and mounted, unmount it in the Finder.
  3. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is the same as that set.
  4. (Optional) Open the disk image using Precize, and confirm that it’s not a sparse file.
  5. Double-click the disk image to mount it in the Finder, and wait at least 10 seconds before unmounting it.
  6. Select the disk image in the Finder and open the Get Info dialog for it. Confirm that its size on disk is now significantly less than that set.
  7. (Optional) Open the disk image using Precize, and confirm that it has now become a sparse file.

How it works

When you create the disk image, macOS creates and attaches its container, and creates and mounts the file system within that. This is then saved to disk as a regular file occupying the full size of the disk image, plus the overhead incurred by the disk image container itself. No sparse files are involved at this stage.

When that disk image is mounted next, its container is attached through diskarbitrationd, then its file system is mounted. If that’s APFS (or HFS+), it undergoes Trimming, as with other mounts. That coalesces free storage blocks within the image to form one contiguous free space. The disk image is then saved in APFS sparse file format, skipping that contiguous free space. When the file system has been unmounted and the container detached, the space used to store the disk image has shrunk to the space actually used within the disk image, plus the container overhead. Unless the disk image is almost full, the amount of space required to store it on disk will be smaller than the full size of the disk image.

This is summarised in the diagram below.

SparseDiskImage1

The size of read-write disk images is therefore variable depending on the contents, the effectiveness of Trimming in coalescing free space, and the efficiency of APFS sparse file format.

Conversion to sparse file

When mounting an APFS file system in a read-write disk image, APFS tests whether the container backing store is a sparse format, or a flat file. In the case of a newly created read-write disk image that hasn’t yet been converted into a sparse file, that’s detected prior to Spaceman (the APFS Space Manager) scanning for free blocks within its file system. When free blocks are found, APFS sets the type of backing store to sparse, gathers the sparse bytes and ‘punches a hole’ in the disk image’s file extents to convert the container file into sparse format. That appears in the log as:
handle_apfs_set_backingstore:6172: disk5s1 Set backing store as sparse
handle_apfs_set_backingstore:6205: disk5 Backing storage is a raw file
_punch_hole_cb:37665: disk3s5 Accumulated 4294967296 sparse bytes for inode 30473932 in transaction 3246918, pausing hole punching

where disk5s1 is the disk image’s mounted volume, and disk3s5 is the volume in which the disk image container is stored.

Trimming for efficient use of space

That conversion to sparse format is normally only performed once, but from then on, each time that disk image is mounted it’s recognised as having a sparse backup store, and Spaceman performs a Trim to coalesce free blocks and optimise on-disk storage requirements. For an empty read-write disk image of 2,390,202 blocks of 4,096 bytes each, as created in a 10 GB disk image, log entries are:
spaceman_scan_free_blocks:4106: disk5 scan took 0.000722 s, trims took 0.000643 s
spaceman_scan_free_blocks:4110: disk5 2382929 blocks free in 7 extents, avg 340418.42
spaceman_scan_free_blocks:4119: disk5 2382929 blocks trimmed in 7 extents (91 us/trim, 10886 trims/s)
spaceman_scan_free_blocks:4122: disk5 trim distribution 1:0 2+:0 4+:0 16+:0 64+:0 256+:7

accounting for a total of 9.8 GB.

Changes made to the contents of the disk image lead to a gradual reduction in Trim yield. For example, after adding files to the disk image and deleting them, instead of yielding the full 9.8 GB, only 2,319,074 blocks remain free, yielding a total of 9.5 GB.

For comparison, initial Trimming on a matching empty sparse bundle yields the same 9.8 GB. After file copying and deletion, and compaction of the sparse bundle, Trimming performs slightly better, yielding 2,382,929 free blocks for a total of 9.6 GB. Note that Trimming of sparse bundles is performed by APFS Spaceman separately from management of bands in backing storage, which isn’t a function of the file system.

Size efficiency

Although read-write disk images stored as sparse files are efficient in their use of disk space, they’re still not as compact as sparse bundles. For an empty 10 GB image, the read-write type requires 240 MB on disk, but a sparse bundle only needs 13.9 MB. After light use storing files, then deleting the whole contents, a 10 GB read-write disk image grows to occupy 501 MB, but following compaction a sparse bundle only takes 150 MB. That difference may not remain consistent over more prolonged use, though, and ultimately compacting sparse bundles may cease freeing any space at all.

It’s also important to remember that sparse bundles need to be compacted periodically, if any of their contents are deleted, or they may not reduce in size after deletions. Read-write disk images can’t be compacted, and reclaim disk space automatically.

Benefits and penalties

Read-write disk images saved as sparse files are different from sparse bundles in many ways. Like any sparse file, the disk image still has the same nominal size as its full size, and differs in the space taken on disk. Sparse bundles should normally only have band files sufficient to accommodate their current size, so their nominal size remains similar to the space they take on disk. The result is that, while read-write disk images in sparse file format will help increase free disk space, their major benefit is in reducing ‘wear’ in SSDs by not wasting erase-write cycles storing empty data.

Unlike the band file structure in sparse bundles, which can be stored on almost any disk, APFS sparse files have to be treated carefully if they are to remain compact. Moving them to another file system or over a network is likely to result in their being exploded to full size, and I have explained those limitations recently.

Both read-write disk images and sparse bundles deliver good read performance, but write performance is significantly impaired in read-write disk images but not sparse files. Encryption of disk images and sparse bundles also has significant effects on their performance, and in some cases write performance is badly affected by encryption. I have previously documented their performance in macOS Monterey, and will be updating those figures for Sequoia shortly.

Summary

  • Read-write disk images saved on APFS storage in Monterey and later are no longer of fixed size, and should use significantly less disk space unless full, provided the disk image has been unmounted at least once since creation, and the file system in the disk image is APFS or HFS+.
  • This is because macOS now saves the disk image in sparse file format.
  • To retain the disk image in sparse file format, it needs to remain in APFS volumes, and normal precautions are required to maintain its efficient use of space. The disk image can’t be manually compacted, in the way that sparse bundles require to be.
  • The main benefit of this strategy is to minimise erase-write cycles, so reducing ‘wear’ on SSDs.
  • Sparse bundles are still more efficient in their use of disk space, and have higher write speeds, but read-write disk images are now closer in their efficiency and performance.

Previous articles

Introduction
Tools

APFS incompatibilities and how to live with them

By: hoakley
8 October 2024 at 14:30

APFS has many of the features of modern file systems that make most efficient use of space, and others that aren’t found in more traditional file systems like HFS+. While those should all work well when used locally with other APFS volumes, they can prove incompatible with other file systems, and may have untoward side effects. This article looks at those features that you could encounter problems with, and how best to work around them.

Sparse files

Most modern file systems store files that contain significant amounts of blank or absent data in a special sparse file format. In APFS, this is implemented by only allocating file extents to those blocks that do contain data. In many cases, this can save significant disk space. Initially, sparse files were unusual in APFS, but over time they have become increasingly common, and can now be found in some types of disk image, virtual machine storage, and in some databases. Note that disk image types whose name includes the word sparse, sparse bundles and sparse disk images, don’t use sparse file format at all, but are the victims of an unfortunate name collision.

Neither macOS nor APFS can simply convert regular files to sparse format; for a file to be written in sparse format, the code writing it must explicitly skip the empty space. As a result, sparse files are prone to explode to full size when they’re copied or moved unless that’s between two local volumes both using APFS. Examples of where you should expect them to explode include:

  • copy or move to HFS+, as that has no sparse file format
  • copy between Macs using AirDrop or file sharing
  • back up to network storage, although local Time Machine backups to APFS should preserve them
  • copy or move to any other local file system other than APFS, even if that file system has its own support for sparse files.

In each of those cases, sparse files explode in size as they’re copied from the source volume. If you have a 100 GB sparse file that only takes 20 GB of local disk space, when it’s copied over the network, or to a local HFS+ volume, the full 100 GB has to be transferred.

One potential workaround is to compress the sparse file before transferring it. All good compression algorithms will work efficiently on the blank space in the file, so when compressed its size could be as small as the original sparse file. However, when it’s decompressed, even on another APFS volume, it will explode to full size. For disk images, that can be corrected by mounting them, as APFS will then trim their contents, and the disk image should be saved back into sparse format.

Clone files

These are two distinct files that share common data, normally the result of duplicating the original within the same APFS volume. Those two cloned files then only require the storage of the whole file, plus those data blocks that differ between them. This only works within the same volume, and the moment that either of the clones is moved or copied to any other volume, it assumes full size, as it can no longer share data with the other clone.

However, most other file systems don’t support file cloning in this way. When you duplicate a file in an HFS+ volume, there’s no shared data between them, and the two require twice the amount of space as one does.

Snapshots

Snapshots consist of a complete copy of the volume at an instant in time, so require a copy of the file system metadata for that volume, and retain copies of storage blocks for each file as they are changed subsequent to the snapshot being made, so you could roll back that volume to its state at the moment of the snapshot.

Although Time Machine backups contain snapshots of the volume they back up, snapshots can’t normally be copied to another disk or volume. Some have been able to make a complete copy of a disk including its snapshots using the dd command tool, but that should be considered experimental. In all other circumstances, snapshots stay where they were made, but you can always copy from an existing snapshot to reconstitute a volume.

Directory hard links

These aren’t available in APFS, but are supported in HFS+, where they’re used extensively in Time Machine backups. They work like regular hard links, but act on directories rather than individual files. They can’t be copied in any way to an APFS volume, but can be used to reconstitute a volume.

Extended attributes

These are additional metadata associated with files and folders, and are fully supported in both APFS and HFS+. However, many are treated as being ephemeral, and may not be preserved during copying or other actions. The system of flags used to determine which are preserved is detailed in this article.

Several other file systems also support extended attributes, but copying between them is unlikely to transfer them between file systems. In some cases, extended attributes are preserved using AppleDouble format, in which each file can have a hidden shadow with its name prefixed with ._ (dot-underscore) characters. These are most often seen in FAT and ExFAT volumes, but are prone to confuse users of other computers.

Key points

  • APFS sparse files are only preserved when copying or moving files between local APFS volumes. In other circumstances they explode to full size.
  • Good compression methods can keep a sparse file to a similar size, but decompression explodes them to full size. Disk images may then be restored as sparse files after they have been mounted again from APFS.
  • Clone files aren’t preserved when copied or moved to any other volume.
  • Snapshots can’t be copied at all, although they can be reconstituted as volumes.
  • APFS doesn’t support the directory hard links used in Time Machine backups to HFS+, but a backup can be reconstituted as a volume.
  • Extended attributes are preserved between APFS and HFS+, but not with other file systems, except in the shadow files of AppleDouble format, seen in FAT and ExFAT volumes.

❌
❌