How disk images can become sparse files

By: hoakley
24 April 2025 at 14:30

It’s no miracle that a 10 GB disk image can be shrunk down to a few MB of sparse file. This article explains how APFS works that magic, first on a normal read-write disk image, then in the disk image inside a Virtual Machine.

What is a sparse file?

In essence a sparse file is any file whose allocated storage is smaller than the nominal size of that file. In APFS, this imposes two requirements:

  • the INODE_IS_SPARSE flag is set for that file’s inode,
  • the sparse byte count is given in its extended field.

As a result, the total size of storage allocated to that file’s data in its file extents is smaller than the total required to store the file at its nominal size. This is because the file contains empty data that isn’t stored on disk, saving disk space. This becomes clearer when we consider how this works with regular read-write disk images.

How a disk image becomes sparse

To demonstrate how this works, create a read-write disk image, which APFS will then turn into a sparse file. For the sake of simplicity, I’ll ignore all overheads such as the file system in that disk image.

APFS uses 4 KB storage blocks on SSDs. Creating a 4 GB disk image using DropDMG or Disk Utility therefore uses one million blocks. For this example I number those starting from 0000 0001 in hexadecimal, rising to 000F 4240 at the end of that disk image file, a million blocks later.

Once that has been created, copy a 4 MB file into the disk then unmount it, and mount it again. When it’s mounted that second time, APFS Trims it, and marks all its storage blocks apart from those 4 MB as being unused. That leaves my file occupying blocks 0000 0001 to 0000 03E8, and 0000 03E9 to 000F 4240 unallocated. APFS therefore sets the disk image file’s INODE_IS_SPARSE flag to TRUE and writes the sparse byte count to its extended field: the disk image is now a sparse file.
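
The block numbers in this example can be checked in Terminal. This is only a sketch using the same round numbers as the text: it treats 4 KB as 4,000 bytes and ignores all overheads, and the variable names are my own.

```shell
#!/bin/sh
# Round numbers as in the text: a 4 KB block is taken as 4,000 bytes.
# That is an assumption made to match "one million blocks"; APFS blocks
# are normally 4,096 bytes.
block_size=4000
image_size=4000000000   # 4 GB disk image
file_size=4000000       # 4 MB file copied into it

image_blocks=$((image_size / block_size))
file_blocks=$((file_size / block_size))

# Numbering from 0000 0001, the last block number equals the block count.
last_image_block=$(printf '%08X' "$image_blocks")
last_file_block=$(printf '%08X' "$file_blocks")
echo "image ends at block $last_image_block, file ends at block $last_file_block"
```

This prints 000F4240 for the end of the disk image and 000003E8 for the end of the 4 MB file, the same block numbers used above.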

Creating a VM disk image

Unlike that read-write disk image, a disk image used for a Virtual Machine (on Apple silicon, at least) is created as a sparse file in the first instance, using code like
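import Foundation
// Create the file if it doesn't exist, readable and writable by its owner.
let diskFd = open(diskImagePath, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)
// Set its length to sizeDisk bytes without writing any data to it.
var result = ftruncate(diskFd, sizeDisk)
result = close(diskFd)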

(error handling omitted) where sizeDisk is the size in bytes. Something similar can be achieved in Terminal using the command
dd if=/dev/zero of=Disk.img bs=1m count=0 seek=10240
where the number given for seek is the size in blocks of 1 MB, as set by bs, here making a 10 GB file.
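
To see the effect for yourself, compare a file's nominal size with the storage allocated to it. The following is a minimal sketch that makes a much smaller 10 MB file in a temporary folder; the folder and variable names are my own, and the block size is written out in bytes so it doesn't depend on which size suffixes a particular version of dd accepts.

```shell
#!/bin/sh
# Create a 10 MB sparse file in a temporary folder, then compare its
# nominal size with the storage actually allocated to it.
workdir=$(mktemp -d)
img="$workdir/Disk.img"

# count=0 writes no data; seek sets the length to 10 blocks of 1,048,576 bytes.
dd if=/dev/zero of="$img" bs=1048576 count=0 seek=10 2>/dev/null

# wc -c reports the nominal length, du -k the allocated storage in KB.
nominal_bytes=$(wc -c < "$img" | tr -d ' ')
allocated_kb=$(du -k "$img" | cut -f1)

echo "nominal size:   $nominal_bytes bytes"
echo "allocated size: $allocated_kb KB"

rm -r "$workdir"
```

On a file system that supports sparse files, such as APFS, the allocated size should be far smaller than the nominal 10,485,760 bytes. On one that doesn't, I'd expect it to be close to 10,240 KB.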

Maintaining the VM disk image

A read-write disk image is mounted and Trimmed by APFS on the host Mac. That used for a VM is different, as it’s the guest OS that has the task of Trimming the disk image from inside, and that works just the same as when macOS is booted on a Mac.

Read the entries made in your Mac’s log by APFS during startup. Those appear early with the start of APFS, when the version is given:
33.263 apfs_module_start:3403: load: com.apple.filesystems.apfs, v2332.101.1, apfs-2332.101.1, 2025/04/11

A little later, APFS Space Manager (Spaceman) Trims the first partition/container with log entries like:
34.012 spaceman_scan_free_blocks:4106: disk1 scan took 0.002064 s, trims took 0.000443 s
34.012 spaceman_scan_free_blocks:4110: disk1 101104 blocks free in 47 extents, avg 2151.14
34.012 spaceman_scan_free_blocks:4119: disk1 101104 blocks trimmed in 47 extents (9 us/trim, 106094 trims/s)
34.012 spaceman_scan_free_blocks:4122: disk1 trim distribution 1:14 2+:18 4+:8 16+:2 64+:1 256+:4

A couple of seconds later it trims a second partition:
36.391 spaceman_scan_free_blocks:4106: disk3 scan took 1.749635 s, trims took 1.147491 s
36.391 spaceman_scan_free_blocks:4110: disk3 351308484 blocks free in 319729 extents, avg 1098.76
36.391 spaceman_scan_free_blocks:4119: disk3 351308484 blocks trimmed in 319729 extents (3 us/trim, 278633 trims/s)
36.391 spaceman_scan_free_blocks:4122: disk3 trim distribution 1:118376 2+:48105 4+:82602 16+:42698 64+:24673 256+:3275
36.391 spaceman_scan_free_blocks:4130: disk3 trims dropped: 10469 blocks 10469 extents, avg 1.00
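
The rates given in brackets appear to follow from the other figures in those entries. As a rough check, and assuming they are simply the number of extents divided by the time taken for trims, rounded down, the disk3 figures can be reproduced like this (the variable names are mine):

```shell
#!/bin/sh
# Values copied from the disk3 log entries.
extents=319729
trim_seconds=1.147491

# Assumption: the rates in the log are derived from extents and trim time,
# and are rounded down to whole numbers. printf "%d" truncates in awk.
trims_per_second=$(awk -v n="$extents" -v t="$trim_seconds" 'BEGIN { printf "%d", n / t }')
us_per_trim=$(awk -v n="$extents" -v t="$trim_seconds" 'BEGIN { printf "%d", t * 1000000 / n }')
echo "$trims_per_second trims/s, $us_per_trim us/trim"
```

That gives 278633 trims/s and 3 us/trim, matching the log entry, so each extent seems to be counted as one trim.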

The following matching entries are taken from a macOS VM as it boots on an Apple silicon Mac:
02.557 apfs_module_start:3403: load: com.apple.filesystems.apfs, v2332.101.1, apfs-2332.101.1, 2025/04/11

03.278 spaceman_scan_free_blocks:4106: disk2 scan took 0.001036 s, trims took 0.000770 s
03.278 spaceman_scan_free_blocks:4110: disk2 126731 blocks free in 15 extents, avg 8448.73
03.278 spaceman_scan_free_blocks:4119: disk2 126731 blocks trimmed in 15 extents (51 us/trim, 19480 trims/s)
03.278 spaceman_scan_free_blocks:4122: disk2 trim distribution 1:4 2+:4 4+:5 16+:0 64+:0 256+:2

03.570 spaceman_scan_free_blocks:4106: disk4 scan took 0.295283 s, trims took 0.285527 s
03.570 spaceman_scan_free_blocks:4110: disk4 19188027 blocks free in 9939 extents, avg 1930.57
03.570 spaceman_scan_free_blocks:4119: disk4 19188027 blocks trimmed in 9939 extents (28 us/trim, 34809 trims/s)
03.570 spaceman_scan_free_blocks:4122: disk4 trim distribution 1:6010 2+:1072 4+:1775 16+:700 64+:244 256+:138
03.570 spaceman_scan_free_blocks:4130: disk4 trims dropped: 4252 blocks 4252 extents, avg 1.00
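
To put those block counts into more familiar units, they can be converted to bytes. This sketch assumes each block is 4,096 bytes, the usual APFS block size, and uses a helper function of my own, trimmed_bytes, to pull the count out of a log entry.

```shell
#!/bin/sh
# Assumption: each block is 4,096 bytes, the usual APFS block size.
block_size=4096

# trimmed_bytes LINE: extract the block count from a "blocks trimmed" log
# entry and print the space that represents in bytes.
trimmed_bytes() {
    blocks=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) blocks trimmed in .*/\1/p')
    echo $((blocks * block_size))
}

host_line='36.391 spaceman_scan_free_blocks:4119: disk3 351308484 blocks trimmed in 319729 extents (3 us/trim, 278633 trims/s)'
vm_line='03.570 spaceman_scan_free_blocks:4119: disk4 19188027 blocks trimmed in 9939 extents (28 us/trim, 34809 trims/s)'

host_bytes=$(trimmed_bytes "$host_line")
vm_bytes=$(trimmed_bytes "$vm_line")
echo "host disk3: $host_bytes bytes trimmed"
echo "VM disk4:   $vm_bytes bytes trimmed"
```

On that assumption, the host Trimmed about 1.44 TB of free space on disk3, and the VM about 78.6 GB on disk4.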

Just as the Trims performed on the host free up unused blocks of storage on the boot disk, so those in the VM do the same for the VM disk image. To demonstrate how those maintain the VM disk image in sparse format, I wrote two files inside the VM when it was running. One was a plain 10 GB file taking 10 GB on disk, the other a 10 GB sparse file taking a few MB. I then closed the VM and measured its size on disk, opened it again, deleted those two files and closed it again. During this I also took screenshots to verify changes recorded by Disk Utility in the free space inside the VM.

Before writing the two test files, the VM’s disk image size on the host was 107 GB, and it took 23.98 GB on disk as a sparse file. When it contained the two test files, its size remained the same, and it took 34.01 GB on disk. After deleting the files inside the VM, the disk image’s size remained the same, but it only took 24.01 GB on disk, and internally the VM reported that it had returned to 78.9 GB of storage available, the same as it had started with.

As expected, when the VM Trimmed it freed up storage space no longer used by the deleted files, as a result of which the VM disk image required less space on disk.

How the magic works

  • Read-write disk images are created as normal files. They’re Trimmed by APFS on each subsequent mount, and may then become sparse files when there’s sufficient unused space in them.
  • VM disk images are created as sparse files. They’re Trimmed by APFS in the VM during each boot and on demand, maintaining their sparse format when they have sufficient unused space.

How disk images and VMs are more efficient

By: hoakley
9 April 2025 at 14:38

For many years, most types of disk image were inefficient in their use of storage space, as they occupied their full size on disk. Until recently, when you created a 5 GB read-write UDIF disk image, one of the most popular types, it invariably took up 5 GB in storage, even when empty. This also applied to the raw disk images used by Virtual Machines: give a VM 100 GB, and that’s just what it took on disk. With the introduction of sparse files in APFS, this has changed, and many disk images now only take the space they need. I’m not sure exactly when this change occurred, as Apple still doesn’t appear to have documented it, but it seems to have changed with macOS Monterey.

This is easiest to see with a plain read-write disk image, created using DropDMG or Disk Utility.

Disk image

Here’s one I made earlier, a whole 350 GB in size. When it’s created, it’s automatically attached and mounted at full size. For the sake of example, I then copied a large IPSW to it, so it wasn’t entirely empty.

Unmount it and Get Info on the disk image and you’ll see it does still take up a full 350 GB on disk. Mount it again, though, and APFS works its magic. You can see this in LogUI, or the custom log extract provided by Mints.

When unmounted again it has shrunk down to take little more than the size of the IPSW file in it, at just over 17 GB. That’s less than 5% of its nominal size, without using any compression.

It’s worth looking through entries in the log made by APFS for the mount process. First, APFS checks whether the data store for the disk image is already sparse:
01.470 container_backingstore_is_sparse:1652: Image url file:///Volumes/LaCie2tb/350gbudif.dmg Image path /Volumes/LaCie2tb/350gbudif.dmg
01.470 container_backingstore_is_sparse:1659: Image /Volumes/LaCie2tb/350gbudif.dmg is a flat file, do not consider as sparse

It then sets it to sparse, ready for sparsification:
01.475 handle_apfs_set_backingstore:6207: disk9s1 Set backing store as sparse
01.475 handle_apfs_set_backingstore:6240: disk9 Backing storage is a raw file

Space Manager performs an initial scan for free blocks without any Trimming:
01.479 spaceman_scan_free_blocks:4136: disk9 scan took 0.004272 s (no trims)
01.479 spaceman_fxc_print_stats:477: disk9 dev 0 smfree 81258479/85398014 table 4/452 blocks 81258479 32766:20314619:79974226 100.00% range 35869:85362145 99.95% scans 1

Space Manager then scans and Trims free storage blocks, taking just over 0.7 second to complete:
02.196 spaceman_scan_free_blocks:4106: disk9 scan took 0.717433 s, trims took 0.715705 s
02.196 spaceman_scan_free_blocks:4110: disk9 81258479 blocks free in 25 extents, avg 3250339.16
02.196 spaceman_scan_free_blocks:4119: disk9 81258479 blocks trimmed in 25 extents (28628 us/trim, 34 trims/s)
02.196 spaceman_scan_free_blocks:4122: disk9 trim distribution 1:0 2+:0 4+:0 16+:0 64+:0 256+:25
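
Those figures tie in with the sizes seen in the Finder. As a sketch, and assuming that the pair of numbers after smfree in the Space Manager statistics are free and total blocks of 4,096 bytes each (my reading of the entry, not anything documented), the space in use can be worked out as follows:

```shell
#!/bin/sh
# Assumptions: "smfree A/B" gives free and total blocks, and each block
# is 4,096 bytes.
block_size=4096
line='01.479 spaceman_fxc_print_stats:477: disk9 dev 0 smfree 81258479/85398014 table 4/452 blocks 81258479 32766:20314619:79974226 100.00% range 35869:85362145 99.95% scans 1'

# Pull out the two numbers that follow "smfree".
pair=$(printf '%s\n' "$line" | sed -n 's|.* smfree \([0-9]*\)/\([0-9]*\) .*|\1 \2|p')
free_blocks=${pair% *}
total_blocks=${pair#* }

used_blocks=$((total_blocks - free_blocks))
total_bytes=$((total_blocks * block_size))
used_bytes=$((used_blocks * block_size))
echo "total: $total_bytes bytes, in use: $used_bytes bytes"
```

That gives a total of just under 350 GB, with about 17 GB in use, in keeping with the nominal size of the disk image and the space it takes on disk once Trimmed.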

VM

What happens with an Apple silicon VM is a bit more complicated, and harder to observe. This time the virtualisation app should create the disk image inside the VM bundle as a sparse file to begin with, then copy into that what’s needed for the VM, so skipping both the first mount stage and the Trimming during the second mount.

The result is the same, though, with a 350 GB VM taking just 22 GB on disk. Inspect that disk image using my free utility Precize, and you’ll see that economy confirmed, and the Sparse File flag set.

Conclusions

For plain read-write disk images and those inside VMs to be sparse files:

  • they must contain a suitable raw disk image, such as UDIF read-write;
  • the host file system must be APFS, as HFS+ doesn’t support sparse files;
  • for normal disk images, they must be stored on an SSD that supports Trimming;
  • there must be sufficient free space in the disk image;
  • the guest file system can be APFS, either plain or encrypted, or HFS+J;
  • for normal disk images, they must have been mounted at least once since first being created.

A brief history of disk images on the Mac

By: hoakley
5 April 2025 at 15:00

Disk images, files that contain the contents of a physical storage medium, go back long before the first Mac. Among other tasks, they were originally used to contain representations of floppy disks for replication in manufacture.

Today disk images are at the heart of macOS, and widely used by third parties. They’re an essential part of macOS installers, home to Recovery mode, and the basis for cryptexes. They’ve been used to burn and replicate optical disks, to archive disk contents, extensively for network backups, and for the distribution of software.

Classic Mac OS

In Classic Mac OS there were two utilities that worked with different formats: Disk Copy used replicas in what later became known as DC42 format, after Disk Copy version 4.2, while compressed formats known as DART were handled by the Disk Archive/Retrieval Tool, hence their name.

Mac OS 9 brought Disk Copy 6.0 with added support for the New Disk Image Format (NDIF), which supported resource forks, and saw its last release, version 6.3.3. This also supported read-only Rdxx formats.

By this time, variants of formats had become complex. Here, Disk Copy is configured to create a read-only compressed .img file containing the contents of a standard 1.4 MB floppy disk. In the upper window, it has completed validating the checksum on a self-mounting .smi disk image that’s part of a DiskSet. These could also be signed, using certificates issued not by Apple but by DigiSign.

Here’s Disk Copy saving an image of a hard disk using a similar read-only compressed format, this time to accommodate 1.5 GB.

Mac OS X

The release of Mac OS X 10.1 Puma in 2001 brought Apple’s new Universal Disk Image Format (UDIF), used in DMG disk images, which only had a single fork as its resource fork was embedded in the data fork. Although pre-release versions of Disk Copy 6.4 and 6.5 were available with UDIF support for Mac OS 9, neither was ever released, leaving Classic Mac OS without access to UDIF images. Its support for compression options in Apple Data Compression (ADC) unified the two disk image types, and extended support for images larger than a floppy disk. This new format enabled disk images to represent whole storage devices, complete with a partition map and disk-based drivers.

Tools provided in Mac OS X for working with disk images include Disk Utility and the command tool hdiutil.

On 21 January 2002, the first version of DropDMG, a third-party substitute for creating disk images, was released by C-Command Software. This quickly enabled developers to create disk images with artwork, licences and other features that weren’t accessible from the tools bundled in Mac OS X. DropDMG has flourished over the last 23 years, and remains popular today.

DropDMG’s options for creating a new disk image far exceed those in Disk Utility. Particularly helpful are the compatible version hints shown on various options, to remind you of which file systems are available in different macOS versions, and which types of disk image container are supported. DropDMG will even convert old NDIF disk images last used in Mac OS 9 to more modern formats. It will also change the password of an encrypted disk image from a menu command.

In Mac OS X 10.2 (2002), UDIF and most other supported formats were served from a kernel extension without requiring a helper process. The following year, 10.3 Panther started using a faceless utility DiskImageMounter to mount disk images. Apple then dropped support for embedded resource forks in disk images in Mac OS X 10.4.7, and newly created disk images became less compatible with older Mac OS versions.

Sparse bundles

Until Mac OS X 10.5 Leopard in 2007, all disk images had used single-file formats, although some could be segmented across file sets. Leopard introduced the sparse bundle with its folder of smaller band files containing data. These enabled the image to grow and shrink in size, and became a popular means of storing mountable Mac file systems on servers using different file systems.

This is another third-party tool that improved access to disk images from the GUI, DMG Packager, seen in 2009. Unlike DropDMG, this appears to have vanished without trace.

In 2011, with the release of Mac OS X 10.7 Lion, Apple removed more support for old disk image formats. DiskImageMounter no longer opened NDIF .img, .smi self-mounting, .dc42 and .dart compressed formats, although the hdiutil command tool still retained some access to them.

Disk Utility, seen here in 2011, has provided basic access to many disk image formats, but these are only a small selection of options available in the hdiutil command tool, or in DropDMG.

Disk Utility offers a lot of options when you create a new disk image.

This shows the complex set of options available when creating a new disk image in Disk Utility in OS X 10.10 Yosemite, before the advent of APFS.

Support for compression was enhanced in OS X 10.11 El Capitan with the addition of lzfse in a new ULFO format, and macOS 10.15 Catalina added lzma in ULMO. In both cases, these new formats aren’t accessible in older versions of macOS.

APFS support

The arrival of a pre-release version of the new APFS file system in macOS 10.12 Sierra brought its support in disk images, although only for experimental purposes, and Apple cautioned users to ensure their contents were well backed up.

In addition to adding the more efficient ULMO compressed format, macOS 10.15 Catalina is the last to support many Classic Mac OS disk image formats, including those from DiskCopy42, DART and NDIF from Disk Copy 6.x. Support for AppleSingle and MacBinary encodings, and dual-fork file support, were also removed in macOS 11.0 Big Sur in 2020.

This ‘warning’ alert from 2020 illustrates one of the longstanding issues with disk images. Although integrity checking of disk images using checksums has been valuable, when an error is found there’s no possibility of repair or recovery as the image can’t be ‘attached’, so its file system can’t be mounted.

macOS 12 Monterey in 2021 brought multiple deprecations of older formats, including UDBZ using bzip2 compression, segmented UDIF images, and embedded resources. It’s also thought to be the first version of macOS in which UDIF read/write images (UDRW) have been stored in APFS sparse file format, although Apple has nowhere mentioned that. This has transformed what had previously been space-inefficient disk images that retained empty storage into a format that can prove almost as efficient as sparse bundles. This results from the Trim on mounting HFS+ and APFS file systems within the image freeing unused space, enabling that to be saved in the sparse file format.

Disk images have never been glamorous, but have remained at the heart of every Mac.

References

man hdiutil
Introduction
Tools
How read-write disk images have gone sparse
Performance
Bands, Compaction and Space Efficiency

Appendix: Disk image formats

Supported
  • UDRW – UDIF read/write
  • UDRO – UDIF read-only
  • UDCO – UDIF ADC-compressed
  • UDZO – UDIF zlib-compressed
  • ULFO – UDIF lzfse-compressed (OS X 10.11)
  • ULMO – UDIF lzma-compressed (macOS 10.15)
  • UDTO – DVD/CD-R master for export
  • UDSP – sparse image, grows with content
  • UDSB – sparse bundle, grows with content, bundle-backed, Mac OS X 10.5
  • UFBI – UDIF entire image with MD5 checksum.
Unsupported
  • DC42 – Disk Copy 4.2 (Classic)
  • DART – compressed, for Disk Archive/Retrieval Tool (Classic)
  • Rdxx – read-only Disk Copy 6.0 formats
  • NDIF – Disk Copy 6.0, including IMG and self-mounting SMI
  • IDME – ‘Internet enabled’, on downloading post-processed to automatically copy visible contents into a folder, then move the image to the Trash. Now deemed highly insecure.
  • UDBZ – UDIF bzip2-compressed image (deprecated).
