How disk images can become sparse files

It’s no miracle that a 10 GB disk image can be shrunk down to a few MB of sparse file. This article explains how APFS works that magic, first on a normal read-write disk image, then in the disk image inside a Virtual Machine.

What is a sparse file?

In essence a sparse file is any file whose allocated storage is smaller than the nominal size of that file. In APFS, this imposes two requirements:

  • the INODE_IS_SPARSE flag is set for that file’s inode,
  • the sparse byte count is given in its extended field.

As a result, the total size of storage allocated to that file’s data in its file extents is smaller than the total required to store the file at its nominal size. This is because the file contains empty data that isn’t stored on disk, saving disk space. This becomes clearer when we consider how this works with regular read-write disk images.
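
One way to see this from outside the file system is to compare a file's nominal size with the storage allocated to it. The Python sketch below does that with the standard `os.stat` call. It assumes `st_blocks` is reported in 512-byte units, as is conventional on macOS and Linux, and it can't read the INODE_IS_SPARSE flag itself, so it only infers sparseness from the two sizes. The function names are illustrative.

```python
import os


def allocated_bytes(path):
    """Return the storage allocated to the file's data, in bytes."""
    # Assumption: st_blocks counts 512-byte units (true on macOS and Linux).
    return os.stat(path).st_blocks * 512


def looks_sparse(path):
    """Return True when less storage is allocated than the nominal size."""
    return allocated_bytes(path) < os.stat(path).st_size
```

A file that is compressed by the file system may also pass this test, so treat a True result as a hint rather than proof that the sparse flag is set.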

How a disk image becomes sparse

To demonstrate how this works, create a read-write disk image, which APFS will then turn into a sparse file. For the sake of simplicity, I’ll ignore all overheads such as the file system in that disk image.

APFS uses 4 KB storage blocks on SSDs. Creating a 4 GB disk image using DropDMG or Disk Utility therefore uses one million blocks. For this example I number those starting from 0000 0001 in hexadecimal, rising to 000F 4240 at the end of that disk image file, a million blocks later.

Once that has been created, copy a 4 MB file into the disk then unmount it, and mount it again. When it’s mounted that second time, APFS Trims it, and marks all its storage blocks apart from those 4 MB as being unused. That leaves my file occupying blocks 0000 0001 to 0000 03E8, and 0000 03E9 to 000F 4240 unallocated. APFS therefore sets the disk image file’s INODE_IS_SPARSE flag to TRUE and writes the sparse byte count to its extended field: the disk image is now a sparse file.
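
The block numbers in that example can be checked with a few lines of Python. This is only a worked version of the arithmetic above, using the same simplifications (decimal units, no file system overhead). The names are illustrative, and the final figure is simply the number of bytes needing no storage, which may not match exactly what APFS records.

```python
# Decimal units, to match the round numbers used in the example above.
KB, MB, GB = 10**3, 10**6, 10**9
BLOCK_SIZE = 4 * KB


def block_label(number):
    """Format a block number as two groups of four hex digits."""
    digits = f"{number:08X}"
    return f"{digits[:4]} {digits[4:]}"


image_blocks = (4 * GB) // BLOCK_SIZE  # blocks in the 4 GB disk image
file_blocks = (4 * MB) // BLOCK_SIZE   # blocks used by the 4 MB file
unstored_bytes = (image_blocks - file_blocks) * BLOCK_SIZE

print("file:  ", block_label(1), "to", block_label(file_blocks))
print("unused:", block_label(file_blocks + 1), "to", block_label(image_blocks))
print("bytes needing no storage:", unstored_bytes)
```

With 4,096-byte blocks and binary units the counts would instead be 1,048,576 and 1,024, but the principle is unchanged.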

Creating a VM disk image

Unlike that read-write disk image, a disk image used for a Virtual Machine (on Apple silicon, at least) is created as a sparse file in the first instance, using code like
let diskFd = open(diskImagePath, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)
var result = ftruncate(diskFd, sizeDisk)
result = close(diskFd)

(error handling omitted) where sizeDisk is the size in bytes. Something similar can be achieved in Terminal using the command
dd if=/dev/zero of=Disk.img bs=1m count=0 seek=10240
where the number given for seek is the size in blocks of the size set by bs, here 1 MB, so this creates a 10 GB disk image without writing any data to it.
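
For comparison, the same open, truncate and close sequence can be sketched in Python, which makes it easy to check the result straight away. This is an illustration under the same assumptions as the code above, not what any particular virtualisation app does. The function name is hypothetical, and `st_blocks` is assumed to be in 512-byte units.

```python
import os


def create_sparse_image(path, size_bytes):
    """Create a file of the given nominal size without writing any data.

    Returns a (nominal size, allocated size) pair in bytes.
    """
    # 0o600 is the same permission as S_IRUSR | S_IWUSR.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size_bytes)  # extends the file; no data is written
    finally:
        os.close(fd)
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512
```

On a file system that supports sparse files, the second value returned should be far smaller than the first.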

Maintaining the VM disk image

A read-write disk image is mounted and Trimmed by APFS on the host Mac. The disk image used for a VM is different, as it’s the guest OS that has the task of Trimming the disk image from inside, and that works just the same as when macOS is booted on a Mac.

Read the entries made in your Mac’s log by APFS during startup. Those appear early with the start of APFS, when the version is given:
33.263 apfs_module_start:3403: load: com.apple.filesystems.apfs, v2332.101.1, apfs-2332.101.1, 2025/04/11

A little later, APFS Space Manager (Spaceman) Trims the first partition/container with log entries like:
34.012 spaceman_scan_free_blocks:4106: disk1 scan took 0.002064 s, trims took 0.000443 s
34.012 spaceman_scan_free_blocks:4110: disk1 101104 blocks free in 47 extents, avg 2151.14
34.012 spaceman_scan_free_blocks:4119: disk1 101104 blocks trimmed in 47 extents (9 us/trim, 106094 trims/s)
34.012 spaceman_scan_free_blocks:4122: disk1 trim distribution 1:14 2+:18 4+:8 16+:2 64+:1 256+:4

A couple of seconds later it trims a second partition:
36.391 spaceman_scan_free_blocks:4106: disk3 scan took 1.749635 s, trims took 1.147491 s
36.391 spaceman_scan_free_blocks:4110: disk3 351308484 blocks free in 319729 extents, avg 1098.76
36.391 spaceman_scan_free_blocks:4119: disk3 351308484 blocks trimmed in 319729 extents (3 us/trim, 278633 trims/s)
36.391 spaceman_scan_free_blocks:4122: disk3 trim distribution 1:118376 2+:48105 4+:82602 16+:42698 64+:24673 256+:3275
36.391 spaceman_scan_free_blocks:4130: disk3 trims dropped: 10469 blocks 10469 extents, avg 1.00

The following matching entries are taken from a macOS VM as it boots on an Apple silicon Mac:
02.557 apfs_module_start:3403: load: com.apple.filesystems.apfs, v2332.101.1, apfs-2332.101.1, 2025/04/11

03.278 spaceman_scan_free_blocks:4106: disk2 scan took 0.001036 s, trims took 0.000770 s
03.278 spaceman_scan_free_blocks:4110: disk2 126731 blocks free in 15 extents, avg 8448.73
03.278 spaceman_scan_free_blocks:4119: disk2 126731 blocks trimmed in 15 extents (51 us/trim, 19480 trims/s)
03.278 spaceman_scan_free_blocks:4122: disk2 trim distribution 1:4 2+:4 4+:5 16+:0 64+:0 256+:2

03.570 spaceman_scan_free_blocks:4106: disk4 scan took 0.295283 s, trims took 0.285527 s
03.570 spaceman_scan_free_blocks:4110: disk4 19188027 blocks free in 9939 extents, avg 1930.57
03.570 spaceman_scan_free_blocks:4119: disk4 19188027 blocks trimmed in 9939 extents (28 us/trim, 34809 trims/s)
03.570 spaceman_scan_free_blocks:4122: disk4 trim distribution 1:6010 2+:1072 4+:1775 16+:700 64+:244 256+:138
03.570 spaceman_scan_free_blocks:4130: disk4 trims dropped: 4252 blocks 4252 extents, avg 1.00
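
To put those block counts into more familiar units, a short Python sketch can pull the "blocks trimmed" entries out of a log extract and convert them to bytes. It assumes the 4 KB block size mentioned earlier and the entry format shown in the excerpts above. The regular expression and function name are illustrative, not part of any Apple tool.

```python
import re

BLOCK_SIZE = 4096  # bytes; assumes 4 KB APFS blocks

# Matches entries such as:
# "... spaceman_scan_free_blocks:4119: disk3 351308484 blocks trimmed in 319729 extents ..."
TRIMMED = re.compile(
    r"spaceman_scan_free_blocks:\d+: (?P<disk>\S+) "
    r"(?P<blocks>\d+) blocks trimmed in (?P<extents>\d+) extents"
)


def trimmed_space(log_text):
    """Map each disk name to the bytes covered by its reported Trims."""
    totals = {}
    for match in TRIMMED.finditer(log_text):
        size = int(match["blocks"]) * BLOCK_SIZE
        totals[match["disk"]] = totals.get(match["disk"], 0) + size
    return totals
```

On that assumption, the 19188027 blocks trimmed on disk4 in the VM come to about 78.6 GB.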

Just as the Trims performed on the host free up unused blocks of storage on the boot disk, so those in the VM do the same for the VM disk image. To demonstrate how those maintain the VM disk image in sparse format, I wrote two files inside the VM when it was running. One was a plain 10 GB file taking 10 GB on disk, the other a 10 GB sparse file taking a few MB. I then closed the VM and measured its size on disk, opened it again, deleted those two files and closed it again. During this I also took screenshots to verify changes recorded by Disk Utility in the free space inside the VM.

Before writing the two test files, the VM’s disk image size on the host was 107 GB, and it took 23.98 GB on disk as a sparse file. When it contained the two test files, its size remained the same, and it took 34.01 GB on disk. After deleting the files inside the VM, the disk image’s size remained the same, but it only took 24.01 GB on disk, and internally the VM reported that it had returned to 78.9 GB of storage available, the same as it had started with.

As expected, when the VM Trimmed it freed up storage space no longer used by the deleted files, as a result of which the VM disk image required less space on disk.

How the magic works

  • Read-write disk images are created as normal files. They’re Trimmed by APFS on each subsequent mount, and may then become sparse files when there’s sufficient unused space in them.
  • VM disk images are created as sparse files. They’re Trimmed by APFS in the VM during each boot and on demand, maintaining their sparse format when they have sufficient unused space.

How disk images and VMs are more efficient

For many years, most types of disk image were inefficient in their use of storage space, as they occupied their full size on disk. Until recently, when you created a 5 GB read-write UDIF disk image, one of the most popular, it invariably took up 5 GB in storage, even when empty. This also applied to the raw disk images used by Virtual Machines: give a VM 100 GB, and that’s just what it took on disk. With the introduction of sparse files in APFS, this has changed, and many disk images now only take the space they need. I’m not sure exactly when this change occurred, as Apple still doesn’t appear to have documented it, but it seems to have changed with macOS Monterey.

This is easiest to see with a plain read-write disk image, created using DropDMG or Disk Utility.

Disk image

Here’s one I made earlier, a whole 350 GB in size. When it’s created, it’s automatically attached and mounted at full size. For the sake of example, I then copied a large IPSW to it, so it wasn’t entirely empty.

Unmount it and Get Info on the disk image and you’ll see it does still take up a full 350 GB on disk. Mount it again, though, and APFS works its magic. You can see this in LogUI, or the custom log extract provided by Mints.

When unmounted again it has shrunk down to take little more than the size of the IPSW file in it, at just over 17 GB. That’s less than 5% of its nominal size, without using any compression.

It’s worth looking through entries in the log made by APFS for the mount process. First, APFS checks whether the data store for the disk image is already sparse:
01.470 container_backingstore_is_sparse:1652: Image url file:///Volumes/LaCie2tb/350gbudif.dmg Image path /Volumes/LaCie2tb/350gbudif.dmg
01.470 container_backingstore_is_sparse:1659: Image /Volumes/LaCie2tb/350gbudif.dmg is a flat file, do not consider as sparse

It then sets it to sparse, ready for sparsification:
01.475 handle_apfs_set_backingstore:6207: disk9s1 Set backing store as sparse
01.475 handle_apfs_set_backingstore:6240: disk9 Backing storage is a raw file

Space Manager performs an initial scan for free blocks without any Trimming:
01.479 spaceman_scan_free_blocks:4136: disk9 scan took 0.004272 s (no trims)
01.479 spaceman_fxc_print_stats:477: disk9 dev 0 smfree 81258479/85398014 table 4/452 blocks 81258479 32766:20314619:79974226 100.00% range 35869:85362145 99.95% scans 1

Space Manager then scans and Trims free storage blocks, taking just over 0.7 second to complete:
02.196 spaceman_scan_free_blocks:4106: disk9 scan took 0.717433 s, trims took 0.715705 s
02.196 spaceman_scan_free_blocks:4110: disk9 81258479 blocks free in 25 extents, avg 3250339.16
02.196 spaceman_scan_free_blocks:4119: disk9 81258479 blocks trimmed in 25 extents (28628 us/trim, 34 trims/s)
02.196 spaceman_scan_free_blocks:4122: disk9 trim distribution 1:0 2+:0 4+:0 16+:0 64+:0 256+:25
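
Those figures can be cross-checked against the sizes seen in the Finder. The Python sketch below reads the smfree field from the spaceman_fxc_print_stats entry, on the assumption that it gives free blocks over total blocks (the first number does match the "blocks free" count) and that blocks are 4 KB. The function name is illustrative.

```python
import re

BLOCK_SIZE = 4096  # bytes; assumes 4 KB APFS blocks

# Assumption: "smfree A/B" means A free blocks out of B blocks in total.
SMFREE = re.compile(r"smfree (?P<free>\d+)/(?P<total>\d+)")


def space_summary(log_line):
    """Return (total bytes, used bytes, free fraction), or None if absent."""
    match = SMFREE.search(log_line)
    if match is None:
        return None
    free, total = int(match["free"]), int(match["total"])
    return total * BLOCK_SIZE, (total - free) * BLOCK_SIZE, free / total
```

For the entry above that gives a total of about 349.8 GB, with about 17 GB in use and roughly 95% free, in line with a 350 GB disk image holding a single large IPSW.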

VM

What happens with an Apple silicon VM is a bit more complicated, and harder to observe. This time the virtualisation app should create the disk image inside the VM bundle as a sparse file to begin with, then copy into that what’s needed for the VM. That skips the first mount, and the Trimming during the second mount, that a normal disk image needs before it becomes sparse.

The result is the same, though, with a 350 GB VM taking just 22 GB on disk. Inspect that disk image using my free utility Precize, and you’ll see that economy confirmed, and the Sparse File flag set.

Conclusions

For plain read-write disk images and those inside VMs to be sparse files:

  • they must contain a suitable raw disk image, such as UDIF read-write;
  • the host file system must be APFS, as HFS+ doesn’t support sparse files;
  • for normal disk images, they must be stored on an SSD that supports Trimming;
  • there must be sufficient free space in the disk image;
  • the guest file system can be APFS, either plain or encrypted, or HFS+J;
  • for normal disk images, they must have been mounted at least once since first being created.

How robust are APFS clone and sparse files?

APFS has two special file types designed to economise on storage space: clone and sparse files. Clone files are two or more distinct files within the same volume whose data is shared; sparse files save space by skipping empty data and only storing data containing information. This article explores how they behave in use, with particular emphasis on Time Machine backups and iCloud Drive. The latter also involves a third type of special file, dataless files.

Clone files

In contrast to hard-linked files, clone files are two or more distinct files within the same file system (volume) whose file extents are identical, so share the same data, as shown below. They’re created by variants of normal file copying, including duplicating in the Finder (and drag-copying within the same volume), and the cp -c command.

[Figure: fileobject3]

Instead of duplicating everything, only the inode and its attributes (blue and pink) are duplicated, together with their file extent information. You can verify this by inspecting the numbers of those inodes, as they’re different, and information in the attributes such as the file’s name will also be different. There’s a flag in the file’s attributes to indicate that cloning has taken place. At first, the two cloned files share the same data blocks and extended attributes, but as the two files are changed by editing, they start to drift apart and become uncloned.

Clone files are becoming more popular thanks to the Hyperspace app, which deduplicates files within the same volume by replacing copies with clones.

Because they can only exist within the same file system, clone files are fragile. Any copy or move to another file system is invariably accompanied by the copying of their full data, and their economy of storage can only remain as long as they stay within the same volume.

Backups

One notable exception to this same-volume rule is in Time Machine backups. As clone files are preserved in local snapshots, when Time Machine constructs a backup as a snapshot in the backup storage volume, shared file extents are retained, so preserving clones. This is reflected in the size of the backup snapshot, and in the report written to the log. For example, when backing up three distinct files and ten clones of one of those, that report included:
14 Total Items in Backup (l: 16 GB p: 11.02 GB)
3 Files Copied (l: 6 GB p: 1.02 GB)
1 Directories Copied (l: Zero KB p: Zero KB)
10 Files Cloned (l: 10 GB p: 10 GB)
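
The way those figures add up can be checked with a short Python sketch. It assumes that l and p stand for logical and physical sizes, which the report doesn't state, and that units are decimal. The pattern and function names are illustrative.

```python
import re

UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Matches lines such as "3 Files Copied (l: 6 GB p: 1.02 GB)".
ENTRY = re.compile(
    r"^(?P<count>\d+) (?P<label>.+?) "
    r"\(l: (?P<l>\S+) (?P<lu>[A-Z]+) p: (?P<p>\S+) (?P<pu>[A-Z]+)\)$"
)


def to_bytes(number, unit):
    """Convert a value such as '1.02' or 'Zero' with its unit into bytes."""
    value = 0.0 if number == "Zero" else float(number)
    return value * UNITS[unit]


def parse_report(text):
    """Map each label to (item count, l size in bytes, p size in bytes)."""
    rows = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            rows[match["label"]] = (
                int(match["count"]),
                to_bytes(match["l"], match["lu"]),
                to_bytes(match["p"], match["pu"]),
            )
    return rows
```

Summing the three component lines of the report reproduces the totals in its first line: 14 items, 16 GB for l, and 11.02 GB for p.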

Backups made by other utilities are unlikely to reproduce this behaviour, though, as they can’t synthesise snapshots in the way that Time Machine does. To preserve clone files in their backups, they’d have to identify clones in the source and explicitly perform cloning in their backup store. Although Carbon Copy Cloner claims that “in some cases CCC may clone a file on the destination prior to updating its contents”, it doesn’t appear to attempt to preserve clone files in the backups it makes. I’m not aware of any third-party utility that does.

Unfortunately, Time Machine appears unable to restore directly from backup snapshots in the backup store, and performs Finder copies when restoring. That saves each of those clone files as a completely separate file, without any sharing of data. As a result, the space occupied on disk for a restored volume can be substantially greater than the original or its backup. Extensive use of clone files could thus cause problems when restoring from backups.

Of course, rolling a volume back to a local snapshot, such as one made during Time Machine backups, preserves all clone files within that volume.

iCloud Drive

Clone files created within the same volume as local iCloud Drive storage on the Data volume, or cloned when within a folder in iCloud Drive, remain within the same file system, so clones are preserved, including when a file is moved to other folders in the same volume.

However, clone files are treated as simple copies as far as iCloud Drive’s remote storage is concerned. While a pair of cloned 5 GB files only use a total of 5 GB local storage, they require a full 10 GB of your iCloud allocation, indicating that their cloud storage is separate and not common to both. Although the effects of eviction (removing local data) and materialisation (restoring local data from cloud storage) are difficult to observe directly, they appear to lose the benefits of cloning.

When the local copy of a file also stored remotely in the cloud is evicted, its data is removed from local storage, rendering it dataless, as shown below.

[Figure: iCloudDriveFileSummary4]

When that file is to be used locally again, its data has to be downloaded from the cloud service, and the local dataless file is materialised by adding its data back. As far as I can tell, that doesn’t result in the reconstruction of the shared file extents, so changes cloned files into normal copies with different file extents. You would then need to use Hyperspace to restore them as clone files. Other Macs sharing the same iCloud Drive also see them as full copies rather than clones.

These behaviours could also catch the user by surprise.

Sparse files

Unlike clone files, the structure of sparse files in APFS is conventional, as shown below.

[Figure: fileobject1]

They achieve their economy in storage by only including file extents containing non-null data, and thus aren’t dependent on remaining within the same file system (volume), making them more robust. Their primary requirement is that they’re created and maintained using specific file system operations, and are only copied or moved to other APFS file systems.
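
As an illustration of the kind of file system operation involved, the Python sketch below copies a file while seeking over chunks that contain only zeros instead of writing them. It's a generic technique rather than the method used by any app mentioned here. The chunk size and function name are arbitrary choices, and whether the skipped regions end up as unallocated storage depends on the destination file system.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB; an arbitrary chunk size for this sketch


def copy_skipping_zeros(src, dst):
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    zeros = bytes(CHUNK)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Move the write position forward without storing anything.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Set the nominal size explicitly in case the file ends with zeros.
        fout.truncate(fin.tell())
```

A copy that simply reads and writes every byte, as a network transfer typically does, stores all of those zeros and loses the saving.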

Backups

When backed up by Time Machine to another APFS volume, sparse files are preserved reliably, and are also restored as sparse files. That isn’t likely to hold, though, if the file is transferred using a network file system such as SMB, as all network transfers currently appear to explode sparse files to full size prior to transfer. Because of the way in which they have to be created, only the app maintaining that file could restore its sparse format. In the case of disk images, this should normally occur the next time they’re mounted in the Finder and Trimmed by APFS.

iCloud Drive

Assessing what happens with sparse files in iCloud Drive is considerably simpler than with clone files. As long as they remain downloaded to local storage, they are preserved, and can be moved in and out of iCloud Drive storage without exploding in size. However, they too are stored in full when in iCloud storage, requiring their full size in your iCloud allocation, and the eviction-materialisation cycle explodes them to full size and removes their sparse file flag.

The only way to return a former sparse file to its original economical format is then to open and save it using the app that creates and maintains it. In the case of disk images, this should occur when they’re next mounted and Trimmed.

Conclusions

Clone files:

  • are only preserved when moved within the same file system (volume);
  • are preserved and restored from local snapshots;
  • are preserved in Time Machine backups, but aren’t restored from them;
  • aren’t preserved in other backups;
  • could result in a restored volume being substantially larger than its original;
  • occupy their full space in your iCloud allocation;
  • are only preserved in iCloud Drive when they aren’t evicted from local storage;
  • can be regenerated using Hyperspace.

Sparse files:

  • are only preserved when copied or moved directly between APFS volumes;
  • aren’t preserved when copied or moved over network connections, such as SMB;
  • aren’t preserved when copied or moved to different file systems, including HFS+;
  • are preserved in and restored from local Time Machine backups;
  • should be preserved in and restored from other local backups;
  • occupy their full space in your iCloud allocation;
  • are only preserved in iCloud Drive when they aren’t evicted from local storage;
  • can only be regenerated by the app that creates and maintains them.

Both clone and sparse files can result in substantial savings in storage space. However, because those savings are fragile, their greatest value is in minimising erase-write cycles in SSDs, hence slowing their ageing.

References

Apple’s APFS Reference (PDF), last revised 22 June 2020.
Dataless files are explained here.
How sparse files work
Files and clones
Special file types, including dataless files

Can APFS special files save storage space?

I’ve long been critical of some of the best-selling utilities for the Mac, that set out to perform deduplication of files by detecting which appear to be identical, and removing ‘spare’ copies. This is because APFS introduced clone files, and in the right circumstances those take up no space in storage, as their data is common and not duplicated at all. As it’s practically difficult to tell whether two files are clones, any utility or command tool that claims to save space by removing duplicates can’t tell you the truth, and in most cases won’t save as much space as it claims.

Claims made by those utilities are often exaggerated. This is because they calculate how much space they think they’ve saved by adding the sizes of the potential duplicates they have deleted. That’s not correct when a clone file is deleted, as that doesn’t actually free any space at all, even though the clone file has exactly the same nominal size as the original.

Benefitting from clone files

I’m delighted to see the eminent John Siracusa turn this on its head and finally make better use of clone files in his app Hyperspace, available from the App Store. Instead of deleting clones, his app can replace duplicate copies with clones, and so achieve real space savings. This comes down to simple arithmetic:

  • if you have two copies (not clones) of a file in the same APFS volume, the total size they take on disk is twice the size of one of them;
  • if you have two clones (not copies) of a file in the same APFS volume, the total size they take on disk is only the size of one of them, as its clone takes no additional space at all.

Hyperspace thus checks all the files in a selected folder, identifies which are identical copies, and (where suitable) will replace those copies (except an original) with clones, so saving real storage space.
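
To make that arithmetic concrete, here's a minimal Python sketch that estimates how much a folder might gain if identical copies were replaced by clones. It is not how Hyperspace works, which isn't described here. It simply groups files by size and SHA-256 digest, and the function names are illustrative. Because it can't tell whether two identical files are already clones, the figure it returns is an upper bound.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk=1024 * 1024):
    """Return the SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def potential_saving(folder):
    """Upper bound on bytes saved by replacing identical copies with clones."""
    groups = defaultdict(list)
    seen_inodes = set()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            info = os.stat(path)
            if (info.st_dev, info.st_ino) in seen_inodes:
                continue  # a hard link to a file already counted
            seen_inodes.add((info.st_dev, info.st_ino))
            groups[(info.st_size, file_digest(path))].append(path)
    # Each group keeps one original; every other member could become a clone.
    return sum(size * (len(paths) - 1) for (size, _), paths in groups.items())
```

Two identical 5 GB copies would give a potential saving of 5 GB, matching the two cases above. If those files were already clones, the real saving would be nothing.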

I also think it has the most user-friendly payment scheme: download Hyperspace free of charge and check your Mac with it. If it doesn’t find sufficient savings, and you decide not to use it to replace any duplicates with clones, then it costs you nothing. If you want to reclaim that space, then you can opt to pay according to the amount of space it saves, by subscription, or with a one-time payment. On that basis, I unhesitatingly recommend everyone to download it from the App Store, and at least check their Home folder to see if it’s worth paying to reclaim space. You have absolutely nothing to lose.

In my case, perhaps because I tend to clone files using the Finder’s Duplicate command, the savings that it offered were of little benefit, but your Home folder could be different and release 100 GB or more.

Sparse files

The other space-saving special file type in APFS is the sparse file. Although it can bring great savings in storage space, that’s largely up to the app(s) that create and maintain the file, rather than the user. Devising an app that could go round converting plain to sparse files is harder, and risks incompatibility with those apps that access those files.

Fitting 285 GB into 16.5 GB

As a demonstration of how effective APFS special files are in saving disk space, I built myself a 100 GB partition (APFS Container) on an SSD and tried to fill it with clone and sparse files until I got bored.

At this stage, the 100 GB partition contains:

  • One 16.5 GB IPSW image file, with nine clones of it, created using the Duplicate command.
  • Eleven 10 GB sparse files and one clone, created using my app Sparsity.

Add those file sizes together and they come to 285 GB, yet the 100 GB partition only has 16.5 GB stored on it, and still has over 83 GB free. No compression is involved here, of course.
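
That sum is easy to check. The Python lines below are just a worked version of the figures given above, ignoring the few MB that the sparse files actually occupy. The variable names are illustrative.

```python
IPSW_GB = 16.5       # nominal size of the IPSW file
SPARSE_GB = 10       # nominal size of each sparse file
PARTITION_GB = 100

ipsw_files = 1 + 9      # the original plus nine clones
sparse_files = 11 + 1   # eleven sparse files plus one clone

nominal_total = ipsw_files * IPSW_GB + sparse_files * SPARSE_GB
stored = IPSW_GB        # only one copy of the IPSW data is stored
free = PARTITION_GB - stored

print(f"nominal {nominal_total} GB, stored {stored} GB, free {free} GB")
```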

As the saying goes, there ain’t no such thing as a free lunch, and that free space could vanish quickly depending on what happens to those files. The worst case is for an app not to recognise sparse files, and write one to disk in plain format, so swallowing 10 GB at once. Editing the cloned files would be a more gradual way of their growing in size. Only changed data would then need to be saved, so free disk space would steadily fall as more changes were made to the clone.

Clone and sparse files are by no means unique to APFS, but they can be impressive, and above all they’re effective at reducing excess erase-write cycles that age SSDs, whatever you do with the storage they free.

I’m very grateful to Duncan for drawing my attention to Hyperspace, and to John Siracusa for an outstanding app.
