How robust are APFS clone and sparse files?
APFS has two special file types designed to economise on storage space: clone and sparse files. Clone files are two or more distinct files within the same volume whose data is shared; sparse files save space by skipping empty data and only storing data containing information. This article explores how they behave in use, with particular emphasis on Time Machine backups and iCloud Drive. The latter also involves a third type of special file, dataless files.
Clone files
In contrast to hard-linked files, clone files are two or more distinct files within the same file system (volume) whose file extents are identical, so they share the same data, as shown below. They’re created by variants of normal file copying, including duplicating in the Finder (and drag-copying within the same volume), and the cp -c command.
Instead of duplicating everything, only the inode and its attributes (blue and pink) are duplicated, together with their file extent information. You can verify this by inspecting the numbers of those inodes, as they’re different, and information in the attributes such as the file’s name will also be different. There’s a flag in the file’s attributes to indicate that cloning has taken place. At first, the two cloned files share the same data blocks and extended attributes, but as the two files are changed by editing, they start to drift apart and become uncloned.
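To make that sharing concrete, here is a toy copy-on-write model in Python. It is not APFS’s on-disk format: the Volume and Inode classes, the block store and its reference counts are hypothetical simplifications. It assumes only what’s described above: a clone duplicates the inode and its extent list while sharing the data blocks, and editing one clone writes new blocks for that file alone.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy copy-on-write model of cloning; not APFS's real on-disk structures.


@dataclass
class Volume:
    blocks: Dict[int, bytes] = field(default_factory=dict)   # block id -> data
    refcounts: Dict[int, int] = field(default_factory=dict)  # block id -> files sharing it
    next_id: int = 0

    def allocate(self, data: bytes) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcounts[block_id] = 1
        return block_id

    def release(self, block_id: int) -> None:
        # Free a block only when no file refers to it any more.
        self.refcounts[block_id] -= 1
        if self.refcounts[block_id] == 0:
            del self.refcounts[block_id]
            del self.blocks[block_id]

    def physical_blocks(self) -> int:
        return len(self.blocks)


@dataclass
class Inode:
    name: str
    extents: List[int]  # ordered block ids making up the file's data


def create(vol: Volume, name: str, data: List[bytes]) -> Inode:
    return Inode(name, [vol.allocate(b) for b in data])


def clone(vol: Volume, src: Inode, name: str) -> Inode:
    # Only a new inode and a copy of the extent list are made; data is shared.
    for block_id in src.extents:
        vol.refcounts[block_id] += 1
    return Inode(name, list(src.extents))


def write_block(vol: Volume, inode: Inode, index: int, data: bytes) -> None:
    # Copy-on-write: the edited block goes to a new location for this file only.
    old = inode.extents[index]
    inode.extents[index] = vol.allocate(data)
    vol.release(old)


vol = Volume()
a = create(vol, "original.dat", [b"A" * 4096, b"B" * 4096, b"C" * 4096])
b = clone(vol, a, "clone.dat")
print(vol.physical_blocks())  # 3: both files share the same three blocks
write_block(vol, b, 1, b"X" * 4096)
print(vol.physical_blocks())  # 4: the files have started to drift apart
```

Once every block of one file has been rewritten, the two share nothing and are, in effect, uncloned.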
Clone files are becoming more popular thanks to the Hyperspace app, which deduplicates files within the same volume by replacing copies with clones.
Because they can only exist within the same file system, clone files are fragile. Any copy or move to another file system is invariably accompanied by the copying of their full data, and their economy of storage can only remain as long as they stay within the same volume.
Backups
One notable exception to this same-volume rule is in Time Machine backups. As clone files are preserved in local snapshots, when Time Machine constructs a backup as a snapshot in the backup storage volume, shared file extents are retained, so preserving clones. This is reflected in the size of the backup snapshot, and in the report written to the log. For example, when backing up three distinct files and ten clones of one of those, that report included:
14 Total Items in Backup (l: 16 GB p: 11.02 GB)
3 Files Copied (l: 6 GB p: 1.02 GB)
1 Directories Copied (l: Zero KB p: Zero KB)
10 Files Cloned (l: 10 GB p: 10 GB)
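Each line of that report follows the same pattern: a count, a label, then two sizes marked l and p. The short Python sketch below tallies such lines. Reading l and p as logical and physical sizes, and treating units as decimal, are assumptions; parse_report() and the regular expression are hypothetical, written only for the lines quoted above.

```python
import re

# Each report line: count, label, then presumed logical (l) and physical (p) sizes.
LINE_RE = re.compile(
    r"^(?P<count>\d+) (?P<label>.+?) "
    r"\(l: (?P<l>[\d.]+|Zero) (?P<lu>[KMGT]B) p: (?P<p>[\d.]+|Zero) (?P<pu>[KMGT]B)\)$"
)
# Assumes decimal units (1 GB = 1000 MB).
TO_GB = {"KB": 1e-6, "MB": 1e-3, "GB": 1.0, "TB": 1e3}


def to_gb(value: str, unit: str) -> float:
    return 0.0 if value == "Zero" else float(value) * TO_GB[unit]


def parse_report(text: str) -> dict:
    """Map each label to (count, logical GB, physical GB)."""
    rows = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows[m["label"]] = (int(m["count"]), to_gb(m["l"], m["lu"]), to_gb(m["p"], m["pu"]))
    return rows


REPORT = """\
14 Total Items in Backup (l: 16 GB p: 11.02 GB)
3 Files Copied (l: 6 GB p: 1.02 GB)
1 Directories Copied (l: Zero KB p: Zero KB)
10 Files Cloned (l: 10 GB p: 10 GB)
"""

report = parse_report(REPORT)
for label, (count, logical, physical) in report.items():
    print(f"{label}: {count} items, l = {logical:g} GB, p = {physical:g} GB")
```

In this report the Files Copied and Files Cloned rows add up to the Total Items figures, for both l and p.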
Backups made by other utilities are unlikely to reproduce this behaviour, though, as they can’t synthesise snapshots in the way that Time Machine does. To preserve clone files in their backups, they’d have to identify clones in the source and explicitly perform cloning in their backup store. Although Carbon Copy Cloner claims that “in some cases CCC may clone a file on the destination prior to updating its contents”, it doesn’t appear to attempt to preserve clone files in the backups it makes. I’m not aware of any third-party utility that does.
Unfortunately, Time Machine appears unable to restore directly from backup snapshots in the backup store, and performs Finder copies when restoring. That saves each of those clone files as a completely separate file, without any sharing of data. As a result, the space occupied on disk by a restored volume can be substantially greater than that of the original or its backup. Extensive use of clone files could thus cause problems when restoring from backups, for example if the restored files no longer fit in the available space.
Of course, rolling a volume back to a local snapshot, such as one made during Time Machine backups, preserves all clone files within that volume.
iCloud Drive
Clone files created within the same volume as local iCloud Drive storage on the Data volume, or cloned within a folder in iCloud Drive, remain within the same file system, so their clones are preserved, including when they’re moved to other folders in the same volume.
However, clone files are treated as simple copies as far as iCloud Drive’s remote storage is concerned. While a pair of cloned 5 GB files uses only 5 GB of local storage in total, they require a full 10 GB of your iCloud allocation, indicating that their cloud storage is separate rather than shared. Although the effects of eviction (removing local data) and materialisation (restoring local data from cloud storage) are difficult to observe directly, they appear to lose the benefits of cloning.
When the local copy of a file also stored remotely in the cloud is evicted, its data is removed from local storage, rendering it dataless, as shown below.
When that file is to be used locally again, its data has to be downloaded from the cloud service, and the local dataless file is materialised by adding its data back. As far as I can tell, that doesn’t reconstruct the shared file extents, so it turns cloned files into normal copies with different file extents. You would then need to use Hyperspace to restore them as clone files. Other Macs sharing the same iCloud Drive also see them as full copies rather than clones.
These behaviours could also catch the user by surprise.
Sparse files
Unlike clone files, the structure of sparse files in APFS is conventional, as shown below.
They achieve their economy in storage by only including file extents containing non-null data, and thus aren’t dependent on remaining within the same file system (volume), making them more robust. Their primary requirement is that they’re created and maintained using specific file system operations, and are only copied or moved to other APFS file systems.
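A minimal Python sketch of one way such a file can be created, assuming a file system that supports sparse files: seeking past the end before writing leaves a gap that isn’t allocated, which shows up as a difference between the file’s logical size and the space allocated to it. The helper names are hypothetical, and on a file system without sparse support the two figures will be roughly equal.

```python
import os
import tempfile


def make_sparse(path: str, logical_size: int) -> None:
    # Seek past the end and write one byte: the skipped range is never written,
    # so a file system with sparse file support needn't allocate storage for it.
    with open(path, "wb") as f:
        f.seek(logical_size - 1)
        f.write(b"\x01")


def sizes(path: str) -> tuple:
    st = os.stat(path)
    # On macOS and Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.bin")
    make_sparse(path, 16 * 1024 * 1024)
    logical, allocated = sizes(path)
    print(f"logical {logical} bytes, allocated {allocated} bytes")
```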
Backups
When backed up by Time Machine to another APFS volume, sparse files are preserved reliably, and are also restored as sparse files. That isn’t likely to hold, though, if the file is transferred using a network file system such as SMB, as all network transfers currently appear to explode sparse files to full size prior to transfer. Because of the way in which they have to be created, only the app maintaining that file could restore its sparse format. In the case of disk images, this should normally occur the next time they’re mounted in the Finder and Trimmed by APFS.
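One way to see why a transfer that simply moves a file’s contents can’t keep it sparse: reading a hole returns null bytes, indistinguishable from zeros that were actually written, so a plain read-and-write copy writes every one of them to the destination. This local Python sketch isn’t a model of SMB or any other network protocol, and stream_copy() is a hypothetical helper.

```python
import os
import tempfile

CHUNK = 1024 * 1024


def stream_copy(src: str, dst: str) -> int:
    """Copy by reading and writing every byte, as a generic transfer would.

    Holes read back as runs of null bytes, so the destination receives real
    zeros and is written out at full size. Returns the number of bytes written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            written += fout.write(chunk)
    return written


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "sparse.bin")
    dst = os.path.join(d, "copy.bin")
    with open(src, "wb") as f:  # sparse source: an 8 MiB hole, then one byte
        f.seek(8 * CHUNK)
        f.write(b"\x01")
    n = stream_copy(src, dst)
    with open(dst, "rb") as f:
        head = f.read(CHUNK)
    print(n, head == bytes(CHUNK))  # 8388609 True
```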
iCloud Drive
Assessing what happens with sparse files in iCloud Drive is considerably simpler than with clone files. As long as they remain downloaded to local storage, they’re preserved, and can be moved in and out of iCloud Drive storage without exploding in size. However, they too are stored in full in iCloud, requiring their full size in your iCloud allocation, and the eviction-materialisation cycle explodes them to full size and removes their sparse file flag.
The only way to return a former sparse file to its original economical format is then to open and save it using the app that creates and maintains it. In the case of disk images, this should occur when they’re next mounted and Trimmed.
Conclusions
Clone files:
- are only preserved when moved within the same file system (volume);
- are preserved and restored from local snapshots;
- are preserved in Time Machine backups, but aren’t restored from them;
- aren’t preserved in other backups;
- could result in a restored volume being substantially larger than its original;
- occupy their full space in your iCloud allocation;
- are only preserved in iCloud Drive when they aren’t evicted from local storage;
- can be regenerated using Hyperspace.
Sparse files:
- are only preserved when copied or moved directly between APFS volumes;
- aren’t preserved when copied or moved over network connections, such as SMB;
- aren’t preserved when copied or moved to different file systems, including HFS+;
- are preserved in and restored from local Time Machine backups;
- should be preserved in and restored from other local backups;
- occupy their full space in your iCloud allocation;
- are only preserved in iCloud Drive when they aren’t evicted from local storage;
- can only be regenerated by the app that creates and maintains them.
Both clone and sparse files can result in substantial savings in storage space. However, because those savings are fragile, their greatest value is in minimising erase-write cycles in SSDs, so slowing their ageing.
References
Apple’s APFS Reference (PDF), last revised 22 June 2020.
Dataless files are explained here.
How sparse files work
Files and clones
Special file types, including dataless files