How large is that file?

By: hoakley
1 September 2025 at 14:30

At first sight, this might seem a simple question to answer. Allow me to demonstrate that it’s more complex, has changed over time, and still hasn’t been resolved.

Before 1984

Although there were exceptions, before the Mac most file systems that we were likely to encounter were simple. Each file consisted of stored data, together with a small entry in the file system’s metadata containing information such as the file’s name and its time and date of creation. By convention, as that’s not stored with the file’s data, that isn’t included in its size.

The only complication was that storage is divided into blocks, and any given block couldn’t be shared between files. So in addition to the file’s actual size, we’d also want to know the total size of the storage blocks it used, a figure often known as size on disk, which depends on the size of those blocks. If the block size is 4,096 bytes, then any file whose data was equal to or smaller than that had a size on disk of 4,096 bytes, 4 KiB or 4.096 KB. (There are 1,000 bytes in a KB, but 1,024 bytes in a KiB.)
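As a rough illustration of that rounding, here is a minimal Python sketch. The function name size_on_disk is my own, and it assumes a simple file system where every file occupies a whole number of fixed-size blocks; it also treats an empty file as occupying no blocks, which is an assumption rather than anything stated above.

```python
def size_on_disk(data_size: int, block_size: int = 4096) -> int:
    """Round a file's data size up to a whole number of storage blocks.

    Hypothetical helper: assumes fixed-size blocks that are never shared
    between files, and that an empty file uses no blocks at all.
    """
    if data_size < 0 or block_size <= 0:
        raise ValueError("data_size must be >= 0 and block_size must be > 0")
    blocks = -(-data_size // block_size)  # ceiling division using integers only
    return blocks * block_size


for size in (1, 4096, 4097):
    on_disk = size_on_disk(size)
    # Show the same figure in bytes, KiB (1,024 bytes) and KB (1,000 bytes).
    print(f"{size:>5} bytes of data -> {on_disk:,} bytes, "
          f"{on_disk / 1024:g} KiB, {on_disk / 1000:g} KB")
```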

Classic Mac OS

One of the many innovations in the first Mac was its file system MFS, in which every file has two forks, one the traditional data fork, the other a resource fork for storing structured blobs of data termed resources. As MFS was replaced by HFS and eventually HFS+, those resource forks continued.

Resource forks often reached considerable sizes, and although they are stored separately from data forks in HFS+, Apple usually gave the sizes of the two forks, as shown in this dialog, where the size of the resource fork is 117,836 bytes, almost as large as the file’s 144,788 bytes of data. Information about the size of that file’s entry in the file system data, its attributes, wasn’t given, though.

HFS+ in Mac OS X

In the early years of Mac OS X, there was controversy as to whether it should continue supporting the use of resource forks, and most of their uses were replaced by flattened data files arranged in bundles. Nevertheless, in Mac OS X 10.4, HFS+ became multi-forked with what were dubbed extended attributes or xattrs, one type being com.apple.ResourceFork, the classic resource fork. However, the practice of giving separate sizes for the data fork and other forks died.

APFS

When APFS was released in 2017, it changed the way in which xattrs are stored, as explained here. Now, each file can consist of:

  1. file system attributes, stored in the file system metadata structures;
  2. small xattrs of up to 3,804 bytes, stored separately from its attributes, but in file system metadata;
  3. large xattrs over 3,804 bytes, typically including any com.apple.ResourceFork, stored as data streams with separate records;
  4. data, stored in storage blocks as set out in their file extents.
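The split between items 2 and 3 can be sketched as a simple size test. This is only a model of the rule as listed above, not of how APFS is implemented; the function name and return strings are my own.

```python
SMALL_XATTR_LIMIT = 3804  # bytes, the threshold quoted in the list above


def xattr_storage(size: int) -> str:
    """Say where an xattr of the given size would be stored, per the list above.

    Hypothetical helper for illustration only.
    """
    if size < 0:
        raise ValueError("size must be >= 0")
    if size <= SMALL_XATTR_LIMIT:
        return "file system metadata"  # item 2: small xattr
    return "data stream"               # item 3: large xattr


for size in (32, 3804, 3805, 80059):
    print(f"{size:>6} bytes -> {xattr_storage(size)}")
```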

As demonstrated using crafted files, the Finder initially ignored items 1-3 when stating file size, and just gave that of item 4, the data, although it could of course still have been including resource forks.

The size of this text file is given as 391 bytes in the Finder’s Get Info in High Sierra, but as you’ll see below it contains over 90,000 bytes of extended attributes, which that figure simply ignores.

macOS Sequoia

Here are the sizes given for another of my specially crafted demonstration files, for APFS in macOS 15.6.1 Sequoia.

According to the Finder’s Get Info dialog, this file contains 263,195 bytes and occupies 266 KB on disk. On an SSD with the standard 4 KiB block size, that should be 65 blocks, or 266,240 bytes, as given correctly. There’s something amiss with that file, though, as it claims to be a Zip archive but has an image thumbnail.

Listing its xattrs using xattred reveals no fewer than 14, including two of 80 KB each. xattred claims it has a data fork size of 183,136 bytes and 161,406 bytes in xattrs, making a total size of 344,542 bytes, which is nowhere near that given by the Finder. (It’s the com.apple.ResourceFork xattr, a classic resource fork, that contains the image thumbnail displayed by QuickLook instead of the normal Zip file icon.)

To discover how the Finder arrives at a size of 263,195 bytes, we need to subtract the data size from that, making 80,059 bytes, the size of the file’s resource fork or com.apple.ResourceFork xattr. So, without being explicit about forks, it’s behaving the same as in Classic Mac OS. You might find that puzzling, given that there’s another xattr of the same size that it’s ignoring, and a dozen more that don’t get a look-in. As the use of com.apple.ResourceFork xattrs has long been discouraged if not deprecated, isn’t that a strange behaviour? The more so when modern xattrs that Apple has introduced relatively recently, such as com.apple.quarantine, com.apple.macl and com.apple.provenance, are ignored.
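The arithmetic for this demonstration file can be checked in a few lines of Python. All the input figures are those quoted above; the variable names are my own, and the remaining xattrs are only treated as a combined total because their individual sizes aren’t given here.

```python
DATA_FORK = 183_136     # bytes, data fork size reported by xattred
ALL_XATTRS = 161_406    # bytes, total of all 14 xattrs reported by xattred
FINDER_TOTAL = 263_195  # bytes, size shown in the Finder's Get Info
BLOCK = 4096            # bytes, standard block size on an SSD

# What the Finder must be adding to the data fork to reach its figure.
resource_fork = FINDER_TOTAL - DATA_FORK

# xattr bytes that the Finder's figure leaves out altogether.
ignored_xattrs = ALL_XATTRS - resource_fork

# Total if every xattr were counted, as xattred does.
full_total = DATA_FORK + ALL_XATTRS

# Size on disk for the Finder's figure, rounded up to whole blocks.
blocks = -(-FINDER_TOTAL // BLOCK)

print(f"resource fork:  {resource_fork:,} bytes")
print(f"ignored xattrs: {ignored_xattrs:,} bytes")
print(f"full total:     {full_total:,} bytes")
print(f"size on disk:   {blocks} blocks, {blocks * BLOCK:,} bytes")
```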

The deeper you look into this, the more puzzling it becomes. Here are the same file’s figures as shown in Precize.

Sizes are given a few lines down, from two sources, URL Keys and the file system (FileManager), and they also differ. There’s a list of xattrs given at the foot of this window, but that only gives 12 and ignores com.apple.ResourceFork and com.apple.FinderInfo.

In the macOS API, code can obtain values for file sizes from its URL. Two keys are available, fileSizeKey and totalFileSizeKey. The first gives the data size, and the second is the same ‘total’ as that shown by the Finder, i.e. data + com.apple.ResourceFork xattr, but excluding all other xattrs. Apple’s documentation explains those as:

  • fileSize is “the total file size, in bytes”
  • totalFileSize is “the total displayable size of the file, in bytes. The allocated size in bytes may include space used by metadata.”

FileManager also gives the data size in its FileAttributeKey.size, but doesn’t give any for xattrs, even com.apple.ResourceFork. The size of the Metadata shown for the File system is instead calculated by totalling the individual sizes of all its xattrs, including com.apple.ResourceFork and com.apple.FinderInfo.

This may appear to be nit-picking, but data sizes are given to the exact number of bytes, and the size on disk for non-sparse files should always be within 4,095 bytes of the data size. Yet accounting for xattrs remains rooted in Classic Mac OS from 25 years ago and still pretends that xattrs either don’t exist, or don’t take any space.
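For comparison outside the macOS API, here is a minimal Python sketch that reads a file’s data size and its allocated size through the standard os.stat call. It assumes a Unix-like system where st_blocks is available and counted in 512-byte units; like the figures discussed above, st_size covers only the data and takes no account of xattrs. The helper name and the 391-byte temporary file are my own choices for the demonstration.

```python
import os
import tempfile


def data_and_allocated_size(path: str) -> tuple[int, int]:
    """Return (data size in bytes, allocated size in bytes) for a file.

    Hypothetical helper: st_size is the length of the file's data, and
    st_blocks is the number of 512-byte units allocated to it.
    """
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512


# Demonstrate on a small temporary file, removed again afterwards.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 391)
    demo_path = tmp.name
try:
    data_size, allocated = data_and_allocated_size(demo_path)
finally:
    os.remove(demo_path)

print(f"data size: {data_size:,} bytes, allocated: {allocated:,} bytes")
```

The allocated figure depends on the file system in use, so it is printed rather than predicted.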

Last Week on My Mac: A strategy for data integrity

By: hoakley
20 July 2025 at 15:00

File data integrity is one of those topics that never goes away. Maybe that’s because we’ve all suffered in the past, and can’t face a repeat of that awful feeling when an important document can’t be opened because it’s damaged, or crucial data have gone missing. Before considering what we could do to prevent that from happening, we must be clear about how it could occur.

We have an important file, and immediately after it was last changed and saved, a SHA256 digest was made of it and saved to that file as an extended attribute, in the way that you can using Dintch, Fintch or cintch. A few days or weeks later we open the file and discover its contents have changed.
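To make the idea of a saved digest concrete, here is a minimal Python sketch that computes a SHA256 digest of a file’s data and compares it with one saved earlier. It doesn’t reproduce how Dintch, Fintch or cintch store or format their digests; the function names are my own, and the saved digest is simply held in a variable rather than in an extended attribute.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA256 digest of a file's data as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files needn't fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, saved_digest: str) -> bool:
    """Check whether the file's data still match a previously saved digest."""
    return sha256_of_file(path) == saved_digest.lower()


# Demonstration with a temporary file standing in for the important document.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name
try:
    before = sha256_of_file(demo_path)           # digest saved after last edit
    still_matches = is_unchanged(demo_path, before)
    with open(demo_path, "ab") as handle:        # simulate an unexpected change
        handle.write(b"!")
    after_edit_matches = is_unchanged(demo_path, before)
finally:
    os.remove(demo_path)

print(before, still_matches, after_edit_matches)
```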

Reasons

What could account for that?

One obvious reason is that the file was intentionally changed and saved without updating its digest. Provided there are good backups, we should be able to step back through them to identify when that change occurred, and decide whether it’s plausible that it was performed intentionally. Although the file’s Modified datestamp should coincide with the change seen in its backups, there’s no way of confirming that change was intentional, or even which app was used to write the changed file (with some exceptions, such as PDF).

Exactly the same observations would also be consistent with the file being unintentionally changed, perhaps as a result of a bug in another app or process that resulted in it writing to the wrong file or storage location. The changed digest can only detect the change in file content, and can’t indicate what was responsible. This is a problem common to file systems that automatically update their own records of file digests, as they are unable to tell whether the change is intentional, simply that there has been a change. This also applies to changes resulting from malicious activity.

The one circumstance in which change in contents, hence in digest, wouldn’t necessarily follow a change in the file’s Modified datestamp is when an error occurs in the storage medium. However, this is also the least likely to be encountered in modern storage media without that error being reported.

Errors occurring during transfer to and from storage are detected by CRC or similar checks made as part of the transfer protocol. This is one of the reasons why a transfer bandwidth of 40 Gb/s cannot realise a data transfer rate of 5 GB/s, because part of that bandwidth is used by the error-checking overhead. Once data have been written to a hard disk or SSD, error-correcting codes are used to verify their integrity and to detect bad storage blocks.

Out of interest, I’ve been conducting a long-term experiment with 97 image files totalling 60.8 MB stored in my iCloud Drive since 11 April 2020, over five years ago. At least once a year I download them all and check them using Dintch, and so far I haven’t had a single error.

Datestamps

There are dangers inherent in putting trust in file datestamps as markers of change.

In APFS, each file has four different datestamps stored in its attributes:

  • create_time, time of creation of that file,
  • mod_time, time that file was last modified,
  • change_time, time that the file’s attributes including extended attributes were last modified,
  • access_time, time that file was last read.

For example, a file with the following datestamps

  • create_time 2025-04-18 19:58:48.707+0100
  • mod_time 2025-04-18 20:00:56.134+0100
  • change_time 2025-07-19 06:59:10.542+0100
  • access_time 2025-07-19 06:52:17.504+0100

was created on 18 April this year, last modified a couple of minutes later, last had its attributes changed on 19 July, but was last read just under 7 minutes before that modification to its attributes.
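Those intervals can be worked out from the four datestamps with a short Python sketch. The values are copied from the example above; the format string and variable names are my own.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"  # matches e.g. 2025-04-18 19:58:48.707+0100

raw = {
    "create_time": "2025-04-18 19:58:48.707+0100",
    "mod_time": "2025-04-18 20:00:56.134+0100",
    "change_time": "2025-07-19 06:59:10.542+0100",
    "access_time": "2025-07-19 06:52:17.504+0100",
}
stamps = {name: datetime.strptime(text, FORMAT) for name, text in raw.items()}

# Interval between creation and last modification of the data.
create_to_mod = stamps["mod_time"] - stamps["create_time"]

# Interval between the last read and the last change to attributes.
access_to_change = stamps["change_time"] - stamps["access_time"]

print("create -> mod:   ", create_to_mod)
print("access -> change:", access_to_change)
```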

These can be read using Precize, or in Terminal, but there’s a catch with access_time. APFS has an optional feature, set by volume, determining whether access_time is changed strictly. If that option is set, then every time a file is accessed, whether it’s modified or not, its access_time is updated. However, this defaults to only updating access_time if its current value is earlier than mod_time. I’m not aware of any current method to determine whether the strict access_time is enabled for any APFS volume, and it isn’t shown in Disk Utility.
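The two behaviours can be modelled as a single test. This Python sketch only restates the rule described above, and isn’t taken from APFS itself; the function name and the strict parameter are my own.

```python
from datetime import datetime


def should_update_access_time(access_time, mod_time, strict=False):
    """Model whether reading a file would update its access_time.

    Hypothetical helper: with strict set, every access updates it;
    otherwise it's only updated if it is currently earlier than mod_time.
    """
    return strict or access_time < mod_time


# The example file above: last read in July, last modified in April.
access = datetime(2025, 7, 19, 6, 52, 17)
modified = datetime(2025, 4, 18, 20, 0, 56)

default_updates = should_update_access_time(access, modified)
strict_updates = should_update_access_time(access, modified, strict=True)
print(f"default: {default_updates}, strict: {strict_updates}")
```

With the default behaviour, reading that file again wouldn’t change its access_time, as that is already later than its mod_time.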

mod_time can be changed when there has been no change in the file’s data, for example using the Terminal command touch. Any of the times can be altered directly, although that should be very unusual even in malware.
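The effect of touch can be imitated in Python using os.utime, which sets a file’s access and modification times without writing to its data. This is a sketch on a temporary file, with a function name of my own choosing; it moves the modification time back by one hour, then confirms that the data’s SHA256 digest is unaltered.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def touch_demo():
    """Change a file's modification time only; return (data_same, mtime_moved)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"unchanged data")
        path = Path(tmp.name)
    try:
        before = path.stat()
        digest_before = hashlib.sha256(path.read_bytes()).hexdigest()

        # Set mod_time one hour earlier, leaving access time as it was.
        one_hour_ns = 3600 * 10**9
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns - one_hour_ns))

        after = path.stat()
        digest_after = hashlib.sha256(path.read_bytes()).hexdigest()
        return digest_before == digest_after, after.st_mtime_ns != before.st_mtime_ns
    finally:
        path.unlink()


data_same, mtime_moved = touch_demo()
print(f"data unchanged: {data_same}, mod_time changed: {mtime_moved}")
```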

Although attaching a digest to a file as an extended attribute will update its change_time, there are many other reasons for that being changed, including macOS adding or changing quarantine xattrs, the file’s ‘last used date’, and others.

Proposed strategy

  1. Tag folders and files whose data integrity you wish to manage.
  2. Back them up using a method that preserves those tags, such as Time Machine, or by copying them to iCloud Drive.
  3. Periodically Check their tags to verify their integrity.
  4. As soon as possible after any have been intentionally modified and saved, Retag them to ensure their tags are maintained.
  5. In the event that any are found to have changed, and no longer match their tag, trace that change back in their backups.
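Step 5 can be sketched as a search through backups, oldest first, for the earliest one in which the file’s data no longer match the tag saved with it. This is only an outline in Python: the function name, the backup labels and the shortened digests are all hypothetical, and in practice each digest would be computed from the backed-up file and read from its tag.

```python
def first_changed_backup(history):
    """Return the label of the earliest backup whose data don't match its tag.

    history is a list of (label, tagged_digest, actual_digest) tuples,
    ordered from oldest to newest. Returns None if every backup matches.
    Hypothetical helper for illustration only.
    """
    for label, tagged_digest, actual_digest in history:
        if tagged_digest != actual_digest:
            return label
    return None


# Made-up history: the file was retagged after an intentional edit in the
# second backup, then changed unexpectedly before the third was made.
history = [
    ("backup 1", "aaaa", "aaaa"),
    ("backup 2", "bbbb", "bbbb"),
    ("backup 3", "bbbb", "cccc"),
    ("backup 4", "bbbb", "cccc"),
]
changed_in = first_changed_backup(history)
print("first mismatch:", changed_in)
```

Comparing each backup against its own preserved tag, rather than against the current tag, means that earlier intentional edits followed by a Retag aren’t mistaken for damage.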

Unlike automatic integrity-checking built into a file system, this will detect all unexpected changes, regardless of whether they are made through the file system, are intentional or unintentional, are malicious, or result from errors in storage media or transmission. Because only intentionally changed files are retagged, this also minimises the size of backups.

More updates: xattred, Precize and DelightEd. From xattrs to Rich Text

By: hoakley
24 June 2025 at 14:30

Here are three more updates to some of my most popular apps, primarily for improved compatibility with macOS 26 Tahoe, but with improvements in their interface for other versions of macOS from Big Sur onwards.

xattred 1.6

This toolset for working with extended attributes (xattrs) has several improved window layouts, a couple of fixes in its code to cope with deprecations, and a new icon that should work far better with Tahoe, without compromising its appearance in older macOS.

xattred 1.6 is now available from here: xattred16, from its Product Page, and via its auto-update mechanism. If you’re still using it in Catalina or earlier, please disable its auto-update as detailed in its Help book, so you can remain with version 1.5 or earlier.

Precize 1.16

This provides a great deal of useful information about files, such as their inode number and detailed size including that of extended attributes, together with access to bookmarks and their analysis. I have tweaked its main window to improve its interface, rebuilt it, and provided it with a new app icon that should be an improvement on all versions of macOS from Big Sur onwards.

Precize 1.16 is now available from here: precize116, from its Product Page, and via its auto-update mechanism. If you’re still using it in Catalina or earlier, please disable its auto-update as detailed in its Help book to remain using your earlier version.

DelightEd 2.4

This text-only Rich Text editor was originally developed to work better with Dark mode, when it was introduced in Mojave. Since then I have used it to produce all the Rich Text I use in apps, ensuring it continues to work properly across both Light and Dark modes. It also has unusual features to support interlinear text. This version has had a few tweaks in its window layout, and has been rebuilt to make it fully compatible with Tahoe and features like Writing Tools, as well as gaining an updated app icon.

DelightEd 2.4 is now available from here: DelightEd24, from its Product Page, and via its auto-update mechanism. If you’re still using it in Catalina or earlier, please disable its auto-update as detailed in its Help book so you can remain with an earlier version.

In the works

I’m currently in the throes of producing a new version of my PDF reader Podofyllin, which I use daily. Unfortunately, this will remove its ability to view the code inside PDFs, as Apple appears to have disabled all the features it relied on to perform that, and they no longer work in Sequoia. However, it still has some unusual features, such as opening multiple views of the same PDF, and can’t edit or save any changes to the original file.

I have spent some time inside Viable, my macOS virtualiser, trying to get it to use the new ASIF disk image, but so far have been unable to get it to work. I will be pursuing that when I get the time.

Enjoy!
