Last Week on My Mac: A strategy for data integrity

By: hoakley
20 July 2025 at 15:00

File data integrity is one of those topics that never goes away. Maybe that’s because we’ve all suffered in the past, and can’t face a repeat of that awful feeling when an important document can’t be opened because it’s damaged, or crucial data have gone missing. Before considering what we could do to prevent that from happening, we must be clear about how it could occur.

We have an important file, and immediately after it was last changed and saved, a SHA256 digest was made of it and saved to that file as an extended attribute, in the way that you can using Dintch, Fintch or cintch. A few days or weeks later we open the file and discover its contents have changed.
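To make that scenario concrete, here is a minimal sketch in Python of making a SHA256 digest of a file's data and later checking the file against it. It isn't how Dintch, Fintch or cintch are implemented, and the function names are invented for illustration. Saving the digest as an extended attribute is left out, and the saved value is simply held in a variable.

```python
import hashlib

def sha256_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's data, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns empty bytes at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_unchanged(path, saved_digest):
    """True if the file's current data still match the digest saved earlier."""
    return sha256_digest(path) == saved_digest
```

A mismatch shows only that the data have changed since the digest was made, not what changed them.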

Reasons

What could account for that?

One obvious reason is that the file was intentionally changed and saved without updating its digest. Provided there are good backups, we should be able to step back through them to identify when that change occurred, and decide whether it’s plausible that it was performed intentionally. Although the file’s Modified datestamp should coincide with the change seen in its backups, there’s no way of confirming that change was intentional, or even which app was used to write the changed file (with some exceptions, such as PDF).

Exactly the same observations would also be consistent with the file being unintentionally changed, perhaps as a result of a bug in another app or process that resulted in it writing to the wrong file or storage location. The changed digest can only detect the change in file content, and can’t indicate what was responsible. This is a problem common to file systems that automatically update their own records of file digests, as they are unable to tell whether the change is intentional, only that there has been a change. This also applies to changes resulting from malicious activity.

The one circumstance in which change in contents, hence in digest, wouldn’t necessarily follow a change in the file’s Modified datestamp is when an error occurs in the storage medium. However, this is also the least likely to be encountered in modern storage media without that error being reported.

Errors occurring during transfer to and from storage are detected by CRC or similar checks made as part of the transfer protocol. This is one of the reasons why a transfer bandwidth of 40 Gb/s cannot realise a data transfer rate of 5 GB/s, because part of that bandwidth is used by the error-checking overhead. Once data have been written to a hard disk or SSD, error-correcting codes are used to verify their integrity and to detect bad storage blocks.
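As a rough illustration of how a CRC catches transfer errors, and why it costs bandwidth, the sketch below appends a CRC-32 to a payload and verifies it on receipt. Real transfer protocols use their own framing and often different CRC polynomials, so the functions frame and verify are hypothetical, not any particular protocol.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload, as a sender might."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(framed: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the one received."""
    payload, received = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

good = frame(b"crucial data")
# Simulate a single-bit error in transit by flipping one bit of the first byte.
corrupted = bytes([good[0] ^ 0x01]) + good[1:]
```

Here verify(good) succeeds and verify(corrupted) fails, while the four extra bytes per frame are overhead that carries no file data.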

Out of interest, I’ve been conducting a long-term experiment with 97 image files totalling 60.8 MB stored in my iCloud Drive since 11 April 2020, over five years ago. At least once a year I download them all and check them using Dintch, and so far I haven’t had a single error.

Datestamps

There are dangers inherent in putting trust in file datestamps as markers of change.

In APFS, each file has four different datestamps stored in its attributes:

  • create_time, time of creation of that file,
  • mod_time, time that file was last modified,
  • change_time, time that the file’s attributes including extended attributes were last modified,
  • access_time, time that file was last read.

For example, a file with the following datestamps

  • create_time 2025-04-18 19:58:48.707+0100
  • mod_time 2025-04-18 20:00:56.134+0100
  • change_time 2025-07-19 06:59:10.542+0100
  • access_time 2025-07-19 06:52:17.504+0100

was created on 18 April this year, last modified a couple of minutes later, last had its attributes changed on 19 July, but was last read 7 minutes before that modification to its attributes.
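For those who prefer a script to Terminal, this is a small sketch of reading the same four datestamps in Python using os.stat. The mapping of stat fields to the APFS names is my assumption for macOS: st_birthtime isn't available on every platform, so it's fetched defensively, and on Windows st_ctime has a different meaning.

```python
import os
from datetime import datetime, timezone

def datestamps(path):
    """Return a file's four datestamps as UTC datetimes (None if unavailable)."""
    st = os.stat(path)
    raw = {
        "create_time": getattr(st, "st_birthtime", None),  # present on macOS
        "mod_time": st.st_mtime,     # data last modified
        "change_time": st.st_ctime,  # attributes last changed (Unix and macOS)
        "access_time": st.st_atime,  # last read
    }
    return {
        name: None if t is None else datetime.fromtimestamp(t, tz=timezone.utc)
        for name, t in raw.items()
    }
```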

These can be read using Precize, or in Terminal, but there’s a catch with access_time. APFS has an optional feature, set by volume, determining whether access_time is changed strictly. If that option is set, then every time a file is accessed, whether it’s modified or not, its access_time is updated. However, this defaults to only updating access_time if its current value is earlier than mod_time. I’m not aware of any current method to determine whether the strict access_time is enabled for any APFS volume, and it isn’t shown in Disk Utility.

mod_time can be changed when there has been no change in the file’s data, for example using the Terminal command touch. Any of the times can be altered directly, although that should be very unusual even in malware.
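The sketch below shows that effect from Python, where os.utime does much the same as touch: the modification time is moved forward by an hour without the file being rewritten, so the digest of its data stays the same. The temporary file and its contents are purely illustrative.

```python
import hashlib
import os
import tempfile

def digest(path):
    """SHA-256 hex digest of the whole file (fine for small files)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Create a small scratch file to experiment on.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"unchanged data")
    path = f.name

before_digest, before_mtime = digest(path), os.stat(path).st_mtime

# Set mod_time an hour later, keeping access_time, without touching the data.
os.utime(path, (os.stat(path).st_atime, before_mtime + 3600))

after_digest, after_mtime = digest(path), os.stat(path).st_mtime
os.remove(path)
```

Comparing the before and after values shows a later mod_time with an identical digest, which is why a datestamp alone can't be trusted as evidence that data changed.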

Although attaching a digest to a file as an extended attribute will update its change_time, there are many other reasons for that being changed, including macOS adding or changing quarantine xattrs, the file’s ‘last used date’, and others.

Proposed strategy

  1. Tag folders and files whose data integrity you wish to manage.
  2. Back them up using a method that preserves those tags, such as Time Machine, or copy them to iCloud Drive.
  3. Periodically Check their tags to verify their integrity.
  4. As soon as possible after any have been intentionally modified and saved, Retag them to ensure their tags are maintained.
  5. In the event that any are found to have changed, and no longer match their tag, trace that change back in their backups.

Unlike automatic integrity-checking built into a file system, this will detect all unexpected changes, regardless of whether they are made through the file system, are intentional or unintentional, are malicious, or result from errors in storage media or transmission. Because only intentionally changed files are retagged, this also minimises the size of backups.
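The tag, check and retag steps can be sketched as follows. This is only an outline under stated assumptions: tags are held in an ordinary dictionary keyed by path, as a hypothetical stand-in for the extended attributes written by Dintch, Fintch or cintch, and the function names tag, check and retag are mine. Backups and tracing changes through them (steps 2 and 5) are outside its scope.

```python
import hashlib
from pathlib import Path

def digest(path):
    """SHA-256 hex digest of a file's data."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def tag(folder, tags):
    """Step 1: record a digest for every file in the folder."""
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            tags[str(p)] = digest(p)
    return tags

def check(tags):
    """Step 3: list files that are missing or no longer match their tag."""
    changed = []
    for path, saved in tags.items():
        p = Path(path)
        if not p.is_file() or digest(p) != saved:
            changed.append(path)
    return changed

def retag(path, tags):
    """Step 4: refresh one file's tag after an intentional change."""
    tags[str(path)] = digest(path)
```

Any path returned by check that you didn't change deliberately is a candidate for tracing back through backups.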

Can you trust times shown in the log?

By: hoakley
30 May 2025 at 14:30

Next to the order of entries in the log, their date and time stamps are one of the most important pieces of information they contain. In some cases, such as when you’re using the log to estimate performance, their accuracy is vital. This article reports the results from tests to validate the times given in log records provided by the log show command, and in my free log browsers LogUI and Ulbow.

Clock date and time

Dates and times given in log extracts invariably match those of the Mac’s system clock, the only catch here being adjustments for time zone and DST. The latter can become confusing if you look at the log when DST is changed, or from a different time zone. To cope with that you can use the --timezone local option in log show to express all times with uniform adjustment. Ulbow doesn’t use that, but LogUI does now synchronise all time and date stamps to the current time zone and DST.

Time differences

The log is an excellent tool for measuring time and performance, either using regular entries or Signposts that are intended for the purpose. Writing an entry into the log incurs minimal overhead, and is simple to perform from any code or script. If your favourite scripting language doesn’t give direct access to writing entries, then you can use my free command tool blowhole to do so. If you want to assess processes in macOS, then it’s usually straightforward to identify appropriate milestones that mark events and use those to calculate the period. These all depend on the times reported in log entries being sufficiently accurate.

Gold standard

Since Mac OS X, every Mac has had a high-precision internal clock within it (prior to that Time Manager could resolve times down to the microsecond but no further). This increments monotonically in ‘ticks’, an unsigned 64-bit integer, starting from an arbitrary value, and is referred to as Mach Absolute Time (MAT).

Intel Macs increment their ‘tick’ count once every nanosecond, so the difference between two readings of the clock represents the time interval in nanoseconds. Life isn’t as simple with Apple silicon Macs, as they tick three times every 125 nanoseconds, or once every 41.67 nanoseconds. Apple’s latest documentation on MAT, its units and use, comes in a Technical Q&A dated 2005.

Once correctly converted into nanoseconds, MAT is the closest available measurement of time to a gold standard.
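The conversion itself is simple arithmetic, sketched here in Python using the tick rates given above: a ratio of 1/1 for Intel Macs and 125/3 for Apple silicon. The function name is invented for illustration. Real code would ask the system for the ratio (the Mach call mach_timebase_info supplies a numerator and denominator) rather than hard-coding it.

```python
def ticks_to_ns(ticks: int, numer: int, denom: int) -> int:
    """Convert Mach Absolute Time ticks to whole nanoseconds.

    Multiplying before dividing keeps the result exact for integer inputs,
    apart from the final truncation to a whole nanosecond.
    """
    return ticks * numer // denom

INTEL = (1, 1)            # one tick per nanosecond
APPLE_SILICON = (125, 3)  # three ticks every 125 nanoseconds

# An interval of 24 ticks on Apple silicon is exactly one microsecond.
interval_ns = ticks_to_ns(24, *APPLE_SILICON)
```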

I suspect that log entries are originally given a raw MAT as their time, and that can be made available using the log show command, or in Ulbow, as that uses log show to obtain log entries. LogUI reads the log directly, through the OSLog API in macOS, which currently doesn’t provide MAT values, instead giving a lower resolution Date value.

This validation therefore compares time intervals given by Ulbow from log entry timestamps, and those given in LogUI, against MAT intervals obtained in Ulbow. To increase the challenge, log entries used are from blowhole writing 25 log entries as fast as it can, a worst case scenario as that writes 2-3 entries each microsecond.

Comparisons

Log extracts obtained using log show in Ulbow and those estimated by LogUI were compared, and their timestamps were found to be identical.

On an Apple silicon Mac (M4 Pro), entries written by blowhole had raw MAT values that recorded intervals of 9 or 10 ticks between them, after the first three were made 73 and 15 ticks apart. From the third of the series of 25 log entries there was a strong linear relationship between recorded MAT in nanoseconds elapsed and loop number, as shown in the chart below.

The gradient of the regressed line shows that blowhole’s log entries occurred at intervals of just under 405 nanoseconds.

Because LogUI (and regular timestamps in log show and Ulbow) only resolve to microseconds, the matching plot for LogUI’s times against loop number is stepped.

The gradient of this regression line is 0.4 microseconds per loop, indicating that entries occurred at intervals of 400 nanoseconds, almost identical to that found for the MAT.
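To show how such gradients are obtained, and why microsecond resolution still recovers nearly the same interval, here is a least-squares sketch in Python. The data are synthetic, not the measurements reported here: entries are assumed to fall exactly 405 nanoseconds apart, and the microsecond timestamps are assumed to be truncated rather than rounded.

```python
def slope(xs, ys):
    """Gradient of the ordinary least-squares regression line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

loops = list(range(1, 24))
mat_ns = [405 * i for i in loops]       # synthetic MAT times in nanoseconds
stepped_us = [t // 1000 for t in mat_ns]  # same times in whole microseconds

gradient_ns = slope(loops, mat_ns)      # nanoseconds per loop
gradient_us = slope(loops, stepped_us)  # microseconds per loop
```

Although stepped_us rises in steps of whole microseconds, its regression gradient stays close to 0.4 microseconds per loop, because the steps average out over the series.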

Plotting times measured by LogUI (which also represents those for Ulbow and log show, as they’re identical) against that of MAT shows a good linear relationship with a gradient of just under 1.009, indicating that timestamps in log show, Ulbow and LogUI are accurate and reliable estimates of MAT.

Differences between pairs of time estimates obtained from MAT and LogUI ranged from -83 to +792 nanoseconds, with a median of +370 and quartiles of +83 to +583 nanoseconds.

Conclusion

  • Times given for log entries in LogUI, Ulbow and log show are reliable estimates of MAT to within +0.8 microseconds.
  • When nanosecond resolution is needed, the machTimestamp field from log show or Ulbow should be used, and converted into nanoseconds.
