
How much power does Visual Look Up use?

By: hoakley

2 September 2025 at 14:30

Look in the log, and Visual Look Up (VLU) on an Apple silicon Mac apparently involves a great deal of work, in both CPU cores and the neural engine (ANE). This article reports a first attempt to estimate power and energy use for a single VLU on an image.

To estimate this, I measured CPU, GPU and ANE power in sampling periods of 100 ms using powermetrics, and correlated events seen there with those recorded in a log extract over the same period, obtained using LogUI. The test was performed on a Mac mini M4 Pro running macOS 15.6.1, using Preview to perform the VLU on a single image showing a small group of cattle in an upland field. Power measurements were collected from a moment immediately before opening the image, and ceased several seconds after VLU was complete.

When used like this, powermetrics imposes remarkably little overhead on the CPU cores, but its sampling periods are neither exact nor identical. This makes it difficult to correlate log entries and their precise timestamps with sampling periods. While powermetrics gives power use in mW, those measurements aren’t calibrated and making assumptions about their accuracy is hazardous. Nevertheless, they remain the best estimates available.
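As a minimal sketch of how such a capture might be turned into per-sample figures, the Python below pulls CPU, GPU and ANE power out of powermetrics text output. The line format it matches ("CPU Power: 989 mW" and so on) is an assumption about that output and should be checked against a real capture, and the values in the example text are invented purely to exercise the parser.

```python
import re

# Assumed line format in powermetrics text output, e.g. "CPU Power: 989 mW".
# Lines such as "Combined Power (CPU + GPU + ANE): ..." are deliberately ignored.
POWER_LINE = re.compile(r"^(CPU|GPU|ANE) Power:\s*(\d+)\s*mW", re.MULTILINE)

def parse_samples(text):
    """Return one dict per sample, mapping CPU/GPU/ANE to power in mW."""
    samples = []
    current = {}
    for name, value in POWER_LINE.findall(text):
        if name in current:      # a repeated key means the next sample has begun
            samples.append(current)
            current = {}
        current[name] = int(value)
    if current:
        samples.append(current)
    return samples

# Invented example text covering two sampling periods.
example = """\
CPU Power: 7083 mW
GPU Power: 960 mW
ANE Power: 0 mW
Combined Power (CPU + GPU + ANE): 8043 mW
CPU Power: 989 mW
GPU Power: 48 mW
ANE Power: 671 mW
"""
samples = parse_samples(example)
```

Each entry in samples then stands for one nominal 100 ms period, bearing in mind that those periods are neither exact nor identical.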

Log narrative

The first step in log analysis was to identify the starting time of powermetrics sampling periods. Execution of that command left no trace in the log entries, but as it has to be run with elevated privileges using sudo, approval of that was obvious in entries concluding
30.677182 com.apple.opendirectoryd ODRecordVerifyPassword completed
A subsequent entry at 30.688828 seconds was thus chosen as the start time for sampling periods, and all times given below are in seconds elapsed after that time zero.
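The conversion from log timestamps to elapsed time and a nominal sample number can be sketched as follows. This assumes a sampling period of exactly 100 ms, which powermetrics doesn't guarantee, so the sample number is only a guide. The function names are hypothetical, and arithmetic is done in whole microseconds to avoid floating-point rounding at sample boundaries.

```python
SAMPLE_US = 100_000                      # nominal 100 ms sampling period

def to_us(seconds):
    """Convert a log timestamp in seconds to whole microseconds."""
    return round(seconds * 1_000_000)

TIME_ZERO_US = to_us(30.688828)          # log entry chosen as time zero

def elapsed_us(timestamp):
    """Microseconds elapsed since time zero (negative if earlier)."""
    return to_us(timestamp) - TIME_ZERO_US

def sample_index(timestamp):
    """Nominal powermetrics sample number containing this timestamp."""
    return elapsed_us(timestamp) // SAMPLE_US
```

For instance, the ODRecordVerifyPassword entry at 30.677182 falls 11,646 microseconds before time zero, and a hypothetical entry at 32.088828 would be 1.4 seconds elapsed, in nominal sample 14.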

The following relevant events were identified in the log extract at elapsed times given in seconds:

1.3 com.apple.VisionKit Signpost Begin: “VisionKit MAD Parse Request”
1.3 com.apple.mediaanalysis Running task VCPMADServiceImageProcessingTask
1.4 ANE started and an ObjectDetectionModel run for 0.2 s
1.6 ANE activity and a NatureWorldModel run for 0.25 s
2.0 ANE activity for 0.15 s
2.4 ANE activity for 0.1 s
8.1 ANE activity and a UnifiedModel run for 0.01 s
8.1 PegasusKit queried Apple’s SMOOT service, the external connection used to populate the VLU window.

Thus, the ANE was run almost continuously from 1.4-2.2 seconds after the start of sampling, but was otherwise used little over the total period of about 9 seconds. Over that period of activity, an initial model used to detect objects was succeeded by a later model to identify objects in a ‘nature world’.

Power and energy estimates

From the log record, it was deduced that the VLU was started in powermetrics sample 10 (1.0 seconds elapsed), and essentially complete by sample 75 (7.5 seconds elapsed), a period of approximately 6.5 seconds, following which power use was low until the end of the sampling periods. All subsequent calculations refer to that series of samples and period of time.

Sums, averages and maxima of power measurements for that period of 6.5 seconds are:

CPU 64,289 mW total, 989 mW average, 7,083 mW maximum (10 P cores)
GPU 3,151 mW total, 48 mW average, 960 mW maximum (20 cores)
ANE 1,551 mW total, 24 mW average, 671 mW maximum
total 68,991 mW total, 1,061 mW average, 7,083 mW maximum.

Thus for the whole VLU, 93% of power was used by the CPU, 4.6% by the GPU, and only 2.2% by the ANE.
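Those averages and percentages can be reproduced from the summed figures. The sketch below assumes 65 samples in the period, consistent with 6.5 seconds at a nominal 100 ms each.

```python
N_SAMPLES = 65   # assumed: 6.5 s of nominal 100 ms samples

# Sums of per-sample power readings over the period, in mW.
totals_mw = {"CPU": 64289, "GPU": 3151, "ANE": 1551}

grand_total = sum(totals_mw.values())

# Mean power per sample, and each unit's share of the summed power.
averages = {name: total / N_SAMPLES for name, total in totals_mw.items()}
shares = {name: 100 * total / grand_total for name, total in totals_mw.items()}

for name in totals_mw:
    print(f"{name}: {averages[name]:.0f} mW average, {shares[name]:.1f}% of total")
```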

For comparison, in the M4 Pro chip running maximal in-core loads, each P core can use 1.3 W running floating point code, and 3 W running NEON code. The chip’s 20-core GPU was previously measured as using a steady maximum power of 20 W, with peaks at 25 W.

As each power sample covers 0.1 seconds, energy used during each sampling period is power × 0.1 s, thus the total energy used over the 6.5 second period of VLU is:

CPU 6.4 J
GPU 0.3 J
ANE 0.2 J
total 6.9 J.
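The arithmetic behind those figures is energy = power × time, summed over samples. A short sketch, assuming every sample lasted exactly 0.1 seconds (the function name is hypothetical):

```python
SAMPLE_SECONDS = 0.1   # assumed exact, although real periods vary slightly

# Sums of per-sample power readings over the period, in mW.
summed_power_mw = {"CPU": 64289, "GPU": 3151, "ANE": 1551}

def energy_joules(summed_mw, period_s=SAMPLE_SECONDS):
    """Energy in J from summed power samples: mW x s gives mJ, then / 1000."""
    return summed_mw * period_s / 1000

energy = {name: energy_joules(mw) for name, mw in summed_power_mw.items()}
energy["total"] = sum(energy.values())

for name, joules in energy.items():
    print(f"{name}: {joules:.1f} J")
```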

Those are small compared to the test threads used previously, which cost 3-8 J for each P core used.

Power over time

Power used in each 100 ms sampling period varied considerably over the whole 10 seconds. The chart below shows total power for the CPU.

Highest power was recorded between samples 10-25, corresponding to 1.0-2.5 seconds elapsed since the start of measurements, and coinciding with most of the events identified in the log. Later bursts of power use occurred at about 4.2 seconds, and between 6.6-7.1 seconds, which most probably corresponded to opening the info window and performing the selected look-up.

Almost all power use by the neural engine occurred between 1.5-2.1 seconds, correlating well with the period in which substantial models were being run.

Peak GPU power use occurred around 1.0-1.5 seconds when the image was first displayed, at 3.1-3.2 seconds, and between 6.5-7.4 seconds. It’s not known whether any of those were the result of image processing for VLU as GPU-related log entries are unusual.

Composite total power use demonstrates how small and infrequent ANE and GPU use was in comparison to that of the CPU.

Conclusions

Given the limitations of this single set of measurements, I suggest that, on Apple silicon Macs:

power and energy cost of VLU is remarkably low;
the great majority of work done in VLU is performed in the CPU;
although use of the neural engine may result in substantial performance improvements, VLU doesn’t make heavy demands on the neural engine in terms of power or energy use;
VLU may appear impressive, but it’s not actually that demanding on the capability of the hardware.

The Eclectic Light Company
Encryption and checking hashes slows faster SSDs
17 July 2025 at 14:30


By: hoakley


It’s commonly claimed that software encryption, as used in APFS Encrypted format, incurs negligible overhead. The last time I looked at that was with Thunderbolt 3 SSDs connected to a Mac Studio M1 Max, when I found that the overhead varied according to the SSD. One of the three I tested then did show a significant reduction in encrypted write speed, from 2.2 to 1.8 GB/s, but the fastest showed no change from its unencrypted write speed of 2.8 GB/s. This article reports new test results from a Mac mini M4 Pro with faster SSDs, one Thunderbolt 5 and the other USB4, and adds data for computing SHA256 hashes.

These are of particular interest, as not only are the unencrypted transfer speeds for both SSDs significantly higher than Thunderbolt 3, but the host has significantly faster CPU cores.

Two sets of measurements were made on each of the two SSDs:

Stibium 1.2 running on macOS 15.5 Sequoia was used to measure read and write speeds over randomised sequences of a total of 53 GB in 160 files of 2 MB to 2 GB individual size.
Stibium was used to measure the single file read speed of a 16.8 GB IPSW file, and Dintch was used to measure the time taken to stream the file in and compute its SHA256 digest, using CryptoKit.

Read and write speeds

Results of the first series of tests showed both SSDs performed as expected when using plain APFS, with read and write speeds of 5.3 GB/s for TB5, and 3.7 GB/s for USB4.

Small reductions in read speed were seen in both SSDs when using APFS Encrypted, to about 98% and 95% of their unencrypted read speed. Although there was a similar small reduction in write speed for USB4, to 97%, that seen in the Thunderbolt 5 SSD was greater, with a fall from 5.3 to 4.7 GB/s (89%). Both sets of tests were repeated for that SSD, allowing ample time for the SLC cache to be emptied after each set of write tests, and results remained essentially the same.

Although write speed to APFS Encrypted for this Thunderbolt 5 SSD remained well above that for USB4, encryption brought a reduction in speed of just over 10%, more than I had anticipated.

Hash computation

SHA256 and SHA512 digests are now used to check file data integrity. Both are computationally intensive, and I have previously reported that reading files of substantial size and computing their digests using CryptoKit proceeds at about 3 GB/s for files stored on the fast internal SSD of a Mac mini M4 Pro.
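To illustrate the kind of operation being timed, here is a minimal sketch that streams a file in chunks and computes its SHA256 digest. It uses Python's hashlib rather than CryptoKit, and isn't how Dintch works internally. The function name and the 4 MB chunk size are arbitrary choices for illustration.

```python
import hashlib
import time

def sha256_stream(path, chunk_size=4 * 1024 * 1024):
    """Stream a file in chunks; return (hex digest, bytes read, seconds taken)."""
    digest = hashlib.sha256()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read until an empty chunk signals end of file.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total, time.perf_counter() - start
```

Dividing bytes read by seconds taken gives an overall rate combining reading and hashing. Repeat runs on the same file may be served from the file cache rather than the SSD, so aren't comparable with a first read.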

With the Thunderbolt 5 SSD, the 16.8 GB file was read at 6.5 GB/s from plain APFS, and at 4.7 GB/s from APFS Encrypted. SHA256 digest computation was performed at 2.6 GB/s from plain APFS, and 2.2 GB/s from APFS Encrypted, both well below that from the internal SSD, and less than half the speed of just reading the file.

Although the USB4 SSD was inevitably slower on the read tests, at 3.8 GB/s, encryption had little effect, at 3.7 GB/s. SHA256 digest computation was, if anything, faster than with Thunderbolt 5, at 2.8 GB/s plain, and 2.7 GB/s encrypted.
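To put those figures side by side, this short sketch expresses each digest rate as a percentage of the corresponding plain read rate, using only the speeds quoted above.

```python
# Speeds in GB/s as reported above: (single file read, SHA256 digest).
speeds = {
    ("Thunderbolt 5", "plain"): (6.5, 2.6),
    ("Thunderbolt 5", "encrypted"): (4.7, 2.2),
    ("USB4", "plain"): (3.8, 2.8),
    ("USB4", "encrypted"): (3.7, 2.7),
}

# Digest rate as a whole-number percentage of read rate.
digest_share = {
    key: round(100 * digest / read) for key, (read, digest) in speeds.items()
}

for (ssd, fmt), pct in digest_share.items():
    print(f"{ssd} {fmt}: digest at {pct}% of read speed")
```

On these numbers, digest computation ran at 40-47% of read speed on the Thunderbolt 5 SSD, but at about 73-74% on the slower USB4 SSD.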

Conclusions

Although there may well be differences with other Thunderbolt 5 and USB4 SSDs, and more extensive results would be helpful:

Whether plain or encrypted APFS, Thunderbolt 5 SSDs are substantially faster than USB4.
Encryption can result in significantly lower write speeds on some Thunderbolt 5 SSDs.
Otherwise, encryption has only small effects on read and write speeds.
Computation of SHA256 digests is significantly slower than encryption, and ranges between 2.2 and 2.8 GB/s on larger files.
This suggests that, even in faster M4 chips, CPU performance limits the speed of software encryption, and even more so for SHA256 digest computation.