Reading view

There are new articles available, click to refresh the page.

Which cores does Visual Look Up use?

A couple of weeks ago I estimated how much power and energy were used when performing Visual Look Up (VLU) on an Apple silicon Mac, and was surprised to discover how little that was, concluding that “it’s not actually that demanding on the capability of the hardware”. This article returns to those measurements and looks in more detail at what the CPU cores and GPU were doing.

That previous article gives full details of what I did. In brief, this was performed on a Mac mini M4 Pro running macOS Sequoia 15.6.1, using an image of cattle in a field, opened in Preview. powermetrics collected samples in periods of 100 ms throughout, and a full log extract was obtained to relate time to logged events.

Power use by CPU cores, GPU and neural engine (ANE) are shown in this chart from that article. This tallies against log records for the main work in VLU being performed in samples 10-24, representing a time interval of approximately 1.0-2.4 seconds after the start. There were also briefer periods of activity around 3.2 seconds on the GPU, 4.2 seconds on the CPU, and 6.6-7.1 seconds on the CPU. The latter correlated with online access to Apple’s SMOOT service to populate and display the VLU window.

To gain further detail, powermetrics measurements of CPU core cluster frequencies, active residencies of each core, and GPU frequency and active residency, were analysed for the first 80 collection periods.

Frequency and active residency

Cluster frequencies in MHz are shown in the chart above for the one E and two P clusters, and the GPU. These show:

  • The E cores (black) ran at a baseline of 1200-1300 MHz for much of the time, reaching their maximum frequency of 2592 MHz during the main VLU period at 1.0-2.4 seconds.
  • The first P cluster (blue), P0, was active in short bursts over the first 1.5 seconds, and again between 6.3-7.0 seconds. For the remainder of the period the cluster was shut down.
  • The second P cluster (red), P1, was most active during the three periods of high power use, although it didn’t quite reach its maximum frequency of 4512 MHz. When there was little core activity, it was left to idle at 1260 MHz but wasn’t shut down.
  • The GPU (yellow) ran at 338 MHz or was shut down for almost all the time, with one brief peak at 927 MHz.

This chart shows the total active residencies for each of the three CPU clusters, obtained by adding their % measurements. Thus the maximum for the E cluster is 400%, and 500% for each of the two P clusters, and 1,400% in all. These are broadly equivalent to the CPU % shown in Activity Monitor, and take no account of frequency. These show:

  • The E cores (pale blue) had the highest active residency throughout, ranging from as little as 30% when almost idle around 5 seconds, to just over 300% during the main VLU phase at 1.4 seconds.
  • The first P cluster (purple) remained almost inactive throughout.
  • The second P cluster (red) was only active during the periods of highest work, particularly between 1.0-2.4 seconds and again at 6.4-7.1 seconds. For much of the rest of the test it had close to zero active residency.

Taken together, these show that a substantial proportion of the processing undertaken in VLU was performed by the E cores, with shorter peaks of activity in some of the cores in the second P cluster. For much of the time, though, all ten P cores were either idle or shut down.

Load

Combining frequency and active residency into a single value is difficult for the two types of CPU core. To provide a rough metric, I have calculated ‘cluster load’ as
total cluster active residency x (cluster frequency / maximum core frequency)
where the maximum frequency of these E cores is taken as 2592 MHz, and the P cores as 4512 MHz. For example, in the sample period at 2.2 seconds, the P1 cluster frequency was 4449 MHz, and the total active residency for the five cores was 122%. Thus the P1 cluster load was 122 x (4449/4512) = 120.3%. Maximum load for that cluster would have been 500%.

The chart above shows load values for:

  • The E cluster (black) riseing to 150-260% during the peak of VLU activity, from a baseline of 20-30%.
  • The P0 cluster (blue) which never reached 10% after the initial sample at 0 seconds.
  • The P1 cluster (red) spiking at 90-150% during the three most active phases, otherwise remaining below 10%.

Caution is required when comparing E with P cores on this measurement, as not only is E core maximum frequency only 57% that of P cores, but it’s generally assumed that their maximum processing capacity is roughly half that of P cores. Even with that reservation, it’s clear that a substantial proportion of the processing performed in this VLU was on the E cores, with just one cluster of P cores active in short spikes.

Finally, it’s possible to examine the correlation between total P cluster load and total CPU power.

This chart shows calculated total P load and reported total CPU power use. The linear regression shown is
CPU power = 4.1 + (42.2 x total load)
giving a power use of 4,200 mW for a load of 100%, equating to a single P core running at maximum frequency.

Conclusions

  • Cluster frequencies and active residencies measured in CPU cores followed the same phases as seen in CPU power, with most of the processing load of VLU in the the early stage, between 1.0-2.4 seconds, a shorter peak at 6.6-7.1 seconds correlating with online lookup, and a small peak at about 4.2 seconds.
  • A substantial proportion of the processing performed for VLU was run on E rather than P cores, with P cores only being used for brief periods.
  • Visual Look Up used remarkably little of the capability of an M4 Pro chip.

How much power does Visual Look Up use?

Look in the log, and Visual Look Up (VLU) on an Apple silicon Mac apparently involves a great deal of work, in both CPU cores and the neural engine (ANE). This article reports a first attempt to estimate power and energy use for a single VLU on an image.

To estimate this, I measured CPU, GPU and ANE power in sampling periods of 100 ms using powermetrics, and correlated events seen there with those recorded in a log extract over the same period, obtained using LogUI. The test was performed on a Mac mini M4 Pro running macOS 15.6.1, using Preview to perform the VLU on a single image showing a small group of cattle in an upland field. Power measurements were collected from a moment immediately before opening the image, and ceased several seconds after VLU was complete.

When used like this, powermetrics imposes remarkably little overhead on the CPU cores, but its sampling periods are neither exact nor identical. This makes it difficult to correlate log entries and their precise timestamps with sampling periods. While powermetrics gives power use in mW, those measurements aren’t calibrated and making assumptions about their accuracy is hazardous. Nevertheless, they remain the best estimates available.

Log narrative

The first step in log analysis was to identify the starting time of powermetrics sampling periods. Although execution of that command left no trace in its entries, as it has to be run with elevated privileges using sudo, its approval was obvious in entries concluding
30.677182 com.apple.opendirectoryd ODRecordVerifyPassword completed
A subsequent entry at 30.688828 seconds was thus chosen as the start time for sampling periods, and all times given below as given in seconds after that time zero.

The following relevant events were identified in the log extract at elapsed times given in seconds:

  • 1.3 com.apple.VisionKit Signpost Begin: “VisionKit MAD Parse Request”
  • 1.3 com.apple.mediaanalysis Running task VCPMADServiceImageProcessingTask
  • 1.4 ANE started and an ObjectDetectionModel run for 0.2 s
  • 1.6 ANE activity and a NatureWorldModel run for 0.25 s
  • 2.0 ANE activity for 0.15 s
  • 2.4 ANE activity for 0.1 s
  • 8.1 ANE activity and a UnifiedModel run for 0.01 s
  • 8.1 PegasusKit queried Apple’s SMOOT service, the external connection used to populate the VLU window.

Thus, the ANE was run almost continuously from 1.4-2.2 seconds after the start of sampling, otherwise was used little over the total period of about 9 seconds. Over that period of activity, an initial model used to detect objects was succeeded by a later model to identify objects in a ‘nature world’.

Power and energy estimates

From the log record, it was deduced that the VLU was started in powermetrics sample 10 (1.0 seconds elapsed), and essentially complete by sample 75 (7.5 seconds elapsed), a period of approximately 6.5 seconds, following which power use was low until the end of the sampling periods. All subsequent calculations refer to that series of samples and period of time.

Sums, averages and maxima of power measurements for that period of 6.5 seconds are:

  • CPU 64,289 mW total, 989 mW average, 7,083 mW maximum (10 P cores)
  • GPU 3,151 mW total, 48 mW average, 960 mW maximum (20 cores)
  • ANE 1,551 mW total, 24 mW average, 671 mW maximum
  • total 68,991 mW total, 1,061 mW average, 7,083 mW maximum.

Thus for the whole VLU, 93% of power was used by the CPU, 4.6% by the GPU, and only 2.2% by the ANE.

For comparison, in the M4 Pro chip running maximal in-core loads, each P core can use 1.3 W running floating point code, and 3 W running NEON code. The chip’s 20-core GPU was previously measured as using a steady maximum power of 20 W, with peaks at 25 W.

As each power sample covers 0.1 seconds, energy used during each sampling period is power/0.1, thus the total energy used over the 6.5 second period of VLU is:

  • CPU 6.4 J
  • GPU 0.3 J
  • ANE 0.2 J
  • total 6.9 J.

Those are small compared to the test threads used previously, which cost 3-8 J for each P core used.

Power over time

Power used in each 100 ms sampling period varied considerably over the whole 10 seconds. The chart below shows total power for the CPU.

Highest power was recorded between samples 10-25, corresponding to 1.0-2.5 seconds elapsed since the start of measurements, and most events identified in the log. Later bursts of power use occurred at about 4.2 seconds, and between 6.6-7.1 seconds, which most probably corresponded to opening the info window and performing the selected look-up.

Almost all power use by the neural engine occurred between 1.5-2.1 seconds, correlating well with the period in which substantial models were being run.

Peak GPU power use occurred around 1.0-1.5 seconds when the image was first displayed, at 3.1-3.2 seconds, and between 6.5-7.4 seconds. It’s not known whether any of those were the result of image processing for VLU as GPU-related log entries are unusual.

Composite total power use demonstrates how small and infrequent ANE and GPU use was in comparison to that of the CPU.

Conclusions

Given the limitations of this single set of measurements, I suggest that, on Apple silicon Macs

  • power and energy cost of VLU is remarkably low;
  • the great majority of work done in VLU is performed in the CPU;
  • although use of the neural engine may result in substantial performance improvements, VLU doesn’t make heavy demands on the neural engine in terms of power or energy use;
  • VLU may appear impressive, but it’s not actually that demanding on the capability of the hardware.

❌