
Which cores does Visual Look Up use?

By: hoakley
16 September 2025 at 14:30

A couple of weeks ago I estimated how much power and energy were used when performing Visual Look Up (VLU) on an Apple silicon Mac, and was surprised to discover how little that was, concluding that “it’s not actually that demanding on the capability of the hardware”. This article returns to those measurements and looks in more detail at what the CPU cores and GPU were doing.

That previous article gives full details of what I did. In brief, this was performed on a Mac mini M4 Pro running macOS Sequoia 15.6.1, using an image of cattle in a field, opened in Preview. powermetrics collected samples in periods of 100 ms throughout, and a full log extract was obtained to relate time to logged events.

Power use by the CPU cores, GPU and neural engine (ANE) is shown in this chart from that article. It tallies with log records showing the main work of VLU being performed in samples 10-24, representing a time interval of approximately 1.0-2.4 seconds after the start. There were also briefer periods of activity around 3.2 seconds on the GPU, 4.2 seconds on the CPU, and 6.6-7.1 seconds on the CPU. The last of those correlated with online access to Apple’s SMOOT service to populate and display the VLU window.

To gain further detail, powermetrics measurements of CPU core cluster frequencies, active residencies of each core, and GPU frequency and active residency, were analysed for the first 80 collection periods.

Frequency and active residency

Cluster frequencies in MHz are shown in the chart above for the one E and two P clusters, and the GPU. These show:

  • The E cores (black) ran at a baseline of 1200-1300 MHz for much of the time, reaching their maximum frequency of 2592 MHz during the main VLU period at 1.0-2.4 seconds.
  • The first P cluster (blue), P0, was active in short bursts over the first 1.5 seconds, and again between 6.3-7.0 seconds. For the remainder of the period the cluster was shut down.
  • The second P cluster (red), P1, was most active during the three periods of high power use, although it didn’t quite reach its maximum frequency of 4512 MHz. When there was little core activity, it was left to idle at 1260 MHz but wasn’t shut down.
  • The GPU (yellow) ran at 338 MHz or was shut down for almost all the time, with one brief peak at 927 MHz.

This chart shows the total active residencies for each of the three CPU clusters, obtained by adding their % measurements. Thus the maximum for the E cluster is 400%, and 500% for each of the two P clusters, and 1,400% in all. These are broadly equivalent to the CPU % shown in Activity Monitor, and take no account of frequency. These show:

  • The E cores (pale blue) had the highest active residency throughout, ranging from as little as 30% when almost idle around 5 seconds, to just over 300% during the main VLU phase at 1.4 seconds.
  • The first P cluster (purple) remained almost inactive throughout.
  • The second P cluster (red) was only active during the periods of highest work, particularly between 1.0-2.4 seconds and again at 6.4-7.1 seconds. For much of the rest of the test it had close to zero active residency.

Taken together, these show that a substantial proportion of the processing undertaken in VLU was performed by the E cores, with shorter peaks of activity in some of the cores in the second P cluster. For much of the time, though, all ten P cores were either idle or shut down.

Load

Combining frequency and active residency into a single value is difficult for the two types of CPU core. To provide a rough metric, I have calculated ‘cluster load’ as
total cluster active residency x (cluster frequency / maximum core frequency)
where the maximum frequency of these E cores is taken as 2592 MHz, and the P cores as 4512 MHz. For example, in the sample period at 2.2 seconds, the P1 cluster frequency was 4449 MHz, and the total active residency for the five cores was 122%. Thus the P1 cluster load was 122 x (4449/4512) = 120.3%. Maximum load for that cluster would have been 500%.
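
For anyone wanting to reproduce that arithmetic, here is a minimal sketch of the cluster load calculation under the assumptions above; the type and value names are illustrative, not taken from powermetrics output.

```swift
// Minimal sketch of the 'cluster load' metric described above.
struct ClusterSample {
    let activeResidency: Double   // summed % across the cluster's cores
    let frequency: Double         // cluster frequency in MHz
    let maxFrequency: Double      // 2592 for these E cores, 4512 for P cores
}

func clusterLoad(_ s: ClusterSample) -> Double {
    // total active residency scaled by how fast the cluster was running
    s.activeResidency * (s.frequency / s.maxFrequency)
}

// Worked example from the text: the P1 cluster at 2.2 seconds
let p1 = ClusterSample(activeResidency: 122, frequency: 4449, maxFrequency: 4512)
print(clusterLoad(p1))            // ≈ 120.3%, out of a possible 500%
```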

The chart above shows load values for:

  • The E cluster (black) rising to 150-260% during the peak of VLU activity, from a baseline of 20-30%.
  • The P0 cluster (blue) never reaching 10% after the initial sample at 0 seconds.
  • The P1 cluster (red) spiking at 90-150% during the three most active phases, otherwise remaining below 10%.

Caution is required when comparing E with P cores on this measurement, as not only is E core maximum frequency only 57% that of P cores, but it’s generally assumed that their maximum processing capacity is roughly half that of P cores. Even with that reservation, it’s clear that a substantial proportion of the processing performed in this VLU was on the E cores, with just one cluster of P cores active in short spikes.

Finally, it’s possible to examine the correlation between total P cluster load and total CPU power.

This chart shows calculated total P load and reported total CPU power use. The linear regression shown is
CPU power (mW) = 4.1 + (42.2 x total load (%))
giving a power use of about 4,200 mW for a load of 100%, equating to a single P core running at maximum frequency.
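
For illustration, here is a minimal sketch of an ordinary least-squares fit of that kind; the arrays are placeholders, and the coefficients quoted above (4.1 and 42.2) came from the author’s own measurements, not from this code.

```swift
// Ordinary least-squares fit of CPU power (mW) against total P-cluster load (%).
func linearFit(x: [Double], y: [Double]) -> (intercept: Double, slope: Double) {
    let n = Double(x.count)
    let sumX = x.reduce(0, +)
    let sumY = y.reduce(0, +)
    let sumXY = zip(x, y).reduce(0) { $0 + $1.0 * $1.1 }
    let sumXX = x.reduce(0) { $0 + $1 * $1 }
    let slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX)
    let intercept = (sumY - slope * sumX) / n
    return (intercept, slope)
}

// Applying the fitted equation from the chart:
let totalPLoad = 100.0                      // one P core at maximum frequency
let cpuPower = 4.1 + 42.2 * totalPLoad      // ≈ 4,224 mW, about 4.2 W
print(cpuPower)
```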

Conclusions

  • Cluster frequencies and active residencies measured in CPU cores followed the same phases as seen in CPU power, with most of the processing load of VLU in the early stage, between 1.0-2.4 seconds, a shorter peak at 6.6-7.1 seconds correlating with online lookup, and a small peak at about 4.2 seconds.
  • A substantial proportion of the processing performed for VLU was run on E rather than P cores, with P cores only being used for brief periods.
  • Visual Look Up used remarkably little of the capability of an M4 Pro chip.

Last Week on My Mac: Coming soon to your Mac’s neural engine

By: hoakley
7 September 2025 at 15:00

If you’ve read any of my articles here about the inner workings of CPU cores in Apple silicon chips, you’ll know I’m no stranger to using the command tool powermetrics to discover what they’re up to. Last week I attempted something more adventurous when trying to estimate how much power and energy are used in a single Visual Look Up (VLU).

My previous tests have been far simpler: start powermetrics collecting sample periods using Terminal, then run a set number of core-intensive threads in my app AsmAttic, knowing those would complete before that sampling stopped. Analysing dozens of sets of measurements of core active residency, frequency and power use is pedestrian, but there’s no doubt as to when the tests were running, nor which cores they were using.

VLU was more intricate, in that once powermetrics had started sampling, I had to double-click an image to open it in Preview, wait until its Info tool showed stars to indicate that stage was complete, open the Info window, spot the buttons that appeared on recognised objects, then select one and click on it to open the Look Up window. All steps had to be completed within the 10 seconds of sampling, leaving me with the task of matching nearly 11,000 log entries for that interval against powermetrics’ hundred sampling periods.

The first problem is syncing time between the log, which gives each entry down to the microsecond, and the sampling periods. Although the latter are supposed to be 100 ms duration, in practice powermetrics is slightly slower, and most ranged between about 116 and 129 ms. As the start time of each period is only given to the nearest second, it’s impossible to know exactly when each sample was obtained.

Correlating log entries with events apparent in the time-course of power use is also tricky. Some are obvious, and the start of sampling was perhaps the easiest giveaway as powermetrics has to be run using sudo to obtain elevated privileges, which leaves unmistakeable evidence in the log. Clicks made on Preview’s tools are readily missed, though, even when you have a good estimate of the time they occurred.

Thus, the sequence of events is known with confidence, and it’s not hard to establish when VLU was occurring. As a result, estimating overall power and energy use for the whole VLU also has good confidence, although establishing finer detail is more challenging.

The final caution applies to all power measurements made using powermetrics, that those are approximate and uncalibrated. What may be reported as 40 mW could be more like 10 or 100 mW.

In the midst of this abundance of caution, one fact stands clear: VLU hardly stresses any part of an Apple silicon chip. Power used during the peak of CPU core, GPU and neural engine (ANE) activity was a small fraction of the values measured during my previous core-intensive testing. At no time did the ten P cores in my M4 Pro come close to the power used when running more than one thread of intensive floating-point arithmetic, and the GPU and ANE spent much of the time twiddling their thumbs.

Yet when Apple released VLU in macOS Monterey, it hadn’t been expected that Intel Macs would be able to support it at all because of its computational demand. What still looks like magic can now be accomplished with ease even in a base M1 model. And when we care to leave our Macs running, mediaanalysisd will plod steadily through recently saved images performing object recognition and classification to add them to Spotlight’s indexes, enabling us to search images by labels describing their contents. Further digging in Apple’s documentation reveals that VLU and indexing of discovered object types is currently limited by language to English, French, German, Italian, Spanish and Japanese.

Some time in the next week or three, when Apple releases macOS Tahoe, we’ll start seeing Apple silicon Macs stretch their wings with the first apps to use its Foundation Models. These are based on the same Large Language Models (LLMs) already used in Writing Tools, and run entirely on-device, unlike ChatGPT. This has unfortunately been eclipsed by Tahoe’s controversial redesign, but as more developers get to grips with these new AI capabilities, you should start to see increasingly novel features appearing.

What developers will do with them is currently less certain. These LLMs are capable of working with text including dialogue, thus are likely to appear early in games, and should provide specialist variants of more generic Writing Tools. They can also return numbers rather than text, and suggest and execute commands and actions that could be used in predictive automation. Unlike previous support for AI techniques such as neural networks, Foundation Models present a simple, high-level interface that can require just a few lines of code.
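
As an illustration of how compact that interface is, here is a minimal sketch based on Apple’s published FoundationModels framework. This isn’t code from the article, and names such as LanguageModelSession and respond(to:) should be checked against current documentation before relying on them.

```swift
// A minimal sketch of calling the on-device Foundation Models from Swift.
// Treat the details as illustrative; verify against Apple's documentation.
import FoundationModels

func summarise(_ text: String) async throws -> String {
    // A session wraps the on-device language model.
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Summarise in one sentence: \(text)")
    return response.content
}
```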

If you’ve got an Apple silicon Mac, there’s a lot of potential coming in Tahoe, once you’ve jiggled its settings to accommodate its new style.

How much power does Visual Look Up use?

By: hoakley
2 September 2025 at 14:30

Look in the log, and Visual Look Up (VLU) on an Apple silicon Mac apparently involves a great deal of work, in both CPU cores and the neural engine (ANE). This article reports a first attempt to estimate power and energy use for a single VLU on an image.

To estimate this, I measured CPU, GPU and ANE power in sampling periods of 100 ms using powermetrics, and correlated events seen there with those recorded in a log extract over the same period, obtained using LogUI. The test was performed on a Mac mini M4 Pro running macOS 15.6.1, using Preview to perform the VLU on a single image showing a small group of cattle in an upland field. Power measurements were collected from a moment immediately before opening the image, and ceased several seconds after VLU was complete.

When used like this, powermetrics imposes remarkably little overhead on the CPU cores, but its sampling periods are neither exact nor identical. This makes it difficult to correlate log entries and their precise timestamps with sampling periods. While powermetrics gives power use in mW, those measurements aren’t calibrated and making assumptions about their accuracy is hazardous. Nevertheless, they remain the best estimates available.
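
For anyone wanting to repeat this kind of analysis, here is a minimal sketch of pulling per-sample power figures out of a saved powermetrics text log. The command shown in the comment, the sampler names, and the “CPU Power:”/“ANE Power:” line labels are assumptions about powermetrics output that may differ between macOS versions, so check powermetrics(1) on your own Mac first.

```swift
// Extract per-sample power readings (in mW) from a powermetrics text log,
// collected with something like:
//   sudo powermetrics --samplers cpu_power,gpu_power,ane_power -i 100 -n 100 -o vlu.txt
// (sampler names and output labels are assumptions; check powermetrics -h)
import Foundation

func powerSeries(from url: URL, label: String) throws -> [Int] {
    let text = try String(contentsOf: url, encoding: .utf8)
    return text.split(separator: "\n")
        .filter { $0.hasPrefix(label) }                 // e.g. "CPU Power: 989 mW"
        .compactMap { line in
            Int(String(line.filter { $0.isNumber }))    // value in mW
        }
}

do {
    let url = URL(fileURLWithPath: "vlu.txt")
    let cpu = try powerSeries(from: url, label: "CPU Power:")
    let ane = try powerSeries(from: url, label: "ANE Power:")
    print(cpu.count, ane.reduce(0, +))                  // sample count, total ANE mW
} catch {
    print("Couldn't read the powermetrics log:", error)
}
```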

Log narrative

The first step in log analysis was to identify the starting time of the powermetrics sampling periods. Although execution of that command itself leaves no direct trace in log entries, it has to be run with elevated privileges using sudo, and approval of that was obvious in entries concluding
30.677182 com.apple.opendirectoryd ODRecordVerifyPassword completed
A subsequent entry at 30.688828 seconds was thus chosen as the start time for the sampling periods, and all times below are given in seconds after that time zero.
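
Subsequent log entries were then related to sampling periods by simple arithmetic. The sketch below illustrates that mapping using the nominal 100 ms period, even though real periods ran slightly longer; the timestamps are the seconds component of the log’s full times, as used above.

```swift
// Relate log timestamps to elapsed time and nominal powermetrics samples.
// Sample boundaries are approximate because real periods exceeded 100 ms.
let timeZero = 30.688828                 // seconds, from the sudo approval entry

func elapsed(_ logTime: Double) -> Double {
    logTime - timeZero
}

func nominalSample(_ logTime: Double, period: Double = 0.1) -> Int {
    Int(elapsed(logTime) / period)       // an entry ~1.0 s later falls in sample 10
}

print(nominalSample(31.7))               // ≈ sample 10, where the main VLU work begins
```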

The following relevant events were identified in the log extract at elapsed times given in seconds:

  • 1.3 com.apple.VisionKit Signpost Begin: “VisionKit MAD Parse Request”
  • 1.3 com.apple.mediaanalysis Running task VCPMADServiceImageProcessingTask
  • 1.4 ANE started and an ObjectDetectionModel run for 0.2 s
  • 1.6 ANE activity and a NatureWorldModel run for 0.25 s
  • 2.0 ANE activity for 0.15 s
  • 2.4 ANE activity for 0.1 s
  • 8.1 ANE activity and a UnifiedModel run for 0.01 s
  • 8.1 PegasusKit queried Apple’s SMOOT service, the external connection used to populate the VLU window.

Thus, the ANE ran almost continuously from 1.4-2.2 seconds after the start of sampling, but was otherwise little used over the total period of about 9 seconds. Over that period of activity, an initial model used to detect objects was succeeded by a later model to identify objects in a ‘nature world’.

Power and energy estimates

From the log record, it was deduced that the VLU was started in powermetrics sample 10 (1.0 seconds elapsed), and essentially complete by sample 75 (7.5 seconds elapsed), a period of approximately 6.5 seconds, following which power use was low until the end of the sampling periods. All subsequent calculations refer to that series of samples and period of time.

Sums, averages and maxima of power measurements for that period of 6.5 seconds are:

  • CPU 64,289 mW total, 989 mW average, 7,083 mW maximum (10 P cores)
  • GPU 3,151 mW total, 48 mW average, 960 mW maximum (20 cores)
  • ANE 1,551 mW total, 24 mW average, 671 mW maximum
  • total 68,991 mW total, 1,061 mW average, 7,083 mW maximum.

Thus for the whole VLU, 93% of power was used by the CPU, 4.6% by the GPU, and only 2.2% by the ANE.

For comparison, in the M4 Pro chip running maximal in-core loads, each P core can use 1.3 W running floating point code, and 3 W running NEON code. The chip’s 20-core GPU was previously measured as using a steady maximum power of 20 W, with peaks at 25 W.

As each power sample covers 0.1 seconds, the energy used during each sampling period is the power multiplied by 0.1 s, so the total energy used over the 6.5 second period of VLU (see the sketch after this list) is:

  • CPU 6.4 J
  • GPU 0.3 J
  • ANE 0.2 J
  • total 6.9 J.
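
To make that arithmetic explicit, here is a minimal sketch; the sample values are placeholders rather than the measured data.

```swift
// Each sample reports average power in mW over a nominal 0.1 s period, so
// energy per sample is power × 0.1 in millijoules, and the total is their sum.
let cpuPowerSamples: [Double] = [7_083, 4_512, 2_250, 989]   // mW, illustrative only

let samplePeriod = 0.1                                       // seconds
let energyMilliJoules = cpuPowerSamples
    .map { $0 * samplePeriod }                               // mJ per sample
    .reduce(0, +)
let energyJoules = energyMilliJoules / 1_000
print(energyJoules)                                          // ≈ 1.5 J for these values

// Applying the same arithmetic to the article's CPU total of 64,289 mW summed
// over 65 samples gives 64,289 × 0.1 / 1,000 ≈ 6.4 J.
```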

Those are small compared to the test threads used previously, which cost 3-8 J for each P core used.

Power over time

Power used in each 100 ms sampling period varied considerably over the whole 10 seconds. The chart below shows total power for the CPU.

Highest power was recorded between samples 10-25, corresponding to 1.0-2.5 seconds elapsed since the start of measurements, and most events identified in the log. Later bursts of power use occurred at about 4.2 seconds, and between 6.6-7.1 seconds, which most probably corresponded to opening the info window and performing the selected look-up.

Almost all power use by the neural engine occurred between 1.5-2.1 seconds, correlating well with the period in which substantial models were being run.

Peak GPU power use occurred around 1.0-1.5 seconds when the image was first displayed, at 3.1-3.2 seconds, and between 6.5-7.4 seconds. It’s not known whether any of those were the result of image processing for VLU as GPU-related log entries are unusual.

Composite total power use demonstrates how small and infrequent ANE and GPU use was in comparison to that of the CPU.

Conclusions

Given the limitations of this single set of measurements, I suggest that, on Apple silicon Macs:

  • power and energy cost of VLU is remarkably low;
  • the great majority of work done in VLU is performed in the CPU;
  • although use of the neural engine may result in substantial performance improvements, VLU doesn’t make heavy demands on the neural engine in terms of power or energy use;
  • VLU may appear impressive, but it’s not actually that demanding on the capability of the hardware.
