Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Tune for Performance: Activity Monitor’s CPU view

By: hoakley
11 December 2024 at 15:30

Activity Monitor is one of the unsung heroes of macOS. If you understand how to interpret its rich streams of measurements, it’s thoroughly reliable and has few vices. When tuning for performance, it should be your starting point, a first resort before you dig deeper with more specialist tools such as Xcode’s Instruments or powermetrics.

How Activity Monitor works

Like Xcode’s Instruments and powermetrics, the CPU view samples CPU and GPU activity over brief periods of time, displays results for the last sampling period, and updates those every 1-5 seconds. You can change the sampling period used in the Update Frequency section of the View menu, and that should normally be set to Very Often, for a period of 1 second.

This was normally adequate for many purposes with older M-series chips, but thread mobility in M4 chips can be expected to move threads from core to core, and between whole clusters, at a frequency similar to available sampling periods, and that loses detail as to what’s going on in cores and clusters, and can sometimes give the false impression that a single thread is running simultaneously on multiple cores.

However, sampling frequency also determines how much % CPU is taken by Activity Monitor itself. While periods of 0.1 second and less are feasible with powermetrics, in Activity Monitor they would start to affect its results. If you need to see finer details, then you’ll need to use Xcode Instruments or powermetrics instead.

% CPU

Central to the CPU view, and its use in tuning for performance, is what Activity Monitor refers to as % CPU, which it defines as the “percentage of CPU capability that’s being used by processes”. As far as I can tell, this is essentially the same as active residency in powermetrics, and it’s vital to understand its strengths and shortcomings.

Take a CPU core that’s running at 1 GHz. Every second it ‘ticks’ forward one billion times. If an instruction were to take just one clock cycle, then it could execute a billion of those every second. In any given second, that core is likely to spend some time idle and not executing any instructions. If it were to execute half a billion instructions in any given second, and spend the other half of the time idle, then it has an idle residency of 50% and an active residency of 50%, and that would be represented by Activity Monitor as 50% CPU. So a CPU core that’s fully occupied executing instructions, and doesn’t idle at all, has an active residency of 100%.

To arrive at the % CPU figures shown in Activity Monitor, the active residency of all the CPU cores is added together. If your Mac has four P and four E cores and they’re all fully occupied with 100% active residency each, then the total % CPU shown will be 800%.

There are two situations where this can be misleading if you’re not careful. Intel CPUs feature Hyper-threading, where each physical core acquires a second, virtual core that can also run at another 100% active residency. In the CPU History window those are shown as even-numbered cores, and in % CPU they double the total percentage. So an 8-core Intel CPU then has a total of 16 cores, and can reach 1600% when running flat out with Hyper-threading.

coremanintel

cpuendstop

The other situation affects Apple silicon chips, as their CPU cores can be run at a wide range of different frequencies according to control by macOS. However, Activity Monitor makes no allowance for their frequency. When it shows a core or total % CPU, that could be running at a frequency as low as 600 MHz in the M1, or as high as 4,512 MHz in the M4, nine times as fast. Totalling these percentages also makes no allowance for the different processing capacity of P and E cores.

Thus an M4 chip’s CPU cores could show a total of 400% CPU when all four E cores are running at 1,020 MHz with 100% active residency, or when four of its P cores are running at 4,512 MHz with 100% active residency. Yet the P cores would have an effective throughput of as much as six times that of the E cores. Interpreting % CPU isn’t straightforward, as nowhere does Activity Monitor provide core frequency data.

tuneperf1

In this example from an M4 Pro, the left of each trace shows the final few seconds of four test threads running on the E cores, which took 99 seconds to complete at a frequency of around 1,020 MHz, then in the right exactly the same four test threads completed in 23 seconds on P cores running at nearer 4,000 MHz. Note how lightly loaded the P cores appear, although they’re executing the same code at almost four times the speed.

Threads and more

For most work, you should display all the relevant columns in the CPU view, including Threads and GPU.

tuneperf2

Threads are particularly important for processes to be able run on multiple cores simultaneously, as they’re fairly self-contained packages of executable code that macOS can allocate to a core to run. Processes that consist of just a single thread may get shuffled around between different cores, but can’t run on more than one of them at a time.

Another limitation of Activity Monitor is that it can’t tell you which threads are running on each core, or even which type of core they’re running on. When there are no other substantial threads active, you can usually guess which threads are running where by looking in the CPU History window, but when there are many active threads on both E and P cores, you can’t tell which process owns them.

You can increase your chances of being able to identify which cores are running which threads by minimising all other activity, quitting all other open apps, and waiting until routine background activities such as Spotlight have gone quiet. But that in itself can limit what you can assess using Activity Monitor. Once again, Xcode Instruments and powermetrics can make such distinctions, but can easily swamp you with details.

GPU and co-processors

Percentages given for the GPU appear to be based on a similar concept of active residency without making any allowance for frequency. As less is known about control of GPU cores, that too must be treated carefully.

Currently, Activity Monitor provides no information on the neural processing unit (ANE), or the matrix co-processor (AMX) believed to be present in Apple silicon chips, and that available using other tools is also severely limited. For example, powermetrics can provide power use for the ANE, but doesn’t even recognise the existence of the AMX.

Setting up

In almost all cases, you should set the CPU view to display All processes using the View menu. Occasionally, it’s useful to view processes hierarchically, another setting available there.

Extras

Activity Monitor provides some valuable extras in its toolbar. Select a process and the System diagnostics options tool (with an ellipsis … inside a circle) can sample that process every millisecond to provide a call graph and details of binary images. The same menu offers to run a spindump, for investigating processes that are struggling, and to run sysdiagnose and Spotlight diagnostic collections. The Inspect selected process tool ⓘ provides information on memory, open files and ports, and more.

Explainers

Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.

multiterms1

When an app is run, there’s a single runtime instance created from its single executable code. This is given its own virtual memory and access to system resources that it needs. This is a process, and listed as a separate entity in Activity Monitor.

Each process has a main thread, a single flow of code execution, and may create additional threads, perhaps to run in the background. Threads don’t get their own virtual memory, but share that allocated to the process. They do, though, have their own stack. On Apple silicon Macs they’re easy to tell apart as they can only run on a single core, although they may be moved between cores, sometimes rapidly. In macOS, threads are assigned a Quality of Service (QoS) that determines how and where macOS runs them. Normally the process’s main thread is assigned a high QoS, and background threads that it creates may be given the same or lower QoS, as determined by the developer.

Within each thread are individual tasks, each a quantity of work to be performed. These can be brief sections of code and are more interdependent than threads. They’re often divided into synchronous and asynchronous tasks, depending on whether they need to be run as part of a strict sequence. Because they’re all running within a common thread, they don’t have individual QoS although they can have priorities.

Why % CPU in Activity Monitor isn’t what you think

By: hoakley
8 November 2024 at 15:30

One of the most frequently quoted measurements in Activity Monitor is % CPU. When the fans blow, or a Mac becomes sluggish, it’s often the first figure to look at and wonder why that process is pulling over 100%. That’s the first thing you learn: here, per cent doesn’t mean out of 100, but the sum of the real percentage for each core. As this Mac has eight cores, that allows it up to 800%, until you realise that Intel processors can use Hyper-Threading to pretend that each is two cores, making a maximum of 1,600%, not bad for a single processor. Then you come completely unstuck with Apple silicon.

Apple defines % CPU as “the percentage of CPU capability that’s being used”, a phrase that doesn’t appear to have any definition either in Apple’s documentation or elsewhere. So like everyone else, you assume that 100% for a core represents its maximum processing capacity. Only you couldn’t be more incorrect: in truth, the number given as % CPU is nothing like that.

Tests

To examine this more clearly, I have an app that runs tight loops of assembly code to load CPU cores on Apple silicon chips and measure their performance. For this, I got it to run several threads, each consisting of 500 million loops of floating point calculations. By running those at different Quality of Service (QoS) settings, macOS will send them to different core types.

When I run two threads at low QoS so they’re run only on E cores of a MacBook Pro M3 Pro, each takes 10.2 seconds to complete. If I instead run eight threads at high QoS, the first six of those will be run on the P cores, but the remaining two spill over onto two of the E cores, where they complete in just 3.0 seconds each.

Yet, according to % CPU in Activity Monitor, the two threads on the E cores come to 200%, and the two from eight run in the second test also come to 200%. You can see this in the CPU History window.

m3activitymon

Here are the 6 E cores at the top, and 6 P cores below. ① marks the first test on two E cores, and shows the total of 200% spread across all six E cores over the 10.2 seconds it took to complete. ② marks the second test, with all 6 P cores and two E cores hitting 100% each, for a total of 800%, as shown in Activity Monitor’s window.

By now I’m sure you guessing how running the same code on the E cores could vary in speed by a factor of 3.4 (10.2 against 3.0 seconds) although they’re the same % CPU: that measurement doesn’t take into account core frequency.

Frequency

Unlike traditional Intel CPUs, CPU cores in Apple silicon chips can be run at a wide range of frequencies, as set by macOS. To make this frequency control a bit simpler, cores are grouped into clusters, that also share L2 cache, and within each cluster all cores run at the same frequency. The M3 Pro chip consists of two clusters, one of 6 E cores, the other of 6 P cores.

When macOS loads two of those E cores with low QoS threads, it sets them to run at a low frequency to make the most of their energy efficiency. When it loads two E cores with threads that were intended to be run on P cores, as they have a high QoS, it runs the E cluster up to high frequency, so they perform better.

Measuring frequency of CPU cores is straightforward using the powermetrics command tool. When those threads were being run on E cores at low QoS, frequency was held at their minimum of 744 MHz, but run at high QoS their frequency was set to maximum, at 2748 MHz, 3.7 times faster. If full load at maximum frequency is 100% of their capability, then at low QoS they were really running at 27%, not the 100% given by Activity Monitor, and that largely accounts for the difference in time that those threads took to complete their floating point calculations.

What is % CPU?

What Activity Monitor actually shows as % CPU or “percentage of CPU capability that’s being used” is what’s better known as active residency of each core, that’s the percentage of processor cycles that aren’t idle, but actively processing threads owned by a given process. But it doesn’t take into account the frequency or clock speed of the core at that time, nor the difference in core throughput between P and E cores.

The next time that you open Activity Monitor on an Apple silicon Mac, to look at % CPU figures, bear in mind that while those numbers aren’t completely meaningless, they don’t mean what you might think, and what’s shown as 100% could be anything between 27-100%.

❌
❌