Reading view

There are new articles available, click to refresh the page.

Tune for Performance: Activity Monitor’s CPU view

11 December 2024 at 15:30

Activity Monitor is one of the unsung heroes of macOS. If you understand how to interpret its rich streams of measurements, it’s thoroughly reliable and has few vices. When tuning for performance, it should be your starting point, a first resort before you dig deeper with more specialist tools such as Xcode’s Instruments or powermetrics.

How Activity Monitor works

Like Xcode’s Instruments and powermetrics, the CPU view samples CPU and GPU activity over brief periods of time, displays results for the last sampling period, and updates those every 1-5 seconds. You can change the sampling period used in the Update Frequency section of the View menu, and that should normally be set to Very Often, for a period of 1 second.

This was normally adequate for many purposes with older M-series chips, but thread mobility in M4 chips can be expected to move threads from core to core, and between whole clusters, at a frequency similar to available sampling periods, and that loses detail as to what’s going on in cores and clusters, and can sometimes give the false impression that a single thread is running simultaneously on multiple cores.

However, sampling frequency also determines how much % CPU is taken by Activity Monitor itself. While periods of 0.1 second and less are feasible with powermetrics, in Activity Monitor they would start to affect its results. If you need to see finer details, then you’ll need to use Xcode Instruments or powermetrics instead.

% CPU

Central to the CPU view, and its use in tuning for performance, is what Activity Monitor refers to as % CPU, which it defines as the “percentage of CPU capability that’s being used by processes”. As far as I can tell, this is essentially the same as active residency in powermetrics, and it’s vital to understand its strengths and shortcomings.

Take a CPU core that’s running at 1 GHz. Every second it ‘ticks’ forward one billion times. If an instruction were to take just one clock cycle, then it could execute a billion of those every second. In any given second, that core is likely to spend some time idle and not executing any instructions. If it were to execute half a billion instructions in any given second, and spend the other half of the time idle, then it has an idle residency of 50% and an active residency of 50%, and that would be represented by Activity Monitor as 50% CPU. So a CPU core that’s fully occupied executing instructions, and doesn’t idle at all, has an active residency of 100%.

To arrive at the % CPU figures shown in Activity Monitor, the active residency of all the CPU cores is added together. If your Mac has four P and four E cores and they’re all fully occupied with 100% active residency each, then the total % CPU shown will be 800%.

There are two situations where this can be misleading if you’re not careful. Intel CPUs feature Hyper-threading, where each physical core acquires a second, virtual core that can also run at another 100% active residency. In the CPU History window those are shown as even-numbered cores, and in % CPU they double the total percentage. So an 8-core Intel CPU then has a total of 16 cores, and can reach 1600% when running flat out with Hyper-threading.

coremanintel

cpuendstop

The other situation affects Apple silicon chips, as their CPU cores can be run at a wide range of different frequencies according to control by macOS. However, Activity Monitor makes no allowance for their frequency. When it shows a core or total % CPU, that could be running at a frequency as low as 600 MHz in the M1, or as high as 4,512 MHz in the M4, nine times as fast. Totalling these percentages also makes no allowance for the different processing capacity of P and E cores.

Thus an M4 chip’s CPU cores could show a total of 400% CPU when all four E cores are running at 1,020 MHz with 100% active residency, or when four of its P cores are running at 4,512 MHz with 100% active residency. Yet the P cores would have an effective throughput of as much as six times that of the E cores. Interpreting % CPU isn’t straightforward, as nowhere does Activity Monitor provide core frequency data.

tuneperf1

In this example from an M4 Pro, the left of each trace shows the final few seconds of four test threads running on the E cores, which took 99 seconds to complete at a frequency of around 1,020 MHz, then in the right exactly the same four test threads completed in 23 seconds on P cores running at nearer 4,000 MHz. Note how lightly loaded the P cores appear, although they’re executing the same code at almost four times the speed.

Threads and more

For most work, you should display all the relevant columns in the CPU view, including Threads and GPU.

tuneperf2

Threads are particularly important for processes to be able run on multiple cores simultaneously, as they’re fairly self-contained packages of executable code that macOS can allocate to a core to run. Processes that consist of just a single thread may get shuffled around between different cores, but can’t run on more than one of them at a time.

Another limitation of Activity Monitor is that it can’t tell you which threads are running on each core, or even which type of core they’re running on. When there are no other substantial threads active, you can usually guess which threads are running where by looking in the CPU History window, but when there are many active threads on both E and P cores, you can’t tell which process owns them.

You can increase your chances of being able to identify which cores are running which threads by minimising all other activity, quitting all other open apps, and waiting until routine background activities such as Spotlight have gone quiet. But that in itself can limit what you can assess using Activity Monitor. Once again, Xcode Instruments and powermetrics can make such distinctions, but can easily swamp you with details.

GPU and co-processors

Percentages given for the GPU appear to be based on a similar concept of active residency without making any allowance for frequency. As less is known about control of GPU cores, that too must be treated carefully.

Currently, Activity Monitor provides no information on the neural processing unit (ANE), or the matrix co-processor (AMX) believed to be present in Apple silicon chips, and that available using other tools is also severely limited. For example, powermetrics can provide power use for the ANE, but doesn’t even recognise the existence of the AMX.

Setting up

In almost all cases, you should set the CPU view to display All processes using the View menu. Occasionally, it’s useful to view processes hierarchically, another setting available there.

Extras

Activity Monitor provides some valuable extras in its toolbar. Select a process and the System diagnostics options tool (with an ellipsis … inside a circle) can sample that process every millisecond to provide a call graph and details of binary images. The same menu offers to run a spindump, for investigating processes that are struggling, and to run sysdiagnose and Spotlight diagnostic collections. The Inspect selected process tool ⓘ provides information on memory, open files and ports, and more.

Explainers

Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.

multiterms1

When an app is run, there’s a single runtime instance created from its single executable code. This is given its own virtual memory and access to system resources that it needs. This is a process, and listed as a separate entity in Activity Monitor.

Each process has a main thread, a single flow of code execution, and may create additional threads, perhaps to run in the background. Threads don’t get their own virtual memory, but share that allocated to the process. They do, though, have their own stack. On Apple silicon Macs they’re easy to tell apart as they can only run on a single core, although they may be moved between cores, sometimes rapidly. In macOS, threads are assigned a Quality of Service (QoS) that determines how and where macOS runs them. Normally the process’s main thread is assigned a high QoS, and background threads that it creates may be given the same or lower QoS, as determined by the developer.

Within each thread are individual tasks, each a quantity of work to be performed. These can be brief sections of code and are more interdependent than threads. They’re often divided into synchronous and asynchronous tasks, depending on whether they need to be run as part of a strict sequence. Because they’re all running within a common thread, they don’t have individual QoS although they can have priorities.

Which M4 chip and model?

The Eclectic Light Company

hoakley

7 November 2024 at 15:30

In the light of recent news, you might now be wondering whether you can afford to wait until next year in the hope that Apple then releases the M4 Mac of your dreams. To help guide you in your decision-making, this article explains what chip options are available in this month’s new M4 models, and how to choose between them.

CPU core types

Intel CPUs in modern Macs have several cores, all of them identical. Whether your Mac is running a background task like indexing for Spotlight, or running code for a time-critical user task, code is run across any of the available cores. In an Apple silicon chip like those in the M4 family, background tasks are normally constrained to efficiency (E) cores, leaving the performance (P) cores for your apps and other pressing user tasks. This brings significant energy economy for background tasks, and keeps your Mac more responsive to your demands.

Some tasks are normally constrained to run only on E cores. These include scheduled background tasks like Spotlight indexing, Time Machine backups, and some encoding of media. Game Mode is perhaps a more surprising E core user, as explained below.

Most user tasks are run preferentially on P cores, when they’re available. When there are more high-priority threads to be run than there are available P cores, then macOS will normally send them to be run on E cores instead. This also applies to threads running a Virtual Machine (VM) using lightweight virtualisation, whose threads will be preferentially scheduled on P cores when they’re available, even when code being run in the VM would normally be allocated to E cores.

macOS also controls the clock speed or frequency of cores. For background tasks running on E cores, their frequency is normally held relatively low, for best energy efficiency. When high-priority threads overspill onto E cores, they’re normally run at higher frequency, which is less energy-efficient but brings their performance closer to that of a P core. macOS goes to great lengths to schedule threads and control core frequencies to strike the best balance between energy efficiency and performance.

Unfortunately, it’s normally hard to see effects of frequency in apps like Activity Monitor. Its CPU % figures only show the percentage of cycles that are used for processing, and make no allowance for core frequency. It will therefore show a background thread running at low frequency but 100%, the same as a thread overspilt from P cores running at the maximum frequency of that E core. So when you see Spotlight indexing apparently taking 200% of CPU % on your Mac’s E cores, that might only be a small fraction of their maximum capacity if they were running at maximum frequency.

There are no differences between chips in the M4 family when it comes to each type of CPU core: each P core in a Base variant is the same as each in an M4 Pro or Max, with the same maximum frequency, and the same applies to E cores. macOS also allocates threads to different types of core using the same rules, and their frequencies are controlled the same as well. What differs between them is the number of each type of core, ranging from 4 P and 4 E in the 8-core variant of the Base M4, up to 12 P and 4 E in the 16-core variant of the M4 Max. Thus, their single-core benchmark results should be almost identical, although their multi-core results should vary according to the number of cores.

Game Mode

This mode is an exception to normal CPU and GPU core use, as it:

gives preferential access to the E cores,
gives highest priority access to the GPU,
uses low-latency Bluetooth modes for input controllers and audio output.

However, my previous testing didn’t demonstrate that apps running in Game Mode were given exclusive access to E cores. But for gamers, it now appears that the more E cores, the better.

GPU cores

These are also used for tasks other than graphics, such as some of the more demanding calculations required for Machine Learning and AI. However, experience so far with Writing Tools in Sequoia 15.1 is that macOS currently offloads their heavy lifting to be run off-device in one of Apple’s dedicated servers. Although having plenty of GPU cores might well be valuable for non-graphics purposes in the future, for now there seems little advantage for many.

Thunderbolt 5

M4 Pro and Max, but not Base variants, come equipped with Thunderbolt ports that not only support Thunderbolt 3 and 4, but 5, as well as USB4. Thunderbolt 5 should effectively double the speed of connected TB5 SSDs, but to see that benefit, you’ll need to buy a TB5 SSD. Not only are they more expensive than TB3/4 models, but at present I know of only one range that’s due to ship this year. There will also be other peripherals with TB5 support, including at least one dock and one hub, although neither is available yet. The only TB5 accessories that are already available are cables, and even they are expensive.

TB5 also brings increased video bandwidth and support for DisplayPort 2.1, although even the M4 Max can’t make full use of that. If you’re looking to drive a combination of high-res displays, consult Apple’s Tech Specs carefully, as they’re complicated.

Although TB5 will become increasingly important over the next few years, TB3/4 and USB4 are far from dead yet and are supported by all M4 models.

Which M4 chip?

The table below summarises key figures for each of the variants in the M4 family that have now been released. It’s likely that next year Apple will release an Ultra, consisting of two M4 Max chips joined in tandem, in case you feel the burning desire for 24 P and 8 E cores.

m4configs2

Models available next week featuring each M4 chip are shown with green rectangles at the right.

There are two variants of the Base M4, one with 4P + 4E and 8 GPU cores, the same as Base variants in M1 to M3 families. There’s also the more capable variant, for the first time with 4P + 6E, which promises to be a better all-rounder, and when in Game Mode. It also has an extra couple of GPU cores.

The M4 Pro also comes in two variants, this time differing in the number of P cores, 8 or 10, and GPU cores, 16 or 20. Those overlap with the M4 Max, with 10 or 12 P cores and 32 or 40 GPU cores. Thus the gap between M4 Pro and Max isn’t as great as in the M3, with the GPUs in the M4 Max being aimed more at those working with high-res video, for instance. For more general use, there’s little difference between the 14-core Pro and Max.

Memory and storage

Chips in the M4 family also determine the maximum memory and internal SSD capacity. Apple has at last eliminated base models with only 8 GB of memory, and all now start with at least 16 GB. Base M4 chips are limited to a maximum of 32 GB, while the M4 Pro can go up to 64 GB, and the 16-core Max up to 128 GB, although in its 14-core variant, the Max is only available with 36 GB (I’m very grateful to Thomas for pointing this out below).

Unfortunately, Apple hasn’t increased the minimum size of internal SSD, which remains at 256 GB for some base models. Smaller SSDs may be cheaper, but they are also likely to have shorter lives, as under heavy use their small number of blocks will be erased for reuse more frequently. That may shorten their life expectancy to much less than the normal period of up to 10 years, as was seen in some of the first M1 models. This is more likely to occur when swap space is regularly used for virtual memory. I for one would have preferred 512 GB as a starting point.

While Base M4 chips come with SSDs up to 2 TB in size, both Pro and Max can be supplied with internal SSDs of up to 8 TB.

I hope this proves useful in guiding your decision.

CPU 和 GPU - 异构计算的演进与发展

面向信仰编程

9 April 2021 at 08:00