Reading view

There are new articles available, click to refresh the page.

Why % CPU in Activity Monitor isn’t what you think

8 November 2024 at 15:30

One of the most frequently quoted measurements in Activity Monitor is % CPU. When the fans blow, or a Mac becomes sluggish, it’s often the first figure to look at and wonder why that process is pulling over 100%. That’s the first thing you learn: here, per cent doesn’t mean out of 100, but the sum of the real percentage for each core. As this Mac has eight cores, that allows it up to 800%, until you realise that Intel processors can use Hyper-Threading to pretend that each is two cores, making a maximum of 1,600%, not bad for a single processor. Then you come completely unstuck with Apple silicon.

Apple defines % CPU as “the percentage of CPU capability that’s being used”, a phrase that doesn’t appear to have any definition either in Apple’s documentation or elsewhere. So like everyone else, you assume that 100% for a core represents its maximum processing capacity. Only you couldn’t be more incorrect: in truth, the number given as % CPU is nothing like that.

Tests

To examine this more clearly, I have an app that runs tight loops of assembly code to load CPU cores on Apple silicon chips and measure their performance. For this, I got it to run several threads, each consisting of 500 million loops of floating point calculations. By running those at different Quality of Service (QoS) settings, macOS will send them to different core types.

When I run two threads at low QoS so they’re run only on E cores of a MacBook Pro M3 Pro, each takes 10.2 seconds to complete. If I instead run eight threads at high QoS, the first six of those will be run on the P cores, but the remaining two spill over onto two of the E cores, where they complete in just 3.0 seconds each.

Yet, according to % CPU in Activity Monitor, the two threads on the E cores come to 200%, and the two from eight run in the second test also come to 200%. You can see this in the CPU History window.

m3activitymon

Here are the 6 E cores at the top, and 6 P cores below. ① marks the first test on two E cores, and shows the total of 200% spread across all six E cores over the 10.2 seconds it took to complete. ② marks the second test, with all 6 P cores and two E cores hitting 100% each, for a total of 800%, as shown in Activity Monitor’s window.

By now I’m sure you guessing how running the same code on the E cores could vary in speed by a factor of 3.4 (10.2 against 3.0 seconds) although they’re the same % CPU: that measurement doesn’t take into account core frequency.

Frequency

Unlike traditional Intel CPUs, CPU cores in Apple silicon chips can be run at a wide range of frequencies, as set by macOS. To make this frequency control a bit simpler, cores are grouped into clusters, that also share L2 cache, and within each cluster all cores run at the same frequency. The M3 Pro chip consists of two clusters, one of 6 E cores, the other of 6 P cores.

When macOS loads two of those E cores with low QoS threads, it sets them to run at a low frequency to make the most of their energy efficiency. When it loads two E cores with threads that were intended to be run on P cores, as they have a high QoS, it runs the E cluster up to high frequency, so they perform better.

Measuring frequency of CPU cores is straightforward using the powermetrics command tool. When those threads were being run on E cores at low QoS, frequency was held at their minimum of 744 MHz, but run at high QoS their frequency was set to maximum, at 2748 MHz, 3.7 times faster. If full load at maximum frequency is 100% of their capability, then at low QoS they were really running at 27%, not the 100% given by Activity Monitor, and that largely accounts for the difference in time that those threads took to complete their floating point calculations.

What is % CPU?

What Activity Monitor actually shows as % CPU or “percentage of CPU capability that’s being used” is what’s better known as active residency of each core, that’s the percentage of processor cycles that aren’t idle, but actively processing threads owned by a given process. But it doesn’t take into account the frequency or clock speed of the core at that time, nor the difference in core throughput between P and E cores.

The next time that you open Activity Monitor on an Apple silicon Mac, to look at % CPU figures, bear in mind that while those numbers aren’t completely meaningless, they don’t mean what you might think, and what’s shown as 100% could be anything between 27-100%.

Last Week on My Mac: M4 incoming

The Eclectic Light Company

hoakley

3 November 2024 at 16:00

Almost exactly a year after it released its first Macs featuring chips in the M3 family, Apple has replaced those with the first M4 models. Benchmarkers and core-counters are now busy trying to understand how these will change our Macs over the coming year or so. Before I reveal which model I have ordered, I’ll try to explain how these change the Mac landscape, concentrating primarily on CPU performance.

CPU cores

CPUs in the first two families, M1 and M2, came in two main designs, a Base variant with 4 Performance and 4 Efficiency cores, and a Pro/Max with 8 P and 2 or 4 E cores, that was doubled-up to make the Ultra something of a beast with its 16 P and 4 or 8 E cores. Last year Apple introduced three designs: the M3 Base has the same 4 P and 4 E CPU core configuration as in the M1 and M2 before it, but its Pro and Max variants are more distinct, with 6 P and 6 E in the Pro, and 10-12 P and 4 E cores in the Max. The M4 family changes this again, improving the Base and bringing the Pro and Max variants closer again.

As these are complicated by sub-variants and binned versions, I have brought the details together in a table.

mcorestable2024

I have set the core frequencies of the M4 in italics, as I have yet to confirm them, and there’s some confusion whether the maximum frequency of the P core is 4.3 or 4.4 GHz.

Each family of CPU cores has successively improved in-core performance, but the greatest changes are the result of increasing maximum core frequencies and core numbers. One crude but practical way to compare them is to total the maximum core frequencies in GHz for all the cores. Strictly speaking, this should take into account differences in processing units between P and E cores, but that also appears to have changed with each family, and is hard to compare. In the table, columns giving Σfn are therefore simply calculated as
(max P core frequency x P core count) + (max E core frequency x E core count)

Plotting those sum core frequencies by variant for each of the four families provides some interesting insights.

mcoresbars2024

Here, each bar represents the sum core frequency of each full-spec variant. Those are grouped by the variant type (Base, Pro, Max, Ultra), and within those in family order (M1 purple, M2 pale blue, M3 dark blue, M4 red). Many trends are obvious, from the relatively low performance expected of the M1 family, except the Ultra, and the changes between families, for example the marked differences in the M4 Pro, and the M3 Max, against their immediate predecessors.

Sum core frequencies fall into three classes: 20-30, 35-45, and greater than 55 GHz. Three of the four chips in the M1 family are in the lowest of those, with only the M1 Ultra reaching the highest. The M4 is the first Base variant to reach the middle class, thanks in part to its additional two E cores. Two of the M4 variants (Pro and Max) have already reached the highest class, and any M4 Ultra would reach far above the top of the chart at 128 GHz.

Real-world performance will inevitably differ, and vary according to benchmark and app used for comparison. Although single-core performance has improved steadily, apps that only run in a single thread and can’t take advantage of multiple cores are likely to show little if any difference between variants in each family.

Game Mode is also of interest for those considering the two versions of the M4 Base, with 4 or 6 E cores. This is because that mode dedicates the E cores, together with the GPU, to the game being played. It’s likely that games that are more CPU-bound will perform significantly better on the six E cores of the 10-Core version of the iMac, which also comes with a 10-core GPU and four Thunderbolt 4 ports.

Memory and GPU

Memory bandwidth is also important, although for most apps we should assume that Apple’s engineers match that with likely demand from CPU, GPU, neural engine, and other parts of the chip. There will always be some threads that are more memory-bound, whose performance will be more dependant on memory bandwidth than CPU or GPU cores.

Although Apple claims successive improvements in GPU performance, the range in GPU cores has started at 8 and attained 32-40 in Max chips. Where the Max variants come into their own is support for multiple high-res displays, and challenging video editing and processing.

Thunderbolt and USB 3

The other big difference in these Macs is support for the new Thunderbolt 5 standard, available only in models with M4 Pro or M4 Max chips; Base variants still only support Thunderbolt 4. Although there are currently almost no Thunderbolt 5 peripherals available apart from an abundant supply of expensive cables, by the end of this year there should be at least one range of SSDs and one dock shipping.

As ever with claimed Thunderbolt performance, figures given don’t tell the whole story. Although both TB4 and USB4 claim ‘up to’ 40 Gb/s transfer rates, in practice external SSD performance is significantly different, with Thunderbolt topping out at about 3 GB/s and USB4 reaching up to 3.4 GB/s. In practice, TB5 won’t deliver the whole of its claimed maximum of 120 Gb/s to a single storage device, and current reports are that will only achieve disk transfers at 6 GB/s, or twice TB4. However, in use that’s close to the expected performance of internal SSDs in Apple silicon Macs, and should make booting from a TB5 external SSD almost indistinguishable in terms of speed.

As far as external ports go, this widens the gap between the M4 Pro Mac mini’s three TB5 ports, which should now deliver 3.4 GB/s over USB4 or 6 GB/s over TB5, and its two USB-C ports that are still restricted to USB 3.2 Gen 2 at 10 Gb/s, equating to 1 GB/s, the same as in M1 models from four years ago.

My choice

With a couple of T2 Macs and a MacBook Pro M3 Pro, I’ve been looking to replace my original Mac Studio M1 Max. As it looks likely that an M4 version of the Studio won’t be announced until well into next year, I’m taking the opportunity to shrink its already modest size to that of a new Mac mini. What better choice than an M4 Pro with 10 P and 4 E cores and a 20-core GPU, and the optional 10 Gb Ethernet? I seldom use the fourth Thunderbolt port on the Studio, and have already ordered a Kensington dock to deliver three TB5 ports from one on the Mac, and I’m sure it will drive my Studio Display every bit as well as the Studio has done.

If you have also been tempted by one of the new Mac minis, I was astonished to discover that three-year AppleCare+ for it costs less than £100, that’s two-thirds of the price that I pay each year for AppleCare+ on my MacBook Pro.

I look forward to diving deep into both my new Mac and Thunderbolt 5 in the coming weeks.

What performance should you get from different types of storage?

The Eclectic Light Company

hoakley

10 September 2024 at 14:30

External storage is invariably sold with ‘up-to’ performance figures. In practice, you’ll seldom realise anything like some write or read speeds claimed. And when it comes to prolonged tasks like that first full Time Machine backup, no matter how fast you thought that drive would be, it always takes longer than expected.

Over the last few years I have tested and reviewed many examples of different types of external storage, from basic USB 3 hard drives, to the latest USB4 SSD enclosures, and NAS packed with fast SSDs. This article draws on all those test results to give you a better idea of what to expect when they’re being used with your Mac.

Results quoted here are typical for those tests performed mostly using a Mac Studio M1 Max, but unless otherwise indicated should be similar for recent Intel models. They’re summarised in this table.

storage1

Write speeds are given for:

the single 50 MB write test performed by Time Machine before each backup;
500 multiple concurrent writes of 4 KB each, performed in those same Time Machine tests;
calculated net write speed over a first full backup to APFS of at least 400 GB;
general write speed measurement using my app Stibium, which gives broadly similar results to other leading benchmarking apps.

General read speeds are also obtained using Stibium, and similar to other apps. All speeds are given as MB/s for consistency.

Before looking at individual types of storage, one obvious and important result is the effect of throttling by macOS on Time Machine backup performance. Considering Time Machine’s own tests, writing a single 50 MB file is performed consistently at around 200-225 MB/s to local storage of whatever type, and multiple concurrent writes of 4 KB files reach around 20-23 MB/s regardless of local storage type. Those hold good even when you back up to a fast Thunderbolt 3 SSD, and backing up to a NAS is little quicker unless it’s over 2.5GbE to an NVMe SSD. Local transfer speeds only differ more substantially in general tests, when they aren’t throttled as they are in Time Machine.

Hard disks

When writing to or reading from a local hard disk, performance varies substantially according to which sectors on the hard disk are being accessed. This is a well-known phenomenon, and the result of geometry, as sectors are faster at the periphery of the disk’s platter, and slower in the inner part. Ranges given here take that into account: the lower figure is for inner sectors, and the higher for outer ones. Some users compensate for this effect, and only ever use the outer half of a disk’s sectors to obtain better performance, but that reduces their available capacity, and effectively doubles their cost per TB.

SSDs

SATA SSDs may be cheapest, but they’re also slowest, and with Macs they generally don’t enjoy Trim or SMART health indicator support. Of the two, Trim support is usually the more important, as without that, they can accumulate blocks waiting to be erased and returned for further use, and as a result their write (but not read) speed can fall as low as 100 MB/s. Unless used for largely static storage, this is a significant risk.

NVMe SSDs deliver twice the performance of SATA models, and generally enjoy Trim but not SMART indicator support. This makes them far better suited to general use, as their write speeds should be sustained from new throughout their working life.

USB 3.2 Gen 2, Thunderbolt 3, USB4

Translating commonly quoted transfer speeds for these three protocols into real-world speeds turns out to be complex. In practice, these are what you can expect to see:

USB 3.2 Gen 2 at 10 Gb/s is slightly less than 1 GB/s
Thunderbolt 3 at 32 Gb/s is up to 3 GB/s
USB4 at 40 Gb/s is up to 3.4 GB/s.

All recent models of Mac, both Intel and Apple silicon, should realise full performance over USB 3.2 Gen 2 and Thunderbolt 3, but support for USB4 is limited to Apple silicon. Unless a drive or enclosure specifically includes Thunderbolt 3 as a fallback, when connected to an Intel Mac, you should expect it to fall back to USB 3.2 Gen 2 at just under 1 GB/s, less than a third of the speed of USB4.

NAS

Although I haven’t made any systematic comparison between AFP and SMB network protocols, I can see no consistent difference in their performance, when used with the latest versions of macOS and NAS software. The latter, though, can be critical: older versions of NAS software can perform poorly when used over SMB with recent macOS. Keeping your NAS software up to date is important.

Throttling of Time Machine backup writing isn’t supposed to occur when backing up over a network, and there is some evidence here to support that, with significantly better results for 50 MB test files. However, those are only apparent when using NVMe SSDs in the NAS, with a wired Ethernet 2.5GbE connection to provide sufficient bandwidth.

Check TM performance

Provided that your Mac is running a recent version of macOS and backing up to APFS, it’s simple to read the two write performance tests that occur at the start of each Time Machine backup using my free T2M2. Alternatively, you can also read them using the Time Machine custom log extract in Mints. In T2M2 they should look something like:
Destination IO performance measured: Wrote 1 50 MB file at 238.02 MB/s to "/Volumes/ThunderBay2" in 0.210 seconds Concurrently wrote 500 4 KB files at 35.58 MB/s to "/Volumes/ThunderBay2" in 0.058 seconds

Check general performance

Although there are other apps that will do this, I developed Stibium for this purpose. Follow the ‘gold standard’ procedure detailed in its Help Reference to obtain the most accurate and reproducible results. Stibium can test any storage you can access in the Finder, including all local devices and networked systems such as NAS.