M4 Pro full on: when CPU and GPU draw over 50 W, and how Low Power mode changes that

By: hoakley
22 January 2025 at 15:30

Most testing and benchmarks avoid putting heavy loads on the CPU and GPU at the same time, and so never run an Apple silicon chip ‘full on’. This article explores what happens in the CPU and GPU of an M4 Pro when they’re drawing a total of over 50 W, and how that changes in Low Power mode. It concludes my investigations of power modes, for the time being.

Methods

Three test runs were performed on a Mac mini M4 Pro with 10 P and 4 E cores, and a 20-core GPU. In each run, Blender Benchmarks were run using Metal and, shortly after the start of the first of those (monster), 3 billion tight loops of NEON code were run on CPU cores at maximum Quality of Service in 10 threads. From previous separate runs, the monster test runs the GPU at its maximum frequency of 1,578 MHz and 100% active residency, to use about 20 W, and that NEON code runs all 10 P cores at a high frequency of about 3,852 MHz and 100% active residency, to use about 32 W. This combined testing was performed in each of the three power modes: Low Power, Automatic, and High Power.

In addition to recording test performance, powermetrics was run during the start of each NEON test at its shortest sampling period, with both cpu_power and gpu_power samplers active.

Performance

There was no difference in performance between High Power and Automatic settings, which completed both tasks with the same performance as when they were run separately:

  • NEON time separate 2.12 s, together High Power 2.12 s, Auto 2.12 s
  • monster performance separate 1215-1220, together High Power 1221, Auto 1220.

As expected, Low Power performance was greatly reduced. NEON time was 4.33 s (49% performance), even slower than running alone at Low Power (2.87 s), and monster performance 795, slightly lower than running alone at Low Power (837).
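Those percentages can be checked with simple arithmetic. The following is a minimal sketch in Python using only the figures quoted above; the function names are illustrative, and the choice of 1220 as the monster baseline is an assumption taken from the Automatic result.

```python
def relative_performance_from_times(baseline_s: float, measured_s: float) -> float:
    """Relative performance for a timed test, where a shorter time is better."""
    return baseline_s / measured_s


def relative_performance_from_scores(baseline: float, measured: float) -> float:
    """Relative performance for a scored test, where a higher score is better."""
    return measured / baseline


# NEON test: 2.12 s at High Power, 4.33 s combined and 2.87 s alone at Low Power.
neon_combined = relative_performance_from_times(2.12, 4.33)
neon_alone = relative_performance_from_times(2.12, 2.87)

# monster test: baseline of 1220 (assumed), 795 combined and 837 alone at Low Power.
monster_combined = relative_performance_from_scores(1220, 795)
monster_alone = relative_performance_from_scores(1220, 837)

for label, value in [
    ("NEON, combined", neon_combined),
    ("NEON, alone", neon_alone),
    ("monster, combined", monster_combined),
    ("monster, alone", monster_alone),
]:
    print(f"{label}: {value * 100:.0f}% of High Power performance")
```

Timed and scored tests need opposite ratios, which is why the sketch keeps them as two separate functions.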

High Power mode

This first graph shows CPU core cluster frequencies and active residencies for a period of 0.3 seconds when the monster test was already running, and the NEON test was started.

At time 0, the P0 cluster (black) was shut down, and the P1 cluster (red) was running at about 3,900 MHz, with one core at 100% active residency and a second at about 60%. As the ten test threads were loaded onto the two clusters, cluster frequencies were quickly brought to 3,852 MHz, by reducing that of the P1 cluster and rapidly increasing that of the P0 cluster.

By 0.1 seconds, both clusters were at full active residency and running at 3,852 MHz, where they remained until the NEON test threads completed.

Power used by the CPU followed the same pattern, rising rapidly from about 6,000 mW to about 32,000 mW at 0.1 seconds. GPU power varied between 8,600-23,000 mW, resulting in a peak total power of slightly less than 52,000 mW, and a dip to 40,600 mW. Typical sustained power with both CPU and GPU tests running was 50-52 W.

Low Power mode

These results are more complicated, and involve significant use of the E cluster.

This graph shows active residency alone, and this time includes the E cluster, shown in blue, and the GPU, in purple. NEON test threads were initially loaded into the two P clusters, filling them at 0.13 seconds. After that, threads were moved from some of those P cores to run on E cores instead, leaving just two test threads running on each of the P clusters by 0.26 seconds. Over much of that time the GPU had full active residency, but as that fell threads were moved from E cores back to P cores. By the end of this period of 0.5 seconds, 4 of 5 cores in each of the two P clusters were at 100%, and the GPU was also at 100% active residency.

This bar chart shows changing cluster total active residency for the E (red) and two P (blues) clusters by sample. With 10 test threads and significant overhead, the total should have reached at least 1,000%, which was only achieved in sample 4, and from sample 13 onwards.

Those active residencies are shown in the lower section of this graph (with open circles), together with cluster frequencies (filled circles) above them. As the P clusters were being loaded with test threads, both P clusters (black) were brought to a frequency of only 1,800 MHz, compared with 3,852 MHz in the High Power test. The E cluster (blue) was run throughout at its maximum frequency of 2,592 MHz, except for one sample period. GPU frequency (purple) remained below 1,000 MHz throughout, compared with a steady maximum of 1,578 MHz when at High Power.

Power changed throughout this initial period running the NEON test. Initially, CPU power (red) rose to a peak of 6,660 mW, then fell slowly to 3,500 mW before rising again to about 6,000 mW. GPU power rose to peak at just over 7,000 mW, but at one stage fell to only 26 mW. Total power used by the CPU and GPU ranged between 11-13.2 W, apart from a short period when it fell below 5 W. Those are all far lower than the steadier power use in High Power mode.

How macOS limits power

Running these tests in Low Power mode elicited some of the most sophisticated controls I have seen in Apple silicon chips. Compared to being run unfettered in Automatic or High Power mode, macOS used a combination of strategies to keep CPU and GPU total power use below 13.5 W:

  • P core frequencies were limited to 1,800 MHz, instead of 3,852 MHz.
  • High QoS threads that would normally have been run on P cores were transferred to E cores, which were then run at their maximum frequency of 2,592 MHz.
  • Threads continued to be transferred between E and P cores to balance performance against power use.
  • GPU frequency was limited to below 1,000 MHz.
  • Despite reducing power use to a total of 25% of High Power mode, effects on performance were far less, attaining about 50% of that at High Power mode.
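The relationship between power and performance in that last point can be sketched as a rough calculation. This assumes the upper ends of the total power ranges reported above (52 W and 13.2 W) and uses the NEON times alone as the measure of performance, so it's an illustration rather than a measurement; the variable names are my own.

```python
# Total CPU + GPU power, taking the upper end of each reported range (assumption).
high_power_w = 52.0
low_power_w = 13.2

# NEON test times when run together with the monster test.
neon_high_s = 2.12
neon_low_s = 4.33

power_ratio = low_power_w / high_power_w       # fraction of High Power's power
performance_ratio = neon_high_s / neon_low_s   # fraction of High Power's speed
perf_per_watt_gain = performance_ratio / power_ratio

print(f"Power: {power_ratio * 100:.0f}% of High Power")
print(f"NEON performance: {performance_ratio * 100:.0f}% of High Power")
print(f"Performance per watt: {perf_per_watt_gain:.1f} times that at High Power")
```

On those assumptions, Low Power mode delivers roughly twice the NEON performance per watt of High Power mode.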

References

How Low Power mode controls CPU cores
Power Modes and Apple silicon CPUs
Last Week on My Mac: Power throttle
Inside M4 chips: CPU power, energy and mystery
Inside M4 chips: Matrix processing and Power Modes
Power Modes and Apple Silicon GPUs
Evaluating M3 Pro CPU cores: 1 General performance

Explainer

Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.
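As a minimal sketch of how active residency relates to time, and how per-core values add up to the cluster totals used in the bar chart above, consider the following Python. The sample interval and per-core times are hypothetical, chosen only to illustrate the arithmetic.

```python
def active_residency(active_ms: float, interval_ms: float) -> float:
    """Percentage of a sample interval that a core spent processing instructions."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * active_ms / interval_ms


def cluster_total_active_residency(per_core_active_ms: list[float],
                                   interval_ms: float) -> float:
    """Sum of active residencies across the cores in one cluster."""
    return sum(active_residency(a, interval_ms) for a in per_core_active_ms)


# Hypothetical 100 ms sample for a 5-core P cluster:
# four cores busy throughout, one busy for 60 ms.
sample_interval_ms = 100.0
p_cluster_active_ms = [100.0, 100.0, 100.0, 100.0, 60.0]

total = cluster_total_active_residency(p_cluster_active_ms, sample_interval_ms)
print(f"Cluster total active residency: {total:.0f}%")
```

Frequency doesn't appear anywhere in the calculation, which is why ten fully occupied cores total 1,000% whether they run at 1,800 MHz or 3,852 MHz.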

What are CPU core frequencies in Apple silicon Macs?

By: hoakley
20 January 2025 at 15:30

One of the features of CPU cores in Apple silicon Macs is that they aren’t run at a single standard frequency or clock speed, but at frequencies that macOS varies to suit conditions. Moreover, those frequencies not only differ between generations, so aren’t the same in M2 chips as in the M1, but they also differ between variants within the same family. This article gives frequencies for each of the chips released to date, and considers how and why they differ. This has only been made possible by the many readers who generously gave their time to provide me with this information: thank you all.

The most reliable method of discovering which frequencies are available is using the command-line tool powermetrics. This lists frequencies for P and E cores, and this article assumes that those it gives are correct. Although it’s most likely that these frequencies aren’t baked into silicon, so could be changed, I’ve seen no evidence to suggest that Apple has done that in any release Mac.

Frequencies

If powermetrics is to be believed, then the maximum frequencies of each of the CPU cores used in each generation differ from some of those you’ll see quoted elsewhere. Correct values should be:

  • M1 E 2064 MHz or 2.1 GHz; P 3228 MHz or 3.2 GHz;
  • M2 E 2424 MHz or 2.4 GHz; P 3696 MHz or 3.7 GHz;
  • M3 E 2748 MHz or 2.7 GHz; P 4056 MHz or 4.1 GHz;
  • M4 E 2892 MHz or 2.9 GHz; P 4512 MHz or 4.5 GHz.
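The GHz figures in that list are the MHz values rounded to one decimal place. A short Python sketch, with the dictionary layout being my own choice:

```python
# Maximum frequencies in MHz, as listed above.
MAX_FREQ_MHZ = {
    "M1": {"E": 2064, "P": 3228},
    "M2": {"E": 2424, "P": 3696},
    "M3": {"E": 2748, "P": 4056},
    "M4": {"E": 2892, "P": 4512},
}


def mhz_to_ghz(mhz: int) -> float:
    """Convert MHz to GHz, rounded to one decimal place."""
    return round(mhz / 1000, 1)


for family, cores in MAX_FREQ_MHZ.items():
    summary = ", ".join(
        f"{core} {mhz} MHz or {mhz_to_ghz(mhz)} GHz" for core, mhz in cores.items()
    )
    print(f"{family}: {summary}")
```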

However, not all variants within a family can use those maximum frequencies. The full table of frequencies reported by powermetrics is:

This is available for download as a Numbers spreadsheet and in CSV format here: mxfreqs

Why those frequencies?

Depending on workload, thread Quality of Service, power mode, and thermal status, macOS sets the frequency for each cluster of CPU cores. Those used range between the minimum or idle, and the maximum, usually given as the core’s ‘clock speed’ and an indication of its maximum potential performance. In between those are as many as 17 intermediate frequencies giving cores great flexibility in performance, power and energy use. Core design and development uses sophisticated models to select idle and maximum frequencies, and undoubtedly to determine those in between.

Looking at the table, it would be easy to assume those numbers are chosen arbitrarily, but when expressed appropriately I think you can see there’s more to them. To look at frequency steps and the frequencies chosen for them, let me explain how I have converted raw frequencies to make them comparable.

First, I work out the steps as evenly spaced points along a line from 0.0, representing idle, to 1.0, representing the core’s maximum frequency. For each of those evenly spaced steps, I calculate a normalised frequency, as
(Fstep − Fidle)/(Fmax − Fidle)
where Fidle is the idle (lowest) frequency value, Fmax is the highest, and Fstep is the actual frequency set for that step.

For example, say a core has an idle frequency of 500 MHz, a maximum of 1,500 MHz, and only one step between those. Its steps will be 0.0, 0.5 and 1.0, and if the relationship is linear, then the frequency set by that intermediate step will be 1,000 MHz. If it’s greater than that, the relationship will be non-linear, tending to higher frequency for that step.
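That conversion can be sketched in a few lines of Python. This assumes the normalised frequency is measured upwards from idle, as (Fstep − Fidle)/(Fmax − Fidle), so that idle maps to 0.0 and the maximum to 1.0; the function name and the second set of frequencies are illustrative only.

```python
def normalise_frequencies(freqs_mhz: list[float]) -> list[tuple[float, float]]:
    """Return (step position, normalised frequency) pairs for an ascending
    list of frequencies running from idle to maximum."""
    if len(freqs_mhz) < 2:
        raise ValueError("need at least an idle and a maximum frequency")
    f_idle, f_max = freqs_mhz[0], freqs_mhz[-1]
    if f_max <= f_idle:
        raise ValueError("frequencies must run from idle up to maximum")
    last = len(freqs_mhz) - 1
    return [
        (i / last, (f - f_idle) / (f_max - f_idle))
        for i, f in enumerate(freqs_mhz)
    ]


# The worked example above: idle 500 MHz, one intermediate step, maximum 1,500 MHz.
linear = normalise_frequencies([500, 1000, 1500])

# A hypothetical core whose intermediate step is set higher than linear.
raised = normalise_frequencies([500, 1200, 1500])

for step, norm in raised:
    print(f"step {step:.2f} -> normalised frequency {norm:.2f}")
```

Where the normalised frequency exceeds the step position, as in the middle step of the second example, the point lies above the straight line.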

I’ll start with the E cores, as they’re simplest and have fewer steps.

E cores

For the M1, Apple didn’t try any tricks with the frequency of its E cores. There are just three intermediate steps, evenly spaced at 0.25, 0.5 and 0.75, and that’s the same with all E cores regardless of variant, from the base up to the Ultra.

With the M2, shown here in red, Apple added an extra step, and in the base M2 there’s also a lower idle frequency, not shown here. What is obvious is that those intermediate frequencies have been increased relative to those of the M1, turning the straight line into a curve.

The M3, shown here in blue, and M4, in purple, deviate even further from the line of the M1, with more steps and relatively higher frequencies.

This shows progress from the M1 in black to the M4 in purple, whose frequencies follow the polynomial shown.

Across the families, intermediate frequencies are most apparent in the E cores, where background threads are run at lower frequencies, and high-QoS threads that should have been run on P cores are run at higher frequencies. In M1 Pro and Max variants, with their two-core E clusters, macOS increases the E cluster frequency when they are running two threads to improve performance and compensate for their small cluster size.

P cores

With P core frequencies, the initial design for the M1 is different. The majority of the frequency steps follow a straight line still, but with a steeper gradient (1.23 against 1.00). Then in the upper quarter of the frequency range, above the step at 0.71, that line eases off to the maximum. This gives finer control of frequency over higher frequencies, and those higher frequencies are also reduced slightly in the base M1 compared to those here from the Pro, Max and Ultra.

In the M2 family, Apple divided frequencies into two: base and Pro variants have two fewer steps, with the base having a lower idle frequency. Shown here in red are those for the M2 Max, which are faired into a polynomial curve. That increases frequencies lower down, reduces them slightly at the upper end, then has a significantly higher maximum frequency.

Apple continued to tweak the P curves in the M3 (blue) and M4 (purple), with increasing numbers of steps but the same finer control at the upper end.

Here’s the comparison between M1 Max and M4 Max, with the same underlying ideas, but substantial differences. In the M4, each of the three variants released so far is different. The base M4 has a lower idle and maximum, the M4 Pro has a higher idle and maximum but one fewer step between them, and the M4 Max adds another step to the Pro’s series.

Significance

Apple’s engineers have clearly put considerable effort into picking optimised frequencies for each of the families and variants within them. If you still think that this is all fine detail and only the maximum frequencies count, then you might like to ponder why so much care has gone into selecting those intermediate frequencies, and how they’ve changed since the M1. Both P and E cores spend a lot of their time running at these carefully chosen frequencies.

Apple silicon CPU cores of the same type aren’t the same after all

By: hoakley
17 January 2025 at 15:30

Since Apple released the first M1 Macs over four years ago, I’ve been guilty of making the assumption that P and E cores used in the variants (base, Pro, Max and Ultra) in each family are identical. Thanks to the persistence of Thomas, I have learned the error of my ways and can now tell you that, while their hardware might be the same, there’s at least one significant difference between some of them: their operating frequencies, or clock speeds.

This all came to light when I claimed that the E cores in M4 family chips have a maximum frequency of 2,592 MHz, and Thomas tried to correct me by informing me that his have a maximum of 2,892 MHz, a substantial 300 MHz greater. His are in a base M4, mine in an M4 Pro, which seems even stranger, as the trend is for faster CPUs to run at higher frequencies, and that’s true when you compare their P cores: his can only rise to 4,462 MHz, while mine are slightly faster at 4,512 MHz.

The lesson is learned: E and P cores within the same family can have different operating frequencies. Going back to look at my records of the M1 family, I then realised that, while their E cores have identical frequencies, the P core maximum in a base variant is 3,204 MHz, while Pro and Max variants can run up to 3,228 MHz. Although that difference of only 24 MHz is far less, it can’t have occurred by accident.

The purpose of this article is to show the core frequencies that I have already measured [, and ask for your help in filling in the blanks in this table – no longer required, thank you]

Frequency table

mcorefreqs1

The only variant I was missing from those in the M1 family is the M1 Ultra.

I didn’t have any M2 Macs at all, as we decided to skip them and our only M2 chip is in my wife’s iPad Pro. I now have all four, thank you.

mcorefreqs2

I only had one in the M3 family, the M3 Pro in my MacBook Pro, but have all of them now.

mcorefreqs3

Thanks to Thomas, I already have two from the M4 family, the base and Pro variants, and I now have an M4 Max completing these before the Ultra comes later this year.

How to report frequencies

If you’re able to add to this collection, please open Terminal and run the command
sudo powermetrics -n 1 -s cpu_power
which then prompts you for your admin password. A few seconds later the window will fill with a single set of measurements looking like this:

mcorefreqsx

All I’d like is a copy containing 3 lines from that:

  • Machine model at the top, to tell me which Mac it is, thus which chip.
  • E-Cluster HW active residency, which contains a list of frequencies for the E cores.
  • P-Cluster HW active residency, which contains a longer list of frequencies for the P cores.

To help, I have highlighted those three lines in the screenshot above.
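If you’d rather extract those frequency lists automatically, here’s a minimal Python sketch that works on text saved from powermetrics. It assumes that each cluster line lists its frequencies in the form ‘600 MHz: 50%’ inside parentheses. The sample text is made up for illustration, including its machine model and intermediate frequencies, and isn’t real output from any Mac.

```python
import re

# Matches each frequency entry such as "600 MHz:" within a cluster line (assumed format).
FREQ_PATTERN = re.compile(r"(\d+) MHz:")
# Matches the start of lines such as "E-Cluster HW active residency:".
CLUSTER_PATTERN = re.compile(r"\s*(\S+-Cluster) HW active residency:")


def cluster_frequencies(powermetrics_text: str) -> dict[str, list[int]]:
    """Map each cluster name to the list of frequencies (MHz) on its residency line."""
    result: dict[str, list[int]] = {}
    for line in powermetrics_text.splitlines():
        match = CLUSTER_PATTERN.match(line)
        if match:
            result[match.group(1)] = [int(f) for f in FREQ_PATTERN.findall(line)]
    return result


# Illustrative, abridged sample only: not genuine powermetrics output.
SAMPLE = """\
Machine model: ExampleMac1,1
E-Cluster HW active residency:  40.00% (600 MHz: 50% 1332 MHz: 30% 2064 MHz: 20%)
P-Cluster HW active residency:  10.00% (600 MHz: 70% 1956 MHz: 20% 3228 MHz: 10%)
"""

frequencies = cluster_frequencies(SAMPLE)
for cluster, freqs in frequencies.items():
    print(f"{cluster}: idle {freqs[0]} MHz, maximum {freqs[-1]} MHz, {len(freqs)} frequencies")
```

Because the cluster name is captured from the start of each line, the same sketch should also cope with chips that report more than one P cluster, provided their lines follow the same pattern.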

I now have all the frequency sets that I needed to complete the table.

Reward

I have added your entries to my Numbers spreadsheet, and will make that available for free download from here, so anyone who wants to check those frequencies can do so.

Frequency information also builds our understanding of Apple silicon chips. My next questions are going to be why there are those differences, and whether they significantly affect the performance of our Macs.

Thank you for helping, and thanks to Thomas for demonstrating that CPU cores of the same type aren’t the same after all.

Postscript

Thank you to all who responded so quickly. I now have all the frequencies and no longer require any more, thank you. I will post an updated table with a brief analysis on Monday. There are a lot of differences, many of them surprisingly subtle!
