Power Modes and Apple silicon CPUs
Recent information from Apple about the Power Modes available on Apple silicon Macs doesn’t match what I have found, and lacks detail. For example, according to that support note, all Mac mini M4 2024 models feature both Low and High Power modes, while a MacBook Pro 16-inch Nov 2023 M3 Pro has neither. In reality, only Mac mini models with the M4 Pro support High Power mode, and that MacBook Pro M3 Pro, like many other Apple silicon models, does support Low Power mode, but not High.
A first look at GPU performance and power use showed that Low Power mode on an M4 Pro cut GPU power dramatically, to a third of that expected, while render performance fell disproportionately little, to 69%. However, Low Power mode on the M3 Pro made no detectable difference to GPU performance or power use. This article extends the comparison to the CPU cores in both chips.
CPU core frequency
Most of the action here concerns Performance (P) cores, although Efficiency (E) cores get the occasional look-in. P core frequency is labile: there’s no single operating frequency that is raised by a ‘boost’ feature or lowered by ‘throttling’. Within each cluster of P cores, six in the M3 Pro and five in the M4 Pro, all cores run at the same frequency, set by macOS according to its rules.
For example, cluster frequency may be reduced as the number of threads running in that cluster increases and, as I’ll show below, can be set well below maximum for other reasons. Ever since the first base M1 chip, understanding how core frequency is controlled has been central to understanding how these chips function.
Methods
The results described below were obtained using similar methods to those detailed in previous work on M1, M3 and M4 chips (links at the end). Tests used to assess core performance were a tight floating point loop in assembly language, and a similarly tight loop of NEON assembly code for vector processing. These use registers alone, without accessing memory in the loop, so are intended to be core-intensive and not representative of real-world code.
Floating point and NEON tests were run in 1-14 threads on the M4 Pro with Power Mode set to Automatic, and to Low Power. Time to complete each set of threads was recorded using Mach timing, and each test run was analysed in 0.1 second collection periods using the powermetrics cpu_power sampler.
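Mach timing counts ticks of a hardware timebase, which have to be converted to nanoseconds before they mean anything as elapsed time. A minimal sketch of that conversion, assuming the numerator and denominator have already been read from mach_timebase_info(); the 125/3 defaults are the values commonly reported on Apple silicon (a 24 MHz timebase), used here as an assumption rather than queried from the system. The function names are hypothetical.

```python
def mach_ticks_to_ns(ticks: int, numer: int = 125, denom: int = 3) -> int:
    """Convert a Mach absolute time interval (in ticks) to nanoseconds.

    numer/denom correspond to the fields returned by mach_timebase_info();
    the 125/3 defaults are an assumption, not values read from the system.
    """
    # Multiply before dividing so integer division loses as little as possible.
    return ticks * numer // denom


def elapsed_seconds(start_ticks: int, end_ticks: int,
                    numer: int = 125, denom: int = 3) -> float:
    """Elapsed time in seconds between two mach_absolute_time() readings."""
    return mach_ticks_to_ns(end_ticks - start_ticks, numer, denom) / 1e9


# 24,000,000 ticks of a 24 MHz timebase is one second.
print(elapsed_seconds(0, 24_000_000))  # 1.0
```

On a real Mac, numer and denom should be read once at startup and reused, as the timebase doesn’t change while the system is running.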
Performance
This graph shows time taken to complete each set of test threads against the number of threads being run. Black lines show results for the floating point test, which follow the expected pattern. Up to 10 threads, when all P cores in the M4 Pro are fully occupied, time remains fairly constant; it then increases as threads spill over onto the slower E cores, until they too are full.
NEON performance times, shown in red, appear most peculiar. They are fairly constant up to 5 threads, then grow progressively longer up to 10 threads. Once threads spill over to the E cores above 10 threads, their rate of increase slows, even though those threads are being run on slower, less performant E cores.
Frequency and power
Looking at individual test data from powermetrics revealed a striking difference between floating point and NEON tests, illustrated above for their tests with 10 threads, fully occupying both clusters of P cores. Here, floating point power measurements are shown in red, and those for NEON in black.
Power during floating point tests behaved as in previous tests. It rose steeply to a peak as the threads were loaded, settled to a steady value until the threads were completed, then dropped rapidly back to near-idle. What differs from the Automatic power setting is that, in Low Power mode, P cores were run at a frequency of 3,624 MHz regardless of the number of threads, significantly below normal, reducing average power use to about 1,100 mW per thread.
When NEON threads were run in Low Power mode, the initial peak was much higher, close to that measured with the Automatic power setting. Power then fell over the following 0.3 seconds to a level that appeared independent of the number of threads between 5 and 10, averaging about 14,000 mW in total for all P cores. With 1-5 threads, cluster frequency appeared fixed at 3,624 MHz, the same as for floating point. As the number of threads rose beyond 5, frequencies fell progressively to 2,616 MHz, indicating they were being reduced by macOS, most probably to limit temperature rise.
Power differences
The following two graphs show average power use when running floating point threads, by the number of threads being run, first for the Automatic power setting, then for Low Power. Points are shown with error bars of ±1 standard deviation about the mean.
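Each plotted point summarises the 0.1 second power samples from a test run as a mean and standard deviation. A minimal sketch of that summary using Python’s statistics module; it assumes the sample standard deviation (n - 1 denominator), as the article doesn’t say which form was used, and the sample values are hypothetical, not measured data.

```python
import statistics


def mean_and_sd(samples_mw: list[float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation of power samples in mW.

    An error bar of ±1 SD then spans mean - sd to mean + sd.
    """
    mean = statistics.mean(samples_mw)
    # stdev() uses an n - 1 denominator and needs at least two samples.
    sd = statistics.stdev(samples_mw)
    return mean, sd


# Hypothetical 0.1 s power samples for one test run, not measured data.
samples = [3600.0, 3700.0, 3650.0, 3550.0, 3500.0]
mean, sd = mean_and_sd(samples)
print(f"{mean:.0f} ± {sd:.0f} mW")  # prints: 3600 ± 79 mW
```

Using statistics.pstdev() instead would give the population standard deviation, which is slightly smaller for short runs.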
When running in Automatic power mode, each additional floating point thread required 1,300 mW on P cores, and 110 mW on E cores. In Low Power mode, threads on P cores were reduced to 1,100 mW, with no significant change on E cores.
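A figure such as 1,300 mW per additional thread is in effect the slope of mean power plotted against thread count. One way to estimate it, not necessarily the one used here, is an ordinary least-squares fit. The function name is hypothetical, and the data below are synthetic, chosen only to show the calculation.

```python
def per_thread_increment(threads: list[int],
                         mean_power_mw: list[float]) -> tuple[float, float]:
    """Least-squares fit of power = slope * threads + intercept.

    The slope estimates the extra power, in mW, drawn by each additional thread.
    """
    n = len(threads)
    if n != len(mean_power_mw) or n < 2:
        raise ValueError("need at least two paired points")
    mx = sum(threads) / n
    my = sum(mean_power_mw) / n
    sxx = sum((x - mx) ** 2 for x in threads)
    if sxx == 0:
        raise ValueError("thread counts must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(threads, mean_power_mw))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Synthetic example: 1,300 mW per thread on top of a 200 mW baseline.
threads = list(range(1, 11))
power = [200 + 1300 * t for t in threads]
slope, intercept = per_thread_increment(threads, power)
print(round(slope), round(intercept))  # 1300 200
```

Fitting only the points up to 10 threads keeps the P core slope separate from the much smaller E core increment seen once threads spill over.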
NEON threads were more complex.
In Automatic power mode, each additional NEON thread required 3,000 mW on P cores, and 280 mW on E cores. In Low Power mode, for 1-5 threads, each required 2,500 mW. There was then a transition at 6 threads, and from 7 threads upwards power use was constrained to about 14,000 mW, rather than climbing to the peak of over 32,000 mW measured in Automatic mode.
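The difference between the two modes can be summarised as a rough piecewise model built only from the approximate figures in the paragraph above. This is an illustration under stated assumptions, not a fit to the data: it ignores any baseline, the transition around 6 threads and the initial peak, so it won’t reproduce the measured Automatic peak of over 32,000 mW exactly. The function name is hypothetical.

```python
def neon_p_power_mw(threads: int, low_power: bool) -> float:
    """Rough P-core power, in mW, for NEON threads on the M4 Pro.

    Assumes about 3,000 mW per thread in Automatic mode, and about
    2,500 mW per thread in Low Power mode up to a ceiling near 14,000 mW.
    """
    # Assumption: threads beyond the 10 P cores spill over to E cores,
    # so they add nothing to P-core power.
    p_threads = min(threads, 10)
    if low_power:
        return min(2500.0 * p_threads, 14000.0)
    return 3000.0 * p_threads


for n in (1, 5, 6, 10):
    # Columns: threads, Automatic estimate, Low Power estimate.
    print(n, neon_p_power_mw(n, False), neon_p_power_mw(n, True))
```

The cap is what distinguishes Low Power mode here: above about 6 threads, extra NEON work is paid for in time rather than power.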
M3 Pro
At this point, I ran a limited number of NEON tests on my M3 Pro, to see whether its CPU cores behaved similarly with Low Power mode set to Always. They didn’t: instead, all NEON threads ran at a cluster frequency reduced to 2,808 MHz, compared with 3,576-3,624 MHz with Low Power mode set to Never. That extended the time to execute threads from 2.1-2.7 seconds to a steady 3.0 seconds, regardless of their number.
Energy use
While power use is important in determining cooling requirements, energy use is often more important for those using Low Power mode when their Mac is running on battery. Reducing cluster frequency not only reduces power use, but also extends the time taken to complete the same computational task. If frequency reduction cuts power to 50% but the task takes twice as long, the energy used by the CPU is unchanged, and the only net saving would be in the energy required by cooling fans.
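Energy is power multiplied by time, so with evenly spaced power samples it can be approximated by summing each sample multiplied by its collection period. A minimal sketch, assuming each sample in mW is the mean power over its 0.1 second period; the function name and the constant power values are hypothetical, chosen to show the 50% power, twice-the-time case.

```python
def energy_joules(samples_mw: list[float], interval_s: float = 0.1) -> float:
    """Approximate energy, in joules, from evenly spaced power samples in mW.

    Each sample is treated as the mean power over its collection period,
    so energy = sum(power in W) * period in s.
    """
    return sum(samples_mw) / 1000.0 * interval_s


# Same task: 10 W for 2 s, or 5 W (50% power) for 4 s (twice the time).
full_power = [10_000.0] * 20   # 20 samples of 0.1 s = 2 s
half_power = [5_000.0] * 40    # 40 samples of 0.1 s = 4 s
print(energy_joules(full_power), energy_joules(half_power))  # both about 20 J
```

Energy is only saved when power falls by proportionally more than the task time grows, which is what the M3 Pro figures below show.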
Although I may return for more extensive calculations of total energy required for the M4 Pro, figures for the M3 Pro appear impressive:
- for a single NEON thread, total energy used fell from 6.7 to 3.6 J
- for 6 threads, its full P cluster, total energy fell from 32 to 20 J
- for 10 threads, 6 on P cores, and 4 on E, total energy fell from 41 to 26 J.
Those are for the CPU cores alone, and don’t include savings from memory, SSD, or cooling fan use.
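Expressed as percentages, those savings are easier to compare. A short calculation using the figures listed above; the dictionary labels and function name are only for illustration.

```python
# M3 Pro NEON test energy, as listed above:
# (Low Power set to Never, Low Power set to Always).
# Units cancel in a percentage, so they aren't needed here.
m3_pro_energy = {
    "1 thread": (6.7, 3.6),
    "6 threads": (32.0, 20.0),
    "10 threads": (41.0, 26.0),
}


def saving_percent(before: float, after: float) -> float:
    """Percentage reduction going from before to after."""
    return (before - after) / before * 100.0


for label, (never, always) in m3_pro_energy.items():
    print(f"{label}: {saving_percent(never, always):.1f}% less energy")
```

That works out at reductions of about 46%, 37.5% and 37% respectively.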
What does Low Power mode do?
The effects of Low Power mode differ according to Apple silicon chip. For the M4 Pro they include:
- reduction in P core frequency, but no change in E core frequency for high QoS threads that have spilt over from P cores;
- further reduction in P core frequency to limit total power to about 14 W, so restricting heat generation and allowing reduced fan use;
- reduction in GPU core frequency, cutting power use to about one third for maximal tasks;
- significant reduction in CPU and (probably) GPU energy use.
For the M3 Pro they include:
- reduction in P core frequency, but no change in E core frequency for high QoS threads that have spilt over from P cores;
- no further reduction in P core frequency, which wasn’t necessary as power remained below 10 W even with 10 NEON threads running;
- no alteration in GPU core frequency or power use;
- significant reduction in CPU energy use.
As to which Apple silicon models support which set of features, I’ll leave it to Apple to get its facts straight in an updated version of its support note.
References
Inside M4 chips: CPU power, energy and mystery
Inside M4 chips: Matrix processing and Power Modes
Power Modes and Apple Silicon GPUs
Evaluating M3 Pro CPU cores: 1 General performance