Power Modes and Apple silicon CPUs

By: hoakley

8 January 2025 at 15:30

Recent information from Apple about the Power Modes available on Apple silicon Macs doesn’t match what is found in practice, and lacks detail. For example, according to that article all Mac mini M4 2024 models feature both Low and High Power modes, while a MacBook Pro 16-inch Nov 2023 M3 Pro has neither. In reality, only Mac mini models with the M4 Pro support High Power mode, and that MacBook Pro M3 Pro, like many other Apple silicon models, does support Low Power mode, but not High.

A first look at GPU performance and power use demonstrated that Low Power mode on an M4 Pro resulted in a dramatic reduction in power to a third of that expected, with a disproportionately small reduction in render performance to only 69%. However, Low Power mode in the M3 Pro resulted in no detectable change in GPU performance or power use. This article extends the comparison to CPU cores in both chips.

CPU core frequency

Most of the action here concerns Performance (P) cores, although E cores get the occasional look-in. Frequency of P cores is labile, and they don’t have a single operating frequency that’s increased by a ‘boost’ feature or reduced by ‘throttling’. All the cores within a cluster of P cores, a single cluster of six in the M3 Pro and two clusters of five in the M4 Pro, run at the same frequency as determined by macOS, according to its rules.

For example, cluster frequency may be reduced as the number of threads running in that cluster increases, and, as I’ll show below, can be set at frequencies well below maximum for other reasons. From the first base M1 chip, gaining an understanding of core frequency control has been central to understanding how these chips function.

Methods

The results described below were obtained using similar methods to those detailed in previous work on M1, M3 and M4 chips (links at the end). Tests used to assess core performance were a tight floating point loop in assembly language, and a similarly tight loop of NEON assembly code for vector processing. These use registers alone, without accessing memory in the loop, so are intended to be core-intensive and not representative of real-world code.

Floating point and NEON tests were run in 1-14 threads on the M4 Pro with Power Mode set to Automatic, and to Low Power. Time to complete each set of threads was recorded using Mach timing, and each test run was analysed in 0.1 second collection periods using the cpu_power sampler of the powermetrics command tool.
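For anyone wanting to collect comparable samples, here is a minimal sketch of how such a powermetrics run might be assembled from Python. It isn't the procedure used for these tests: the function names and the 5 second default duration are illustrative assumptions. The --samplers, -i (interval in milliseconds) and -n (number of samples) options are those documented for powermetrics, which has to be run with root privileges on macOS.

```python
import subprocess


def powermetrics_command(sampler="cpu_power", interval_ms=100, duration_s=5.0):
    """Build an argv list for powermetrics (names and defaults are illustrative)."""
    # Number of samples needed to cover the requested duration.
    samples = round(duration_s * 1000 / interval_ms)
    return [
        "sudo", "powermetrics",
        "--samplers", sampler,
        "-i", str(interval_ms),
        "-n", str(samples),
    ]


def run_powermetrics(**kwargs):
    """Run powermetrics and return its text output (macOS only, needs root)."""
    result = subprocess.run(
        powermetrics_command(**kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling powermetrics_command() with its defaults gives 0.1 second collection periods for 50 samples; run_powermetrics() would normally be started just before the test threads are launched.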

Performance

m4lowpowererformtime

This graph shows time taken to complete each set of test threads against the number of threads being run. Black lines show results for the floating point test, which follows that expected. Up to a thread count of 10, when all P cores in the M4 Pro are fully occupied, time remains fairly constant, then increases as threads spill over to be run on the slower E cores, until they are also full.

NEON performance times, shown in red, appear most peculiar. They are fairly constant up to 5 threads, then grow increasingly longer until they spill over to the E cores above 10 threads, when their rate of increase is reduced, despite being run on the slower E cores.

Frequency and power

m4lowpowertime

Looking at individual test data from powermetrics revealed a striking difference between floating point and NEON tests, illustrated above for their tests with 10 threads, thus fully occupying both clusters of P cores. Here, floating point power measurements are shown in red, and those for NEON in black.

Power during floating point tests changed as expected from previous tests. It rose steeply to a peak as the threads were loaded, then settled to a steady value until threads were completed, then dropped rapidly back to near-idle. What differs here from the Automatic power setting is that, regardless of the number of threads, P cores were run at a frequency of 3,624 MHz, significantly below normal, reducing average power use to about 1,100 mW per thread.

When NEON threads were run in Low Power mode, the initial peak was much higher, and close to that measured with the Automatic power setting. It then fell over the following 0.3 seconds to a level that appeared independent of the number of threads between 5 and 10, at an average of about 14,000 mW as the total for all P cores. With 1-5 threads, cluster frequency appeared fixed at 3,624 MHz, the same as that for floating point. As the number of threads rose beyond 5, frequencies fell progressively to 2,616 MHz, indicating they were being reduced by macOS, most probably to limit temperature rise.

Power differences

The following two graphs show average power use when running floating point threads, by the number of threads being run, first for the Automatic power setting, then for Low Power. Points are shown with error bars representing ±1 standard deviation about the mean.

m4powerflopt1

m4lowpowerfloat

When running in Automatic power mode, each additional floating point thread required 1,300 mW on P cores, and 110 mW on E cores. In Low Power mode, threads on P cores were reduced to 1,100 mW, with no significant change on E cores.

NEON threads were more complex.

m4powerneon1

m4lowpowerneon

In Automatic power mode, each additional NEON thread required 3,000 mW on P cores, and 280 mW on E cores. In Low Power mode, for 1-5 threads, each required 2,500 mW. There was then a transition at 6 threads, and from 7 threads upwards power use was constrained to about 14,000 mW, rather than climbing to the peak of over 32,000 mW measured in Automatic mode.
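Those NEON figures for P cores can be summarised as a simple piecewise model, sketched below. This is only an approximation built from the numbers above: the function name is hypothetical, the transition at 6 threads is modelled crudely by clamping to the 14,000 mW ceiling, and E cores are left out altogether.

```python
def neon_p_core_power_mw(threads, low_power=False):
    """Rough total P core power (mW) for 1-10 NEON threads on the M4 Pro.

    Hypothetical model using the per-thread figures quoted in the text:
    3,000 mW per thread in Automatic mode, 2,500 mW per thread in Low Power
    mode, with Low Power clamped to a ceiling of about 14,000 mW.
    """
    if not 1 <= threads <= 10:
        raise ValueError("this model only covers 1-10 threads on P cores")
    if low_power:
        # Assumption: the observed transition at 6 threads is treated as a hard cap.
        return min(threads * 2500, 14000)
    return threads * 3000


for n in (1, 5, 7, 10):
    print(n, neon_p_core_power_mw(n), neon_p_core_power_mw(n, low_power=True))
```

With 10 threads the model gives 30,000 mW in Automatic mode against 14,000 mW in Low Power, which shows how much of the saving comes from the ceiling rather than from the lower power per thread.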

M3 Pro

At this point, I performed a limited number of NEON tests on my M3 Pro, to see if its CPU cores behaved similarly when Low Power mode was set to Always. They didn’t, and instead ran all NEON threads at a cluster frequency reduced to 2,808 MHz, compared with 3,576-3,624 MHz when run with Low Power mode set to Never. That extended the time to execute threads from 2.1-2.7 seconds to a steady 3.0 seconds regardless of their number.

Energy use

While power use is important in determining cooling requirements, for those using Low Power mode when their Mac is running on battery, energy use is often more important. Reducing cluster frequency not only reduces power use but also extends the time to complete the same computational task. If frequency reduction reduces power used to 50% but tasks take twice the time, then the only net saving in energy would be that required by cooling fans.
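The relationship between power, time and energy in that example can be sketched in a few lines. The figures used here are hypothetical, chosen only to illustrate halved power with doubled time, and aren't measurements from these tests.

```python
def energy_joules(power_mw, seconds):
    """Energy in joules from mean power in milliwatts and duration in seconds."""
    return power_mw / 1000 * seconds


# Hypothetical task: 10,000 mW for 2 s at full frequency,
# against 5,000 mW for 4 s at a reduced frequency.
full = energy_joules(10_000, 2.0)
reduced = energy_joules(5_000, 4.0)
print(full, reduced)  # both 20.0 J, so no net saving in CPU energy
```

Low Power mode only saves energy in the cores when power falls by a larger factor than time rises.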

Although I may return for more extensive calculations of total energy required for the M4 Pro, figures for the M3 Pro appear impressive:

for a single NEON thread, total energy used fell from 6.7 to 3.6 KJ
for 6 threads, its full P cluster, total energy fell from 32 to 20 KJ
for 10 threads, 6 on P cores, and 4 on E, total energy fell from 41 to 26 KJ.

Those are for the CPU cores alone, and don’t include savings from memory, SSD, or cooling fan use.

What does Low Power mode do?

The effects of Low Power mode differ according to Apple silicon chip. For the M4 Pro they include:

reduction in P core frequency, but no change in E core frequency for high QoS threads that have spilt over from P cores;
further reduction in P core frequency to limit total power to about 14 W, so restricting heat generation and allowing reduced fan use;
reduction in GPU core frequency to reduce power use to about one third at maximal tasks;
significant reduction in CPU and (probably) GPU energy use.

For the M3 Pro they include:

reduction in P core frequency, but no change in E core frequency for high QoS threads that have spilt over from P cores;
no further reduction in P core frequency was necessary, as power remained below 10 W even with 10 NEON threads running;
no alteration in GPU core frequency or power use;
significant reduction in CPU energy use.

As to which Apple silicon models support which set of features, I’ll leave it to Apple to get its facts straight in an updated version of its support note.

References

Inside M4 chips: CPU power, energy and mystery
Inside M4 chips: Matrix processing and Power Modes
Power Modes and Apple Silicon GPUs
Evaluating M3 Pro CPU cores: 1 General performance

Power Modes and Apple Silicon GPUs

The Eclectic Light Company

By: hoakley

6 January 2025 at 15:30

Some recent Apple silicon Macs offer High and Low Power modes that appear to differ from the Low Power mode offered previously. This article discovers what they do, in particular how these modes affect the GPU.

New Power modes

According to Apple’s most recent explanation of Power modes in Apple silicon Macs, these are restricted to specific models. Low Power mode is apparently only available in:

MacBook Pro with M1 Max, M2 Max, M3 Max or M4 chips,
iMac with M3 or M4 chips,
Mac mini with M2 or M4 chips.

Low Power mode “reduces energy use to increase battery life” and, in macOS Sequoia 15.1 and later, reduces fan speeds to minimise generated noise and to reduce power use.

High Power mode is only available in:

MacBook Pro with M1 Max, M2 Max, M3 Max or M4 chips,
Mac mini with M4 chips.

High Power mode primarily runs the cooling fans at higher speeds to allow longer sustained high performance, and “can improve performance in graphics-intensive workflows”. Apple gives examples of the latter, including colour grading 8K video, video editing and 3D apps.

If your MacBook Pro doesn’t have an earlier Max variant or an M4, then what it refers to as Low Power mode in its Battery settings doesn’t appear to be Low Power mode according to that article. That might be the Low Power mode apparently introduced for MacBook and MacBook Pro models from early 2016 and later, when running macOS Monterey 12.0 and later. Juli Clover of MacRumors stated that it “reduces the system clock speed and the display brightness in order to extend your battery life even further.”

Apple uses the words energy and power interchangeably, and in places refers not to Power Modes, but to Energy Modes, which is surprisingly inconsistent and thoroughly confusing.

Previous results

When recently investigating CPU core power use in the M4 Pro, I examined the effect of Low and High Power modes. As might be expected from Apple’s description, High Power mode had no effect on CPU core frequencies, their control, power use or performance.

However, Low Power mode had substantial effects on core frequency, performance and power use. When running floating point tests in 10 threads, their cluster frequency was reduced from 3,852 to 3,624 MHz, 94% of Automatic and High Power. That reduced power use from a mean of 13.9 W to 11.2 W, and increased the time to complete threads. Time taken by floating point threads increased to 106% of Automatic and High, while that for NEON increased to 135% and vDSP_mmul to 177%.
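The percentages for the floating point test can be checked, and combined into a rough estimate of relative energy use, as sketched below. The energy figure is my own extrapolation rather than a measurement: it assumes that mean power applies across the whole of the lengthened run.

```python
def percent_of(value, baseline):
    """Express value as a percentage of baseline."""
    return 100 * value / baseline


freq_pct = percent_of(3624, 3852)    # cluster frequency, MHz
power_pct = percent_of(11.2, 13.9)   # mean power, W
time_pct = 106                       # time to complete floating point threads

# Assumption: energy scales as mean power multiplied by run time.
energy_pct = power_pct * time_pct / 100

print(round(freq_pct), round(power_pct), round(energy_pct))
```

On those assumptions, the 10-thread floating point test in Low Power mode would use roughly 85% of the energy it needs in Automatic mode.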

My previous measurements in M4 Pro CPU cores found peak average CPU power use of 14 W when running floating point instructions, 33 W for NEON, and 36 W for vDSP_mmul.

Assessing the GPU

My previous attempts to find a performance test that will reliably put maximum load on Apple silicon GPUs had limited success, but Blender Benchmark now appears capable, and has been used extensively. This runs three demanding renders using Metal, and reports for each the estimated number of rendered path tracing samples per minute. Running the Blender 4.3.0 version on the GPU using Metal results in consistent 100% active residency and maximum frequency for periods of several seconds. The standard test sequence used here renders three scenes, named monster, junkshop and classroom.

Tests were performed on a Mac mini M4 Pro with 20 GPU cores running macOS 15.2, thus one of the systems able to make use of both High and Low Power modes, and a MacBook Pro M3 Pro with 18 GPU cores running macOS 15.2, one of the models previously claimed to use ‘old’ Low Power mode, but excluded from the list of those capable of either High or Low Power modes.

GPUs were assessed using the powermetrics command tool’s gpu_power sampler for sample periods of 0.1 second over a total of 5 seconds early in each of the test renders. Activity Monitor’s GPU History window was used to check when each of the test renders had reached 100% GPU, at which point the powermetrics samples were collected into a file. Once the three renders had completed, their scores were recorded and the Blender Benchmark app was quitted. Tests were performed for:
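Once samples have been saved to a file, mean GPU power and its standard deviation can be extracted with a short script like this sketch. It assumes that each sample in the text output contains a line of the form "GPU Power: 6012 mW". Check that against the output from your own version of powermetrics, as the sample text here is invented for illustration.

```python
import re
from statistics import mean, stdev

# Assumed line format in powermetrics text output, e.g. "GPU Power: 6012 mW".
GPU_POWER_RE = re.compile(r"^GPU Power:\s*(\d+)\s*mW", re.MULTILINE)

# Invented sample text standing in for the contents of a saved file.
SAMPLE = """GPU HW active residency: 100.00%
GPU Power: 6000 mW
GPU HW active residency: 100.00%
GPU Power: 6200 mW
GPU HW active residency: 100.00%
GPU Power: 5800 mW
"""


def gpu_power_stats(text):
    """Return (mean, standard deviation) of GPU power in watts."""
    values = [int(v) for v in GPU_POWER_RE.findall(text)]
    if len(values) < 2:
        raise ValueError("need at least two GPU Power samples")
    return mean(values) / 1000, stdev(values) / 1000


print(gpu_power_stats(SAMPLE))
```

Anchoring the pattern to the start of the line avoids matching other lines that happen to mention GPU power, such as a combined CPU and GPU total.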

M4 Pro, for Automatic, High Power and Low Power,
M3 Pro, for Low Power Mode Never and Always.

To ensure the latter had fully engaged, the MacBook Pro was also set to Always and restarted on battery power alone, but that made no difference to render performance.

Power

Differences between GPU performance were most apparent when comparing Automatic and Low Power modes on the M4 Pro, as shown in the chart below.

m4m3gpupower1

GPU power measurements given by the gpu_power sampler are shown against time over the 5 second period during the first, monster render. Filled circles and the regression line in black are those with the Automatic setting, and those in red are at Low Power. Thus, the obvious effect of Low Power mode is to reduce GPU power from about 20 to 6 W.

m4m3gpupower2

This chart adds measurements made during High Power mode in purple. If anything, power use was slightly lower in High Power than in Automatic mode, but there’s extensive overlap between values.

m4m3gpupower3

This adds measurements made with Low Power set to Never on the M3 Pro, shown in blue. Those too overlap with the M4 Pro set to Automatic and High Power, and are about 18 W. No difference was seen with the M3 Pro Low Power mode set to Always.

Blender Benchmark results are compared in the table below.

m4m3gpupower4

As expected, performance was almost identical between M4 Pro Automatic and High Power modes, and between M3 Pro normal and Low Power modes. The only substantial differences were between M4 Pro Automatic and Low Power, and between M4 Pro Automatic and M3 Pro Low Power Never modes.

In the M4 Pro, engaging Low Power mode reduced GPU render performance to 65-71% of normal. The 18-core GPU in the M3 Pro achieved 66-72% of the performance of the 20-core GPU in the M4 Pro. Given that the M3 Pro has 90% of the number of cores, that implies that the GPU cores in the M4 Pro are significantly more performant than those in the M3 Pro, as has been claimed.

m4m3gpupower5

This table brings together performance and power use for the monster render alone. On the M4 Pro, Low Power mode delivered 69% of the render performance using about a third (32%) of the power. That contrasts with the M3 Pro, whose 90% core count used 90% of the power to deliver 72% of the performance, relative to the M4 Pro.
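Those ratios can be turned into relative performance per watt, as in this sketch. The function name is hypothetical, and the inputs are the rounded percentages given above for the monster render, each relative to the M4 Pro in Automatic mode.

```python
def relative_efficiency(perf_ratio, power_ratio):
    """Performance per watt relative to a baseline with both ratios at 1.0."""
    return perf_ratio / power_ratio


# Ratios relative to the M4 Pro in Automatic mode, monster render.
m4_low_power = relative_efficiency(0.69, 0.32)
m3_pro = relative_efficiency(0.72, 0.90)

print(round(m4_low_power, 1), round(m3_pro, 1))
```

On those figures, the M4 Pro in Low Power mode delivers a little over twice the render performance per watt of Automatic mode, while the M3 Pro delivers about 0.8 times as much.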

Settings

m4m3gpupower0m3

In the MacBook Pro M3 Pro, Battery settings explicitly refer to Low Power Mode. As this is completely different from the ‘new’ Low Power mode, that’s misleading.

m4m3gpupower0m4

In the Mac mini M4 Pro, Energy settings explicitly refer to Energy Mode, offering Low Power (at the top of the popup menu) and High Power (at the bottom of the popup menu).

Terms used and Apple’s detailed guidance need to be rationalised and made consistent across all recent Macs.

What Low Power mode does

Detailed comparison between powermetrics measurements in Automatic and Low Power modes on the M4 Pro confirms that there are substantial changes to control of GPU cores. Running any of the three renders normally results in GPU cores being run at their maximum frequency of 1,578 MHz with 100% active residency and a software state of P10. In Low Power mode, their frequency is capped at 1,056 MHz (67%), active residency remains at 100%, but with a software state of P4-P5.

powermetrics reports two software ‘states’ for the GPU, one that is requested, which for these renders is invariably P10 (P8 on the M3 Pro), and the other the state apparently applied. Although these appear to be some form of priority, with higher P numbers being higher priority, I’ve been unable to find any explanation.

For comparison, when running on the M3 Pro, active frequency was the maximum of 1,380 MHz (87% of the M4 Pro), 100% active residency, and software states requested and given at P8 throughout.
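To separate the effects of core count and frequency from those of the cores themselves, render performance can be normalised as in the sketch below. This assumes that performance scales linearly with both the number of GPU cores and their frequency, which is a simplification, and the function name is hypothetical. It uses the 66-72% range of relative performance reported above.

```python
def per_core_per_clock(perf_ratio, cores, ref_cores, mhz, ref_mhz):
    """Relative performance per core per MHz, assuming linear scaling."""
    return perf_ratio / (cores / ref_cores) / (mhz / ref_mhz)


# M3 Pro (18 cores at 1,380 MHz) relative to M4 Pro (20 cores at 1,578 MHz).
low = per_core_per_clock(0.66, 18, 20, 1380, 1578)
high = per_core_per_clock(0.72, 18, 20, 1380, 1578)

print(round(low, 2), round(high, 2))
```

On that assumption, each M3 Pro GPU core achieves roughly 84-91% of the performance of an M4 Pro core at the same frequency, so higher frequency doesn't account for the whole difference.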

CPU and GPU power use

Perhaps as a result of many GPUs now using 200-1,000 W of power, we have come to assume that GPUs inevitably consume the lion’s share of power in a computer. While that may be true of many PCs, even of some Intel Macs, it isn’t true of Apple silicon.

Maximum power used by the GPU in any 0.1 second sampling period during these tests was 25.3 W, and measurements on M4 Max chips with 40 GPU cores suggest they can use a maximum of about 50 W. Any PC gamer with a GPU using so little power would hang their head in shame, yet performance of Apple silicon GPUs is by no means poor.

While power used by the GPU during heavy workloads can remain at 14 W or more for sustained periods, when performing scalar floating point calculations, CPU cores generally remain below that. However, vector and matrix calculations in CPU cores and possibly the AMX coprocessor can exceed the maximum for a 20-core GPU, at 33 W for NEON floating point vectors, and 36 W for floating point matrix multiplication.

Summary

There are two different types of Power Mode in Macs. MacBook Pro models without M1 Max, M2 Max, M3 Max or M4 chips support an older Low Power mode that dims the display to extend battery endurance, but in Apple silicon Macs has no significant effect on their performance.
New Power Modes are available in MacBook Pros with M1 Max, M2 Max, M3 Max or M4 chips, and Mac minis with M4 chips.
High Power mode doesn’t alter performance, but makes fan strategy more aggressive to support longer periods of high power use.
Low Power mode constrains the frequency of CPU and GPU cores to reduce CPU power to about 80% and GPU power to about 33%, so achieving substantial savings and less fan use, at the cost of impaired performance.
M4 GPU cores achieve better performance over those in M3 chips by a combination of higher frequency and improvements in the core itself.
Apple needs to correct terms used in System Settings, and in its documentation, which are currently misleading and confusing.
It’s not clear why Low Power mode is only available in selected models and chips.