Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

In the background: Putting threads to work

By: hoakley
17 February 2026 at 15:30

To take best advantage of background processing, multiple cores, and different core types, apps have to be designed to run efficiently in multiple threads that co-operate with one another and with macOS. This article explains the basics of how that works in Apple silicon Macs.

Single thread

Early apps, and many current command tools, run as simple sequences of instructions. Cast your mind back to days of learning to code in a first language like Basic, and you’d effectively write something like
BEGIN
handle any options passed in command
process data or files until all done
report any errors
END

There are still plenty of command tools that work like that, and run entirely in a single thread. One example is the tar command, which dates back to 1979, and its current version in macOS remains single-threaded. As a result it can’t benefit from any additional cores, and runs at essentially the same speed on base, Pro, Max and Ultra variants of an M chip family.

Multiple threads

Since apps gained windowed interfaces, rather than running from BEGIN to END and quitting, they’re primarily driven by the user. The app itself sits in an event loop waiting for you to tell it what to do through menu commands, tools or other controls. When you do that, the app hands that task over to the required code to handle, and returns to its main event loop. This is much better, as you see an app that remains responsive at all times, even though it may have other threads that are working frantically in the background on tasks you have given them.

We’ve now divided that simple linear code up into at least two parts: a foreground main thread that the user interacts with and farms work out to worker threads that run in the background. When run on a single core with multitasking, that ensures the main thread appears responsive at all times, and we could use that with a Stop command, as has been commonly implemented with ⌘-., to signal the worker thread to halt its processing.

Threads come into their own, though, with multiple CPU cores. If the worker code can be implemented so more than one thread can be processing a job at any time, then each core can run a separate thread, and complete that job more quickly. For example, instead of compressing an image one row of pixels at a time, the image could be divided up into rectangles, and each farmed out to be compressed by a separate thread. Alternatively, the compression process could be designed in stages, each of which runs in a separate thread, with image data fed through their pipeline.

Using multiple threads has its own limitations, and there are trade-offs as the number of threads increases, and more work has to be done moving data between threads and the cores they’re running on.

These can be seen in a little demonstration involving CPU-intensive floating point computations in a tight loop:

  • running the computation in a single thread, on a single CPU core, normally processes 0.25 billion loops per second
  • splitting the total work to be performed into 2 threads increases that to 0.48 billion/s
  • in 10 threads, it rises to 2.1 billion/s
  • in 20 threads, it reaches a peak of 4.2 billion/s
  • in 50 threads, overhead has a negative impact on performance, and the rate falls to 3.3 billion/s.

Those are for the 10 P and 4 E cores in an M4 Pro, at high Quality of Service, so run preferentially on its P cores. In this case, picking the optimum number of threads could accelerate its performance by a factor of 16.8.

Core allocation

CPUs with one core type normally try to balance load across their cores, and coupled with a scheme for software to indicate the priority of its threads, aren’t as complex as those with two or more core types with contrasting performance characteristics. For its Alder Lake CPUs, Intel uses an elaborate Hardware Feedback Interface, marketed as Intel Thread Director, that works with operating systems such as Windows 11 and recent Linux kernels to allocate threads to cores.

With its long experience managing P and E cores in iPhones and iPads, Apple has chosen a scheme based on a Quality of Service (QoS) metric, together with software management of core cluster frequency. The developer API is simplified to offer a limited number of values for QoS:

  • QoS 9 (binary 001001), named background and intended for threads performing maintenance, which don’t need to be run with any higher priority.
  • QoS 17 (binary 010001), utility, for tasks the user doesn’t track actively.
  • QoS 25 (binary 011001), userInitiated, for tasks that the user needs to complete to be able to use the app.
  • QoS 33 (binary 100001), userInteractive, for user-interactive tasks, such as handling events and the app’s interface.

There’s also a ‘default’ value between 17 and 25, an unspecified value, and you might come across others used by macOS.

As a general rule, threads assigned a QoS of background and below will be allocated to run on E cores, while those of utility and above will be allocated to run on P cores when they’re available. Those higher QoS threads may also be run on E cores when P cores are already fully committed, but low QoS threads are seldom if ever promoted to run on P cores.

Internally, QoS uses a different, finer-grained metric, taking into account other factors, but those values aren’t exposed to the developer. QoS provided by the developer and set in their code is considered a request to guide macOS, and not an absolute determinant of which cores that code will be run on.

Cluster frequency

P and E cores are operated at a wide range of frequencies, that are determined by macOS according to more complex heuristics also involving QoS. In broad terms, when P cores are running threads they do so at frequencies close to their maximum, as are E cores when they’re running high QoS threads that should have been run on P cores. However, when E cores are only running low QoS threads, they do so at a frequency close to that of idle for maximum efficiency.

One significant exception to that are the two E cores in M1 Pro and M1 Max chips. To compensate for the fact that there are only two of them, when they’re running two or more threads of low QoS they have higher frequency. Frequency control in P cores in M4 Pro chips is also more complicated, as their frequency is progressively reduced as they’re loaded with more threads, presumably to constrain heat generated when running at those higher loads.

There are times when internal conditions, such as Low Power Mode, override a request for code to be run as userInteractive. You can experience that yourself if you try running apps that would normally be given P cores on a laptop with little remaining in its battery. All threads are then diverted to E cores to eke out the remaining battery life, and their cluster frequency is reduced to little more than idle.

Summary

  • Developers determine the performance of their code according to how it’s divided into threads, and their QoS.
  • QoS advises macOS of the performance expectation for each thread, and is a request not a requirement.
  • macOS uses QoS and other factors to allocate threads to specific core types and clusters.
  • macOS applies more complex heuristics to determine the frequency at which to run each cluster.
  • System-wide policy such as Low Power Mode can override QoS.

Last Week on My Mac: Why E cores make Apple silicon fast

By: hoakley
8 February 2026 at 16:00

If you use an Apple silicon Mac I’m sure you have been impressed by its performance. Whether you’re working with images, audio, video or building software, we’ve enjoyed a new turn of speed since the M1 on day 1. While most attribute this to their Performance cores, as it goes with the name, much is in truth the result of the unsung Efficiency cores, and how they keep background tasks where they should be.

To see what I mean, start your Apple silicon Mac up from the cold, and open Activity Monitor in its CPU view, with its CPU History window open as well. For the first five to ten minutes you’ll see its E cores are a wall of red and green with Spotlight’s indexing services, CGPDFService, mediaanalysisd, BackgroundShortcutRunner, Siri components, its initial Time Machine backup, and often an XProtect Remediator scan. Meanwhile its P cores are largely idle, and if you were to dive straight into using your working apps, there’s plenty of capacity for them to run unaffected by all that background mayhem.

handecpuhistory

It’s this stage that scares those who are still accustomed to using Intel Macs. Seeing processes using more than 100% CPU is terrifying, because they know that Intel cores can struggle under so much load, affecting user apps. But on an Apple silicon Mac, who notices or cares that there’s over a dozen mdworker processes each taking a good 50% CPU simultaneously? After all, this is what the Apple silicon architecture is designed for. Admittedly the impression isn’t helped by a dreadful piece of psychology, as those E cores at 100% are probably running at a frequency a quarter of those of P cores shown at the same 100%, making visual comparison completely false.*

This is nothing new. Apple brought it to the iPhone 7 in 2016, in its first SoC with separate P and E cores. That’s an implementation of Arm’s big.LITTLE announced in 2011, and development work at Cray and elsewhere in the previous decade. What makes the difference in Apple silicon Macs is how threads are allocated to the two different CPU core types on the basis of a metric known as Quality of Service, or QoS.

As with so much in today’s Macs, QoS has been around since OS X 10.10 Yosemite, six years before it became so central in performance. When all CPU cores are the same, it has limited usefulness over more traditional controls like Posix’s nice scheduling priority. All those background tasks still have to be completed, and giving them a lower priority only prolongs the time they take on the CPU cores, and the period in which the user’s apps are competing with them for CPU cycles.

With the experience gained from its iPhones and other devices, Apple’s engineers had a better solution for future Macs. In addition to providing priority-based queues, QoS makes a fundamental distinction between those threads run in the foreground, and those of the background. While foreground threads will be run on P cores when they’re available, they can also be scheduled on E cores when necessary. But background threads aren’t normally allowed to run on P cores, even if they’re delayed by the load on the E cores they’re restricted to. We know this from our inability to promote existing background threads to run on P cores using St. Clair Software’s App Tamer and the command tool taskpolicy.

This is why, even if you sit and watch all those background processes loading the E cores immediately after starting up, leaving the P cores mostly idle, macOS won’t try running them on its P cores. If it did, even if you wanted it to, the distinction between foreground and background, P and E cores would start to fall apart, our apps would suffer as a consequence, and battery endurance would decline. Gone are the days of crashing mdworker processes bringing our Macs to their knees with a spinning beachball every few seconds.

If seeing all those processes using high % CPU can look scary, the inevitable consequence in terms of software architecture might seem terrifying. Rather than building monolithic apps, many of their tasks are now broken out into discrete processes run in the background on demand, on the E cores when appropriate. The fact that an idle Mac has over 2,000 threads running in over 600 processes is good news, and the more of those that are run on the E cores, the faster our apps will be. The first and last M-series chips to have only two E cores were the M1 Pro and Max, since when every one has had at least four E cores, and some as many as six or eight.

Because Efficiency cores get the background threads off the cores we need for performance.

* For the record, I have measured those frequencies using powermetrics. For an M4 Pro, for example, high QoS threads running on the P cores benefit from frequencies close to the P core max of 4,512 MHz. Low QoS threads running on the E cores are run at frequencies close to idle, typically around 1,050 MHz. However, when the E cores run high QoS threads that have overflowed from the P cores, the E cores are normally run at around their maximum of 2,592 MHz. By my arithmetic, 1,050 divided by 4,512 is 0.233, which is slightly less than a quarter. Other M-series chips are similar.

❌
❌