
Explainer: I/O throttling

By: hoakley
28 March 2026 at 16:00

Input and output have long been key factors in computer performance, and are capable of bringing everything to a crawl unless carefully managed. This is most obvious when foreground and background processes contend for common resources such as disk storage. A popular solution has been to impose delays on I/O performed for intensive background tasks, such as backing up, giving more important tasks preferential access to I/O. This policy is known as I/O throttling, and has even come to influence hardware choices.

For example, folk wisdom is that there’s little point in wasting a faster drive to store your Time Machine backups, because when those backups are made, their write speed is throttled by macOS.

I/O Throttling

Documentation of I/O policy is in the man page for the getiopolicy_np() call in the Standard C Library, last revised in 2019, and prevailing settings are in sysctl’s debug.lowpri_throttle_* values. Those draw a distinction between I/O to local disks, being “I/O sent to the media without going through a network”, such as I/O “to internal and external hard drives, optical media in internal and external drives, flash drives, floppy disks, ram disks, and mounted disk images which reside on these media”, and those to “remote volumes” “that require network activity to complete the operation”. The latter are “currently only supported for remote volumes mounted by SMB or AFP.”

Inclusion of remote volumes is a relatively recent change, as in the previous version of this man page from 2006, they were explicitly excluded as “remote volumes mounted through networks (AFP, SMB, NFS, etc) or disk images residing on remote volumes.”

Five policy levels are supported for IOPOL_TYPE_DISK:

  • IOPOL_IMPORTANT, the default, where I/O is critical to system responsiveness.
  • IOPOL_STANDARD, which may be delayed slightly to allow IOPOL_IMPORTANT to complete quickly, and presumably referred to in sysctl as Tier 1.
  • IOPOL_UTILITY, for brief background threads that may be throttled to prevent impact on higher policy levels, in Tier 2.
  • IOPOL_THROTTLE, for “long-running I/O intensive background work, such as backups, search indexing, or file synchronization”, that will be throttled to prevent impact on higher policy levels, in Tier 3.
  • IOPOL_PASSIVE, for mounting files from disk images, and the like, intended more for server situations so that lower policy levels aren’t slowed by them.

However, the idea that throttled I/O is intentionally slowed at all times isn’t supported by the explanation of how throttling works: “If a throttleable request occurs within a small time window of a request of higher priority, the thread that issued the throttleable I/O is forced to a sleep for a short period. This slows down the thread that issues the throttleable I/O so that higher-priority I/Os can complete with low-latency and receive a greater share of the disk bandwidth.”

Settings in sysctl for Tier 3 give the window duration as 500 milliseconds, and the sleep period as 200 ms, except for SSDs, whose sleep period is considerably shorter at just 25 ms. Those also set a maximum size for I/O at 131,072 bytes. You can view those settings in the debug section of Mints’ sysctl viewer.
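That mechanism can be sketched as a simple model (an illustration based on the settings above, not Apple’s implementation; the constant names here are abbreviations, not the actual sysctl variable names):

```python
# Illustrative model of the throttling mechanism described above: if a
# throttleable (Tier 3) I/O is issued within a short window following a
# higher-priority I/O, the issuing thread is put to sleep for a short period.
# Values are the debug.lowpri_throttle_* settings quoted in the text.

WINDOW_MS = 500       # window after a higher-priority request
SLEEP_HDD_MS = 200    # sleep imposed for hard disks
SLEEP_SSD_MS = 25     # sleep imposed for SSDs

def throttle_delay(ms_since_high_priority_io: float, is_ssd: bool) -> float:
    """Return the sleep imposed on a throttleable I/O, in milliseconds."""
    if ms_since_high_priority_io < WINDOW_MS:
        return SLEEP_SSD_MS if is_ssd else SLEEP_HDD_MS
    return 0.0
```

On this model, a Tier 3 write to a hard disk issued 100 ms after a high-priority request would sleep for 200 ms, while the same write with no recent high-priority activity wouldn’t be delayed at all.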

Some years ago it was discovered that the user can globally disable IOPOL_THROTTLE, and presumably all other IOPOL_TYPE_DISK policies, with the command
sudo sysctl debug.lowpri_throttle_enabled=0
although that doesn’t persist across restarts, and isn’t documented in the man page for sysctl. This is provided as an option in St. Clair Software’s App Tamer, to “Accelerate Time Machine backups”, for those who’d rather avoid the command line.

Apple silicon Macs

Since the man page for getiopolicy_np() was last revised, Apple silicon Macs have changed I/O. Most is now handled in some way through specialised Arm cores within the chip, and CPU cores come in two (now three) types, Efficiency and Performance (and Super in the M5 family). Background threads such as those performing the work associated with IOPOL_THROTTLE are now most likely to be running on Efficiency (or M5 Performance) CPU cores.

If you wanted to do away with all throttling, you’d not only have to change the debug.lowpri_throttle_enabled setting, but you’d also need to move those background threads to run on Performance (M5 Super) CPU cores, which isn’t possible unless you have control over the code. There is evidence that macOS may do this anyway when performing the first full Time Machine backup, which achieves significantly faster speeds than subsequent incremental backups.

Dangers

The big disadvantage of completely disabling I/O throttling policy is that it can only be applied globally, so it also affects I/O for Spotlight indexing, file synchronisation, and other background tasks, as well as those needing high-priority access to I/O. Leaving the policy disabled in normal circumstances could readily lead to adverse side-effects, allowing Spotlight indexing threads to swamp important user storage access, for example.

Conclusions

Even when Macs were simpler and all CPU cores were the same, disabling I/O throttling was at best experimental, and ran the risk of compromising higher priority I/O. In Apple silicon Macs, relocating threads from E to P cores isn’t possible, except when running them in a virtual machine. Without that, there seems little if any benefit from disabling I/O throttling. However, the claim that there’s no performance benefit to be obtained from faster backup storage media is demonstrably false. Perhaps it’s time to return I/O throttling to the obscurity it deserves.

In the background: Putting threads to work

By: hoakley
17 February 2026 at 15:30

To take best advantage of background processing, multiple cores, and different core types, apps have to be designed to run efficiently in multiple threads that co-operate with one another and with macOS. This article explains the basics of how that works in Apple silicon Macs.

Single thread

Early apps, and many current command tools, run as simple sequences of instructions. Cast your mind back to days of learning to code in a first language like Basic, and you’d effectively write something like
BEGIN
handle any options passed in command
process data or files until all done
report any errors
END

There are still plenty of command tools that work like that, and run entirely in a single thread. One example is the tar command, which dates back to 1979, and its current version in macOS remains single-threaded. As a result it can’t benefit from any additional cores, and runs at essentially the same speed on base, Pro, Max and Ultra variants of an M chip family.

Multiple threads

Since apps gained windowed interfaces, rather than running from BEGIN to END and quitting, they’re primarily driven by the user. The app itself sits in an event loop waiting for you to tell it what to do through menu commands, tools or other controls. When you do that, the app hands that task over to the required code to handle, and returns to its main event loop. This is much better, as you see an app that remains responsive at all times, even though it may have other threads that are working frantically in the background on tasks you have given them.

We’ve now divided that simple linear code into at least two parts: a foreground main thread that the user interacts with, and worker threads running in the background that it farms work out to. Even when run on a single core with multitasking, that ensures the main thread appears responsive at all times, and we can pair it with a Stop command, commonly implemented as ⌘-., to signal the worker thread to halt its processing.
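That division of labour can be sketched in Python, with a main thread that signals a background worker to stop (a minimal illustration of the pattern, not how macOS implements its event loop):

```python
import threading
import time

stop_requested = threading.Event()   # set by the main thread's Stop command

def worker(jobs: list[int], results: list[int]) -> None:
    """Background worker that checks for a stop signal between work items."""
    for job in jobs:
        if stop_requested.is_set():
            return                   # halt promptly when the user hits Stop
        results.append(job * job)    # stand-in for real processing
        time.sleep(0.01)

results: list[int] = []
t = threading.Thread(target=worker, args=(list(range(100)), results))
t.start()
time.sleep(0.05)      # the main "event loop" stays responsive meanwhile
stop_requested.set()  # the user chose Stop (⌘-.)
t.join()              # worker halts long before finishing all 100 jobs
```

The key point is that the worker polls the stop flag between units of work, so the main thread never blocks waiting for it, and cancellation takes effect within one work item.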

Threads come into their own, though, with multiple CPU cores. If the worker code can be implemented so more than one thread can be processing a job at any time, then each core can run a separate thread, and complete that job more quickly. For example, instead of compressing an image one row of pixels at a time, the image could be divided up into rectangles, and each farmed out to be compressed by a separate thread. Alternatively, the compression process could be designed in stages, each of which runs in a separate thread, with image data fed through their pipeline.
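The first of those approaches can be sketched with a thread pool (a minimal illustration; in pure Python the GIL limits real parallel speedup for CPU-bound work, but the structure is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def compress_rows(rows: list[list[int]]) -> int:
    """Stand-in for compressing one rectangle of an image."""
    return sum(sum(r) for r in rows)

# A fake 8-row "image"; split it into 4 strips, one per worker thread.
image = [[y * 10 + x for x in range(10)] for y in range(8)]
strips = [image[i:i + 2] for i in range(0, 8, 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(compress_rows, strips))

# The combined result matches processing the whole image in one thread.
assert sum(partials) == compress_rows(image)
```

With genuinely parallel workers, each strip can be processed on a separate core, and the job completes in roughly the time of the slowest strip rather than the sum of all of them.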

Using multiple threads has its own limitations, and there are trade-offs as the number of threads increases, and more work has to be done moving data between threads and the cores they’re running on.

These can be seen in a little demonstration involving CPU-intensive floating point computations in a tight loop:

  • running the computation in a single thread, on a single CPU core, normally processes 0.25 billion loops per second
  • splitting the total work to be performed into 2 threads increases that to 0.48 billion/s
  • in 10 threads, it rises to 2.1 billion/s
  • in 20 threads, it reaches a peak of 4.2 billion/s
  • in 50 threads, overhead has a negative impact on performance, and the rate falls to 3.3 billion/s.

Those are for the 10 P and 4 E cores in an M4 Pro, at high Quality of Service, so run preferentially on its P cores. In this case, picking the optimum number of threads could accelerate its performance by a factor of 16.8.
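Those throughput figures can be turned into speedups with a quick calculation (the rates are those quoted above):

```python
# Loop rates (billions of loops/s) from the demonstration, by thread count.
rates = {1: 0.25, 2: 0.48, 10: 2.1, 20: 4.2, 50: 3.3}

best_threads = max(rates, key=rates.get)     # thread count with peak rate
speedup = rates[best_threads] / rates[1]     # relative to a single thread
print(best_threads, round(speedup, 1))       # 20 threads, 16.8x
```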

Core allocation

CPUs with a single core type normally try to balance load across their cores, and, coupled with a scheme for software to indicate the priority of its threads, their scheduling isn’t as complex as that for CPUs with two or more core types with contrasting performance characteristics. For its Alder Lake CPUs, Intel uses an elaborate Hardware Feedback Interface, marketed as Intel Thread Director, that works with operating systems such as Windows 11 and recent Linux kernels to allocate threads to cores.

With its long experience managing P and E cores in iPhones and iPads, Apple has chosen a scheme based on a Quality of Service (QoS) metric, together with software management of core cluster frequency. The developer API is simplified to offer a limited number of values for QoS:

  • QoS 9 (binary 001001), named background and intended for threads performing maintenance, which don’t need to be run with any higher priority.
  • QoS 17 (binary 010001), utility, for tasks the user doesn’t track actively.
  • QoS 25 (binary 011001), userInitiated, for tasks that the user needs to complete to be able to use the app.
  • QoS 33 (binary 100001), userInteractive, for user-interactive tasks, such as handling events and the app’s interface.

There’s also a ‘default’ value between 17 and 25, an unspecified value, and you might come across others used by macOS.
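The pattern behind those values can be verified in a few lines (a check on the numbers listed above, not an Apple API):

```python
# QoS values and their binary encodings, as listed above.
qos = {
    "background":      9,   # 0b001001
    "utility":         17,  # 0b010001
    "userInitiated":   25,  # 0b011001
    "userInteractive": 33,  # 0b100001
}

for name, value in qos.items():
    print(f"{name:15} {value:2d} = {value:06b}")

# Each step up adds 8: the low three bits stay 001, while the top
# three bits count up from 001 (background) to 100 (userInteractive).
```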

As a general rule, threads assigned a QoS of background and below will be allocated to run on E cores, while those of utility and above will be allocated to run on P cores when they’re available. Those higher QoS threads may also be run on E cores when P cores are already fully committed, but low QoS threads are seldom if ever promoted to run on P cores.

Internally, QoS uses a different, finer-grained metric, taking into account other factors, but those values aren’t exposed to the developer. QoS provided by the developer and set in their code is considered a request to guide macOS, and not an absolute determinant of which cores that code will be run on.

Cluster frequency

P and E cores are operated at a wide range of frequencies that are determined by macOS according to more complex heuristics, also involving QoS. In broad terms, when P cores are running threads they do so at frequencies close to their maximum, as do E cores when they’re running high QoS threads that should have been run on P cores. However, when E cores are only running low QoS threads, they do so at a frequency close to idle, for maximum efficiency.

One significant exception to that is the two E cores in M1 Pro and M1 Max chips. To compensate for the fact that there are only two of them, when they’re running two or more threads of low QoS they’re run at a higher frequency. Frequency control of P cores in M4 Pro chips is also more complicated, as their frequency is progressively reduced as they’re loaded with more threads, presumably to constrain the heat generated when running at those higher loads.

There are times when internal conditions, such as Low Power Mode, override a request for code to be run as userInteractive. You can experience that yourself if you try running apps that would normally be given P cores on a laptop with little remaining in its battery. All threads are then diverted to E cores to eke out the remaining battery life, and their cluster frequency is reduced to little more than idle.

Summary

  • Developers determine the performance of their code according to how it’s divided into threads, and their QoS.
  • QoS advises macOS of the performance expectation for each thread, and is a request not a requirement.
  • macOS uses QoS and other factors to allocate threads to specific core types and clusters.
  • macOS applies more complex heuristics to determine the frequency at which to run each cluster.
  • System-wide policy such as Low Power Mode can override QoS.

Last Week on My Mac: Why E cores make Apple silicon fast

By: hoakley
8 February 2026 at 16:00

If you use an Apple silicon Mac, I’m sure you’ve been impressed by its performance. Whether you’re working with images, audio, video or building software, we’ve enjoyed a new turn of speed since the M1 on day 1. While most attribute this to the Performance cores, as the name suggests, much of it is in truth the result of the unsung Efficiency cores, and how they keep background tasks where they should be.

To see what I mean, start your Apple silicon Mac up from the cold, and open Activity Monitor in its CPU view, with its CPU History window open as well. For the first five to ten minutes you’ll see its E cores are a wall of red and green with Spotlight’s indexing services, CGPDFService, mediaanalysisd, BackgroundShortcutRunner, Siri components, its initial Time Machine backup, and often an XProtect Remediator scan. Meanwhile its P cores are largely idle, and if you were to dive straight into using your working apps, there’s plenty of capacity for them to run unaffected by all that background mayhem.

It’s this stage that scares those who are still accustomed to using Intel Macs. Seeing processes using more than 100% CPU is terrifying, because they know that Intel cores can struggle under so much load, affecting user apps. But on an Apple silicon Mac, who notices or cares that there are over a dozen mdworker processes each taking a good 50% CPU simultaneously? After all, this is what the Apple silicon architecture is designed for. Admittedly the impression isn’t helped by a dreadful piece of psychology, as those E cores at 100% are probably running at a frequency a quarter of that of P cores shown at the same 100%, making visual comparison completely false.*

This is nothing new. Apple brought it to the iPhone 7 in 2016, in its first SoC with separate P and E cores. That’s an implementation of Arm’s big.LITTLE, announced in 2011, which built on development work at Cray and elsewhere in the previous decade. What makes the difference in Apple silicon Macs is how threads are allocated to the two different CPU core types on the basis of a metric known as Quality of Service, or QoS.

As with so much in today’s Macs, QoS has been around since OS X 10.10 Yosemite, six years before it became so central in performance. When all CPU cores are the same, it has limited usefulness over more traditional controls like Posix’s nice scheduling priority. All those background tasks still have to be completed, and giving them a lower priority only prolongs the time they take on the CPU cores, and the period in which the user’s apps are competing with them for CPU cycles.

With the experience gained from its iPhones and other devices, Apple’s engineers had a better solution for future Macs. In addition to providing priority-based queues, QoS makes a fundamental distinction between threads run in the foreground and those run in the background. While foreground threads will be run on P cores when they’re available, they can also be scheduled on E cores when necessary. But background threads aren’t normally allowed to run on P cores, even if they’re delayed by the load on the E cores they’re restricted to. We know this from our inability to promote existing background threads to run on P cores using St. Clair Software’s App Tamer and the command tool taskpolicy.

This is why, even if you sit and watch all those background processes loading the E cores immediately after starting up, leaving the P cores mostly idle, macOS won’t try running them on its P cores. If it did, even if you wanted it to, the distinction between foreground and background, P and E cores would start to fall apart, our apps would suffer as a consequence, and battery endurance would decline. Gone are the days of crashing mdworker processes bringing our Macs to their knees with a spinning beachball every few seconds.

If seeing all those processes using high % CPU can look scary, the inevitable consequence in terms of software architecture might seem terrifying. Rather than building monolithic apps, many of their tasks are now broken out into discrete processes run in the background on demand, on the E cores when appropriate. The fact that an idle Mac has over 2,000 threads running in over 600 processes is good news, and the more of those that are run on the E cores, the faster our apps will be. The first and last M-series chips to have only two E cores were the M1 Pro and Max, since when every one has had at least four E cores, and some as many as six or eight.

Because Efficiency cores get the background threads off the cores we need for performance.

* For the record, I have measured those frequencies using powermetrics. For an M4 Pro, for example, high QoS threads running on the P cores benefit from frequencies close to the P core max of 4,512 MHz. Low QoS threads running on the E cores are run at frequencies close to idle, typically around 1,050 MHz. However, when the E cores run high QoS threads that have overflowed from the P cores, the E cores are normally run at around their maximum of 2,592 MHz. By my arithmetic, 1,050 divided by 4,512 is 0.233, which is slightly less than a quarter. Other M-series chips are similar.
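That arithmetic is easily verified (frequencies as measured above with powermetrics):

```python
# M4 Pro core frequencies in MHz, as quoted in the footnote.
P_MAX_MHZ = 4512   # P cores running high QoS threads, near maximum
E_LOW_MHZ = 1050   # E cores running only low QoS threads, near idle
E_MAX_MHZ = 2592   # E cores running high QoS threads overflowed from P cores

ratio = E_LOW_MHZ / P_MAX_MHZ
print(round(ratio, 3))   # 0.233, slightly less than a quarter
```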
