
In the background: Putting threads to work

By: hoakley
17 February 2026 at 15:30

To take best advantage of background processing, multiple cores, and different core types, apps have to be designed to run efficiently in multiple threads that co-operate with one another and with macOS. This article explains the basics of how that works in Apple silicon Macs.

Single thread

Early apps, and many current command tools, run as simple sequences of instructions. Cast your mind back to the days of learning to code in a first language like BASIC, and you’d effectively write something like
BEGIN
handle any options passed in command
process data or files until all done
report any errors
END

There are still plenty of command tools that work like that, and run entirely in a single thread. One example is the tar command, which dates back to 1979, and its current version in macOS remains single-threaded. As a result it can’t benefit from any additional cores, and runs at essentially the same speed on base, Pro, Max and Ultra variants of an M chip family.

Multiple threads

Since apps gained windowed interfaces, rather than running from BEGIN to END and quitting, they’re primarily driven by the user. The app itself sits in an event loop waiting for you to tell it what to do through menu commands, tools or other controls. When you do that, the app hands that task over to the required code to handle, and returns to its main event loop. This is much better, as you see an app that remains responsive at all times, even though it may have other threads that are working frantically in the background on tasks you have given them.

We’ve now divided that simple linear code into at least two parts: a foreground main thread that the user interacts with, which farms work out to worker threads running in the background. Even when run on a single core with multitasking, that ensures the main thread appears responsive at all times, and we could pair it with a Stop command, commonly implemented as ⌘-., to signal the worker thread to halt its processing.
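
That division, with the main thread signalling a worker to stop, can be sketched in Python. This is a hypothetical illustration of the pattern, not how a macOS app is actually structured; the worker and its workload are invented for the example:

```python
import threading
import time

def worker(stop: threading.Event, results: list) -> None:
    """Background worker: process items until finished or told to stop."""
    for item in range(100_000_000):
        if stop.is_set():               # the Stop command was given
            break
        results.append(item * 2)        # stand-in for real processing

stop = threading.Event()
results: list = []
t = threading.Thread(target=worker, args=(stop, results))
t.start()

time.sleep(0.01)    # the main thread stays free to run its event loop
stop.set()          # the user chose Stop (⌘-.)
t.join()
print(f"processed {len(results):,} items before stopping")
```

The key design point is that the worker polls a shared flag rather than being killed: it can finish the item in hand and leave its data in a consistent state.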

Threads come into their own, though, with multiple CPU cores. If the worker code can be implemented so more than one thread can be processing a job at any time, then each core can run a separate thread, and complete that job more quickly. For example, instead of compressing an image one row of pixels at a time, the image could be divided up into rectangles, and each farmed out to be compressed by a separate thread. Alternatively, the compression process could be designed in stages, each of which runs in a separate thread, with image data fed through their pipeline.
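
The tile-splitting approach might look like this sketch, with a hypothetical compress_tile step standing in for real image compression (here zlib, which releases CPython’s GIL while compressing, so the threads can genuinely overlap):

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

def compress_tile(tile: bytes) -> bytes:
    # zlib releases the GIL during compression, so threads overlap
    return zlib.compress(tile)

def compress_image(rows: list[bytes], tile_height: int = 64) -> list[bytes]:
    """Split rows of pixels into tiles and compress each in its own thread."""
    tiles = [b"".join(rows[i:i + tile_height])
             for i in range(0, len(rows), tile_height)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compress_tile, tiles))   # order is preserved

rows = [bytes([n % 256]) * 1024 for n in range(256)]  # a fake 1024x256 image
compressed = compress_image(rows)
print(len(compressed), "tiles compressed")            # 256 rows / 64 = 4 tiles
```

Because `pool.map` returns results in input order, the tiles can simply be concatenated afterwards, avoiding any locking between workers.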

Using multiple threads has its own limitations, though. There are trade-offs as the number of threads increases, since more work has to be done moving data between threads and the cores they’re running on.

These can be seen in a little demonstration involving CPU-intensive floating point computations in a tight loop:

  • running the computation in a single thread, on a single CPU core, normally processes 0.25 billion loops per second
  • splitting the total work to be performed into 2 threads increases that to 0.48 billion/s
  • in 10 threads, it rises to 2.1 billion/s
  • in 20 threads, it reaches a peak of 4.2 billion/s
  • in 50 threads, overhead has a negative impact on performance, and the rate falls to 3.3 billion/s.

Those figures are for the 10 P and 4 E cores in an M4 Pro, run at high Quality of Service so the threads were allocated preferentially to its P cores. In this case, picking the optimum number of threads accelerated performance by a factor of 16.8.
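
A rough Python sketch of that experiment’s structure follows. It won’t reproduce the Apple silicon figures, as CPython’s GIL stops pure-Python threads scaling across cores, and the workload and loop counts are invented; it shows only how the total work is split across a varying number of threads:

```python
from concurrent.futures import ThreadPoolExecutor
import math
import time

TOTAL_LOOPS = 200_000

def spin(n: int) -> int:
    """CPU-intensive floating point work in a tight loop."""
    x = 1.0
    for _ in range(n):
        x = math.sqrt(x + 1.0)
    return n

def measure(threads: int) -> float:
    """Split TOTAL_LOOPS across `threads` workers; return loops per second."""
    share = TOTAL_LOOPS // threads
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        done = sum(pool.map(spin, [share] * threads))
    return done / (time.perf_counter() - start)

for n in (1, 2, 10):
    print(f"{n:2d} threads: {measure(n):,.0f} loops/s")
```

On a language or runtime without a global lock, the loops-per-second figure would rise with thread count until scheduling and data-movement overheads bite, as in the bullet list above.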

Core allocation

Schedulers for CPUs with a single core type normally try to balance load across the cores and, coupled with a scheme for software to indicate the priority of its threads, aren’t as complex as those for CPUs with two or more core types of contrasting performance characteristics. For its Alder Lake CPUs, Intel uses an elaborate Hardware Feedback Interface, marketed as Intel Thread Director, that works with operating systems such as Windows 11 and recent Linux kernels to allocate threads to cores.

With its long experience managing P and E cores in iPhones and iPads, Apple has chosen a scheme based on a Quality of Service (QoS) metric, together with software management of core cluster frequency. The developer API is simplified to offer a limited number of values for QoS:

  • QoS 9 (binary 001001), named background and intended for threads performing maintenance, which don’t need to be run with any higher priority.
  • QoS 17 (binary 010001), utility, for tasks the user doesn’t track actively.
  • QoS 25 (binary 011001), userInitiated, for tasks that the user needs to complete to be able to use the app.
  • QoS 33 (binary 100001), userInteractive, for user-interactive tasks, such as handling events and the app’s interface.

There’s also a ‘default’ value between 17 and 25, an unspecified value, and you might come across others used by macOS.

As a general rule, threads assigned a QoS of background and below will be allocated to run on E cores, while those of utility and above will be allocated to run on P cores when they’re available. Those higher QoS threads may also be run on E cores when P cores are already fully committed, but low QoS threads are seldom if ever promoted to run on P cores.
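
That rule can be expressed as a toy model. This is my own simplification of the behaviour described above, not Apple’s scheduler, and the threshold constants are taken from the QoS values listed earlier:

```python
def allocate(qos: int, p_cores_free: bool) -> str:
    """Toy model of core allocation: background (QoS 9) and below stay on
    E cores; utility (17) and above prefer P cores, overflowing to E cores
    only when the P cores are already fully committed."""
    BACKGROUND = 9
    if qos <= BACKGROUND:
        return "E"                       # low QoS is seldom promoted to P
    return "P" if p_cores_free else "E"  # high QoS may overflow to E

print(allocate(9, p_cores_free=True))    # background stays on E
print(allocate(33, p_cores_free=True))   # userInteractive gets a P core
print(allocate(25, p_cores_free=False))  # userInitiated overflows to E
```

Remember that in the real system QoS is a request rather than a guarantee, so even this simplified mapping is only what macOS does when nothing else intervenes.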

Internally, macOS uses a different, finer-grained metric for QoS, taking other factors into account, but those values aren’t exposed to the developer. The QoS provided by the developer and set in their code is treated as a request to guide macOS, not an absolute determinant of which cores that code will run on.

Cluster frequency

P and E cores are operated at a wide range of frequencies that are determined by macOS according to more complex heuristics, also involving QoS. In broad terms, when P cores are running threads they do so at frequencies close to their maximum, as do E cores when they’re running high-QoS threads that would otherwise have been run on P cores. However, when E cores are only running low-QoS threads, they do so at a frequency close to idle, for maximum efficiency.

One significant exception is the two E cores in M1 Pro and M1 Max chips. To compensate for there being only two of them, when they’re running two or more threads of low QoS they’re run at a higher frequency. Frequency control of the P cores in M4 Pro chips is also more complicated, as their frequency is progressively reduced as they’re loaded with more threads, presumably to constrain the heat generated at those higher loads.

There are times when internal conditions, such as Low Power Mode, override a request for code to be run as userInteractive. You can experience that yourself if you try running apps that would normally be given P cores on a laptop with little remaining in its battery. All threads are then diverted to E cores to eke out the remaining battery life, and their cluster frequency is reduced to little more than idle.

Summary

  • Developers determine the performance of their code according to how it’s divided into threads, and their QoS.
  • QoS advises macOS of the performance expectation for each thread, and is a request not a requirement.
  • macOS uses QoS and other factors to allocate threads to specific core types and clusters.
  • macOS applies more complex heuristics to determine the frequency at which to run each cluster.
  • System-wide policy such as Low Power Mode can override QoS.

Explainer: % CPU in Activity Monitor

By: hoakley
14 February 2026 at 16:00

The faster and more sophisticated the CPUs in our Macs get, the more anguished we get over their activity and performance. While there are alternatives that can display measurements of CPU activity in the menu bar and elsewhere, the most available tool is Activity Monitor. This article explains what it displays as % CPU, and how that needs careful interpretation.

Activity Monitor

The CPU view in Activity Monitor samples CPU and GPU activity over brief periods of time, displays results for the last sampling period, and updates those every 1-5 seconds. You can change the sampling period used in the Update Frequency section of the View menu, and that should normally be set to Very Often, for a period of 1 second.

This was adequate for many purposes with older M-series chips, but thread mobility in more recent chips can be expected to move threads from core to core, and between whole clusters, at a frequency similar to available sampling periods. That loses detail as to what’s going on in cores and clusters, and may give the false impression that a single thread is running simultaneously on multiple cores.

However, sampling frequency also determines how much % CPU is taken by Activity Monitor itself. While periods of 0.1 second and less are feasible with the command tool powermetrics, in Activity Monitor they would start to affect its results. If you need to see finer details, then you’ll need to use Xcode Instruments or powermetrics instead.

% CPU

The heart of the CPU view is what Activity Monitor refers to as % CPU, defined as the “percentage of CPU capability that’s being used by processes”. As far as I can tell, this is essentially the same as active residency in powermetrics, and that’s central to understanding the strengths and shortcomings of % CPU.

Take a CPU core that’s running at 1 GHz. Every second it ‘ticks’ forward one billion times. If an instruction were to take just one clock cycle, then it could execute a billion of those every second. In any given second, that core is likely to spend some time idle and not executing any instructions. If it were to execute half a billion instructions in any given second, and spend the other half of the time idle, then it has an idle residency of 50% and an active residency of 50%, and that would be represented by Activity Monitor as 50% CPU. So a CPU core that’s fully occupied executing instructions, and doesn’t idle at all, has an active residency of 100%.

Expressed more formally, residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency, a feature of more recent cores, is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.

To arrive at the % CPU figures shown in Activity Monitor, the active residency of all the CPU cores is added together. If your Mac has four P and four E cores and they’re all fully occupied with 100% active residency each, then the total % CPU shown will be 800%.
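
The arithmetic is simple addition, as this sketch using the figures above shows:

```python
def total_cpu_percent(active_residency: list[float]) -> float:
    """Total % CPU as Activity Monitor computes it: the sum of each core's
    active residency, taking no account of core type or frequency."""
    return sum(active_residency)

# Four P and four E cores, all fully occupied:
print(total_cpu_percent([100.0] * 8))   # 800.0

# Hyper-threading doubles the core count, so an 8-core Intel CPU can show:
print(total_cpu_percent([100.0] * 16))  # 1600.0
```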

Cautions

There are two situations where this can be misleading if you’re not careful.

Intel CPUs feature Hyper-threading, where each physical core acquires a second, virtual core that can also run at another 100% active residency. In the CPU History window those virtual cores are shown with even numbers, and in % CPU they double the total percentage. So an 8-core Intel CPU then has a total of 16 cores, and can reach 1,600% when running flat out with Hyper-threading.

[Image: coremanintel]

This eight-core Intel Xeon runs a short burst with full Hyper-threading, during which it gains the eight virtual cores seen on the right. According to the % CPU in Activity Monitor shown below, it was then running at over 1,000%.

[Image: cpuendstop]

The other situation affects Apple silicon chips, as their CPU cores can be run at a wide range of different frequencies under the control of macOS. However, Activity Monitor makes no allowance for their frequency. When it shows a core or total % CPU, that could be running at a frequency as low as 600 MHz in the M1, or as high as 4,512 MHz in the M4, more than seven times as fast. Totalling these percentages also makes no allowance for the different processing capacity of Performance and Efficiency cores.

Thus an M4 chip’s CPU cores could show a total of 400% CPU when all four E cores are running at 1,020 MHz with 100% active residency, or when four of its P cores are running at 4,512 MHz with 100% active residency. Yet the P cores would have an effective throughput of as much as six times that of the E cores. Interpreting % CPU isn’t straightforward, as nowhere does Activity Monitor provide core frequency data.
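
One way to see why equal % CPU figures can mean very different throughput is to weight each core’s active residency by its frequency. This is a rough model using the frequencies quoted above, and it deliberately ignores the P cores’ additional per-clock throughput, which widens the gap further towards the factor of six mentioned:

```python
def frequency_weighted(residency_pct: float, mhz: float) -> float:
    """Effective work rate in 'active MHz': residency times frequency."""
    return residency_pct / 100.0 * mhz

e_cores = 4 * frequency_weighted(100.0, 1020)   # four E cores, showing 400% CPU
p_cores = 4 * frequency_weighted(100.0, 4512)   # four P cores, also 400% CPU
ratio = p_cores / e_cores
print(f"identical 400% CPU, but P/E clock-rate ratio = {ratio:.1f}")
# about 4.4 on clock speed alone, before per-clock differences
```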

[Image: tuneperf1]

In this example from an M4 Pro, the left of each trace shows the final few seconds of four test threads running on the E cores, where they took 99 seconds to complete at a frequency of around 1,020 MHz; then on the right exactly the same four test threads completed in 23 seconds on P cores running at nearer 4,000 MHz. Note how lightly loaded the P cores appear, although they were executing the same code at over four times the speed.

Threads and more

For most work, you should display all the relevant columns in the CPU view, including Threads and GPU.

[Image: tuneperf2]

Threads are particularly important for processes to be able to run on multiple cores simultaneously, as they’re fairly self-contained packages of executable code that macOS can allocate to a core to run. Processes that consist of just a single thread may get shuffled around between different cores, but can’t run on more than one of them at a time.

Another limitation of Activity Monitor is that it can’t tell you which threads are running on each core, or even which type of core they’re running on. When there are no other substantial threads active, you can usually guess which threads are running where by looking in the CPU History window, but when there are many active threads on both E and P cores, you can’t tell which process owns which thread.

Beachballing

A common mistake is to assume that high % CPU is somehow related to the appearance of a spinning beachball pointer. Although they can be related, they tell you different things about threads, processes and apps.

[Image: spinningbeachball]

If you look in the Force Quit Applications window when an app is spinning a beachball, it doesn’t tell you anything about how much % CPU the app is taking, merely that the app is unresponsive. The most common cause of that is when the app’s main thread is too busy with a task to check in with macOS periodically.

All apps have a main thread, and many also have additional threads that handle time-consuming or computationally-intensive work. In most cases, well-written apps will avoid the main thread getting bogged down and unresponsive. One of the most common examples of this is with connections to remote sites. If those are handled in the main thread, then the whole app could be waiting for a slow-responding server to deliver its data, during which the app will be unresponsive, and macOS displays the spinning beachball.

The solution there is to handle the Internet connection asynchronously, allowing the main thread to get on with interacting with the user. When a background thread receives its data from the remote server, it can then update the main thread with that information.
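
A minimal sketch of that asynchronous pattern, with a stand-in slow_fetch replacing the real network call and a queue delivering the result back to the main loop:

```python
import queue
import threading
import time

def slow_fetch(url: str) -> str:
    """Stand-in for a slow remote server."""
    time.sleep(0.2)
    return f"data from {url}"

def fetch_async(url: str, inbox: queue.Queue) -> None:
    inbox.put(slow_fetch(url))          # deliver the result to the main thread

inbox: queue.Queue = queue.Queue()
threading.Thread(target=fetch_async,
                 args=("https://example.com", inbox), daemon=True).start()

# The main event loop stays responsive while the fetch is in flight.
handled = 0
while True:
    try:
        data = inbox.get(timeout=0.01)  # poll briefly for the worker's result
        print("received:", data)
        break
    except queue.Empty:
        handled += 1                    # free to handle user events meanwhile
print(f"main thread handled {handled} events while waiting")
```

In a real app the event loop belongs to the framework, and the worker posts its result back to the main thread rather than the main thread polling, but the division of labour is the same.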

Sometimes time-consuming tasks have to be handled in the main thread, and there may be no way to avoid unresponsiveness, but those are unusual if not exceptional now. At the same time, the appearance of the spinning beachball doesn’t mean that app has crashed or frozen, and it may well just be trying to get on with its work as well as it can in the circumstances.

It’s easy for an app to spin the beachball when it’s taking far less than 100% CPU, and many apps that can take over 500% in the right circumstances should remain fully responsive throughout.

Key points

  • % CPU is the percentage of time that CPU core isn’t idle, but is actively processing instructions. It takes no account of core type or frequency.
  • Total % CPU is the total of all individual values for CPU cores, and a maximum of 100% times the number of cores. For a chip with 8 cores, maximum total % CPU is 800%.
  • This can become confused in Intel CPUs with Hyper-threading, as that adds another set of virtual cores.
  • Apple silicon CPU cores operate at a wide range of frequencies, which aren’t taken into account in % CPU.
  • High % CPU is completely different from what happens when an app spins the beachball, which is the result of the app’s main thread becoming unresponsive.
  • An app can spin the beachball when its total % CPU is relatively low, and an app with a high total % CPU may remain highly responsive.

Is the Human Brain Single-Threaded?

By: dimlau
24 October 2025 at 10:50

As a human, I’m reluctant to accept that “the brain is single-threaded”. But so often I plan to do several things, A, B, C and D, only to find in the end that I can do them only one at a time. How is that to be explained?

For the past ten days or so I’ve been making a game. “Making a game” can be understood however you like: conceiving, writing and building one, and for me the process is also playing a game. My skills being poor, things went smoothly enough at first, but once the game had grown features like a world, characters and a map, it began to hang regularly when the program closed. I copied the error log out to look for the cause, and found it was a problem called deadlock. Roughly, it means this: to prevent one character from increasing the count of some item in the game world while another character decreases the same item, leaving no way to settle what the final count should be, the item is given a lock. To change the count you must first take the lock, shutting yourself in before operating, so nobody else can tamper with the data in the meantime. The trouble is that in complex logic many data changes are nested, which eventually leads to this: one person locks A, planning to unlock it only after modifying B; at the same time another person has locked B, planning to unlock it only after modifying A. The two are stuck in a stand-off, and neither can move.
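
The deadlock scenario described above, and the standard fix of always acquiring locks in one consistent global order, can be sketched like this (item names are hypothetical):

```python
import threading

lock_a = threading.Lock()   # protects item A's count
lock_b = threading.Lock()   # protects item B's count
counts = {"A": 10, "B": 10}

def transfer(first: threading.Lock, second: threading.Lock,
             src: str, dst: str) -> None:
    # Both workers take the locks in the SAME global order (a before b),
    # so neither can hold one lock while waiting on the other.
    with first:
        with second:
            counts[src] -= 1
            counts[dst] += 1

t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, "A", "B"))
t2 = threading.Thread(target=transfer, args=(lock_a, lock_b, "B", "A"))
t1.start(); t2.start(); t1.join(); t2.join()
print(counts)  # {'A': 10, 'B': 10}
```

Had t2 taken lock_b before lock_a, the two threads could each hold one lock while waiting forever for the other: exactly the stand-off in the story.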

The people in that example are what we call threads, and well-designed threads should run in parallel without getting in each other’s way. Locks should be kept as small as possible, locking only the one piece of data being operated on… In short, if you’re unwilling to accept that the brain is single-threaded, you probably have to accept instead that the brain hasn’t evolved well enough: its locks are too big. While thinking about how to design the game, it’s completely impossible to work out a blog post at the same time. The thoughts for both are in flight simultaneously, but the moment you choose the game thread, you’ve locked all three regions of thought, “generate, organise, record”, at once. Other thoughts may still burst out at any moment, but they burst and dissipate, and the finished product you wanted never gets completed. It’s infuriating, but things can still only be done one at a time.

fin.

False Needs

By: 河石子
28 February 2024 at 21:57

Over the past six months or so, because of my work as a travelling salesman, I’ve often needed a computer to deal with documents. A few times, when I urgently needed to put a document together at short notice, I could only find an internet café and make do. After a few such experiences I had the idea of buying a laptop to carry in my bag, and since I’d sorted out the basics last year and had a little disposable money of my own, I formally started shopping for one. The criteria were simply that it be light and able to open some complex spreadsheets.

Given the principle with digital products of “buy new, not old, unless the money’s not there”, I originally planned on a ThinkPad X series, since that was the first brand I encountered when I started using computers. But after looking at the price of the new models, and the clunky styling of the old ones, I ended up buying a base-model 2020 M1 MacBook Air on a local forum. The first couple of days took some getting used to, because many shortcuts I’d used fluently on Windows changed on macOS, yet the shortcuts in the Office apps on macOS are the same as on Windows. To lighten the load on my already limited brain, I had to remap the shortcuts to match Windows.

As it happened, the desktop at home was still a Lenovo Yangtian all-in-one from ten years ago, whose i3-4130 was no longer up to the job: opening a 5 MB Excel spreadsheet meant watching it churn for ages. With the laptop upgraded, why not upgrade the desktop too? So I spent another 400 yuan locally on a machine with an i3-8100T, 16 GB of RAM and a 256 GB SSD, and 1,399 yuan on JD.com for a no-name 23.8-inch 4K monitor. I later felt the monitor was a bad buy: it has no VESA mount so it can’t go on an arm, and at the same price you can get an entry-level branded 27-inch 4K. But a poor man should have a poor man’s self-awareness, and “it still works, doesn’t it” is consolation enough. Now, as a chosen wage-earner, nothing can stop me working anywhere, any time.

After a week or so of normal use, I watched some videos online saying the base MacBook Air is sluggish for video editing, and that you need at least 16 GB of RAM for it to run smoothly. Why would I have that need? Because I planned to edit each year’s videos and photos of my kid together, to share with the family. But given “I’ve already bought it” and “it still works”, I could only attack the problem from another angle.

Doesn’t the newly bought i3-8100T happen to have 16 GB of RAM? It could be used for a Hackintosh. After looking into it properly, installing a Hackintosh is no longer as complicated as it was a few years ago with Chameleon or Clover: with OpenCore a simple configuration gets it booting, and the remaining details depend on whether you care. If you don’t care about a so-called “perfect” setup, anything that boots is usable. So I spent another 200 yuan on Xianyu for a “pulled from a machine” RX 570 8 GB graphics card. Everyone understands it’s really a mining-worn RX 470 reflashed, but in the spirit of “it still works”, buyer and seller both see through it and say nothing. In fact the UHD 630 integrated graphics in the i3-8100T might well have been enough. Configured like this it outperforms a 2018 Mac mini and is roughly equal to a similarly specified 2019 iMac, and including the monitor mine cost less than 2,000 yuan. What a bargain.

Because this machine, 600 yuan including the graphics card, has no M.2 slot and can’t take an NVMe SSD, I then planned to upgrade the processor, motherboard and storage: an i5-8500, a motherboard with an M.2 slot and a 500 GB NVMe drive, estimated at around 700 yuan. Although 10th-generation processors are the last that can run a Hackintosh perfectly on integrated graphics, by that same buy-new-not-old-unless-the-money’s-not-there principle, I could only consider the 8th generation.

Then I saw 2018–19 MacBook Pro “bottom halves” online, and thought that since I now had a 4K monitor I could get one to play with, at an estimated further 1,500 yuan or so.

Adding up all this tinkering: so far the desktop at home cost 600, the monitor 1,400 and the laptop 3,600; the planned desktop upgrade would cost about 700 and the headless Mac 1,500. By that reckoning I’d end up with one passable PC and one and a half Macs, for a total outlay of about 8,000 yuan.

Watching the shopping cart fill up, I looked back and realised that all I’d wanted was a laptop for working on the move, and a tool to cut the kid’s photos and videos together. What’s more, I hadn’t even tried whether my existing gear could do the job: the Jianying app I downloaded still has the little blue dot on its icon (never opened), and the FCPX downloaded with a shared Apple ID bought on Taobao has likewise never been opened (if I’ve never used it, it doesn’t count as piracy, right?).

Quite suddenly, I felt I should stop. At my age, in my zodiac year no less, I shouldn’t just follow every impulse; I should look honestly at my real needs rather than using excuses to manufacture false ones. It’s just like when I took up amateur radio, fishing or motorcycling: barely started, and already wanting to buy and buy on an unlimited budget. And so far, everything I’ve taken up has been three minutes of enthusiasm.

The timely fix was to divert my attention elsewhere. These past two days I’ve become obsessed with using scripts to check in to various apps; with something else absorbing my attention, I won’t spend so much thought on tinkering with computers. They’re only tools, after all.
