
Tune for Performance: Core types

By: hoakley
17 December 2024 at 15:30

When running apps on Intel Macs, where all the CPU cores are identical, the best you can hope for is that tasks make use of the cores available by running in multiple threads. On Apple silicon Macs, even single-threaded processes get a choice of CPU core type, and this article explains the limited control you have over that.

[Image: m4coremanagement1]

In my current model of CPU core allocation by macOS, shown in part above, Quality of Service (QoS) is the factor determining which core type a thread will be allocated to. QoS is normally hard-coded into an app, and seldom exposed to the user’s control.

If a thread is assigned the minimal QoS of Background (value 9) or less, macOS will allocate it to E cores, and it won’t be run on a P core. With a higher QoS, macOS allocates it to a P core if one is available; if not, it will be run on the E cores, with that cluster running at high frequency. Thus, threads with a higher QoS can be run on either P or E cores, depending on their availability.
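In code, QoS is normally set when work is dispatched. Here’s a minimal Swift sketch (the queue label and the work itself are hypothetical), showing the two ends of the range:

import Foundation

// Work queued at .background QoS (value 9) is eligible only for E cores.
let maintenanceQueue = DispatchQueue(label: "com.example.maintenance", qos: .background)
maintenanceQueue.async {
    // long-running housekeeping that shouldn't occupy P cores
}

// Work at .userInitiated QoS (value 25) goes to a P core when one is free,
// and otherwise to E cores run at high frequency.
DispatchQueue.global(qos: .userInitiated).async {
    // work the user is waiting on
}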

taskpolicy

There are times when you might wish to accelerate completion of threads normally run exclusively on E cores. For example, knowing that a particular backup might be large, you might elect to leave the Mac to get on with that as quickly as possible. There are two methods that appear intended to change the QoS of processes: the command tool taskpolicy, and its equivalent code function setpriority().

Experience with using those demonstrates that, while they can be used to demote threads to E cores, they can’t promote processes or threads already confined to E cores so that they can use both types. For instance, the command
taskpolicy -B -p 567
that should promote the process with PID 567 to run on both types of core, has no effect on processes or their threads that are run at low QoS. taskpolicy can be used to demote processes and their threads with higher QoS to use only E cores, though. Running the command
taskpolicy -b -p 567
does confine all threads to the E cluster, and can be reversed using the -B option for threads with higher QoS (but not those set to low QoS by the process).
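setpriority() behaves in the same way when called from code. This is only a sketch in Swift, assuming the PRIO_DARWIN_PROCESS and PRIO_DARWIN_BG constants from sys/resource.h are visible through the Darwin module, and using a hypothetical PID:

import Darwin

let pid: pid_t = 567   // hypothetical process ID

// Demote the whole process to background (E core) scheduling, like taskpolicy -b -p 567
if setpriority(PRIO_DARWIN_PROCESS, id_t(pid), PRIO_DARWIN_BG) != 0 {
    perror("setpriority")
}

// Passing 0 instead of PRIO_DARWIN_BG reverses that, like taskpolicy -B -p 567,
// but, as noted above, it can't promote threads the process itself runs at low QoS.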

[Image: qoscores1]

That can be seen in this CPU History window from Activity Monitor. An app has run four threads, two at low QoS and two at high QoS. On the left side of each core trace, they are run on their respective cores, as set by their QoS. The app’s process was then changed using taskpolicy -b and the threads run again, as seen on the right. The two threads with high QoS are then run together with the two with low QoS on the four E cores alone.

The best way to take advantage of this ability to demote high QoS threads to run on E cores is St. Clair Software’s excellent utility App Tamer.

Virtualisation

macOS virtual machines running on Apple silicon chips are automatically assigned a high QoS, and run preferentially on P cores. Thus, even when a VM runs threads at low QoS, those are run within the VM’s threads on the host’s P cores. This remains the only known method of electively running low QoS threads on P cores.

Game Mode, CPU cores

In the Info.plist of an application, the developer should assign that app to one of the standard LSApplicationCategoryTypes. If that’s one of the named game types, macOS automatically gives that app access to Game Mode. Among its benefits are “highest priority access” to the CPU cores, in particular the E cores, whose use by background threads is reduced. Other benefits include highest priority access to the GPU, and doubled Bluetooth sampling rate to reduce latency for input controllers and audio output. Game Mode is automatically turned on when the app is put into full-screen mode, and turned off when full-screen mode is stopped, although the user also has manual control through the mode’s item in the menu bar.

In practice, while this is beneficial to many games, it has little if any use for modifying core type allocation in other circumstances. As the user can’t modify an app’s Info.plist without breaking its signature and notarization, this is only of use to developers.
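For reference, that category is a single entry in Info.plist; a game would declare something like the following (public.app-category.games is one of the standard game values):

<key>LSApplicationCategoryType</key>
<string>public.app-category.games</string>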

Summary

  • The QoS assigned by an app to its threads is used by macOS to determine which core type to allocate them to.
  • Threads with a low (Background, 9) QoS are run exclusively on E cores. Those with higher QoS are run preferentially on P cores, and normally only on E cores when no P core is available.
  • App Tamer and taskpolicy can be used to demote higher QoS threads to be run on E cores, but low QoS threads can’t be promoted to be run on P cores.
  • macOS VMs run on P cores, so low QoS threads running in a VM will be run on P cores, and can be used to accelerate Background threads.
  • Game Mode gives games priority access to CPU cores, particularly to E cores, but users can’t elect to run other apps in Game Mode, and there don’t appear to be benefits in doing so.
  • If you feel an app would benefit from user control over CPU core type allocation through access to their QoS, suggest it to the app’s developer.

Explainer

Quality of Service (QoS) is a property of each thread in a process, and normally chosen from the macOS standard list:

  • QoS 9 (binary 001001), named background and intended for threads performing maintenance, which don’t need to be run with any higher priority.
  • QoS 17 (binary 010001), utility, for tasks the user doesn’t track actively.
  • QoS 25 (binary 011001), userInitiated, for tasks that the user needs to complete to be able to use the app.
  • QoS 33 (binary 100001), userInteractive, for user-interactive tasks, such as handling events and the app’s interface.

There’s also a ‘default’ QoS (21) that falls between utility and userInitiated, an unspecified value, and in some circumstances you might come across others.
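Those values correspond to the qos_class_t constants declared in sys/qos.h, and can be checked from Swift. This short sketch simply prints the raw values quoted above:

import Foundation

let classes: [(String, qos_class_t)] = [
    ("background", QOS_CLASS_BACKGROUND),             // 9
    ("utility", QOS_CLASS_UTILITY),                   // 17
    ("default", QOS_CLASS_DEFAULT),                   // 21
    ("userInitiated", QOS_CLASS_USER_INITIATED),      // 25
    ("userInteractive", QOS_CLASS_USER_INTERACTIVE)   // 33
]
for (name, qos) in classes {
    print(name, qos.rawValue)
}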

What has changed in macOS Sequoia 15.2?

By: hoakley
12 December 2024 at 05:29

The macOS 15.2 update includes the second phase of AI support for Apple silicon Macs, introducing the Image Playground app, and integrated ChatGPT support in both Siri and Writing Tools. AI now extends support to several of the non-US variants of English, including English (UK), although non-English languages won’t gain support until next year.

Apple’s release notes are at last a real joy to read, containing more detailed information, including the following:

  • Photos enhancements,
  • Safari supports background images for its Start Page, tries to use HTTPS on all sites, and more,
  • Sharing item locations in Find My,
  • Sudoku for News+,
  • Presenter preview for AirPlay,
  • Pre-market quotes in Stocks.

Among the more significant bugs fixed is that Apple silicon virtualisation on M4 Macs can now open all VMs, including macOS guests before 13.4. For those running Ruby with YJIT enabled, this update should fix kernel panics with M4 chips. Further fixes are detailed in the developer release notes, and enterprise release notes are here.

Security release notes are available here, and list 42 entries, including 4 in the kernel; Apple doesn’t report any of them as already having been exploited.

iBoot firmware on Apple silicon Macs is updated to version 11881.61.3, and T2 firmware to 2069.40.2.0.0 (iBridge: 22.16.12093.0.0,0). The macOS build number is 24C101, with kernel version 24.2.0.

Version changes in bundled apps include:

  • Books, version 7.2
  • Freeform, version 3.2
  • iPhone Mirroring, version 1.2
  • Music, version 1.5.2
  • News, version 10.2
  • Passwords, version 1.2
  • Safari, version 18.2 (20620.1.16.11.8)
  • Screen Sharing, version 5.2
  • Stocks, version 7.1
  • TV, version 1.5.2
  • Tips, version 15.2
  • VoiceMemos, version 3.1.

Inevitably, there are many build increments in components related to Apple Intelligence, and a great many across private frameworks. Other significant changes to /System/Library include:

  • Screen Time, build increment
  • Siri, version increment
  • VoiceOver, build increment
  • Kernel extensions including AGX… kexts, AOP Audio kexts, AppleEmbeddedAudio, AppleUSBAudio, and several virtualisation kexts
  • One new kernel extension, AppleDisplayManager
  • APFS to version 2317.61.2
  • Most of the Core frameworks have build increments
  • FileProvider framework, build increment
  • Virtualisation framework, build increment
  • PrivateCloudCompute framework, new version
  • Spotlight frameworks, build increments
  • New private frameworks include Anvil, AppSystemSettings (and its UI relative), AskToDaemon, many Generative… frameworks involved with AI, OSEligibility, TrustKit, WalletBlastDoorSupport
  • Several qlgenerators have build increments.

After that lot, the next scheduled update to macOS Sequoia is in the New Year.

Tune for Performance: do more threads run faster?

By: hoakley
10 December 2024 at 15:30

One of the most distinctive features of modern Macs is that they have multiple cores, in Apple silicon models a minimum of eight. For an app to be able to make use of more than one core (or its equivalent) at a time, it needs to divide its processing into threads, discrete blocks of code that can be run on different cores by macOS. If it doesn’t do that, then running that app on a high-end Pro, Max or Ultra chip is unlikely to be significantly faster than on a base model (assuming the task is CPU-bound).

Activity Monitor

You might be able to get a good idea as to how well an app makes use of multiple cores from watching it in use in Activity Monitor’s CPU History window, but in many cases that isn’t conclusive, and needs to be confirmed.

To illustrate ways to tackle this, I take the example of file compression. Some methods lend themselves to multiple threads better than others, and some may be implemented in a way that won’t accelerate when several cores are available. For the occasional user this might make little difference, but if you were to spend much of your day waiting for 10-100 GB files to compress, it merits a little exploration.

[Image: polycore1]

This CPU History window from a compression task run on an M4 Pro is typically unhelpful. A single file compression is seen in the group of 4-5 peaks in CPU in the right half of each trace. Although compression used all ten P cores, it appears to have been moved between P clusters, seen by comparing the timing of peaks on Cores 8 and 10, 9 and 11. At no time does the task appear to exceed 50% active residency on any of the cores, though. All you can do is guess as to what might be going on, and whether it might run faster on twelve or more P cores.

[Image: polycore2]

On another run, the same task is reported to have reached a peak of 500% CPU in 12 threads, but would it exceed that on more P cores, given that 10 were available?

Timing performance

As usual, I’ve been cheating a little to generate those results, by using my simple compression-decompression app Cormorant, in which I control how many threads it uses. If the app you’re trying to investigate offers a similar feature, then you can set up a standard task, here the compression of a 15.517 GB IPSW file, and time how long it takes using different numbers of threads.
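If the app you’re investigating doesn’t time itself, a rough harness along the following lines can stand in. This is only a sketch: the loop is a placeholder workload, not Cormorant’s compression code, and the figures it produces will differ from those below.

import Foundation

// Run a fixed total amount of work split across a chosen number of threads,
// and report the wall-clock time taken.
func timeRun(threads: Int, totalLoops: Int = 200_000_000) -> TimeInterval {
    let start = Date()
    DispatchQueue.concurrentPerform(iterations: threads) { _ in
        var x = 1.0
        for _ in 0..<(totalLoops / threads) {
            x = x * 1.000001 + 0.000001   // stand-in for a share of the real work
        }
        if x.isNaN { print(x) }           // keeps the optimiser from removing the loop
    }
    return Date().timeIntervalSince(start)
}

for n in 1...5 {
    print("\(n) thread(s): \(timeRun(threads: n)) s")
}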

[Image: polycore3]

The answers from Cormorant, which conveniently performs its own timing, are:

  • 1 thread takes 49.32 seconds
  • 2 take 26.74 s
  • 3 take 18.60 s
  • 4 take 14.29 s
  • 5 take 11.21 s.

So Cormorant’s compression can make good use of more P cores, although with diminishing returns.

Very few apps give you this level of control, though. The only other compression utility that appears to do so is Keka.

[Image: polycore4]

In its settings, you can give its tasks a maximum number of threads, and even run them at custom Quality of Service (QoS) if you want them to be run in the background on E cores, and not interrupt your work on P cores.

Controlling threads

There is one way that you can limit the effective number of threads used by an arbitrary app, and that’s to run it in a Virtual Machine, as you control the number of virtual cores that it uses. While you can just about run a macOS VM on a single core alone, I suggest that a more workable starting point is two cores, and you can increase that number up to the total of P cores in the host without causing problems.
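In Viable and other virtualisers that’s just a setting; if you build the VM yourself with Apple’s Virtualization framework, the virtual core count is a property of the configuration. A minimal sketch, with everything else omitted:

import Virtualization

let config = VZVirtualMachineConfiguration()
config.cpuCount = 2                            // start at 2 virtual cores; raise towards the host's P core count
config.memorySize = 16 * 1024 * 1024 * 1024    // 16 GB
// The platform, boot loader, storage and other devices must also be set;
// config.validate() will only succeed once the configuration is complete.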

VMs have other virtues, including their relative lack of background processes, allowing their virtual cores to be almost entirely devoted to your test task. However, as they can’t run apps from the App Store other than Apple’s free suite of Pages, Numbers and Keynote, that could prevent you from using them for testing.

To set up a VM for thread tests, I duplicated a standard Sonoma 14.7.1 VM so I could throw it away at the end, opened it with five virtual cores, and copied over the test app Cormorant and file. I then closed that VM down, set it to use 2 virtual cores, opened it and ran my test. I repeated that with an increasing number of virtual cores up to a total of 5. Compression times are:

  • 2 vCPUs take 30.13 seconds
  • 3 take 21.36 s
  • 4 take 17.18 s
  • 5 take 13.59 s.

Those are only slightly slower than their equivalents from the host tests above.

Analysis

[Image: polycore5]

Plotting those results out using DataGraph, the lines of best fit follow power laws:

  • for real cores, time = 49.8863/(T^0.91169)
  • for virtual cores, time = 54.737/(T^0.854318)

where T is the number of threads. That explains the apparently diminishing returns with increasing numbers of threads, although the maths isn’t as simple as we might like.

A better way to look at this is by calculating the rate of compression in GB/s, simply by dividing the file size of 15.517 GB by each time. Here we end up with straight lines from linear regression, that are more amenable to thought.
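For example, the single-thread run works out at 15.517 ÷ 49.32 ≈ 0.31 GB/s, and the five-thread run at 15.517 ÷ 11.21 ≈ 1.38 GB/s.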

[Image: polycore6]

Those regressions are:

  • for real cores, rate of compression = 0.0464 + (0.264 x T)
  • for virtual cores, rate of compression = 0.102 + (0.206 x T)

which are more generally useful.
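As a quick check of how to use them, this small Swift sketch plugs a thread count into the regression for real cores to predict the rate and time for the 15.517 GB test file:

// Predicted compression rate (GB/s) and time (s) on the host, from the regression above.
func predictHost(threads: Double, fileGB: Double = 15.517) -> (rate: Double, seconds: Double) {
    let rate = 0.0464 + 0.264 * threads
    return (rate, fileGB / rate)
}

let ten = predictHost(threads: 10)
print(ten.rate, ten.seconds)   // about 2.69 GB/s and 5.8 seconds for 10 threads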

Tuning performance

This might appear over-elaborate and of little practical use, but we now have a much better understanding of the factors limiting compression performance:

  • The more threads compression uses, the shorter the time a task will take.
  • There are limits to that improvement, though, when substantially more than 5 threads are used.
  • Compression rates achieved in M4 P cores are significantly lower than read or write speeds of the internal SSD, so aren’t likely to be limited by the speed of a faster SSD (TB3 or USB4).
  • As compression appears to be CPU-bound, faster P cores would also be expected to result in shorter times.
  • Improving the efficiency of the compression code could increase performance.
  • Compression in a VM runs at about 78% of speed on the host.

To see how reliable these are, I therefore repeated the Cormorant test using all 10 P cores on the host Mac, which took 6.80 seconds, a little more than half the time for 5 cores. That’s a compression rate of 2.28 GB/s, rather less than the 2.69 GB/s predicted by the linear regression. That’s now approaching the write speed of some TB3 SSDs, and the rate-limiting step could then change to be the write performance of the storage being used, rather than CPU cores.

M4 Macs can’t virtualise older macOS

By: hoakley
14 November 2024 at 15:30

If you’ve already got a new M4 Mac and tried to run a macOS virtual machine on it, then you might have been disappointed. It seems that M4 chips can’t virtualise any version of macOS before 13.4 Ventura. So before you trade in or pass on your M1, M2 or M3 Mac, if you need access to older VMs, you might like to check whether this affects you.

I’m indebted to Csaba Fitzl for drawing my attention to this problem, and for reporting it to Apple in Feedback FB15774587. It has also been reported as affecting UTM, and I believe affects all other macOS virtualisation software for Apple silicon.

The bug

Running a macOS VM for any version before 13.4 Ventura on an M4 Mac results in a black screen, and the VM fails to boot. This is true whatever settings are used in the virtualiser, even if it’s set to boot the VM in Recovery mode. It’s also true when that VM has been built on that Mac: although that appears to complete successfully, when first run that VM opens as a black screen and never proceeds with personalisation and setup.

Currently the only way to run a VM with macOS prior to 13.4 Ventura is to do so on a host with an M1, M2 or M3 chip.

Can this be fixed?

Unfortunately, as this bug prevents the VM from booting, there’s no reliable way to access its log to discover what’s going wrong there. There’s no sign of the failure in the host’s log either: the host appears to initialise its Virtio and other support normally, without errors or faults. After those, virtualisation processes on the host fall silent as they wait for the VM to start, which never happens.

There is a useful clue in Activity Monitor, though: in its CPU pane, despite being allocated multiple virtual cores, only one is seen to be active on the host. That implies the failure is occurring before the VM kernel boots the other cores, an event that occurs early during the kernel boot phase. Until that point, pre-boot phases and the kernel run on just a single CPU core.

macOS 13.4 updated iBoot to version 8422.121.1, so at first sight the VM could be failing when running older firmware. That doesn’t appear likely, as version 8422.121.1 was also installed in the 12.6.6 security update, so 12.7.x shouldn’t suffer this problem, but it does.

It thus appears most likely that this bug strikes in the early part of kernel boot, in which case the most feasible solution would be to fix the bug in macOS kernels prior to 13.4, and promulgate new IPSW image files for those. I suspect that’s very unlikely to happen, and as far as I’m aware it would be the first time that Apple has issued revised IPSWs.

Which macOS can you virtualise?

Support for lightweight virtualisation of macOS on Apple silicon Macs was still in progress in the first version of macOS to run on M1 chips, Big Sur. You therefore cannot create or run Big Sur VMs.

[Image: macos12]

The first version that can run in VMs is macOS 12 Monterey, although prior to 12.4 it can sometimes be a bit fractious. It also has major limitations, such as not supporting shared folders with the host.

[Image: macos121]

This is 12.1 with its old System Preferences, running happily on an M3 Pro.

[Image: macos13]

macOS Ventura should run well on M1, M2 and M3 hosts, and 13.4 and later on M4 hosts too.

[Image: macos14]

macOS Sonoma should run even better on all current Apple silicon Macs, and delivers a much improved display with autoscaling. However, 14.2 and 14.2.1 don’t automatically support shared folders because of a bug that was fixed in 14.3.

[Image: macos15]

macOS Sequoia is fully compatible, and adds support for iCloud Drive and some other Apple Account features, although it still won’t run App Store apps. It may also fail to install the required extras to support Apple Intelligence.

Summary

  • Currently, M4 Macs can only run VMs of macOS Ventura 13.4 and later, 14 Sonoma, and 15 Sequoia.
  • M1, M2 and M3 Macs can run VMs of macOS Monterey 12.0.1 and later.
  • macOS Big Sur 11 can’t be virtualised on Apple silicon.

Further information

Viable and virtualisation
Why macOS 14.2 & 14.2.1 VMs lose shared folders, and how to work around it

Postscript

I’m grateful for the suggestion that it might be possible to work around this problem by running the older VM on a single core. Although I did manage to do that on an M3 (the first time I have seen recent macOS running on just one core!), that fails just the same on an M4, I’m afraid.

Inside M4 chips: P cores hosting a VM

By: hoakley
13 November 2024 at 15:30

One common but atypical situation for any M-series chip is running a macOS virtual machine. This article explores how virtual CPU cores are handled on physical cores of an M4 Pro host, and provides further insight into their management, and thread mobility across P core clusters.

Unless otherwise stated, all results here are obtained from a macOS 15.1 Sequoia VM in my free virtualiser Viable, with that VM allocated 5 virtual cores and 16 GB of memory, on a Mac mini M4 Pro running macOS Sequoia 15.1 with 48 GB of memory, 10 P and 4 E cores.

How virtual cores are allocated

All virtualised threads are treated by the host as if they are running at high Quality of Service (QoS), so are preferentially allocated to P cores, even though the original thread in the guest may be running at the lowest QoS. This has the side-effect of running virtual background processes considerably quicker than real background threads on the host.

In this case, the VM was given 5 virtual cores so they could all run in a single P cluster on the host. That doesn’t assign 5 physical cores to the VM, but runs VM threads up to a total of 500% active residency across all the P cores in the host. If the VM is assigned more virtual cores than are available in the host’s P cores, then some of its threads will spill over and be run on host E cores, but at the high frequency typical of host threads with high QoS.

Performance

There is a slight performance hit in virtualisation, but it’s surprisingly small compared to other virtualisers. Geekbench 6.3.0 benchmarks for guest and host were:

  • CPU single core VM 3,643, host 3,892
  • CPU multi-core VM 12,454, host 22,706
  • GPU Metal VM 102,282, host 110,960, with the VM as an Apple Paravirtual device.

Some tests are even closer: using my in-core floating point test, 1,000 Mloops run in the VM in 4.7 seconds, and in the host in 4.68 seconds.

Host core allocation

To assess P core allocation on the host, an in-core floating point test was run in the VM. This consisted of 5 threads with sufficient loops to fully occupy the virtual cores for about 20 seconds. In the following charts, I show results from just the first 15 seconds, as representative of the whole.

[Image: vmpcoresm4pro0]

When viewed by cluster, those threads were mainly loaded first onto the first P cluster (pale blue bars), where they ran for just over 1 second before being moved to the second cluster (red bars). They were then regularly switched between the two P clusters every few seconds throughout the test. Four cycles were completed in this section of the results, with each taking 2.825 seconds, so threads were switched between clusters every 1.4 seconds, the same time as I found when running threads on the host alone, as reported previously.

For most of the 15 seconds shown here, total active residency across both P clusters was pegged at 500%, as allocated to the VM in its 5 virtual cores, with small bursts exceeding that. Thus that 500% represents those virtual cores, and the small bursts are threads from the host. Although the great majority of that 500% was run on the active P cluster, a total of about 30% active residency consisted of other threads from the VM, and ran on the less active P cluster. That probably represents the VM’s macOS background processes and overhead from its folder sharing, networking, and other Virtio device use.

[Image: vmpcoresm4pro1]

When broken down to individual cores within each cluster, seen in the first above and the second below, total activity differs little across the cores in the active cluster. During its period in the active cluster, each core had an active residency of 80-100%, bringing the cluster total to about 450% while most active.

[Image: vmpcoresm4pro2]

In case you’re wondering whether this occurs on older Apple silicon, or whether it’s just a feature of macOS Sequoia, here’s a similar example of a 4-core VM running 3 floating point threads on an M1 Max with macOS 15.1, seen in Activity Monitor’s CPU History window. There’s no movement of threads between clusters.

[Image: vmm1maxtest1]

P core frequencies

powermetrics, used to obtain this data, provides two types of core frequency information. For each cluster it gives a hardware active frequency, then for each core it gives an individual frequency, which often differs within each cluster. Cores in the active P cluster were typically reported as running at a frequency of 4512 MHz, although the cluster frequency was lower, at about 3858 MHz. For simplicity, cluster frequencies are used here.
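If you want to gather similar figures yourself, the CPU sampler is the one to use, with something like
sudo powermetrics --samplers cpu_power -i 100 -n 100
which samples every 100 ms and reports cluster frequencies and residencies; see the powermetrics man page for the full range of options.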

[Image: vmpcoresm4pro3]

This chart shows reported frequencies for the two P clusters in the upper lines. Below them are total cluster active residencies to show which cluster was active during each period.

The active cluster had a steady frequency of just below 3,900 MHz, but when it became the less active one, its frequency varied greatly, from idle at 1,260 MHz up to almost 4,400 MHz, often for very brief periods. This is consistent with the active cluster running the intensive in-core test threads, and the other cluster handling other threads from both the VM and host.

CPU power

Several who have run VMs on notebooks report that they appear to drain the battery quickly. Using the previous results from the host, the floating point test used here would be expected to use a steady 7,000 mW.

[Image: vmpcoresm4pro4]

This last chart shows the total CPU power use in mW over the same period, again with cluster active residency (here multiplied by 10), added to aid recognition of cluster cycles. This appears to average about 7,500 mW, only 500 mW more than expected when run on the host alone. That shouldn’t result in a noticeable increase in power usage in a notebook.

In the previous article, I remarked on how power used appeared to differ between the two clusters, and this is also reflected in these results. When the second cluster (P1) is active, power use is less, at about 7,100 mW, and it’s higher at about 7,700 when the first cluster (P0) is active. This needs to be confirmed on other M4 Pro chips before it can be interpreted.

Key information

  • macOS guests perform almost as well as the M4 Pro host, although multi-core benchmarks are proportionate to the number of virtual cores allocated to them. In particular, Metal GPU performance is excellent.
  • All threads in a VM are run as if at high QoS, thus preferentially on host P cores. This accelerates low QoS background threads running in the VM.
  • Virtual core allocation includes all the VM’s overhead, such as its macOS background threads.
  • Guest threads are as mobile as those of the host, and are moved between P clusters every 1.4 seconds.
  • Although threads run in a VM incur a small penalty in additional power use, this shouldn’t be significant for most purposes.
  • Once again, evidence suggests that the first P cluster (P0) in an M4 Pro uses slightly more power than the second (P1). This needs to be confirmed in other systems.
  • powermetrics can’t be used in a VM, unsurprisingly.

Previous article

Inside M4 chips: P cores

Explainer

Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.

Sequoia catches: periodic and VMs

By: hoakley
6 November 2024 at 15:30

This article describes one change that has caught out some using macOS Sequoia, and considers what has changed in Sequoia Virtual Machines (VMs).

periodic has been removed

After many years of deprecation, the periodic scheduled maintenance command tool has been removed from macOS 15.0. In its heyday, periodic was responsible for running daily, weekly and monthly maintenance and housekeeping schedules including rolling the system logs. Over that time, macOS has been given other means for achieving similar ends. For example, logs are now maintained constantly by the logd service, and aren’t retained by age, but to keep the total size of log files fairly constant. I don’t think that Sonoma performed any routine maintenance using periodic.

If you use periodic, the best option is to replace it with launchd, using a LaunchAgent or LaunchDaemon. If you’d prefer cron, that’s still available, but is disabled in the standard macOS configuration.
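As an illustration only, a property list along these lines in /Library/LaunchDaemons would run a maintenance script at 03:15 every day; the label and script path are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.dailymaintenance</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/maintenance.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>15</integer>
    </dict>
</dict>
</plist>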

Sequoia VMs: AI

Sequoia VMs created from an IPSW image of Sequoia (rather than upgraded from Sonoma or earlier) running on Sequoia hosts are the first to gain access to iCloud features. Now that 15.1 has been released with AI, I’ve been trying to discover whether that can also be used in a VM. So far, my 15.1 VM has sat for hours ‘preparing’, but AI still hasn’t activated on it. I suspect that, for the present, AI isn’t available to VMs. If you have had success, please let me know.

Sequoia VMs: macOS builds

My test 15.1 VM has also behaved strangely. It was originally created in 15.0, updated successfully to 15.0.1, then to 15.1, where it was running build 24B83, the version released generally on 28 October. Later that week Software Update reported that a macOS update was available, and that turned out to be a full install of 15.1 build 24B2083, released on 30 October for the new M4 Macs. This VM is hosted on a Studio M1 Max!

Installation completed normally, and that VM now seems to be running the new build perfectly happily, although it hasn’t proved any help in activating AI.

Don’t be surprised if your 15.1 build 24B83 VMs behave similarly. If anyone can suggest why that occurred I’d be interested to know, as it’s generally believed that build 24B2083 has been forked to support only M4 models.
