Tune for Performance: Do you really need a big internal SSD?

By: hoakley
12 December 2024 at 15:30

The most common economy made when specifying a new Mac is to opt for just 512 GB of internal storage, saving the $/€/£600 or so it would cost to increase that to 2 TB. After all, if you can buy a 2 TB Thunderbolt 5 SSD for around two-thirds of the price, why pay more? And who needs the blistering speed of that expensive internal storage when your apps work perfectly well with a far cheaper SSD? This article considers whether that choice matters in terms of performance.

To look at this, I’m going to use the same compression task that I used in this previous article, my app Cormorant relying on the AppleArchive framework in macOS to do all the heavy lifting. As results are going to differ considerably when using other apps and other tasks, I’d like to make it clear that this can’t reach general conclusions that apply to every task and every app: your mileage will vary. My purpose here is to show how you can work out whether using a slower disk will affect your app’s performance.

Methods

In that previous article, I concluded that compression speed was unlikely to be determined by disk performance, as even with all 10 P cores running compression threads, the compression rate only reached 2.28 GB/s, around the same as the write speed of a good Thunderbolt 3 SSD. For today’s tests, I therefore set Cormorant to use the default number of threads (all 14 on my M4 Pro) so it would run as fast as the CPU cores would allow when compressing my standard 15.517 GB IPSW test file.
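For readers unfamiliar with AppleArchive, this is a minimal sketch of single-file compression using that framework, assuming LZFSE compression and hypothetical paths; it isn’t Cormorant’s actual code, and the framework decides for itself how to spread the work across threads.

```swift
import AppleArchive
import System

// A minimal sketch of single-file compression with AppleArchive,
// assuming LZFSE and hypothetical paths; not Cormorant's actual code.
let source = FilePath("/Volumes/Test/test.ipsw")
let destination = FilePath("/Volumes/Test/test.ipsw.lzfse")

guard let readStream = ArchiveByteStream.fileStream(
        path: source, mode: .readOnly, options: [],
        permissions: FilePermissions(rawValue: 0o644)),
      let writeStream = ArchiveByteStream.fileStream(
        path: destination, mode: .writeOnly, options: [.create],
        permissions: FilePermissions(rawValue: 0o644)),
      let compressStream = ArchiveByteStream.compressionStream(
        using: .lzfse, writingTo: writeStream) else {
    fatalError("failed to open streams")
}

// Pump the source file through the compressor; AppleArchive handles
// its own multithreading internally.
_ = try! ArchiveByteStream.process(readingFrom: readStream,
                                   writingTo: compressStream)
try? compressStream.close()
try? writeStream.close()
try? readStream.close()
```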

I’m fortunate to have a range of SSDs to test, and here use the 2 TB internal SSD of my Mac mini M4 Pro, an external USB4 enclosure (OWC Express 1M2) with a 2 TB Samsung 990 Pro SSD, and an external 2 TB Thunderbolt 3 SSD (OWC Envoy Pro SX). As few are likely to have access to such a range, I also included two disk images stored on the internal SSD: a 100 GB sparse bundle, and a 50 GB read-write disk image, to see whether they could be used to model external storage. All tests used unencrypted APFS file systems.

The first step with each was to measure its write speed using Stibium. Unlike more popular benchmarking apps for the Mac, Stibium measures speed across a wide range of file sizes, from 2 MB to 2 GB, providing more insight into performance. Once those measurements were complete, the test files were removed, the 15.517 GB test file was copied to the volume, and compressed by Cormorant at high QoS with the default number of threads.
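As an illustration of the principle rather than Stibium’s actual code, this sketch times writes of a few file sizes and converts them to GB/s; a serious benchmark would also bypass the file cache and randomise the order of writes.

```swift
import Foundation

// Sketch of write-speed measurement in the spirit of Stibium: time
// writes of increasing sizes and convert to GB/s. A real benchmark
// would bypass the cache (F_NOCACHE) and randomise write order.
let sizesMB = [2, 20, 200, 2000]   // a subset of Stibium's 2 MB-2 GB range
let testDir = URL(fileURLWithPath: "/Volumes/Test", isDirectory: true)

for sizeMB in sizesMB {
    let bytes = sizeMB * 1_000_000
    let data = Data(count: bytes)  // zero-filled test data
    let url = testDir.appendingPathComponent("test-\(sizeMB)MB.dat")
    let start = Date()
    try! data.write(to: url)
    let seconds = Date().timeIntervalSince(start)
    print(String(format: "%d MB: %.2f GB/s", sizeMB,
                 Double(bytes) / seconds / 1e9))
    try! FileManager.default.removeItem(at: url)
}
```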

Disk write speeds

[Chart: write speed against file size for each type of storage.]

Results in each test followed a familiar pattern, with write speed rising rapidly to a peak at a file size of about 200 MB, then remaining steady or declining slowly up to the largest size tested, 2 GB. The read-write disk image was a bit more erratic, though, with unexpectedly high write speeds at file sizes of 800 and 1000 MB.

At 2 GB file size, write speeds were:

  • 7.69 GB/s for the internal SSD
  • 7.35 GB/s for the sparse bundle
  • 3.61 GB/s for the USB4 SSD
  • 2.26 GB/s for the Thunderbolt 3 SSD
  • 1.35 GB/s for the disk image.

Those are in accord with my many previous measurements of write speeds for those types of storage.

Compression rates

Times to compress the 15.517 GB test file ranked differently:

  • 5.57 s for the internal SSD
  • 5.84 s for the USB4 SSD
  • 9.67 s for the sparse bundle
  • 10.49 s for the Thunderbolt 3 SSD
  • 16.87 s for the disk image.

When converted to compression rates, the sparse bundle is even more obviously an outlier, as shown in the chart below.

[Chart: compression rate against measured write speed for each type of storage.]

There’s a roughly linear relationship between measured write speed and compression rate in the disk image, Thunderbolt 3 SSD, and USB4 SSD, and little difference between the latter and the internal SSD. These suggest that disk write speed becomes the rate-limiting factor for compression when write speed falls below about 3 GB/s, but above that faster disks make little difference to compression rate.

Poor performance of the sparse bundle was a surprise, given how close its write speeds are to those of the host internal SSD. This is probably the result of compression writing a single very large file across its 14 threads: as the sparse bundle stores file data in a large number of band files, their overhead appears to have got the better of it. I will return to look at this in more detail in the near future, as sparse bundles have become popular largely because of their perceived superior performance.

The difference in compression rates between USB4 and Thunderbolt 3 SSDs is also surprisingly large. Of course, a Mac with fewer cores to run compression threads might not show any significant difference: a base M3 chip with 4 P and 4 E cores is unlikely to achieve a compression rate much in excess of 1.5 GB/s on its internal SSD because of its limited cores, so the restricted write speed of a Thunderbolt 3 SSD may never become the rate-limiting factor there.

Conclusions

  • The rate-limiting step in task performance will change according to multiple factors, including the effective use of multiple threads on multiple cores, and disk performance.
  • There’s no simple model you can apply to assess the effects of disk performance, and tests using disk images can be misleading.
  • You can’t predict whether a task will be disk-bound from disk benchmark performance.
  • Even expensive high-performance external SSDs can result in noticeably poor task performance. Maybe that money would be better spent on a larger internal SSD after all.

Last Week on My Mac: Mac mini M4 Pro first impressions, cores and more

By: hoakley
10 November 2024 at 16:00

My Mac mini M4 Pro arrived four days early, and swiftly exceeded all expectations. Everyone who has seen this new design remarks on how tiny it is, and I still keep looking at it, wondering how the smallest Mac I have ever owned is also by far the fastest. For those who collect numbers, it matched the growing collection in Geekbench’s database, with a single-core score of 3,892, a multi-core of 22,706, and Metal GPU of 110,960. Of course those new MacBook Pros with M4 Max chips are doing better with their extra two CPU cores and twice the number of GPU cores, but second best is still far superior to anything I’ve run before.

Not only is this my fastest Mac ever, but it was also the quickest and simplest to commission. It replaces my Mac Studio M1 Max, and now sits under my Studio Display using the Studio’s old keyboard and trackpad. As I’ll explain in a future article, I opted to migrate to the mini during initial setup using the Studio’s backup SSD, a sleek OWC Envoy Pro SX connected by Thunderbolt 3. Including the inevitable macOS update, the whole process took less than two hours, from the start of unboxing to the first full Time Machine backup.

With 48 GB of memory and a 2 TB internal SSD, I was keen to benchmark that storage once the initial backup was done. Although I have less confidence in these figures than I’d like, there’s no doubt that this SSD is significantly quicker than the 2 TB in the Studio; how much quicker is open to debate. AmorphousDiskMark reported a write speed of 7.7 GB/s, while my own Stibium gave 10.3 GB/s across a broad range of file sizes. There was closer agreement on read speed, at around 6.8 GB/s. Both of those are comfortably above what I expect when I can finally get hold of an external Thunderbolt 5 SSD.

CPU cores

My M4 Pro is the full-performance variant, with 10 P and 4 E cores, set in three clusters: an E cluster with all four E cores, and two clusters of five P cores each. Since the M3, Apple has increased the maximum number of cores in a cluster from four (M1 and M2) to six (M3 and M4).

M4 P cores run at frequencies between 1260 and 4512 MHz (1.3-4.5 GHz), their maximum frequency being 111% that of the M3 P core, and 140% that of the M1 P core. M4 E cores have a narrower range of frequencies than those in the M3, between 1020-2592 MHz (1.0-2.6 GHz), so they run at 137% of M3 frequency for low QoS threads, but at only 94% of M3 maximum frequency for high QoS threads spilt over from P cores. Those differences are going to make comparisons of E core performance very interesting indeed.

Single-thread and single-core comparisons of P core throughput are inevitably in the M4’s favour, with significantly better results across integer, floating point and vector tests, as shown in the chart below.

[Chart: single-thread loop throughput for the four in-core tests, comparing M1, M3 and M4.]

The Y axis here gives loop throughput in loops per second for my four basic in-core performance tests: a tight assembly code integer math loop, another tight assembly code loop of floating point math, NEON vector processor assembly code, and a tight loop calling an Accelerate routine run in the NEON unit. Pale blue bars are results for the M1, purple for the M3, and red for the shiny new M4.

Although improvements in integer and floating point performance might appear small here, these results are for single threads. When scaled up across more P cores, the differences are magnified, as shown in the regressions below.

[Chart: total time to execute floating point loop threads against number of threads, with linear regressions for M3 and M4.]

This plots total times taken to execute multiple threads, each consisting of 10^9 (one thousand million, or one billion) loops of floating point assembly code, against the number of threads, which is the same as the number of cores used, as each core runs one thread. According to the fitted linear regression equations, each thread/core of 10^9 loops takes the same period: 6.04 seconds for the M3 P cores, and 4.69 for the M4. Those equate to 1.66 x 10^8 loops/second per thread for the M3, and 2.13 x 10^8 for the M4. On that basis, the M4 performs at 128% that of the M3, significantly better than expected from the 111% increase in maximum core frequency. That requires the M4 to have improved processing speed at the same frequency as the M3, a frequency-independent improvement.

Core allocation

One striking difference between M4 CPU cores and those of previous Apple silicon chips is their pattern of core allocation, something you may notice even in Activity Monitor, for all its faults. This is best illustrated in its CPU History window.

For these two series of tests, I waited until the Mac was idling quietly, with little happening on its CPU cores. I then ran, in rapid succession, a series of my in-core floating point tests of 10^9 loops at high QoS, starting with a single thread, then two, and so on up to six or eight threads. This results in a pattern distinct from anything you’ll see in Intel cores, as I have shown here since the early days of the M1.
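For anyone who wants to reproduce this pattern, here’s a sketch of the approach using plain Swift and GCD rather than my assembly code tests: it runs a given number of tight floating point loops concurrently at high QoS.

```swift
import Foundation

// Run `threads` concurrent tight floating point loops at high QoS.
// A sketch using plain Swift and GCD, standing in for the assembly
// code tests described above.
func runFloatTest(threads: Int, loops: Int = 1_000_000_000) {
    let group = DispatchGroup()
    let start = Date()
    for _ in 0..<threads {
        DispatchQueue.global(qos: .userInteractive).async(group: group) {
            var x = 1.0
            for _ in 0..<loops {
                x = x * 1.0000001 + 0.0000001   // stand-in for the test loop
            }
            if x == 0 { print(x) }   // stop the optimiser removing the loop
        }
    }
    group.wait()
    print("\(threads) threads: \(Date().timeIntervalSince(start)) s")
}

// 1, 2, ... 8 threads in rapid succession, as in the CPU History tests.
for n in 1...8 { runFloatTest(threads: n) }
```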

[CPU History: 1-6 threads run in succession on the M3 Pro’s single P cluster.]

This is the series seen on my MacBook Pro M3 Pro running macOS 15.2 beta, and is typical of M1, M2 and M3 chips. Although the cores are out of order here, these are the six cores in the chip’s single P cluster. At the left is the obvious peak from 1 thread running on core 9, then the 2 threads of the next test appear on cores 9 and 12. Three threads fully occupy cores 7, 9 and 12, with a little spilt onto core 8, and so on until all six cores are fully occupied by 6 threads.

Activity Monitor is a little too crude to show quite how distinct this is; for that you have to resort to shorter sampling periods of 0.1 second, and the finer detail of active residency reported by powermetrics. Those confirm that, much of the time, these heavy in-core compute loads run in a single thread on a single core, and if they are moved around, that happens relatively infrequently. E cores are different, though.

Compare that with a similar sequence, this time for 1-8 threads, on the M4 Pro. Its ten P cores are divided into two clusters, which I’ve separated here with the blue line, although within each cluster the cores aren’t in numeric order.

[CPU History: 1-8 threads run in succession across the M4 Pro’s two P clusters.]

Reading again from the left, a single thread is run on cores 6 and 8, with a little on core 14 in the second cluster. Two threads are run on all five cores of the second cluster, with a little at the end from those of the first cluster. Three threads bring significant contributions from all ten cores, and from then on the pattern is similar.

So in the M4, threads are no longer allocated to single P cores, and the CPU History window suggests they may even be allocated across more than one cluster. In fact, the detailed active residency and frequency data from powermetrics show that, while threads are often moved between two cores in the same cluster, macOS normally avoids unnecessarily running threads on other clusters, although it will happily move all the threads from one cluster to another.

The rationale behind running all threads within the same cluster is clear, as all cores in a cluster run at the same frequency. Constraining threads to a single cluster thus allows other clusters to idle and use less energy. While that continues in M4 chips, it isn’t clear why threads are moved between cores so frequently, or moved together to another cluster, nor whether that brings any improvement in performance.

Disk Images: Bands, Compaction and Space Efficiency

By: hoakley
21 October 2024 at 14:30

Of the two most dependable types of disk image, read-write (UDRW) disk images are the simpler. The only options you can set are the internal file system, whether the container is encrypted, and the maximum capacity. From there on, macOS will automatically Trim the disk image when it’s mounted, and adjust the size it takes on disk accordingly.

Sparse bundles can be more complex to configure in the first place, and may need routine maintenance to ensure their size on disk doesn’t grow steadily with use. This article considers how significant those complications are.

Band size

Most if not all storage divides data into blocks; in the case of SSDs, the default block size in APFS is 4,096 bytes, and that’s effectively the unit of storage in the single file of a read-write disk image.

Its equivalent in a sparse bundle is the maximum size that each of its band files can reach, normally set to 8.4 MB in macOS Sequoia and its predecessors, and equivalent to slightly more than 2,000 SSD blocks. Smaller band sizes are more efficient in the space required to store the contents of a sparse bundle, but require more band files for the same quantity of data stored. Experience with older versions of macOS suggests that it’s not a good idea for the total number of band files to exceed 100,000, but I’m not aware of any recent evidence to support that.

When creating very large sparse bundles, of the order of TB, macOS may now adjust the band size to limit their number, as the arithmetic below illustrates. Some users report that those sparse bundles perform better as a result, for instance when being used to store backups on a network. There don’t appear to be any advantages to using band sizes less than 8.4 MB.
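A quick back-of-envelope calculation shows why: at the default band size, a fully-populated 1 TB sparse bundle would already exceed the 100,000-band guideline mentioned above.

```swift
// Back-of-envelope band counts at the default 8.4 MB band size.
let bandSize = 8.4e6                       // bytes per band
for sizeTB in [1.0, 2.0, 8.0] {
    let bands = sizeTB * 1e12 / bandSize   // bands when fully populated
    print("\(sizeTB) TB ≈ \(Int(bands)) bands")
}
// 1 TB ≈ 119,047 bands, already over the 100,000 guideline
```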

Neither DropDMG nor Disk Utility currently appears to allow any control over band size. Spundle and hdiutil not only let you specify custom band sizes when creating sparse bundles, but also allow you to change band size by copying an existing sparse bundle into a new one, which could be useful if you were changing the maximum size of a sparse bundle.
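As a sketch of the hdiutil route, called here from Swift: hdiutil counts sparse-band-size in 512-byte sectors, so the default of 16384 sectors corresponds to the 8.4 MB above, and 32768 doubles it. The destination path is hypothetical.

```swift
import Foundation

// Create a sparse bundle with a doubled band size by calling hdiutil.
// sparse-band-size is counted in 512-byte sectors: 16384 sectors is
// the ~8.4 MB default, so 32768 gives ~16.8 MB bands. Paths are
// hypothetical.
let create = Process()
create.executableURL = URL(fileURLWithPath: "/usr/bin/hdiutil")
create.arguments = ["create", "-size", "1t", "-fs", "APFS",
                    "-type", "SPARSEBUNDLE",
                    "-imagekey", "sparse-band-size=32768",
                    "-volname", "Test",
                    "/Volumes/External/Test.sparsebundle"]
try! create.run()
create.waitUntilExit()
```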

Compaction

Each time a writeable APFS or HFS+ file system is mounted in a disk image, including both sparse bundle and read-write types, storage blocks that are no longer in use should be returned for reuse by the process of Trimming. This is automatic, and in the case of single-file read-write disk images, is the only maintenance that occurs.

Because sparse bundles add another layer of band files over SSD storage, they too require periodic maintenance in the process of compaction. To compact a sparse bundle, macOS scans the band files and removes those that are no longer being used by the file system in the sparse bundle. This only applies to sparse bundles with APFS or HFS+ file systems, though, and isn’t guaranteed to free up any space. One small catch is that, by default, compaction won’t take place on a notebook running on battery power; hdiutil and Spundle provide an option to override that.
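Run from Swift for consistency with the sketch above, compaction is a single hdiutil command; -batteryallowed is the override just mentioned, and the path is hypothetical.

```swift
import Foundation

// Compact a sparse bundle; -batteryallowed overrides the default
// refusal to compact while running on battery power. The path is
// hypothetical.
let compact = Process()
compact.executableURL = URL(fileURLWithPath: "/usr/bin/hdiutil")
compact.arguments = ["compact", "/Volumes/External/Test.sparsebundle",
                     "-batteryallowed"]
try! compact.run()
compact.waitUntilExit()
```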

Space efficiency in use

I’ve been unable to find any objective comparison between the space efficiency of modern read-write disk images and sparse bundles, so have performed my own on a USB4 external SSD connected to my Mac Studio M1 Max. On a freshly-formatted APFS volume I created two 125 GB test images: one a sparse bundle with the default 8.4 MB band size, the other a read-write disk image (UDRW). In macOS Sequoia 15.0.1, the initial size they took on disk was 14.8 MB for the sparse bundle, and 336 MB for the disk image.

Using my utility Stibium, I then wrote and deleted a series of files in each, as follows (one round of this cycle is sketched in code after the list):

  1. 160 files of 2 MB to 2 GB written totalling 53 GB
  2. 40 largest of those, sizes 600 MB to 2 GB, were then deleted, leaving 120
  3. another 160 files written totalling 53 GB
  4. 40 largest of those deleted, leaving a total of 240
  5. another 160 files written totalling 53 GB
  6. 40 largest of those deleted, leaving a total of 360
  7. another 160 files written totalling 53 GB
  8. 40 largest of those deleted, leaving a total of 480
  9. another 160 files written totalling 53 GB, leaving a total of 640 files.
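This sketch shows one round of that cycle in outline; the file sizes here are spread evenly between 2 MB and 2 GB as a stand-in for Stibium’s own distribution (which totals about 53 GB per round), and the mount point is hypothetical.

```swift
import Foundation

// One round of the write-then-delete cycle: write 160 files of
// 2 MB-2 GB, then delete the 40 largest. The even spread of sizes is
// illustrative only; Stibium's actual distribution totals ~53 GB.
let dir = URL(fileURLWithPath: "/Volumes/Test", isDirectory: true)
var written: [(url: URL, size: Int)] = []

for i in 0..<160 {
    let size = 2_000_000 + (2_000_000_000 - 2_000_000) * i / 159
    let url = dir.appendingPathComponent("file-\(UUID().uuidString).dat")
    try! Data(count: size).write(to: url)   // zero-filled test file
    written.append((url, size))
}

// Delete the 40 largest files written in this round.
for item in written.sorted(by: { $0.size > $1.size }).prefix(40) {
    try! FileManager.default.removeItem(at: item.url)
}
```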

At that point, the sparse bundle had 104.08 GB stored in 12,408 band files, and the read-write disk image had 91.63 GB stored in its single file. I then deleted all files and folders in each of the images, and unmounted and remounted each image several times to ensure Trimming had completed. The sparse bundle then had 1.12 GB in 135 band files, and offered 124.66 GB of free space when mounted; the read-write disk image was slightly smaller at 937.9 MB on disk and offered exactly the same amount of free space when mounted.

Compacting the empty sparse bundle was reported as reclaiming 52.3 MB, and reduced the disk space taken by the band files slightly to 1.06 GB while retaining the same number of bands, and still offering 124.66 GB of free space when mounted.

In this case, after writing 800 files totalling over 250 GB and deleting 160 of them, compaction made a remarkably small difference to the free space returned following removal of all files. There is also very little difference in the space efficiency of sparse bundles and modern read-write disk images.

One consistent difference observed throughout was in write speeds: they remained constant at 3.2 GB/s for the sparse bundle, and 1.0 GB/s for the read-write disk image, as reported here previously.

Conclusions

  • Sparse bundles are more complicated than read-write disk images (UDRW), with band size to be set, and compaction to be performed.
  • Default band size appears to work well, and manually setting band size should seldom be necessary.
  • Both types appear highly efficient in their use of disk space, with only small differences between them.
  • Although it might be important to compact sparse bundles in some cases, the amount of free space returned by compaction is unlikely to be significant in many circumstances.

Previous articles

Introduction
Tools
How read-write disk images have gone sparse
Performance

Disk Images: Performance

By: hoakley
16 October 2024 at 14:30

Over the last few years, the performance of disk images has been keenly debated. In some cases, writing to a disk image proceeds at a snail’s pace, but this has appeared unpredictable. Over two years ago, I reported sometimes dismal write performance to disk images, summarised in the table below.

[Table: disk image write performance measured in 2022.]

This article presents new results from tests performed using macOS 15.0.1 Sequoia, which should give a clearer picture of what performance to expect now.

Methods

Previous work highlighted discrepancies depending on how tests were performed, whether on freshly made disk images, or on those that had already been mounted and written to. The following protocol was therefore used throughout these new tests (a scripted equivalent is sketched in code after the list):

  1. A 100 GB APFS disk image was created using Disk Utility, which automatically mounts the disk image on completion.
  2. A single folder was created on the mounted disk image, then it was unmounted.
  3. After a few seconds, the disk image was mounted again by double-clicking it in the Finder, and was left mounted for at least 10 seconds before performing any tests. That should ensure read-write disk images are converted into sparse file format, and allows time for Trimming.
  4. My utility Stibium then wrote 160 test files ranging in size from 2 MB to 2 GB in randomised order, a total of just over 53 GB, to the test folder.
  5. Stibium then read those files back, in the same randomised order.
  6. The disk image was then unmounted, its size checked, and it was trashed.
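For those who’d rather script the protocol than use Disk Utility and the Finder, this is a rough equivalent of steps 1-3 using hdiutil from Swift; names and paths are hypothetical.

```swift
import Foundation

// Rough scripted equivalent of steps 1-3 of the protocol; names and
// paths are hypothetical. The tests themselves used Disk Utility
// and the Finder.
func hdiutil(_ args: [String]) {
    let p = Process()
    p.executableURL = URL(fileURLWithPath: "/usr/bin/hdiutil")
    p.arguments = args
    try! p.run()
    p.waitUntilExit()
}

let image = "/Volumes/External/test.dmg"
hdiutil(["create", "-size", "100g", "-fs", "APFS",
         "-volname", "Test", image])       // step 1: create...
hdiutil(["attach", image])                 // ...and mount
try! FileManager.default.createDirectory(
    atPath: "/Volumes/Test/testfolder",
    withIntermediateDirectories: false)    // step 2: a single folder
hdiutil(["detach", "/Volumes/Test"])       // ...then unmount
Thread.sleep(forTimeInterval: 5)           // step 3: wait a few seconds
hdiutil(["attach", image])                 // remount
Thread.sleep(forTimeInterval: 10)          // allow Trimming to complete
```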

All tests were performed on a Mac Studio M1 Max, using its 2 TB internal SSD, and an external Samsung 980 Pro 2 TB SSD in an OWC Express 1M2 enclosure running in USB4 mode.

Results

These are summarised in the table below.

[Table: read and write speeds for each disk image type, encrypted and unencrypted, on internal and external SSDs.]

Read speeds for sparse bundles and read-write disk images were high, whether the container was encrypted or not. On the internal SSD, encryption resulted in read speeds about 1 GB/s lower than those for unencrypted tests, but differences for the external SSD were small and not consistent.

Write speeds were only high for sparse bundles, where encryption had similar effects. Read-write disk images showed write speeds of consistently about 1 GB/s, whether on the internal or external SSD, and regardless of encryption.

Even when unencrypted, read and write speeds for sparse (disk) images were also slower. Although they were faster than read-write disk images when writing to the internal SSD, read speed was around 2.2 GB/s for both. Results for encrypted sparse images were by far the worst of all the tests performed, ranging from 0.08 to 0.5 GB/s.

Surprisingly good results were obtained from a new-style virtual machine with FileVault enabled in its disk image. Although previous tests had found read and write speeds of 4.4 and 0.7 GB/s respectively, the Sequoia VM achieved 5.9 and 4.5 GB/s.

Which disk image?

On grounds of performance only, the fastest and most consistent type of disk image is a sparse bundle (UDSB). Although on fast internal SSDs there is a reduction in read and write speeds of about 1 GB/s when encrypted using 256-bit AES, no such reduction should be seen on fast external SSDs.

On read speed alone, a read-write disk image is slightly faster than a sparse bundle, but its write speed is limited to 1 GB/s. For disk images that are more frequently read from than written to, read-write disk images should be almost as performant as sparse bundles.

However, sparse (disk) images delivered the weakest performance, and were particularly slow when encrypted. Compared with previous results from 2022, unencrypted write performance has improved, from 0.9 to 2.0 GB/s, but their use still can’t be recommended.

Performance range

It’s hard to explain how three different types of disk image can differ so widely in their performance. Using the same container encryption, write speeds ranged from 0.08 GB/s for a sparse image to 3.2 GB/s for a sparse bundle, on an external SSD with a native write speed of 3.2 GB/s. It’s almost as if sparse images are being deprecated by neglect.

Currently, excellent performance is also delivered by the FileVault images used by Apple’s lightweight virtualisation on Apple silicon. The contrast between its 4.5 GB/s write speed and the 0.1 GB/s of an encrypted sparse image is a factor of 45 on identical hardware.

Recommendations

  • For general use, sparse bundles (UDSB) are to be preferred for their consistently good read and write performance, whether encrypted or unencrypted.
  • When good write speed is less important, read-write disk images (UDRW) can be used, although their write performance is comparable to that of USB 3.2 Gen 2 at 10 Gb/s and no faster.
  • Sparse (disk) images (UDSP) are to be avoided, particularly when encrypted, as they’re likely to give disappointing performance.
  • Encrypting UDSB or UDRW disk images adds little if any performance overhead, and should be used whenever needed.

Previous articles

Introduction
Tools
How read-write disk images have gone sparse
