

For the last seven years or so there have been many folk complaining that Spotlight local search hasn’t been finding the files they know are there. Many have resorted to repeatedly rebuilding its indexes, usually without success. Last week, thanks to Jürgen, Drew, aldous and others who have contributed, we discovered one cause: a bug that appears to have been introduced in macOS Mojave, and is still present in Tahoe 26.0.1, that prevents Spotlight from indexing any of the contents of plain text files that start with certain characters.
Jürgen stumbled across the first example, with files starting with the two capital letters LG. At the time, that seemed extremely unusual and unlikely to affect many files. Then Drew added HPA and Draw to the list of forbidden characters. What looked like a rare event was becoming increasingly commonplace, and that list can only grow. How many indexing failures it could account for is impossible to guess.
Piecing together the evidence, it looks like this bug is inside the standard macOS RichText.mdimporter, now embedded in the Signed System Volume in /System/Library/Spotlight and at version 6.9 (350), as it has been since Sonoma (Ventura 13.7.8 has build 345.60.106, although that also suffers this bug). What happens is that saving a text file starting with forbidden characters correctly triggers Spotlight’s indexing service. That identifies the file as having the UTI public.plain-text and hands it over for its contents to be indexed. But the indexer inspects those first few characters, decides it’s a different type of file altogether, and promptly returns an error 4864 for an NSCoderReadCorruptError without going any further.
Apart from the text content not being added to Spotlight’s indexes, and a few lines buried in the Unified log, there are no indications of anything going wrong. If you test the importer using
mdimport -t -d3 filename
the file appears to import correctly, but that command doesn’t give any insight into the import of its contents, only standard attributes such as the filename that are indexed separately.
It was Drew who first suggested a plausible reason for this failure, confirmed by aldous: prior to attempting to index the text contents, Spotlight’s service was using a completely different method to check the type of the contents, the ‘magic’ database used by the file(1) command.
file(1) is an old Unix utility dating back to 1973 or earlier, operating independently of UTIs that were adopted in Mac OS X 10.4 Tiger 20 years ago. Rather than relying on a type assigned to a file, it ‘sniffs’ the contents, particularly the first few bytes of data, and uses a sprawling set of ad hoc rules to guess the file type. It turns out that files starting with the characters Draw were characteristic of a binary vector graphics format used by the !Draw app for RISC OS 2 in 1989. Rather than believing the file’s UTI for one of the most common types of files in macOS, Spotlight’s indexer therefore decided that it was trying to import file data that must now be as rare as hens’ teeth, and wouldn’t go any further.
If you’re sceptical about this coincidence, open the acorn magic data in /usr/share/file/magic in a text editor, and you’ll see the file opening string of Draw identified as RISC OS Draw file data. There are 332 other magic data files containing similar rules for identifying file types. I leave it as an exercise to the Unix wizard to build a list of all those that could cause similar problems with Spotlight indexing.
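As a starting point for that exercise, here is a minimal Python sketch that lists short, plain alphanumeric strings tested at offset 0 by top-level string rules. It assumes the conventional magic rule layout of whitespace-separated offset, type, test and description fields, and it skips any rule whose test isn't a short run of letters and digits. The function names and the max_len cut-off are illustrative choices. Which of these rules Spotlight's importer actually consults isn't known, so treat the output as candidates to test, not a list of confirmed failures.

```python
from pathlib import Path


def offset_zero_strings(magic_text, max_len=8):
    """Return (test string, description) pairs for simple offset-0 string rules."""
    found = []
    for line in magic_text.splitlines():
        # Skip blank lines, comments (#), continuation rules (>) and directives (!:).
        if not line.strip() or line[0] in "#>!":
            continue
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        offset, rule_type, test = fields[:3]
        description = fields[3].strip() if len(fields) > 3 else ""
        # Only top-level rules that compare a string at the very start of the file.
        if offset != "0" or rule_type.split("/")[0] != "string":
            continue
        # Keep short runs of ASCII letters and digits, the kind of text a
        # document might plausibly begin with; rules using escapes are ignored.
        if test.isascii() and test.isalnum() and len(test) <= max_len:
            found.append((test, description))
    return found


def scan_magic_dir(directory="/usr/share/file/magic"):
    """Collect candidate prefixes from every magic data file in a directory."""
    results = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # latin-1 never fails to decode, whatever bytes a rule file contains.
            results.extend(offset_zero_strings(path.read_text(encoding="latin-1")))
    return results
```

Calling scan_magic_dir() on a Mac should return a list of candidates that includes the Draw rule; each would then need testing with a real text file to see whether its contents are indexed.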
When this bug hunt started and it affected just LG and HPA, it was fairly esoteric and faintly amusing, at least as long as you didn’t write about your LG TV, high pressure air or Horizontal Pod Autoscaling. When Draw was added, and all those 333 magic files piled in, I realised how extensive this could be, and how little testing can be performed on Spotlight indexing and search.
Given that about eight years ago an Apple engineer wrote code for the RichText.mdimporter in macOS that introduced testing against some or all of the magic database, wouldn’t you have thought they’d test and debug that against test cases, such as text files starting with characters (mis)recognised by magic rules? And maybe occasionally over subsequent years and new versions of macOS, wouldn’t revised versions of the importer be tested again?
Apple likes Spotlight to be opaque to the user, for it to ‘just work’. There’s almost no documentation even for developers, and tools provided are strictly limited in what they can do, as demonstrated here in the case of mdimport. That’s all very well until Spotlight doesn’t work and no one outside Apple can do anything about it. Third parties can’t even write custom mdimporters to do the job properly, as those bundled in macOS take priority.
If this was the first time that Spotlight indexing had let us down, I might feel more charitable. But between macOS Catalina 10.15.6 in July 2020 and Big Sur 11.3 in April 2021 macOS was incapable of indexing the content of any Rich Text files. There are still many documents that haven’t been indexed as a result. Those whose contents haven’t been indexed as a result of this bug will similarly be excluded from search until they too are reindexed by a fixed mdimporter. For Intel Macs that won’t be supported by macOS 27, that could well be forever.
There’s a bug in Spotlight that can prevent it from indexing any of the contents of susceptible text files. This has been present since macOS 13 Ventura if not before, and is still present in Tahoe 26.0.1. I didn’t discover this myself, though: it was reported to me by Jürgen, to whom full credit is due. It’s also one of the strangest bugs I’ve come across, and all depends on two letters.
To demonstrate this bug, all you need is a single UTF-8 plain text file, created by TextEdit or any other app capable of saving plain text. Start the text with the two characters L and G, both in capitals. Then add one or more distinctive words, such as
LG syzygy
Save that file to a folder that you know is indexed and searched by Spotlight, then a few seconds later try searching for the word syzygy in its contents. Extend this as much as you want, maybe appending the whole of one of Charles Dickens’ novels, but no matter how you search for its contents, that file will never be found. If you want to get more serious, use that text file in my Spotlight test app SpotTest, and it will also be unable to find that file.
This only works with plain text files, not Rich Text, PDF or HTML. It’s also sensitive to those two letters. Set one of them in lowercase, preface them with a space, or substitute a different letter, and the contents of that file will then be indexed correctly and searchable as normal.
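To check existing files for this, a minimal Python sketch follows. It assumes the only problem prefixes are the three reported in this document (LG, HPA and Draw); the real list is probably longer. The comparison is case-sensitive and anchored at the very first byte, matching the behaviour described above. The names REPORTED_PREFIXES and reported_prefix are illustrative.

```python
# Prefixes reported in this document; an assumption that the list is incomplete.
REPORTED_PREFIXES = (b"LG", b"HPA", b"Draw")


def reported_prefix(data):
    """Return the reported prefix that the bytes start with, or None."""
    for prefix in REPORTED_PREFIXES:
        # startswith is case-sensitive and only looks at the first bytes,
        # so "Lg" or a leading space won't match.
        if data.startswith(prefix):
            return prefix.decode("ascii")
    return None
```

Reading the first few bytes of a file in binary mode and passing them to reported_prefix() is enough to flag it as a candidate for a closer look.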
I have tested this in virtual machines going back as far as macOS 13 Ventura, and it’s present in them all. If you have access to an earlier version of macOS, I’d be interested to know whether it affects that as well.
The two UTF-8 characters concerned, 4c 47, don’t appear to be anything special that could be misinterpreted.
Although it’s not easy to distinguish failure to index from search errors, saving a test file does result in repeated reports of an error that could cause Spotlight to fail when trying to index the file, for example the log entries
30.946740 mdwrite Decoding error: Error Domain=NSCocoaErrorDomain Code=4864 UserInfo={NSDebugDescription=[private]} for [private]
30.951004 mds Decoding error: Error Domain=NSCocoaErrorDomain Code=4864 UserInfo={NSDebugDescription=[private]} for [private]
Error code 4864 is NSCoderReadCorruptError, implying that the presence of those two characters at the start of a text file may be triggering a bug in RichText.mdimporter, the importer module shipped in macOS that’s responsible for indexing plain text files.
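If you have a log extract saved as text, a short Python sketch like the following can pick out these entries. It assumes each line has the same layout as the two entries quoted above (time, process name, then the message); extracts in other formats would need a different pattern. The names DECODING_ERROR and find_decoding_errors are illustrative.

```python
import re

# Matches lines shaped like the entries quoted above:
# "<time> <process> Decoding error: Error Domain=<domain> Code=<code> ..."
DECODING_ERROR = re.compile(
    r"^(?P<time>\d+\.\d+)\s+(?P<process>\S+)\s+Decoding error: "
    r"Error Domain=(?P<domain>\S+) Code=(?P<code>\d+)"
)


def find_decoding_errors(lines, code=4864):
    """Return (time, process, domain) for each decoding error with the given code."""
    hits = []
    for line in lines:
        match = DECODING_ERROR.match(line.strip())
        if match and int(match.group("code")) == code:
            hits.append(
                (match.group("time"), match.group("process"), match.group("domain"))
            )
    return hits
```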
My current hypothesis is therefore that text files starting with the characters LG are failing to have their contents indexed correctly because of a bug in RichText.mdimporter.
This isn’t the first bug in the RichText.mdimporter. In macOS Catalina 10.15.6, the same mdimporter (then build 319.60.100) introduced a bug that broke indexing of Rich Text (RTF) files. That was perpetuated through early releases of Big Sur until it was finally fixed in RichText.mdimporter build 326.11 in Big Sur 11.3.
Because text files starting with the characters LG are exceedingly unusual, this bug appears to have been left in RichText.mdimporter for a great deal longer.
I will be reporting this to Apple in Feedback later this month. Please feel free to file your own Feedback if you can spare the time.
Plain text files starting with the characters LG cannot be searched for their contents.
I’m very grateful to Jürgen for drawing this to my attention.
The greatest challenge in using the Unified log is how to navigate its many thousands of entries, to find those you want to read. Success depends on the combination of two aids: time and waypoints (or landmarks).
No matter how you obtain log extracts, you need to know which period of time to look in for those entries. The more precisely you can work out the time of interest, the quicker and easier it will be to locate the entries you’re interested in. While the log command offers alternatives, when accessing the live log on that Mac LogUI works throughout in the local time that applies when you access the log, allowing for your current time zone and any seasonal adjustment to it.
However, the underlying times given in log extracts are those recorded by the Mac or device whose log you’re accessing. If its system clock was five minutes slow when those entries were written to its log, then you need to allow for that. For example, when I first started my Mac yesterday its clock might have been 1 minute slow. An event that occurred at 10:56 yesterday by the room clock would therefore appear in the log entries for 10:55.
One important time you can discover is the boot time of the Mac. Mints offers a Boot button to retrieve boot times over the last 24 hours. If the logs were written by a different Mac or device, then you’ll need to search for the time of that last boot. Fortunately the first two log entries are easily recognised:
11:41:37.562774+0100 === system boot: D3CEA9B4-F045-434D-8D12-C6E794A02F14
11:41:42.758780+0100 kprintf initialized
The long gap between the first two entries is accounted for by the firmware phase of the boot process. If necessary you can search for a message containing === (three equals signs). Mints provides the time of the first of those for each boot, and its UUID.
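For log extracts saved as text, the same search can be sketched in a few lines of Python. This assumes each line begins with its timestamp followed by the message, as in the entries quoted above; the names BOOT_ENTRY and find_boots are illustrative.

```python
import re

# A boot entry is a timestamp, then "=== system boot: " and a 36-character UUID.
BOOT_ENTRY = re.compile(
    r"^(?P<time>\S+)\s+=== system boot: (?P<uuid>[0-9A-Fa-f-]{36})"
)


def find_boots(lines):
    """Return (time, boot UUID) for every system boot entry in the lines."""
    boots = []
    for line in lines:
        match = BOOT_ENTRY.match(line.strip())
        if match:
            boots.append((match.group("time"), match.group("uuid")))
    return boots
```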
There are two occasions when time can become confusing: when clock corrections are applied, and when clocks are moved forward or back for summer or other seasonal time changes. Fortunately the latter only happen twice each year, although when they do, you really don’t want to see what happened in the log, and those changes aren’t even applied at a predictable time.
Clock corrections, like kernel boot, are readily found by the === text in their message. They normally happen in pairs, with the first correction the larger, and the second often far smaller. Here’s an example seen in consecutive log entries:
08:26:16.140474+0100 /usr/libexec/sandboxd[80] ==> com.apple.sandboxd
08:26:10.043353+0100 === system wallclock time adjusted
08:26:10.044335+0100 Sandbox: distnoted(72) deny(1) file-read-metadata /private
08:26:10.044601+0100 2 duplicate reports for Sandbox: distnoted(72) deny(1) file-read-metadata /private
08:26:10.044606+0100 Sandbox: distnoted(72) deny(1) file-read-metadata /Library
08:26:10.089204+0100 === system wallclock time adjusted
08:26:10.091850+0100 started normally
The first adjustment dropped the clock back by 6.1 seconds, from 08:26:16.140474 to 08:26:10.043353. This means that you’ll see times of 08:26:12 both before the correction and afterwards. The second adjustment, from 08:26:10.044606 to 08:26:10.089204, was far smaller at 0.045 seconds, and at least went in the right direction.
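That arithmetic can be checked with a minimal Python sketch. It assumes timestamps in the format shown in these entries, and, like the figures above, takes the difference between the entries on either side of the correction, so it includes the small amount of real time that elapsed between them. The function names are illustrative.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse a log time such as 08:26:16.140474+0100 (the date is not included)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f%z")


def adjustment_seconds(before, after):
    """Seconds from the entry before a correction to the entry after it.

    A negative result means the clock was set back.
    """
    return (parse_time(after) - parse_time(before)).total_seconds()
```

For the first pair, adjustment_seconds("08:26:16.140474+0100", "08:26:10.043353+0100") is negative because the clock was set back; the second pair gives a small positive value.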
The most substantial clock corrections are made shortly after booting. Although macOS does make them later, the size of those should be smaller.
Even working with times resolved to the second, those can still leave you browsing thousands of log entries. To locate more precisely you need details of one or more entries that will be sufficiently distinctive to focus in on a few dozen. These are waypoints for navigation.
LogUI provides three methods for locating these waypoints:
These are best used when the time period of your extract needs to be relatively long, so would return a large number of entries. For example, if you can only narrow the time down to several minutes, and are looking for the time that a specific app was launched, you can look for that app’s job description when it’s created and written to the log by RunningBoard.*
Over a period of two minutes, RunningBoard might write thousands of entries in the log, so looking for your app’s job description among them would be time-consuming. Set the start time and period to cover the whole of the time you want to search, then set a predicate for the subsystem com.apple.runningboard.
When LogUI fetches that log extract, there might still be over 2,000 entries, so now is the time to apply search text to filter those further.
To filter those 2,000 entries and show only those containing job descriptions created by RunningBoard, enter the text constructed job in LogUI’s search box, with its menu set to Messages, and press Return. You’ll now see that list reduced to just a handful, and looking through them you can discover exactly when your waypoint occurred.
My example for this article starts with a period of just 2 minutes, in which there were more than 100,000 log entries.
Using the com.apple.runningboard predicate whittled those down to 13,443 entries.
Searching within those for constructed job left me with just 8 entries to look through.
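The same two-stage approach, a predicate first and a text filter second, can be sketched in Python around the log command instead of LogUI. This is a sketch under stated assumptions: it uses log show with its --start, --end, --predicate and --style ndjson options, and it assumes each JSON entry carries its message text in a field named eventMessage. The function names are illustrative.

```python
import json


def build_log_show_args(start, end, subsystem):
    """Arguments for a log show extract limited to one subsystem.

    start and end are strings such as "2025-01-01 10:00:00" (hypothetical values).
    """
    return [
        "log", "show",
        "--start", start,
        "--end", end,
        "--predicate", f'subsystem == "{subsystem}"',
        "--style", "ndjson",
    ]


def parse_ndjson(text):
    """Parse newline-delimited JSON, ignoring any lines that aren't JSON objects."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{"):
            entries.append(json.loads(line))
    return entries


def filter_messages(entries, needle):
    """Keep entries whose message text contains the search text."""
    # eventMessage is the assumed name of the message field in the JSON output.
    return [e for e in entries if needle in (e.get("eventMessage") or "")]
```

On a Mac, the argument list can be passed to subprocess.run() with capture_output=True and text=True, and its stdout fed through parse_ndjson() and then filter_messages(entries, "constructed job").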
Sometimes you can’t devise the right combination of predicate and search filter to discover what you’re looking for, which might be an error reported in a subsystem or a process that you can’t identify. One good way forward is to narrow your log extract as much as you can, then save the extract as Rich Text, open that in a suitable editor, and search through it for the word error. That will discover every log entry containing the word error anywhere, rather than confining it to the message text.
Armed with your waypoint and the exact time of its entry in the log, you can now set that as the start time, set a period of a couple of seconds, and get a full log extract containing all the detail you might need. This should give you further clues to allow you to move through time using predicates and search filters to discover what happened. This is much quicker and less frustrating than trying to scan through thousands of log entries in search of vague clues.
* Sadly, the days of being able to access freely RunningBoard’s informative job descriptions in the log are over. As of macOS Tahoe, all you’ll see is the dreaded <private> of censorship. If you want to examine these now, you’ll have to remove log privacy protection first. Thanks, Apple, for providing such useful tools then rendering them next to useless.