Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

How to tell if Spotlight has indexed a file correctly

By: hoakley
22 November 2024 at 15:30

When you search but Spotlight doesn’t find what you’re looking for, you might wonder whether that’s because the file containing it hasn’t been indexed correctly. This article suggests some methods you can use to investigate this.

Spotlight relies on hidden indexes it maintains at the top level of each volume, in the .Spotlight-V100 folder. When an app or process creates a new file or changes the contents of an existing file, that’s recorded in the volume’s FSEvents database. If that file is in a location that isn’t excluded from Spotlight’s index, an mdworker process should then start adding its contents to the volume indexes. To do this, it first checks what type of file it is, by its UTI. Spotlight then looks up the correct mdimporter for that type, and uses that to generate data that Spotlight adds to that volume’s indexes as a set of attributes.

Recent versions of macOS can extract additional information from certain types of files such as images and PDFs. This includes text recognised within images using Live Text, and object recognition and classification, performed in the background by other services such as photoanalysisd. Those are slower and take significantly more processing, and are normally run after traditional mdimporter extraction.

mdimporter plugins

For many file types, these are provided as part of macOS and stored in the system, in /System/Library/Spotlight on the SSV. Importers for third-party apps may be in /Library/Spotlight or ~/Library/Spotlight, or in the /Library/Spotlight folder inside the app bundle. To discover all mdimporter plugins currently installed, use the command
mdimport -L
although that doesn’t inform you which file types they support.

To get more information, identify a file of the correct document type, normally with that type’s standard filename extension, and use mdimport to discover which mdimporter is used for that type, and what it can extract from that specific file, in a command of the form
mdimport -t -d3 [filepath]
where [filepath] is the path of that file. You can vary the digit used in the -d3 option from 1-3, with larger numbers delivering more detail. For -d3 you’ll first be given the file type and mdimporter used, following which all the data extracted is given according to its Spotlight attributes:
Imported '/Users/hoakley/Documents/SpotTestA.rtf' of type 'public.rtf' with plugIn /System/Library/Spotlight/RichText.mdimporter.
37 attributes returned

followed by a long list.

If you’re still in the dark, the nuclear option is to perform a diagnostic dump similar to sysdiagnose but concentrating on Spotlight, using the command
mddiagnose -f [path]
to write the diagnostic output to the specified path. Further details are given in man mddiagnose.

Mints

My free utility Mints has a whole section devoted to testing and observing Spotlight. This includes creation of a folder of nine test files in the user’s Home folder, carrying out a test search that should find those test files, custom log analysis, and cleaning up test files afterwards. This is all fully documented in the Spotlight check and test section of its Help book.

To start the test, click on the Create Test button at a known clock time, such as 12:30:00. Wait at least ten seconds, say until 12:30:10, then click the Check Search button. Give your Mac a good 20 seconds to complete that, then open the Log Window and view entries from 12:30:00 for a period of 30 seconds, to cover both the indexing of the test files and the search. Once you’re happy that all is well, you can delete the test files.

Test files include:

  • RTF rich text
  • PDF with embedded text
  • plain UTF-8 text
  • HTML
  • Microsoft Word DOCX
  • JPEG image with the search term in its EXIF metadata
  • Numbers spreadsheet
  • Pages document
  • plain UTF-8 text with the search term in a kMDItemKeywords extended attribute.

You can also customise the test files with other formats, provided that they contain the search text syzygy999 that Mints will look for.

Exclusions

Perhaps the most common reason for Spotlight not indexing a file’s contents is that it has been excluded from doing so. This is always worth checking before making the assumption the problem is in Spotlight.

There have been three general methods of excluding folders from indexing and search, although only two of them still work reliably:

  • appending the extension .noindex to the folder name (this previously worked using .no_index instead);
  • making the folder invisible to the Finder by prefixing a dot ‘.’ to its name;
  • putting an empty file named .metadata_never_index inside the folder; that no longer works in recent macOS.

System Settings offers Spotlight Privacy settings in two sections. Items listed under Search results won’t normally prevent indexing of those items, but will block them from appearing in search results. Spotlight’s indexing exclusion list is accessed from the Search Privacy… button, where listed items won’t be indexed at all.

Summary of useful commands

  • mdimport -L lists known Spotlight mdimporter plugins.
  • mdimport -t -d3 [filepath] reports the mdimporter used for a file, and data extracted from that file.
  • mddiagnose -f [path] runs full Spotlight diagnostics.

When and how to rebuild Spotlight indexes

By: hoakley
19 November 2024 at 15:30

Forcing Spotlight’s indexes to be rebuilt has become a panacea, popularly used when anything appears amiss with Spotlight. In many cases, it’s exactly the wrong response to what’s likely to be normal indexing activity. This article explains when it can sometimes be useful, and how to make more effective use of it.

Spotlight works by searching the indexes it maintains on each volume, stored in the hidden .Spotlight-V100 folder at the top level of each searchable volume. Within that folder is a property list containing the volume configuration details, and the Store-V2 folder containing a folder named using a UUID, within which are all the files composing the indexes. As those are opaque to the user they shouldn’t be tampered with.

Indexing

The process of indexing is simplest to understand when considered for a single newly created or changed file. That change is recorded in the volume’s FSEvents database, which in turn triggers an XPC call to process that file if it’s in a location within Spotlight’s scope.

Provided the changed file isn’t in an excluded location, an mdworker process should then start adding its contents to the volume indexes. To do this, it first checks what type of file it is, in terms of its UTI. If that’s incorrect, then the remainder of the steps won’t work properly. In most cases now, that means the file must have the correct extension for its type. If it doesn’t, then mdworker won’t be able to index it correctly.

Spotlight then looks up the correct mdimporter for that type. For many file types, those are provided as part of macOS and stored in the system, in /System/Library/Spotlight on the SSV. Importers for third-party apps may be in /Library/Spotlight or ~/Library/Spotlight, or in the /Library/Spotlight folder inside the app itself. To check all mdimporter plugins currently installed, use the command
mdimport -L

Spotlight importers and mdworker itself can crash when there’s a bug, or the mdimporter encounters a malformed file. If that happens, the log normally records repeated crashes and restarts of that mdworker process. If you can identify and remove the file that’s causing those, that should allow indexing to return to normal.

Once the mdworker has extracted the data from the file, that’s added to the volume’s indexes, typically reflected in a log entry from mds_stores containing the message
compressing 5686 bytes to <private>
or similar, for each file that has had content extracted and added to the indexes. Other services involved include mdsync and mdwrite.

Recent versions of macOS can extract additional information from certain types of files such as images and PDFs. This includes text recognised within images using Live Text, and object recognition and classification, performed in the background by other services such as photoanalysisd. Those are slower and take significantly more processing, and are normally run after mdimporter extraction.

When might it be useful to rebuild indexes?

The only indication for rebuilding Spotlight indexes on a volume is when they are known to be damaged or corrupted, in which case rebuilding offers the best chance of restoring normal search function to that volume.

One way to check the functional integrity of indexes is to perform searches for known targets, a feature available in my free Mints. I added this when the system mdimporter for Rich Text had a bug that effectively made searching the contents of any RTF file impossible, thankfully a rare situation, although third-party mdimporters may not be as reliable.

In the past rebuilding indexes was often used when mdworker processes repeatedly crashed when trying to extract index data from a file. However, that relied on the assumption that those crashes wouldn’t recur. If they did, then rebuilding wouldn’t solve the problem. One way to investigate this further is to discover from the log which file is causing mdworker workers to crash, and removing the cause. This isn’t as straightforward now, as log entries no longer identify the file(s) causing the problem unless privacy protection is removed from the log.

Recently, it has become popular to force indexes to be rebuilt whenever Spotlight’s indexing processes appear to be taking a long time maintaining current indexes, on the assumption that starting that from scratch is going to be quicker than leaving them to complete. This isn’t likely to help, as it’s likely to force rebuilding of indexes that are already fully up to date, and indexing provides no information as to its progress. So there’s no way of telling whether allowing current indexing activity to complete would take another few seconds or days. However, it’s most unlikely that forcing full reindexing would ever be faster than allowing the completion of indexing that’s already in progress.

Rebuilding the index

To force a volume to be re-indexed, open Spotlight (or Siri & Spotlight) in System Settings and click the Search Privacy… or Spotlight Privacy… button at the bottom. Click the + button at the foot, select the volume and add it to the list, then click Done. Pause thirty seconds or so, click the Search Privacy… button again, select that volume in the list, and click the – button to remove it from the list. You don’t normally need to close System Settings or restart between adding and removing the volume.

If you prefer, you can instead use the mdutil command in Terminal. The command
mdutil -E /
erases the indexes on the Data volume and forces them to be rebuilt, and you can use the same option on other volumes.

As Spotlight indexes are maintained and stored on each volume for its contents, you’ll need to include each volume on which you want to be able to search files by their contents. Unless you have been able to identify which volume’s indexes merit rebuilding, you may have to rebuild indexes on every mounted volume, which can take many hours or days to complete.

The simplest way to check that re-indexing is taking place is to open Activity Monitor, and in its CPU view check that processes with names starting with md are taking plenty of CPU. These should include mds_stores, mdworker (often multiple copies) and mds itself. On Apple silicon, those processes run almost exclusively on E cores, and are usually obvious in Activity Monitor’s CPU History window.

Key points

  • Each volume may have its own Spotlight indexes, used when searching that volume’s contents.
  • Modern macOS indexes more extensive data, including text recognised within images, and types of object found within them. More advanced metadata take longer to analyse and index.
  • Rebuilding indexes is indicated if they are known to be damaged or corrupted.
  • If you don’t know which volume’s indexes require to be rebuilt, rebuilding them on all volumes can take many hours or days.
  • If the problem is in a file or mdimporter then it’s likely to recur during rebuilding indexes unless the file is identified and removed from indexing.
  • As there’s no way to determine progress in building indexes, forcing a rebuild is likely to take longer than allowing current indexing to complete.
  • Rebuilding is best triggered by adding the volume to Spotlight exclusions, then removing it again.
  • Alternatively, use mdutil -E [volume].
  • Check rebuilding is taking place using Activity Monitor.

Postscript

Several have commented that the Spotlight Search item opened from the menu bar can show indexing progress. That’s correct, but it doesn’t actually show all Spotlight indexing by any means. For example, on my M4 Mac mini, that shows only the first minute or so, then pretends that indexing is complete despite a further 10 minutes of intensive activity filling the E cores in CPU History, almost all of it the result of continuing Spotlight indexing activity. So what that progress bar shows is the period during which Spotlight search is unavailable, not the period of indexing.

Regarding additional mdutil commands, a quick read through the man page is informative. That makes it clear that the -E option “will cause each local store for the volumes indicated to be erased. The stores will be rebuilt if appropriate.” There is no need to halt indexing before doing that.

However, if you do use -i off before performing other mdutil commands, then you will need to turn indexing back on again using -i on before Spotlight will either recreate a deleted index directory or rebuild the indexes within that. There is absolutely no need to remove the index directory using -X if all you want to do is force the indexes to be rebuilt: -E is perfectly sufficient to do that.

❌
❌