
What happens to images when you disable Live Text?

By: hoakley
19 August 2025 at 14:30

For many of us, Live Text and Visual Look Up are real boons, making it simple to copy text from images, to perform Optical Character Recognition (OCR) on images of text, and to identify objects of interest. They are also used by the mediaanalysisd service to extract text and object identifications for indexing by Spotlight, making those image contents searchable. Although the latter can be disabled generally or for specific folders in Spotlight settings, there appears to be only one control over Live Text, and none for Visual Look Up. This article examines what that single control does, using log extracts obtained from a Mac mini M4 Pro running macOS Sequoia 15.6. It follows my recent article on how these features work.

Setting

Live Text, which is enabled by default, can be disabled in Language & Region settings. When that is turned off, opening an image containing text in Preview no longer makes any text selectable. However, Visual Look Up still works as normal.

Textual images

When an image containing recognisable text but no other objects is opened in Preview, the VisionKit subsystem is still activated soon after the image is loaded. VisionKit initially reports that the “device” supports analysis, but immediately clarifies that to
Device does not support image analysis, but does support Visual Search, limiting to just Visual Search.
It then starts a VKImageAnalyzerProcessRequestEvent with a MAD parse request. That leads to a Visual Search Gating Task being run, and the Apple Neural Engine (ANE) and all CPU P cores are prepared for that.

Less than 0.1 second later, the end of the VKImageAnalyzerProcessRequestEvent is reported, and VisionKit returns an analysis concluding that no image segment merits further analysis. Preview’s ⓘ Info button remains in its normal state, and clicking on it doesn’t alter the image displayed.

Images with other objects

An image containing potentially recognisable objects doesn’t stop there. If VisionKit returns an analysis indicating Visual Search could extract objects from the image, the ⓘ Info button adds stars and waits for the user to open the Info window.

VisionKit reports in the log
Setting Active Interaction Types: [private], [private]
then that it
DidShowVisualSearchHints with invocationType: VisualSearchHintsActivated, id: 1

When one of those Visual Search Hints is clicked, the Lookup View is prepared, followed by a notice from LookupViewService that it’s Changing state from LVSDisplayStateConfigured to LVSDisplayStateSearching. That leads to VisionKit requesting a Visual Search from mediaanalysisd.

After the Apple Neural Engine (ANE) has been run to progress that, a successful search results in PegasusKit making its internet connection to identify the object, exactly as it does when Live Text is enabled and text has been recovered from the image:
Querying https: // api-glb-aeuw1b.smoot.apple.com/apple.parsec.visualsearch.v2.VisualSearch/VisualSearch with request (requestId: [UUID]) : (headers: ["Content-Type": "application/grpc+proto", "grpc-encoding": "gzip", "grpc-accept-encoding": "gzip", "grpc-message-type": "apple.parsec.visualsearch.v2.VisualSearchRequest", "X-Apple-RequestId": "[UUID]", "User-Agent": "PegasusKit/1 (Mac16,11; macOS 15.6 24G84) visualintelligence/1", "grpc-timeout": "10S"]) [private]

The ANE is finally cleaned up and shut down as the search results are displayed in the Lookup View.

Conclusions

  • When Live Text is disabled in Language & Region settings, images are still analysed when they are opened, to determine if they’re likely to contain objects that can be recognised in Visual Search for Visual Look Up.
  • If there are no such objects detected, VisionKit proceeds no further.
  • If there are suitable objects, mediaanalysisd and VisionKit proceed to identify them using Visual Search Hints, as normal.
  • If the user clicks on a Visual Search Hint, PegasusKit connects to Apple’s servers to identify that object and provide information for display in the Lookup View.
  • Although there is less extensive use of the ANE and CPU cores than when Live Text is enabled, neural networks are still run locally to perform a more limited image analysis.

I’m very grateful to Benjamin for pointing out this control over Live Text.

Last Week on My Mac: Drought and neural engines

By: hoakley
17 August 2025 at 15:00

If there’s one thing you can rely on about the UK weather, it’s rain. Unless you live in that narrow belt of East Anglia officially classed as semi-arid, you’ll be used to rain whatever the season or forecast.

The last time we had a long dry summer was 1976, when much of Northern Europe basked in sunshine from late May until the end of August. This year has proved similar, so here we are again, dry as a bone, banned from using hosepipes except to wash down horses, and wondering when the inevitable floods will start. In 1976, the dry weather broke just a couple of weeks after the appointment of a Minister for Drought, whose brief was promptly extended to cover the ensuing inundation.

With this shortage of water, it might seem surprising that over the next five years around a hundred new data centres are expected to be built in the UK. These are the data centres we all want to support our AI chatbots and cloud services, but nobody wants in their neighbourhood. No one has explained where all their power and water supplies will come from, although apparently ten new reservoirs are already being built in anticipation.

The best piece of advice we have been given to help our shortage of water is to delete all our old emails and photos. Apparently by reducing what we have stored in the cloud, those data centres won’t get so hot, and will consume less water. Really?

Meanwhile back on planet Earth, last week I was studying the log entries made on behalf of the Apple Neural Engine, ANE, inside my Mac mini’s M4 Pro chip, when it was running local models to support Live Text and Visual Look Up. We now take these features for granted, and maybe aren’t even aware of using them, or of what our Mac’s ANE is doing. Yet every Apple silicon Mac sold over the last five years has the dedicated hardware possessed by only a small minority of PCs. They can, of course, use other hardware including GPUs, well known for their excessive power and cooling demands. For many the only solution is to go off-device and call on some of those data centres, as you do with ChatGPT, Google’s answer engine, and even Elon Musk’s Grok if you really must.

Live Text is a particularly good example of a task that can, given the right hardware, be performed entirely on-device, and at relatively low energy cost. It’s also one that many of us would rather not farm out to someone’s data centre, but keep to the privacy of our own Mac. While it does work surprisingly well on recent Intel Macs, it’s exactly the kind of task the ANE was intended to make sufficiently performant to become commonplace. Just over three years ago, before WWDC 2022, I wrote: “But if I had to put my money anywhere, it would be on the ANE working harder in the coming months and years, to our advantage.”

With so many Macs now capable of what seemed miraculous in the recent past, we’re only going to see more apps taking advantage of those millions of ANEs. Developers are already starting to use Apple’s new Foundation Models supported by macOS 26 Tahoe, all of which run on-device rather than in those data centres. In case you’re concerned about the ethics of what this might be unleashing, Apple has already anticipated that with a stringent set of acceptable use requirements, which also apply to apps provided outside the App Store.

Obtaining reliable estimates of the performance and power consumption of the ANE is fraught, but I have measured them during Visual Look Up on an M1 Max (with an H11ANE), and found peak power used was 30-50 mW. According to mot’s comment to that article, when running an inference task intended to push that in an M1 Pro to the maximum, its ANE drew a maximum of 2 W. That’s frugal compared to running equivalent intensive tasks on Performance CPU cores or an Apple silicon GPU, which can readily use more than 1 W per P core.

Can someone suggest that, instead of deleting old emails and photos, we’d be better off running our favourite AI on-device using an Apple Neural Engine? I still don’t think it would do anything to help our current drought, but it could spare us a few of those projected data centres.

How to search Spotlight for Live Text and objects in images

By: hoakley
13 August 2025 at 14:30

Spotlight has been able to find text extracted from images using Live Text, and the names of objects recognised using Visual Look Up, for some years now. This article considers how you can and cannot search for those. Although this might seem obvious, it’s more confusing than it appears and could mislead you into thinking that Spotlight indexing or search isn’t working.

As detailed in yesterday’s account of Live Text, text recognition in images uses a lexicon to match whole words rather than proceeding character by character. Terms assigned to recognised objects are also words. Thus, when searching for either type, you should use words as much as possible to increase the chances of success.

Global Spotlight 🔍

Type a word like cattle into Spotlight’s search window and you can expect to see a full selection of documents and images containing that term. Those include images containing the word, and images containing objects identified as cattle, but not images in PDF files, as those aren’t analysed by mediaanalysisd, so don’t undergo character or object recognition in the same way that regular images like JPEGs do.

The Finder’s Find

Open a new window in the Finder and turn it into a Spotlight search using the Finder’s File menu Find command. In its search box at the top right, type in a word like cattle and in the popup menu select the lower option, Content contains. Press Return and the search box will now display ANY cattle. Then set the Kind to Image in the toolbar above search results, to narrow results down to image files. You should then see a full listing of image files that either contain the word cattle, or contain objects identified by image analysis as being cattle. Note how many of those appear.

Reconfigure the search so the search box is empty and there are two rows of search settings: the first can remain Kind is Image, but set the second to Contents contains cattle. Those images containing objects identified as cattle will now vanish, leaving images containing the word cattle still listed.

To understand the difference between these, you can save those two Find windows and read the underlying terms used for each search. The search that returned text obtained by both Live Text and Visual Look Up used
(((** = "cattle*"cdw)) && (_kMDItemGroupId = 13))
while the one excluding Visual Look Up used
(((kMDItemTextContent = "cattle*"cdw)) && (_kMDItemGroupId = 13))
instead. We can transfer those to the mdfind command tool to explore further.
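
Before moving on to mdfind, note that you can also read those underlying terms programmatically: a saved Find search is just a property list whose RawQuery key holds the predicate text, at least in my tests. Here’s a minimal Swift sketch, assuming a search saved to the default location in ~/Library/Saved Searches; the file name cattle.savedSearch is hypothetical:

import Foundation

// Hedged sketch: read the RawQuery from a saved Find search.
// The file name is hypothetical, and the default save location is assumed.
let url = URL(fileURLWithPath: NSHomeDirectory() + "/Library/Saved Searches/cattle.savedSearch")
if let data = try? Data(contentsOf: url),
   let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil),
   let dict = plist as? [String: Any],
   let raw = dict["RawQuery"] as? String {
    print(raw)   // e.g. (((** = "cattle*"cdw)) && (_kMDItemGroupId = 13))
}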

mdfind

To use those as search predicates with mdfind we’ll translate them into a more general form:
mdfind "(** == 'cattle*'cdw) && (kMDItemContentTypeTree == 'public.image'cd)"
should return both Live Text and Visual Look Up results, while
mdfind "(kMDItemTextContent == 'cattle*'cdw) && (kMDItemContentTypeTree == 'public.image'cd)"
only returns Live Text results.

The term (** == 'cattle*'cdw) has a special meaning because of its wild card **, and will return any match found in the metadata and contents of files. kMDItemTextContent is similar, but confined to text content, which doesn’t include the names of objects recognised in an image, and Apple doesn’t reveal whether there’s an equivalent that does.

Code search

Although apps can call mdfind to perform searches, they normally use NSMetadataQuery with an NSPredicate instead. That isn’t allowed to use a predicate like
** ==[cdw] "cattle"
so can’t search for objects identified using Visual Look Up. When it uses the substitute of
kMDItemTextContent ==[cdw] "cattle"
it also fails to find text obtained using Live Text. So the only way to search for recovered text in a compiled app is to call mdfind.
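
For instance, a non-sandboxed tool could simply shell out to mdfind, as in this rough Swift sketch; the helper function is mine, and the query is the wildcard form used above:

import Foundation

// Hedged sketch: call mdfind from compiled Swift code, since NSMetadataQuery
// won't accept the ** wildcard predicate. Assumes a non-sandboxed process.
func mdfindImages(matching word: String) throws -> [String] {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: "/usr/bin/mdfind")
    // The same query as the command-line example above; sanitise 'word' in real use.
    task.arguments = ["(** == '\(word)*'cdw) && (kMDItemContentTypeTree == 'public.image'cd)"]
    let pipe = Pipe()
    task.standardOutput = pipe
    try task.run()
    task.waitUntilExit()
    let output = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self)
    return output.split(separator: "\n").map(String.init)
}

let paths = (try? mdfindImages(matching: "cattle")) ?? []
print(paths)   // paths of images matching 'cattle' in metadata or content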

Timing

Searching for text obtained using Live Text, or object labels obtained using Visual Look Up, depends entirely on those being added to the Spotlight indexes on the volume. Observations of the log demonstrate just how quickly normal indexing by mdworker processes takes place. Here’s an example for a screenshot:
06.148292 com.apple.screencapture Write screenshot to temporary location
06.151242 [0x6000009bc4b0] activating connection: mach=true listener=false peer=false name=com.apple.metadata.mds
06.162302 com.apple.screencapture Moving screenshot to final location
06.169565 user/501/com.apple.mdworker.shared.1E000000-0600-0000-0000-000000000000 internal event: WILL_SPAWN, code = 0
06.198868 com.apple.DiskArbitration.diskarbitrationd mdworker_shared [7266]:118071 -> diskarbitrationd [380]
06.226997 kernel Sandbox apply: mdworker_shared[7266]

In less than 0.1 second an mdworker process has been launched and granted the sandbox it uses to generate metadata and content for Spotlight’s indexes. Unfortunately, that doesn’t include any Live Text or Visual Look Up content, which are generated separately by mediaanalysisd later. It’s hard to estimate how much later, although you shouldn’t expect to find such recovered text for several hours or days, depending on the opportunities for mediaanalysisd to perform background image analysis for this purpose.

Summary

  • Words recovered from images (not those in PDF files, though) and objects recognised in them can be found by Spotlight search. For best results, choose words rather than letter fragments for search terms.
  • Global Spotlight gives full access to both types.
  • The Finder’s Find gives full access to both only when the search term is entered in the search box as Content contains; otherwise it may exclude recognised objects.
  • mdfind can give full access to both types only when using a wildcard term such as (** == 'cattle*'cdw).
  • NSMetadataQuery currently appears unable to access either type. Call mdfind instead.
  • The delay before either type is added to the volume Spotlight indexes can be hours or days.

How do Live Text and Visual Look Up work now?

By: hoakley
12 August 2025 at 14:30

Live Text and Visual Look Up are recent features in macOS, first appearing in Monterey nearly four years ago. As that immediately followed Apple’s short-lived debacle over its now-abandoned intention to scan images for CSAM, most concerns have been over whether these features send details of images to Apple.

Although recent Intel Macs also support both these features, they don’t have the special hardware to accelerate them, so are far slower. For this walkthrough, I’ll only present information from Apple silicon Macs, in particular for the M4 Pro chip in a Mac mini.

Initiation

When an image is opened from disk, the VisionKit system starts up early, often within 0.1 second of it starting to open. Its initial goal is to segment the image according to its content, and identify whether there’s any part of it that could provide text. If there is, then Live Text is run first so you can select and use that as quickly as possible.

In the log, first mention of this comes from com.apple.VisionKit announcing
Setting DDTypes: All, [private]
Cancelling all requests: [private]
Signpost Begin: "VKImageAnalyzerProcessRequestEvent"

It then adds a request to the Mad Interface, that is the mediaanalysisd service, with a total method return time, and that’s marked with another signpost (in this case a regular log entry rather than a true Signpost):
Signpost Begin: "VisionKit MAD Parse Request"

com.apple.mediaanalysis receives that request
[MADServicePublic] Received on-demand image processing request (CVPixelBuffer) with MADRequestID
then runs that
Running task VCPMADServiceImageProcessingTask

This is run at a high Quality of Service (QoS) of 25, termed userInitiated, intended for tasks the user needs completed to be able to use an app, and is scheduled to run in the foreground. Next, Espresso creates a plan for a neural network to perform segmentation analysis, and the Apple Neural Engine (ANE) is prepared to load and run that model. There’s then a long series of log entries posted for the ANE detailing its preparations. As this proceeds, segmentation may be adjusted and the model run repeatedly, which can involve CoreML, TextRecognition, Espresso and the ANE in further long runs of log entries.
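
For reference, that QoS of 25 is the raw value that Swift exposes as .userInitiated, as this trivial sketch confirms:

import Dispatch

// The numeric QoS classes seen in the log map onto Dispatch's named classes:
// .userInitiated has a raw qos_class_t value of 25, .background of 9.
print(DispatchQoS.QoSClass.userInitiated.rawValue.rawValue)   // 25
print(DispatchQoS.QoSClass.background.rawValue.rawValue)      // 9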

Text recognition

With segmentation analysis looking promising for the successful recognition of text in the image, LanguageModeling prepares its model by loading linguistic data, marked by com.apple.LanguageModeling reporting
Creating CompositeLanguageModel ([private]) for locale(s) ([private]): [private]
NgramModel: Loaded language model: [private]

and the appearance of com.apple.Lexicon. An n-gram is an ordered sequence of symbols, which could range from single letters to whole words, and the lexicon depends on the language locales. In my case, two locales are used, en (generic English) and en_US (US English).
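
To make that concrete, here’s a toy Swift sketch of word-level n-grams; it’s purely illustrative, and has nothing to do with Apple’s actual language models:

// Toy illustration of n-grams: contiguous runs of n symbols, here whole words.
func ngrams(of words: [String], n: Int) -> [[String]] {
    guard n > 0, words.count >= n else { return [] }
    return (0...(words.count - n)).map { Array(words[$0 ..< $0 + n]) }
}

let tokens = "the cattle graze in the field".split(separator: " ").map(String.init)
print(ngrams(of: tokens, n: 2))
// [["the", "cattle"], ["cattle", "graze"], ["graze", "in"], ["in", "the"], ["the", "field"]]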

At this point, mediaanalysisd declares the VCPMADVIDocumentRecognitionTask complete, and runs a VCPMADVIVisualSearchGatingTask, involving com.apple.triald, TRIClient, PegasusKit and Espresso again. Another neural network is loaded into the ANE, and is run until the VCPMADVIVisualSearchGatingTask is complete and the text is returned.

The next task is for the translationd service to perform any translations required. If that’s not needed, VisionKit reports
Translation Check completed with result: NO, [private]
These checks may be repeated with other image segments until all probable text has been discovered and recognised. At that stage, the recognised text is fully accessible to the user.

Object recognition

Further image analysis is then undertaken on segments of interest, to support Visual Look Up. Successful completion of that phase is signalled in Preview by the addition of two small stars at the upper left of its ⓘ Info tool. That indicates objects of interest can be identified by clicking on that tool, and isn’t offered when only Live Text has been successful. VisionKit terms those small buttons Visual Search Hints, and remains paused until the user clicks on one.

Visual Search Hints

Clicking on a Visual Search Hint then engages the LookupViewService, and changes the state from Configured to Searching. VisionKit records another
Signpost Begin: "VisionKit MAD VisualSearch Request"
and submits a request to MAD for image processing to be performed. If necessary, the kernel then brings all P cores online in preparation, and the ANE is put to work with a new model.

PegasusKit then makes the first off-device connection for assistance in visual search:
Querying https: // api-glb-aeus2a.smoot.apple.com/apple.parsec.visualsearch.v2.VisualSearch/VisualSearch with request (requestId: [UUID]) : (headers: ["grpc-message-type": "apple.parsec.visualsearch.v2.VisualSearchRequest", "Content-Type": "application/grpc+proto", "grpc-encoding": "gzip", "grpc-accept-encoding": "gzip", "X-Apple-RequestId": "[UUID]", "grpc-timeout": "10S", "User-Agent": "PegasusKit/1 (Mac16,11; macOS 15.6 24G84) visualintelligence/1"]) [private]

When the visual search task completes, information is displayed about the object of interest in a floating window. Visual Look Up is then complete, and in the absence of any further demands, the ANE may be shut down to conserve energy, and any inactive CPU cluster likewise:
ANE0: power_off_hardware: Powering off... done
PE_cpu_power_disable>turning off power to cluster 2

Key points

  • Image analysis starts shortly after the image is loaded.
  • Central components are VisionKit and mediaanalysisd, MAD.
  • In Apple silicon Macs, extensive use is made of the Apple Neural Engine throughout, for neural network modelling.
  • Most if not all of this is run at a high QoS of 25 and in the foreground, for performance.
  • Segmentation analysis identifies areas that might contain recoverable text.
  • Segments are then analysed, using language modelling for appropriate locales and languages, and lexicons, to return words rather than fragmented characters.
  • When Live Text is ready, segments are then analysed for recognisable objects. When that’s complete, each is marked by a Visual Search Hint.
  • Clicking on a Visual Search Hint initiates a network connection to provide information about that object, displayed in a floating window.

Last Week on My Mac: Spotlight sorcery

By: hoakley
10 August 2025 at 15:00

According to scientific tradition, we first observe then experiment. If you proceed to the latter before you understand how a system behaves, then you’re likely to labour under misapprehensions and your trials can become tribulations. Only when a system is thoroughly opaque and mysterious can we risk attempting both together.

That’s the case for Spotlight, which despite its name does everything but shine any light on its mechanisms. It presents itself in several guises: as a combination of web and local search (🔍), as local search using terms limited in their logical operators (the Finder’s Find), as full-blown predicate-based local search (mdfind), as in-app file search (Core Spotlight), and as the coder’s NSMetadataQuery and predicates. It relies on indexes scattered across hundreds of binary files, and runs multiple processes, while writing next to nothing in the log.

Last week’s code-doodling has been devoted to turning the Spotlight features in Mints into a separate app, SpotTest, so I can extend them to allow testing of different volumes, and search for text that has been derived from images. Those are proving thorny because of Spotlight’s unpredictable behaviour across different Macs running Sequoia.

Every week I search for screenshots to illustrate another article on Mac history. When using my old iMac Pro where most of them are stored, Spotlight will find many images containing search terms from the text shown within them, even from ancient QuickDraw PICT images, demonstrating that text is being recovered using Live Text’s optical character recognition. When I try to repeat this using test images on an Apple silicon Mac, Spotlight seems unable to recognise any such recovered text.

Image analysis on Macs has a stormy history. In a well-intentioned gaffe four years ago, Apple shocked us when it declared it was intending to check our images for CSAM content. Although it eventually dropped that idea, there have been rumours ever since about our Macs secretly looking through our images and reporting back to Apple. It didn’t help that at the same time Apple announced Live Text as one of the new features of macOS Monterey, and brought further image analysis in Visual Look Up.

Although I looked at this in detail, it’s hard to prove a negative, and every so often I’m confronted by someone who remains convinced that Apple is monitoring the images on their Mac. I was thus dragged back to reconsider it in macOS Sonoma. What I didn’t consider at that time was how text derived from Live Text and image analysis found its way into Spotlight’s indexes, which forms part of my quest in SpotTest.

This doesn’t of course apply to images in PDF documents. When I looked at those, I concluded: “If you have PDF documents that have been assembled from scans or other images without undergoing any form of text recognition, then macOS currently can’t index any text that you may still be able to extract using Live Text. If you want to make the text content of a PDF document searchable, then you must ensure that it contains its own text content.” I reiterated that in a later overview.

My old images aren’t PDFs but QuickDraw PICTs, TIFFs, PNGs and JPEGs, many from more than 20 years ago. When the circumstances are right, macOS quietly runs Live Text over them and stores any text it recovers in Spotlight’s indexes. It also analyses each image for recognisable objects, and adds those too. These happen more slowly than regular content indexing by mdworker, some considerable time after the image has been created, and have nothing whatsoever to do with our viewing those images in QuickLook or the Finder, or even using Live Text or Visual Look Up ourselves.

There are deeper problems to come. Among them is discovering the results of image recognition, as can be revealed at the command line using a search such as
mdfind "(** == 'cattle*'cdw) && (kMDItemContentTypeTree == 'public.image'cd)"
to find all images that have been recognised as containing cattle. There’s no equivalent of the first part of that when calling NSMetadataQuery from Swift code, and a predicate of
kMDItemTextContent CONTAINS[cd] \"cattle\"
will only discover text recovered by Live Text, not the names of objects recognised within an image.
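
For completeness, this is roughly how that predicate is used with NSMetadataQuery in a small command-line tool; a minimal sketch only, with the search term hard-coded:

import Foundation

// Minimal sketch of an NSMetadataQuery using kMDItemTextContent. As described
// above, it can't use the ** wildcard, so it misses recognised object names.
let query = NSMetadataQuery()
query.predicate = NSPredicate(format: "kMDItemTextContent CONTAINS[cd] %@", "cattle")
query.searchScopes = [NSMetadataQueryLocalComputerScope]

NotificationCenter.default.addObserver(forName: .NSMetadataQueryDidFinishGathering,
                                       object: query, queue: .main) { _ in
    query.disableUpdates()
    for case let item as NSMetadataItem in query.results {
        print(item.value(forAttribute: NSMetadataItemPathKey) ?? "")
    }
    query.stop()
    exit(0)
}
query.start()
RunLoop.main.run()   // keep the tool alive while the query gathers results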

What started as a quick doodle is now bogged down in the quirks of Spotlight, which defies the scientific method. Perhaps it’s time for a little sorcery.

Frederick Sandys (1829–1904), Medea (1866-68), oil on wood panel with gilded background, 61.2 x 45.6 cm, Birmingham Museum and Art Gallery, Birmingham, England. Wikimedia Commons.
