What happens to images when you disable Live Text?
For many of us, Live Text and Visual Look Up are real boons, making it simple to copy text from images, to perform Optical Character Recognition (OCR) on images of text, and to identify objects of interest. They are also used by the mediaanalysisd service to extract text and object identifications for indexing by Spotlight, making those image contents searchable. Although the latter can be disabled generally or for specific folders in Spotlight settings, there appears to be only one control over Live Text, and none for Visual Look Up. This article examines what that single control does, using log extracts obtained from a Mac mini M4 Pro running macOS Sequoia 15.6. It follows my recent article on how these features work.
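For anyone who wants to gather similar extracts, the unified log can also be read programmatically. The following Swift sketch uses OSLogStore to list recent entries from the com.apple.VisionKit subsystem; the one-hour window is arbitrary, and you can add other subsystems, such as that used by mediaanalysisd, once you know their names. Reading the local log store requires appropriate privileges.

```swift
import Foundation
import OSLog

// Sketch: list recent unified log entries from the VisionKit subsystem.
// OSLogStore.local() normally requires running with admin/root privileges.
do {
    let store = try OSLogStore.local()
    // Start one hour back; adjust the window to suit.
    let start = store.position(date: Date().addingTimeInterval(-3600))
    let predicate = NSPredicate(format: "subsystem == %@", "com.apple.VisionKit")

    for entry in try store.getEntries(at: start, matching: predicate) {
        if let log = entry as? OSLogEntryLog {
            print(log.date, log.subsystem, log.composedMessage)
        }
    }
} catch {
    print("Failed to read the log store: \(error)")
}
```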
Setting
Live Text, which is enabled by default, can be disabled in Language & Region settings. When that is turned off, opening an image containing text in Preview no longer makes any text selectable. However, Visual Look Up still works as normal.
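Third-party apps face a similar choice through VisionKit's public API, where ImageAnalyzer is given the analysis types it should perform. The sketch below is only an analogue of the situation Preview is left in when Live Text is off, requesting Visual Look Up analysis without text recognition; it isn't Preview's own code, which uses private interfaces.

```swift
import AppKit
import ImageIO
import VisionKit

// Sketch (macOS 13 or later): analyse an image for Visual Look Up only,
// omitting .text, roughly the position Preview is in with Live Text disabled.
func analyzeForLookUpOnly(_ image: NSImage) async throws -> ImageAnalysis {
    let analyzer = ImageAnalyzer()
    // With Live Text enabled, an app would ask for [.text, .visualLookUp] instead.
    let configuration = ImageAnalyzer.Configuration([.visualLookUp])
    return try await analyzer.analyze(image, orientation: .up, configuration: configuration)
}
```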
Textual images
When an image containing recognisable text but no other objects is opened in Preview, the VisionKit subsystem is still activated soon after the image is loaded. VisionKit initially reports that the “device” supports analysis, but immediately clarifies that to
Device does not support image analysis, but does support Visual Search, limiting to just Visual Search.
It then starts a VKImageAnalyzerProcessRequestEvent with a MAD parse request. That leads to a Visual Search Gating Task being run, and the Apple Neural Engine (ANE) and all CPU P cores are prepared for that.
Less than 0.1 second later, the end of the VKImageAnalyzerProcessRequestEvent is reported, and VisionKit returns an analysis indicating that no image segment merits further analysis. Preview’s ⓘ Info button remains in its normal state, and clicking it doesn’t alter the image displayed.
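That verdict has a rough public-API counterpart: an ImageAnalysis can be asked whether it holds anything for a given analysis type. Assuming an analysis produced as in the earlier sketch, a text-only image would be expected to return false here, matching what the log shows.

```swift
import VisionKit

// Sketch: ask an existing ImageAnalysis whether it found anything for
// Visual Look Up. For an image containing only text, expect false.
func offersVisualLookUp(_ analysis: ImageAnalysis) -> Bool {
    analysis.hasResults(for: .visualLookUp)
}
```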
Images with other objects
Analysis of an image containing potentially recognisable objects doesn’t stop there. If VisionKit returns an analysis indicating that Visual Search could extract objects from the image, the ⓘ Info button adds stars and waits for the user to open the Info window.
VisionKit reports in the log
Setting Active Interaction Types: [private], [private]
then that it
DidShowVisualSearchHints with invocationType: VisualSearchHintsActivated, id: 1
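In a third-party app, the closest public equivalent of those active interaction types and hints is VisionKit's ImageAnalysisOverlayView. The sketch below assumes an NSImageView that's already displaying the image, and an analysis obtained as above; restricting preferredInteractionTypes to .visualLookUp leaves only the Visual Look Up behaviour available.

```swift
import AppKit
import VisionKit

// Sketch (macOS 13 or later): attach a VisionKit overlay to an existing
// NSImageView and limit it to Visual Look Up interactions.
@MainActor
func showLookUpOverlay(on imageView: NSImageView, with analysis: ImageAnalysis) {
    let overlay = ImageAnalysisOverlayView()
    overlay.frame = imageView.bounds
    overlay.autoresizingMask = [.width, .height]
    overlay.trackingImageView = imageView   // keeps highlights aligned with the image content
    overlay.preferredInteractionTypes = [.visualLookUp]
    overlay.analysis = analysis
    imageView.addSubview(overlay)
}
```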
When one of those Visual Search Hints is clicked, the Lookup View is prepared, followed by a notice from LookupViewService that it’s Changing state from LVSDisplayStateConfigured to LVSDisplayStateSearching. That leads to VisionKit making a Visual Search request from mediaanalysisd.
After the ANE is run to progress that, a successful search results in PegasusKit making its internet connection to identify the object, exactly as it does when Live Text is enabled and text has been recovered from the image:
Querying https: // api-glb-aeuw1b.smoot.apple.com/apple.parsec.visualsearch.v2.VisualSearch/VisualSearch with request (requestId: [UUID]) : (headers: ["Content-Type": "application/grpc+proto", "grpc-encoding": "gzip", "grpc-accept-encoding": "gzip", "grpc-message-type": "apple.parsec.visualsearch.v2.VisualSearchRequest", "X-Apple-RequestId": "[UUID]", "User-Agent": "PegasusKit/1 (Mac16,11; macOS 15.6 24G84) visualintelligence/1", "grpc-timeout": "10S"]) [private]
The ANE is finally cleaned up and shut down as the search results are displayed in the Lookup View.
Conclusions
- When Live Text is disabled in Language & Region settings, images are still analysed when they are opened, to determine if they’re likely to contain objects that can be recognised in Visual Search for Visual Look Up.
- If there are no such objects detected, VisionKit proceeds no further.
- If there are suitable objects, mediaanalysisd and VisionKit proceed to identify them using Visual Search Hints, as normal.
- If the user clicks on a Visual Search Hint, PegasusKit connects to Apple’s servers to identify that object and provide information for display in the Lookup View.
- Although there is less extensive use of the ANE and CPU cores than when Live Text is enabled, neural networks are still run locally to perform a more limited image analysis.
I’m very grateful to Benjamin for pointing out this control over Live Text.