Reading view

There are new articles available, click to refresh the page.

Chinese whispers in PDF metadata

Chinese whispers is an old children’s game where everyone sits in a circle, and one child whispers into the ear of the next on their right a sentence like Send reinforcements, we’re going to advance. That child then whispers the message they heard to the child on their right, until it reaches the one who started it, who says out loud what they heard, classically Send three-and-fourpence, we’re going to a dance, as a demonstration of how messages can so easily become corrupted. What this has to do with China remains one of childhood’s mysteries. I should also explain that three-and-fourpence was idiomatic British English in the days before our currency was ‘decimalised’, and meant three shillings and four (old) pence, about 17 (new) pence, sufficient at one time to enjoy a good night out.

In this article I’m going to do much the same with metadata for a PDF document, tracing what gets indexed by Spotlight, so becoming discoverable by search, and what is displayed in the Finder. This relies on several of my utilities, most of which are available from this page.

Source PDF

I prepared a completely unrelated PDF using my favourite PDF editor, PDF Expert, by adding metadata to be saved in the file’s data. As you might expect, there are several ways that could be stored in the PDF format, including XMP metadata, but in this case for simplicity they were added in the document information dictionary.

I inspected that in a source view in Podofyllin, which found the following fields:
/Author (Author name in pdf)
/Creator (Pages)
/Keywords (keyword1 pdf)
/Subject (Subject in pdf)
/Title (0PDFtest1accessdefault)

When rendered in macOS, those are ‘flattened’ by its Quartz PDF engine, to
/Author (Author name in pdf)
/Creator (Pages)
/Keywords (keyword1 pdf)
/AAPL:Keywords [(keyword1 pdf)]
/Subject (Subject in pdf)
/Title (0PDFtest1accessdefault)

Note the copying of keywords into a new attribute AAPL:Keywords.

Extended attributes

I then added seven extended attributes using Metamer, with names such as com.apple.metadata:kMDItemAuthors, as shown below in xattred.

Spotlight import

I then inspected the file in SpotTest’s new Drop Window, which reported the following attributes found by mdimport:
":EA:kMDItemAuthors" = "author in xattr";
":EA:kMDItemComment" = "xattr comment";
":EA:kMDItemCreator" = "xattr creator";
":EA:kMDItemDescription" = "xattr description";
":EA:kMDItemKeywords" = "keyword1,xattr";
":EA:kMDItemSubject" = "xattr subject";
":EA:kMDItemTitle" = "xattr title";

all from the extended attributes, while those derived from the PDF data were
kMDItemAuthors = (Pages);
kMDItemCreator = Pages;
kMDItemDescription = "Subject in pdf";
kMDItemKeywords = ("keyword1 pdf");
kMDItemTitle = 0PDFtest1accessdefault;

Those attributes have already changed, with PDF Subject becoming kMDItemDescription, Creator being duplicated into kMDItemAuthors, and the loss of PDF Author.

Spotlight indexes

Attributes reported by mdls changed again to
kMDItemAuthors = (Pages)
kMDItemComment = "xattr comment"
kMDItemCreator = "Pages"
kMDItemDescription = "Subject in pdf"
kMDItemKeywords = ("keyword1,xattr")
kMDItemSubject = "xattr subject"
kMDItemTitle = "0PDFtest1accessdefault"

This has lost the xattr attributes kMDItemAuthors, kMDItemCreator, kMDItemDescription and kMDItemTitle, and the PDF kMDItemKeywords. That list of 7 attributes should then be searchable using Spotlight.

The Finder

The final step was to discover which of those could be displayed in the Finder, either in its Get Info dialog, or in the Preview panel of a Finder window.

Only 5 of those attributes survived in the Finder, and were given as
Authors: Pages
Content Creator: Pages
Description: Subject in pdf
Keywords: keyword1,xattr
Title: 0PDFtest1accessdefault

Of those, 4 are taken from the metadata in the PDF file, and only the Keywords were taken from its extended attribute. The attribute named as Authors contains a duplicate of what had originally been in the PDF Creator field, but neither of the PDF Author or xattr kMDItemAuthors fields. Those paths are traced in the diagram below.

Conclusions

Of the total of 12 distinct metadata attributes added in the PDF data and extended attributes, only 6 different items were indexed by Spotlight, and 4 were displayed in the Finder (allowing for the duplication of Authors and Content Creator).

Before relying on metadata for search and access in the Finder, it’s essential to verify that the attributes you intend using are successfully indexed and displayed. Choose the wrong attributes and you’ll never find anything.

How to preserve versions, and how to create versioned PDFs

Keeping previous versions of a document you’re working on can be a great timesaver. Although I seldom restore files from backups, it’s far more frequent that I look back in a file’s versions to recover changed or removed contents. In most cases, those had changed so rapidly that even hourly backups didn’t capture them. Had those versions not been saved automatically by macOS, I would have wasted time trying to recreate them from scratch.

Although a great many apps now come with built-in version support, macOS doesn’t preserve those versions as well as it could. Duplicate a document with versions, save it as another document, or move it to a different volume, and all its versions are blown away. As I explained yesterday, versions don’t transfer in iCloud Drive either. This article explains how you can work around all those, how to ensure versions get backed up, and how you can create PDF documents with versions.

How versions work

Opening the version browser from the File menu.

Many apps now have built-in support to automatically save versions of documents you edit with them. The tell-tale sign is when the app has a Revert To command in its File menu, which opens the current document in a version browser resembling the Time Machine app.

revisions1

Each time you save a document in those apps, the existing document is saved to a hidden and locked-down database at the root level of that volume, in the .DocumentRevisions-V100 folder. When you use the Revert To command to browse all versions, you can look back at all the versions of that document saved to that volume. You can then revert to an older version, or copy contents from an older version to the current one. As long as that document remains in its current volume, those versions will remain accessible. However, if that document is moved to a different volume, even on the same disk, those versions don’t move with it, but will be retained in the original volume until it’s deleted from there.

If you’ve been editing a document in your Documents folder and move it into iCloud Drive, which is actually a folder inside your Home Library folder, its versions will be preserved when you access it from the same Mac. However, other Macs connected to the same iCloud account won’t see those versions, as they remain on the original Mac and don’t get synced to iCloud Drive.

How to extend versions

Apps can’t access stored versions directly, but have to do that via macOS. The most compatible way for them to do that is to fetch previous versions from the volume version database. They can then save each version as a separate file, and reconstitute a document complete with its versions by adding those files back to a document’s versions in the volume’s database. That’s exactly what my free utilities Versatility and Revisionist do. Versatility is the simpler of the two to use here, while Revisionist adds more features including checking documents to see how many versions they already have stored in their volume database.

Archive versions

Simply drag and drop a file with saved versions onto Versatility’s window, and it will extract all its versions and save them as a series of numbered files in a folder. You can then copy or move that folder to any other location, where you can reconstitute the original document with all its versions intact.

Unarchive versions

Simply drag and drop a folder containing archived version files onto Versatility’s window, and it will reconstitute the original document with all its previous versions.

Versions in iCloud Drive

To preserve all versions in iCloud Drive and make them available to all that connect to that folder, move the document from its existing location in your Home folder, to the correct folder in iCloud Drive. Once it’s there:

  1. Perform any editing necessary.
  2. Archive. Drop the document onto Versatility’s window, and save that archive folder to the same location.
  3. Move the original document elsewhere.

When you want to edit that document, particularly on another Mac, on that Mac:

  1. Unarchive. Drop the archive folder onto Versatility’s window, and save the document to the same location.
  2. Edit the document, saving whenever you want to create a new version.
  3. When you’ve finished, save for the last time, and close the document.
  4. Archive. Drop the document onto Versatility’s window, and save that archive folder to the same location.
  5. Tidy up old archive folders and files.

That leaves the most recent archive folder, with composite versions written in the correct order, with the right timestamps, ready to be unarchived on the next Mac to edit that document. This may appear complicated, but once you have tried it out, it’s really a simple sequence to unarchive-edit-save-archive and hand over to the next editor.

Backing up versions

I’m not aware of any backup utility for macOS, including Time Machine, that backs up and preserves versions, although they are stored in local volume snapshots. However, all you have to do is archive the versions of important files into folders using Versatility, and back up those folders. When you want to restore the original document with all its versions, simply unarchive that folder using Versatility.

PDF versions

One of the lesser-known features of the PDF format are incremental updates, which can provide a primitive form of versioning within a single file. In practice it catches users out when they publish a PDF containing old edited content they thought had been removed.

Few PDF editors and viewers support the macOS version system, but Preview does, and Versatility can be used to assemble a PDF document with versions.

For my example, rather than edit a PDF, I generated a series from an archived RTF, converting each file into a consecutively numbered PDF, starting from 000.pdf and rising to 010.pdf. I then dropped that folder onto Versatility’s window and it unarchived those into a single PDF document with all those versions.

Key points

  • Archive a file with saved versions to a folder by dropping it onto Versatility’s window.
  • Unarchive a folder containing version files to a file with saved versions by dropping it onto Versatility’s window.
  • In iCloud Drive, unarchive-edit-save-archive to preserve versions for the next editor.
  • Wherever you want to preserve versions, archive the file using Versatility.

❌