A brief history of Internet search

Searching the Internet, and more recently its web servers, has proceeded in four main phases. Initially, humans built structured directories of sites they considered worth visiting. When those couldn’t keep pace with the Internet’s growth, commercial search engines were developed, and their search results were ranked. Around 2000, Google’s PageRank algorithm became dominant, ranking pages by their popularity. Since late 2024, that has been progressively replaced by AI-generated summaries. Each of these phases has been reflected in the tools provided by Mac OS.

Directories

In the earliest years of the Internet, when the first web servers started to appear and files were downloaded using anonymous FTP, users compiled their own lists by hand. Some curated directories were made public, including one maintained by Tim Berners-Lee at CERN, and another at NCSA. Individuals started using Gopher clients to discover the contents of servers running the service of the same name. The next step was the development of tools to catalogue Gopher and other servers, such as Veronica and Jughead, but it wasn’t until 1993 that the first search engine, W3Catalog, and a bot, the World Wide Web Wanderer, started to transform Internet search.

Berners-Lee’s directory grew into the World Wide Web Virtual Library, which still exists, although it was last updated several years ago, most of it is now hosted elsewhere, and some of it is broken. The most famous directory was launched in 1994 as Jerry and David’s Guide to the World Wide Web, and later became Yahoo! Directory. This offered paid submission and entry subscriptions, and was closed down at the end of 2014.

The favourite of many (including me) was launched as GnuHoo in 1998 and, later that year, after it had been acquired by Netscape, became the Open Directory Project, then DMOZ, seen here in the Camino browser in 2004. Although owned by AOL, it was maintained by a volunteer community that grew rapidly, holding around 100,000 links maintained by about 4,500 volunteers, and exceeding a million links by the new millennium. DMOZ closed in 2017 when AOL lost interest, but continued as Curlie using the same hierarchy.

Sherlock was first released in Mac OS 8.5 in 1998. As access to the web grew, this came to encompass remote search through plug-ins that worked with new web search engines.

Those plug-ins were expanded in Sherlock 2, part of Mac OS 9.0 in 1999 and shown above, and in version 3, which came with Mac OS X 10.2 Jaguar in 2002.

Indexing and ranking

Human editors couldn’t keep pace with the growth of the web, and demand grew for searchable indexes. That posed the problem of how to rank pages, and led to the development of a series of ranking algorithms, some of which were patented. The first to use links (‘hyperlinks’) was Robin Li’s RankDex, patented in 1996, two years before Sergey Brin and Larry Page’s PageRank, which brought them their success with Google.

Ranking search results wasn’t new. In the late twentieth century, scientists started measuring the ‘impact’ of published papers by counting their citations in other papers, and university departments and scientific journals laid claim to their greatness by quoting citation and impact indexes. Early search ranking used features such as how often the words in the search term occurred on a page, which proved too crude and was manipulated by those trying to promote pages for gain. The obvious replacement was counting incoming links from other sites, which was also quickly abused.
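Both weaknesses are easy to demonstrate. The following is a minimal Python sketch, not any real engine’s method: the pages, the small link farm and the two scoring functions (`term_frequency_rank`, `inlink_rank`) are hypothetical, chosen only to show how keyword stuffing and manufactured links can push a worthless page to the top.

```python
from collections import Counter

# Hypothetical toy corpus (page name -> text); not real data.
PAGES = {
    "honest": "manet painted the battle of the kearsarge and the alabama",
    "stuffed": "manet manet manet manet buy cheap prints manet manet",
    "review": "a review of manet and his paintings of the sea",
}

# Hypothetical link graph (page -> pages it links to), including a
# small "link farm" of throwaway pages that all point at "stuffed".
LINKS = {
    "honest": ["review"],
    "review": ["honest"],
    **{f"farm{i}": ["stuffed"] for i in range(5)},
}


def term_frequency_rank(query: str, pages: dict[str, str]) -> list[str]:
    """Naive ranking: count how often the query words occur on each page."""
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        counts = Counter(text.lower().split())
        scores[name] = sum(counts[t] for t in terms)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


def inlink_rank(pages: dict[str, str], links: dict[str, list[str]]) -> list[str]:
    """Naive ranking: count raw incoming links, whoever makes them."""
    inlinks = Counter(t for targets in links.values() for t in targets)
    return sorted(pages, key=lambda name: (-inlinks[name], name))
```

With this data, `term_frequency_rank("manet", PAGES)` and `inlink_rank(PAGES, LINKS)` both return `["stuffed", "honest", "review"]`: repeating a word six times, or getting five junk pages to link to you, is enough to win.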

Research into networks was limited before 1998, when Jon Kleinberg and the two founders of Google entered the field. As with citation indexes before, they envisaged link-based ranking as a measure of popularity, and popularity as a good way of determining the order in which search results should be presented. They also recognised some of the dangers, and the need to weight each incoming link to a page according to the total number of outgoing links made by the linking site. Oddly, Kleinberg’s prior work wasn’t incorporated into a search engine until 2001, by which time Brin and Page were powering Google to dominance; in June 2000 Google had become the default search engine for Yahoo!
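That weighting is the heart of PageRank as Brin and Page originally published it. Below is a simplified Python sketch using power iteration and the damping factor of 0.85 suggested in their 1998 paper. The graph is hypothetical, and Google’s production ranking has long been far more complex than this textbook form.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    Each page shares its current rank equally among the pages it links to,
    so a link from a page with many outgoing links is worth less.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page first receives the 'random jump' share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Divide this page's rank by its number of outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For the graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, page c collects the most rank, while d, which nothing links to, keeps only the baseline share of (1 − 0.85) / 4. The ranks always sum to 1, so the result can be read as the probability of a ‘random surfer’ being on each page.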

This is Yahoo! Search seen in Firefox in 2007, by which time it was using its own indexing and search engine.

PageRank and algorithms

Google grew prodigiously and became rich from its sales of advertising across the web, a business dependent on promoting its clients, something that could be achieved by adjusting its PageRank algorithm.

Although it’s hard to find now, Google’s Advanced Search was once widely used, as it gave more extensive control. Here it’s seen in Safari in 2011.
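Advanced Search’s form fields broadly correspond to operators that can also be typed straight into the ordinary search box: quotes around an exact phrase, OR between alternatives, a leading minus to exclude a word, and `site:` and `filetype:` to restrict results. The Python sketch below assembles such a query. The functions and parameter names are hypothetical, and there’s no guarantee that Google still honours every operator in the same way.

```python
from urllib.parse import urlencode


def advanced_query(all_words: str = "", exact_phrase: str = "",
                   any_words: str = "", none_words: str = "",
                   site: str = "", filetype: str = "") -> str:
    """Combine Advanced Search-style fields into one query string."""
    parts = []
    if all_words:
        parts.append(all_words)                    # every word must appear
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')          # phrase in quotes
    if any_words:
        parts.append(" OR ".join(any_words.split()))
    if none_words:
        parts.extend(f"-{w}" for w in none_words.split())  # exclusions
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query as the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `advanced_query(all_words="manet", exact_phrase="civil war", none_words="print poster", site="wikipedia.org")` gives `manet "civil war" -print -poster site:wikipedia.org`.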

Google Scholar gives access to published research in a wide range of fields, and was introduced in late 2004. Here it’s seen in use in 2011, listing work that’s recently become topical again. Scholar doesn’t use the same PageRank-based algorithm for ranking its results, but does give substantial weight to citation counts.

When Apple replaced Sherlock with Spotlight in Mac OS X 10.4 Tiger in April 2005, web search defaulted to the newly arrived Safari and Google’s search engine. Spotlight’s major redesign, in OS X 10.10 Yosemite in 2014, merged web and local search into Global Spotlight, the search window that opens from the Spotlight icon at the right end of the menu bar. That in turn brought Spotlight Suggestions, which became Siri Suggestions in macOS Sierra.

This shows a search in Global Spotlight in macOS 10.12 Sierra, in 2017.

Apple has never explained how Siri Suggestions works, although it appears to use machine learning and includes partial results from web search, probably from Google. It offers a taste of what is to come in the future of Internet search.

Summarising

Google started the transition to using Artificial Intelligence in 2024, and that September introduced Audio Overview to provide spoken summaries of documents. This year has brought full AI overviews, in which multiple pages are summarised succinctly, and presented alongside links to the pages used to produce them. Although some can be useful, many are vague and waffly, and some blatantly spurious.

We’ve come a long way from Tim Berners-Lee’s curated directories, and PageRank in particular has transformed the web and more besides.

Last Week on My Mac: The Swiss Army knife of search

The Swiss Army knife has fallen victim to unintended consequences. Once the dream of every schoolboy, and pocketed by anyone who went out into the countryside, it now stays at home: my small collection of Swiss Army knives and multi-tools remains indoors and unused. This is the result of strict laws on carrying knives in the UK. Although carrying one isn’t in itself illegal, since 1988 doing so in a public place has put you at risk of being stopped and searched, and one friend was subjected to that for carrying a mere paint-scraper.

Swiss Army knives carry another, more sinister danger: they’re used in preference to dedicated tools. Over the last week or two, as I’ve been digging deeper into Spotlight, I can’t help thinking that it has turned into the Swiss Army knife of search tools, compromising its powers for the sake of versatility.

At present, I know of four different Spotlights:

  • Global Spotlight, incorporating local, web, and some in-app search, accessed through the Spotlight tool in the menu bar;
  • Local Spotlight, restricted to searching files in local and some network storage, typically through a Find window in the Finder;
  • Core Spotlight, providing search features within an app, commonly in the contents of an app’s database;
  • Third-Party Local Spotlight, a more limited local search available to third-party apps.

Of those, it’s Global Spotlight that I find most concerning, as it’s the frontline search tool for many if not most who use Macs, and the most flawed of the four. It’s not even the fault of Spotlight, whose 20th birthday we should have celebrated just over a month ago. No, this flaw goes right back to Sherlock, first released in Mac OS 8.5 in 1998.

At that time, few Macs had more than 5 GB of hard disk storage, and local search typically dealt with tens of thousands of files. That was also the first year that Google published its index, estimating that there were about 25 million web pages in all. Apple didn’t have its own web browser to offer, but made Microsoft’s Internet Explorer the default until Safari was released five years later. Merging local and web search into a single app seemed a good idea, and that’s the dangerous precedent set by Sherlock 27 years ago.

The result today only conflates and confuses.

In the days of Sherlock, web search was more a journey of discovery, where most search engines ranked pages naïvely according to the number of times the search term appeared on that page. That only changed with the arrival of Google’s patented PageRank algorithm at the end of the twentieth century, and placement of ads didn’t start in earnest until the start of the new millennium, by which time Safari was established as the standard browser in Mac OS X.

Local search was and remains a completely different discipline, with no concept of ranking. As local storage increased relentlessly in capacity, file metadata and contents became increasingly important to its success. Internally, local searches are specified in a logical language of predicates that remarkably few users access directly, and most of us have come to expect Spotlight’s indexing to handle metadata for us.
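The difference in kind is easiest to see in a toy model. This Python sketch treats local search as a Boolean predicate over hypothetical file records: every query word must appear in a file’s name or text, and the result is an unordered set, so there’s nothing to rank. The records and helper functions are illustrative assumptions, not how Spotlight is implemented.

```python
from typing import Callable

Record = dict[str, str]
Predicate = Callable[[Record], bool]

# Hypothetical file records standing in for entries in a local index.
FILES: list[Record] = [
    {"name": "kearsarge_alabama.jpg", "text": ""},
    {"name": "dialect_notes.pdf", "text": "Vowel shifts in Alabama speech"},
    {"name": "civil_war_navy.pdf", "text": "The USS Kearsarge sank CSS Alabama in 1864"},
]


def mentions(word: str) -> Predicate:
    """True if the word occurs (case-insensitively) in the name or text."""
    w = word.lower()
    return lambda rec: w in rec["name"].lower() or w in rec["text"].lower()


def all_terms(query: str) -> Predicate:
    """Boolean AND: every word in the query must be mentioned."""
    preds = [mentions(w) for w in query.split()]
    return lambda rec: all(p(rec) for p in preds)


def local_search(files: list[Record], predicate: Predicate) -> set[str]:
    """Each file either matches or it doesn't; there is no score."""
    return {rec["name"] for rec in files if predicate(rec)}
```

Here `local_search(FILES, all_terms("kearsarge alabama"))` matches the image and the naval history file, while adding a single word that appears nowhere, as in `all_terms("manet kearsarge")`, returns an empty set rather than a list of near misses.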

The end result challenges the user to negotiate web search engines and dodge their ads using one language, confounded by the behaviour of Siri Suggestions, while hazarding a wild guess as to what might turn up in the metadata and content of files. More often than not, we end up with a potpourri that fails on all counts.

As an example, I entered the terms manet painting civil war into Global Spotlight’s search box and was rewarded with a link to Manet’s painting of The Battle of the Kearsarge and the Alabama from 1864, as I’d hoped. But entered into the search box of a Find window, those terms found anything but, from Plutarch’s Lives to a medical review on Type 2 diabetes. In MarsEdit’s Core Spotlight, though, they found every article I have written for this blog that featured the painting.

Édouard Manet (1832–1883), The Battle of the Kearsarge and the Alabama (1864), oil on canvas, 134 × 127 cm, Philadelphia Museum of Art, Philadelphia, PA. Wikimedia Commons.

To get anything useful from local Spotlight, I had to know that one of the ships was the USS Kearsarge; that unusual word immediately found an image of the painting, but no useful content referring to it. Had I opted to search for the word Alabama instead, I would have been offered 94 hits, ranging from linguistics to the Mueller report into Russian interference in the 2016 US Presidential election. Adding the requirement that the file was an image narrowed the results down to the single image.
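The same predicate language can be used directly through the `mdfind` command in Terminal. This Python sketch builds a query along the lines of searching for Alabama, then optionally restricts it to images through the `kMDItemContentTypeTree` attribute. Which attributes a Find window actually searches is an assumption here, the helper names are hypothetical, and running the query requires macOS.

```python
import shutil
import subprocess
from typing import Optional


def spotlight_query(word: str, content_type: Optional[str] = None) -> str:
    """Build a Spotlight query expression for a single word.

    Modifiers after a quoted value: c = case-insensitive,
    d = diacritic-insensitive, w = match on word boundaries.
    """
    if '"' in word:
        raise ValueError("this sketch doesn't escape quotes")
    # Assumption: match either the file's name or its indexed text content.
    query = f'(kMDItemDisplayName == "*{word}*"cd || kMDItemTextContent == "{word}"cdw)'
    if content_type:
        # kMDItemContentTypeTree includes parent types, so "public.image"
        # also matches files whose specific type is, say, public.jpeg.
        query += f' && kMDItemContentTypeTree == "{content_type}"'
    return query


def run_mdfind(query: str) -> list[str]:
    """Run the query with mdfind and return matching paths (macOS only)."""
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind not found; this needs macOS")
    result = subprocess.run(["mdfind", query], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()
```

On a Mac, `run_mdfind(spotlight_query("Alabama", "public.image"))` would combine the word search with the image requirement in a single predicate.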

Conversely, entering Kearsarge into Global Spotlight offered a neighbourhood in North Conway, New Hampshire, in Maps, information about three different US warships from Siri Knowledge, Wikipedia’s comprehensive disambiguation page, a list of five US warships of that name, and three copies of the image of Manet’s painting without any further information about them.

Spotlight is also set to change with the inevitable addition of AI. Suggestions are already tailored using machine learning, but as far as I’m aware local Spotlight doesn’t yet use any form of AI-enhanced search. Words entered into search boxes and bars aren’t subject to autocorrection, and although Global Spotlight may suggest alternative searches using similar words, if you enter acotyle Spotlight doesn’t recognise it as a likely mistake for acolyte. It remains to be seen whether and when local Spotlight switches from Boolean binaries to fuzziness and probability, but at least that will be more akin to the ranking of web pages, and we’ll no longer need to be bilingual.
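Fuzzy matching that could flag acotyle as a slip for acolyte is commonly built on edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. This is a minimal Python sketch, assuming a small hypothetical vocabulary and a threshold of two edits; it implies nothing about how Apple might add fuzziness to Spotlight.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str],
            max_distance: int = 2) -> Optional[str]:
    """Return the closest vocabulary word within max_distance, else None."""
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v),
               default=None)
    if best is None or edit_distance(word, best) > max_distance:
        return None
    return best
```

Swapping the t and l in acolyte takes two substitutions, so `suggest("acotyle", ["acolyte", "kearsarge", "alabama"])` returns `"acolyte"`. A practical system would typically also weigh how common each candidate word is, which is where ranking returns to the picture.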

For the time being, we’re left with a Swiss Army knife, ideal for finding where Apple has hidden Keychain Access, but disappointing when you don’t know exactly what you’re looking for.
