A brief history of Internet search
Searching the Internet, and more recently the web, has proceeded in four main phases. Initially, humans built structured directories of sites they considered worth visiting. When those couldn’t keep pace with the Internet’s growth, commercial search engines were developed, and their search results were ranked. Around 2000, Google’s PageRank algorithm became the dominant means of ranking pages by their popularity. Since late 2024, that has progressively been replaced by AI-generated summaries. Each of these phases has been reflected in the tools provided by Mac OS.
Directories
In the earliest years of the Internet, when the first web servers started to appear and files were downloaded using anonymous FTP, users compiled their own lists by hand. Some curated directories were made public, including one maintained by Tim Berners-Lee at CERN, and another at NCSA. Individuals started using Gopher clients to discover the contents of servers running the Gopher protocol. The next step was the development of tools to catalogue Gopher and other servers, such as Veronica and Jughead, but it wasn’t until 1993 that the first search engine, W3Catalog, and a bot, the World Wide Web Wanderer, started to transform Internet search.
Berners-Lee’s directory grew into the World Wide Web Virtual Library, which still exists, although it was last updated several years ago, most of it is now hosted elsewhere, and some of it is broken. The most famous directory was launched in 1994 as Jerry and David’s Guide to the World Wide Web, and later became Yahoo! Directory. This offered paid submission and entry subscriptions, and was closed down at the end of 2014.
The favourite of many (including me) was launched as GnuHoo in 1998 and, after it had been acquired by Netscape later that year, became the Open Directory Project, then DMOZ, seen here in the Camino browser in 2004. Although owned by AOL, it was maintained by a volunteer community that grew rapidly: it soon held around 100,000 links maintained by about 4,500 volunteers, and exceeded a million links by the new millennium. DMOZ closed in 2017 when AOL lost interest, but continued as Curlie using the same hierarchy.
Sherlock was first released in Mac OS 8.5 in 1998. As access to the web grew, this came to encompass remote search through plug-ins that worked with new web search engines.
Those were expanded in Sherlock 2, part of Mac OS 9.0 from 1999 and shown above, and version 3 that came in Mac OS X 10.2 Jaguar in 2002.
Indexing and ranking
Human editors couldn’t keep pace with the growth of the web, and demand grew for searching of indexes. This posed the problem of how to rank pages, and led to the development of a series of ranking algorithms, some of which were patented. The first to use links (‘hyperlinks’) was Robin Li’s RankDex, developed in 1996 and later patented, two years before Sergey Brin and Larry Page’s PageRank that brought their success in Google.
Ranking search results wasn’t new. In the late twentieth century, scientists started measuring the ‘impact’ of published papers by counting their citations in other papers, and university departments and scientific journals laid claim to their greatness by quoting citation and impact indexes. Early search ranking used features such as how often the words in the search term occurred in a page, which proved too crude and was manipulated by those trying to promote pages for gain. The obvious replacement was counting incoming links from other sites, which was also quickly abused and misused.
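To show why both approaches were so easy to game, here is a minimal Python sketch, not a reconstruction of any real engine. The function names and the sample pages are hypothetical. It scores a page by raw word frequency, then separately by counting the pages that link to it.

```python
from collections import Counter


def term_frequency_score(page_text: str, query: str) -> int:
    """Crude early-style score: how often the query's words occur in the page."""
    words = Counter(page_text.lower().split())
    return sum(words[term] for term in query.lower().split())


def rank_by_term_frequency(pages: dict[str, str], query: str) -> list[str]:
    """Order page names by descending term-frequency score."""
    return sorted(pages, key=lambda name: term_frequency_score(pages[name], query),
                  reverse=True)


def rank_by_incoming_links(links: dict[str, list[str]]) -> list[str]:
    """Order pages by how many distinct pages link to them (raw in-degree)."""
    counts = Counter()
    for source, targets in links.items():
        counts[source] += 0  # make sure pages with no incoming links still appear
        for target in set(targets):  # count each linking page once per target
            counts[target] += 1
    return [page for page, _ in counts.most_common()]
```

A keyword-stuffed page beats an honest one on word frequency, and a handful of throwaway "link farm" pages beats a single genuine link:

```python
pages = {"honest": "a short guide to mac search tools",
         "stuffed": "mac mac mac mac search search buy now"}
rank_by_term_frequency(pages, "mac search")   # ['stuffed', 'honest']

links = {"guide": ["apple"], "farm1": ["spam"], "farm2": ["spam"], "farm3": ["spam"]}
rank_by_incoming_links(links)[0]              # 'spam'
```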
Research into link analysis of networks was limited before 1998, when Jon Kleinberg and the two founders of Google entered the field. As with citation indexes before, they treated link-based ranking as a measure of popularity, and popularity as a good way of determining the order in which search results should be presented. They also recognised some of the dangers, and the need to weight each incoming link to a page according to the total number of outgoing links made by the page that contains it. Oddly, Kleinberg’s HITS algorithm wasn’t incorporated into a search engine until 2001. By then Brin and Page were powering Google to dominance, and in June 2000 Google had become the default search engine for Yahoo!
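The weighting idea can be made concrete with a short power-iteration sketch of PageRank in Python. This is a simplified textbook form, not Google’s production algorithm. Each page shares its current rank equally among the pages it links to, so a link from a page with many outgoing links counts for less. The damping factor of 0.85 is the value commonly quoted from Brin and Page’s 1998 paper. Spreading the rank of pages with no outgoing links evenly over all pages is one common convention, and is an assumption here. The function name and parameters are illustrative.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Simplified PageRank by power iteration (illustrative sketch only)."""
    # Collect every page, including those that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(max_iter):
        # Assumed convention: rank held by pages with no outgoing links
        # is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # Each outgoing link carries an equal share: rank divided by out-degree.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        # Stop once the ranks have effectively stopped changing.
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank
```

For a tiny web in which A links to B and C, B links to C, and C links only to A, the ranks come out at roughly C ≈ 0.397, A ≈ 0.388 and B ≈ 0.215. C collects links from both other pages. A comes a close second because it receives C’s whole share, while B gets only half of A’s.

```python
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```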
This is Yahoo! Search seen in Firefox in 2007, by which time it was using its own indexing and search engine.
PageRank and algorithms
Google grew prodigiously, and became rich from selling advertising across the web, a business that depends on promoting its clients, something that could be achieved by adjusting its PageRank algorithm.
Although it’s hard to find now, Google’s Advanced Search was once widely used, as it gave more extensive control. Here it’s seen in Safari of 2011.
Google Scholar gives access to published research in a wide range of fields, and was introduced in late 2004. Here it’s seen in use in 2011, listing work that’s recently become topical again. Scholar doesn’t use the same PageRank-based algorithm for ranking its results, but does give substantial weight to citation counts.
When Apple replaced Sherlock with Spotlight in Mac OS X 10.4 Tiger in April 2005, web search defaulted to the relatively new Safari browser and Google’s search engine. Spotlight’s major redesign, in OS X 10.10 Yosemite in 2014, merged web and local search into Global Spotlight, the search window that opens from the Spotlight icon at the right end of the menu bar. That in turn brought Spotlight Suggestions, which became Siri Suggestions in macOS Sierra.
This shows a search in Global Spotlight in macOS 10.12 Sierra, in 2017.
Apple has never explained how Siri Suggestions works, although it appears to use machine learning and includes partial results from web search, probably from Google. It offered a taste of what was to come in Internet search.
Summarising
Google started the transition to artificial intelligence in 2024, and that September introduced Audio Overview in NotebookLM to provide spoken summaries of documents. This year has brought full AI overviews, in which multiple pages are summarised succinctly and presented alongside links to the pages used to produce them. Although some can be useful, many are vague and waffly, and some are blatantly spurious.
We’ve come a long way from Tim Berners-Lee’s curated directories, and PageRank in particular has transformed the web and more besides.
References
Wikipedia:
Gopher
Web directory
Search engine
Google Scholar
Amy N Langville and Carl D Meyer (2006) Google’s PageRank and Beyond: the Science of Search Engine Rankings, Princeton UP. ISBN 978-0-691-12202-1.