George Orwell didn’t mention enterprise search in his visionary novel 1984, but one line from it still resonates today: “Sanity is not statistical.” When it comes to relevant search results, the same principle applies: relevance is not a matter of statistics; it comes from the human element.
Until recently, enterprise search solutions mostly ignored this idea. Instead, search relied on text-matching algorithms and on models that methodically sifted through link structures or categorization schemes. After analysis and compilation, the engine returned a flood of results, all quantifiably correct. A statistician might have been impressed by the outcome, but the user who initiated the search simply drowned in a sea of homogeneous data.
Too Many Choices
The question became, “Is it possible to produce a set of meaningful search results that will help people, rather than inundate them?”
In response, enterprise search vendors introduced tools that let “experts” tune the search algorithm for their specific content, along with meta-tagging best practices to structure content and produce more meaningful results. More recently, explicit user actions such as click-throughs, ratings and feedback were incorporated to improve search relevance. Today, more advanced techniques, including “social search,” have evolved to account for how people search for, find and consume information and products in the physical world.
Comparing Various Approaches to Solve Search
Enterprise search has long been the de facto standard for corporate knowledge databases and Web sites. Because it is typically based on text-matching algorithms, it does a thorough job of finding every document that matches a particular query term. Some highly evolved search engines add specialized classification systems or pattern-recognition techniques that rely on statistical inference. As a result, enterprise search engines have made huge gains in performance, comprehensiveness and automation. However, they still lack the single most important ingredient for relevant search results: subjectivity. Guided by a few ambiguous words typed into a text field, a search engine still has no reliable way to interpret an individual user’s actual intent.
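A minimal sketch of pure text matching makes the limitation concrete. It assumes a hypothetical in-memory corpus and a naive tokenizer, builds an inverted index and returns every document that contains all of the query terms. Each match is correct in the narrow sense, yet nothing in the logic models which meaning of an ambiguous word the user had in mind.

```python
# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "Jaguar unveils a new sports car model",
    "d2": "The jaguar is a big cat native to the Americas",
    "d3": "Jaguar dealership service hours",
    "d4": "Rainforest cats: ocelot, margay and jaguar habitats",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text, replace punctuation with spaces and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return set(cleaned.split())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the ids of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def text_match(index: dict[str, set[str]], query: str) -> list[str]:
    """Return every document containing all query terms (boolean AND), sorted by id.

    Every result is 'quantifiably correct'; nothing here models what the user meant.
    """
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


index = build_index(DOCS)
print(text_match(index, "jaguar"))  # ['d1', 'd2', 'd3', 'd4']
```

The car pages and the wildlife pages come back together for “jaguar”; the engine can only narrow the set if the user types more words.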
To cope with subjectivity and the limited understanding of user intent, vendors developed several approaches, including expert-based tuning and user-explicit tuning, to provide more useful results to the end user.
Expert tuning can take several forms: tuning the ranking algorithm, tuning the content itself and tuning the results for specific queries. The primary problems with these expert-driven approaches are that they can be severely labor-intensive and biased, and that they lack insight into the user’s true intent. Over time they proved too rigid and laborious, so a more scalable model was envisioned: one that would take user feedback into account.
This second approach, known as user-explicit tuning, gained attention because of its focus on the “human element”: incorporating behaviors of Web site visitors.
User-centric search applications observe and analyze explicit visitor actions such as click-throughs, voting and visitor comments. More recently, some engines have begun incorporating social tagging. Popularized by del.icio.us and other Web 2.0 sites in the last few years, social tagging is a decentralized way to tag Web site content. Fundamentally, content is still manually organized by explicit human judgment, but in this case end users tag the content instead of internal experts.
By monitoring this feedback and behavior, user-centric models can tap into a deep well of human experience that reflects a wide range of subjective opinions and judgments about the usefulness of the selected material. And because the feedback is dynamic, search results stay more current than those driven by static rules or metadata.
However, like their expert-driven predecessors, user-based models have several drawbacks. First, because most of these systems monitor only a limited set of explicit user actions, there is a large potential for error. A click-through, for example, is often treated as evidence of a successful query, even when the visitor clicked a result, found it unhelpful and returned to the result list. The system nonetheless credits that bad result, which pushes it higher in the rankings over time, earns it still more clicks and makes it a more popular document in general.
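The feedback loop can be sketched in a few lines. The documents, their starting click counts and the assumption that every visitor clicks the top result (strong position bias) are hypothetical; the ranker is deliberately naive, treating each click as a success and never consulting whether the document was relevant.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevant: bool  # ground truth, never consulted by the ranker
    clicks: int = 0  # the only signal the naive system records


def rank(docs: list[Doc]) -> list[Doc]:
    """Order documents by click count, highest first."""
    return sorted(docs, key=lambda d: d.clicks, reverse=True)


def simulate(docs: list[Doc], sessions: int) -> list[str]:
    """Simulate visitors who always click the top-ranked result.

    Assumption: strong position bias, so the top result gets the click whether or
    not it is relevant, and the system credits that click as a successful query.
    """
    for _ in range(sessions):
        top = rank(docs)[0]
        top.clicks += 1  # counted as success even if the visitor bounced back
    return [d.doc_id for d in rank(docs)]


docs = [
    Doc("outdated-faq", relevant=False, clicks=3),  # small early head start
    Doc("current-guide", relevant=True, clicks=2),
]
print(simulate(docs, sessions=100))  # ['outdated-faq', 'current-guide']
```

With only a one-click head start, the outdated page keeps the top slot and absorbs all 100 new clicks, while the relevant guide never gets a chance to catch up.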
This false popularity also creates a secondary hazard with tagging. In effect, the more popular a piece of content is, the more diverse the subjects users tag it with, and the more likely it is to show up in any given set of search results. In both cases, “most popular” doesn’t necessarily equate to “most relevant.”
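Tag drift can be simulated the same way. This hypothetical model assumes one tag per ten views, with the first tagger describing the page accurately and later taggers labeling it with whatever subject they happened to be pursuing. A plain tag lookup then shows the popular page matching every subject.

```python
# Hypothetical vocabulary of subjects visitors might be pursuing.
SUBJECTS = ["budget", "travel", "hr-policy", "security", "onboarding", "benefits"]


def collect_tags(docs: dict[str, tuple[str, int]], tag_every: int = 10) -> dict[str, set[str]]:
    """Simulate social tagging; docs maps a doc id to (true subject, view count).

    Assumption: one tag per `tag_every` views. The first tagger labels the page with
    its true subject; later taggers label it with the subject they were pursuing.
    """
    tags: dict[str, set[str]] = {}
    for doc_id, (subject, views) in docs.items():
        doc_tags: set[str] = set()
        for i in range(views // tag_every):
            doc_tags.add(subject if i == 0 else SUBJECTS[i % len(SUBJECTS)])
        tags[doc_id] = doc_tags
    return tags


def tag_search(tags: dict[str, set[str]], subject: str) -> list[str]:
    """Return the ids of documents carrying the given tag, sorted alphabetically."""
    return sorted(doc_id for doc_id, doc_tags in tags.items() if subject in doc_tags)


tags = collect_tags({
    "holiday-calendar": ("benefits", 500),  # popular page
    "vpn-setup-guide": ("security", 10),    # niche but relevant page
})
print(sorted(tags["holiday-calendar"]))  # all six subjects
print(tag_search(tags, "security"))      # ['holiday-calendar', 'vpn-setup-guide']
```

The holiday calendar ends up carrying all six subject tags, so it appears in the “security” results next to the VPN guide even though only the guide is about security.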
Additionally, search results that depend on explicit user feedback can be skewed because they reflect the behavior of only a very small sample of the overall visitor population. In what has been called the “1:99 problem,” systems that require explicit feedback tend to hear from the one percent of users with the most polarized views, who therefore poorly represent the user base, while ignoring the 99 percent of visitors who make up the silent majority.
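The arithmetic behind the 1:99 problem is easy to illustrate. The numbers below are invented for the sketch: 990 moderately satisfied visitors who never rate anything and 10 vocal visitors with polarized opinions who do, all on a hypothetical 1 to 5 satisfaction scale.

```python
from statistics import mean

# Hypothetical population on a 1-5 satisfaction scale.
silent = [3.5] * 990              # the silent majority: never rates anything
vocal = [1.0] * 8 + [5.0] * 2     # the polarized 1 percent: the only raters

explicit_score = mean(vocal)            # what a feedback-driven system sees
true_score = mean(silent + vocal)       # what the whole audience thinks
coverage = len(vocal) / len(silent + vocal)

print(f"explicit={explicit_score:.2f} population={true_score:.2f} coverage={coverage:.0%}")
# explicit=1.80 population=3.48 coverage=1%
```

A system driven by explicit ratings would score this content at 1.80, while the audience as a whole sits near 3.48.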
Perhaps most concerning, user-centric systems that rely on explicit feedback are highly vulnerable to gaming. Malicious users can manipulate the behavior of the live site using automated link bots, form bots, tagging engines that impersonate users and various other techniques.
Enter Social Search
In his bestselling book The Wisdom of Crowds, James Surowiecki explores a simple idea with profound implications: large groups of people are smarter than an elite few, no matter how intelligent those few are. According to Surowiecki, four essential elements form a “wise crowd”: diversity of opinion, independence, decentralization and aggregation.
Social search takes various forms of user behavior into account, in keeping with Surowiecki’s premise. However, most social search techniques rely on explicit (and often misleading) user actions and feedback. They also frequently ignore a key element in determining a visitor’s true intent: the overall context of the visitor’s behavior.
Context, in the case of social search, refers not only to the context of the visitor’s individual actions on a Web site, but also how those actions compare and relate to the actions of other visitors.
Sites like NASA.gov, which serves more than 15 million visits per month, are seeing major benefits from social search. The NASA.gov Web site uses social search to understand a visitor’s true intent and produce search results as well as content and video recommendations that are most appropriate to each of NASA’s main audiences, including students, researchers and space enthusiasts. By capturing implicit site behaviors and adapting its responses appropriately, NASA automatically tailors the site experience for each visitor based on the information they and their like-minded peers would find helpful — regardless of whether it is text content, images, videos or animation.
Silent observation is key to the success of social search. It contrasts sharply with most existing social search techniques, which collect behavior through explicit user actions such as tagging, voting and feedback. The goal of silent observation is to accurately determine a Web site visitor’s true intent without interrupting the flow of the experience.
Many types of implicit on-page and pan-page actions can be collected from every Web site visitor and analyzed. As this data is continuously distilled, virtual communities of like-minded visitors begin to emerge. The actions, patterns and tendencies associated with these communities form the basis of a collective perspective.
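One generic way to turn silently observed behavior into communities of like-minded visitors is user-based collaborative filtering, sketched below. It illustrates the general technique only and is not the method used by Baynote or NASA.gov; the session data, the 30-second bounce threshold and the use of dwell time as an interest weight are all assumptions. Each visitor becomes a vector of engaged pages, cosine similarity finds the nearest peers, and pages those peers engaged with are recommended.

```python
from math import sqrt

# Hypothetical implicit data: visitor -> {page: seconds of dwell time}.
# Nothing here was typed, rated or tagged by the visitor; it was only observed.
SESSIONS: dict[str, dict[str, float]] = {
    "student_a": {"/kids/planets": 240, "/kids/rockets": 180, "/missions/apollo": 5},
    "student_b": {"/kids/planets": 200, "/kids/quiz": 150},
    "researcher": {"/data/datasets": 300, "/missions/apollo": 260, "/papers/mars": 400},
}

MIN_DWELL = 30  # assumption: visits shorter than this many seconds count as bounces


def interest_vector(visits: dict[str, float]) -> dict[str, float]:
    """Keep only engaged visits, using dwell time as an implicit interest weight."""
    return {page: secs for page, secs in visits.items() if secs >= MIN_DWELL}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse page -> weight vectors."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(visitor: str, sessions: dict[str, dict[str, float]], k: int = 1) -> list[str]:
    """Recommend pages engaged with by the visitor's k most similar peers."""
    profiles = {v: interest_vector(s) for v, s in sessions.items()}
    me = profiles[visitor]
    similarity = {o: cosine(me, profiles[o]) for o in profiles if o != visitor}
    peers = sorted(similarity, key=similarity.get, reverse=True)[:k]
    scores: dict[str, float] = {}
    for peer in peers:
        if similarity[peer] <= 0:
            continue  # no shared engagement, so not a like-minded peer
        for page, secs in profiles[peer].items():
            if page not in sessions[visitor]:  # skip pages already seen, even bounces
                scores[page] = scores.get(page, 0.0) + similarity[peer] * secs
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("student_a", SESSIONS))  # ['/kids/quiz']
```

The five-second glance at the Apollo page is discarded as a bounce, the first student is matched with the other student (similarity 0.64) rather than the researcher (0.0), and the recommendation is the quiz page that peer engaged with. A production system would use far richer signals and clustering, but the principle of inferring peers from observed behavior is the same.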
In the Future
The exponential growth of content and the dynamic nature of today’s Web sites require a fundamental rethinking of how to determine the intent of Web site users and match them with the most appropriate content. Enterprise search technologists have begun asking these questions, but the industry is still early in its transition from an expert-centric to a user-centric mentality. The ultimate solution will require search systems that are highly scalable, intelligent and free of bias.
Social search is by definition labor-scalable, content-scalable and highly dynamic. It also benefits from the collective intelligence of site visitors, creating a more natural and compelling search experience than experts alone can provide. Properly conceived, social search will also be free of bias because it relies on the silent behavior of Web site users. This matches content to true intent while preventing gaming tactics and overrepresentation of the vocal minority.
Scott Brave, Ph.D., is a founder and CTO of Baynote. Prior to Baynote, he was a postdoctoral scholar at Stanford University and served as lab manager for the CHIMe (Communication between Humans and Interactive Media) Lab.
Scott, great article. I do hope you can make it to the HCIR ’08 workshop in October, which is all about bringing the human into the problems of search and information retrieval. And of course this is what I talk about all the time on my blog.
HCIR ’08: http://research.microsoft.com/~ryenw/hcir2008/
My blog: http://thenoisychannel.blogspot.com/