Thinking about the future of search, Part 2: Semantic search

By kristen.koch

Semantics and the future of search
In Part I of our two-part “Thinking about the future of search” series, we discussed the social recommendation side of the future of search.  Here, we’ll explore the possibilities of semantic search.

At hypios, we’re really excited about the semantic web—so excited that we can’t stop talking about it.  We think that data will eventually be structured, machine-readable, and linked, vastly improving search.  Engines will return better (more relevant, more specific) results in an easier-to-read format.

Meaning and context
Search depends on meaning, and meaning depends on context.  Getting good search results depends a lot on our ability to define our terms and specify a certain meaning.  Right now, we have to put in more keywords and use a common vocabulary; in the future, we might see:

  • Personalized definitions: Perhaps my favorite sport is ultimate Frisbee.  How can I search without having to type in “ultimate Frisbee” every time?  Social semantic bookmarking sites like Faviki are working on this.  You bookmark articles you like or videos you’ve watched, then tag them with keywords.  You can then attach a concept to each keyword using Wikipedia articles.  For example, I would associate keywords like ‘championship’ or ‘tournament’ with the concept of ultimate Frisbee as defined by the Wikipedia article on the sport.  The system then has a map of my definitions of these words (me, ‘tournament,’ ‘http://en.wikipedia.org/wiki/Ultimate_(sport)’) that could be used to help search engines better understand my queries.
  • Social definitions: Many terms are better defined in a social context, not by individual tagging.  If a search engine can take your social network into account, it may be able to infer these socially defined meanings.  Perhaps my Frisbee team has an offensive play nicknamed ‘the windshield wiper.’  When we share videos with this tag, we’re talking about the play, not the automotive part.  For a social group of mechanics, however, ‘windshield wiper’ probably refers to the widget on your car.  Researchers are working on ways to get from user-defined meaning to socially defined meaning—by analyzing the ways a social group tags concepts, for example.
  • Context sensitivity: It’s a lot of work to tag everything with keywords and concepts.  Search engines could better understand queries by looking at the context of my searches, too.  If I have one browser tab open to the UPA website and search ‘discs’ in another, I’m probably looking for information about Frisbees, not CDs.  Search engines could recognize context by looking at your browser and email history, IP address, or updates to the cloud (social networks like Twitter or online documents like Google Docs or Wave).
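
To make the first two ideas concrete, here is a minimal sketch in Python of how a personal map and a group's tags might be combined to resolve a keyword.  Everything in it is an assumption for illustration: the dictionaries, the group names, and the resolve function are hypothetical and do not describe how Faviki or any search engine actually works.

```python
from collections import Counter

# Concept URI taken from the (me, 'tournament', URL) example above.
ULTIMATE = "http://en.wikipedia.org/wiki/Ultimate_(sport)"

# Hypothetical personal map: (user, keyword) -> concept.
personal_tags = {
    ("me", "tournament"): ULTIMATE,
    ("me", "championship"): ULTIMATE,
}

# Hypothetical group data: each entry lists the concepts that group
# members attached to a keyword when they tagged shared items.
group_tags = {
    "frisbee_team": {"windshield wiper": ["play", "play", "car part"]},
    "mechanics": {"windshield wiper": ["car part", "car part"]},
}


def resolve(user, group, keyword):
    """Return the best-known concept for a keyword, or None."""
    # 1. A personal definition wins if the user has one.
    concept = personal_tags.get((user, keyword))
    if concept is not None:
        return concept
    # 2. Otherwise use the concept the group attached most often.
    votes = group_tags.get(group, {}).get(keyword)
    if votes:
        return Counter(votes).most_common(1)[0][0]
    # 3. Nothing known: the engine would fall back to plain keywords.
    return None
```

Calling resolve("me", "frisbee_team", "windshield wiper") and resolve("me", "mechanics", "windshield wiper") gives different concepts for the same words, which is the point of socially defined meaning.  Context signals such as open tabs could be added as a further fallback in the same way.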

Using structured data to display more relevant results
The other side of improving search through semantics is getting better results, either by returning more relevant links or by bypassing links altogether and simply displaying relevant information.  Search engines can already use semantic data to classify pages more appropriately.  If webmasters provide structured data by marking up their sites (with microformats, RDFa, or XML feeds, for example), engines can recognize them as homepages, review sites, bookmarking sites, etc., and show this information to users as part of their search results.
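
As a rough illustration of what “marking up” buys a search engine, the sketch below reads a few hCard microformat fields (the class names vcard, fn, tel, and adr) out of a page using Python’s standard html.parser module.  The sample page, the business name, and the parse_hcard helper are made up for this example, and the parser is deliberately simplified: it keeps only the first piece of text inside each marked-up element.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect a few hCard fields from class attributes (simplified)."""

    FIELDS = {"fn", "tel", "adr"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.FIELDS:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text after a recognized class.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def parse_hcard(html):
    """Hypothetical helper: return a dict of the hCard fields found."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


# Invented sample page for illustration only.
page = (
    '<div class="vcard">'
    '<span class="fn">Example Piano Studio</span> '
    '<span class="tel">555-0100</span>'
    "</div>"
)
card = parse_hcard(page)
```

With fields like these in hand, an engine no longer has to guess which string on the page is a name and which is a phone number.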

The problem with searching for a string of keywords (perhaps, once again, ‘piano teachers USA’) is that by the second or third page of results, the keywords no longer occur in the same phrase, but several sentences apart (“I felt like a piano had been dropped on my head…My grandparents were math teachers when they came to the USA.”).  Technically, the keywords are in close proximity; in reality, it’s not a relevant result, and after skimming the ‘snippet’ below the link we’re not going to click through.

Or perhaps it is a relevant result and the snippet just shows the most recent post or unrelated information.  How can developers and site owners ensure that their snippets reflect their sites’ content?
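
The keyword problem described above can be sketched in a few lines of Python.  This is only an illustration under simple assumptions (lowercased word tokens, no stemming, invented function names), not how any real engine scores pages: one function checks whether every query word appears somewhere, and the other measures the smallest window of words that contains them all.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all(text, query):
    """True if every query word appears somewhere in the text."""
    words = set(tokens(text))
    return all(q in words for q in tokens(query))


def min_window(text, query):
    """Smallest number of tokens spanning all query words, or None."""
    doc = tokens(text)
    wanted = set(tokens(query))
    last_seen = {}  # query word -> most recent position
    best = None
    for i, tok in enumerate(doc):
        if tok in wanted:
            last_seen[tok] = i
            if len(last_seen) == len(wanted):
                # Shortest window ending here starts at the oldest
                # of the most recent positions.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


query = "piano teachers USA"
irrelevant = ("I felt like a piano had been dropped on my head. "
              "My grandparents were math teachers when they came to the USA.")
relevant = "Find piano teachers USA wide"
```

Both sample texts pass contains_all, but min_window is much larger for the first one.  Even so, a tight window only suggests relevance; it says nothing about what the page means, which is where structured data comes in.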

How marking up your site can increase traffic
Projects like Yahoo’s SearchMonkey and Google’s Rich Snippets have already begun working with developers to standardize markup formats, incorporate structured data into sites, and display this data in search results.

  • Content summaries: Participating developers can build applications that will include pertinent information like phone numbers, addresses, and user reviews in the snippet.  Instead of random phrases where search terms appear, you’ll be able to see summaries of a page’s content.  Both Yahoo and Google insist that these more appealing results will drive more traffic to your site, though some counter that if users can see a phone number or address on the search page, they’re unlikely to click through.  (Then again, there’s only so much to be said in or gleaned from a snippet.)
  • Better rankings: Including semantic data won’t just drive traffic to your site; it’s likely that your site will get better rankings, too.  This isn’t because search engines throw out non-semantic data or because Yahoo and Google have decided to favor sites with structured data.  Rather, the content of pages with semantic data can be more precisely related to search queries.  (You could argue that pages with undeservedly low rankings could move up by incorporating semantic data—homepages of people or businesses with the same names, for example.)

Improving how results are displayed
SearchMonkey and Rich Snippets are about improving both the relevancy and format of search results.  As I discussed in Part I, when I ask a search engine how many piano teachers there are in the U.S., I’m just looking for a number.  Search engines already display a kind of ‘enhanced snippet’ in response to certain queries, as this ResearchForward post points out.  They can do simple calculations and conversions, and if you enter “city name weather,” a basic forecast shows up.

For well-known figures, Bing shows something better than snippets: it can organize links into categories.  Using the example of a search for Emily Bronte, Michael Hemment of ResearchForward shows how you can choose from headings like “Biography and Works,” “Videos,” and “Images.”  This is helpful when you’re not starting from scratch—if, for example, you’re an academic who doesn’t need a synopsis from Wikipedia for your article about Shakespeare’s use of the supernatural in Julius Caesar.

Searching a database instead of the Web
Microsoft is also taking a slightly different approach to semantic data.  On November 11, the Bing blog announced the integration of the Bing “decision engine” with Wolfram|Alpha, the “computational knowledge engine.”  Wolfram|Alpha does not search the web; it searches its own database of curated, structured information.  Access to Wolfram|Alpha’s algorithms will allow Bing to provide ‘answers’ to certain queries, not just links, and to perform more complicated calculations (including plotting equations and musical notes, which ResearchForward calls “truly amazing”).  Of course, someone at Wolfram|Alpha has to curate and organize all the information before it can be searched.

Fortunately, this isn’t as impossible as it seems.  The interest of big search companies in semantic search and the willingness of Internet users to contribute content (for example, on Wikipedia or YouTube) point towards a future in which information will be organized and easily searchable.  Maybe someday I’ll finally find out how many piano teachers are out there putting youngsters through scales.

Many thanks to hypster and Semantic Webber Milan Stankovic for his help with this series.

Photo by dullhunk via Flickr.


One Response to “Thinking about the future of search, Part 2: Semantic search”

  1. dgoldgaber Says:

    I think this is a fascinating look at some of the “low-hanging fruit” for semantified data. I was happy to learn about faviki–i think it could be really useful for researchers/specialists who search in the same semantic field all the time. For them, it might be worth the time-investment of keywording and linking. For more casual and eclectic searchers it would seem not worth the bother. great post!
