Recontextualization of metadata – part 3: semantic web

Which role do metadata play in the semantic web? The latter aims at structuring the contents of the web in a way which makes it possible for machines to find and combine information. At present, there are two approaches which lead to this goal, one bottom-up and one top-down. In short, the bottom-up approach depends on metadata being added to web pages by content providers while in the top-down approach a third party crawls the web or a selection of sites and automatically generates a set of metadata for certain pages.

Today, a number of semantic web applications give a glimpse into the prospects of what can be done with highly structured data. It is interesting to note that all examples come from projects embracing the top-down approach, not from the bottom-up approach which asks for systematic assignment of metadata. I’d like to take a closer look at Freebase Parallax. The power of Parallax is to aggregate information from different sources and make it visible on one screen – in a parallel way (hence the name, I assume) instead of the usual sequential succession of sources. The data Parallax uses are mainly from the Wikipedia, so don’t expect any groundbreaking new facts. But it’s still quite an awesome application, at least when you stick to the examples used in the demo video. (When experimenting yourself, it is rather difficult to find good examples, but there are a few more on Infobib (in German).)

The first example from the demo video is a search for Abraham Lincoln’s children and shows how Parallax can arrange all four entries on one page instead of the user having to navigate to each single page.

Abraham Lincoln's Children on Parallax

Abraham Lincoln's Children on Parallax

In a next step, the narrator of the video expands his search to all children of all American presidents, showing Robert Todd Lincoln alongside with children from George Bush, Sr. and Jr., and then goes on to find all schools American presidents’ children attended.

Lincoln's son alongside the Bushes children

Lincoln's son alongside the Bushes children

What has happened is that the Wikipedia entry on Abraham Lincoln was stripped of its context and reduced to two relationships: a) Abraham Lincoln was a president of the US, and b) Abraham Lincoln had children. When the same is done for the rest of the Wikipedia, then all entries with the attribute «American president» and «has children» can be combined. Like in a database, you can match all X which have an attribute Y (e.g. a school) – without even having to give this attribute Y a value (e.g. the name of a certain school).

The examples shows that the de-contextualization (or reduction of Robert Todd Lincoln to «child of US president») has great potential in terms of creating new associations of information. A number of Firefox extensions make use of this and combine information about books, films etc. with people who have visited the according web sites (Glue) or link terms to other sources according to their classificaton as person, company, country etc. (Gnosis). The re-contextualization, however, is tricky. If too much context is suppressed, the results become ridiculous, as the link from Abraham Lincoln (correctly identified as a person) to LinkedIn.

Gnosis' links for Abraham Lincoln

Gnosis' links for Abraham Lincoln

Also, the ability of machines to categorize information still seems to be very limited. Glue (called «BlueOrganizer» when I took the screenshot), for instance, can only recognize books on websites such as Amazon or Barnes & Nobles, but not on the Library of Congress’ site. And the insight of recognizing a category comes across to the human user as rather silly.

Blue Organizer recognizes that you're looking at a book!

Blue Organizer recognizes that you're looking at a book!

This lack cannot be attributed to the absence of metadata. Zotero (mentioned earlier on this blog), for instance, manages to identify books on many more sites, of libraries as well as on-line bookshops. It just shows how much effort is necessary to recognize structure on the web as such. This is exactly what the top-down approach achieves: It identifies concrete questions (e.g. who else read this book?) as well as sources which might be relevant to answer them (e.g. sites which provide links from a book to people who have read it) and then creates new relationships between these data. This is a much more selective approach than the bottom-up one, and I believe it is more promising. The bottom-up approach either doesn’t go beyond today’s databases or it aims at the maximum amount of options for future relationships. The bottom-up approach is too unspecific to make it attractive and there are too many uncertainties about its usefulness. This, I belive, is the main reason why it isn’t being adopted. For reasons of semantics, I also doubt it will be possible to achieve useful results by automatically mapping data from heterogeneous sources via generic ontologies in the near future (see the above example of Abraham Lincoln in LinkedIn).

Maybe the bottom-up approach should be viewed without rigid technical requirements, but more openly, in the sense of carefully produced content which

  • embeds information in its context (e.g. origin, use, related information) and/or
  • structures content within a certain scope by using (more or less) standardized elements.
Standardized display of a city on Wikipedia

Standardized display of a city on Wikipedia

This leads me back to my initial question: Are metadata good for findability? The semantic web, at least at present, is another example that adding descriptors is neither popular with content producers nor seems to be considered crucial for automated processing. But metadata aren’t restricted to descriptors which are added to content; the features mentioned above are pure metadata and absolutely essential for creating structured search. And structured search, or variations of it like faceted navigation or or the semantic web applications mentioned above, have a great advantage over pattern-matching search engines. Instead of only being able to look for X, it is possible to look for something you do not exactly know except that it should have the characteristic Y.

Carefully structured, metadata-rich information may not be crucial for search engines which rely on pattern-matching. But it opens new perspectives for exploring related content, for finding answers without exactly knowing the question.


2 Responses to “Recontextualization of metadata – part 3: semantic web”

  1. 1 hauschke December 16, 2008 at 9:17

    Thanks for linking to Infobib. I build a few other examples for Freebase Parallax here and here. The postings are in German, but I guess the embedded parallax mashups can be understood “standalone”.

  2. 2 jump start in orange April 10, 2013 at 4:48

    Undeniably believe that which you said. Your favorite reason seemed to be
    on the web the simplest thing to be aware of.
    I say to you, I certainly get irked while people think about worries that
    they just do not know about. You managed to hit the nail
    upon the top and defined out the whole thing without having side
    effect , people could take a signal. Will probably be back
    to get more. Thanks

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: