Archive for September, 2008

Metadata for Findability

Does it make sense to add metadata to information for the sake of findability? Historically, it definitely did. Before the age of digitally available text and full text search, there was no other means of providing access to large amounts of information. Librarians filed books under author, title, keywords etc. Today, descriptors – keywords, subject headings, tags etc. – seem to have difficulty making the transition to the 21st century. Is this justified? Let’s take a look at some examples I noted recently that cast doubt on the effectiveness of descriptors.

Kumarkamal on Basics of Search Engine Optimization warns «Don’t waste your time on meta tags», pointing out that search engines largely ignore descriptors.

Mathew Ingram raises the question «Who bookmarks anymore?» He states that, instead of consulting his bookmarks, «[i]f I’m writing about something and I remember some details, I type them into Google and eventually track the page down.» In a review of the social bookmarking service Qitera, tech blog netzwertig even claims that keywords «are in principle nothing more than a compensation for the lack of intelligent search options» (my translation from the German).

What are the points at issue? Search engines ignore descriptors because of bias. Descriptors added by humans are biased – almost necessarily so. To give the gist of a text, you need to focus on some parts and omit others. In addition to this «immanent» bias, descriptors are often selected to influence search engine ranking. This leads to less reliable search results. Google first became famous because it developed algorithms to counter exactly this problem: Google’s ranking was based on reliability. Links to a page were interpreted as an indication that a site provided valuable and trustworthy information. In this way, Google’s search algorithm took the context of a site into account. But descriptors, being metadata, i.e. data «outside» the data, lack context, or rather, their context is not taken into account by search engines.
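To make the idea of links as a sign of trust a little more concrete, here is a deliberately naive sketch that ranks pages by the number of distinct pages linking to them. This is not Google’s actual algorithm, which is far more refined; the link graph and the names `links`, `inbound_link_counts` and `rank_by_inbound_links` are invented for illustration only.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}


def inbound_link_counts(link_graph):
    """Count, for every page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):      # count each linking page only once
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts


def rank_by_inbound_links(link_graph):
    """Order pages by inbound links, most linked first (ties by name)."""
    counts = inbound_link_counts(link_graph)
    return sorted(counts, key=lambda page: (-counts[page], page))


print(rank_by_inbound_links(links))
```

The point of the toy example is only that the ranking signal comes from the page’s surroundings, not from anything the page says about itself, which is exactly what a self-assigned descriptor cannot offer.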

The second example, on the other hand, is that of a highly contextual search. Ingram remembers the item he is looking for («I remember some details»), can formulate a precise query and can easily recognize the document he has been looking for. With all this information given, typing keywords into Google is a highly efficient search strategy bound to retrieve excellent results – why bother to reduce the document to keywords? In addition, a search by descriptors alone would most probably return too many documents in this case. The strength of descriptors is recall, not precision. Again, this particularly holds true for descriptors lacking context. Of course, this doesn’t apply to all descriptors, but human language tends to be ambiguous without context.
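For readers who don’t have the two measures at their fingertips: precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that are retrieved. The following minimal sketch computes both; the document ids and the two result sets are made up to mirror the situation described above, they are not real data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: the searcher wants exactly one remembered page.
relevant = {"doc1"}

# A broad descriptor matches the page, but also many others.
descriptor_results = {"doc1", "doc2", "doc3", "doc4"}

# A query built from remembered details matches only the page itself.
detail_results = {"doc1"}

print(precision_recall(descriptor_results, relevant))
print(precision_recall(detail_results, relevant))
```

In this invented case both searches find the page (full recall), but the descriptor search buries it among three irrelevant documents, so its precision is only a quarter.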

In order to deliver useful results for search, descriptors need to be re-contextualized. Basically, there are two ways of including context: adding information or reducing information to a well-defined scope. The first is what machines do well. Search engines can analyze vast amounts of information, find patterns, match items etc. Human beings usually prefer the second method: They reduce complexity by resorting to subjects with (more or less) clear outlines, and particularly to reliable environments. If the creator of a record is known and trusted, or if there is sufficient evidence that the creator can be trusted (e.g. the recommendation of a trusted person), then a record’s content can be taken to be trustworthy. Restricting the data searched to a trusted environment greatly improves precision. The price of diminished recall is negligible in this case. Of course, the bias mentioned previously can play a role, but through knowledge of its context, it can be put into perspective.
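The second method, reducing the scope, can be sketched in a few lines. Everything here is hypothetical: the records, the creators, the ambiguous tag «java» and the function `search_by_tag` are invented to illustrate the trade-off, and the «topic» field merely stands in for what the searcher actually meant.

```python
# Hypothetical tagged records; "topic" is what each record is really about.
records = [
    {"id": 1, "creator": "alice", "tags": {"java"}, "topic": "programming"},
    {"id": 2, "creator": "alice", "tags": {"java"}, "topic": "programming"},
    {"id": 3, "creator": "bob",   "tags": {"java"}, "topic": "coffee"},
    {"id": 4, "creator": "carol", "tags": {"java"}, "topic": "travel"},
    {"id": 5, "creator": "dave",  "tags": {"java"}, "topic": "programming"},
]


def search_by_tag(records, tag, trusted_creators=None):
    """Return ids of records carrying the tag, optionally limited to
    records created by someone in trusted_creators."""
    return {
        r["id"]
        for r in records
        if tag in r["tags"]
        and (trusted_creators is None or r["creator"] in trusted_creators)
    }


# The searcher means the programming language.
relevant = {r["id"] for r in records if r["topic"] == "programming"}

everything = search_by_tag(records, "java")
trusted = search_by_tag(records, "java", trusted_creators={"alice"})

for name, result in [("all creators", everything), ("trusted only", trusted)]:
    hits = result & relevant
    print(name, "precision:", len(hits) / len(result),
          "recall:", len(hits) / len(relevant))
```

In this made-up collection, searching across all creators returns every relevant record but also the coffee and travel ones, while restricting the search to a trusted creator returns only relevant records and misses just one. Whether that loss really is negligible depends, of course, on the collection.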

In the following posts, I’d like to look at some examples of the use of descriptors which take context into account, in delimited areas as well as by analysis of additional data. What I won’t be doing – though I did plan this initially – is to compare the effectiveness of full text search and search in metadata. I did find a few indications in library science literature – none of them conclusive – and I believe now that the comparison is not a legitimate one. Too much depends on what a user is looking for, which data he searches in, and how familiar he is with particular sources. So the question is simply whether descriptors can be powerful for findability, and under which circumstances.