Recontextualization of metadata – part 1: keywords & classifications

In the last post, I pondered whether descriptors can improve findability. I’ll avoid defining what findability is and discuss a few examples instead.

Example 1: Libraries

Descriptors are heavily used in libraries, but do they improve findability in practice? I searched two university library catalogues (Basle and Zurich) for the interdisciplinary subject of the history of chemistry in Switzerland. Each catalogue returned about 30 titles, but only two titles appeared in both result sets. This is noteworthy because both libraries seem to hold the same books (I did a series of cross-checks), yet they deliver different results for the same query.

The search used the keywords «geschichte schweiz chemie» (history, Switzerland, chemistry). The screenshots show the keywords assigned to one and the same book, which this query found in only one of the two libraries, and explain why the results differ so much:

Keywords for Tobias Straumann’s «Die Schöpfung im Reagenzglas» in Basle’s university library catalogue

Keywords for the same title in Zurich’s university library catalogue

The keywords did help to find relevant works in both libraries, because most of the titles are useless for a subject search. Until recently, no Swiss library included contextual information such as summaries or tables of contents in its search. However, the quality of the search results is dubious. The difficulties are much the same as for search in unstructured data: the keywords are either too broad or too narrow, the controlled vocabularies are not flexible enough to account for all possible variations, and the human factor leads to inconsistent use of descriptors. Furthermore, it is impossible to reconstruct the reason for the divergence of the results.
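
How inconsistent descriptors lead to different results for the same query can be shown with a minimal Python sketch. It assumes that a catalogue search simply requires every query term to appear among a record's descriptors. The descriptor sets are invented for illustration and are not the actual keywords from the Basle or Zurich catalogues; the names `CATALOGUE_A`, `CATALOGUE_B` and `search` are hypothetical as well.

```python
# Hypothetical descriptor sets for the same book in two catalogues.
# Invented for illustration; these are NOT the real catalogue entries.
CATALOGUE_A = {
    "Die Schöpfung im Reagenzglas": {"geschichte", "schweiz", "chemie", "industrie"},
}
CATALOGUE_B = {
    "Die Schöpfung im Reagenzglas": {"geschichte", "basel", "chemische industrie"},
}


def search(catalogue, query):
    """Return titles whose descriptors contain every query term (AND search)."""
    terms = set(query.lower().split())
    return [title for title, descriptors in catalogue.items() if terms <= descriptors]


query = "geschichte schweiz chemie"
hits_a = search(CATALOGUE_A, query)  # the book is found
hits_b = search(CATALOGUE_B, query)  # the same book is missed
```

In this toy setup the second catalogue describes the book more narrowly (a place instead of the country, a compound term instead of the discipline), so an exact-match search on the broader terms misses it even though the book is on the shelf.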

Example 2: Classifications of Swiss Legislation

To make periodically published and revised legislation accessible by subject, it is filed in a classification. Even though in Switzerland the federal, cantonal (state) and communal jurisdictions are sovereign, the classifications of legislation are similar in large parts of the country and across all levels of government. The classifications were developed long before the Web and have been adapted since. Primary access is by browsing alphabetically or by subject.

Alphabetical index of the Compilation of Federal Legislation

The alphabetical index is a highly elaborate system of cross-references. It helps with disambiguation and points to results in less obvious categories.

Classification of the Swiss Confederation (excerpt)

Classification of the canton of Basel-Stadt (corresponding excerpt)

Access by subjects gives the user an idea of the scope of the collection. The hierarchical approach allows him or her to quickly drill down to the relevant subject.

A human user looking for corresponding legislation at different levels of jurisdiction can easily spot the resemblances between similar classifications. For automatic aggregation, classifications that are merely similar are more of an obstacle. Therefore eCH, the Swiss organization for e-government standards, has published classifications for e-government subjects (in German).
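
For automatic aggregation, the resemblance a human spots at a glance has to be made explicit, for example as a mapping from each local classification to a shared one. The following Python sketch assumes such a mapping exists. All jurisdictions, codes, subjects and titles are hypothetical placeholders and are not taken from the federal, cantonal or published standard classifications.

```python
from collections import defaultdict

# Hypothetical crosswalk from (jurisdiction, local code) to a shared subject.
# All codes and subjects are placeholders, not real classification entries.
CROSSWALK = {
    ("federal", "81"): "health",
    ("canton-a", "300"): "health",
    ("canton-b", "32"): "health",
}

# Hypothetical legislation entries: (jurisdiction, local code, title).
ENTRIES = [
    ("federal", "81", "Federal health act"),
    ("canton-a", "300", "Cantonal health act"),
    ("canton-b", "32", "Cantonal health ordinance"),
    ("canton-b", "99", "Act filed under a code missing from the crosswalk"),
]


def aggregate(entries, crosswalk):
    """Group entries by shared subject; entries without a mapping go to 'unmapped'."""
    grouped = defaultdict(list)
    for jurisdiction, code, title in entries:
        subject = crosswalk.get((jurisdiction, code), "unmapped")
        grouped[subject].append((jurisdiction, title))
    return dict(grouped)


by_subject = aggregate(ENTRIES, CROSSWALK)
```

Every local code that deviates from the shared classification has to be mapped by hand or ends up in the `unmapped` bucket, which is why consistent use of a common standard matters for aggregation.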

Rather surprisingly, the standard is not used consistently in the cantons cited as best practice examples (e.g. compare the cantons of Aargau and Basel-Stadt), and the synonyms and related terms recommended for the improvement of findability are not being used.

eCH standard

Version of Canton Aargau


Keywords improve findability when there is no other information available, e.g. in library catalogues without digitized content. However, keywords are only reliable when the mechanism for assigning them is transparent. «Intellectual control» – the term archivists use for creating tools to access their records – is important not only for those who create the tools, but just as much for those who use them to gain access. Lack of transparency means loss of control, and even when laid open, systems of cross-reference can be confusing for the user, time-consuming to maintain and error-prone.

The second example, the classification of legislation, shows that transparency can at least partly compensate for lack of control. This should not be an excuse for ignoring existing standards, which are created primarily for the sake of interoperability and to facilitate maintenance. But the examples do show that beyond a certain level of detail, control gets out of hand. The objects described often have too many facets to be described consistently – or, to put it the other way round, it is difficult to create comprehensive classification systems. Thus, at a deeper level, classifications can easily shift from an asset to a risk, because results tend to become unreliable. In the digital world, I believe, it makes more sense to abandon descriptors at that level and to fall back on methods of natural language processing.


  1. 1 Diego Haettenschwiler April 16, 2009 at 9:53

    Metadata: aids for navigating the information jungle

    In today's information jungle we are as lost as someone without orientation in an unknown country. Most people simply start googling and usually find something, without noticing how many better results they are missing.

    There are different ways to get one's bearings: one can, for example, buy a GPS device and follow its instructions. Or one can learn to read maps and use different maps depending on the situation. Applied to the information jungle, the GPS solution corresponds to search systems that are as automated as possible and try to act on the searcher's intention. Learning to read maps corresponds to a kind of meta-learning in which one first learns various techniques. The world's knowledge is structured in very different ways, just as there are very different maps. Depending on the situation, it makes sense to use a particular map. The same holds for metadata. They are not universal (even if this were desirable). One has to get to know the particular structure, presentation and language before getting started. The more often one uses a collection of data and the more important the results are, the more intensely one should occupy oneself with their structure. The compensation is a yield with higher precision, even if one always has to be aware that there is no such thing as a perfect result.

    Now to the example of the libraries: it is true that many libraries in Switzerland mostly use their own subject indexing systems. Unfortunately, this means one cannot search all catalogues with the same metadata, but has to run several searches if interested. From Basel's point of view, however, the keyword Basel is more useful than the more general term Switzerland when it comes to the history of the Basel chemical industry. For such cases a hierarchical search can be very useful: one in which searching for a broader term also searches its narrower terms.
    Certainly people do not always make the same decisions when indexing by subject. But I wonder what a machine would do with a title like “Die Schöpfung im Reagenzglas” (“Creation in the Test Tube”)…
    It seems to me that for libraries and many other databases there are (as yet) no alternatives to intellectual subject indexing. The text in book titles is far too ambiguous and short to be processed well by machine.

    • 2 Andrea April 17, 2009 at 14:18

      In his comment, Diego argues in favor of metadata because they add structure to information. However, not all information is structured in the same way, or in Diego’s words: «[Metadata] are not universal (even if this were desirable).»
      While I have lamented that metadata work well only in well-defined areas, Diego on the contrary emphasizes their power to delve deeper into a subject. «The more often one uses a collection of data and the more important the results are, the more intensely one should occupy oneself with their structure. The compensation is a yield with higher precision, even if one always has to be aware that there is no such thing as a perfect result.»
      I totally agree. Metadata are great for information seeking. But their context-sensitivity limits their usefulness for aggregation. I am not judging this – aggregating heterogeneous or ambiguous information does not make any sense. However, as I believe my last few posts have demonstrated, I am not sure what to think of recent technological developments in the field of the Semantic Web. It is quite possible that they will change the way we edit data, e.g. by focussing less on creating structures with several levels of hierarchy and instead using our energy to explain more carefully the context they are situated in.

