Archive for July, 2008

Visualization of text analysis

The Adaptive Path blog recently featured a tool for visualizing text (i.e. word counts). Wordle displays text as word clouds, e.g. for this blog:

Information Access blog in Wordle word cloud


It’s not great text analysis (e.g. it does not understand different word forms as variations of the same root), but it’s good fun.
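To see what this limitation means in practice, here is a minimal sketch of the kind of plain frequency counting behind such word clouds (my own illustration, not Wordle’s actual code, and with no stemming), which shows why inflected forms of the same root end up as separate words:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count word occurrences, lowercased. No stemming is applied,
    so 'search', 'searching' and 'searches' count as three words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

freq = word_frequencies("Search engines search; searching searches the web.")
# 'search', 'searching' and 'searches' appear as separate entries
print(freq.most_common(3))
```

A stemming step (e.g. a Porter stemmer) before counting would merge those variants into one cloud entry.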


Buildings and tools

Seth Godin did a nice gloss on the secondary importance of tools for (information) architecture: I need to build a house, what kind of hammer should I buy?

Search Engine Tips

Hakia, a semantic search engine, uses Yahoo’s BOSS and does a good job at neatly clustering results (e.g. try a search for a country) and also has a reasonable ranking – something many new search engines don’t manage well. By contrast, the much-hyped Powerset, another recent semantic search engine, is simply disappointing.

Cluuz aims at graphically clustering search results. So far, I have been sceptical of visualizations of search results, because the mind-map kind of graphs (e.g. in Thinkmap or Quintura) are either rather trivial or useless. But Cluuz (or similarly, KartOO) clusters sources rather than semantic characteristics. Not that this is entirely new – Clusty (previously Vivisimo; I’ve used it as the example in the post on information literacy vs usability) does it in the form of a regular list – but the focus on sources is one I much appreciate, and showing relationships between sources might prove promising.

A last tip for today: searchme’s sequential presentation of search results is too playful for my liking, because I need a quick overview to decide what I want to take a closer look at. However, a really nice feature is the integration of stacks, which allows you to create visual bookmarks – something completely different, because it is not about seeking the new but rather showing known objects (cf. Theresa Neil’s extensive post on seek or show).

Introductions to the Semantic Web

The idea of the Semantic Web has been around for a long time. Tim Berners-Lee articulated it in his plenary talk at the first WWW conference in 1994 and published a famous article in Scientific American in 2001. Basically, the idea is to give data a format in which computers can process its meaning. However, the idea hadn’t really picked up speed in all those years. This has changed recently, and a series of articles (incidentally published on netzwertig, a blog produced by Zeix’ sister company Blogwerk) explains the nuts and bolts of the Semantic Web in plain German. Part 1 gives a general introduction (Semantisches Web Teil 1: Was steckt hinter dem Begriff?), part 2 explains the technical background (Die technische Umsetzung), and part 3 gives practical examples (konkrete Anwendungsbeispiele).
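At its core, the Semantic Web expresses statements as subject–predicate–object triples (the data model behind RDF). A toy sketch of that model – all names here are made up for illustration, and this uses no real RDF tooling – shows how such triples let a computer answer questions about meaning rather than just match words:

```python
# Toy triple store: each statement is a (subject, predicate, object)
# tuple, mirroring RDF's data model. The data below is invented.
triples = [
    ("TimBernersLee", "invented", "WorldWideWeb"),
    ("WorldWideWeb", "presentedAt", "WWWConference1994"),
    ("SemanticWeb", "describedIn", "SciAmArticle2001"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# A machine can now answer "what did Tim Berners-Lee invent?"
print(query(subject="TimBernersLee", predicate="invented"))
```

Real RDF adds globally unique URIs for each name and shared ontologies so that independent data sets can be combined – which is exactly where, as the articles below discuss, the practical difficulties begin.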

I’ve been looking for English equivalents to these articles. The closest I’ve come across so far is Read Write Web’s The Road to the Semantic Web, which explains why the Semantic Web could be important to us («The promise is that we will be doing less of what we are doing now – namely sifting through piles of irrelevant information») and gives a short introduction to the data formats used for computer processing. Two later articles on the same blog go into more detail about why the implementation of the Semantic Web is proving so difficult. Not only is the technical background hard to understand (Difficulties with the Classic Approach), but transforming the data into a computer-readable format is a lot of work, and so far the market has not rewarded those who take the pains to do it (Top-Down: A New Approach to the Semantic Web).

The many comments on «Difficulties with the Classic Approach» show that most people attribute the lack of success of the Semantic Web to the difficulties of its practical implementation: «It’s way too technical and scientific and not really practical for the mere mortal». Despite its capital S, the subject of semantics plays a minor role in the discussion. And here, in my opinion, lies the crux of the matter: for highly structured and standardized data such as addresses, people, books etc., the corresponding metadata are simple to generate from a semantic point of view, and accordingly these are the areas in which commercial tools are evolving. For the rest of the content of the web, the idea of mapping ontologies against each other is daunting at best. Comment no. 17 to the post mentioned above gives some practical examples of the difficulties encountered even with structured data. Another trend on the web, tagging, takes exactly this fuzziness of meaning into account, particularly stressing the importance of connotations, i.e. emotional associations with words, for human beings. I hope to deal with that subject on this blog soon.