Jon Udell’s latest post on tagging as declarative programming reminded me of some notes I made last year for a (never written) paper on tagging:
I’m interested in looking at tagging practices outside of the information retrieval framework in which they’ve mostly been discussed. There are at least two other possible frameworks I can think of: service integration and content authoring.
Flexible service integration
Looked at from a service integration point of view, tags provide a flexible way to route information between different applications, whether these are two different kinds of applications used by the same person or two similar applications used by different people. So I may tag my blog entries a certain way so that they will show up in a certain place in someone else’s newsreader, and I can do this despite a lack of standards (beyond RSS and grokking keyword tags) among the toolmakers. Most of the tools built on the Flickr API, as well as many “mash-up” apps in general, rely on the lightweight semantic interoperability provided by tags.
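To make the routing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Item` and `TagRouter` names, the inbox lists, and the sample tags are all hypothetical and do not correspond to any real newsreader or to the Flickr API. It only shows that a shared keyword is enough for a producer and a consumer to find each other without agreeing on a schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """A published item (blog entry, photo, ...) carrying free-form tags."""
    title: str
    tags: tuple = ()


class TagRouter:
    """Hypothetical hub that forwards items to consumers subscribed to a tag."""

    def __init__(self):
        # Maps a lowercased tag to the list of inboxes subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, tag, inbox):
        """Register a consumer's inbox (a plain list) for one tag."""
        self._subscribers[tag.lower()].append(inbox)

    def publish(self, item):
        """Deliver the item once to every inbox subscribed to any of its tags."""
        delivered = []  # inboxes already served, to avoid duplicate delivery
        for tag in item.tags:
            for inbox in self._subscribers.get(tag.lower(), []):
                if not any(inbox is seen for seen in delivered):
                    inbox.append(item)
                    delivered.append(inbox)


# Two unrelated consumers, each watching a tag of its own choosing.
newsreader_inbox = []
photo_wall_inbox = []

router = TagRouter()
router.subscribe("tagging", newsreader_inbox)
router.subscribe("sunset", photo_wall_inbox)

# The author decides where the entry shows up purely by how it is tagged.
router.publish(Item("Notes on tagging", ("Tagging", "folksonomy")))
```

The only contract between the publisher and the two consumers is the tag string itself, which is the "lightweight" part of the interoperability described above.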
In the past, semantic interoperability has been hard to achieve unless one company controlled all the different applications, something that Microsoft came close to achieving but lost control of with the rise of the WWW. The WWW made great strides in giving applications a common communication protocol (HTTP) and interface description language (HTML), but it has struggled to find common content description languages. First XML, then RDF, were the intended answers to this, but they may be too rigid for their intended purpose. Tags may provide enough of the value of more structured schemes to be useful, without the overhead of a complex standardization process.
From the service integration perspective, sharing, notification, and system improvement are the primary goals (and possibly play/competition as well if we consider games as a specific kind of application or mode in which applications can be used). Action, insider reference, and possibly spam are the relevant tag types.
Lightweight content authoring
A more interesting (to me at least) alternative framework is tagging as a form of lightweight authoring. I’ve been struck by how taxonomies of tag types resemble taxonomies of link types from the early hypertext literature. Tags for URL-specified resources in systems such as del.icio.us are basically aggregate hyperlinks (links that associate a set of like documents) as described in that literature. If creating hypertext or hypermedia links is a form of authorship, it seems to me that tagging (or certain kinds of tagging) should be as well.
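One way to picture a tag as an aggregate hyperlink is as an inverted index from each tag to the set of documents that carry it. The sketch below is only an illustration under that assumption: the function names and the `example.org` URLs are made up, and it does not reflect how del.icio.us actually stores bookmarks.

```python
from collections import defaultdict


def build_tag_index(bookmarks):
    """Turn (url, tags) pairs into a mapping of tag -> set of URLs.

    Each entry in the result behaves like one aggregate link: a single
    named association over a set of like documents.
    """
    index = defaultdict(set)
    for url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return dict(index)


def related(index, url):
    """URLs reachable from `url` by following any tag it carries."""
    reachable = set()
    for urls in index.values():
        if url in urls:
            reachable |= urls
    return reachable - {url}


# Hypothetical bookmarks; the URLs are placeholders.
bookmarks = [
    ("http://example.org/memex", ["hypertext", "history"]),
    ("http://example.org/xanadu", ["hypertext"]),
    ("http://example.org/flickr-api", ["api"]),
]

index = build_tag_index(bookmarks)
neighbours = related(index, "http://example.org/memex")
```

Here applying the `hypertext` tag to two pages is what links them: nobody wrote an explicit link between the two documents, yet each is now reachable from the other.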
Certainly the first steps in many kinds of authoring, from the creation of an annotated bibliography in preparation for a scholarly article to the separation of clips into bins for a video production, resemble the tagging and sorting enabled by tagging systems. The first blogs were linklogs, and now many people, myself included, use linklogs as a form of lazy blogging. The unmediated blog uses del.icio.us to enable simple collaborative authoring, tagging can be used on Flickr to create slideshows, and del.icio.us via its media support turns tagging into a way to sequence audio and video.
From the content authoring perspective, attracting attention, reputation, identity performance, and opinion influencing are the primary goals, while opinion, relation, and possibly spam are the relevant tag types.
Back to information retrieval
So I would argue that the focus on information retrieval has pushed people to think of tags as primarily descriptive or categorical and tagging practice as primarily retrieval-oriented. This is certainly an important area, and you cover the issues quite well in your paper, I think. But even if tags turn out to be useless for search, they still could be of great value from other perspectives.
[Tangent: I really like Patrick Wilson’s notion of two kinds of power information retrieval systems can offer. The first is descriptive power: the power to obtain all texts which meet some description. The second is exploitative power: the power to obtain the best texts for achieving some end. Google gives us a lot of the former and very little of the latter. Could the analysis of tags used for service integration or authoring give systems a sense of how specific texts are being exploited by different individuals or communities, and thus offer users greater exploitative power?]
The problem with these ideas, I think, is that studies of why people tag show that this isn’t at all how people typically think of tagging (that is, the small minority who even think of tagging at all). Even in cases where people use tags for social communication, I don’t believe they see what they’re doing in terms of integration or authoring. That might change, of course, but I’m not willing to bet on it just yet.