Accentuating the Positive in Metadata and Folksonomies
I just wrote a very short, introductory piece on folksonomies for Capulet’s newsletter (which, if you’re so inclined, you can sign up for here). You can read the full spiel in the newsletter or refer to something more authoritative in Wikipedia. Here’s an excerpt of what I wrote:
Coined by Thomas Vander Wal, ‘folksonomy’ is a blend of the terms ‘folk’ and ‘taxonomy’. Where a taxonomy is a rigid, top-down organizational structure, a folksonomy is an improvised, bottom-up approach to classification.
I was recently thinking about Flickr tags, a very common example of a folksonomy. I had just read an interview with Flickr’s Stewart Butterfield, in which he describes the attraction of tags for the average user:
The complaint that [tagging is] uncontrolled and it’s not going to be captured in a consistent way to me is really irrelevant. Because tags are first and foremost for people to organize their own photos–and if they weren’t, it wouldn’t work. It’s a happy accident that the whole global collection emerges. And let’s say it’s only 50 percent accurate and complete and let’s say right now we have 10,000 photos tagged “Italy;” it might actually be 20,000 photos that should have been tagged “Italy,” but who cares? No one is going to look at all 10,000 photos, let alone 20,000 photos. And in six months, it will be 50,000 photos instead of 100,000 photos.
Then I thought about the five-star rating system in iTunes, which is a similar kind of metadata. I don’t share it (but I no doubt could), and others can’t modify it, but it’s useful to me in segregating great songs from good ones, and good ones from lousy ones. In my largish iTunes song library, I sorted my songs by rating and paged through them.
That’s when it occurred to me: my metadata skews toward the positive. Check out this chart, which shows the ratings for songs in my library (I’ve only actually assigned a rating to a tenth of my collection):

As you can see, I’ve rated nearly twice as many songs above average as below average. Under normal conditions, shouldn’t those numbers be roughly even?
The same is true of my Flickr photos. When I upload an unusual or interesting image, I tag the hell out of it. If it’s something ordinary, my number of tags (and effort spent thinking about them) decreases.
To return to Stewart’s example, say 2000 people each upload 10 photos of Italy. They each put more effort into tagging their best one. We’re therefore likelier to see the best 2000 photos and to disregard the rest.
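To put rough numbers on that thought experiment, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the `Photo` tuple, the 1-to-10 `quality` score, the specific tags, and the idea that each person gives only their best shot the detailed tags while the other nine get a bare “italy”. Real tagging behaviour is much messier, so treat this as a toy model rather than a description of how Flickr works.

```python
from collections import namedtuple

# Hypothetical record: who uploaded it, how good it is (1-10), and its tags.
Photo = namedtuple("Photo", ["owner", "quality", "tags"])

def build_collection(people=2000, photos_each=10):
    """Assumed behaviour: each person tags only their best photo in detail."""
    photos = []
    for owner in range(people):
        for quality in range(1, photos_each + 1):
            if quality == photos_each:
                # The best shot gets the extra effort.
                tags = {"italy", "venice", "canal", "sunset"}
            else:
                # Ordinary shots get the bare minimum.
                tags = {"italy"}
            photos.append(Photo(owner, quality, frozenset(tags)))
    return photos

def search(photos, *wanted):
    """Return the photos that carry every requested tag."""
    wanted = set(wanted)
    return [p for p in photos if wanted <= p.tags]

photos = build_collection()
broad = search(photos, "italy")
narrow = search(photos, "italy", "sunset")

print(len(photos), len(broad), len(narrow))
print(all(p.quality == 10 for p in narrow))
```

Under these assumptions, the broad “italy” search returns the whole pile, while any more specific search returns only the one-in-ten photos that their owners cared enough to describe.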
What’s the conclusion? Maybe folksonomies and metadata are self-filtering. If everyone spends more time and effort describing good things than bad ones, will we end up consuming fewer bad things? Is there anything wrong with that?
