Auto folksonomy for the blogosphere
Data Mining points to the latest papers from the 3rd International Workshop on the Weblogging Ecosystem; the one that has caught my interest so far is Browsing System for Weblog Articles based on Automated Folksonomy.
The premise is a comparison between a folksonomy and tagging by a single user, and how the multiple points of view in a folksonomy can lead to a more precise tag (one that better describes the aboutness) for an item…see my post What qualifies a folksonomy?
NOTE: a folksonomy allows many people to tag the same item, which lets a shared vocabulary emerge.
So, Technorati Tags is an author-based tagging system, only the authors don’t log in and don’t even submit the items…Technorati crawls the blogosphere and adds the items itself.
Then we can browse blog posts by author tags…so the issue is that these blog posts have been tagged/described by one person (the author), whereas in a folksonomy lots of people can describe (tag) the same item, which in turn produces a more precise tag for the item (kind of by consensus).
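To make the "kind of by consensus" idea concrete, here is a minimal Python sketch. It assumes a made-up data shape (a dict of user name to the tags that user applied to one item) and an arbitrary rule that a tag counts as consensus when at least half of the taggers used it; the function name and the threshold are my own assumptions, not anything Technorati or del.icio.us actually does.

```python
from collections import Counter


def consensus_tags(taggings, min_share=0.5):
    """Return the tags used by at least `min_share` of the people who tagged an item.

    `taggings` is a hypothetical structure: {user_name: set_of_tags}.
    """
    if not taggings:
        return []
    # Count each tag once per user, however many times that user repeated it.
    counts = Counter(tag for tags in taggings.values() for tag in set(tags))
    threshold = min_share * len(taggings)
    # Most-used tags first; alphabetical order breaks ties so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, n in ranked if n >= threshold]


# Invented example: four readers tagging the same blog post.
readers = {
    "ann": {"folksonomy", "tagging"},
    "bob": {"folksonomy", "blogs"},
    "cat": {"tagging", "folksonomy"},
    "dan": {"folksonomy", "misc"},
}
print(consensus_tags(readers))
```

With a single tagger (the author-only case) every tag passes the threshold, so nothing gets filtered out; the one-off tags only drop away once several people have described the same item.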
Instead of collecting author tags, imagine a system where the latest blog posts from a feed set streamed in (e.g. FeedButler), and users could save/vote on and tag these blog posts…this would be like Technorati Tags defined by users, not authors…the folksonomy argument is that tags applied to an item by a collective mind will be more accurate…perhaps.
I suppose del.icio.us works this way, except the stream of items is not generated by a feed set; instead people submit the items, which may make del.icio.us more quality controlled (but not as exhaustive)…also, del.icio.us content is not limited to just blog posts.
Actually TailRank has a stream of items; these items are based on a feed set which is a massive collection of people’s OPML files, or just people submitting feeds to the feed set.
Then as these items stream, users can tag them…hmmm, sounds close to what I’m talking about.
Anyway, the paper in question goes a step further and generates an automated folksonomy…this is different from AutoTag, Tagyu, TagSuggest, tagthe.net, etc., as these scan the contents and decide on a machine-generated tag (ZoomClouds, TagCloud, and Personal Bee also do this in a different context).
What the paper suggests is that instead of a machine deciding the tag names, it collects all the author tags, then scans the contents of a blog post and decides which author tag(s) suit each item best…so the machine does not decide the tag names, it just applies them to items.
So humans decide the vocabulary, but the machine does the indexing for each item…imagine a machine scanning a book and then applying a LOC or DDC term.
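A rough Python sketch of that split, where humans supply the vocabulary and the machine only does the assigning. To be clear, this is not the paper's algorithm: I am assuming a simple stand-in where each author tag gets a bag-of-words profile built from the posts it was applied to, and a new post receives the tags whose profiles it most resembles by cosine similarity. All function names, parameters, and the sample posts are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def words(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_tag_profiles(tagged_posts):
    """Build one word-count profile per author tag from (text, tags) pairs."""
    profiles = defaultdict(Counter)
    for text, tags in tagged_posts:
        counts = Counter(words(text))
        for tag in tags:
            profiles[tag].update(counts)
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def assign_tags(text, profiles, top_n=2, min_score=0.1):
    """Pick up to `top_n` existing tags for a post; never invents a new tag name."""
    post = Counter(words(text))
    scored = [(cosine(post, profile), tag) for tag, profile in profiles.items()]
    scored = [(score, tag) for score, tag in scored if score >= min_score]
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [tag for _, tag in scored[:top_n]]


# Invented author-tagged posts that define the vocabulary.
tagged_posts = [
    ("python code script programming", {"programming"}),
    ("recipe oven bake flour", {"cooking"}),
    ("bake bread flour yeast", {"cooking", "baking"}),
]
profiles = build_tag_profiles(tagged_posts)
print(assign_tags("flour and yeast for bread to bake", profiles))
```

The point of the design is in `assign_tags`: whatever it returns is always drawn from the tags authors already used, so the vocabulary stays human-made even though the indexing is automatic.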
[ADDED 02/06/06: Tagyu does not do content analysis; it searches for chunks of text similar to the bookmark in question, and then offers the tags used on those similar chunks as tags for your bookmark]
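The "find similar text, borrow its tags" approach can be sketched in Python like this. I have no knowledge of how Tagyu is actually implemented, so treat everything here as an assumption: similarity is plain Jaccard overlap of word sets, the nearest already-tagged texts vote for their tags, and the names `suggest_tags`, `neighbours`, and `top_n` are made up.

```python
import re
from collections import Counter


def word_set(text):
    """Lowercase a text and return its set of simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(text, tagged_corpus, neighbours=2, top_n=3):
    """Suggest tags by reusing the human tags of the most similar tagged texts."""
    target = word_set(text)
    scored = [(jaccard(target, word_set(doc)), tags) for doc, tags in tagged_corpus]
    # Keep only the closest few texts that share at least one word.
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])[:neighbours]
    votes = Counter()
    for score, tags in scored:
        for tag in tags:
            votes[tag] += score  # closer texts carry more weight
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Invented corpus of texts that people have already tagged.
corpus = [
    ("tagging blog posts with shared vocabulary", ["folksonomy", "tagging"]),
    ("shared tagging vocabulary for bookmarks", ["tagging", "bookmarks"]),
    ("slow cooked lamb stew", ["cooking"]),
]
print(suggest_tags("a shared vocabulary for tagging blog posts", corpus))
```

Note that the text being tagged is never mined for tag names here; its words are only used to find neighbours, and every suggestion comes from tags people already applied elsewhere.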
[ADDED 22/12/06: Turbo Tagger - tag generator for blog posts]

Tagyu doesn’t pull tags from the contents of a text. Instead, Tagyu uses the collective intelligence of the tagging public to determine tags for a document. Tagyu indexes the web, finding tagged documents. When you ask Tagyu for tags for your text, Tagyu finds similar text that’s already been tagged by humans and uses those tags as the basis for its suggestions to you.
The concept presented in the paper you link to is similar, except that they’re using automatically generated tags for navigation and indexing of the web. They also remove the user-supplied tags from any documents they encounter, relying exclusively on the auto tagger.
Comment by Adam Kalsey — June 1, 2006 @ 5:35 pm