Library clips

sharing ideas thoughts and feedback

August 9, 2005

Blogs/RSS Engines: keyword search comparison

Napsterization has a follow-up post comparing keyword searching on RSS/blog engines.

Here is the comparison chart.

The main difference from the previous analysis of URL lookups is that people have different expectations of keyword searching, and because this is the live web, results are shown by date rather than by relevance (as in, e.g., Google).

But with so many results, so much duplication, and so much non-blog content, how can we create some sort of personal relevance? We need date limits and limits to the top % of blogs. (A top blog is ranked by its number of incoming links, although this doesn’t mean it is more relevant than an unknown blog. PubSub segments the blogosphere by popular blogs, enabling you to filter or personalise your result set, and Feedster can limit results to blog hosts, etc.)
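As a rough illustration of the kind of filtering described above, here is a minimal sketch that keeps only recent posts from the most-linked blogs. Everything in it is an assumption made for the example: the `Post` record, the `inbound_links` field, and the `filter_posts` function are hypothetical and are not taken from any of the engines mentioned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    blog: str
    published: date
    inbound_links: int  # hypothetical: incoming links to the post's blog


def filter_posts(posts, since, top_fraction):
    """Keep posts published on or after `since` whose blog falls in the
    top `top_fraction` of blogs ranked by incoming links."""
    # One link count per blog (take the largest value seen for that blog).
    links_by_blog = {}
    for p in posts:
        links_by_blog[p.blog] = max(links_by_blog.get(p.blog, 0), p.inbound_links)

    # Rank blogs by link count and keep the top slice (at least one blog).
    ranked = sorted(links_by_blog, key=links_by_blog.get, reverse=True)
    keep_count = max(1, int(len(ranked) * top_fraction))
    top_blogs = set(ranked[:keep_count])

    recent = [p for p in posts if p.published >= since and p.blog in top_blogs]
    # Live-web convention: newest first, not most relevant first.
    return sorted(recent, key=lambda p: p.published, reverse=True)
```

As noted above, a high link count is only a popularity proxy, so a filter like this can hide a relevant post from an unknown blog.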

Duplication of URLs and titles seems an easy enough problem to clean up (I think!), but what about stories that are re-syndicated? Many people are publishing blog aggregators, like a public version of an RSS reader, where they take several RSS feeds and re-publish the content like a mega-blog.
This means the exact same content is being indexed by the Blog/RSS engines. The permalink should be the same (otherwise there would be a copyright issue), but the same permalink will be indexed multiple times because it is re-published in multiple places.
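If re-syndicated copies really do share a permalink, collapsing them could look something like the sketch below. It assumes each result is a plain dict with hypothetical `permalink` and `source` keys, and the normalisation (trimming whitespace and a trailing slash) is deliberately crude; none of this reflects how any of the engines actually work.

```python
def dedupe_by_permalink(results):
    """Collapse results that share a permalink, keeping the first one seen
    and recording every place the post was found."""
    seen = {}
    for r in results:
        # Crude normalisation (an assumption): ignore surrounding
        # whitespace and a trailing slash when comparing permalinks.
        key = r["permalink"].strip().rstrip("/")
        if key not in seen:
            seen[key] = {**r, "found_at": [r["source"]]}
        else:
            seen[key]["found_at"].append(r["source"])
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(seen.values())
```

The `found_at` list keeps a trail of where each post was re-published, which could be handy when following the spread of a story.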

Google News has a good way to alleviate duplicates, and older content related to the current content, by collapsing the stories under the current story.

Blogpulse does something similar with its conversation seed feature, which can be based on a URL or on a keyword.

Also worth noting when comparing these engines: some are exclusive to RSS blog content, while others incorporate RSS from any type of source.

Here are some excerpts from the blog post:

“…Does the user want a quick taste of what is out there around a particular topic? Or do they want every instance of a keyword match, with an accurate count of those instances? Do they want to see only the most relevant posts that use the keywords, or the most recent?….

…Technorati has redone keyword search, removed the 7 day result limit…results only go back to last October….

…Technorati has reduced duplication over the past 10 months, bringing it more in line with Blogpulse’s cleaner result set…

…Technorati is now faster than Blogpulse…”

SAMPLE SEARCH

Blogpulse

Phenomenally, it only returned 3 duplicates

Only blog posts (not mixed with RSS from traditional news), so easier to analyse just the blogosphere from the result set (well there is no distinguishing to be made, as they are all blog posts)

All blog titles in the result set correctly linked to the permalinks

Missing some posts (but aren’t they all?)

Cleaner and easier to find an initial meme (the conversation seed)

Technorati

Hard to compare as the search is forced to create a phrase search, therefore result set will be smaller and more succinct in comparison

Lots of duplication

Not all results link to the permalink, some just link to the front page

Harder to find an initial meme (the conversation seed)

Feedster

On this sample search almost ¾ of the results were non-blog content (traditional news, del.icio.us, etc.), making it hard to distinguish the view of just the blogosphere

Large result set as Feedster allows any instances of the search term, in any order

Not all results link to the permalink

Bloglines

Numerous duplicates

¼ of the results were non-blog content, making it hard to distinguish the view of just the blogosphere

Large result set as Bloglines allows any instances of the search term, in any order

Not all results link to the permalink
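The result-set sizes above are hard to compare because the engines match queries differently. The sketch below contrasts a forced phrase match (as described for Technorati) with an any-order match (as described for Feedster and Bloglines). It assumes "any order" means every term must appear somewhere in the text, which is my reading and not something the post confirms; the function names are made up for the example.

```python
import re


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_phrase(query, text):
    """True only if the query words appear together, in order."""
    q, t = tokens(query), tokens(text)
    if not q:
        return False
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def matches_all_terms(query, text):
    """True if every query word appears somewhere, in any order."""
    q = tokens(query)
    return bool(q) and set(q) <= set(tokens(text))
```

Any text that passes the phrase match also passes the any-order match, but not the other way round, which is why a phrase search returns a smaller, tighter result set.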

That’s all

I wonder how IceRocket would perform.

It would be good for the post to sum up when they’d use a particular engine over another one…I guess I’m being lazy.

Even though Blogpulse didn’t cover every post in the blogosphere, it seems the easiest and cleanest to use in analysing the blogosphere.

Although for exhaustivity, it seems that Technorati (going back to October 2004) and Bloglines (for a blog to be included, it has to be registered in a Bloglines account’s RSS reader) index posts over a longer period of time than Feedster and Blogpulse (only the latest 6 months).

Since Blogpulse seemed the best overall, it seems a great tool for current stuff. To get a more historic picture, both Technorati and Bloglines are the way to go (you’ll just have a harder time going through the results, connecting to the actual posts, and summing up the trail and spread of a meme).

1 Comment »

The URI to TrackBack this entry is: http://libraryclips.blogsome.com/2005/08/09/blogsrss-engines-keyword-search-comparison/trackback/

  1. for feedster, try: http://blogs.feedster.com/ to limit to just blog content. maybe you did this already?

    Comment by Christina Pikas — August 28, 2005 @ 3:41 am
