Search Engines: what is a date?
I was scouting around E-LIS and found a great article: Lewandowski, Dirk (2004), Date restricted queries in web search engines.
It seems search engines have trouble determining both the date of a page and when its content was last updated, which makes date-restricted searches not very effective. Sometimes the date attached to a page reflects a change to something other than the content, e.g. the layout.
There are 2 notions here:
- defining the date of a document
- defining an update of contents in a document
There are 4 methods a search engine can use to date a document:
- server date (updates file, not specifically contents)
- date first indexed/crawled (documents may be older than the search engine itself, and this ignores any later update)
- date metadata (ignored by most search engines)
- explicitly labelled in the content (not specific to content)
It turns out most search engines use the server date or date the document was first crawled.
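To make the four methods above a bit more concrete, here is a minimal sketch of how the different date signals for one page could be collected and how an engine that only trusts the server date or first crawl date would choose between them. The field names (lastModifiedHeader, firstCrawled, metaDate, labelledDate) and both function names are my own hypothetical choices for illustration; they are not taken from the paper or from any real search engine.

```javascript
// Hypothetical sketch: field and function names are illustrative assumptions,
// not part of Lewandowski's paper or of any search engine's API.

// Collect every date signal available for a document, labelled by its source.
function candidateDates(doc) {
  const candidates = [];
  if (doc.lastModifiedHeader) {
    candidates.push({ source: "server date", date: new Date(doc.lastModifiedHeader) });
  }
  if (doc.firstCrawled) {
    candidates.push({ source: "first crawl", date: new Date(doc.firstCrawled) });
  }
  if (doc.metaDate) {
    candidates.push({ source: "date metadata", date: new Date(doc.metaDate) });
  }
  if (doc.labelledDate) {
    candidates.push({ source: "labelled in content", date: new Date(doc.labelledDate) });
  }
  // Drop anything that did not parse into a valid date.
  return candidates.filter((c) => !Number.isNaN(c.date.getTime()));
}

// Mimic the reported behaviour: use the server date if there is one,
// otherwise fall back to the first crawl date, and ignore everything else.
function engineStyleDate(doc) {
  const found = candidateDates(doc).find(
    (c) => c.source === "server date" || c.source === "first crawl"
  );
  return found || null;
}
```

Note that neither of the two signals the sketch prefers says anything about whether the content itself changed, which is exactly the problem described above.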
Read the report for the analysis and results.
Also see It’s Tough to Get a Good Date with a Search Engine
Another way of checking the date of the page you are on comes from Phil Bradley’s Utilities to help search the Internet the easy way.
From the website:
“Here’s a neat little trick to use in order to find out when a page was last updated. Go to the page that you’re interested in and then, in the Address bar, type the following: javascript:alert(document.lastModified) and that’ll pop up a little window which tells you.”
But as it says, this is “last modified”; it does not refer specifically to the content.
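One more caveat with this trick: as far as I know, when the server sends no Last-Modified header (common for dynamically generated pages), browsers simply report the current time, so the page looks as if it was updated this very second. Below is a rough sketch that parses the "MM/DD/YYYY hh:mm:ss" string returned by document.lastModified and flags values suspiciously close to now. The function name inspectLastModified and the 5 second threshold are my own assumptions, chosen only for illustration.

```javascript
// Sketch only: inspectLastModified and the 5-second threshold are assumptions.
// Parses the "MM/DD/YYYY hh:mm:ss" local-time string from document.lastModified.
function inspectLastModified(lastModified, now = new Date()) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4}) (\d{2}):(\d{2}):(\d{2})$/.exec(lastModified);
  if (!m) {
    return null; // not in the expected format
  }
  // Skip the full match at index 0 and convert the captured groups to numbers.
  const [, month, day, year, hh, mm, ss] = m.map(Number);
  const date = new Date(year, month - 1, day, hh, mm, ss); // months are 0-based
  const ageSeconds = Math.round((now.getTime() - date.getTime()) / 1000);
  return {
    date,
    ageSeconds,
    // A value within a few seconds of "now" probably means the browser had
    // no real modification date and fell back to the current time.
    probablyFallback: Math.abs(ageSeconds) <= 5,
  };
}
```

In a browser console this could be called as inspectLastModified(document.lastModified). Even when the result is not flagged as a fallback, it still only tells you when the file changed, not what changed in it.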
I wonder if RSS can help at all!
(I suppose if every document on the web were in XML, we could tell whether a modification/update was to the content, the presentation or the structure… I don’t know if this is correct; it is only my assumption.)
Also check out the references on the E-LIS page: each citation has a “SEEK” button, which launches Paracite (a citation finder).
See the about page for its appropriate copy method.













