For the curious, a brief explanation of how this process works.
The HTTP standard specifies that when going from one page to another, a browser may transmit to the second server the URL of the first page, as described in this extract from RFC2068:
14.37 Referer

The Referer[sic] request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained (the "referrer", although the header field is misspelled.) The Referer request-header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance. The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.

    Referer = "Referer" ":" ( absoluteURI | relativeURI )

Example:

    Referer: http://www.w3.org/hypertext/DataSources/Overview.html

If the field value is a partial URI, it SHOULD be interpreted relative to the Request-URI. The URI MUST NOT include a fragment.

Note: Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.
(Note that the mis-spelling `Referer' is now enshrined for ever more in the Standard, which is the sort of risk you run if you let physicists design network protocols.)
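The two mechanical rules in that extract (resolve a partial URI against the Request-URI, and never carry a fragment) are easy to apply on the receiving end as well. Here is a minimal Python sketch using the standard library's urllib.parse; the function name normalize_referer is hypothetical, and it assumes the server already knows the full absolute Request-URI, which in practice means combining the request path with the Host: header.

```python
from urllib.parse import urldefrag, urljoin


def normalize_referer(request_uri: str, referer: str) -> str:
    """Interpret a Referer value as the RFC 2068 extract describes.

    A partial (relative) URI is resolved against the Request-URI, and any
    fragment is dropped, since the header is not supposed to carry one.
    Assumes request_uri is absolute (scheme and host included).
    """
    absolute = urljoin(request_uri, referer)
    url, _fragment = urldefrag(absolute)
    return url


if __name__ == "__main__":
    # A relative Referer with a stray fragment, resolved against the page requested.
    print(normalize_referer("http://www.ex-parrot.com/~chris/vmail-sql/",
                            "../index.html#top"))
```

Run as a script, this prints http://www.ex-parrot.com/~chris/index.html. Dropping the fragment is purely defensive, since a conforming client should never send one.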
Now, this means that if you click through from a search engine to one of my pages, the Referer: header will give the address of the search-engine-generated page that got you there. For instance, if I search for `vmail-sql' with Google, my browser will end up at the page with URL http://www.google.com/search?q=vmail-sql; the part after `search?q=' is the argument to the search engine. If I then click on the link to the vmail-sql home page, my browser will send a request something like:
GET /~chris/vmail-sql/ HTTP/1.1
Host: www.ex-parrot.com
User-Agent: Nutscrape 3.141 (CPM; 8-bit)
Referer: http://www.google.com/search?q=vmail-sql
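On the server side, recovering the search phrase from such a request takes two steps: find the Referer: header, then pull the search engine's argument out of its query string. A minimal Python sketch follows; the function names are hypothetical, and it assumes, as in the Google example, that the phrase is carried in a query parameter called q.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# The example request from the text, with the CRLF line endings HTTP uses.
RAW_REQUEST = (
    "GET /~chris/vmail-sql/ HTTP/1.1\r\n"
    "Host: www.ex-parrot.com\r\n"
    "User-Agent: Nutscrape 3.141 (CPM; 8-bit)\r\n"
    "Referer: http://www.google.com/search?q=vmail-sql\r\n"
    "\r\n"
)


def referer_from_request(raw: str) -> Optional[str]:
    """Return the Referer header value, or None if the client sent none."""
    head = raw.split("\r\n\r\n", 1)[0]
    # Skip the request line; header names are case-insensitive.
    for line in head.split("\r\n")[1:]:
        # Split on the first colon only, so the colon in "http:" survives.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "referer":
            return value.strip()
    return None


def search_terms(referer: str, param: str = "q") -> Optional[str]:
    """Extract the search phrase, assuming the engine puts it in `param`."""
    values = parse_qs(urlsplit(referer).query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    referer = referer_from_request(RAW_REQUEST)
    if referer is not None:
        print(search_terms(referer))
```

parse_qs also undoes the URL encoding, so a query string of q=yahoo+chat+boot+code comes back as the plain phrase "yahoo chat boot code".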
This gets logged on my server, in a line (folded for readability) like:
12.34.56.78 - - [20/Aug/2000:22:25:24 +0100] "GET /%7Echris/vmail-sql/ HTTP/1.1" 200 3595
    "http://www.google.com/search?q=vmail-sql" "Nutscrape 3.141 (CPM; 8-bit)"
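In the log file itself that entry is a single line, and a line in this format can be split into its fields with one regular expression. The Python sketch below assumes Apache's Combined layout as shown above; the names COMBINED and parse_combined are hypothetical, and the pattern does not cope with quote characters escaped inside a field, so treat it as illustrative rather than robust.

```python
import re
from typing import Dict, Optional

# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The example entry from the text, unfolded onto one line.
EXAMPLE_LINE = (
    '12.34.56.78 - - [20/Aug/2000:22:25:24 +0100] '
    '"GET /%7Echris/vmail-sql/ HTTP/1.1" 200 3595 '
    '"http://www.google.com/search?q=vmail-sql" "Nutscrape 3.141 (CPM; 8-bit)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a Combined-format line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    fields = parse_combined(EXAMPLE_LINE)
    if fields is not None:
        print(fields["referer"])
```

Apache writes a missing Referer: as "-", so a caller should expect that value in the referer field.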
By writing a simple program to read the Referer: headers, I can figure out what people are searching for to get to my pages. The program is search-report; feel free to try it on your own site's logs. (Note that it expects the logs to be in the format which Apache calls `Combined'. Your server may log the Referer: as a different field, or not at all.)
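For anyone curious about the shape of such a program, here is a minimal Python sketch of the same idea. It is not search-report itself: it takes the Referer from the second-to-last quoted field of each Combined line, looks the referring host up in a small table of search engines, and tallies the phrases. The ENGINE_PARAMS table, the function names and the script name are assumptions for illustration; search engines have used various hosts and parameter names over time.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlsplit

# Assumption: which query-string parameter carries the search phrase for each
# engine host. Illustrative only; extend it for the engines in your own logs.
ENGINE_PARAMS = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
}

# In a Combined line the Referer is the second-to-last quoted field.
REFERER_FIELD = re.compile(r'"([^"]*)" "[^"]*"\s*$')


def search_phrase(referer: str) -> Optional[str]:
    """Return the normalised search phrase if the Referer is a known engine."""
    parts = urlsplit(referer)
    param = ENGINE_PARAMS.get(parts.hostname or "")
    if param is None:
        return None  # not a search engine we know about (or "-")
    values = parse_qs(parts.query).get(param)
    return values[0].strip().lower() if values else None


def report(lines: Iterable[str]) -> Counter:
    """Count how often each search phrase led a visitor to the site."""
    counts: Counter = Counter()
    for line in lines:
        m = REFERER_FIELD.search(line)
        if m:
            phrase = search_phrase(m.group(1))
            if phrase:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python search_phrases.py access_log ...
    totals: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="latin-1") as log:
            totals.update(report(log))
    for phrase, n in totals.most_common():
        print(f"{n:6d}  {phrase}")
```

Phrases are lower-cased before counting, so `Vmail-SQL' and `vmail-sql' land in the same bucket; whether that is desirable is a matter of taste.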
No. The search engines are not that stupid (I hope).... Also, each report is compiled from only a few weeks' worth of logs.
In fact, the search engines do index the report, and it gets traffic directly from them. How depressing. I've added a robots noindex meta tag to the page to stop this; otherwise one of the search engine operators is bound to get annoyed.
After wondering what `Yahoo Chat boot code' is -- lots of people seem to search for this term, and some of them find their way to my pages -- I asked if anyone reading my page could enlighten me. Eventually, one of them did. The answer is, apparently,
If you've got Yahoo messenger (a chat program you can download at Yahoo.com), you can also go to a lot of different Yahoo Chatrooms (works more or less like mIRC and ICQ) -- in these rooms, you will also find people who's got scripts/codes that will disconnect you ( "boot" you) off the chatrooms, and sometimes also disconnect your Yahoo messenger all together. In order to "boot" someone off, you need either a script, a program or codes (I'm not sure which) -- and in order to secure yourself from BEING booted, you also need the same thing.
-- thanks to Gudveig Rian for that.
Copyright (c) 2000 Chris Lightfoot. All rights reserved.