Stopping image leechers
Sat Feb 7th 2004, 12:28pm (updated Sat Feb 7th 2004, 10:14pm)
Last night I ran my AWStats report, as I usually do every few months when I get curious about the load on my server, what people were looking at, etc. To my surprise I discovered that my web site has suddenly become very active in the last few months, at least in terms of megabytes transferred. A little more investigation revealed why: a number of people have begun directly linking to images hosted on my server in their blog entries. Image leechers.

In particular, some of my Morrowind screenshots (not the thumbnails, the full size images) were showcased in one discussion comparing environmental detail in Knights of the Old Republic versus Morrowind. Glad I could help with that one, guys. But the best leecher was one guy who was directly linking to one of my chimp images in his .sigs. My poor, exploited chimps!

Image leeching is bad for the leech-ees because it uses their bandwidth and gives no benefit to them. In the case of the Morrowind and chimp images, they're not copyrighted by me so credit is not the issue. The issue is simply about using my limited upstream bandwidth with no benefit to me whatsoever.

Fortunately, this is a fairly straightforward problem to solve. Apache has directives for exactly this, and there are good, easy-to-find articles out there on how to use them. I solved it as follows:

In my httpd.conf file, inside my VirtualHost tag:
    # leech prevention
    SetEnvIfNoCase Referer "^http://www81\.kehlet\.cx:81/" local_referal
    # Allow direct access and browsers that do not send Referer info
    SetEnvIf Referer "^$" local_referal
    <FilesMatch "\.([Gg][Ii][Ff]|[Jj][Pp][Ee]?[Gg]|[Pp][Nn][Gg])$">
       Order Deny,Allow
       Deny from all
       Allow from env=local_referal
    </FilesMatch>
The way this works is pretty clear: Apache sets the environment variable "local_referal" if the client sends a Referer header indicating the <img> tag came from a page on my site. This assumes most browsers send a Referer header, which is a pretty safe bet. The second SetEnvIf sets the same variable when the Referer is empty, so direct access (e.g. bookmarks, emails) and older browsers still get through, because at the moment these aren't a problem. That FilesMatch line should catch any .gif, .jpg, .jpeg, or .png images, in any mix of upper and lower case. Access to those images is then denied unless local_referal is set.

I also put the following in a .htaccess file in my ~kehlet/images directory, where most of my images are served up, just as an extra guard to protect all of the files in there (arguably overkill):
# leech prevention
SetEnvIfNoCase Referer "^http://www81\.kehlet\.cx:81/" local_referal
# Allow direct access and browsers that do not send Referer info
SetEnvIf Referer "^$" local_referal
<FilesMatch ".*">
   Order Deny,Allow
   Deny from all
   Allow from env=local_referal
</FilesMatch>
We'll see how well this helps.
Update: 2/7

Well, nothing's quite as easy as it seems. I didn't realize at first that this would block "legit leechers", like cached versions of my pages viewed through Google. So I can see this going two ways now: either I block out known offenders as they become a problem, or I continually add allowances as I realize them. Well, preferring the "default deny" approach, I'll let Google in this time. Here's an update to the logic (from httpd.conf):
    # leech prevention
    SetEnvIfNoCase Referer "^http://www81\.kehlet\.cx:81/" okay_referal
    SetEnvIfNoCase Referer "^http://[^\.]+\.google\....?/" okay_referal
    # Allow browsers that do not send Referer info
    SetEnvIf Referer "^$" okay_referal
    <FilesMatch "\.([Gg][Ii][Ff]|[Jj][Pp][Ee]?[Gg]|[Pp][Nn][Gg])$">
       Order Deny,Allow
       Deny from all
       Allow from env=okay_referal
    </FilesMatch>
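That Google regex should match a host like www.google.com or www.google.xx, where xx is a two-letter top-level domain. More precisely, it wants exactly one label in front of "google" and a two- or three-character ending after it, so it will not match a bare google.com or a two-part country domain like google.co.uk.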
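If country domains like google.co.uk (or a Referer with no "www." in front) also need to get through, the Google line could be loosened along these lines. This is only a sketch and has not been tested on this server; the pattern is my own assumption about what "any Google host" should mean, and it is deliberately loose, so it would also accept any unrelated site that happens to be named google.something.

```apache
    # Looser Google allowance (untested sketch, pattern is an assumption):
    #   ([^/.]+\.)*          zero or more subdomain labels, e.g. "www."
    #   google\.[a-z]{2,3}   google.com, google.de, google.co ...
    #   (\.[a-z]{2})?        optional second part, e.g. ".uk" in google.co.uk
    SetEnvIfNoCase Referer "^http://([^/.]+\.)*google\.[a-z]{2,3}(\.[a-z]{2})?/" okay_referal
```

It would replace the existing google line; the rest of the block stays the same.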
Update 2: 2/7

Crap!! Turns out some of Google's caches don't have a reverse DNS name, so I had to let that IP range in. Also, I just realized, people using web-based mail clients are going to get blocked too, since their Referer will be the page rendering their mail.
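For the record, letting an IP range in looks something like the line below, assuming the allowance is keyed on the Referer URL like the others (a cached page served from a bare IP address shows up with that IP as the Referer host). The 192.0.2.x range is a documentation-only placeholder, not the actual range.

```apache
    # Allow a Referer whose host is a bare IP address in one range.
    # 192.0.2.x is a placeholder, NOT Google's real address range.
    SetEnvIf Referer "^http://192\.0\.2\.[0-9]+/" okay_referal
```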

I guess this isn't going to work. I guess I should switch this around and only block those I know are causing a problem. Then maybe I should look at hosting space somewhere :-(.
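Switching it around would look roughly like this: allow everyone by default and deny only when the Referer matches a known offender. A minimal sketch, using the same directives as before; the two hostnames and the variable name "bad_referal" are made-up placeholders, not the actual sites involved.

```apache
    # leech prevention, "default allow" variant (sketch)
    # Both hostnames are hypothetical placeholders.
    SetEnvIfNoCase Referer "^http://([^/.]+\.)*leech-forum\.example/" bad_referal
    SetEnvIfNoCase Referer "^http://([^/.]+\.)*leech-blog\.example/" bad_referal
    <FilesMatch "\.([Gg][Ii][Ff]|[Jj][Pp][Ee]?[Gg]|[Pp][Nn][Gg])$">
       # With Allow,Deny a request matching both lists is denied
       Order Allow,Deny
       Allow from all
       Deny from env=bad_referal
    </FilesMatch>
```

The trade-off is the one already mentioned: nothing legitimate gets blocked by accident, but every new leecher has to be spotted in the logs and added by hand.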