Referer spam

Referer spam is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referer URL that points to the site the spammer wishes to advertise. Sites that publish their access logs, including referer statistics, then end up linking to the spammer's site. Search engines index those links in turn as they crawl the published logs.
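To make the mechanism concrete, here is a minimal Python sketch of how a simple statistics generator might tally referers from an Apache access log in the "combined" format. The sample log lines, the example.* domains and the function name `referer_counts` are illustrative assumptions. Real statistics packages parse logs in their own ways. A forged Referer header ends up as an ordinary entry in the tally, and a statistics page that publishes the tally publishes the spammer's URL.

```python
import re
from collections import Counter

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referer_counts(lines):
    """Tally Referer values the way a simple (hypothetical) stats page might."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        # "-" means the client sent no Referer header.
        if m and m.group("referer") not in ("", "-"):
            counts[m.group("referer")] += 1
    return counts

# Hypothetical log excerpt: two forged referer hits and one direct visit.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "http://casino.example/" "Mozilla/4.0"',
    '203.0.113.5 - - [10/Oct/2005:13:55:37 -0700] "GET / HTTP/1.1" 200 2326 "http://casino.example/" "Mozilla/4.0"',
    '198.51.100.7 - - [10/Oct/2005:13:56:01 -0700] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(referer_counts(SAMPLE).most_common())  # [('http://casino.example/', 2)]
```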


This benefits the spammer in two ways. The site gets a free link, and it also gains improved search engine placement, because the link-counting algorithms that search engines use reward sites with many incoming links.
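The link-counting effect can be illustrated with a toy Python sketch that counts how many distinct pages link to each host. This is an assumption-laden stand-in for "link-counting algorithms", not any real search engine's ranking method. The crawl data and the name `inbound_link_counts` are hypothetical. Every published statistics page that lists the forged referer adds one more inbound link for the spammer's host.

```python
from collections import defaultdict
from urllib.parse import urlparse

def inbound_link_counts(pages):
    """Count distinct linking pages per target host.

    `pages` maps a page URL to the list of URLs it links to.
    A toy ranking signal only; real search engines are far more complex.
    """
    linkers = defaultdict(set)
    for page, links in pages.items():
        for link in links:
            linkers[urlparse(link).netloc].add(page)
    return {host: len(sources) for host, sources in linkers.items()}

# Hypothetical crawl: three blogs publish referer statistics pages that
# list the spammer's forged referer; one honest page links elsewhere.
crawl = {
    "http://blog-a.example/stats": ["http://casino.example/"],
    "http://blog-b.example/stats": ["http://casino.example/"],
    "http://blog-c.example/stats": ["http://casino.example/", "http://news.example/"],
    "http://news.example/": ["http://blog-a.example/"],
}

print(inbound_link_counts(crawl))
```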


Some web sites receive so many referer spam hits that they amount to a denial-of-service attack: the server does not have enough resources left to handle legitimate traffic.


Technical solutions

As with e-mail spam, web site operators who receive unwanted referer spam may respond using filtering and blocking.


A simple way to render this form of spamming ineffective is to keep search engine spiders from crawling the site logs by moving them to a non-public area, such as a password-protected directory. Alternatively, the web statistics generator can be configured to mark referer links with rel="nofollow", so that search engines do not credit them.
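For the rel="nofollow" approach, a statistics generator would emit each referer link roughly as in the minimal Python sketch below. The function name `referer_link` and the list-item markup are hypothetical. Because the Referer header is attacker-controlled, the sketch also HTML-escapes the URL, so a forged value cannot inject markup into the statistics page.

```python
from html import escape

def referer_link(url, hits):
    """Render one row of a (hypothetical) referer statistics page.

    rel="nofollow" asks search engines not to credit the link, which
    removes the ranking incentive behind referer spam.
    """
    href = escape(url, quote=True)  # escapes &, <, >, " and '
    return f'<li><a href="{href}" rel="nofollow">{href}</a> ({hits})</li>'

print(referer_link("http://casino.example/?a=1&b=2", 3))
```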


Word-based filtering

The following example configuration fragment filters referer spam on the Apache HTTP Server:

 # Filter rules
 # The regexps can be refined to reduce false positives...
 # As many SetEnvIfNoCase directives as needed can be used...
 SetEnvIfNoCase Referer "(hold-?em|poker|casino|hotel|loan|mortgage|payday|credit)" refspam
 SetEnvIfNoCase Referer "(viagra|cialis|penis|diet|porn)" refspam
 # Whitelists can be used, too... note the !refspam vs. refspam
 SetEnvIfNoCase Referer "white-listed_site\.com" !refspam
 # Deny access to refspam (Apache 2.2-style access control; Deny is only
 # valid inside <Directory>, <Location> or .htaccess, so adjust the path
 # to your document root)
 <Directory /usr/local/etc/httpd/htdocs>
     Order Allow,Deny
     Allow from all
     Deny from env=refspam
 </Directory>
 # For cleanliness, we'll separate the logs
 CustomLog /var/log/apache/access.log combined env=!refspam
 CustomLog /var/log/apache/access_refspam.log combined env=refspam

The "fake" web site hits will go to access_refspam.log, whereas normal traffic goes to access.log. The "SetEnvIfNoCase" lines contain regular expressions (more specifically, Perl-compatible regular expressions) that can be used to match any undesirable traffic.
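The same decision logic can be sketched in Python to show how the rules interact. This assumes the directives are evaluated in the order they appear, so the whitelist line, which unsets refspam, overrides the earlier spam patterns. The names `is_refspam` and `split_log` are hypothetical. Python's `re` module stands in for Apache's regex engine, which gives the same results for these simple patterns.

```python
import re

# Same patterns as the SetEnvIfNoCase lines; IGNORECASE mirrors "NoCase".
SPAM_PATTERNS = [
    re.compile(r"(hold-?em|poker|casino|hotel|loan|mortgage|payday|credit)", re.IGNORECASE),
    re.compile(r"(viagra|cialis|penis|diet|porn)", re.IGNORECASE),
]
# Checked after the spam patterns, like the "!refspam" line, so it wins.
WHITELIST = [re.compile(r"white-listed_site\.com", re.IGNORECASE)]

def is_refspam(referer):
    """Return True if the referer would end up with refspam set."""
    refspam = any(p.search(referer) for p in SPAM_PATTERNS)  # unanchored, like SetEnvIf
    if any(p.search(referer) for p in WHITELIST):
        refspam = False  # equivalent of "!refspam": remove the variable
    return refspam

def split_log(entries):
    """Route (referer, log_line) pairs the way the two CustomLog lines do."""
    normal, spam = [], []
    for referer, line in entries:
        (spam if is_refspam(referer) else normal).append(line)
    return normal, spam

print(is_refspam("http://white-listed_site.com/credit-union"))  # False
```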


IP-based filtering

If most of the spam comes from a few IP addresses, or requests a particular page (which may no longer exist on the server), Apache can also deny access by IP address or by requested path. To do this, add lines like the following to the configuration file (often named httpd.conf):

 # Deny access based on the filename or path of the requested file
 <Location /links>
     Deny from all
 </Location>
 # Deny access based on the IP address or host name of the offending site
 <Directory /usr/local/etc/httpd/htdocs>
     Deny from 72.36.244.166
     Deny from big-bad-spammer-blah-blah.com
 </Directory>
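As a rough model of what those rules decide, here is a minimal Python sketch that uses the example path and address. The function name `is_denied` is hypothetical. The path test is a plain prefix match, which simplifies how `<Location>` compares URL paths. The host-name rule is omitted because matching host names requires DNS lookups. Using `ipaddress` networks also allows whole ranges to be blocked in CIDR form.

```python
import ipaddress

# Mirrors the example configuration: one denied path, one denied address.
DENIED_PATHS = ["/links"]
DENIED_NETWORKS = [ipaddress.ip_network("72.36.244.166/32")]

def is_denied(client_ip, path):
    """Return True if a request would be refused (403 Forbidden)."""
    # Simple prefix match, a simplification of <Location> matching.
    if any(path.startswith(prefix) for prefix in DENIED_PATHS):
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DENIED_NETWORKS)

print(is_denied("198.51.100.7", "/links/old.html"))  # True
```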

A good statistics analysis program will allow you to target the worst offenders.
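Finding the worst offenders mostly comes down to counting. The minimal Python sketch below ranks client IPs and referers by request count. It assumes the (IP, referer) pairs have already been parsed from the access log. The function name `worst_offenders` and the sample data are hypothetical.

```python
from collections import Counter

def worst_offenders(requests, n=3):
    """Return the n most frequent client IPs and referers.

    `requests` is an iterable of (client_ip, referer) pairs.
    """
    by_ip = Counter()
    by_referer = Counter()
    for ip, referer in requests:
        by_ip[ip] += 1
        if referer not in ("", "-"):  # "-" means no Referer header was sent
            by_referer[referer] += 1
    return by_ip.most_common(n), by_referer.most_common(n)

sample = [
    ("203.0.113.5", "http://casino.example/"),
    ("203.0.113.5", "http://casino.example/"),
    ("203.0.113.5", "http://loans.example/"),
    ("198.51.100.7", "-"),
]
ips, referers = worst_offenders(sample, n=1)
print(ips, referers)  # [('203.0.113.5', 3)] [('http://casino.example/', 2)]
```

The top entries are natural candidates for the Deny and SetEnvIfNoCase rules shown earlier.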


Advanced filtering using mod_security

A third solution for Apache is to install ModSecurity, which can deny requests based on any variable from the server environment, such as the referer, the request, the client IP address or the host.
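The idea behind such rule engines can be sketched in a few lines of Python. Each rule names one request variable and a pattern, and the first matching rule denies the request. This is not ModSecurity's rule language or API. The `Rule` class, the variable names and the `evaluate` function are hypothetical, chosen only to show filtering on several variables at once.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical rule: deny when `variable` matches `pattern`."""
    variable: str
    pattern: str

    def matches(self, request):
        value = request.get(self.variable, "")
        return re.search(self.pattern, value, re.IGNORECASE) is not None

RULES = [
    Rule("referer", r"(poker|casino|payday)"),
    Rule("remote_addr", r"^72\.36\.244\.166$"),
    Rule("request_uri", r"^/links(/|$)"),
]

def evaluate(request, rules=RULES):
    """Return the first matching rule, or None if the request is allowed."""
    for rule in rules:
        if rule.matches(request):
            return rule
    return None

req = {"referer": "http://Casino.example/", "remote_addr": "198.51.100.7",
       "request_uri": "/", "host": "www.example.com"}
print(evaluate(req))
```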


See also

  • Spamdexing
  • Spam in blogs

External links

  • ReferrerCop - A searchable database of known referrer spam with tools for filtering referrer spam from Apache log files and AWStats/W3Perl data files.
  • A proposal on addressing referer spam - technical overview and ideas for combatting the practice
  • aStatSpam - Referrer-spam blacklist

The Wikipedia article included on this page is licensed under the GFDL.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.