Anti-spam techniques - Wikiwand
There are now a large number of applications, appliances, services, and software systems that email administrators can use to reduce the load of spam on their systems and mailboxes. In general these attempt to reject (or "block"), the majority of spam email outright at the SMTP connection stage. If they do accept a message, they will typically then analyze the content further – and may decide to "quarantine" any categorised as spam.
Authentication
A number of systems have been developed that allow domain name owners to identify email as authorized. Many of these systems use the DNS to list sites authorized to send email on their behalf. After many other proposals, SPF, DKIM and DMARC are all now widely supported with growing adoption.[10][11][12] While not directly attacking spam, these systems make it much harder to spoof addresses, a common technique of spammers - but also used in phishing, and other types of fraud via email.
Challenge/response systems
A method which may be used by internet service providers, by specialized services or enterprises to combat spam is to require unknown senders to pass various tests before their messages are delivered. These strategies are termed "challenge/response systems".
Checksum-based filtering
Checksum-based filter exploits the fact that the messages are sent in bulk, that is that they will be identical with small variations. Checksum-based filters strip out everything that might vary between messages, reduce what remains to a checksum, and look that checksum up in a database such as the Distributed Checksum Clearinghouse which collects the checksums of messages that email recipients consider to be spam (some people have a button on their email client which they can click to nominate a message as being spam); if the checksum is in the database, the message is likely to be spam. To avoid being detected in this way, spammers will sometimes insert unique invisible gibberish known as hashbusters into the middle of each of their messages, to make each message have a unique checksum.
Country-based filtering
Some email servers expect to never communicate with particular countries from which they receive a great deal of spam. Therefore, they use country-based filtering – a technique that blocks email from certain countries. This technique is based on country of origin determined by the sender's IP address rather than any trait of the sender.
DNS-based blacklists
There are large number of free and commercial DNS-based Blacklists, or DNSBLs which allow a mail server to quickly look up the IP of an incoming mail connection - and reject it if it is listed there. Administrators can choose from scores of DNSBLs, each of which reflects different policies: some list sites known to emit spam; others list open mail relays or proxies; others list ISPs known to support spam.
URL filtering
Most spam/phishing messages contain an URL that they entice victims into clicking on. Thus, a popular technique since the early 2000s consists of extracting URLs from messages and looking them up in databases such as Spamhaus' Domain Block List (DBL), SURBL, and URIBL.[13]
Strict enforcement of RFC standards
Many spammers use poorly written software or are unable to comply with the standards because they do not have legitimate control of the computer they are using to send spam (zombie computer). By setting tighter limits on the deviation from RFC standards that the MTA will accept, a mail administrator can reduce spam significantly - but this also runs the risk of rejecting mail from older or poorly written or configured servers.
Greeting delay – A sending server is required to wait until it has received the SMTP greeting banner before it sends any data. A deliberate pause can be introduced by receiving servers to allow them to detect and deny any spam-sending applications that do not wait to receive this banner.
Temporary rejection – The greylisting technique is built on the fact that the SMTP protocol allows for temporary rejection of incoming messages. Greylisting temporarily rejects all messages from unknown senders or mail servers – using the standard 4xx error codes.[14] All compliant MTAs will proceed to retry delivery later, but many spammers and spambots will not. The downside is that all legitimate messages from first-time senders will experience a delay in delivery.
HELO/EHLO checking – RFC 5321 says that an SMTP server "MAY verify that the domain name argument in the EHLO command actually corresponds to the IP address of the client. However, if the verification fails, the server MUST NOT refuse to accept a message on that basis." Systems can, however, be configured to
- Refuse connections from hosts that give an invalid HELO – for example, a HELO that is not an FQDN or is an IP address not surrounded by square brackets.
- Refusing connections from hosts that give an obviously fraudulent HELO
- Refusing to accept email whose HELO/EHLO argument does not resolve in DNS
Invalid pipelining – Several SMTP commands are allowed to be placed in one network packet and "pipelined". For example, if an email is sent with a CC: header, several SMTP "RCPT TO" commands might be placed in a single packet instead of one packet per "RCPT TO" command. The SMTP protocol, however, requires that errors be checked and everything is synchronized at certain points. Many spammers will send everything in a single packet since they do not care about errors and it is more efficient. Some MTAs will detect this invalid pipelining and reject email sent this way.
Nolisting – The email servers for any given domain are specified in a prioritized list, via the MX records. The nolisting technique is simply the adding of an MX record pointing to a non-existent server as the "primary" (i.e. that with the lowest preference value) – which means that an initial mail contact will always fail. Many spam sources do not retry on failure, so the spammer will move on to the next victim; legitimate email servers should retry the next higher numbered MX, and normal email will be delivered with only a brief delay.
Quit detection – An SMTP connection should always be closed with a QUIT command. Many spammers skip this step because their spam has already been sent and taking the time to properly close the connection takes time and bandwidth. Some MTAs are capable of detecting whether or not the connection is closed correctly and use this as a measure of how trustworthy the other system is.
Honeypots
Another approach is simply creating an imitation MTA that gives the appearance of being an open mail relay, or an imitation TCP/IP proxy server that gives the appearance of being an open proxy. Spammers who probe systems for open relays and proxies will find such a host and attempt to send mail through it, wasting their time and resources, and potentially, revealing information about themselves and the origin of the spam they are sending to the entity that operates the honeypot. Such a system may simply discard the spam attempts, submit them to DNSBLs, or store them for analysis by the entity operating the honeypot that may enable identification of the spammer for blocking.
Hybrid filtering
SpamAssassin, Policyd-weight and others use some or all of the various tests for spam, and assign a numerical score to each test. Each message is scanned for these patterns, and the applicable scores tallied up. If the total is above a fixed value, the message is rejected or flagged as spam. By ensuring that no single spam test by itself can flag a message as spam, the false positive rate can be greatly reduced.
Outbound spam protection
Outbound spam protection involves scanning email traffic as it exits a network, identifying spam messages and then taking an action such as blocking the message or shutting off the source of the traffic. While the primary impact of spam is on spam recipients, sending networks also experience financial costs, such as wasted bandwidth, and the risk of having their IP addresses blocked by receiving networks.
Outbound spam protection not only stops spam, but also lets system administrators track down spam sources on their network and remediate them – for example, clearing malware from machines which have become infected with a virus or are participating in a botnet.
PTR/reverse DNS checks
The PTR DNS records in the reverse DNS can be used for a number of things, including:
- Most email mail transfer agents (mail servers) use a forward-confirmed reverse DNS (FCrDNS) verification and if there is a valid domain name, put it into the "Received:" trace header field.
- Some email mail transfer agents will perform FCrDNS verification on the domain name given in the SMTP HELO and EHLO commands. See #Strict enforcement of RFC standards § HELO/EHLO .
- To check the domain names in the rDNS to see if they are likely from dial-up users, dynamically assigned addresses, or home-based broadband customers. Since the vast majority of email that originates from these computers is spam, many mail servers also refuse email with missing or "generic" rDNS names.[15]
- A Forward Confirmed reverse DNS verification can create a form of authentication that there is a valid relationship between the owner of a domain name and the owner of the network that has been given an IP address. While reliant on the DNS infrastructure, which has known vulnerabilities, this authentication is strong enough that it can be used for whitelisting purposes because spammers and phishers cannot usually bypass this verification when they use zombie computers to forge the domains.
Rule-based filtering
Content filtering techniques rely on the specification of lists of words or regular expressions disallowed in mail messages. Thus, if a site receives spam advertising "herbal Viagra", the administrator might place this phrase in the filter configuration. The mail server would then reject any message containing the phrase.
Header filtering looks at the header of the email which contains information about the origin, destination and content of the message. Although spammers will often spoof fields in the header in order to hide their identity, or to try to make the email look more legitimate than it is, many of these spoofing methods can be detected, and any violation of, e.g., RFC 5322, 7208, standards on how the header is to be formed can also serve as a basis for rejecting the message.
SMTP callback verification
Since a large percentage of spam has forged and invalid sender ("from") addresses, some spam can be detected by checking that this "from" address is valid. A mail server can try to verify the sender address by making an SMTP connection back to the mail exchanger for the address, as if it were creating a bounce, but stopping just before any email is sent.
Callback verification has various drawbacks: (1) Since nearly all spam has forged return addresses, nearly all callbacks are to innocent third party mail servers that are unrelated to the spam; (2) When the spammer uses a trap address as his sender's address. If the receiving MTA tries to make the callback using the trap address in a MAIL FROM command, the receiving MTA's IP address will be blacklisted; (3) Finally, the standard VRFY and EXPN commands[16] used to verify an address have been so exploited by spammers that few mail administrators enable them, leaving the receiving SMTP server no effective way to validate the sender's email address.[17]
SMTP proxy
SMTP proxies allow combating spam in real time, combining sender's behavior controls, providing legitimate users immediate feedback, eliminating a need for quarantine.
Spamtrapping
Spamtrapping is the seeding of an email address so that spammers can find it, but normal users can not. If the email address is used then the sender must be a spammer and they are black listed.
As an example, if the email address "spamtrap@example.org" is placed in the source HTML of a web site in a way that it isn't displayed on the web page, human visitors to the website would not see it. Spammers, on the other hand, use web page scrapers and bots to harvest email addresses from HTML source code - so they would find this address. When the spammer later sends to the address the spamtrap knows this is highly likely to be a spammer and can take appropriate action.
Statistical content filtering
Statistical, or Bayesian, filtering once set up requires no administrative maintenance per se: instead, users mark messages as spam or nonspam and the filtering software learns from these judgements. Thus, it is matched to the end user's needs, and as long as users consistently mark/tag the emails, can respond quickly to changes in spam content. Statistical filters typically also look at message headers, considering not just the content but also peculiarities of the transport mechanism of the email.
Software programs that implement statistical filtering include Bogofilter, DSPAM, SpamBayes, ASSP, CRM114, the email programs Mozilla and Mozilla Thunderbird, Mailwasher, and later revisions of SpamAssassin.
Tarpits
A tarpit is any server software which intentionally responds extremely slowly to client commands. By running a tarpit which treats acceptable mail normally and known spam slowly or which appears to be an open mail relay, a site can slow down the rate at which spammers can inject messages into the mail facility. Depending on the server and internet speed, a tarpit can slow an attack by a factor of around 500.[18] Many systems will simply disconnect if the server doesn't respond quickly, which will eliminate the spam. However, a few legitimate email systems will also not deal correctly with these delays. The fundamental idea is to slow the attack so that the perpetrator has to waste time without any significant success.[19]
An organization can successfully deploy a tarpit if it is able to define the range of addresses, protocols, and ports for deception.[20] The process involves a router passing the supported traffic to the appropriate server while those sent by other contacts are sent to the tarpit.[20] Examples of tarpits include the Labrea tarpit, Honeyd,[21] SMTP tarpits, and IP-level tarpits.
Collateral damage
Measures to protect against spam can cause collateral damage. This includes:
- The measures may consume resources, both in the server and on the network.
- When a mail server rejects legitimate messages, the sender needs to contact the recipient out of channel.
- When legitimate messages are relegated to a spam folder, the sender is not notified of this.
- If a recipient periodically checks his spam folder, that will cost him time and if there is a lot of spam it is easy to overlook the few legitimate messages.
- Measures that imposes costs on a third party server may be considered to be abuse and result in deliverability problems.