
It’s no secret that spear phishing is a prevalent threat and is making an appearance in many CISOs’ nightmares. Verizon’s 2016 breach digest is out and, for anyone who hasn’t looked through it yet, the answer is 30%. That’s the percentage of breaches from 2013 to 2016 that leveraged social engineering tactics to stage a compromise. Of those attacks, phishing accounts for 72%. That means that nearly 22% of breaches in the last three years have leveraged phishing in some way or another. It’s hard enough to secure external and internal assets… but having to secure your employees too? It’s a scary thought. Definitely something to keep one up at night.
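For anyone who wants to check the "nearly 22%" figure, it is just the two percentages quoted above multiplied together. The variable names below are ours, chosen only for illustration.

```python
# Figures quoted in the paragraph above (breaches from 2013 to 2016).
social_engineering_share = 0.30  # breaches that leveraged social engineering
phishing_share_of_those = 0.72   # share of those attacks that were phishing

# Share of all breaches that leveraged phishing.
phishing_share_of_all = social_engineering_share * phishing_share_of_those
print(f"{phishing_share_of_all:.1%}")  # prints 21.6%
```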

Current solutions include improving user awareness through training exercises, minimizing and controlling damage through defined incident response programs, and stopping phishing emails before they ever make it to employees’ inboxes through email/spam filtering solutions. We’re here to talk about the last one.

Using a collection of benign and phishy emails alongside a spam filter testing service called Email on Acid, we’ve taken a stab at comparing 22 different spam filtering solutions. These tests measure each spam filter’s ability to stop spear-phishing emails in their tracks. To anyone afraid of long articles, the “tl;dr” reads something like this: Spam filters are okay. They’re not perfect and not terribly intelligent, but they can be effective at times and represent one layer of defense that should be in place to protect an organization from phishing or spear-phishing attacks.

Spam Filters: 10,000-Foot View

Modern email spam filtering solutions were built with the intention of protecting users from spammers and phishers alike. Many of these filters provide a load of helpful functions: whitelists, blacklists, image blocking, attachment blocking, and custom rules that can be used to tag or modify messages before they make it to a user’s inbox. These features are powerful and can be configured to protect users from spam and phishing attempts, but how good are these tools fresh out of the box?
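To make those features a little more concrete, here is a minimal sketch of how a whitelist, a blacklist, and one custom tagging rule could fit together. It is not how any particular product works; the domain lists, the `[EXTERNAL]` tag, and the `apply_rules` function are all made up for illustration.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical lists; a real deployment would manage these in the filter's console.
ALLOWED_DOMAINS = {"example.com"}
BLOCKED_DOMAINS = {"bad.example"}


def apply_rules(msg: EmailMessage) -> str:
    """Return 'reject', 'deliver', or 'deliver-tagged' for a message."""
    _, address = parseaddr(str(msg.get("From", "")))
    domain = address.rpartition("@")[2].lower()

    if domain in BLOCKED_DOMAINS:   # blacklist wins outright
        return "reject"
    if domain in ALLOWED_DOMAINS:   # whitelist skips further rules
        return "deliver"

    # Custom rule: tag the subject of anything from an unknown domain.
    subject = str(msg.get("Subject", ""))
    del msg["Subject"]
    msg["Subject"] = f"[EXTERNAL] {subject}"
    return "deliver-tagged"
```

The ordering (blacklist, then whitelist, then custom rules) is a design choice in this sketch, not something the vendors document.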

With the help of our friends at Email on Acid, we were able to compare out-of-the-box effectiveness of a large number of email filtering solutions. Through this testing, we made some progress in answering the following questions:

  • How good are spam filters really?
  • Which spam filters excel and which fall short?


It’s worth noting that spam filters are intentionally blackboxed and mysterious entities. If spam filters provided detailed feedback or source code detailing exactly what they look for, it would be very feasible for an intelligent phisherman (or phisherwoman) to get around the filters with a nearly perfect rate of success. It’s actually very difficult to find specific content describing how spam filters operate. Most of the literature we found online refers to certain “checks” that are performed; these checks include looking for “signs of spam” and cross-referencing blacklists. “Checks” and “signs of spam” are in quotations for a reason: They are never clearly defined. It seems fishy, but check it out for yourself if you are in doubt. The most comprehensive list of these “checks” we found is provided by SpamAssassin. It’s quite a long list and looks for a number of very specific things.

Our guess is that some degree of machine learning/classification is employed to sort the spam from the ham. We hope so at least, as sorting spam from ham seems to be a textbook case of a classification problem that can be addressed by machine learning. It’s also known that many (possibly all) of the spam filters observed during these tests define some sort of “spam score threshold” and, if a message exceeds this threshold, it’s marked as spam and treated accordingly.
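As a rough illustration of the threshold idea, a score-based filter might look something like the sketch below. The checks, weights, and threshold are invented for this example; real filters do not publish theirs, and SpamAssassin’s actual rule set is far longer.

```python
import re

# Hypothetical checks: name -> (weight, test on subject and body).
CHECKS = {
    "many_exclamations": (1.5, lambda subject, body: subject.count("!") >= 3),
    "money_mention": (1.0, lambda subject, body: re.search(r"\$\d", subject + " " + body) is not None),
    "call_to_action": (1.5, lambda subject, body: "click here" in body.lower()),
}

SPAM_THRESHOLD = 3.0  # assumed value, purely illustrative


def spam_score(subject: str, body: str) -> float:
    """Add up the weights of every check that fires."""
    return sum(weight for weight, check in CHECKS.values() if check(subject, body))


def is_spam(subject: str, body: str) -> bool:
    """A message is marked as spam when its score exceeds the threshold."""
    return spam_score(subject, body) > SPAM_THRESHOLD
```

With these made-up weights, a subject of “You just won $10,000!!!” with a body of “Click here now!” trips all three checks and lands over the threshold, while a plain meeting reminder scores zero.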

Additionally, it’s important to note that these tests were done to measure the ability to stop spear-phishing attacks; that is, highly targeted attacks that likely go after only a small set of individuals within a company. This is the path of least resistance; this is the path that an advanced persistent threat will try first to take over a company’s domain. A less targeted bulk phishing attack that sends hundreds or thousands of emails to a corporation’s domain will almost certainly be shut down by all filters within minutes, maybe even seconds.

Email on Acid

Email on Acid is an awesome service that provides detailed feedback for a wide variety of email test cases. It offers many features, but the one that we are really interested in here is its spam testing. We’ll see more shortly but, to whet your appetite, the data we are basing our results on looks something like this:


Table: Email on Acid spam test

Take special note of the “Feedback Filters” section. While far from perfect, these do provide some useful information that we might use later in our overall assessment.

Test Description

To test the effectiveness of the spam filters using Email on Acid, we put together a list of “features” that spam/phishing emails are likely to contain. To name a few, we compared:

  • Loud/Noisy email subject/body (e.g. “You just won $10,000!!! Click here now!”)
  • Embedded images (Hosted from a trusted/untrusted domain)
  • Embedded hyperlinks (Linked to a trusted/untrusted domain)
  • DKIM/SPF of sending domain (Additional authentication from the sending domain to the MX server.)
  • Age of sending domain (e.g. a domain purchased this week vs. one that’s been around for years.)
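On the DKIM/SPF feature: receiving servers commonly record the outcome of those checks in an `Authentication-Results` header. We did not inspect headers as part of this testing, so treat the following as a side sketch only; the `auth_results` helper and the sample header value are hypothetical.

```python
import re


def auth_results(header_value: str) -> dict:
    """Pull method=result pairs (e.g. spf=pass) out of an Authentication-Results value.

    Simplification: if a method appears more than once, the last result wins.
    """
    pairs = re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header_value, flags=re.IGNORECASE)
    return {method.lower(): result.lower() for method, result in pairs}


# Made-up header value for illustration.
header = "mx.example.net; spf=pass smtp.mailfrom=sender.example; dkim=fail header.d=sender.example"
print(auth_results(header))  # prints {'spf': 'pass', 'dkim': 'fail'}
```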

The testing methodology is as follows:

  1. Create an email template that exhibits a set of the features listed above
  2. Send the email through Email on Acid’s spam testing service
  3. Check which filters blocked/allowed the message to go through
  4. Document results in a big Excel sheet
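Step 1 could be scripted along these lines. This is only a sketch of building a template from a set of features with Python’s standard `email` package; the `EmailTestCase` fields and the `build_message` helper are our own illustrative names, and submitting the message to Email on Acid (step 2) is not shown.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class EmailTestCase:
    """One template: a hypothetical bundle of the features listed above."""
    name: str
    sender: str
    subject: str
    link_url: str
    image_url: str


def build_message(case: EmailTestCase, recipient: str) -> EmailMessage:
    """Build a plain-text plus HTML message that exhibits the case's features."""
    msg = EmailMessage()
    msg["From"] = case.sender
    msg["To"] = recipient
    msg["Subject"] = case.subject
    msg.set_content(f"Please review: {case.link_url}")
    msg.add_alternative(
        f'<p><a href="{case.link_url}">Please review</a></p>'
        f'<img src="{case.image_url}" alt="">',
        subtype="html",
    )
    return msg
```

Keeping each test case as data makes it easy to vary one feature at a time (say, swapping a trusted link for an untrusted one) while holding the rest of the template constant.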


For the results, we are really looking at two things:

  1. Does a given filter mis-classify “Ham” as “Spam”?
    False positive – results in legitimate messages being blocked.
  2. Does a given filter mis-classify “Spam” as “Ham”?
    False negative – results in nasty emails making it to users’ inboxes.

Note that, for our purposes, “Spam” means it’s a spear-phishing email.
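In code, the two kinds of mistake fall out of a simple comparison between what a message really is and what the filter said. The helper names below are ours; nothing here reflects actual test results.

```python
from collections import Counter


def classify_outcome(truth: str, verdict: str) -> str:
    """Compare ground truth with a filter's verdict; each is 'ham' or 'spam'."""
    if truth == verdict:
        return "correct"
    # Ham marked as spam is a false positive; spam marked as ham is a false negative.
    return "false_positive" if truth == "ham" else "false_negative"


def tally(results):
    """Count outcomes over an iterable of (truth, verdict) pairs."""
    return Counter(classify_outcome(truth, verdict) for truth, verdict in results)
```

For example, `tally([("ham", "spam"), ("spam", "ham"), ("spam", "spam")])` counts one false positive, one false negative, and one correct classification.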

To help visualize the results, we’ve split the testing into two categories: “Ham tests” and “Spam tests.” The Ham tests are all benign emails that, in theory, should make it through the email filters. The Spam tests are all phishing emails that contain some piece of malicious content (mostly links to untrusted and malicious domains). And now, without further ado… results!

To read the following tables:
Choose a spam filter from the left-most column. Move your eyes across the row from left to right. A green box means that the spam filter correctly classified the test email. A red box means that the spam filter incorrectly classified the test email. A gray box means that the spam filter did not respond to the message at all. Unfortunately, we are not exactly sure what this means and, as such, do not factor these boxes into our assessment.

spam test key


Table: HAM Test


  • Outlook 2007 and 2013 are very trigger-happy. My guess is that they don’t like the MX servers the emails are coming from (gmail.com, Google Apps hosted email, and privateemail.com). The first two are fairly reputable domains, so I’m not entirely sure what the filters are unhappy about.
  • Google Apps (Postini) doesn’t seem to like the emails coming from the privateemail.com MX server; it let the first one through (test 4) but blocked the next two (test 5 and 6).
  • Apple Mail 7 seems unresponsive. Results are likely not relevant but are included for completeness.


Table: SPAM Test


  • Some spam filters did (almost) nothing. Literally. Not to call out some of the big names but… Gmail, Barracuda, SpamAssassin… What’s going on?
  • A couple of others also appeared to do nothing other than signature detection (they flagged the top spam email of 2014, which should be well-known and blacklisted). In this case, we’re talking about Symantec Cloud and Symantec Messaging Gateway. It also appears Postini might have some signature detection in place.
  • If we consider “No response” to be “Mark as spam,” Outlook.com actually did pretty well in both the Ham and Spam tests.
  • We can’t give too much credit to Outlook 2007 or Outlook 2013 here. They blocked most of the Spam, but they also blocked most of the Ham. My guess here is that these filters don’t like messages that don’t come from a Microsoft MX server.
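How “No response” is treated changes a filter’s apparent accuracy, which is why the Outlook.com observation above comes with an “if.” The sketch below shows both readings side by side; the `accuracy` function and the sample data are hypothetical and are not our actual results.

```python
def accuracy(truths, verdicts, no_response_as=None):
    """Fraction of tests classified correctly.

    A verdict of None means the filter did not respond. By default those tests
    are skipped; pass no_response_as="spam" to count them as "marked as spam".
    """
    correct = total = 0
    for truth, verdict in zip(truths, verdicts):
        if verdict is None:
            if no_response_as is None:
                continue  # gray box: leave it out of the assessment
            verdict = no_response_as
        total += 1
        correct += truth == verdict
    return correct / total if total else None


# Made-up example: four tests, one with no response.
truths = ["spam", "spam", "spam", "ham"]
verdicts = [None, "spam", "ham", "ham"]
print(accuracy(truths, verdicts))                         # 2 of 3 counted tests correct
print(accuracy(truths, verdicts, no_response_as="spam"))  # 3 of 4 tests correct
```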

Takeaways

Based on the results of our testing, it is fair to say that spam filters don’t provide all of the protection from targeted malicious emails that users need. Any truly meaningful protections from spear phishing will require a multi-faceted plan, involving user awareness training and an incident response program; email filters are important and good to have, but they are not a comprehensive solution.

How Good Are Spam Filters Really?

I think the data should speak for itself. Sadly, the answer is that they are not too great. Postini (now called Google Apps hosted email, as Google purchased it in 2007) really seems to stand out but is still by no means a perfect solution. Google does not seem to disclose any details about how it filters spam but, knowing Google and based on the results from our testing, my guess is that it uses a combination of blacklisted email signatures and machine learning classification algorithms to make intelligent decisions about spam vs. ham.

Why Bother with Spam Filters?

  • Spam filters still provide security benefits to an organization. For one, they are highly customizable; all of the tests done using Email on Acid were against spam filters with the default settings. An organization can tweak its settings as needed. This will likely be a tedious process filled with trial and error, but it’s better than nothing.
  • Spam filters can catch inbound bulk emails. Remember: Our testing was for spear phishing (highly focused attacks that target a small set of employees within an organization). Email on Acid does not provide any form of bulk-message filtering and, because of this limitation, we did not include it in our tests. If a phishing campaign were sent out using a spray-and-pray technique throughout an entire organization, it is almost certain that the first dozen or so malicious emails would get through, but the next hundred or thousand emails would be swallowed by the spam filter. While this is, again, not a perfect solution, it could vastly reduce the number of employees who wind up with a phishing email in their inbox.
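We don’t know how any given filter detects bulk campaigns, but the “first dozen get through, the rest get swallowed” behavior can be sketched with a simple counter over near-identical message bodies. Everything here is an assumption for illustration: the `BulkFilter` class, the SHA-256 fingerprint of the normalized body, and the cutoff of 12 copies.

```python
import hashlib
from collections import Counter


class BulkFilter:
    """Allow the first few copies of a message, then block further copies."""

    def __init__(self, max_copies: int = 12):
        self.max_copies = max_copies  # assumed cutoff ("the first dozen or so")
        self.seen = Counter()

    @staticmethod
    def fingerprint(body: str) -> str:
        # Normalize case and whitespace so trivial variations hash the same.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def allow(self, body: str) -> bool:
        key = self.fingerprint(body)
        self.seen[key] += 1
        return self.seen[key] <= self.max_copies
```

A spear-phishing email sent to only a handful of people never reaches the cutoff in a scheme like this, which is consistent with why bulk protections do little against targeted attacks.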

Which Spam Filter Should I Use?

There are three that we would recommend. Two of them come built into the email service; the third is a standalone service that can be integrated into (technically, placed in front of) various email services. They are, in no particular order:

  • Google Apps Hosted Email (Formerly Postini) (Spam filter built into service)
    In our eyes, this is the clear winner based on the tests shown above. While this filter did exhibit both false negatives and false positives, overall, it appears to be the most accurate filter that we tested.
  • Outlook.com (Office365) (Spam filter built into service)
    We run into this one a lot during our spear phishing campaigns at Praetorian. As seen from the testing, it can be overly sensitive but, in general, it’s pretty alright. It has definitely hindered many of our campaigns and required us to tweak our spear phishing emails to squeak through the filter.
  • ProofPoint (External service)
    It is a real shame that ProofPoint is not included in Email on Acid’s testing suite. From the outside, it seems like a very strong contender. Some online literature comparing ProofPoint to Postini really makes it seem like ProofPoint might be the one filter to rule them all. Unfortunately, we were unable to test this during our research phase. We do, however, occasionally encounter ProofPoint during our social engineering engagements, and it often requires us to tweak our spear phishing emails in order to get through.

Test Details

As promised, some additional details for each test case.


Table: Test Mapping