Adventures in spam testing

By Joel Snyder
Network World, 12/20/04


Testing routers and switches is easy. Frames go in, frames come out. With anti-spam products, nothing is ever easy.

We got into more shouting matches over this test than any other - and that was even before we published the results. Vendors are intensely competitive, and the numbers are hard to come by. We worked hard to create a fair test, but that doesn't mean every product will show its best side. For our complete methodology, click here.

The biggest sticking point was first-hop placement. Anti-spam vendors have learned they can eliminate a huge pile of junk right off the top by using a variety of blacklist techniques. The best products can do that wherever they sit in the chain by examining the headers in the message. But a surprisingly large percentage haven't figured out how to cope with not being the top dog in the e-mail chain. Some also detect irregularities in the SMTP conversation, telltale signs of certain spam-generation tools. Our test bed probably shaved a few percentage points off the best possible spam-catch scores.
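To see how a product can apply blacklists without being the first hop, consider the Received headers that each mail server prepends as a message travels. A filter downstream in the chain can walk those headers back to the oldest one and extract the originating IP for a blacklist lookup. This is a minimal sketch of that idea; the message contents, host names, and IP addresses are invented for illustration:

```python
import email
import re

# Invented example message: our gateway (filter.example.com) is the
# second hop, behind an internal relay, just as in our test bed.
raw_message = """\
Received: from internal-relay.example.com (internal-relay.example.com [10.0.0.5])
\tby filter.example.com; Mon, 20 Dec 2004 10:00:02 -0500
Received: from dialup-pool.spammy.example ([192.0.2.77])
\tby internal-relay.example.com; Mon, 20 Dec 2004 10:00:01 -0500
From: someone@spammy.example
Subject: test

body
"""

IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def originating_ip(raw):
    """Return the IP from the oldest (bottom-most) Received header.

    Servers prepend Received headers, so the last one in the list
    is the hop closest to the original sender - the address a
    blacklist lookup would actually want to check."""
    msg = email.message_from_string(raw)
    for header in reversed(msg.get_all("Received") or []):
        match = IP_RE.search(header)
        if match:
            return match.group(1)
    return None

print(originating_ip(raw_message))  # prints 192.0.2.77, not the relay's 10.0.0.5
```

Products that only examine the IP of the directly connected peer lose this information as soon as another MTA sits in front of them, which is exactly the handicap our test bed imposed.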

We also had to deal with flaky anti-spam products. For several reasons, not every product was ready to accept every message the moment we received it. To deal with this, we had a real SMTP Message Transfer Agent (MTA) receive each message and retransmit it to the products. That meant some of the tracks and traces spammers leave in irregular or improperly created messages were obscured by our MTA.
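A quick sketch shows why relaying through an MTA obscures that evidence: parsing a message and re-emitting it normalizes low-level irregularities before any filter downstream sees them. Bare carriage-return line endings, one fingerprint of sloppy spam-generation tools, are the invented example here:

```python
import email

# Invented sloppy message: bare CR line endings instead of CRLF,
# the kind of irregularity a spam tool might leave behind.
sloppy = (
    "Subject: make money fast\r"
    "From: forged@example.invalid\r"
    "\r"
    "Call now.\r"
)

# An intermediate MTA parses the message on receipt...
msg = email.message_from_string(sloppy)

# ...and re-serializes it for retransmission, with clean line endings.
relayed = msg.as_string()

print("\r" in sloppy)    # True  - the fingerprint arrives with the message
print("\r" in relayed)   # False - gone once the relay re-emits it
```

A product behind such a relay never sees the original bytes, so any spam score it would have assigned to the malformation is lost.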

A bigger issue in testing many products involved training. While some products - including several of our top finishers - require no training, others asked for various degrees of pre-test preparation. In the worst case, several vendors asked us to identify false positives and false negatives during a training period before testing. While we followed all the instructions on tuning, the sheer number of products limited how much time we could spend tuning each one. Vendors whose products require significant tuning will argue they would leapfrog to the top of the list with more tuning time. But maybe they wouldn't.

Several products also depend on environmental information to help them make better decisions. For example, if you send your outbound mail stream through the anti-spam gateway, it knows whom to expect responses from, and can reduce the false-positive rate while increasing the spam-catch rate. Our test bed didn't permit this type of configuration.

The false-positive and false-negative rates we found are useful for comparing products, but a real installation will likely see a lower false-positive rate and a higher spam-catch rate. Because every product was handicapped in the same way, the reported results are an excellent way to compare the products' performance. Comparing these statistics across different tests, though, would not give valid results.
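For readers comparing the numbers, the two rates reduce to simple ratios. The counts below are invented for the worked example, not figures from the test:

```python
def rates(spam_caught, total_spam, legit_flagged, total_legit):
    """Spam-catch rate: fraction of spam correctly blocked.
    False-positive rate: fraction of legitimate mail wrongly flagged."""
    spam_catch_rate = spam_caught / total_spam
    false_positive_rate = legit_flagged / total_legit
    return spam_catch_rate, false_positive_rate

# Invented counts for illustration only.
catch, fp = rates(spam_caught=9_400, total_spam=10_000,
                  legit_flagged=25, total_legit=5_000)
print(f"spam-catch rate:     {catch:.1%}")   # 94.0%
print(f"false-positive rate: {fp:.2%}")      # 0.50%
```

Because every product saw the same handicapped mail stream, these ratios rank products fairly against each other even though the absolute numbers understate real-world performance.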