Where's SpamAssassin?

By Joel Snyder
Network World, 12/20/04


The short answer is that no one submitted it, but of course there's more to it than that. This year we reached out to the SpamAssassin community and asked them to participate. Although a few well-meaning souls volunteered to be the contacts for SpamAssassin, when it came time to test, no one would step up to the plate and represent the product at a level that would make it competitive with the products from the other enterprise-focused vendors.

Interest in SpamAssassin is understandable. In the small-business market, the open source SpamAssassin dominates many anti-spam systems. When well tuned and integrated by a value-added reseller (VAR) that knows what it is doing, it turns out to be a very effective system. SpamAssassin users routinely report 100% spam reduction and 0% false positives (although these self-reported statistics are probably biased), and are generally overjoyed with the results.

By itself, SpamAssassin is little more than the software implementation of an interesting idea: apply statistics, neural networks and Bayesian probabilities to the problem of classifying mail as spam or not. Train the engine by giving it desirable and undesirable mail, and it can tell you which pile each new message most resembles. It turns out to work astonishingly well, especially in small businesses where mail flow is very homogeneous. SpamAssassin's Bayesian engine even redefines the meaning of spam by letting you say, "This is the mail I want," and "This mail I don't want." SpamAssassin also mixes other tools into its scoring system, such as DNS-based blacklists and collaborative scoring, as well as more traditional keyword searches and formatting tests.
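To make the "which pile does it resemble" idea concrete, here is a minimal sketch of a word-based Bayesian classifier in Python. It is a textbook naive Bayes filter with add-one smoothing, not SpamAssassin's actual algorithm, and the class name, method names and sample messages are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


class ToyBayesFilter:
    """Hypothetical naive Bayes filter; train both piles before classifying."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one message to the 'spam' (unwanted) or 'ham' (wanted) pile."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        # Prior: how much of the training mail fell into this pile.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokens:
            # Add-one smoothing so an unseen word never gives probability zero.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        return score

    def spam_probability(self, text):
        """Return a value between 0 and 1; higher means more spam-like."""
        tokens = tokenize(text)
        log_spam = self._log_score(tokens, "spam")
        log_ham = self._log_score(tokens, "ham")
        # Subtract the larger value before exponentiating to avoid underflow.
        top = max(log_spam, log_ham)
        spam = math.exp(log_spam - top)
        ham = math.exp(log_ham - top)
        return spam / (spam + ham)


# Usage with made-up training mail.
mail_filter = ToyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("lunch on monday with the team", "ham")
print(mail_filter.spam_probability("buy cheap pills"))
print(mail_filter.spam_probability("monday meeting agenda"))
```

The sketch also shows why training matters so much: the filter knows nothing except the two piles it was given, so whoever sorts the training mail defines what counts as spam.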

The key to SpamAssassin's success, though, is a smart VAR or IT person installing it. SpamAssassin requires a significant amount of integration work to make an enterprise-class installation succeed. Without a GUI, database, quarantine, anti-virus scanner, policy or per-user configuration, SpamAssassin is a great tool for those who want to build their own anti-spam system, but is in no way a solution by itself.

This doesn't mean that SpamAssassin wasn't well represented in our test. The important core of SpamAssassin, a Bayesian engine, was recognizable in at least one-third of the products we tested and might well have been hidden in the guts of more. The strategy of combining multiple tests to identify spam is found in nearly all modern anti-spam products, including SpamAssassin.

The difficulty in testing or recommending products that require heavy engine training, or ones based on trained neural networks, is that companies with many employees have very diverse mail flows, and the training will likely generate false positives or negatives across large numbers of users. For example, a multinational company might have many employees who don't read or speak Italian, and who might train all their Italian mail as spam, something that would upset the Milan and Rome offices. Or imagine IDG, which owns many publications, all of which have specialized vocabularies. No one set of training mail would work for the different communities.

Products that successfully include a Bayesian recognizer, such as SpamAssassin, do so by considering it as one factor in the larger cocktail of spam identification. By weighting the Bayesian verdict with other information, vendors have followed the trail that SpamAssassin blazed and made it enterprise-ready.
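The "cocktail" approach can be sketched in a few lines of Python: each test that fires contributes a weight, and the message is flagged only when the total crosses a threshold. SpamAssassin does work by summing rule scores against a required score, but the rule names, weights and message fields below are assumptions made up for this example, not values from SpamAssassin or any tested product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """One spam test: a name, a weight, and a function that checks a message."""
    name: str
    weight: float
    test: Callable[[Dict], bool]


def score_message(message: Dict, rules: List[Rule],
                  threshold: float = 5.0) -> Tuple[float, bool, List[str]]:
    """Sum the weights of every rule that fires and compare to the threshold."""
    hits = [rule for rule in rules if rule.test(message)]
    total = sum(rule.weight for rule in hits)
    return total, total >= threshold, [rule.name for rule in hits]


# Hypothetical rules: the Bayesian verdict is just one ingredient.
RULES = [
    Rule("BAYES_HIGH", 3.5, lambda m: m["bayes_probability"] > 0.9),
    Rule("BAYES_LOW", -2.0, lambda m: m["bayes_probability"] < 0.1),
    Rule("ON_DNS_BLACKLIST", 2.5, lambda m: m["sender_on_blacklist"]),
    Rule("SUBJECT_SAYS_FREE", 1.0, lambda m: "free" in m["subject"].lower()),
]

# A made-up message that trips three of the four rules.
message = {
    "subject": "FREE offer inside",
    "bayes_probability": 0.95,
    "sender_on_blacklist": True,
}
total, is_spam, fired = score_message(message, RULES)
print(total, is_spam, fired)
```

Because no single rule in this sketch reaches the threshold alone, a badly trained Bayesian engine cannot condemn a message on its own verdict, which is the property that makes the combined approach workable across diverse mail flows.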