Latent Semantic Analysis Is Not Bayesian Filtering

Macworld recently ran an article about anti-spam tools for Mac OS X, which incorrectly simplified the world of anti-spam tools down to Boolean, points-based, and Bayesian filters. There are at least two more categories of anti-spam tools.

Two additional categories are distributed recognition, such as the Distributed Checksum Clearinghouse (DCC) and Vipul’s Razor, and latent semantic analysis. I don’t know of any distributed recognition products for the Mac (there’s a very good one for Windows Outlook, SpamNet by Cloudmark), but there certainly is a latent semantic analysis tool — Apple’s Mail in Jaguar!

The simplification (or oversight) is relatively understandable. From an end-user perspective, there’s no meaningful difference between a Bayesian filter and a latent semantic analysis filter, even though the math is very different. It’s not clear which approach will prove better at filtering out spam, even though Mail’s filtering did the best in the article. Seems like it’s good to have both in the fight!
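For the curious, here’s a rough sketch of the Bayesian side of that math. It is a toy naive Bayes scorer, not the code any of the reviewed products actually use; the function names, the tiny training set, and the add-one smoothing are all my own assumptions for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count, per class, how many messages contain each word.

    `messages` is a list of (text, is_spam) pairs (hypothetical training data).
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(set(text.lower().split()))
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Combine per-word evidence with Bayes' rule, working in log space."""
    n = totals[True] + totals[False]
    log_spam = math.log(totals[True] / n)   # prior: fraction of spam seen so far
    log_ham = math.log(totals[False] / n)   # prior: fraction of good mail
    for word in set(text.lower().split()):
        # Add-one smoothing so an unseen word never yields a zero probability.
        log_spam += math.log((counts[True][word] + 1) / (totals[True] + 2))
        log_ham += math.log((counts[False][word] + 1) / (totals[False] + 2))
    # Turn the two log scores into a probability between 0 and 1.
    return 1 / (1 + math.exp(log_ham - log_spam))

# Made-up training messages, purely for illustration.
training = [
    ("cheap pills buy now", True),
    ("win money now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
counts, totals = train(training)

print(spam_probability("buy cheap pills", counts, totals))
print(spam_probability("monday meeting", counts, totals))
```

Latent semantic analysis starts from similar word counts but does something quite different with them: it arranges them in a word-by-message matrix, reduces that matrix with a singular value decomposition, and compares new messages to known spam in the reduced space rather than multiplying per-word probabilities.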

While I’m posting about it, I should note that the article was written prior to the release of my new favorite anti-spam tool, Spamnix, and so it doesn’t include it in the roundup. From my own experience with Mac OS anti-spam tools, I think Spamnix would have done well in the evaluation, with the caveat that it only works with Eudora. Perhaps Geoff Duncan, or someone else at TidBITS, will review it soon and confirm that guess. I know they like Eudora at TidBITS; they literally wrote the book!

Spamnix, My New Anti-Spam Tool

Update: Although it remains an excellent tool, I no longer recommend Spamnix. Spamnix 1.2 was not enough, and I found more effective tools while Spamnix 3 was in development. See my Personal Survey of Anti-Spam Tools for more details and recommendations.

Yesterday a new anti-spam tool shipped, Spamnix, which functions as a plug-in to Eudora, on either Mac OS X or Windows. After installing it and using it to check e-mail a couple times, I’ve decided to abandon my old tool, Spamfire.

The reason is pretty simple. Spamfire is fairly effective, but its design means my e-mail is processed twice. First Spamfire downloads and scans my messages, deleting those it considers spam. Then Eudora downloads whatever Spamfire lets through. Spamfire integrates with an e-mail client via the POP3/SMTP mail server, with AppleScripts to trigger the client’s e-mail check. Overall this works fine, but because Spamfire is a separate application the whole process is slow and cumbersome. It would be better if Spamfire itself were not as slow as molasses, but, well, it is as slow as molasses.

While it’s true that Spamnix can only be used with Eudora, I’ve been using Eudora for so many years the possibility of switching to something else is near zero. So my only consideration is how well it integrates.

Spamnix does that beautifully. My e-mail downloads as normal, but messages are scanned during the download process. Messages which exceed the spam threshold are filtered to a separate mailbox, for later review. The rest go to my Inbox as normal. No two-stage mail downloading and processing, no switching to a separate application to review the caught spam for false positives, no hassle rescuing the few false positives that do turn up.

One of the other selling points for me (and here’s where you can tell I’m a nerd) is that Spamnix is based on SpamAssassin, the extremely well-regarded Open Source spam tagging tool written in Perl. While Spamnix currently appears to be using only the text scanning part of SpamAssassin, I am very hopeful and excited that it may soon support the Bayesian filtering and Vipul’s Razor collaborative spam tracking capabilities of the latest SpamAssassin.
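If you’re wondering what “text scanning” with a spam threshold looks like, here’s a minimal sketch of points-based scoring in the general style SpamAssassin uses: each rule that matches adds points, and a message whose total exceeds the threshold gets filed as spam. The rule names, patterns, point values, and the 5.0 threshold below are hypothetical stand-ins of mine, not SpamAssassin’s or Spamnix’s actual rules.

```python
import re

# Hypothetical rules as (name, pattern, points); not a real rule set.
RULES = [
    ("MENTIONS_FREE", re.compile(r"\bfree\b", re.IGNORECASE), 1.5),
    # Subject line with no lowercase letters at all.
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [^a-z]+$", re.MULTILINE), 2.0),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 2.5),
]

THRESHOLD = 5.0  # assumed cutoff, chosen for this example

def score(message):
    """Return the total points and the names of the rules that matched."""
    hits = [(name, points) for name, pattern, points in RULES
            if pattern.search(message)]
    total = sum(points for _, points in hits)
    return total, [name for name, _ in hits]

def route(message):
    """File a message under 'Spam' if its score exceeds the threshold."""
    total, _ = score(message)
    return "Spam" if total > THRESHOLD else "Inbox"

# Made-up sample messages.
junk = "Subject: FREE MONEY NOW\n\nClick here for your free prize"
real = "Subject: Lunch on Monday\n\nAre you free at noon?"

print(route(junk), score(junk))
print(route(real), score(real))
```

Notice that the legitimate message still trips one rule (it contains the word “free”) but stays well under the threshold. That’s the appeal of points over simple Boolean matching: no single rule condemns a message on its own.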

At any rate, if you’re a Eudora user on either Mac OS X or Windows, and it’s worth $30 to you to block most of the spam you’re currently receiving, you should give Spamnix a try. The software is downloadable for free, and functions for 30 days before requiring a license key for further use.

But if you’re like me (I get well over 200 spams every day), it won’t take 30 days to convince you that $30 is a small price to pay. I decided in less than 24 hours!