Corpus reset

by Michael Alderete on 4/26/2005

SpamSieve, by far the best anti-spam email tool I’ve used, was updated to version 2.3 yesterday. The biggest change listed was increased accuracy, due to improvements in the tokenizers and parsers. John Gruber reported that the beta versions were running at 99.9% accuracy for him, which is several tenths of a percent above where I’d peaked.

When you get more than one thousand spams a week, you live for improvements of a couple of tenths of a percent. I of course upgraded immediately.

Read the rest of this entry (281 words) »

{ Comments on this entry are closed }

Comment form fakeout

by Michael Alderete on 3/13/2005 · 4 comments

When I converted this site to WordPress, I decided to turn on commenting, and see what happened. I have gotten a fair number of really good comments, and from people I didn’t know, which was cool. I also got a ton of comment spam (most of which never made it online). Not cool.

So I did a few things about it.

Read the rest of this entry (213 words) »

Spam counts for 2004

by Michael Alderete on 2/22/2005

2004 was a big year for spam, after Congress voted to make it legal at the end of 2003. The result: spam increased sharply in 2004.

But in my own, more personal battles with spam I’ve been more successful at holding back the tide. My stats for 2004:

Filtered Mail
36278 Good Messages
72239 Spam Messages (67%)
197 Spam Messages Per Day

SpamSieve Accuracy
135 False Positives
451 False Negatives (77%)
99.5% Correct

More than seventy-two thousand spam messages came at me, but thanks to SpamSieve a mere 451 made it into my Inbox. That’s less than two spams a day. Simply amazing.
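For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the percentages from the raw counts. The counts are copied from the stats above. Reading the "(77%)" as the share of all errors that were false negatives is my assumption, but it matches the numbers.

```python
# Counts copied from the 2004 stats above.
good = 36278
spam = 72239
false_positives = 135   # good mail wrongly filed as spam
false_negatives = 451   # spam that reached the Inbox

total = good + spam
errors = false_positives + false_negatives

spam_share = spam / total                 # fraction of all mail that was spam
accuracy = (total - errors) / total       # fraction of mail classified correctly
fn_share = false_negatives / errors       # assumed meaning of the "(77%)" figure
missed_per_day = false_negatives / 366    # 2004 was a leap year

print(f"{spam_share:.0%} spam, {accuracy:.1%} correct, "
      f"{fn_share:.0%} of errors were false negatives, "
      f"{missed_per_day:.2f} missed spams per day")
```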

Read the rest of this entry (226 words) »

Done digging for a while

by Michael Alderete on 1/8/2005

I spent a couple of hours yesterday working on a few last lingering details for this site. The main changes I wanted to make were to upgrade to the latest version of WordPress (a minor security update), make sure I was using the latest version of the Kubrick template (I was), and most importantly, fix the problems I was having with the Kubrick comments form, which is a lot cleaner and nicer than the standard WordPress version.

Read the rest of this entry (308 words) »

Personal survey of anti-spam tools

by Michael Alderete on 1/7/2005 · 12 comments

In the three or four years I’ve been fighting unwanted e-mail messages with better tools than the Delete key I’ve tried almost a dozen different tools. This is a quick (ha!) survey of the ones I’ve used, and why I don’t (or do) still use them.

My very first anti-spam tool was something called Mailfilter. I used it for my personal e-mail on Mac OS X, wrote about it here, and almost immediately afterwards lost a non-spam message to an aggressive keyword match. That was the end of Mailfilter. I can’t even remotely recommend it, as it’s just not intelligent enough (strict, single expression matching), and has zero safety net.

My next attempt at a solution was a utility called SpamFire. Like Mailfilter, it is a “pre-filter,” which means it would run before my e-mail client, download my mail, and skim out the spam. Unlike Mailfilter, it actually saved the trapped messages, so if it made a mistake, I could recover the message. It had plenty of other differences from Mailfilter, which I wrote about previously, and which made it so useful that it became the first anti-spam tool I paid for. But in the end I switched to a different tool because SpamFire was separate from my e-mail client, and that made it cumbersome to use.

Read the rest of this entry (2,185 words) »

Spam count so far this year

by Michael Alderete on 3/29/2004 · 1 comment

With Q1-2004 coming to a close, I thought I’d take a look at my spam situation, which has been escalating out of control. Since 12:01am January 1, 2004 I have received 22,255 spam messages via e-mail. That’s more than 250 a day, every day, for the last 89 days. Earlier in the year, the daily average was lower, which means that in the last couple weeks it’s gone well above 250 per day.
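The daily average is easy to reproduce, and the same arithmetic shows why a lower early average forces a much higher recent one. In this rough sketch the message count and dates come from the post, while the 230-per-day figure for the first 75 days is purely hypothetical, picked only to illustrate the reasoning.

```python
from datetime import date

spam_received = 22255
start = date(2004, 1, 1)
end = date(2004, 3, 29)              # the date of this post

days = (end - start).days + 1        # count both the first and the last day
per_day = spam_received / days

# Hypothetical: suppose the first 75 days averaged only 230 spams a day.
early_days = 75
early_rate = 230
recent_days = days - early_days
recent_rate = (spam_received - early_days * early_rate) / recent_days

print(days, round(per_day, 1), recent_days, round(recent_rate, 1))
```

Under that made-up early rate, the last two weeks would have to average well over 350 a day to produce the overall figure.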

In spite of these numbers, I have two things that give me hope.

First, SpamSieve is an amazing anti-spam filter that integrates well with Eudora. It’s far more reliable than the built-in SpamWatch feature that debuted in Eudora 6, primarily in the area of false positives (real messages mistakenly filtered out):

Filtered Mail

13565 Good Messages
22255 Spam Messages (62%)

SpamSieve Accuracy

21 False Positives
197 False Negatives (90%)
99.4% Correct

SpamSieve is award-winning software for Mac OS X, and it integrates beautifully with both Eudora and Mailsmith, the two best e-mail clients for the platform. I am getting to the point where I trust SpamSieve enough to just purge filtered e-mail without reviewing it.

Without SpamSieve, I would be going insane because of spam.

The second thing I have on my side is that more than half of my spam comes to one e-mail address, the oldest e-mail address I still use. If I were able to kill it, it would instantly cut off more than half of the spam. But, it’s the first permanent e-mail address I ever got, using the excellent mail forwarding service. I’ve had it for almost 15 years. Because it’s so old, I’m extremely reluctant to part with it — what if that’s the only address a long lost friend has?

Well, it looks like I can have my cake and eat it too. The forwarding service just introduced new spam filtering controls and services, which are far more effective than the old filters that were enabled on my account. Last night I turned them on, and already the amount of spam coming into my e-mail address has dropped to almost zero.

I wouldn’t exactly call this the turn of the tide, but it’s certainly encouraging. Because it’s my only hope to avoid having to look at 100,000 spam messages in 2004, which is where the growth curve points, if there isn’t change.

I’ll let you know how it’s looking when Q2 is over.

Save me from the bounces!

by Michael Alderete on 1/31/2004 · 1 comment

I have over the last two years implemented, I think, a dozen different anti-spam technologies to protect my Inbox. (I’ll total them up and summarize my thoughts in another post.) Today I finished implementing yet another, called SPF, or Sender Permitted From (now renamed to “Sender Policy Framework”).

The idea is, if my e-mail address is “michael a-t”, then there are only a few servers on the internet that are likely, or permitted, to send e-mail for the domain. When you receive an e-mail from that address or domain, if you knew which servers on the internet were legitimate senders, then you could reject messages from all other servers.

This is useful because it’s common practice by spammers to forge the From: header of their spam messages, and because they are almost never able to send those messages from the real server for the domain. (This is why bouncing spam back to the sender just makes the spam problem worse.)
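The check described above can be sketched in a few lines. This is only an illustration of the idea, not a real SPF implementation. The domain `example.com`, the address ranges, and the `check_sender` function are all hypothetical, and the policy sits in a dictionary here, where real SPF publishes it in a DNS TXT record such as `v=spf1 ip4:192.0.2.0/24 -all`.

```python
import ipaddress

# Hypothetical published policy: domain -> networks allowed to send its mail.
# Real SPF stores this in DNS; a dictionary stands in for the lookup here.
PERMITTED_SENDERS = {
    "example.com": [ipaddress.ip_network("192.0.2.0/24")],
}

def check_sender(sender_domain: str, connecting_ip: str) -> str:
    """Return 'pass', 'fail', or 'none' for a message claiming sender_domain."""
    networks = PERMITTED_SENDERS.get(sender_domain)
    if networks is None:
        return "none"    # the domain publishes no policy, so there is no opinion
    ip = ipaddress.ip_address(connecting_ip)
    return "pass" if any(ip in net for net in networks) else "fail"

# The same claimed sender, arriving from two different servers.
print(check_sender("example.com", "192.0.2.25"))     # the domain's own server
print(check_sender("example.com", "203.0.113.77"))   # some other server
```

A receiving server that gets "fail" can reject the message outright instead of accepting it and bouncing it later to a forged address.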

I had incentive to do this because one of my e-mail address domains has been forged heavily recently (though not quite “Joe Job”ed), with thousands of e-mails being sent out with forged From addresses. When the spams bounce back, they come to my Inbox. Thousands of them.

Now, SPF isn’t a panacea for this problem, mostly because there has not been a lot of deployment of the technology yet. But that’s coming; AOL recently began trialing it, and if it’s successful I am sure the other big ISPs will do so soon.

When they do, I’ll be ready to reap the benefits.

Eudora 6 with SpamWatch

by Michael Alderete on 9/10/2003

Note: Although they are still terrific tools, and in the case of SpamWatch free and built-in, I no longer use either Spamnix or Eudora’s SpamWatch, having found more effective tools. See my Personal Survey of Anti-Spam Tools for more details and recommendations.

QUALCOMM’s Eudora has been my e-mail client of choice for nearly 10 years, and last week a major new version shipped, Eudora 6. I’m usually of the “fools rush in” school of thought with regard to software updates, so I waited to see what people were saying about the upgrade (MacInTouch is a great resource for these “reader reports”).

But it’s been a week, and nary a peep. And with the amount of spam I receive continuing to grow, I really wanted to try the new SpamWatch feature. So, after doing multiple backups, I upgraded myself over the weekend.

My primary concern was whether and how my other anti-spam tool, Spamnix, would work with the new version, especially with the new SpamWatch feature. Unlike a lot of other third-party anti-spam tools, Spamnix is a Eudora plug-in, and so runs “in-process” (i.e., inside Eudora). [Update: SpamSieve 2 just shipped, and now also includes a Eudora plug-in. Very cool!] This makes it more efficient, but also (in theory) more susceptible to compatibility issues.

I’m happy, nay, thrilled to report that Spamnix works fine with Eudora 6 (for Mac OS X), and that Spamnix + SpamWatch is more effective than either tool alone.

I love the way that SpamWatch and Spamnix tag-team to combat spam. SpamWatch gets first crack, before other filters or plug-ins look at the message, and if the message’s score is over the spam threshold, it will be filtered into the Junk mailbox, with no further processing. (Qualcomm designed SpamWatch to run first, and you can’t change that.)

If a message doesn’t get caught by SpamWatch, then Spamnix takes a look at it, and if Spamnix decides it’s spam, it’ll go into Spamnix’s own spam folder (on my system named “ Spamnix”; note the initial space to influence sort order). These messages, nicely separated and usually all spam, are prime candidates for further training for SpamWatch.
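The two-tier routing just described looks roughly like this in code. It is a sketch of the flow only. The scoring and rule functions are toy stand-ins I made up, the threshold value is arbitrary, and none of this reflects how SpamWatch or Spamnix actually score a message.

```python
from typing import Callable

def route_message(message: str,
                  spamwatch_score: Callable[[str], float],
                  spamnix_is_spam: Callable[[str], bool],
                  threshold: float = 50.0) -> str:
    """Return the mailbox a message lands in under the two-tier setup."""
    # Tier 1: SpamWatch always runs first; over the threshold means Junk,
    # with no further processing.
    if spamwatch_score(message) > threshold:
        return "Junk"
    # Tier 2: Spamnix only sees what SpamWatch let through.
    if spamnix_is_spam(message):
        return " Spamnix"    # leading space keeps the folder sorted first
    return "Inbox"

# Toy stand-ins for the two filters (hypothetical, for illustration only).
def toy_score(msg: str) -> float:
    return 90.0 if "free money" in msg.lower() else 10.0

def toy_rules(msg: str) -> bool:
    return "act now" in msg.lower()

for msg in ["FREE MONEY inside", "Act now, limited offer", "Lunch on Friday?"]:
    print(repr(route_message(msg, toy_score, toy_rules)))
```

Anything that lands in the second folder is spam the first tier missed, which is exactly why it makes good training material.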

I receive hundreds of spam messages a day, but after two tiers of spam filtering very little spam gets to my Inbox — so far only a couple a day, with very little training of SpamWatch yet. The few that have made it through have gone straight back to SpamWatch for training. :-)

What is fascinating about this process is the progress that SpamWatch has made, in less than 4 days of processing my mail. The first time I downloaded a sizeable batch of e-mail (more than 50 messages), most of the spam got through SpamWatch, and was caught by Spamnix. After training SpamWatch with those messages, and then downloading another big batch a few hours later, the ratio went the other way: SpamWatch was now catching most spam before Spamnix got a chance to look at it.

I’m still glad to have both layers. Spamnix was extremely effective at catching my spam, prior to SpamWatch being added to the mix, and it’s still catching spam that SpamWatch is missing. So overall, I am doing better in my personal war against spam (though it’s important to remember that this is defensive action only).

About the only downside of introducing SpamWatch as a new layer of anti-spam defense is that right now it’s relatively untrained, and generating a larger number of false positives (non-spams filed in the Junk folder) than I’m used to. SpamWatch ships “pre-trained”, meaning it already has a database of spam words to run against, but this list is generic, not customized to my own e-mail traffic. So it’s not that surprising that some of Rochelle’s e-mails are getting tagged as spam. My previous experience with Bayesian filtering is that it rapidly adjusts as you correct its mistakes, so I’m confident the false positives will go down in a week or so.
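To show why a generic word list misfires and why one correction helps so quickly, here is a toy naive Bayes filter. It is a textbook-style sketch under my own assumptions (whitespace tokens, add-one smoothing, a hypothetical `TinyBayes` class, and made-up training messages), not a description of what SpamWatch does internally.

```python
import math
from collections import Counter

class TinyBayes:
    """Toy naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    def train(self, text: str, label: str) -> None:
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["good"])) or 1
        total_messages = sum(self.messages.values())
        scores = {}
        for label in ("spam", "good"):
            # Smoothed prior for the class, then smoothed per-token likelihoods.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            token_total = sum(self.tokens[label].values())
            for tok in text.lower().split():
                score += math.log((self.tokens[label][tok] + 1) / (token_total + vocab))
            scores[label] = score
        # Turn the two log scores into a probability (fine for short messages).
        return 1 / (1 + math.exp(scores["good"] - scores["spam"]))

f = TinyBayes()
f.train("free money offer", "spam")          # generic "pre-trained" data
f.train("meeting agenda attached", "good")

note = "free dinner on saturday"             # a personal message
before = f.spam_probability(note)            # leans spam because of "free"
f.train(note, "good")                        # correct the mistake once
after = f.spam_probability(note)

print(before > 0.5, after < 0.5)
```

The generic data only knows "free" as a spam word, so the personal note gets flagged. One correction teaches the filter the other words in the note, and the score drops below the halfway mark.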

At any rate, I’m quite happy with the new version, especially since I was still in my 12 month support period from my last upgrade, so version 6 was free. Recommended, even if you have to pay for it.

Pete Wellborn for senator

September 10, 2003

Pete Wellborn is the attorney representing the defendants in a recent nuisance lawsuit filed by a group of spammers against some of the better-known — and more effective — anti-spam resources and groups, such as Spamhaus and SPEWS. His motion to dismiss the case was so effective that the plaintiffs are now trying to back out of the case.

Read the full article →

SpamBayes for Outlook

May 18, 2003

A while back I recommended an Outlook plug-in called SpamNet, from Cloudmark. At the time, it was a free tool for Outlook users to block spam, and it worked quite reliably. Sadly, it’s no longer free. I get so little spam at work (where my e-mail address is relatively unpublished) that I can’t justify buying a subscription. Fortunately, I have found another solution at least as good.

Read the full article →

Can I kiss Eliot Spitzer?

May 14, 2003

Can I vote for him for President?

Read the full article →

Latent semantic analysis is not Bayesian filtering

May 4, 2003

Macworld recently ran an article about anti-spam tools for Mac OS X, which incorrectly simplified the world of anti-spam tools down to Boolean, points-based, and Bayesian filters. There are at least two more categories of anti-spam tools.

Read the full article →