Corpus Reset

“SpamSieve”:http://c-command.com/spamsieve/, by far “the best anti-spam email tool I’ve used”:/blog/410, was “updated to version 2.3”:http://mjtsai.com/blog/2005/04/25/spamsieve-23/ yesterday. The biggest change listed was increased accuracy, due to improvements in the tokenizers and parsers. John Gruber reported that the “beta versions were running at 99.9% accuracy”:http://daringfireball.net/linked/2005/april#mon-25-spamsieve for him, which is several tenths of a percent above where I’d peaked.

When you get more than one thousand spams a week, you _live_ for improvements of a couple of tenths of a percent. I of course upgraded immediately.

Comment Form Fakeout

When I converted this site to WordPress, I decided to turn on commenting, and see what happened. I have gotten a fair number of really good comments, and from people I didn’t know, which was cool. I also got a ton of comment spam (most of which never made it online). Not cool.

So I did a few things about it.

Spam Counts for 2004

2004 was a big year for spam, after “Congress voted to make it legal”:http://www.spamhaus.org/position/CAN-SPAM_Act_2003.html at the end of 2003. The result: “spam increased sharply in 2004”:http://www.ecommercetimes.com/story/business/can-spam-act-40216.html. But in my own, more personal battles with spam I’ve been more successful at holding back the tide.

My stats for 2004:

Filtered Mail

36278 Good Messages
72239 Spam Messages (67%)
197 Spam Messages Per Day

SpamSieve Accuracy

135 False Positives
451 False Negatives (77%)
99.5% Correct

More than seventy-two thousand spam messages came at me, but thanks to SpamSieve a mere 451 made it into my Inbox. That’s fewer than two spams a day. Simply amazing.
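The percentages in the stats above are not labeled, but they appear to follow from the raw counts: 67% looks like spam's share of all filtered mail, and 77% looks like the share of filtering mistakes that were false negatives. A minimal sketch of that arithmetic, assuming those interpretations and a 366-day year (the variable names are illustrative only):

```python
# Counts copied from the 2004 stats above.
good = 36278
spam = 72239
false_positives = 135
false_negatives = 451
days_in_2004 = 366  # 2004 was a leap year

total = good + spam
errors = false_positives + false_negatives

spam_share = spam / total                       # share of all mail that was spam
spam_per_day = spam / days_in_2004              # average spam messages per day
fn_share_of_errors = false_negatives / errors   # share of mistakes that were missed spam
accuracy = 1 - errors / total                   # share of all mail filed correctly
missed_per_day = false_negatives / days_in_2004

print(f"{spam_share:.0%} spam, {spam_per_day:.0f} per day")
print(f"{fn_share_of_errors:.0%} of errors were false negatives")
print(f"{accuracy:.1%} correct, {missed_per_day:.1f} missed spams per day")
```

Under those assumptions the results round to the published figures: 67%, 197 per day, 77%, and 99.5%.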

Done Digging for a While

I spent a couple of hours yesterday working on a few last lingering details for this site. The main changes I wanted to make were to upgrade to the latest version of WordPress (a minor security update), make sure I was using the latest version of the Kubrick template (I was), and most importantly, fix the problems I was having with the Kubrick comments form, which is a lot cleaner and nicer than the standard WordPress version.

Personal Survey of Anti-spam Tools

In the three or four years I’ve been fighting unwanted e-mail messages with better tools than the Delete key I’ve tried almost a dozen different tools. This is a quick (ha!) survey of the ones I’ve used, and why I don’t (or do) still use them.

My very first anti-spam tool was something called Mailfilter. I used it for my personal e-mail on Mac OS X, wrote about it here, and almost immediately afterwards lost a non-spam message to an aggressive keyword match. That was the end of Mailfilter. I can’t even remotely recommend it, as it’s just not intelligent enough (strict, single-expression matching), and it has no safety net.

My next attempt at a solution was a utility called SpamFire. Like Mailfilter, it is a “pre-filter,” which means it would run before my e-mail client, download my mail, and skim out the spam. Unlike Mailfilter, it actually saved the trapped messages, so if it made a mistake, I could recover the message. It had plenty of other differences from Mailfilter, which I wrote about previously, and which made it so useful that it became the first anti-spam tool I paid for. But in the end I switched to a different tool because SpamFire was separate from my e-mail client, and that made it cumbersome to use.

Spam Count So Far This Year

With Q1-2004 coming to a close, I thought I’d take a look at my spam situation, which has been escalating out of control. Since 12:01am January 1, 2004 I have received 22,255 spam messages via e-mail. That’s more than 250 a day, every day, for the last 89 days. Earlier in the year, the daily average was lower, which means that in the last couple weeks it’s gone well above 250 per day. In spite of these numbers, I have two things that give me hope.

First, SpamSieve is an amazing anti-spam filter that integrates well with Eudora. It’s far more reliable than the built-in SpamWatch feature that debuted in Eudora 6, primarily in the area of false positives (real messages mistakenly filtered out):

Filtered Mail

13565 Good Messages
22255 Spam Messages (62%)

SpamSieve Accuracy

21 False Positives
197 False Negatives (90%)
99.4% Correct

SpamSieve is award-winning software for Mac OS X, and it integrates beautifully with both Eudora and Mailsmith, the two best e-mail clients for the platform. I am getting to the point where I trust SpamSieve enough to just purge filtered e-mail without reviewing it.

Without SpamSieve, I would be going insane because of spam.

The second thing I have on my side is that more than half of my spam comes to one e-mail address, the oldest e-mail address I still use. If I were able to kill it, it would instantly cut off more than half of the spam. But, it’s the first permanent e-mail address I ever got, using the excellent pobox.com mail forwarding service. I’ve had it for almost 15 years. Because it’s so old, I’m extremely reluctant to part with it — what if that’s the only address a long lost friend has?

Well, it looks like I can have my cake and eat it too. pobox.com just introduced new spam filtering controls and services, which are far more effective than the old filters that were enabled on my account. Last night I turned them on, and already the amount of spam coming into my pobox.com e-mail address has dropped to almost zero.

I wouldn’t exactly call this the turn of the tide, but it’s certainly encouraging. It’s my only hope of avoiding 100,000 spam messages in 2004, which is where the growth curve points if nothing changes.

I’ll let you know how it’s looking when Q2 is over.

Save Me From the Bounces!

I have over the last two years implemented, I think, a dozen different anti-spam technologies to protect my Inbox. (I’ll total them up and summarize my thoughts in another post.) Today I finished implementing yet another, called SPF, or Sender Permitted From (now renamed to “Sender Policy Framework”).

The idea is, if my e-mail address is “michael a-t alderete.com”, then there are only a few servers on the internet that are likely, or permitted, to send e-mail for the alderete.com domain. When you receive an e-mail from that address or domain, if you knew which servers on the internet were legitimate senders, then you could reject messages from all other servers.

This is useful because it’s common practice by spammers to forge the From: header of their spam messages, and because they are almost never able to send those messages from the real server for the domain. (This is why bouncing spam back to the sender just makes the spam problem worse.)
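The core idea can be sketched in a few lines. This is only an illustration, not a real SPF implementation: actual SPF policies are published as DNS records and have their own syntax, which is not shown here. The `PERMITTED_SENDERS` table, the `check_sender` function, the `example.com` domain, and the addresses (taken from reserved documentation ranges) are all hypothetical.

```python
import ipaddress

# Hypothetical published policy: which networks may send mail for each domain.
# Real SPF policies live in DNS; this in-memory table only stands in for them.
PERMITTED_SENDERS = {
    "example.com": ["192.0.2.0/28", "198.51.100.25/32"],
}

def check_sender(domain: str, connecting_ip: str) -> str:
    """Return 'pass', 'fail', or 'none' for a claimed sender domain."""
    networks = PERMITTED_SENDERS.get(domain.lower())
    if networks is None:
        return "none"  # the domain publishes no policy, so nothing can be concluded
    ip = ipaddress.ip_address(connecting_ip)
    for net in networks:
        if ip in ipaddress.ip_network(net):
            return "pass"  # the connecting server is on the permitted list
    return "fail"  # the domain has a policy and this server is not in it

print(check_sender("example.com", "192.0.2.5"))
print(check_sender("example.com", "203.0.113.77"))
print(check_sender("unlisted.example", "203.0.113.77"))
```

A receiving server that gets "fail" has good reason to reject the message outright rather than accept it and bounce it later to a forged address.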

I had incentive to do this because one of my e-mail address domains, alderete.com, has been forged heavily recently (though not quite Joe Jobed), with thousands of e-mails being sent out with forged From addresses like “Tammeravxryawwv [at] alderete.com” and “Glenniedatjklcjyknai [at] alderete.com”. When the spams bounce back, they come to my Inbox. Thousands of them.

Now, SPF isn’t a panacea for this problem, mostly because there has not been a lot of deployment of the technology yet. But that’s coming; AOL recently began trialing it, and if it’s successful I am sure the other big ISPs will do so soon.

When they do, I’ll be ready to reap the benefits.

Eudora 6 with SpamWatch

Note: Although they are still terrific tools (and, in the case of SpamWatch, free and built-in), I no longer use either Spamnix or Eudora’s SpamWatch, having found more effective tools. See my Personal Survey of Anti-Spam Tools for more details and recommendations.

QUALCOMM’s Eudora has been my e-mail client of choice for nearly 10 years, and last week a major new version shipped, Eudora 6. I’m usually of the “fools rush in” school of thought with regards to software updates, so I waited to see what people were saying about the upgrade (MacInTouch is a great resource for these “reader reports”).

But it’s been a week, and nary a peep. And with the amount of spam I receive continuing to grow, I really wanted to try the new SpamWatch feature. So, after doing multiple backups, I upgraded myself over the weekend.

My primary concern was whether and how my other anti-spam tool, Spamnix, would work with the new version, especially with the new SpamWatch feature. Unlike a lot of other third-party anti-spam tools, Spamnix is a Eudora plug-in, and so runs “in-process” (i.e., inside Eudora itself). [Update: SpamSieve 2 just shipped, and now also includes a Eudora plug-in. Very cool!] This makes it more efficient, but also (in theory) more susceptible to compatibility issues.

I’m happy, nay, thrilled to report that Spamnix works fine with Eudora 6 (for Mac OS X), and that Spamnix + SpamWatch is more effective than either tool alone.

I love the way that SpamWatch and Spamnix tag-team to combat spam. SpamWatch gets first crack, before other filters or plug-ins look at the message, and if the message’s score is over the spam threshold, it will be filtered into the Junk mailbox, with no further processing. (QUALCOMM designed SpamWatch to run first, and you can’t change that.)

If a message doesn’t get caught by SpamWatch, then Spamnix takes a look at it, and if Spamnix decides it’s spam, it’ll go into Spamnix’s own spam folder (on my system named “_Spamnix”; note the initial space to influence sort order). These messages, nicely separated and usually all spam, are prime candidates for further training for SpamWatch.
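The two-tier flow described above can be sketched as a simple routing function. The two `*_is_spam` functions are hypothetical keyword checks standing in for SpamWatch and Spamnix, whose real scoring is not reproduced here. The mailbox names follow the text, with "Inbox" assumed as the default destination.

```python
from typing import Callable

# Hypothetical stand-ins for the two filters. The real tools use their own
# scoring, not these toy keyword checks.
def first_filter_is_spam(message: str) -> bool:
    return "pills" in message.lower()

def second_filter_is_spam(message: str) -> bool:
    return "lottery" in message.lower()

def route(message: str,
          first: Callable[[str], bool] = first_filter_is_spam,
          second: Callable[[str], bool] = second_filter_is_spam) -> str:
    """Return the mailbox a message ends up in."""
    if first(message):
        return "Junk"      # first tier wins; the second tier never sees the message
    if second(message):
        return "_Spamnix"  # missed by the first tier, so a candidate for retraining it
    return "Inbox"

print(route("Cheap pills and a lottery win"))
print(route("You have won the lottery"))
print(route("Lunch tomorrow?"))
```

Because the first tier short-circuits, anything landing in the second mailbox is by definition something the first filter missed, which is what makes it useful training material.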

I receive hundreds of spam messages a day, but after two tiers of spam filtering very little spam gets to my Inbox — so far only a couple a day, with very little training of SpamWatch yet. The few that have made it through have gone straight back to SpamWatch for training. :-)

What is fascinating about this process is the progress that SpamWatch has made, in less than 4 days of processing my mail. The first time I downloaded a sizeable batch of e-mail (more than 50 messages), most of the spam got through SpamWatch, and was caught by Spamnix. After training SpamWatch with those messages, and then downloading another big batch a few hours later, the ratio went the other way: SpamWatch was now catching most spam before Spamnix got a chance to look at it.

I’m still glad to have both layers. Spamnix was extremely effective at catching my spam, prior to SpamWatch being added to the mix, and it’s still catching spam that SpamWatch is missing. So overall, I am doing better in my personal war against spam (though it’s important to remember that this is defensive action only).

About the only downside of introducing SpamWatch as a new layer of anti-spam defense is that right now it’s relatively untrained, and generating a larger number of false positives (non-spams filed in the Junk folder) than I’m used to. SpamWatch ships “pre-trained”, meaning it already has a database of spam words to run against, but this list is generic, not customized to my own e-mail traffic. So it’s not that surprising that some of Rochelle’s e-mails are getting tagged as spam. My previous experience with Bayesian filtering is that it rapidly adjusts as you correct its mistakes, so I’m confident the false positives will go down in a week or so.

At any rate, I’m quite happy with the new version, especially since I was still in my 12-month support period from my last upgrade, so version 6 was free. Recommended, even if you have to pay for it.

Pete Wellborn for Senator

Pete Wellborn is the attorney representing the defendants in a recent nuisance lawsuit filed by a group of spammers against some of the better-known — and more effective — anti-spam resources and groups, such as Spamhaus and SPEWS.

His motion to dismiss the case was so effective that the plaintiffs are now trying to back out of the case, so they can avoid having to pay their opponents’ legal fees, which they’re likely to have to do. Pete’s not going to let them do that.

Wellborn has been so effective at racking up successes against spammers, to the tune of multi-million dollar judgments, that he’s called the “Spammer Hammer.” And after defending on this lawsuit, he’s switching to offense, to run down the toads behind it for their spamming activities.

Go get ’em, Hammer!

SpamBayes for Outlook

A while back I recommended an Outlook plug-in called SpamNet, from Cloudmark. At the time, it was a free tool that let Outlook users block spam, and it worked quite reliably. Sadly, it’s no longer free. I get so little spam at work (where my e-mail address is relatively unpublished) that I can’t justify buying a subscription.

I do still get some spam, though. Fortunately, Jon Udell’s recent weblog entries and review at InfoWorld turned me on to a replacement that is free, and will remain so (it’s Open Source): SpamBayes.

Like SpamNet, it can be installed as an Outlook plug-in, and easily used via buttons on Outlook’s toolbar. But the technology behind it is very different, as it uses Bayesian filtering rather than distributed recognition. It’s also different in that the core project and recognition engine are command line-oriented. The Outlook-only plug-in is terrific, but only a side project. It’s not required, and there are plenty of ways for those who use something other than Outlook for e-mail to use SpamBayes.

You can read the review for a thorough look, but my experience was that it was just as easy to install as SpamNet, is extremely effective at blocking spam, and also produces fewer false positives. I think the reason for that is SpamNet uses other people’s spam reports to decide what to block in my Inbox, and there are a lot of people who just block e-mails they signed up for (newsletters, promos, etc.), rather than unsubscribe from them. Those false reports pollute the knowledge base, and affect my results. Bayesian filtering is exactly the opposite: it only cares what I think is spam.
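To make "it only cares what I think is spam" concrete, here is a toy naive Bayes filter that learns solely from messages one user has labeled. It is a textbook simplification and not SpamBayes's actual scoring. The `TinyBayesFilter` class, its whitespace tokenizer, and the sample messages are all invented for illustration.

```python
import math
from collections import Counter

class TinyBayesFilter:
    """Toy naive Bayes spam filter trained only on one user's own mail."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    @staticmethod
    def tokenize(text):
        # Simplistic tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        """Record a message the user has labeled 'spam' or 'good'."""
        self.token_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocabulary = set(self.token_counts["spam"]) | set(self.token_counts["good"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "good"):
            # Prior: smoothed share of training messages with this label.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            label_total = sum(self.token_counts[label].values())
            for token in self.tokenize(text):
                # Add-one smoothing so an unseen token never zeroes out a score.
                count = self.token_counts[label][token] + 1
                score += math.log(count / (label_total + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability that the message is spam.
        diff = log_scores["good"] - log_scores["spam"]
        if diff > 700:  # avoid overflow in exp(); the message is clearly good
            return 0.0
        return 1 / (1 + math.exp(diff))

mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("lunch meeting moved to noon", "good")
mail_filter.train("notes from the meeting", "good")

print(mail_filter.spam_probability("buy cheap pills") > 0.5)
print(mail_filter.spam_probability("meeting notes") < 0.5)
```

Every count in the model comes from this one user's `train` calls, so a newsletter that someone else reports as spam has no effect on the result.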

Can I Kiss Eliot Spitzer?

Can I vote for him for President?

Seriously, this is clearly an ambitious man with his finger squarely on the pulse of what’s making people feel crazy. He takes the pulse, he prosecutes cases against the bad guys. It’s great. I wish I had a politician so responsive in California. I predict that state attorney general is not the highest office he will ever hold.

Now, can I sign up to be on the jury?

Latent Semantic Analysis Is Not Bayesian Filtering

Macworld recently ran an article about anti-spam tools for Mac OS X, which incorrectly simplified the world of anti-spam tools down to Boolean, points-based, and Bayesian filters.

Two additional categories are distributed recognition, such as the Distributed Checksum Clearinghouse (DCC) and Vipul’s Razor, and latent semantic analysis. I don’t know of any distributed recognition products for the Mac (there’s a very good one for Windows Outlook, SpamNet by Cloudmark), but there certainly is a latent semantic analysis tool — Apple’s Mail in Jaguar!
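For contrast with per-user filters, distributed recognition can be sketched as a shared lookup: one person reports a message, and everyone else's copy of the same message is recognized by its digest. This is a deliberately crude illustration. The in-memory `reported_digests` set and the function names are hypothetical, and real services use their own checksum schemes designed to tolerate variations between copies of a message.

```python
import hashlib

# Hypothetical shared clearinghouse: digests of messages that other users
# have already reported as spam. A real service would be a network lookup.
reported_digests = set()

def digest(body: str) -> str:
    # Normalize case and whitespace so trivial edits do not change the digest.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def report_spam(body: str) -> None:
    """Called when any participant flags a message as spam."""
    reported_digests.add(digest(body))

def is_known_spam(body: str) -> bool:
    """Check an incoming message against everyone's earlier reports."""
    return digest(body) in reported_digests

report_spam("Buy   CHEAP pills\nnow")
print(is_known_spam("buy cheap pills now"))
print(is_known_spam("Lunch tomorrow?"))
```

Nothing here learns what spam looks like. The approach depends entirely on other recipients having seen and reported the same message first.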

The simplification (or oversight) is relatively understandable. From an end-user perspective, there’s no meaningful difference — even though the math is very different. It’s not clear which will prove better at filtering out spam, even though in the article Mail’s filtering did the best. Seems like it’s good to have both in the fight!

While I’m posting about it, I should note that the article was written prior to the release of my new favorite anti-spam tool, Spamnix, and so it doesn’t include it in the roundup. From my own experience with Mac OS anti-spam tools I think that, with the caveat that it only works with Eudora, it would have done well in the evaluation. Perhaps Geoff Duncan, or someone else at TidBITS, will review it soon, and confirm that guess. I know they like Eudora at TidBITS — they literally wrote the book!