Monday, March 19, 2007

My blog is spam

Over the weekend, I found out that Blogger had blocked my blog as spam. I could no longer add posts.



I had to go through:
  • Step 1 - a CAPTCHA to prove that I am indeed a human, followed by
  • Step 2 - a wait period while some human actually verified that my blog is not spam.



Good...
Step 1 filters out automatically created blogs, which also reduces the number of human reviewers at Google needed for Step 2.
Step 2 filters out spam blogs that were created by humans.

But the question bugging me is: why the hell did their algorithm think this blog was spam? I'm sure they use some probabilistic algorithm that detects splogs based on simple features. But what features do you think this blog has that could mark it as spam?

This is a case of a spam detection algorithm throwing up a false positive. False positives should be a big no-no for any spam or splog detection system. The algorithm should err on the side of false negatives rather than false positives: for example, it is better to have some spam show up in your inbox than to have good mail go straight to your junk folder.
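To make the trade-off concrete, here is a minimal sketch assuming the detector assigns each blog a spam probability and flags anything at or above a threshold. The scores, labels and the `count_errors` helper are made up for illustration; nothing here reflects how Blogger's actual splog detector works. Raising the threshold is one simple way to make a classifier err towards false negatives:

```python
# Hypothetical (spam_score, is_actually_spam) pairs from some classifier.
# The numbers are invented purely to illustrate the trade-off.
posts = [
    (0.95, True), (0.88, True), (0.72, True), (0.40, True),
    (0.65, False), (0.30, False), (0.15, False), (0.05, False),
]

def count_errors(posts, threshold):
    """Flag a post as spam when its score >= threshold; return (false_pos, false_neg)."""
    # Good posts wrongly flagged as spam.
    false_pos = sum(1 for score, is_spam in posts if score >= threshold and not is_spam)
    # Spam posts that slip through unflagged.
    false_neg = sum(1 for score, is_spam in posts if score < threshold and is_spam)
    return false_pos, false_neg

for threshold in (0.5, 0.9):
    fp, fn = count_errors(posts, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With this toy data, a threshold of 0.5 gives one false positive and one false negative, while 0.9 gives no false positives but lets three spam posts through: more spam in the inbox, but no innocent blog blocked.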

All this talk of false positives reminds me of the principle behind judicial systems around the world: "It's better for ten guilty men to go free than for one innocent man to be convicted." Same concept, different context.
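Reading that ten-to-one ratio as a cost ratio gives a rough decision threshold. As an illustrative assumption (not anything Blogger has said), say wrongly blocking a good blog costs $C_{FP} = 10$ and letting a splog through costs $C_{FN} = 1$. If $p$ is the estimated probability that a blog is spam, blocking has expected cost $(1-p)\,C_{FP}$ and not blocking has expected cost $p\,C_{FN}$, so blocking only pays off when:

$$
p \, C_{FN} > (1-p) \, C_{FP}
\iff
p > \frac{C_{FP}}{C_{FP} + C_{FN}} = \frac{10}{10 + 1} \approx 0.91
$$

In other words, under this assumption the detector should be more than 90% sure before blocking a blog.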