My opinion on using

Recently I read about a new service that promises to protect your email address against spam bots and email harvesters. It’s called scrim and advertises a safe, short and cute URL that replaces your email address in tweets or on social networking sites. Great Idea! (Yes, with capital I!)

Someone else has already given a good example of the weaknesses I wanted to show, and I didn’t even bother looking at the JavaScript code much. His “cracking” algorithm will surely be faster than mine. It consists of simply trying all nine options in sequence until the returned page no longer shows the error message. There is no limit on how often you can try to select the right answer, no timeout between attempts, and the POST can easily be converted to a simple GET.
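To put a number on why unlimited retries hurt, here is a small back-of-the-envelope sketch. It assumes the correct answer is equally likely to be any of the nine options and that the attacker never repeats a guess; the function names are my own and purely illustrative.

```python
from fractions import Fraction


def expected_tries(options: int) -> Fraction:
    """Average number of guesses needed when every option is tried once, in order.

    Assumes the right answer is uniformly distributed over the options.
    """
    return Fraction(sum(range(1, options + 1)), options)


def success_chance(options: int, allowed_tries: int) -> Fraction:
    """Chance of hitting the right answer when only `allowed_tries` guesses are permitted."""
    return Fraction(min(allowed_tries, options), options)


if __name__ == "__main__":
    print("average guesses with no limit:", expected_tries(9))
    print("chance with a single allowed guess:", success_chance(9, 1))
```

Under those assumptions a guesser needs only a handful of requests per address on average, while a cap on attempts would turn a guaranteed hit into a small chance.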

My first idea was a little more complicated and obviously slower. When I saw the CAPTCHA image I thought of how easy it would be to crack. Even a partial crack would be fine, since the obfuscated answers are different enough that three clearly recognized letters are sufficient, and the answer is never longer than five letters. Yes, the bottleneck would be in the OCR part of my algorithm and in submitting the answer, but both could be solved by running multiple threads.

Let’s cherry-pick from both ideas: use GET requests and multiple threads. Now add a little thing called search engine queries for anything that looks like one of those addresses… And bots, don’t they usually work in networks? So all you really need is a small number of bots running on multi-core systems: feed them a list of those addresses and wait a little while.

Don’t get me wrong, I like the idea of using a service like scrim. But only once its bot prevention truly cripples brute-force attempts. A traditional CAPTCHA isn’t that safe anymore. Even if few people can program a multi-threaded algorithm properly, it only takes one person on the “dark” side to write it, and millions will be able to imitate the trick a short while later.

Just now, things like the map-reduce approach popped into my mind. I’m too distracted to actually come up with a proof of concept right now, and I have no need for a list of email addresses. Just the idea that it isn’t that well protected made me write this.

What could be improved?

  • don’t give the answer. It doesn’t matter how much clutter is around it: a lucky guess or a little effort (compared to what I imagine a decent botnet is capable of) gets the attacker enough revenue.
  • restrict the number of tries
  • (temporary) blacklisting of IP addresses
  • vary between CAPTCHAs and other techniques to slow down bots

Once it gets out of beta I will take another look and see if my opinion is still the same. For now, I couldn’t have done better myself and kind of envy them for coming up with a good idea. Good thought, flawed execution (for now).