Comments and Cash

The rapid evolution and adoption of the MTBlacklist plug-in is a surprise to no one. Comments have been sitting there with scads of exploit potential since the moment the first decent-looking woman (who could type AND owned a digital camera) discovered the wonders of LiveJournal.

The MTBlacklist plug-in strikes me as a very good solution for the wrong problem. MTBlacklist will “block incoming comments/trackbacks with content matched by any one of the entries in the blacklist.” Groovy. It also features a “Default blacklist [that] contains over 400 known spam strings for immediate protection on install.” Super groovy.
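For the curious, here is a rough sketch of what "matched by any one of the entries in the blacklist" could look like. This is not MTBlacklist's actual code: the entries are made up, and treating each entry as a case-insensitive regular expression is my assumption.

```python
# Hypothetical sketch of content-string blacklisting; not MTBlacklist's real code.
import re

# Made-up entries for illustration; the real default list reportedly has 400+.
BLACKLIST = [r"cheap-pills\.example", r"casino-bonus", r"refinance-now\.example"]

# Assumption: each entry is a regular expression, matched case-insensitively.
PATTERNS = [re.compile(entry, re.IGNORECASE) for entry in BLACKLIST]


def is_blocked(comment_text):
    """Return True if any blacklist entry matches anywhere in the comment."""
    return any(pattern.search(comment_text) for pattern in PATTERNS)
```

A comment is rejected as soon as one entry matches, so the filter is only ever as good as the list behind it.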

So, what we’ve done here is move to a world where webloggers afflicted with spam stop blocking IPs and start blocking based on content strings. My question is, simply, “what is the difference?” Now, I have not installed MTBlacklist, so I am not an expert, but the problem which needs to be solved isn’t the content, it’s the people who are generating the content.

Segue.

Back when everyone was still shaking in their boots about transferring money over the Internet, a company called PayPal had a problem. They were complementing eBay as a (semi) neutral clearinghouse for cash transactions and, by doing so, needed to provide various financial vehicles to move money. This meant they needed to support credit card transactions and interface with banks. The Russian mafia saw PayPal as a great way to launder money, so they set up an automated system to create accounts on PayPal using stolen credit cards and they started to move money around. Go bad guys!

To handle this fraud, PayPal came up with what seems like a goofy idea. They started requiring new account holders to read a graphic which contained some bizarrely formatted text. The idea was that the text could not be read by optical character recognition software, which meant that every new account would need to be touched by, at least, one human being. Automation that requires human interaction doesn’t scale… there aren’t enough humans or enough time, so problem solved. The PayPal “human interaction” authorization is all over the place now. Go good guys!

Back to our original topic.

Yes, I am suggesting that there is little difference between being authorized to have a bank account and being authorized to post on a weblog. In both cases, the person on the other end must prove they are a human being. The PayPal goofy text graphic approach might sound like overkill and will likely decrease the chance someone will post to your weblog, but it solves the problem. Human beings don’t spam.
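To make the idea concrete, here is a minimal sketch of the server side of such a check on a comment form. Everything in it is my own assumption (the function names, the six-character challenge, the HMAC-signed token); it is not how PayPal or anyone else does it. Drawing the text as a distorted image is left out, and so is expiring tokens or preventing their reuse.

```python
# Hypothetical challenge-response sketch for a comment form; all names invented.
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # assumed per-site secret, kept server-side
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"  # skips look-alikes such as l/1, o/0


def new_challenge():
    """Return (text to draw as a distorted image, token for a hidden form field)."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(6))
    token = hmac.new(SECRET_KEY, answer.encode(), hashlib.sha256).hexdigest()
    return answer, token


def is_human(typed_text, token):
    """Check what the commenter typed against the token sent with the form."""
    normalized = typed_text.strip().lower().encode()
    expected = hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()
    # Constant-time comparison so the check does not leak partial matches.
    return hmac.compare_digest(expected, token)
```

The comment is accepted only when `is_human` returns True for the typed text and the token that came back with the form. The token can travel with the form because, without the secret key, it reveals nothing useful about the answer.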

[10/22/03 Update]:

The folks at MovableType are all over this.

10 Responses

  1. I’ve been meaning to write up a rant on this topic for a long, long time now. Pity I never got around to it; saying “I told you so” would have been a small, mean, but nonetheless savorable pleasure.

    Anyway, here was what was gonna be the punchline/pullquote:

    “Those who do not remember Usenet are doomed to repeat it. Badly.”

    The only surprise here is that it took as long as it did for the spambots to discover blogs, especially given MT’s near-total dominance of the independently-maintained corner of the blogoverse, and the near universal usage of the (completely insecure) Blogger XMLRPC API. We might as well have tattooed “Insert spam here” on our collective asses.

    So now we get to watch the MT developers first, and the rest of the blogosphere soon after, quickly repeat the same arms race / learning cycle that us NNTP and then SMTP developers/administrators did. Rotsa ruck, kids. Block by IP address? Block by frequency analysis? RAPING YOU VIA MILLIONS OF OPEN PROXIES. Block by regular expressions? Bayesian filters? SHOVING RANDOM TEXT AND OBFUSCATED URLS BETWEEN YOUR BLEEDING LIPS. Cute challenge-response methods? OWNING YOUR ASS WITH A WAREZED OCR PROGRAM, AND HA HA HA YOU SEEM TO HAVE DRIVEN YOUR ACTUAL CUSTOMERS AWAY…OOPS I GUESS I WANTED THEM AROUND TOO.

    As far as I can tell, it’s axiomatic that spam expands to fill any available common space that can host it, eventually driving out the users of that commons that the spammers wanted to contact in the first place. And the ratio of otherwise bright engineers who think about the problem for 0.0021 seconds and immediately announce that they’ve got the perfect solution to the number of people who’ve actually solved the problem is…oh wait, can’t divide by zero.

    BTW, I’m pretty sure that Yahoo was using the “type the phrase in this fuzzy GIF into this field” method long before PayPal, but I could be wrong. (Despite my snarking above, it’s been a reasonably effective holding action, although I don’t expect it to last. I suppose we should be grateful that nobody’s yet tried to patent it.)

    Amusing related note: AT&T has just announced that they’re changing all of their mail servers over to a default-reject policy: if they don’t have email from a human being, somewhere, claiming to exercise administrative authority over an IP address, they’ll no longer accept mail from it. Oh well, SMTP was nice while it lasted.

  2. That technique, for what it’s worth, is called CAPTCHA, and was developed at CMU.

    Of course, it’s a cold war — the CAPTCHA website, at http://www.captcha.net/, reports that two teams have written programs which can recognize two distorted-word captcha techniques with 70-80% accuracy, and as the bar goes higher you end up getting in the way of humans too.

  3. I found this entry informative.

  4. Michael Daines 13 years ago

    I once read in WIRED about a scheme related to this, in which humans were augmenting the fake account process. The situation they described involved a spam program that would sign up for accounts on chat servers that had such distorted-text verification schemes (or other schemes, where a user might identify the object in an image) by sending the image to humans whose only job was to identify its content. WIRED reported that these people were often teenagers getting access to pornography in exchange. But it doesn’t sound exactly like something that’d happen in real life or scale comfortably.

    It seems like the identification of what object an image is depicting where the number of possible images is very large might also be a good verification method. There are large databases of stock photography images on the internet as I’m sure you’re aware. Imagine being asked to identify the animal pictured, and there are 10,000 different photos of dogs that could be displayed which are, to humans, obviously all dogs. The problem with this method is while it makes it much harder for a computer to identify the image, it could severely limit the number of ways an image can be identified.

    But in the world of weblogs, this could be a real cute way to verify the human-ness of a poster if you weren’t so worried about it being very strong, as maybe you could pick your own images (am I happy or sad in this picture?) or something.

  5. I’ve been working in the email spam space for quite a while now, and I’m very excited about MT-Blacklist (haven’t set it up yet because the irritation factor hasn’t been great enough, still only a few comment spams a day).

    Specifically, filtering on WHERE THEY WANT YOU TO GO is the right approach. It’s impossible to blacklist all the IPs because they just get a new IP or compromise one of the millions of Windoze desktops out there on cable modem uplinks. IP-based RBLs for email stopped being a good idea when major ISPs like AOL started getting added. Looking for the call to action in the message is one sure way to avoid this.

    And about the graphical challenge, that’s not very friendly to many vision impaired folks out there.

    I’m hoping MT-Blacklist will move towards some distributed clearinghouse of URLs with an automatic algorithm for ranking the trustworthiness of the submitters so that we don’t have people gaming the system.

  6. One of the interesting things about CAPTCHAs is that a true CAPTCHA is win-win: they’re designed to be hard problems in AI, so if they’re compromised, AI research has moved forward. For example, the obfuscated text recognizer can now be rolled into machine vision systems elsewhere. It’s an interesting study in incentives. CAPTCHAs give spammers incentive to do something that at least minorly advances society, while AI researchers have an incentive to build CAPTCHAs that are hard (but not too hard–those you save until the easier ones get cracked).

    The image identification thing doesn’t quite make it as a CAPTCHA. One of the rules of CAPTCHAs is that the entire thing–source code, data, etc–has to be open source; no security through obscurity allowed. So the problem as posed is trivial. Because the originator must classify all the images in order to verify the correct answer from the user, this is available as part of the source and data. Note that even if the originator cheated and did not make this available, the database had to be small enough to be human-classified in the first place, so it won’t be long until every spammer has it classified too. Now, if the program were to alter the image in a way that obfuscates it for computers but so that it is still identifiable to humans, this would be an acceptable method.

  7. One issue that you seem to be forgetting with this problem is that it could prevent some legitimate humans from accessing sites as well. Reading an image breaks the expected flow of information, and would cause screen readers and other such things to fail. I think that PayPal offers a sound clip of someone saying a word as an alternative for blind people, but I recall some sites that don’t offer this, and there’s still the question of whether sites can be made sufficiently accessible while excluding the automatons.

  8. A Leisure Town archive has been located: http://comp.uark.edu/~itaylor/leisuretown.tar.gz

    Thank you.

  9. > which meant that every new account
    > would need to be touched by, at least,
    > one human being.

    No, it means that every new account would need to be touched by a human with a completely functional pair of eyes and a browser that displays images. Not exactly the same. You are thinking of a Turing test. Image recognition always fails the Turing test.

    There are two Movable Type plugins (although one goes a bit further) that utilize this technology although I will leave searching for them up to the user since I find it to be a particularly onerous solution which cuts out an already discriminated against segment of our society.

    > That technique, for what it’s worth, is called
    > CAPTCHA, and was developed at CMU.

    It was developed in parallel at i-drive and first released on the net in the summer of 2000. How do I know? Because I was Product Manager for the group that developed it.

    Great solution at the time. Terrible solution in retrospect.

    > BTW, I’m pretty sure that Yahoo was using the
    > “type the phrase in this fuzzy GIF into this field”
    > method long before PayPal, but I could be wrong.

    Actually, the order was i-drive, Hotmail and then PayPal. I don’t know how long after Yahoo debuted theirs…

    > I suppose we should be grateful that nobody’s
    > yet tried to patent it.

    i-drive did try, but then flamed out before (if I remember) the patent could be filed.

    Anyway, comment spam is far different than email spam. Install MT-Blacklist, and you will understand why. That is, unless you aren’t getting any spam…

  10. You have done a very nice job with your website, I enjoy reading the various posts and opinions of your other visitors.

    I do Home finance, Refinance, Mortgage, and Second Mortgage loans nationwide.