Popular Spam Protection Technique Doesn't Work


The idea behind entity encoding is that you replace characters of the email address with codes that a browser will understand but a spammer's address extractor may not recognize. At one time, address extractors were pretty naive tools and could be fooled this way. I ran some tests to see if this still was true, and, unfortunately, it no longer is.

The term entity encoding actually is a misnomer. It typically is used to mean one of two things. In one case, it means replacing characters with their equivalent numeric character references. For instance, a web browser will render the numeric character reference &#64; as a @ character. These sorts of encodings can be used anywhere in an HTML document, including a mailto: link. A naive address extractor may fail to decode the address correctly, thus keeping it away from the spammer.
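As a rough illustration of the numeric character reference form, here is a minimal JavaScript sketch. The function names toNumericCharRefs and encodeAtSign are hypothetical helpers made up for this example; they are not part of any tool mentioned in this article.

```javascript
// Hypothetical helper: encode every character of a string as a decimal
// numeric character reference (for example "@" becomes "&#64;").
function toNumericCharRefs(text) {
  return Array.from(text, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

// Hypothetical helper: encode only the @ sign and leave the rest readable.
function encodeAtSign(address) {
  return address.replace(/@/g, toNumericCharRefs('@'));
}
```

Calling encodeAtSign('user03@example.com') yields user03&#64;example.com, which a browser displays as the plain address.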

Sometimes people say entity encoding when they mean URI escaped encoding, as described in RFC 2396 section 2.4.1. For instance, the sequence %40 is equivalent to a @ in a URL. These encodings can't be used just anywhere in an HTML document, only in links, including mailto: links. Again, a naive address extractor may fail to recognize them.
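The URI escaped form can be sketched the same way. The helper name toPercentEscapes is hypothetical, and the sketch assumes plain ASCII input, which is enough for a typical mailto: link.

```javascript
// Hypothetical helper: percent-encode every character of an ASCII string
// using its two-digit hexadecimal code (for example "@" becomes "%40").
// Assumption: the input contains only ASCII characters.
function toPercentEscapes(text) {
  return Array.from(text, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return '%' + (hex.length < 2 ? '0' + hex : hex);
  }).join('');
}
```

The built-in encodeURIComponent also turns @ into %40, but it leaves letters and digits alone, so a helper like this is needed to escape a letter such as the "m" in mailto:. The hexadecimal digits may be written in upper or lower case.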

I happened to use the @ character in these examples, but any part of the mailto: URL could be encoded in this fashion. All you need to do is convert the character to the appropriate numeric code and rewrite it in an encoded fashion. Just be sure you use the decimal value when composing a numeric character reference and the hexadecimal value when composing URI escaped encoding. Or, better yet, there are online tools that will do this for you.

Entity encoding is a simple technique. It is documented in many places as a possible method to protect your email address. Unfortunately, although it may have worked at one time, it's not useful with current address extraction tools.

To test address extraction capability, I downloaded a product called Web Data Extractor v4.0. I set up a target web page with a number of email addresses, employing a variety of entity encoding techniques. The results are shown in the table below.

trial  document source                                what extractor saw
1      user01 [at] example [dot] com                  user01 [at] example [dot] com
2      user02@example.com                             user02 [at] example [dot] com
3      <a href="mailto:user03&#64;example.com">       user03 [at] example [dot] com
4      <a href="&#109;ailto:user04&#64;example.com">  user04 [at] example [dot] com
5      <a href="mailto:user05%40example.com">         user05 [at] example [dot] com
6      <a href="%6Dailto:user06%40example.com">       (not seen)
7      <a href="%6dailto:user07%40example.com">       (not seen)

In trial one, the address was in the clear. The remaining trials used some form of entity encoding. Only in trials 6 and 7, where the "m" in mailto: is URI escape encoded, did the extractor fail. Still, I can't recommend the method. The author could easily remedy this defect in the next release.
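To show how little work it takes to undo both encodings, here is a minimal sketch of what a harvester might do before searching for addresses. This is not the code of Web Data Extractor or any other real product; the function names and the simplified address pattern are assumptions made for illustration.

```javascript
// Sketch only: undo decimal and hexadecimal numeric character references.
// Handles 16-bit code points, which is enough for ASCII addresses.
function decodeNumericCharRefs(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, function (match, hex) {
      return String.fromCharCode(parseInt(hex, 16));
    })
    .replace(/&#([0-9]+);/g, function (match, dec) {
      return String.fromCharCode(parseInt(dec, 10));
    });
}

// Sketch only: undo URI escapes such as %40 or %6d (either case).
function decodePercentEscapes(text) {
  return text.replace(/%([0-9a-f]{2})/gi, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Decode first, then look for anything shaped like an email address.
// The pattern is deliberately simplified and is an assumption.
function extractAddresses(html) {
  var decoded = decodePercentEscapes(decodeNumericCharRefs(html));
  var pattern = /[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;
  return decoded.match(pattern) || [];
}
```

Because the text is decoded before the scheme or the address is examined, an extractor built this way would also find the addresses in trials 6 and 7.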

Entity encoding is attractive because it's simple, portable and does not hamper functionality. At one time it may have worked, but it no longer is a useful method to hide your email address from web page harvesting.

Dec 5 update: I've posted an article that responds to some of the ideas proposed in the comments below.



re: Popular Spam Protection Technique Doesn't Work

This is as good a place as any to mention my trick, which is to use javascript.

The code is this:

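function sendto(domain, account, text) {
  // Build the @ sign from its character code so it never appears literally.
  var atsign = String.fromCharCode(64); // @
  var address = account + atsign + domain;
  document.write('<a href="mailto:' + address + '"');
  if (text == null || text.length == 0) {
    // No link text supplied: show the address itself.
    document.write('>' + address);
  } else {
    document.write(' title="' + text + '">' + text);
  }
  document.write('</a>');
}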

I call it thusly inside the HTML:

<script language="javascript" type="text/javascript">sendto('crossroads.net','adamrice')</script>
<noscript><address>adamrice - at - crossroads - dot - net</address></noscript>

re: Popular Spam Protection Technique Doesn't Work

So, my now-preferred-method of supplying a form, followed by a line of text along the lines of 'to contact me, use the form above, or send a message to XXX at this domain' is the way to go.
<grumbles> Why won't they just leave us alone?

re: Popular Spam Protection Technique Doesn't Work

I encode the entire mailto: URI, including the mailto: scheme itself and any mention of the word email. Not because it is invincible but because it makes the process of email extraction more expensive in time and processor cycles for the spammers.

Using Javascript creates problems for non-spammers who don't deserve the hassle (particularly for me, the guy who has to maintain Javascript solutions in the face of multiple half-@$$ed implementations of it in browsers both old and new).

BTW, URL encoding any of the characters in the mailto: URI other than the actual address itself makes the link either useless in Safari 1.1.1 or it brings up a dialog box complaining:

No file exists at

re: Popular Spam Protection Technique Doesn't Work

I'd be curious what you thought about something slightly more complex like the Hiveware Enkoder, which "generates a unique and random key and ties that to an encrypted array containing your address for even better protection" via a Javascript.

re: Popular Spam Protection Technique Doesn't Work

What is a good net citizen to do?!
All these solutions are great, but they will fail on the next generation of harvester robots. Current methodologies rely on the fact that robots trawl websites for HTML source pages, which allows systems such as the Hiveware Enkoder to successfully prevent the harvesting of addresses, because the page has to be rendered and the javascript processed before the address is available.
New robots are now being built, however, that (in just one example) use browsers' rendering engines and javascript processors to construct the page as the end user would see it and then analyse the resulting DOM data structure. Therefore systems like the Hiveware Enkoder become useless, because the robots can now see the processed javascript email address.
It is a shame that there are people out there willing to write software for this purpose. They must have no morals whatsoever!
I still think we need to fight this war at the protocol level. Just my 2 cents.

re: Popular Spam Protection Technique Doesn't Work

I discovered this about a year ago even though my email obfuscator created a random mix of hex, ascii and characters.

I recently upgraded the Mean Dean Anti Spam Email Obfuscator to not only randomize the mix, but also to provide a javascript option.

So far so good, but I suspect that's only for so long.

re: Popular Spam Protection Technique Doesn't Work

I always suspected that such encoding was a waste of time, this just proves it. But look what it says in the comment form of this page:

Email Address: Displayed (entity-encoded for spam protection)

Ironic, eh?

I think the only way to be safe is to use a form and a cgi script to accept and send mail. Give the script a meaningless name (pinky.pl, not formmail.pl) and do the same for the <input> fields (george and mildred, not message and email). That way an automated script won't guess what those form fields are.

Like Lea says, why can't these buggers leave us alone. If I want a super spy camera or a new mortgage or a ton of cheap viagra I'll search Google for it!

re: Popular Spam Protection Technique Doesn't Work

Chris & Lea are right - these buggers should leave us alone. Unfortunately they won't, because they have no moral fibre and have never grown up. My (new) email address was put on a link on a friend's very large website less than 48 hours ago and this morning I have received 14 spam emails, all from the USA and Nigeria. It makes you think that these sad idiots do not have anything better to do with their lives - because they probably do not!

...However, thanks to all you wonderful people for these excellent hints & tips on how to take some kind of avoiding action. I am going to try some out today.

Regards, Cal.

re: Popular Spam Protection Technique Doesn't Work

Why not just create a .gif image with the e-mail address? It looks fine, is very easy to do, and it'll take a long time before spammers run OCR on all the pictures on the web to look for e-mail addresses.

The only problem with this solution is that the reader cannot copy/paste the address to use it.

Regards, Olivier.