
Alan Williamson



LWM Speaks with Julian Field

MailScanner addresses the growing spam problem

The latest dirty word to creep into people's vocabularies is used to describe the flood of unwanted e-mail: spam. This unintended consequence of the Internet has made checking e-mail a chore of wading through obscene and unwarranted material. Gone are the days when you got excited at the e-mail notification popping up. But before you give up hope and start reaching for the pen and paper again, there is an army of soldiers out there on the front line of this spam war, helping to keep your box free of this nonsense. I had the good fortune to catch up with Julian Field, who is the lead behind one of the more popular anti-spam tools, or in other words, my hero!

LWM: The problem with spam is coming into the mainstream headlines. Do you think it's gotten any worse, or is it that we have a brighter media spotlight on it?
Julian Field:
The problem has definitely grown. Assuming our ability to spot spam has stayed roughly the same, then the problem has doubled or tripled in the past 12 months. But it is getting far more press coverage than it did a year ago; it regularly makes it onto news Web sites such as the BBC. However, most of the death and gloom statistics produced by the marketing departments of the big anti-spam firms should be taken with a large bucket of salt. Most of their graphs predict that more than 100% of e-mail will be spam within the next year or so!

LWM: Based on your experience with combatting spam, is the lack of initial security with Microsoft's Outlook and Windows a major reason for the increase in spam as spammers exploit various holes?
Not really, no. The spammers haven't really turned into virus writers yet. One of the bigger holes used by the spammers is open mail relays, which are servers that will accept and deliver mail for anyone. The spammers send one message with 5,000 addresses to the open relay, which then has the (big) job of splitting that message up into one for each address.

LWM: MailScanner is your brainchild; tell us a little about how it came into being.
Here at the university we had a small research project to survey the marketplace of products that protect sites from e-mail viruses at the gateway. We invited various suppliers in to talk to us, and give us price quotes for their products and services for the university as a whole. They all wanted between £50k and £70k to set up, then £25k per year every year after that. This is far more money than a university can reasonably afford to throw at the problem. So I started thinking that this problem really can't be that hard to solve, and solve better than the commercial suppliers had done. While eating lunch one day, I figured a way of using a command line-based virus scanner to scan e-mail instead of scanning files. Ten days later, version 1 was born and I started scanning all our department's incoming e-mail for viruses. The rest, as they say, is history.

LWM: So in that 10 days, what was your initial choice of platform and language that you opted to code MailScanner with?
I decided right from the start to write it in Perl. There were three main reasons. First, using Perl eliminates all the memory allocation and buffer overrun problems associated with something like C. Second, only parts of the process are CPU intensive. Fast disks and fast networks are just as important, so Perl's speed relative to C was not a problem, but writing it in Java really would have been too slow. Third, Perl is far more portable than most other languages, so it can be run on almost any platform. I don't know enough of the huge Windows API to be able to write it effectively for Microsoft platforms, and I prefer basing critical systems on Unix anyway.

LWM: How does MailScanner work?
You take a setup where you have one sendmail/Exim/whatever process collecting mail via SMTP and putting it all into an incoming queue, where it just collects. You have another one picking up mail from an outgoing queue and delivering it. MailScanner sits between the two queues, moving mail from the incoming to the outgoing as it scans it.

As a result, MailScanner is not involved in providing SMTP service, or delivering your mail. None of that needs to change, which is why MailScanner is so easy to install. I refuse to reinvent the wheel, so anything that can be done by other programs is done by them and not MailScanner. There are lots of nasty problems that have been solved by other people, such as reliably unpacking nested archives (e.g., zip files containing more zip files) or providing decent SMTP service. MailScanner doesn't attempt to do this; it leaves it to the virus scanners as they already do it very well.
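The two-queue arrangement Julian describes can be sketched in a few lines. This is only an illustration in Python, not MailScanner's actual Perl code: the function name `relay_queue`, the `held` directory for flagged messages, and the `is_clean` callback are all assumptions made for the example, and the real tool hands the scanning itself to external virus scanners.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def relay_queue(incoming: Path, outgoing: Path, held: Path,
                is_clean: Callable[[Path], bool]) -> Dict[str, List[str]]:
    """Move every message file out of `incoming`, checking each one on the way."""
    result: Dict[str, List[str]] = {"passed": [], "held": []}
    for message in sorted(incoming.iterdir()):
        if not message.is_file():
            continue
        # Clean mail goes to the outgoing queue for the delivery process;
        # anything flagged is set aside (the `held` directory is hypothetical).
        verdict = "passed" if is_clean(message) else "held"
        target = outgoing if verdict == "passed" else held
        shutil.move(str(message), str(target / message.name))
        result[verdict].append(message.name)
    return result


def demo() -> Dict[str, List[str]]:
    """Build three throwaway queue directories and relay two fake messages."""
    root = Path(tempfile.mkdtemp())
    incoming, outgoing, held = (root / n for n in ("incoming", "outgoing", "held"))
    for directory in (incoming, outgoing, held):
        directory.mkdir()
    (incoming / "msg1").write_text("hello")
    (incoming / "msg2").write_text("contains EVIL payload")
    # Stand-in check; a real setup would hand the files to a virus scanner.
    return relay_queue(incoming, outgoing, held,
                       lambda p: "EVIL" not in p.read_text())
```

Because the sketch only moves files between directories, the SMTP listener and the delivery process on either side need no changes, which is the property Julian credits for the easy installation.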

LWM: When designing the project, what were your core objectives? Did you meet them?
My objectives were to not reinvent the wheel (see above) and to write secure code. Every time I add more code to implement any new feature, I assume someone will launch an attack on it, regardless of how obscure those attacks might be now. Once upon a time, stack corruption via buffer overflows in C code was considered obscure. So I try to protect against everything that could happen.

LWM: Talk to us a little more on writing secure code. LWM's own James Turner recently published an online piece on how hackers exploit buffer-overrun problems. How did you ensure your code was 'safe'?
To start with, writing it in Perl eliminates 99% of the problems as you don't have limits on array lengths, string lengths, or anything like that. You also don't have any memory freeing problems to worry about; Perl does all the hard work for you. I always try to write code with a view to how it could be attacked, and so "sanitize" data read from the user wherever possible.

For example, in attachment filenames, before using them for anything such as storing files or writing report messages, I only allow a very restricted set of characters that I know cannot be used to do anything "clever" in the shell or in the filesystem. I also restrict their length, what characters they can start and end with, and other similar constraints. If I accepted any filename, how long would it be before someone managed to encode an entire message attachment in the filename of another attachment? That could be used to force an attachment to appear in a message sent to a user because their message was blocked for some reason.
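An allow-list filter of the kind described above might look like the following Python sketch. The interview does not list the actual rules, so the permitted character set, the 64-character limit, the ban on leading and trailing dots and hyphens, and the `unnamed` fallback are all assumptions chosen for illustration.

```python
import re

# Assumed policy for illustration; the interview does not give the real rules.
MAX_LENGTH = 64
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(name: str) -> str:
    """Return a conservative version of an attachment filename."""
    # Replace every character outside the allow-list with an underscore.
    cleaned = UNSAFE_CHARS.sub("_", name)
    # Do not let the name start or end with a dot or a hyphen.
    cleaned = cleaned.strip(".-")
    # Enforce the length limit, then strip again in case truncation
    # left a dot or hyphen at the end.
    cleaned = cleaned[:MAX_LENGTH].rstrip(".-")
    # Fall back to a fixed name if nothing usable survives.
    return cleaned or "unnamed"
```

Listing what is allowed, rather than what is forbidden, means that a character nobody thought of is rejected by default.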

LWM: What is the most amazing fact regarding e-mail you learned while doing this project?
That an e-mail message containing nothing but plain text could be damaging. You would be amazed to see quite how hard Outlook works to find attachments buried in your message; you can even put an attachment in the 'Subject:' header of a message.

LWM: Has your university recognized your efforts? Do they block off time so you can explicitly work on MailScanner during the daylight?
They have been very helpful, and have bought some equipment for me so that I have a decent set of test and development servers. I still have to fit in all my other work commitments, but they are happy for me to spend any spare time working on MailScanner, and a bit more when necessary. They happily purchase any odd components, software, or books I need too, which is very helpful. They also pay my expenses for any trips I make to speak at conferences and exhibitions.

LWM: MailScanner is an open source project. What decision process did you go through to come to this conclusion?
I'm not entirely driven by money - I like to help if I can. Also, universities are notoriously bad at making money on their inventions anyway, so I decided it would be best all round if it was open source. I initially wrote it only for other members of the UK academic community, and I knew they would never use it unless they could see the source.

LWM: How would you say MailScanner differs from the alternatives out there?
It is far more efficient than its competition because it handles messages in batches. The more heavily loaded your mail server gets, the more efficient MailScanner becomes. Not many packages can claim that performance actually improves under higher load.

It doesn't affect your mail delivery or SMTP service at all, so it's far easier to integrate into your working e-mail system. If you run into trouble, backing out MailScanner is a two-minute job, which makes potential users much more inclined to try it in the first place.

LWM: With respect to the efficiency, can you explain what you mean? It would seem that the more mail comes in, the more files are sitting in the first directory needing to be processed before moving to the second queue.
One of the bigger overheads is the time taken to start up the virus scanner. With recent versions of some of the better scanners, this has become quite an issue. If I were to scan each message individually, starting up the scanner would be the biggest part of the process. By processing messages in batches, I need to start the virus scanner only once for each entire batch, which saves a lot of time. Unless the incoming queue is huge, most file systems handle the directory very quickly. Of course there are a few notable exceptions such as most versions of ufs. I am also planning to write some code to recognize that the queue has grown very large and switch into an 'accelerated' mode whereby it stops guaranteeing that the messages are processed in date order and therefore only needs to scan a small part of the queue.
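The saving from batching can be put into rough numbers. The Python sketch below is a toy cost model, not a measurement: the helper names, the `run_scanner` callback, and the two-second startup and half-second per-message figures used in the usage note are hypothetical.

```python
import math
from typing import Callable, Iterator, List, Sequence


def batches(queue: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the queue, each at most batch_size long."""
    for start in range(0, len(queue), batch_size):
        yield list(queue[start:start + batch_size])


def scan_in_batches(queue: Sequence[str], batch_size: int,
                    run_scanner: Callable[[List[str]], None]) -> int:
    """Call run_scanner once per batch and return how many times it ran."""
    launches = 0
    for batch in batches(queue, batch_size):
        run_scanner(batch)  # one scanner startup covers the whole batch
        launches += 1
    return launches


def estimated_seconds(messages: int, batch_size: int,
                      startup: float, per_message: float) -> float:
    """Toy model: every scanner launch pays `startup` once, plus per-message work."""
    launches = math.ceil(messages / batch_size)
    return launches * startup + messages * per_message
```

With those assumed figures, 100 messages scanned one at a time cost `estimated_seconds(100, 1, 2.0, 0.5)`, or 250 seconds, while a single batch of 100 costs 52 seconds. The per-message work is identical in both cases; only the number of scanner startups changes.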

LWM: Julian, thanks for taking the time to talk to LWM. Before we go, can you tell us the future for MailScanner? What's coming in the next release, and what are you looking to do to promote this project?
The main new feature I am working on at the moment is 'Message Content Protection.' This is content scanning of text in the message body and in attachments, with rules and scores similar to the present spam detection, but kept completely separate. This could be used for everything from enforcing corporate 'Use of E-mail' policies, to automatically removing delivery receipts from your outgoing mail, which are generated by Microsoft Exchange and which most people consider an unwanted security hole.

In the long term, I have plans to investigate building a MailScanner 'appliance' which would be a very simple system that ran MailScanner and nothing else, and which would require almost no maintenance.

About Julian Field
Julian Field has vast experience with e-mail systems and has been a postmaster for many years. He has considerable skills in designing and delivering reliable software solutions for mission-critical applications and has always had a strong interest in computer and network security. He has been fighting computer viruses for many years, and has spent the last three years creating, developing, and supporting MailScanner, initially just for the benefit of the UK academic community. Along the way, he has also acquired considerable knowledge of the problems caused by bulk e-mail, and has been featured in local, national, and international media programs and conferences on the subject. Julian can be contacted at [email protected].

About Alan Williamson

Alan Williamson is widely recognized as an early expert on Cloud Computing; he is Co-Founder of aw2.0 Ltd, a software company specializing in deploying software solutions within Cloud networks. Alan is a Sun Java Champion and creator of OpenBlueDragon (an open source Java CFML runtime engine). With many books, articles and speaking engagements under his belt, Alan likes to talk passionately about what can be done TODAY and not get caught up in the marketing hype of TOMORROW. Follow his blog, or e-mail him at cloud(at)
