How email works – Behind the scenes from a developer’s perspective
The numbers are mind-boggling. Every minute, 204 million emails are sent. Email has been around since before most of us were born and has fundamentally changed the way we communicate. From saving trees to cutting costs, email has changed the world for the better. So how did this game-changing technology come into existence?
The origin of email
The first networked email was sent in 1971 by Ray Tomlinson, a computer engineer working on ARPANET. He sent a text message to another computer, using the @ symbol to separate the user from the computer, as in firstname.lastname@example.org. Before that, email could only be used to send messages between users of the same computer. Eventually, with internetworking, computers scattered around the globe could talk to each other, and emails reached inboxes within a few seconds.
Sending an email seems to be instant and simple. However, there’s a lot that happens in that fraction of a second. Let me show you how email works – from a developer’s perspective.
What is Email?
Email is just a piece of text: a text document that can be sent from one computer to another. Yes, even attachments such as JPEG images, videos and PDFs are encoded as text (typically Base64) and then sent over the Internet.
Because it is a text document, it has its own format. Emails are usually saved as .eml files.
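To make the "attachments become text" point concrete, here is a minimal sketch using Python's standard email package. The addresses, the subject and the three-byte stand-in for an image are made up for illustration; a real client would read the bytes from an actual file.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build a small email with one binary attachment (all values are illustrative)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Hello"
    msg.set_content("Hi Bob, the picture is attached.")
    # Three raw bytes standing in for the contents of a real image file.
    msg.add_attachment(
        b"\xff\xd8\xff", maintype="image", subtype="jpeg", filename="photo.jpg"
    )
    return msg


# The whole message, attachment included, serializes to plain text.
raw_text = build_message().as_string()
print(raw_text)
```

The printed output contains the headers, the body, and the attachment as a Base64 block, which is roughly what ends up inside an .eml file.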
Here’s an example of what an email really looks like:
And this is how it looks to the end user in the mailbox:
It’s your email client, such as Outlook or Thunderbird, that reads this text and renders it as a nicely formatted, readable email. It works the same way as HTML, where the web browser (Chrome, Firefox) reads the HTML document and renders it as a web page.
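As a rough illustration of what a client does with that raw text, the sketch below parses a tiny hand-written message with Python's standard email package and pulls out the parts a mailbox view would show. The message itself and the summarize helper are hypothetical; real clients handle far more (HTML parts, encodings, attachments).

```python
from email import message_from_string
from email.policy import default

# A tiny hand-written message, used only for illustration.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch on Friday?\n"
    "\n"
    "Hi Bob,\n"
    "are you free at noon?\n"
)


def summarize(raw: str) -> dict:
    """Extract the fields a mailbox listing would typically display."""
    msg = message_from_string(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "text": body.get_content().strip() if body is not None else "",
    }


summary = summarize(RAW)
print(summary)
```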
Journey of Email from sender to receiver
When you compose an email and click the send button, your email client (Outlook, Thunderbird, etc.) reads all the information, such as the sender's email address, the receiver's email address, the subject line, the email content and the attachments, and converts it into the email text format (as shown above).
In order to send the email, the email client (in this case Thunderbird) needs to chat with another program, called the ‘MTA - Mail Transfer Agent’, on the sending server. Servers are simply computers connected to the Internet that accept queries and serve (reply to) them.
How the chat happens: Just as in real life, where two people can chat only if they speak the same language, the email client and the MTA program both need to speak the same language: SMTP. SMTP, which stands for Simple Mail Transfer Protocol, is widely used by mail servers to send emails to each other.
SMTP is called “simple” because a basic mail transaction needs only a handful of commands: HELO, MAIL FROM, RCPT TO, DATA and QUIT. (The protocol defines a few more, such as RSET and NOOP, but these five are enough to transfer an email from one computer to another.) It was intentionally kept simple by its creators, as past experience has shown that protocols with fewer options tend to be more reliable and longer-lived.
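To show how little the client actually has to say, here is a sketch that builds the client side of one basic SMTP transaction as a list of text lines. The function name and the sample addresses are invented for illustration, and the server's numeric replies between commands (such as 250 or 354) are deliberately left out.

```python
def smtp_client_lines(client_host, sender, recipients, message):
    """Return the lines a client would send for one basic SMTP transaction."""
    lines = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO command per recipient.
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # "Dot-stuffing": a body line starting with "." gets an extra "."
        # so it cannot be mistaken for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a lone dot ends the message data
    lines.append("QUIT")
    return lines


session = smtp_client_lines(
    "client.example.org",
    "alice@example.org",
    ["bob@example.com"],
    "Subject: Hi\n\nHello Bob\n.hidden line",
)
print("\n".join(session))
```

In practice you would not hand-write this dialogue; Python's smtplib module speaks it for you over a real connection.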
However, a few things were not addressed, such as authentication. So, in order to extend SMTP while keeping it backward compatible, ESMTP (Extended SMTP) was introduced. ESMTP added more verbs, such as EHLO and AUTH.
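With ESMTP, the server answers EHLO with a list of the extensions it supports, one per line. The sketch below parses such a reply into a dictionary. The sample reply is hypothetical and only imitates the usual shape ("250-" on every line but the last, which uses "250 "); real servers advertise different sets of extensions.

```python
# A made-up EHLO reply in the usual multi-line "250-" / "250 " shape.
SAMPLE_EHLO_REPLY = """\
250-mail.example.com greets client.example.org
250-SIZE 35882577
250-STARTTLS
250 AUTH PLAIN LOGIN"""


def parse_ehlo(reply):
    """Map each advertised extension keyword to its list of parameters."""
    extensions = {}
    lines = reply.strip().splitlines()
    # The first line is the server's greeting, not an extension.
    for line in lines[1:]:
        text = line[4:].strip()  # drop the "250-" or "250 " prefix
        if not text:
            continue
        keyword, *params = text.split()
        extensions[keyword.upper()] = params
    return extensions


extensions = parse_ehlo(SAMPLE_EHLO_REPLY)
print(extensions)
```

A client could then check, for example, whether "AUTH" is among the keys before attempting to log in.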
The MTA is the program responsible for transferring emails from one computer to another. Both the sender and the receiver need an MTA running at their email service provider. Some examples of MTA programs are Postfix, Exim and PowerMTA. Postfix is one of the most popular open source MTAs.
After validating and receiving an email from the mail client program, the MTA sends this email as a text document to the receiver’s MTA program. This MTA program runs on the server of the receiver’s email service provider, such as Yahoo or Gmail. However, for any computer to connect to another computer on the Internet, it needs to know that computer's Internet address, that is, its IP (Internet Protocol) address, for example 18.104.22.168. To find out this IP address, the MTA asks a DNS server.
DNS stands for Domain Name System. A DNS server is a program running on a server that listens on port 53 and provides different types of information about a domain name (DNS records), such as TXT records, MX records, A records and so on.
So an application can ask, “give me the A record for the domain google.com”, and the DNS server will reply with the IP address of google.com, for example “22.214.171.124”.
Of all the DNS records, the MTA is interested in MX (Mail Exchanger) records. MX records are the DNS records that list the servers responsible for handling incoming email for a particular domain.
So, let's say an email is being sent to <email>@gmail.com. The MTA will ask the DNS server for the MX records of the receiver's domain name (in this case gmail.com), and the DNS server will reply with a list of the servers responsible for handling email for gmail.com.
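For the curious, this is roughly what such a question looks like on the wire. The sketch builds the bytes of a standard DNS query by hand, following the classic DNS message layout (a 12-byte header followed by one question). The function name and the fixed query ID are assumptions made for illustration, and actually sending the packet over UDP to port 53 is not shown.

```python
import struct

QTYPE_A = 1    # record type code for A (IPv4 address) records
QTYPE_MX = 15  # record type code for MX (mail exchanger) records


def build_dns_query(domain, qtype, query_id=0x1234):
    """Build the raw bytes of a DNS query for one domain and record type."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # and zero answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # Question trailer: record type, then class 1 (IN, the Internet class).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


packet = build_dns_query("gmail.com", QTYPE_MX)
print(packet.hex())
```

Passing QTYPE_A instead of QTYPE_MX would ask for the domain's IP address rather than its mail servers.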
Below is the actual response from the DNS server for GMAIL.COM:
;; ANSWER SECTION:
gmail.com. 3599 IN MX 5 gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 40 alt4.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 30 alt3.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 20 alt2.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 10 alt1.gmail-smtp-in.l.google.com.
In the above, the numbers 5, 40, 30, ... are the preference values, i.e. they determine the order in which the MTA will approach the servers when sending email. The lowest number is tried first: in this example, if the server with preference 5 is unavailable or busy, the MTA will try 10, then 20, and so on.
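MX records are tried from the lowest preference number to the highest, regardless of the order in which the DNS server happens to list them. A minimal sketch of that ordering step, using the answer lines from above as input (the function name is made up for illustration):

```python
ANSWER = """\
gmail.com. 3599 IN MX 5 gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 40 alt4.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 30 alt3.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 20 alt2.gmail-smtp-in.l.google.com.
gmail.com. 3599 IN MX 10 alt1.gmail-smtp-in.l.google.com."""


def mx_hosts_in_order(answer):
    """Return the MX host names sorted by preference, lowest number first."""
    records = []
    for line in answer.splitlines():
        fields = line.split()
        # Expected fields: name, TTL, class, type, preference, mail server.
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5].rstrip(".")))
    return [host for _, host in sorted(records)]


hosts = mx_hosts_in_order(ANSWER)
print(hosts)
```

The sending MTA would attempt delivery to the first host in this list and fall back to the next one only if that attempt fails.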
The sender's MTA then sends the email to the receiver's MTA using SMTP.
The MTA running on the incoming server doesn’t just accept any email from any server, as it may be spam. Spam is any email sent to a receiver without their permission. Spam checking is a complex process that takes many parameters into account. We won't go into the details of spam checking here; for now, you can consider it a black box.
Once the receiver's MTA has verified that the email is not spam, it delivers the email to the mailbox dedicated to that receiver. The receiver’s mailbox is the place where all mail for a particular user is stored. Usually it is managed by the MTA program and resides on the same server as the incoming MTA. The actual data format of the mailbox varies from MTA to MTA and from provider to provider. For example, Postfix supports two mailbox formats: Maildir and mbox. The basic difference is that Maildir stores one file per email, whereas mbox stores all emails in a single file.
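The difference between the two formats is easy to see with Python's standard mailbox module, which can write both. The sketch below stores the same three made-up messages in a Maildir and in an mbox inside a temporary directory, then counts the resulting files. The helper names and paths are illustrative and have nothing to do with how Postfix itself is configured.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(n):
    """Create a small illustrative message."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = f"Message {n}"
    msg.set_content(f"Body of message {n}")
    return msg


def store_three_messages(base_dir):
    """Store 3 messages in each format; return (maildir files, mbox files)."""
    maildir_path = os.path.join(base_dir, "Maildir")
    mbox_path = os.path.join(base_dir, "inbox.mbox")

    md = mailbox.Maildir(maildir_path, create=True)
    mb = mailbox.mbox(mbox_path, create=True)
    for n in range(1, 4):
        md.add(make_message(n))
        mb.add(make_message(n))
    mb.close()  # flushes pending messages to the single mbox file
    md.close()

    # Newly delivered Maildir messages land in the "new" subdirectory.
    maildir_files = len(os.listdir(os.path.join(maildir_path, "new")))
    mbox_files = 1 if os.path.isfile(mbox_path) else 0
    return maildir_files, mbox_files


with tempfile.TemporaryDirectory() as tmp:
    counts = store_three_messages(tmp)
print(counts)
```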
Now the mail is inside the mailbox, but the user has not yet read it. The user needs to open their email client and open this particular mail to read it. Behind the scenes, the email client connects to the email provider’s server, downloads the email and stores it in local storage. This communication happens not via SMTP but via IMAP (Internet Message Access Protocol). IMAP is also a language (protocol) like SMTP but, unlike SMTP, it is used to retrieve emails. Another protocol can also be used to retrieve emails: POP (Post Office Protocol). However, IMAP is newer and more capable than POP.
Most email users configure their client with IMAP. The major difference between POP and IMAP is that a POP client typically deletes the emails from the server once they are downloaded, whereas IMAP keeps the emails on the server and the client works with a synchronized copy.
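The behavioural difference can be pictured with a toy model. The classes and functions below are purely illustrative and do not speak either protocol (Python's standard poplib and imaplib modules do that); they only mimic what is left on the server after each style of download.

```python
class ServerMailbox:
    """Toy stand-in for a user's mailbox on the provider's server."""

    def __init__(self, messages):
        self.messages = list(messages)


def pop_style_fetch(server):
    """Download everything, then delete it from the server (typical POP setup)."""
    local_copy = list(server.messages)
    server.messages.clear()
    return local_copy


def imap_style_fetch(server):
    """Download everything but leave the originals on the server (IMAP style)."""
    return list(server.messages)


pop_server = ServerMailbox(["mail 1", "mail 2"])
imap_server = ServerMailbox(["mail 1", "mail 2"])

pop_local = pop_style_fetch(pop_server)
imap_local = imap_style_fetch(imap_server)

print(len(pop_server.messages), len(imap_server.messages))
```

With the IMAP-style fetch, a second device can still see the same messages, which is one reason it suits users who read mail on several devices.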
The Email client (Outlook, Thunderbird..)
The email client is a program that helps the user receive, compose and send emails and perform other email-related tasks. Examples are Thunderbird and Outlook. It’s also possible to send email without a desktop email client. For instance, you can log into your Gmail or Yahoo account in a browser and send email. In that case, instead of an email client talking to the MTA, the provider’s (Gmail, Yahoo, etc.) web application talks to the MTAs.
Finally, the user has read the email 🙂
So the next time you send an email, you can be thankful for all the human effort that made this technology available at the click of a button.