Sunday, August 1, 2010

How does email threading work in Gmail?

A really cool feature of Gmail is how it automatically threads email messages so that they appear as a single item rather than multiple items in your email list. This helps keep your inbox organized, especially if you have really long email threads. The problem is that I sometimes notice false positives (unrelated messages grouped together) and sometimes false negatives (related messages split apart). I wanted to know exactly how email threading works in Gmail, so I tested it out, and here's what I found.

Both of the following rules must be met for a message to join an existing thread:
  1. The subject must be similar.
  2. The sender must already be a part of the thread OR the In-Reply-To header must be set.

1) The subjects do not have to be exactly the same, but they may differ only by a recognized prefix (e.g. test and re: test will be in the same thread). I haven't tried all the prefixes, but some of the valid ones are RE:, R:, and FWD:. If you modify the subject in any other way, it will start a new thread (e.g. if you change test to test 123).
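
To make the subject rule concrete, here is a minimal Python sketch of how such a comparison could work. It is only my guess at the behavior, not Gmail's actual code: the function names normalize_subject and same_thread_subject are made up for illustration, the prefix list contains only the three prefixes I tested, and folding case is an assumption.

```python
import re

# Only the prefixes tested in this post; the full list Gmail accepts is unknown.
KNOWN_PREFIXES = ("re", "r", "fwd")

# Matches one leading prefix such as "RE:" or "fwd :" (case-insensitive).
_PREFIX_RE = re.compile(
    r"^\s*(?:%s)\s*:\s*" % "|".join(KNOWN_PREFIXES), re.IGNORECASE
)


def normalize_subject(subject: str) -> str:
    """Strip every leading known prefix, then trim and lowercase (assumed)."""
    previous = None
    current = subject
    # Keep stripping until nothing changes, so "FWD: RE: test" also works.
    while current != previous:
        previous = current
        current = _PREFIX_RE.sub("", current, count=1)
    return current.strip().lower()


def same_thread_subject(a: str, b: str) -> bool:
    """True if two subjects differ only by known prefixes."""
    return normalize_subject(a) == normalize_subject(b)
```

With this sketch, test and re: test compare as equal, while test and test 123 do not, which matches what I observed.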

2) The sender of the email message must already be a part of the thread; otherwise the message starts a new thread. The one exception is when the In-Reply-To header is supplied. You can get the In-Reply-To header by replying to the original email through Gmail. For example, let's say you have a@example.com forward messages to b@example.com. Now let's say user@example.com sends an email to a@example.com. The message will also be delivered to b@example.com. If you use the reply form in your email client (e.g. Gmail) from b@example.com, it will most likely add the In-Reply-To header for you automatically. This way, even though b@example.com was never a part of the thread that exists on user@example.com, the reply still goes into the same thread.
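
If you build messages yourself instead of using a mail client's reply form, you have to set the header by hand. The following is a rough sketch using Python's standard email package; build_reply is a hypothetical helper name, and the "Re: " prefix handling is my own simplification rather than anything Gmail requires.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_reply(original: EmailMessage, sender: str, body: str) -> EmailMessage:
    """Create a reply that points back at the original via In-Reply-To."""
    reply = EmailMessage()
    reply["From"] = sender
    reply["To"] = original["From"]

    # Keep the subject "similar": add a single "Re: " prefix if it is missing.
    subject = str(original["Subject"] or "")
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject

    # A fresh ID for the reply, plus the parent's ID in In-Reply-To.
    reply["Message-ID"] = make_msgid()
    reply["In-Reply-To"] = str(original["Message-ID"])

    reply.set_content(body)
    return reply
```

In the example above, b@example.com would call build_reply with the message received from user@example.com, so the reply carries that message's ID even though b@example.com never appeared in the thread.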

One interesting thing to note is that email messages you send from Gmail are threaded as well. The rules are exactly the same as when you receive them, except for one minor detail. If you send the exact same message twice with no subject prefix (e.g. the subject is test, not re: test), it gets threaded on the receiving end but not on the sending end. Conversely, if the subject does contain a prefix (e.g. re: test), it will be threaded in both cases.

Note: there may be additional conditions which I didn't check for. Google's site says that there's a maximum of 100 messages per thread.


Now that I have a better understanding of how this feature works, I can try to limit the number of false positives and negatives.

1 comment:

  1. Nice find. I have a slight correction to your sender exception (in #2). Gmail uses the References field rather than In-Reply-To. If A sends a message to B, B forwards it to C, and C forwards it back to A, Gmail groups these two messages together in A's account. The last message's In-Reply-To is the ID of the B->C message. Its References include the IDs of both the A->B and B->C messages.
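
The A -> B -> C -> A case from this comment can be sketched in a few lines of Python. This is only an illustration of why References would match where In-Reply-To does not; the helper names and message IDs are invented, and the assumption that each hop appends its parent's ID to References follows the usual mail convention, not anything confirmed about Gmail.

```python
def parse_ids(header_value):
    """Split a References-style header into a list of message IDs."""
    return header_value.split() if header_value else []


def reply_headers(parent_id, parent_references=""):
    """Headers a client would typically set when replying or forwarding."""
    refs = parse_ids(parent_references) + [parent_id]
    return {"In-Reply-To": parent_id, "References": " ".join(refs)}


def references_known_message(references, known_ids):
    """True if any ID listed in References is already in the account."""
    return any(mid in known_ids for mid in parse_ids(references))


# Hypothetical IDs for the two earlier hops.
AB_ID = "<ab@example.com>"  # A -> B
BC_ID = "<bc@example.com>"  # B -> C

bc_headers = reply_headers(AB_ID)                            # B forwards to C
ca_headers = reply_headers(BC_ID, bc_headers["References"])  # C forwards to A

# A's account has only ever seen the A -> B message.
known_to_a = {AB_ID}
in_reply_to_matches = ca_headers["In-Reply-To"] in known_to_a
references_match = references_known_message(ca_headers["References"], known_to_a)
```

Here in_reply_to_matches is False because A never saw the B->C message, while references_match is True because the A->B ID is still listed in References.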
