Wednesday, July 14, 2010

How Gmail Filter Email-Matching Works

I was trying to create some complicated Gmail filters. However, there doesn't seem to be any documentation of exactly how the To and From fields work. So I tried figuring it out myself...

General Matching Guidelines

The matching criteria are similar to Google's search. There is no word stemming, so you must enter full words (e.g. joh will not match an address containing john). Not even plural stemming, which Google search has, is supported (e.g. app will not match an address containing apps).

Word order does not matter, unless the words are enclosed in quotes (e.g. "smith john" will not match an address in which john comes before smith). Generally, symbols are ignored (for more information see the next section).

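Words are split on everything except letters, numbers, and underscores. The most common symbols that split words are +, ., and @. This means that foo matches only where it stands as a whole word between such symbols, not where it is part of a longer word. The @ character itself is not considered a word and can be skipped over (e.g. "smith gmail" will match an address in which smith is immediately followed by @gmail).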
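
The splitting rule can be sketched in a few lines of Python. This is only a model of the behavior observed above, not Gmail's actual implementation; the function names split_words and matches_all_words are made up for illustration, and the sample address is just an example.

```python
import re

def split_words(text):
    """Split text into words: runs of letters, digits, and underscores."""
    return re.findall(r"[A-Za-z0-9_]+", text)

def matches_all_words(query, address):
    """True if every word of an unquoted query is a whole word of the address."""
    address_words = set(split_words(address))
    return all(word in address_words for word in split_words(query))

address = "john.smith+foo@gmail.com"  # illustrative address
print(split_words(address))
# ['john', 'smith', 'foo', 'gmail', 'com']
print(matches_all_words("smith john", address))  # True: word order does not matter
print(matches_all_words("joh", address))         # False: no stemming, whole words only
```

Because @ is only a separator in this model, it never shows up as a word of its own, which is consistent with it being skippable in a quoted phrase.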

You can use the OR operator in addition to grouping () for some complex conditions.

Symbol Behavior

When you enter a symbol in the filter box, its behavior depends on which symbol it is:

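  • Symbols that act as x y: ~#$%^*+;",<>? and the grave character. For example, smith~john becomes smith john, which matches any address containing both words, in either order.
  • Symbols that act as "x y": -=\:'./ -- For example, john-smith becomes "john smith", which matches only when john is immediately followed by smith.
  • Symbols that are treated literally: &_ -- For example, john_smith will match the single word john_smith, but not john and smith as separate words.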
  • Special symbols: !@()[]{}|
    • !: john!smith becomes john -smith, which matches addresses that contain john but not those that also contain smith.
    • @:
      • @ is stripped out at the end of a word. For example, john@ becomes john, which matches any address containing the word john.
      • @ is stripped out at the start of a word.
      • @ in the middle of a word will generally require the full address for a successful match. For example, john.smith@gmail will not match, because the address is incomplete. Additionally, symbols will be taken literally: you must type the symbols exactly as they appear in the address, and substitutes that would otherwise be equivalent will no longer work.
      • @ in a different location in the middle of a word has strange behavior. For example, when trying to match a single address:
        • john@smith@gmail@com does not match
        • gmail@com does not match
        • @gmail@com does not match
        • smith@gmail@com does match
        • does not match
        • "john" does not match
        • "john.smith@gmail com" does not match
    • | acts as the OR operator.
    • Parentheses act as grouping for OR and AND conditions.
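
The first three symbol groups above can be mimicked with a small rewriting function. This is a rough Python sketch of the observed behavior, not how Gmail is actually implemented; rewrite_term is a hypothetical name, and the special symbols (!, @, brackets, |) are deliberately left out.

```python
import re

SPACE_LIKE = '~#$%^*+;",<>?`'   # symbols that act as "x y" without quotes
PHRASE_LIKE = "-=\\:'./"        # symbols that act as a quoted "x y"
PHRASE_SPLIT = re.compile("[" + re.escape(PHRASE_LIKE) + "]")

def rewrite_term(term):
    """Rewrite a filter term the way the symbol rules above describe."""
    # Space-like symbols simply separate independent words.
    for symbol in SPACE_LIKE:
        term = term.replace(symbol, " ")
    rewritten = []
    for piece in term.split():
        # Phrase-like symbols join their neighbours into one quoted phrase.
        words = [w for w in PHRASE_SPLIT.split(piece) if w]
        if len(words) > 1:
            rewritten.append('"' + " ".join(words) + '"')
        else:
            rewritten.extend(words)
    return " ".join(rewritten)

print(rewrite_term("smith~john"))  # smith john
print(rewrite_term("john-smith"))  # "john smith"
print(rewrite_term("john_smith"))  # john_smith (underscore is kept literally)
```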

Other Matching Behaviors

The address of your default account will match all variations of that address. This includes dot notation, plus addressing, and using the domain.

Here's a brief explanation of each:

  • Using dot notation: You can enter as many non-consecutive dots in your account name as you want. For example, mail sent to a version of your address with extra dots will still arrive at your account.
  • Using plus addressing: After your account name, you can enter the + sign and whatever text you want afterwards, followed by the Gmail domain. For example, mail sent to such a plus address will arrive at your regular address.
  • Using domain: Any mail sent to <your-gmail-account> at the alternate domain will arrive at your address.

Any of the above can be combined, and the mail will still go to your account.
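
To see how the three variations collapse into one account, here is a minimal Python sketch that reduces an address to a canonical form. The function name canonical_address is hypothetical, and so are its parameters: in particular, googlemail.com is used as the alternate domain only as an assumption, and lowercasing the address is an assumption too.

```python
def canonical_address(address, main_domain="gmail.com",
                      alias_domains=("googlemail.com",)):
    """Reduce a Gmail address variation to one canonical form (a sketch)."""
    local, _, domain = address.lower().rpartition("@")
    if domain in alias_domains:      # assumed alternate domain
        domain = main_domain
    if domain != main_domain:        # leave non-Gmail addresses alone
        return address.lower()
    local = local.split("+", 1)[0]   # plus addressing: drop "+anything"
    local = local.replace(".", "")   # dot notation: dots are ignored
    return local + "@" + domain

print(canonical_address("j.ohn.smith+foo@googlemail.com"))  # johnsmith@gmail.com
print(canonical_address("john.smith@gmail.com"))            # johnsmith@gmail.com
```

Two addresses deliver to the same account, under these assumptions, when their canonical forms are equal.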

Interesting Consequences

  • Can't match all dot versions of your Gmail address easily: If you're in the habit of giving out a dotted version of your email address to prevent spam, you cannot easily create a filter for all dot versions of your address, since each one is split up into separate words (e.g. j ohn smith). When you only use one variation, it's easy to create a filter and, for example, send its mail to spam. However, if you start using different variations, each one produces different words in the address (e.g. jo h n smi th), forcing you to create a distinct condition for each variation you use.
  • The + symbol is worse than the "" operator when matching plus addresses: If you're trying to create a filter for a plus address, your best bet is to include the full address. If for some reason you aren't using the full address, the + operator is actually worse than the "" operator. For example, john+foo is worse than using "john foo", since the former acts as john foo and will also match addresses in which the two words appear in a different order or apart from each other. Keep in mind that the latter is not bulletproof either: it will still match any address in which john is immediately followed by foo, whatever symbol separates them. It just guarantees that the order is correct. For clarity, you could use "john+foo", but realize that it's the same as "john foo".
  • You must use negation to match all email sent to plus addresses: To filter on all plus addresses (e.g. to send them to spam), you should use a query of two parts: your address, followed by the negated phrase -"john smith gmail com". The first part of the query will match any plus addresses you have. The second will remove every address that has the words in exactly that order, which leaves only the plus addresses. For example, a plus address with +foo will not match the phrase, since it has the word foo in between the other words. Note that there is one weird, and very unlikely, case where this won't work: a plus address that still has the words in the specified order.
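
The consequences above all follow from quoted phrases being compared word by word. Here is a small Python sketch of that idea, under the assumption that a quoted phrase matches when its words appear consecutively and in order; split_words and contains_phrase are hypothetical names, the addresses are illustrative, and this is not Gmail's real matcher.

```python
import re

def split_words(text):
    """Split text into words: runs of letters, digits, and underscores."""
    return re.findall(r"[A-Za-z0-9_]+", text)

def contains_phrase(phrase, address):
    """True if the phrase's words appear consecutively, in order, in the address."""
    wanted = split_words(phrase)
    words = split_words(address)
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))

phrase = "john smith gmail com"
print(contains_phrase(phrase, "john.smith@gmail.com"))      # True: removed by the negation
print(contains_phrase(phrase, "john.smith+foo@gmail.com"))  # False: foo breaks the phrase
print(split_words("j.ohn.smith@gmail.com"))                 # ['j', 'ohn', 'smith', 'gmail', 'com']
print(contains_phrase("john foo", "john.foo@gmail.com"))    # True: the phrase ignores the symbol
```

The last two lines show why every dot variation needs its own condition, and why "john foo" still matches addresses that were never plus addresses.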


  1. All important tools are already provided by Google. However, I'll be more glad if they will add an email encryption tool directly inside the gmail.

  2. Haha, never realized how retarded gmail filtering really is... just want to match a [tag] prepended to subject field, fat chance ):

  3. Thanks for your article.
    Nevertheless, I don't find how to isolate an entire domain. For example, I would like to find all emails from ( and not xyz@[something] How can I do that?