🙂 Emoji, flags & emoticons support for Elasticsearch

Add support for emoji and flags in any Lucene compatible search engine!

If you want to search for 🍩 and find donuts in your documents, you have come to the right place. We offer synonym files ready to use in Elasticsearch and OpenSearch analyzers.

Requirements to index emoji in Elasticsearch

There are no requirements for Elasticsearch >= 6.7.

<details><summary>Using an older version of Elasticsearch? Open me! 🖱</summary>

| Version | Requirements |
| --- | --- |
| Elasticsearch >= 6.4 and < 6.7 | You need to install the official ICU Plugin. See our blog post about this change. |
| Elasticsearch < 6.4 | You need our custom ICU Tokenizer Plugin, see our blog post (2016). |

Run the following test to verify that you get 4 EMOJI tokens:

GET _analyze
{
  "text": ["🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"]
}
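
If you run this check from a script, counting the emoji tokens could look like the Python sketch below. It is not part of this repository: the helper name `count_emoji_tokens` is hypothetical, and it assumes the parsed `_analyze` response carries the usual `tokens` list with a `type` field per token, where emoji are labelled `<EMOJI>`.

```python
def count_emoji_tokens(response):
    """Count the tokens labelled <EMOJI> in a parsed _analyze response (a dict)."""
    tokens = response.get("tokens", [])
    return sum(1 for tok in tokens if tok.get("type") == "<EMOJI>")


def has_expected_emoji_count(response, expected=4):
    """Return True when the response holds the expected number of emoji tokens."""
    return count_emoji_tokens(response) == expected
```

Pass it the JSON body returned by the request above, decoded with `json.loads`.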
</details>

The Synonyms, flags and emoticons

What you need to search with emoji is a way to expand them to words that can match searches and documents, in your language. That's the goal of the synonym dictionaries.

We build Solr / Lucene compatible synonym files in all languages supported by Unicode CLDR so you can set them up in an analyzer. They look like this:

👩‍🚒 => 👩‍🚒, firefighter, firetruck, woman
👩‍✈ => 👩‍✈, pilot, plane, woman
🥓 => 🥓, bacon, meat, food
🥔 => 🥔, potato, vegetable, food
😅 => 😅, cold, face, open, smile, sweat
😆 => 😆, face, laugh, mouth, open, satisfied, smile
🚎 => 🚎, bus, tram, trolley
🇫🇷 => 🇫🇷, france
🇬🇧 => 🇬🇧, united kingdom
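
To make the file format concrete, here is a minimal Python sketch (not part of this repository) that parses lines of the form `emoji => emoji, word, ...` into a dictionary. The function name `parse_synonym_lines` is hypothetical, and the parser only handles the `=>` form shown above, treating lines that start with `#` as comments.

```python
def parse_synonym_lines(lines):
    """Parse explicit mappings ("left => term1, term2") into a dict of lists."""
    mappings = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Lines without "=>" are out of scope for this sketch.
        if "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        terms = [term.strip() for term in right.split(",") if term.strip()]
        mappings[left.strip()] = terms
    return mappings


sample = [
    "🥓 => 🥓, bacon, meat, food",
    "🇫🇷 => 🇫🇷, france",
]
synonyms = parse_synonym_lines(sample)
print(synonyms["🥓"])
```

Each emoji maps to itself plus its keywords, so a document containing the emoji can match a search for either the emoji or one of the words.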

For emoticons, use this mapping with a char_filter to replace emoticons with emoji.
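
As a rough illustration, the settings for such a char_filter could be assembled as in the Python sketch below. The names `emoticons` and `with_emoticons` are hypothetical, and the path assumes the emoticon file is stored as `config/analysis/emoticons.txt`. It relies on the `mapping` character filter and its `mappings_path` parameter.

```python
import json

# Assumption: emoticons.txt lives in config/analysis of the Elasticsearch node.
settings = {
    "analysis": {
        "char_filter": {
            "emoticons": {  # hypothetical filter name
                "type": "mapping",
                "mappings_path": "analysis/emoticons.txt",
            }
        },
        "analyzer": {
            "with_emoticons": {  # hypothetical analyzer name
                # Char filters run before the tokenizer, so emoticons become
                # emoji before the text is split into tokens.
                "char_filter": ["emoticons"],
                "tokenizer": "standard",
                "filter": ["lowercase"],
            }
        },
    }
}

# Print the body you would send when creating the index.
print(json.dumps({"settings": settings}, indent=2))
```

In a real analyzer you would likely add the emoji synonym filter after `lowercase`, so the emoji produced by the char_filter are expanded to words as well.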

Installation

Download the emoji and emoticon files you want from this repository and store them in PATH_TO_ES/config/analysis (or anywhere Elasticsearch can read).

config
├── analysis
│   ├── cldr-emoji-annotation-synonyms-en.txt
│   └── emoticons.txt
├── elasticsearch.yml
...

Use them like this (this is a complete English example with Elasticsearch >= 6.7):

PUT /tweets
{
  "settings": {
    "analysis": {
      "filter": {
        "english_emoji": {
          "type": "synonym",
          "synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt"
        },
        "emoji_variation_selector_filter": {
          "type": "pattern_replace",
          "pattern": "\\uFE0E|\\uFE0F",
          "replacement": ""
        },
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_"
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"]
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "english_with_emoji": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "emoji_variation_selector_filter",
            "english_emoji",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english_with_emoji"
      }
    }
  }
}
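
The `emoji_variation_selector_filter` removes the variation selectors U+FE0E and U+FE0F from tokens, presumably so that an emoji written with or without a selector reaches the synonym filter in the same form. The Python sketch below only illustrates that normalization outside Elasticsearch; the helper name is hypothetical.

```python
import re

# Same two code points as the pattern_replace filter:
# U+FE0E (text presentation) and U+FE0F (emoji presentation).
VARIATION_SELECTORS = re.compile("[\uFE0E\uFE0F]")


def strip_variation_selectors(token):
    """Return the token with any variation selectors removed."""
    return VARIATION_SELECTORS.sub("", token)


with_selector = "\u2764\uFE0F"  # heart followed by the emoji-style selector
without_selector = "\u2764"     # the same heart with no selector

# Both spellings normalize to the same string.
print(strip_variation_selectors(with_selector) == without_selector)
```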

You can now test the result with:

GET tweets/_analyze
{
  "field": "content",
  "text": "🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"
}
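
To inspect the result from a script, you could group the returned tokens by position, since the terms added by a synonym filter typically share the position of the token they expand. This is a minimal Python sketch with hypothetical helper names; it assumes the response has the usual `tokens` list with `token` and `position` fields, and it leaves sending the request to whichever HTTP client you use.

```python
import json


def build_analyze_body(field, text):
    """Return the JSON body for a GET <index>/_analyze request."""
    # ensure_ascii=False keeps the emoji readable in the serialized body.
    return json.dumps({"field": field, "text": text}, ensure_ascii=False)


def tokens_by_position(response):
    """Group token strings from a parsed _analyze response by position."""
    grouped = {}
    for tok in response.get("tokens", []):
        grouped.setdefault(tok["position"], []).append(tok["token"])
    return grouped
```

Call `build_analyze_body("content", text)` to get the request body, send it to `tweets/_analyze`, then pass the decoded JSON response to `tokens_by_position` to see which words each emoji was expanded to.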

How to contribute

Build from CLDR SVN

You will need:

Edit the tag in tools/build-released.php and run php tools/build-released.php.

Update emoticons

Run php tools/build-emoticon.php.

Licenses

Emoji data courtesy of CLDR. See unicode-license.txt for details. Some modifications are made to the data, see here. Emoticon data is based on https://github.com/wooorm/emoticon/ (MIT).

This repository is distributed under the MIT License. Feel free to use and contribute as you please!