ParsiAnalyzer

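ParsiAnalyzer is an analysis plugin for Elasticsearch. Analysis is a process that consists of two steps: tokenizing a block of text into individual terms, then normalizing those terms into a standard form to improve their searchability.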

An analyzer is a wrapper that combines character filters, a tokenizer, and token filters. Elasticsearch ships with many built-in analyzers, but there is still room for improvement, especially for the Persian language. This plugin provides tools for tokenizing, normalizing, and stemming Persian text.
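To make the three stages concrete, here is a minimal Python sketch of an analyzer as a chain of character filter, tokenizer, and token filter. It is only an illustration: the character map, the stop-word list, and all function names are assumptions made for this example and do not reflect ParsiAnalyzer's actual rules or implementation.

```python
# Hypothetical character map: fold Arabic code points into Persian ones.
# This table is an assumption for illustration, not the plugin's own.
CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

# Hypothetical stop-word list containing only "az" ("from").
STOP_WORDS = {"\u0627\u0632"}


def char_filter(text):
    """Character filter: rewrite the raw text before it is tokenized."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def tokenizer(text):
    """Tokenizer: split the filtered text into tokens on whitespace."""
    return text.split()


def token_filter(tokens):
    """Token filter: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text):
    """Analyzer: run the three stages in order."""
    return token_filter(tokenizer(char_filter(text)))


if __name__ == "__main__":
    print(analyze("روباه قهوه‌اي چابك از روی سگ تنبل می پرد"))
```

A real analyzer would also stem the tokens; that stage is left out here because the plugin's stemming rules are not described in this document.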

Key features

Installation

To install the plugin for Elasticsearch 7.13.1, run this command:

bin\elasticsearch-plugin install https://www.dropbox.com/s/cr61dmnx95taivi/ParsiAnalyzer-7.13.1.zip?dl=1

Build

If you want to build ParsiAnalyzer for a specific version of Elasticsearch, follow these steps:

  1. Make sure the JDK and Maven are installed on your computer
  2. Clone the project
  3. Open pom.xml
  4. Under the dependencies tag, change the Elasticsearch version to your desired version
  5. Open plugin-descriptor.properties
  6. Change elasticsearch.version to your desired version
  7. Run this Maven command: mvn clean package
  8. In the target/releases folder, you'll now find a zip file. Install the plugin using this command: bin/elasticsearch-plugin install file:///path/to/ParsiAnalyzer.zip
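Steps 5 and 6 can be scripted if you rebuild for several versions. The following is a rough sketch, assuming plugin-descriptor.properties contains a plain `elasticsearch.version=...` line; the function name `set_es_version` is hypothetical and not part of the project.

```python
import re

# Matches a single "elasticsearch.version=<value>" line, keeping the key part.
_VERSION_LINE = re.compile(
    r"^([ \t]*elasticsearch\.version[ \t]*=[ \t]*).*$", re.MULTILINE
)


def set_es_version(properties_text, version):
    """Return the properties text with elasticsearch.version set to `version`.

    Raises ValueError if the key is missing, so a wrong file is not
    silently left unchanged.
    """
    new_text, count = _VERSION_LINE.subn(
        lambda m: m.group(1) + version, properties_text
    )
    if count == 0:
        raise ValueError("no elasticsearch.version entry found")
    return new_text
```

To apply it, read the descriptor file, pass its contents through `set_es_version`, and write the result back before running the Maven build. The version in pom.xml (step 4) still has to be changed separately.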

Usage

To see how this plugin works, you can use Elasticsearch's analyze API:

POST _analyze
{
  "analyzer" : "parsi",
  "text" : "روباه قهوه‌اي چابك از روی سگ تنبل می پرد"
}

If you would rather not have stemming applied, use the standard variant of ParsiAnalyzer:

POST _analyze
{
  "analyzer" : "parsi_standard",
  "text" : "روباه قهوه‌اي چابك از روی سگ تنبل می پرد"
}
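To compare the two analyzers side by side, the same requests can be sent from a script. This is a minimal sketch using only the Python standard library; it assumes an unauthenticated node whose base URL you pass in (for example http://localhost:9200), and the helper names are hypothetical.

```python
import json
import urllib.request


def build_analyze_body(analyzer, text):
    """Build the JSON body for a POST _analyze request."""
    return {"analyzer": analyzer, "text": text}


def extract_tokens(response):
    """Pull the token strings out of an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(base_url, analyzer, text):
    """Send the request to the node and return the produced tokens."""
    data = json.dumps(build_analyze_body(analyzer, text)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_tokens(json.load(resp))


def compare(base_url, text):
    """Return the tokens from both analyzers, keyed by analyzer name."""
    return {name: analyze(base_url, name, text)
            for name in ("parsi", "parsi_standard")}
```

Calling `compare` with the sample sentence above returns one token list per analyzer, which makes the effect of stemming easy to see.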

ParsiAnalyzer can be specified directly in the field mapping as follows:

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "parsi"
      }
    }
  }
}

Contact me

Email: n.esmaielyfard [at] gmail.com