Stemmer Token Filter

A filter that provides access to (almost) all of the available stemming token filters through a single unified interface. For example:

PUT /my_index
{
    "settings": {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_stemmer"]
                }
            },
            "filter" : {
                "my_stemmer" : {
                    "type" : "stemmer",
                    "name" : "light_german"
                }
            }
        }
    }
}
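
To see what the filter does to actual text, the analyzer defined above can be run through the `_analyze` API. The request below is a minimal sketch: it assumes the `my_index` index from the previous example has already been created, and the sample text is arbitrary. The tokens returned depend on the chosen stemmer and on the Elasticsearch version, so no expected output is shown.

```json
GET /my_index/_analyze
{
    "analyzer" : "my_analyzer",
    "text" : "Die Häuser der Städte"
}
```

Swapping the `name` value in the filter definition and re-running the same request is a quick way to compare how aggressively different stemmers for one language reduce the same words.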

The language/name parameter selects the stemmer. The following values are available, grouped by language:

Arabic

arabic

Armenian

armenian

Basque

basque

Bengali

bengali, light_bengali

Brazilian Portuguese

brazilian

Bulgarian

bulgarian

Catalan

catalan

Czech

czech

Danish

danish

Dutch

dutch, dutch_kp

English

english, light_english, minimal_english, possessive_english, porter2, lovins

Finnish

finnish, light_finnish

French

french, light_french, minimal_french

Galician

galician, minimal_galician (Plural step only)

German

german, german2, light_german, minimal_german

Greek

greek

Hindi

hindi

Hungarian

hungarian, light_hungarian

Indonesian

indonesian

Irish

irish

Italian

italian, light_italian

Kurdish (Sorani)

sorani

Latvian

latvian

Lithuanian

lithuanian

Norwegian (Bokmål)

norwegian, light_norwegian, minimal_norwegian

Norwegian (Nynorsk)

light_nynorsk, minimal_nynorsk

Portuguese

portuguese, light_portuguese, minimal_portuguese, portuguese_rslp

Romanian

romanian

Russian

russian, light_russian

Spanish

spanish, light_spanish

Swedish

swedish, light_swedish

Turkish

turkish
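
Any value from the list above can be used in place of `light_german` in the filter definition. As a sketch, the following request defines two stemmer filters and applies them in sequence, removing English possessives before stemming. The index name `my_english_index`, the analyzer name `my_english_analyzer`, and the filter names `my_possessive_stemmer` and `my_english_stemmer` are hypothetical and chosen only for illustration.

```json
PUT /my_english_index
{
    "settings": {
        "analysis" : {
            "analyzer" : {
                "my_english_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_possessive_stemmer", "lowercase", "my_english_stemmer"]
                }
            },
            "filter" : {
                "my_possessive_stemmer" : {
                    "type" : "stemmer",
                    "language" : "possessive_english"
                },
                "my_english_stemmer" : {
                    "type" : "stemmer",
                    "language" : "light_english"
                }
            }
        }
    }
}
```

Filters run in the order they are listed in the analyzer, so the possessive stemmer sees each token before `light_english` does.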