Stemmer Token Filter


A filter that provides access to (almost) all of the available stemming token filters through a single unified interface. For example:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "my_stemmer"]
                }
            },
            "filter" : {
                "my_stemmer" : {
                    "type" : "stemmer",
                    "name" : "light_german"
                }
            }
        }
    }
}

The language (or name) parameter selects the stemmer. The available values are listed below by language (the preferred filters are marked in bold):

Arabic

arabic

Armenian

armenian

Basque

basque

Brazilian Portuguese

brazilian

Bulgarian

bulgarian

Catalan

catalan

Czech

czech

Danish

danish

Dutch

dutch, dutch_kp (added in 1.3.0; renamed from kp)

English

english (added in 1.3.0; returns the Porter Stem token filter), light_english (added in 1.3.0; returns the KStem token filter), minimal_english, possessive_english, porter2 (added in 1.3.0; returns the Snowball token filter), lovins

Finnish

finnish, light_finnish

French

french, light_french, minimal_french

Galician

galician (added in 1.3.0), minimal_galician (plural step only; added in 1.3.0)

German

german, german2, light_german, minimal_german

Greek

greek

Hindi

hindi

Hungarian

hungarian, light_hungarian

Indonesian

indonesian

Irish

irish

Italian

italian, light_italian

Kurdish (Sorani)

sorani (added in 1.3.0)

Latvian

latvian

Norwegian (Bokmål)

norwegian, light_norwegian (added in 1.3.0), minimal_norwegian

Norwegian (Nynorsk)

light_nynorsk (added in 1.3.0), minimal_nynorsk (added in 1.3.0)

Portuguese

portuguese, light_portuguese, minimal_portuguese, portuguese_rslp (added in 1.3.0)

Romanian

romanian

Russian

russian, light_russian

Spanish

spanish, light_spanish

Swedish

swedish, light_swedish

Turkish

turkish
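The stemmer can be selected with the language key instead of name, and more than one stemmer filter can be chained in a single analyzer. The following is a minimal sketch that uses the same index settings layout as the first example. The names my_english, my_possessive_stemmer and my_light_stemmer are hypothetical, and running possessive_english before light_english is only one plausible combination, not one this page prescribes.

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_english" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "my_possessive_stemmer", "my_light_stemmer"]
                }
            },
            "filter" : {
                "my_possessive_stemmer" : {
                    "type" : "stemmer",
                    "language" : "possessive_english"
                },
                "my_light_stemmer" : {
                    "type" : "stemmer",
                    "language" : "light_english"
                }
            }
        }
    }
}
```

Token filters run in the order they are listed, so tokens are lowercased before either stemmer sees them, and the trailing possessive 's is removed before the light English stemmer is applied.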