WARNING: Version 6.1 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Whitespace Tokenizer
The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
Example output
POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
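The behaviour shown above can be approximated outside Elasticsearch with a few lines of Python. This is only an illustrative sketch, not the tokenizer's implementation: the function name whitespace_tokenize is hypothetical, and Python's definition of whitespace may differ slightly from the one the tokenizer uses. It shows that case, punctuation, and hyphens are left untouched, and that only whitespace separates terms.

```python
def whitespace_tokenize(text):
    """Approximate the whitespace tokenizer: split on whitespace only.

    Hypothetical helper for illustration; not part of Elasticsearch.
    """
    # str.split() with no arguments splits on runs of whitespace and
    # drops empty strings, so repeated spaces do not yield empty terms.
    return text.split()


if __name__ == "__main__":
    sentence = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
    # Prints the same ten terms as listed above.
    print(whitespace_tokenize(sentence))
```

Note that "bone." keeps its trailing period and "Brown-Foxes" stays a single term, matching the output listed above.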
Configuration
The whitespace tokenizer accepts the following parameters:
|
The maximum token length. If a token is seen that exceeds this length then
it is split at |
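The effect of a maximum token length can be sketched in Python as follows. The description above is cut off before it says where an over-long token is split, so this sketch assumes the token is cut into consecutive chunks of at most the maximum length. The names tokenize_with_limit and max_len are hypothetical and are not Elasticsearch identifiers.

```python
def tokenize_with_limit(text, max_len):
    """Split on whitespace, then cut over-long tokens into chunks.

    Assumption (not confirmed by the text above): a token longer than
    max_len is split into consecutive pieces of at most max_len characters.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens = []
    for word in text.split():
        # Step through the word in strides of max_len; a short word
        # yields a single chunk equal to the word itself.
        for start in range(0, len(word), max_len):
            tokens.append(word[start:start + max_len])
    return tokens


if __name__ == "__main__":
    # With a limit of 5, the 11-character "Brown-Foxes" becomes three pieces.
    print(tokenize_with_limit("The QUICK Brown-Foxes", 5))
```

Under that assumption, tokens no longer than the limit pass through unchanged, and only longer ones are broken up.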