Explain Analyze


To get more detailed output, set explain to true (the default is false). The response then includes all token attributes for each token. To limit the output to specific token attributes, list them in the attributes option.

The format of this additional detail is labelled as experimental in Lucene and may change in the future.

GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] 
}

Setting "attributes" to ["keyword"] limits the output to the "keyword" attribute.

The request returns the following result:

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false 
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false 
      } ]
    } ]
  }
}

Only the "keyword" attribute is output, because "attributes" was specified in the request.
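The response is nested by analysis stage, so it can help to flatten it and see how each stage changes the tokens. The following Python snippet is a minimal sketch, not part of the API: the helper name tokens_by_stage is hypothetical, the response is the example above with offsets and types omitted for brevity, and it assumes the custom-analyzer response shape shown here (a "tokenizer" object followed by a "tokenfilters" list).

```python
import json

# Response body from the example above, with offsets and types omitted.
response = json.loads("""
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {"token": "detailed", "position": 0},
        {"token": "output", "position": 1}
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {"token": "detail", "position": 0, "keyword": false},
          {"token": "output", "position": 1, "keyword": false}
        ]
      }
    ]
  }
}
""")


def tokens_by_stage(detail):
    """Return [(stage_name, [token, ...]), ...] in pipeline order.

    Hypothetical helper: assumes the custom-analyzer shape, where the
    tokenizer output comes first and each token filter follows in order.
    """
    stages = [detail["tokenizer"]] + detail.get("tokenfilters", [])
    return [(stage["name"], [t["token"] for t in stage["tokens"]]) for stage in stages]


stages = tokens_by_stage(response["detail"])
for name, tokens in stages:
    print(f"{name}: {tokens}")
# standard: ['detailed', 'output']
# snowball: ['detail', 'output']
```

Read this way, the output shows that the snowball filter stemmed "detailed" to "detail" and left "output" unchanged.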

Settings to prevent token explosion


Generating an excessive number of tokens may cause a node to run out of memory. The following setting limits the number of tokens that can be produced:

index.analyze.max_token_count
The maximum number of tokens that can be produced using the _analyze API. The default value is 10000. If more tokens than this limit are generated, an error is returned. The _analyze endpoint without a specified index always uses 10000 as the limit. This setting lets you control the limit for a specific index:
PUT analyze_sample
{
  "settings" : {
    "index.analyze.max_token_count" : 20000
  }
}

GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
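A client calling the index-scoped endpoint may want to handle the error returned when the token limit is exceeded. The following Python snippet is a rough sketch using only the standard library. The function names build_analyze_request and analyze are hypothetical, and the base URL http://localhost:9200 with no authentication is an assumption about the cluster. The request is built but not sent here.

```python
import json
import urllib.error
import urllib.request


def build_analyze_request(base_url, index, text):
    """Build (but do not send) a GET <index>/_analyze request.

    Hypothetical helper; base_url is an assumption about your cluster.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def analyze(request):
    """Send the request and return (ok, parsed_json_body)."""
    try:
        with urllib.request.urlopen(request) as resp:
            return True, json.load(resp)
    except urllib.error.HTTPError as err:
        # An error response (for example, when the token limit is
        # exceeded) is expected to carry a JSON body describing the failure.
        return False, json.loads(err.read().decode("utf-8"))


# Assumed local cluster address; the index name comes from the example above.
req = build_analyze_request("http://localhost:9200", "analyze_sample", "this is a test")
```

Calling analyze(req) against a running cluster would return the token list on success, or the parsed error body if the request fails.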