Predicate Token Filter Script

The predicate_token_filter token filter takes a predicate script and removes every token that does not match the predicate.

Options

script

A predicate script that determines whether the current token is emitted. Only inline scripts are supported.

Settings example

The following request creates an index whose analyzer uses the filter:

PUT /condition_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : [ "my_script_filter" ]
                }
            },
            "filter" : {
                "my_script_filter" : {
                    "type" : "predicate_token_filter",
                    "script" : {
                        "source" : "token.getTerm().length() > 5"
                    }
                }
            }
        }
    }
}

The script emits only tokens that are more than 5 characters long.

To test the analyzer, run:

POST /condition_example/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "What Flapdoodle"
}

The response is:

{
  "tokens": [
    {
      "token": "Flapdoodle",
      "start_offset": 5,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The token What has been removed from the token stream because it does not match the predicate.

The position and offset values of Flapdoodle are unaffected by the removal of the earlier token: it keeps position 1 and offsets 5 to 15.
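
The behavior described above can be modeled outside Elasticsearch with a minimal Python sketch. This is an illustration only, not how Elasticsearch implements the filter: the names Token, tokenize, and predicate_filter are hypothetical, and the regular expression is a rough stand-in for the standard tokenizer that happens to agree with it on this input.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    # Hypothetical model of one analyzed token.
    term: str
    start_offset: int
    end_offset: int
    position: int


def tokenize(text: str) -> List[Token]:
    # Simplified stand-in for the standard tokenizer: one token per run of
    # word characters, with offsets and positions assigned up front.
    return [
        Token(m.group(), m.start(), m.end(), pos)
        for pos, m in enumerate(re.finditer(r"\w+", text))
    ]


def predicate_filter(
    tokens: List[Token], predicate: Callable[[Token], bool]
) -> List[Token]:
    # Tokens that fail the predicate are dropped. Survivors are passed
    # through unchanged, so their positions and offsets are not renumbered.
    return [t for t in tokens if predicate(t)]


# Same predicate as the example script: keep terms longer than 5 characters.
tokens = predicate_filter(tokenize("What Flapdoodle"), lambda t: len(t.term) > 5)
for t in tokens:
    print(t)
# prints: Token(term='Flapdoodle', start_offset=5, end_offset=15, position=1)
```

Because positions and offsets are assigned before filtering in this model, the surviving token reports the same values as in the response above.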
