Configuration

Use the following files to configure App Search, Workplace Search, and shared concerns within Enterprise Search:

config/enterprise-search.yml

Defines the configuration for the deployment.

See Enterprise Search configuration settings.

config/env.sh

Defines default values for environment variables used by Enterprise Search.

See Enterprise Search environment variables.

Refer to the following docs for specific configuration tasks:

Enterprise Search configuration settings

Set the configuration for Enterprise Search within config/enterprise-search.yml:

## ================= Elastic Enterprise Search Configuration ==================
#
# NOTE: Elastic Enterprise Search comes with reasonable defaults.
#       Before adjusting the configuration, make sure you understand what you
#       are trying to accomplish and the consequences.
#
# NOTE: For passwords, the use of environment variables is encouraged
#       to keep values from being written to disk, e.g.
#       elasticsearch.password: ${ELASTICSEARCH_PASSWORD:changeme}
#
# ---------------------------------- Secrets ----------------------------------
#
# Encryption keys to protect your application secrets. This field is required.
#
#secret_management.encryption_keys: []
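#
# Illustrative example only (the key below is a placeholder; generate your own
# long, random key strings and keep them out of version control):
#
#   secret_management.encryption_keys: ['replace_me_with_a_long_random_string']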
#
# ------------------------------- Elasticsearch -------------------------------
#
# Enterprise Search needs one-time permission to alter Elasticsearch settings.
# Ensure the Elasticsearch settings are correct, then set the following to
# true. Or, adjust Elasticsearch's config/elasticsearch.yml instead.
# See README.md for more details.
#
#allow_es_settings_modification: false
#
# Elasticsearch full cluster URL:
#
#elasticsearch.host: http://127.0.0.1:9200
#
# Elasticsearch credentials:
#
#elasticsearch.username: elastic
#elasticsearch.password: changeme
#
# Elasticsearch custom HTTP headers to add to each request:
#
#elasticsearch.headers:
#  X-My-Header: Contents of the header
#
# Elasticsearch SSL settings:
#
#elasticsearch.ssl.enabled: false
#elasticsearch.ssl.certificate:
#elasticsearch.ssl.certificate_authority:
#elasticsearch.ssl.key:
#elasticsearch.ssl.key_passphrase:
#elasticsearch.ssl.verify: true
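#
# Illustrative example of connecting to a TLS-secured cluster (host, CA path,
# and credentials below are placeholders; adjust them for your environment):
#
#   elasticsearch.host: https://es.internal.example.com:9200
#   elasticsearch.username: elastic
#   elasticsearch.password: ${ELASTICSEARCH_PASSWORD:changeme}
#   elasticsearch.ssl.enabled: true
#   elasticsearch.ssl.certificate_authority: /path/to/ca.crt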
#
# Elasticsearch startup retry:
#
#elasticsearch.startup_retry.enabled: true
#elasticsearch.startup_retry.interval: 5 # seconds
#elasticsearch.startup_retry.fail_after: 600 # seconds
#
# ---------------------------------- Kibana -----------------------------------
#
# The primary URL at which users interact with Kibana. This is used when
# Enterprise Search links users to Kibana.
#
#kibana.external_url: http://localhost:5601
#
# ------------------------------- Hosting & Network ---------------------------
#
# Define the exposed URL at which users will reach Enterprise Search.
# Defaults to localhost:3002 for testing purposes.
# Most cases will use one of:
#
# * An IP: http://255.255.255.255
# * A FQDN: http://example.com
# * Shortname defined via /etc/hosts: http://ent-search.search
#
#ent_search.external_url: http://localhost:3002
#
# Web application listen_host and listen_port.
# Your application will run on this host and port.
#
# * ent_search.listen_host: Must be a valid IPv4 or IPv6 address.
# * ent_search.listen_port: Must be a valid port number (1-65535).
#
#ent_search.listen_host: 127.0.0.1
#ent_search.listen_port: 3002
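#
# Illustrative example: listen on all interfaces while users reach the
# deployment through a public URL (the hostname below is a placeholder):
#
#   ent_search.external_url: https://search.example.com
#   ent_search.listen_host: 0.0.0.0
#   ent_search.listen_port: 3002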
#
# ------------------------------ Authentication -------------------------------
#
# Auth name associated with the options being set up. If realm chains are
# configured in elasticsearch.yml for the associated Elasticsearch instance,
# then the names of the realms should also be used here. Multiple auth
# providers may be configured. Each must have a unique name.
#
#ent_search.auth.<auth_name>
#
# The origin of authenticated Enterprise Search users.
# Options are standard, elasticsearch-native, and elasticsearch-saml.
#
# Docs: https://www.elastic.co/guide/en/workplace-search/current/workplace-search-security.html
#
# * standard: Users are created within the Enterprise Search dashboard.
# * elasticsearch-native: Users are managed via the Elasticsearch native realm.
# * elasticsearch-saml: Users are managed via the Elasticsearch SAML realm.
#
#ent_search.auth.<auth_name>.source:
#
# Auth providers are consulted in ascending order (that is to say, the realm
# with the lowest order value is consulted first). You should make sure each
# configured realm has a distinct order setting.
#
#ent_search.auth.<auth_name>.order:
#
# The name to be displayed on the login screen associated with this provider.
#
#ent_search.auth.<auth_name>.description:
#
# The URL to an icon to be displayed on the login screen associated with this
# provider.
#
#ent_search.auth.<auth_name>.icon:
#
# Boolean value to determine whether or not to display this login option on the
# login screen. It is common to hide an option if you would like to create role
# mappings before allowing the option to be used as a valid login mechanism.
#
#ent_search.auth.<auth_name>.hidden: false
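#
# Illustrative example of one provider (assumes a SAML realm named "saml1" is
# already configured in elasticsearch.yml; the name and description are
# placeholders):
#
#   ent_search.auth.saml1.source: elasticsearch-saml
#   ent_search.auth.saml1.order: 1
#   ent_search.auth.saml1.description: Corporate SSO
#   ent_search.auth.saml1.hidden: false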
#
# Adds a message to the login screen. Useful for displaying information about
# maintenance windows, links to corporate sign up pages, etc. This field
# supports Markdown.
#
#ent_search.login_assistance_message:
#
# ---------------------------------- Limits -----------------------------------
#
# Configurable limits for Enterprise Search.
# NOTE: Overriding the default limits can impact performance negatively.
#       Also, changing a limit here does not actually guarantee that
#       Enterprise Search will work as expected as related Elasticsearch limits
#       can be exceeded.
#
#### Workplace Search
#
# Configure the maximum allowed document size for a content source.
#
#workplace_search.content_source.document_size.limit: 100kb
#
# Configure how many fields a content source can have.
# NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count`
# might also need to be adjusted if "Max clause count exceeded" errors start
# occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html
#
#workplace_search.content_source.total_fields.limit: 64
#
# Configure how many errors to tolerate in a sync job.
# If the job encounters more total errors than this value, the job will fail.
# NOTE: this only applies to errors tied to individual documents.
#
#workplace_search.content_source.sync.max_errors: 1000
#
# Configure how many errors in a row to tolerate in a sync job.
# If the job encounters more errors in a row than this value, the job will fail.
# NOTE: this only applies to errors tied to individual documents.
#
#workplace_search.content_source.sync.max_consecutive_errors: 10
#
# Configure the ratio of <errored documents> / <total documents> to tolerate in a sync job
# or in a rolling window (see `workplace_search.content_source.sync.error_ratio_window_size`).
# If the job encounters an error ratio greater than this value in a given window, or overall
# at the end of the job, the job will fail.
# NOTE: this only applies to errors tied to individual documents.
#
#workplace_search.content_source.sync.max_error_ratio: 0.15
#
# Configure how large of a window to consider when calculating an error ratio
# (see `workplace_search.content_source.sync.max_error_ratio`).
#
#workplace_search.content_source.sync.error_ratio_window_size: 100
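#
# Worked example: with the defaults above (max_error_ratio 0.15 and
# error_ratio_window_size 100), a sync job fails once more than 15 of any 100
# consecutive documents error, or if more than 15% of all documents have
# errored by the end of the job.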
#
# Configure whether or not a content source should generate thumbnails for the documents
# it syncs. Not all file types/sizes/content or Content Sources support thumbnail generation,
# even if this is enabled.
#
#workplace_search.content_source.sync.thumbnails.enabled: true
#
#### App Search
#
# Configure the maximum allowed document size.
#
#app_search.engine.document_size.limit: 100kb
#
# Configure how many fields an engine can have.
# NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count`
# might also need to be adjusted if "Max clause count exceeded" errors start
# occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html
#
#app_search.engine.total_fields.limit: 64
#
# Configure how many source engines a meta engine can have.
#
#app_search.engine.source_engines_per_meta_engine.limit: 15
#
# Configure how many facet values can be returned by a search.
#
#app_search.engine.total_facet_values_returned.limit: 250
#
# Configure how large full-text queries are allowed to be.
# NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count`
# might also need to be adjusted if "Max clause count exceeded" errors start
# occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html
#
#app_search.engine.query.limit: 128
#
# Configure total number of synonym sets an engine can have.
#
#app_search.engine.synonyms.sets.limit: 256
#
# Configure total number of terms a synonym set can have.
#
#app_search.engine.synonyms.terms_per_set.limit: 32
#
# Configure how many analytics tags can be associated with a single query or clickthrough.
#
#app_search.engine.analytics.total_tags.limit: 16
#
# ---------------------------------- Workers ----------------------------------
#
# Configure the number of worker threads.
#
#worker.threads: 1
#
# ----------------------------------- APIs ------------------------------------
#
# Set to true to hide product version information from API responses.
#
#hide_version_info: false
#
# ---------------------------------- Mailer -----------------------------------
#
# Connect Enterprise Search to a mailer.
# Docs: https://www.elastic.co/guide/en/workplace-search/current/workplace-search-smtp-mailer.html
#
#email.account.enabled: false
#email.account.smtp.auth: plain
#email.account.smtp.starttls.enable: false
#email.account.smtp.host: 127.0.0.1
#email.account.smtp.port: 25
#email.account.smtp.user:
#email.account.smtp.password:
#email.account.email_defaults.from:
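#
# Illustrative example only (host, port, and addresses are placeholders for
# whatever your SMTP provider requires):
#
#   email.account.enabled: true
#   email.account.smtp.auth: plain
#   email.account.smtp.starttls.enable: true
#   email.account.smtp.host: smtp.example.com
#   email.account.smtp.port: 587
#   email.account.smtp.user: notifications@example.com
#   email.account.smtp.password: ${SMTP_PASSWORD:changeme}
#   email.account.email_defaults.from: notifications@example.com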
#
# ---------------------------------- Logging ----------------------------------
#
# Choose your log export path.
#
#log_directory: log
#
# Log level can be: debug, info, warn, error, fatal, or unknown.
#
#log_level: info
#
# Log format can be: default, json
#
#log_format: default
#
# Choose your Filebeat logs export path.
#
#filebeat_log_directory: log
#
# Use Index Lifecycle Management (ILM) to manage analytics and API logs
# retention.
#
# auto: Use ILM when supported by the underlying Elasticsearch cluster
# true: Use ILM (requires ILM support in the underlying Elasticsearch cluster)
# false: Don't use ILM (analytics and API logs will grow unconstrained)
#
# Docs: https://www.elastic.co/guide/en/app-search/current/logs.html
#
#ilm.enabled: auto
#
# Enable logging app logs to stdout (enabled by default).
#
#enable_stdout_app_logging: true
#
# The number of files to keep on disk when rotating logs. When set to 0, no
# rotation will take place.
#
#log_rotation.keep_files: 7
#
# The maximum file size in bytes before rotating the log file. If
# log_rotation.keep_files is set to 0, no rotation will take place and there
# will be no size limit for the singular log file.
#
#log_rotation.rotate_every_bytes: 1048576 # 1 MiB
#
# ---------------------------------- TLS/SSL ----------------------------------
#
# Configure TLS/SSL encryption.
#
#ent_search.ssl.enabled: false
#ent_search.ssl.keystore.path:
#ent_search.ssl.keystore.password:
#ent_search.ssl.keystore.key_password:
#ent_search.ssl.redirect_http_from_port:
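#
# Illustrative example (the keystore path and password are placeholders):
#
#   ent_search.ssl.enabled: true
#   ent_search.ssl.keystore.path: /path/to/keystore.p12
#   ent_search.ssl.keystore.password: ${KEYSTORE_PASSWORD:changeme}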
#
# ---------------------------------- Session ----------------------------------
#
# Set a session key to persist user sessions through process restarts.
#
#secret_session_key:
#
# --------------------------------- Telemetry ---------------------------------
#
# Reporting your basic feature usage statistics helps us improve your user
# experience. Your data is never shared with anyone.
#
# Set to false to disable telemetry capabilities entirely. You can alternatively
# opt out through the Settings page.
#
#telemetry.enabled: true
#
# If false, collection of telemetry data is disabled; however, it can be
# enabled via the Settings page if telemetry.allow_changing_opt_in_status is
# true.
#
#telemetry.opt_in: true
#
# If true, users are able to change the telemetry setting at a later time
# through the Settings page. If false, the value of telemetry.opt_in determines
# whether to send telemetry data or not.
#
#telemetry.allow_changing_opt_in_status: true
#
# ----------------------------- Diagnostics report ----------------------------
#
# Path where diagnostic reports will be generated.
#
#diagnostic_report_directory: diagnostics
#
# ------------------------------ Crawler Preview ------------------------------
#
# The User-Agent HTTP Header used for the Crawler.
#
#crawler.http.user_agent: Elastic Crawler (<crawler_version_number>)
#
# The user agent platform used for the Crawler with identifying information:
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent#syntax
# Note: this value will be added as a suffix to `crawler.http.user_agent` and
# used as the final User-Agent Header.
#
#crawler.http.user_agent_platform:
#
# The number of parallel crawls allowed per instance of Enterprise Search.
# By default, it is set to 2x the number of available CPU cores.
#
#crawler.workers.pool_size.limit: N
#
# -------------------------
# Per-crawl Resource Limits
# -------------------------
#
# These limits guard against infinite loops and other traps common to
# production web crawlers.
#
# If your crawler is hitting these limits, try changing your crawl rules
# or the content you're crawling. Adjust these limits as a last resort.
#
#
# The maximum duration of a crawl, in seconds. Beyond this limit, the
# crawler will stop, abandoning all remaining URLs in the crawl queue.
#
#crawler.crawl.max_duration.limit: 86400 # seconds
#
# The maximum number of sequential pages the crawler will traverse starting
# from the given set of entry points. Beyond this limit, the crawler will stop
# discovering new links.
#
#crawler.crawl.max_crawl_depth.limit: 10
#
# The maximum number of characters within each URL to crawl. The crawler
# will skip URLs that exceed this length.
#
#crawler.crawl.max_url_length.limit: 2048
#
# The maximum number of segments within the path of each URL to crawl. The
# crawler will skip URLs whose paths exceed this length.
# Example: The path '/a/b/c/d' has 4 segments.
#
#crawler.crawl.max_url_segments.limit: 16
#
# The maximum number of query parameters within each URL to crawl. The
# crawler will skip URLs that exceed this limit.
# Example: The query string in '/a?b=c&d=e' has 2 query parameters.
#
#crawler.crawl.max_url_params.limit: 32
#
# The maximum number of unique URLs the crawler will index during a single crawl.
# Beyond this limit, the crawler will stop.
#
#crawler.crawl.max_unique_url_count.limit: 100000
#
# -------------------------
# Advanced Per-crawl Limits
# -------------------------
#
# The number of parallel threads to use for each crawl.
# The main effect from increasing this value will be an increased throughput
# of the crawler at the expense of higher CPU load on Enterprise Search and
# Elasticsearch instances as well as higher load on the website being crawled.
#
#crawler.crawl.threads.limit: 10
#
# The maximum size of the crawl frontier - the list of URLs the crawler needs to visit.
# The list is stored in Elasticsearch, so the limit could be increased as long
# as the Elasticsearch cluster has enough resources (disk space) to hold the queue index.
#
#crawler.crawl.url_queue.url_count.limit: 100000
#
# ---------------------------
# Per-Request Timeout Limits
# ---------------------------
#
# The maximum period to wait for a connection to be established before the request is aborted.
#
#crawler.http.connection_timeout: 10 # seconds
#
# The maximum period of inactivity between two data packets, before the request is aborted.
#
#crawler.http.read_timeout: 10 # seconds
#
# The maximum duration of the entire request before it is aborted.
#
#crawler.http.request_timeout: 60 # seconds
#
# ---------------------------
# Per-Request Resource Limits
# ---------------------------
#
# The maximum size of an HTTP response (in bytes) supported by the crawler.
#
#crawler.http.response_size.limit: 10485760
#
# The maximum number of HTTP redirects before a request is failed.
#
#crawler.http.redirects.limit: 10
#
# ----------------------------------
# Content Extraction Resource Limits
# ----------------------------------
#
# The maximum size (in bytes) of some fields extracted from crawled pages
#
#crawler.extraction.title_size.limit: 1024
#crawler.extraction.body_size.limit: 5242880
#crawler.extraction.keywords_size.limit: 512
#crawler.extraction.description_size.limit: 1024
#
# The maximum number of links extracted from each page for further crawling
#
#crawler.extraction.extracted_links_count.limit: 1000
#
# The maximum number of links extracted from each page and indexed in a document
#
#crawler.extraction.indexed_links_count.limit: 25
#
# The maximum number of HTML headers to be extracted from each page
#
#crawler.extraction.headings_count.limit: 25
#
# Document fields used to compare documents during de-duplication
#
#crawler.extraction.content_hash_include: ['document_title', 'document_body', 'meta_keywords', 'meta_description', 'links', 'headings']
#
# -----------------------------
# Crawler DNS Security Controls
# -----------------------------
#
# WARNING: The settings in this section could make your deployment vulnerable to
# SSRF attacks (especially in cloud environments) from the owners of any domains
# you crawl. Do not enable any of the settings here unless you fully control
# the DNS domains you access with the crawler.
#
# See https://owasp.org/www-community/attacks/Server_Side_Request_Forgery for
# more details on the SSRF attack and the risks associated with it.
#
# Allow crawler to access the localhost (127.0.0.0/8 IP namespace)
#
#crawler.security.dns.allow_loopback_access: false
#
# Allow crawler to access the private IP space: link-local, network-local addresses, etc.
# (see https://en.wikipedia.org/wiki/Reserved_IP_addresses#IPv4 for more details)
#
#crawler.security.dns.allow_private_networks_access: false
#
# ------------------------------ Read-only mode -------------------------------
#
# If true, pending migrations can be executed without enabling read-only mode.
# Proceeding with migrations while indices are allowing writes can have
# unintended consequences. Use at your own risk; this should not be set to true
# when upgrading a production instance with ongoing traffic.
#
#skip_read_only_check: false
#

Enterprise Search environment variables

Set default values for environment variables used by Enterprise Search within config/env.sh:

#####################################################
## Elastic Enterprise Search Environment Variables ##
#####################################################

# Java options for JVM tuning (used for app-server and CLI commands)
export JAVA_OPTS=${JAVA_OPTS:-"-Xms2g -Xmx2g"}

# Additional Java options for the application server
export APP_SERVER_JAVA_OPTS="${APP_SERVER_JAVA_OPTS:-}"
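
# Illustrative example: a deployment with more memory available might override
# the default heap size before starting Enterprise Search, for example:
#
#   export JAVA_OPTS="-Xms4g -Xmx4g"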

#------------------------------------------------------------------------------
# Enable Java GC logging (see below for the default configuration)
export JAVA_GC_LOGGING=true

# Example Environment variables for further logging configuration:

# Where to put the files
# export JAVA_GC_LOG_DIR=log
#
# How many of the most recent files to keep
# export JAVA_GC_LOG_KEEP_FILES=10
#
# How big GC logs should grow before triggering log rotation
# export JAVA_GC_LOG_MAX_FILE_SIZE=10m