WARNING: Version 5.5 of Filebeat has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Developer guide: creating a new module
editDeveloper guide: creating a new module
editThis guide will walk you through creating a new Filebeat module.
All Filebeat modules currently live in the main Beats repository. To clone the repository and build Filebeat (which you will need for testing), please follow the general Beats CONTRIBUTING guide.
Overview
editEach Filebeat module is composed of one or more "filesets". We usually create a
module for each service that we support (nginx
for Nginx, mysql
for Mysql,
and so on) and a fileset for each type of log that the service creates. For
example, the Nginx module has access
and error
filesets. You can contribute
a new module (with at least one fileset), or a new fileset for an existing
module.
Creating a new fileset
editRegardless of whether you are creating a fileset in a new or existing module,
the procedure is similar. Run the following command in the filebeat
folder:
make create-fileset
You’ll be prompted to enter a module and fileset name. Only use characters
[a-z]
and, if required, underscores (_
). No other characters are allowed.
For the module name, you can us either a new module name or an existing module
name. If the module doesn’t exist, it will be created.
In this guide we use {fileset}
and {module}
as placeholders for the
fileset and module names. You need to replace these with the actual names you
entered when your created the module and fileset.
After running the make create-fileset
command, you’ll find the fileset,
along with its generated files, under module/{module}/{fileset}
. This
directory contains the following files:
module/{module}/{fileset} ├── manifest.yml ├── config │ └── {fileset}.yml ├── ingest │ └── pipeline.json ├── _meta │ └── fields.yml └── test
Let’s look at these files one by one.
manifest.yml
editThe manifest.yml
is the control file for the module, where variables are
defined and the other files are referenced. It is a YAML file, but in many
places in the file, you can use built-in or defined variables by using the
{{.variable}}
syntax.
The var
section of the file defines the fileset variables and their default
values. The module variables can be referenced in other configuration files,
and their value can be overridden at runtime by the Filebeat configuration.
As the fileset creator, you can use any names for the variables you define. Each variable must have a default value. So in it’s simplest form, this is how you can define a new variable:
var: - name: pipeline default: with_plugins
Most fileset should have a paths
variable defined, which sets the default
paths where the log files are located:
var: - name: paths default: - /example/test.log* os.darwin: - /usr/local/example/test.log* - /example/test.log* os.windows: - c:/programdata/example/logs/test.log*
There’s quite a lot going on in this file, so let’s break it down:
-
The name of the variable is
paths
and the default value is an array with one element:"/example/test.log*"
. - Note that variable values don’t have to be strings. They can be also numbers, objects, or as shown in this example, arrays.
-
We will use the
paths
variable to set the prospector paths setting, so "glob" values can be used here. -
Besides the
default
value, the file defines values for particular operating systems: a default for darwin/OS X/macOS systems and a default for Windows systems. These are introduced via theos.darwin
andos.windows
keywords. The values under these keys become the default for the variable, if Filebeat is executed on the respective OS.
Besides the variable definition, the manifest.yml
file also contains
references to the ingest pipeline and prospector configuration to use (see next
sections):
ingest_pipeline: ingest/pipeline.json prospector: config/testfileset.yml
These should point to the respective files from the fileset.
Note that when evaluating the contents of these files, the variables are expanded, which enables you to select one file or the other depending on the value of a variable. For example:
ingest_pipeline: ingest/{{.pipeline}}.json
This example selects the ingest pipeline file based on the value of the
pipeline
variable. For the pipeline
variable shown earlier, the path would
resolve to ingest/with_plugins.json
(assuming the variable value isn’t
overridden at runtime.)
config/*.yml
editThe config/
folder contains template files that generate Filebeat prospector
configurations. The Filebeat prospectors are primarily responsible for tailing
files, filtering, and multi-line stitching, so that’s what you configure in the
template files.
A typical example looks like this:
input_type: log paths: {{ range $i, $path := .paths }} - {{$path}} {{ end }} exclude_files: [".gz$"]
You’ll find this example in the template file that gets generated automatically
when you run make create-fileset
. In this example, the paths
variable is
used to construct the paths
list for the paths option.
Any template files that you add to the config/
folder need to generate a valid
Filebeat prospector configuration in YAML format. The options accepted by the
prospector configuration are documented in this
section.
The template files use the templating language defined by the Golang standard library.
Here is another example that also configures multiline stitching:
input_type: log paths: {{ range $i, $path := .paths }} - {{$path}} {{ end }} exclude_files: [".gz$"] multiline: pattern: "^# User@Host: " negate: true match: after
Although you can add multiple configuration files under the config/
folder,
only the file indicated by the manifest.yml
file will be loaded. You can use
variables to dynamically switch between configurations.
ingest/*.json
editThe ingest/
folder contains Elasticsearch
Ingest Node pipeline configurations. The Ingest
Node pipelines are responsible for parsing the log lines and doing other
manipulations on the data.
The files in this folder are JSON documents representing
pipeline definitions. Just like with the config/
folder, you can define multiple pipelines, but a single one is loaded at runtime
based on the information from manifest.yml
.
The generator creates a JSON object similar to this one:
{ "description": "Pipeline for parsing {module} {fileset} logs", "processors": [ ], "on_failure" : [{ "set" : { "field" : "error", "value" : "{{ _ingest.on_failure_message }}" } }] }
From here, you would typically add processors to the processors
array to do
the actual parsing. For details on how to use ingest node processors, see the
ingest node documentation. In
particular, you will likely find the
Grok processor to be useful for parsing.
Here is an example for parsing the Nginx access logs.
{ "grok": { "field": "message", "patterns":[ "%{IPORHOST:nginx.access.remote_ip} - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\"" ], "ignore_missing": true } }
Note that you should follow the convention of naming of fields prefixed with the
module and fileset name: {module}.{fileset}.field
, e.g.
nginx.access.remote_ip
. Also, please review our
field naming conventions.
While developing the pipeline definition, we recommend making use of the Simulate Pipeline API for testing and quick iteration.
_meta/fields.yml
editThe fields.yml
file contains the top-level structure for the fields in your
fileset. It is used as the source of truth for:
- the generated Elasticsearch mapping template
- the generated Kibana index pattern
- the generated documentation for the exported fields
Besides the fields.yml
file in the fileset, there is also a fields.yml
file
at the module level, placed under module/{module}/_meta/fields.yml
, which
should contain the fields defined at the module level, and the description of
the module itself. In most cases, you should add the fields at the fileset
level.
test
editIn the test/
directory, you should place sample log files generated by the
service. We have integration tests, automatically executed by CI, that will run
Filebeat on each of the log files under the test/
folder and check that there
are no parsing errors and that all fields are documented.
In addition, assuming you have a test.log
file, you can add a
test.log-expected.json
file in the same directory that contains the expected
documents as they are found via an Elasticsearch search. In this case, the
integration tests will automatically check that the result is the same on each
run.
Module-level files
editBesides the files in the fileset folder, there is also data that needs to be filled at the module level.
_meta/docs.asciidoc
editThis file contains module-specific documentation. You should include information about which versions of the service were tested and the variables that are defined in each fileset.
_meta/fields.yml
editThe module level fields.yml
contains descriptions for the module-level fields.
Please review and update the title and the descriptions in this file. The title
is used as a title in the docs, so it’s best to capitalize it.
_meta/kibana
editThis folder contains the sample Kibana dashboards for this module. To create them, you can build them visually in Kibana and then run the following command:
$ cd filebeat/module/{module}/ python ../../../dev-tools/export_dashboards.py --regex {module} --dir _meta/kibana
Where the --regex
parameter should match the dashboard you want to export.
You can find more details about the process of creating and exporting the Kibana dashboards by reading this guide.