How Filebeat works
In this topic, you learn about the key building blocks of Filebeat and how they work together. Understanding these concepts will help you make informed decisions about configuring Filebeat for specific use cases.
Filebeat consists of two main components: inputs and harvesters. These components work together to tail files and send event data to the output that you specify.
What is a harvester?
A harvester is responsible for reading the content of a single file. The harvester reads each file, line by line, and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it’s being harvested, Filebeat continues to read the file. This has the side effect that the space on your disk is reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.
Closing a harvester has the following consequences:
- The file handler is closed, freeing up the underlying resources if the file was deleted while the harvester was still reading the file.
- The harvesting of the file will only be started again after scan_frequency has elapsed.
- If the file is moved or removed while the harvester is closed, harvesting of the file will not continue.
To control when a harvester is closed, use the close_* configuration options.
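As a minimal sketch, the following log input combines the close and scan settings mentioned above; the path and values are illustrative assumptions, not recommendations:

filebeat.inputs:
- type: log
  paths:
    - /var/log/app/*.log   # hypothetical path, adjust to your environment
  close_inactive: 2m       # close the harvester after 2 minutes without new data
  close_removed: true      # close the file handler as soon as the file is removed
  scan_frequency: 15s      # how often the input checks for files to harvest again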
What is an input?
An input is responsible for managing the harvesters and finding all sources to read from.
If the input type is log, the input finds all files on the drive that match the defined glob paths and starts a harvester for each file. Each input runs in its own Go routine.
The following example configures Filebeat to harvest lines from all log files that match the specified glob patterns:
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
    - /var/path2/*.log
Filebeat currently supports several input types. Each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see ignore_older). New lines are only picked up if the size of the file has changed since the harvester was closed.
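As an illustrative sketch, adding ignore_older to the input above tells Filebeat to skip files that have not been updated within the given window; the duration shown is an arbitrary example:

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
  ignore_older: 24h   # skip files whose modification time is older than 24 hours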
How does Filebeat keep the state of files?
Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory for each input. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.
For each input, Filebeat keeps a state of each file it finds. Because files can be renamed or moved, the filename and path are not enough to identify a file. For each file, Filebeat stores unique identifiers to detect whether a file was harvested previously.
If your use case involves creating a large number of new files every day, you might find that the registry file grows to be too large. See Registry file is too large for details about configuration options that you can set to resolve this issue.
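As a hedged sketch, the registry location and flush interval can be tuned in the main configuration; the values below are examples only:

filebeat.registry.path: ${path.data}/registry   # directory where file states are persisted
filebeat.registry.flush: 1s                     # how often state updates are written to disk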
How does Filebeat ensure at-least-once delivery?
Filebeat guarantees that events will be delivered to the configured output at least once and with no data loss. Filebeat is able to achieve this behavior because it stores the delivery state of each event in the registry file.
In situations where the defined output is blocked and has not confirmed all events, Filebeat will keep trying to send events until the output acknowledges that it has received the events.
If Filebeat shuts down while it’s in the process of sending events, it does not wait for the output to acknowledge all events before shutting down. Any events that are sent to the output, but not acknowledged before Filebeat shuts down, are sent again when Filebeat is restarted. This ensures that each event is sent at least once, but you can end up with duplicate events being sent to the output. You can configure Filebeat to wait a specific amount of time before shutting down by setting the shutdown_timeout option.
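For example, the following setting (the 5s value is an arbitrary illustration) gives Filebeat a short grace period on shutdown:

filebeat.shutdown_timeout: 5s   # wait up to 5 seconds for in-flight events to be acknowledged before exiting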
There is a limitation to Filebeat’s at-least-once delivery guarantee involving log rotation and the deletion of old files. If log files are written to disk and rotated faster than they can be processed by Filebeat, or if files are deleted while the output is unavailable, data might be lost. On Linux, it’s also possible for Filebeat to skip lines as the result of inode reuse. See Common problems for more details about the inode reuse issue.