Airbyte is a data integration tool that moves data from various sources to different destinations in an automated, scalable way. It can extract data from APIs, databases, and other systems and load it into platforms such as Elasticsearch, which offers advanced search and efficient analysis.
In this article, we will explain how to configure Airbyte to ingest data into Elasticsearch, covering key concepts, prerequisites, and step-by-step integration.

Airbyte fundamental concepts
Airbyte is built around a few essential concepts. The main ones are:
- Sources: Define the origin of the data to be extracted.
- Destinations: Define where the data will be sent and stored.
- Connections: Configure the relationship between a source and a destination, including the synchronization frequency.
Airbyte integration with Elasticsearch
In this demonstration, we will perform an integration where data stored in an S3 bucket will be migrated to an Elasticsearch index. We will show how to configure the source (S3) and destination (Elasticsearch) in Airbyte.
Prerequisites
To follow this demonstration, the following prerequisites must be met:
- Create a bucket in AWS, where the JSON files containing the data will be stored.
- Install Airbyte locally using Docker.
- Create an Elasticsearch cluster in Elastic Cloud to store the ingested data.
Below, we will detail each of these steps.
Installing Airbyte
Airbyte can run locally using Docker or in the cloud, where there are usage-based costs. For this demonstration, we will use the local version with Docker.
The installation may take a few minutes. After following the installation instructions (see the Airbyte Quickstart in the references), Airbyte will be available at http://localhost:8000.

After logging in, we can start configuring the integration.
Creating the bucket
In this step, you’ll need an AWS account to create an S3 bucket. You will also need to set the correct permissions by creating a policy and an IAM user that allow access to the bucket.
We will upload JSON files containing different log records to the bucket; these will later be migrated to Elasticsearch. Each log file has content like this:
{ "timestamp": "2025-02-15T14:00:12Z", "level": "INFO", "service": "data_pipeline", "message": "Pipeline execution started", "details": { "pipeline_id": "abc123", "source": "MySQL", "destination": "Elasticsearch" }}
Below are the files loaded into the bucket:

Elastic Cloud configuration
To make the demonstration easier, we will use Elastic Cloud. If you do not have an account yet, you can create a free trial account here: Elastic Cloud Registration.
After configuring the deployment in Elastic Cloud, you will need to obtain:
- The URL of the Elasticsearch server.
- A user to access Elasticsearch.
To obtain the URL, go to Deployments > My deployment; under Applications, find Elasticsearch and click Copy endpoint.

To create the user, follow the steps below:
- Access Kibana > Stack Management > Users.
- Create a new user with the superuser role.
- Fill in the fields to create the user.
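Before moving on, it can be worth confirming that the endpoint and the new user actually work. A quick sanity check with the elasticsearch Python client might look like this (the endpoint, username, and password below are placeholders):

```python
from elasticsearch import Elasticsearch

# Placeholders: use the endpoint copied from Elastic Cloud and the user created in Kibana.
es = Elasticsearch(
    "https://my-deployment.es.us-central1.gcp.cloud.es.io:443",
    basic_auth=("airbyte_user", "your-password"),
)

# A successful call confirms that the endpoint and credentials are valid.
print(es.info()["version"]["number"])
```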

Now that we have everything set up, we can start configuring the connectors in Airbyte.
Configuring the source connector
In this step, we will create the source connector for S3 through the Airbyte interface:
- Access Airbyte and go to the Sources menu.
- Search for and select the S3 connector.
- Configure the following parameters:
  - Source Name: A name for the data source.
  - Delivery Method: Select Replicate Records (recommended for structured data).
  - Data Format: Choose JSON Format.
  - Stream Name: The name of the stream; this is also the name the index will receive in Elasticsearch.
  - Bucket Name: The name of the bucket in AWS.
  - AWS Access Key and AWS Secret Key: The access credentials of the IAM user created earlier.
Click on Set up source and wait for validation.
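If the validation fails, it is usually a credentials or permissions issue. A short boto3 snippet can confirm that the same access key pair entered in the form can list the bucket (all values are placeholders):

```python
import boto3

# Placeholders: the same credentials and bucket name entered in the Airbyte form.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
)

# Listing the objects proves the key pair has read access to the bucket.
response = s3.list_objects_v2(Bucket="airbyte-demo-logs")
for obj in response.get("Contents", []):
    print(obj["Key"])
```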

Configuring the destination connector
In this step, we will configure the destination connector, which will be Elasticsearch:
- Access Airbyte and go to the Destinations menu.
- Search for and select the Elasticsearch connector.
- Configure the following parameters:
  - Authentication Method: Choose Username/Password.
  - Username and Password: Use the credentials created in Kibana.
  - Server Endpoint: Paste the URL copied from Elastic Cloud.
Click on Set up destination and wait for validation.

Creating the connection between Source and Destination
With the Source and Destination created, we can now create the connection between them, completing the integration.
Below are the instructions for creating the connection:
1. In the menu, go to Connections and click on Create First Connection.
2. On the next screen, you can select an existing Source or create a new one. Since we already have a Source created, we will select the S3 Source.

3. The next step will be to select the destination. Since we have already created the Elasticsearch connector, it will be selected to finalize the configuration.

4. Next, define the Sync Mode and which schema will be used. Since only the logs schema was created, it will be the only option available for selection.

5. We move on to the Configure Connection step. Here, we can define the name of the connection and the frequency of the integration execution. The frequency can be configured in three ways:
- Cron: Runs the syncs based on a user-defined cron expression (e.g., 0 0 15 * * ?, which runs at 15:00 every day);
- Scheduled: Runs the syncs at a specified time interval (e.g., every 24 hours, every 2 hours);
- Manual: Runs the syncs only when triggered manually.
For this demonstration, we will select the Manual option.
Finally, by clicking on Set up Connection, the connection between the Source and the Destination will be established.

Synchronizing Data from S3 to Elasticsearch
Back on the Connections screen, you can see the connection that was created. To execute the process, simply click Sync. From that moment on, the migration of data from S3 to Elasticsearch begins.

If everything goes smoothly, the connection will show the Synced status.
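As a quick check outside of the Airbyte UI, you can count the documents that landed in the index with the elasticsearch Python client (endpoint and credentials are the same placeholders used earlier):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://my-deployment.es.us-central1.gcp.cloud.es.io:443",  # placeholder endpoint
    basic_auth=("airbyte_user", "your-password"),                # placeholder credentials
)

# The index name matches the Stream Name defined in the S3 source ("logs" in this demo).
print(es.count(index="logs")["count"])
```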

Visualizing data in Kibana
Now we will go to Kibana to analyze the data and check whether it was indexed correctly. In Kibana's Discover section, we will create a Data View called logs. With this, we can explore only the data in the logs index, which was created after the synchronization.

Now we can visualize the indexed data and perform analyses on it. With this, we have validated the entire migration flow using Airbyte: the data in the bucket was loaded and indexed into Elasticsearch.
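As an example of the kind of analysis now possible, here is a sketch of a terms aggregation that counts log records per level (it assumes dynamic mapping created a level.keyword subfield for the "level" text field):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://my-deployment.es.us-central1.gcp.cloud.es.io:443",  # placeholder endpoint
    basic_auth=("airbyte_user", "your-password"),                # placeholder credentials
)

# Count log records per level; assumes "level" was dynamically mapped with a keyword subfield.
response = es.search(
    index="logs",
    size=0,
    aggs={"levels": {"terms": {"field": "level.keyword"}}},
)

for bucket in response["aggregations"]["levels"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```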

Conclusion
Airbyte proved to be an efficient tool for data integration, allowing us to connect several sources and destinations in an automated way. In this tutorial, we demonstrated how to ingest data from an S3 bucket to an Elasticsearch index, highlighting the main steps of the process.
This approach facilitates the ingestion of large volumes of data and allows analyses within Elasticsearch, such as complex searches, aggregations, and data visualizations.
References
- Airbyte Quickstart: https://docs.airbyte.com/using-airbyte/getting-started/oss-quickstart#part-1-install-abctl
- Airbyte Core Concepts: https://docs.airbyte.com/using-airbyte/core-concepts/