From ES|QL to native Pandas dataframes in Python

Get hands-on with Elasticsearch: Dive into our sample notebooks, start a free cloud trial, or try Elastic on your local machine now.

Since Elasticsearch 8.15 or with Elasticsearch Serverless, ES|QL responses support the Apache Arrow streaming format. This blog post will show you how to take advantage of it in Python. In an earlier blog post, I demonstrated how to convert ES|QL queries to Pandas dataframes using CSV as an intermediate representation. Unfortunately, CSV requires explicit type declarations, is slow (especially for larger datasets) and does not handle nested arrays and objects. Apache Arrow lifts all these limitations.

ES|QL to Pandas dataframes in Python

Importing test data

First, let's import some test data. As before, we will be using the employees sample data and mappings. The easiest way to load this dataset is to run these two Elasticsearch API requests in the Kibana Console.

Converting dataset to a Pandas DataFrame object

OK, with that out of the way, let's convert the full employees dataset to a Pandas DataFrame object using the ES|QL Arrow export:

Even though this dataset only contains 100 records, we use a LIMIT command to avoid ES|QL warning us about potentially missing records. This prints the following dataframe:

OK, so what actually happened here?

Given format="arrow", Elasticsearch returns binary Arrow streaming data
The Elasticsearch Python client looks at the Content-Type header and creates a PyArrow object
Finally, PyArrow's Pandas integration converts the PyArrow object to a Pandas dataframe.

Note that the types_mapper=pd.ArrowDtype parameter asks Pandas to use a PyArrow backend instead of a NumPy backend, since the source data is PyArrow. While this backend is not enabled by default for compatibility reasons, it has many advantages: it handles missing values, is faster, more interopable and supports more types. (This is not a zero copy conversion, however.)

For this example to work, the Pandas and PyArrow optional dependencies need to be installed. If you want to use another dataframe library such as Polars instead, you don't need Pandas and can directly use polars.from_arrow to create a Polars DataFrame from the PyArrow table returned by the Elasticsearch client.

One limitation is that Elasticsearch does not currently handle multi-valued fields, which is why we had to drop the is_rehired, job_positions and salary_change columns. This limitation will be lifted in a future version of Elasticsearch.

Anyway, you now have a Pandas dataframe that you can use to analyze your data further. But you can also continue massaging the data using ES|QL, which is particularly useful when queries return more than 10,000 rows, the current maximum number of rows that ES|QL queries can return.

More complex queries

In the next example, we're counting how many employees are speaking a given language by using STATS ... BY (not unlike GROUP BY in SQL). And then we sort the result with the languages column using SORT:

Unlike with CSV, we did not have to specify any types, as Arrow data already includes types. Here's the result:

21 employees speak 5 languages, wow! And 10 employees did not declare any spoken language. The missing value is denoted by <NA>, which is consistently used for missing data with the PyArrow backend. If we had used the NumPy backend instead, this column would have been converted to floats and the missing value would have been a confusing NaN, as NumPy integers don't have any sentinel value for missing data.

Queries with parameters

Finally, suppose that you want to expand the query from the previous section to only consider employees that speak N or more languages, with N being a variable parameter. For this we can use ES|QL's built-in support for parameters, which eliminates the risk of an injection attack associated with manually assembling queries with variable parts:

which prints the following:

Conclusion

As we saw, ES|QL's native Arrow support makes working with Pandas and other DataFrame libraries even nicer than using CSV and it will continue to improve over time, with the multi-value support coming in a future version of Elasticsearch.

Additional resources

If you want to learn more about ES|QL, the ES|QL documentation is the best place to start. You can also check out this other Python example using Boston Celtics data. To know more about the Python Elasticsearch client itself, you can refer to the documentation, ask a question on Discuss with the language-clients tag or open a new issue if you found a bug or have a feature request. Thank you!

Report an issue

Related Content

Introducing Elasticsearch support in the Google MCP Toolbox for Databases

ES|QL Agentic AI

December 12, 2025

Introducing Elasticsearch support in the Google MCP Toolbox for Databases

Explore how Elasticsearch support is now available in the Google MCP Toolbox for Databases and leverage ES|QL tools to securely integrate your index with any MCP client.

EZ LS

By: Enrico Zimuel and Laurent Saint-Félix

ES|QL

December 2, 2025

ES|QL in 9.2: Smart Lookup Joins and time-series support

Explore three separate updates to ES|QL in Elasticsearch 9.2: an enhanced LOOKUP JOIN for more expressive data correlation, the new TS command for time-series analysis, and the flexible INLINE STATS command for aggregation.

TP KK JK

By: Tyler Perkins, Kostas Krikellas and Julian Kiryakov

Multimodal search for mountain peaks with Elasticsearch and SigLIP-2

Vector Database Hybrid Search+2

November 4, 2025

Multimodal search for mountain peaks with Elasticsearch and SigLIP-2

Learn how to implement text-to-image and image-to-image multimodal search using SigLIP-2 embeddings and Elasticsearch kNN vector search. Project focus: finding Mount Ama Dablam peak photos from an Everest trek.

By: Navneet Kumar