Elasticsearch has had powerful geospatial search and analytics capabilities for many years, but the API was quite different from what typical GIS users were used to. In the past year we've added the ES|QL query language, a piped query language as easy, or even easier, than SQL. It's particularly suited to the search, security, and observability use cases Elastic excels at. We're also adding support for geospatial search and analytics within ES|QL, making it far easier to use, especially for users coming from SQL or GIS communities.
Elasticsearch 8.12 and 8.13 brought basic support for geospatial types to ES|QL. This was dramatically enhanced with the addition of geospatial search capabilities in 8.14. More importantly, this support was designed to conform closely to the Simple Feature Access standard from the Open Geospatial Consortium (OGC) used by other spatial databases like PostGIS, making it much easier to use for GIS experts familiar with these standards.
In this blog, we'll show you how to use ES|QL to perform geospatial searches, and how it compares to the SQL and Query DSL equivalents. We'll also show you how to use ES|QL to perform spatial joins, and how to visualize the results in Kibana Maps. Note that all the features described here are in "technical preview", and we'd love to hear your feedback on how we can improve them.
Searching for geospatial data
Let's start with an example query:
FROM airport_city_boundaries
| WHERE ST_INTERSECTS(
city_boundary,
"POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
)
| KEEP abbrev, airport, region, city, city_location
This performs a search for any city boundary polygons that intersect with a rectangular search polygon around the Sanya Phoenix International Airport (SYX).
In a sample dataset of airports, cities and city boundaries, this search finds the intersecting polygon and returns the desired fields from the matching document:
abbrev | airport | region | city | city_location |
---|---|---|---|---|
SYX | Sanya Phoenix Int'l | 天涯区 | Sanya | POINT(109.5036 18.2533) |
That was easy! Now compare this to the classic Elasticsearch Query DSL for the same query:
GET /airport_city_boundaries/_search
{
"_source": ["abbrev", "airport", "region", "city", "city_location"],
"query": {
"geo_shape": {
"city_boundary": {
"shape": {
"type": "polygon",
"coordinates" : [[
[109.4, 18.1],
[109.6, 18.1],
[109.6, 18.3],
[109.4, 18.3],
[109.4, 18.1]
]]
}
}
}
}
}
Both queries are reasonably clear in their intent, but the ES|QL query closely resembles SQL. The same query in PostGIS looks like this:
SELECT abbrev, airport, region, city, city_location
FROM airport_city_boundaries
WHERE ST_INTERSECTS(
city_boundary,
'SRID=4326;POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'::geometry
);
Look back at the ES|QL example. So similar, right?
FROM airport_city_boundaries
| WHERE ST_INTERSECTS(
city_boundary,
"POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
)
| KEEP abbrev, airport, region, city, city_location
We've found that existing users of the Elasticsearch API find ES|QL much easier to use. We now expect that existing SQL users, particularly Spatial SQL users, will find that ES|QL feels very familiar to what they are used to seeing.
Why not SQL?
What about Elasticsearch SQL? It has been around for a while and has some geospatial features. However, Elasticsearch SQL was written as a wrapper on top of the original Query API, which meant only queries that could be transpiled down to the original API were supported. ES|QL does not have this limitation. Being a completely new stack allows for many optimizations that were not possible in SQL. Our benchmarks show ES|QL is very often faster than the Query API, particularly with aggregations!
Differences to SQL
Clearly, from the previous example, ES|QL is somewhat similar to SQL, but there are some important differences. For example, ES|QL is a piped query language, starting with a source command like FROM and then chaining all subsequent commands together with the pipe | character. This makes it very easy to understand how each command receives a table of data and performs some action on that table, such as filtering with WHERE
, adding columns with EVAL
, or performing aggregations with STATS
. Rather than starting with SELECT
to define the final output columns, there can be one or more KEEP
commands, with the last one specifying the final output results. This structure simplifies reasoning about the query.
Focusing in on the WHERE
command in the above example, we can see it looks quite similar to the PostGIS example:
ES|QL
WHERE ST_INTERSECTS(
city_boundary,
"POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
)
PostGIS
WHERE ST_INTERSECTS(
city_boundary,
'SRID=4326;POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'::geometry
)
Aside from the difference in string quotation characters, the biggest difference is in how we type-cast the string to a spatial type. In PostGIS, we use the ::geometry
suffix, while in ES|QL, we use the ::geo_shape
suffix. This is because ES|QL runs within Elasticsearch, and the type-casting operator ::
can be used to convert a string to any of the supported ES|QL types, in this case, a geo_shape
. Additionally, the geo_shape
and geo_point
types in Elasticsearch imply the spatial coordinate system known as WGS84, more commonly referred to using the SRID number 4326. In PostGIS, this needs to be explicit, hence the use of the SRID=4326;
prefix to the WKT string. If that prefix is removed, the SRID will be set to 0, which is more like the Elasticsearch types cartesian_point
and cartesian_shape
, which are not tied to any specific coordinate system.
Both ES|QL and PostGIS provide type conversion function syntax as well:
ES|QL
WHERE ST_INTERSECTS(
city_boundary,
TO_GEOSHAPE("POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))")
)
PostGIS
WHERE ST_INTERSECTS(
city_boundary,
ST_SetSRID(
ST_GeomFromText('POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'),
4326
)
)
OGC functions
Elasticsearch 8.14 introduces the following four OGC spatial search functions:
ES|QL | PostGIS | Description |
---|---|---|
ST_INTERSECTS | ST_Intersects | Returns true if two geometries intersect, and false otherwise. |
ST_DISJOINT | ST_Disjoint | Returns true if two geometries do not intersect, and false otherwise. The inverse of ST_INTERSECTS. |
ST_CONTAINS | ST_Contains | Returns true if one geometry contains another, and false otherwise. |
ST_WITHIN | ST_Within | Returns true if one geometry is within another, and false otherwise. The inverse of ST_CONTAINS. |
These function behave similarly to their PostGIS counterparts, and are used in the same way. For example, ST_INTERSECTS
returns true if two geometries intersect and false otherwise. If you follow the documentation links in the above table, you might notice that all the ES|QL examples are within a WHERE
clause after a FROM
clause, while all the PostGIS examples are using literal geometries. In fact, both platforms support using the functions in any part of the query where they make sense.
The first example in the PostGIS documentation for ST_INTERSECTS
is:
SELECT ST_Intersects(
'POINT(0 0)'::geometry,
'LINESTRING ( 2 0, 0 2 )'::geometry
);
The ES|QL equivalent of this would be:
ROW ST_INTERSECTS(
"POINT(0 0)"::geo_point,
"LINESTRING ( 2 0, 0 2 )"::geo_shape
)
Note how we did not specify the SRID in the PostGIS example. This is because in PostGIS when using the geometry
type, all calculations are done on a planar coordinate system, and so if both geometries have the same SRID, it does not matter what the SRID is. In Elasticsearch, this is also true for most functions, however, there are exceptions where geo_shape
and geo_point
use spherical calculations, as we'll see in the next blog about spatial distance search.
ES|QL versatility
So, we've seen examples above for using spatial functions in WHERE
clauses, and in ROW
commands. Where else would they make sense? One very useful place is in the EVAL
command. This command allows you to evaluate an expression and return the result. For example, let's determine if the centroids of all airports grouped by their country names are within a boundary outlining the country:
FROM airports
| EVAL in_uk = ST_INTERSECTS(location, TO_GEOSHAPE("POLYGON((1.2305 60.8449, -1.582 61.6899, -10.7227 58.4017, -7.1191 55.3291, -7.9102 54.2139, -5.4492 54.0078, -5.2734 52.3756, -7.8223 49.6676, -5.0977 49.2678, 0.9668 50.5134, 2.5488 52.1065, 2.6367 54.0078, -0.9668 56.4625, 1.2305 60.8449))"))
| EVAL in_iceland = ST_INTERSECTS(location, TO_GEOSHAPE("POLYGON ((-25.4883 65.5312, -23.4668 66.7746, -18.4131 67.4749, -13.0957 66.2669, -12.3926 64.4159, -20.1270 62.7346, -24.7852 63.3718, -25.4883 65.5312))"))
| EVAL within_uk = ST_WITHIN(location, TO_GEOSHAPE("POLYGON((1.2305 60.8449, -1.582 61.6899, -10.7227 58.4017, -7.1191 55.3291, -7.9102 54.2139, -5.4492 54.0078, -5.2734 52.3756, -7.8223 49.6676, -5.0977 49.2678, 0.9668 50.5134, 2.5488 52.1065, 2.6367 54.0078, -0.9668 56.4625, 1.2305 60.8449))"))
| EVAL within_iceland = ST_WITHIN(location, TO_GEOSHAPE("POLYGON ((-25.4883 65.5312, -23.4668 66.7746, -18.4131 67.4749, -13.0957 66.2669, -12.3926 64.4159, -20.1270 62.7346, -24.7852 63.3718, -25.4883 65.5312))"))
| STATS centroid = ST_CENTROID_AGG(location), count=COUNT() BY in_uk, in_iceland, within_uk, within_iceland
| SORT count ASC
The results are expected, the centroid of UK airports are within the UK boundary, and not within the Iceland boundary, and vice versa:
centroid | count | in_uk | in_iceland | within_uk | within_iceland |
---|---|---|---|---|---|
POINT (-21.946634463965893 64.13187285885215) | 1 | false | true | false | true |
POINT (-2.597342072712148 54.33551226578214) | 17 | true | false | true | false |
POINT (0.04453958108176276 23.74658354606057) | 873 | false | false | false | false |
In fact, these functions can be used in any part of the query where their signature makes sense. They all take two arguments, which are either a literal spatial object or a field of a spatial type, and they all return a boolean value. One important consideration is that the coordinate reference system (CRS) of the geometries must match, or an error will be returned. This means you cannot mix geo_shape
and cartesian_shape
types in the same function call. You can, however, mix geo_point
and geo_shape
types, as the geo_point
type is a special case of the geo_shape
type, and both share the same coordinate reference system. The documentation for each of the functions defined above lists the supported type combinations.
Additionally, either argument can be a spatial literal or a field, in either order. You can even specify two fields, two literals, a field and a literal, or a literal and a field. The only requirement is that the types are compatible. For example, this query compares two fields in the same index:
FROM airport_city_boundaries
| EVAL in_city = ST_INTERSECTS(city_location, city_boundary)
| STATS count=COUNT(*) BY in_city
| SORT count ASC
| EVAL cardinality = CASE(count < 10, "very few", count < 100, "few", "many")
| KEEP cardinality, count, in_city
The query basically asks if the city location is within the city boundary, which should generally be true, but there are always exceptions:
cardinality | count | in_city |
---|---|---|
few | 29 | false |
many | 740 | true |
A far more interesting question would be whether the airport location is within the boundary of the city that the airport serves. However, the airport location resides in a different index than the one containing the city boundaries. This requires a method to effectively query and correlate data from these two separate indexes.
Spatial joins
ES|QL does not support JOIN
commands, but you can achieve a special case of a join using the ENRICH
command, which behaves similarly to a 'left join' in SQL. This command operates akin to a 'left join' in SQL, allowing you to enrich results from one index with data from another index based on a spatial relationship between the two datasets.
For example, let's enrich the results from a table of airports with additional information about the city they serve by finding the city boundary that contains the airport location, and then perform some statistics on the results:
FROM airports
| ENRICH city_boundaries ON city_location WITH airport, region, city_boundary
| MV_EXPAND city_boundary
| EVAL boundary_wkt_length = LENGTH(TO_STRING(city_boundary))
| STATS centroid = ST_CENTROID_AGG(location), count = COUNT(city_location), min_wkt = MIN(boundary_wkt_length), max_wkt = MAX(boundary_wkt_length) BY region
| SORT count DESC
| LIMIT 5
This returns the top 5 regions with the most airports, along with the centroid of all the airports that have matching regions, and the range in length of the WKT representation of the city boundaries within those regions:
centroid | count | min_wkt | max_wkt | region |
---|---|---|---|---|
POINT (-32.56093470960719 32.598117914802714) | 90 | 207 | 207 | null |
POINT (-73.94515332765877 40.70366442203522) | 9 | 438 | 438 | City of New York |
POINT (-83.10398317873478 42.300230911932886) | 9 | 473 | 473 | Detroit |
POINT (-156.3020245861262 20.176383580081165) | 5 | 307 | 803 | Hawaii |
POINT (-73.88902732171118 45.57078813901171) | 4 | 837 | 837 | Montréal |
So, what really happened here? Where did the supposed JOIN
occur? The crux of the query lies in the ENRICH
command:
FROM airports
| ENRICH city_boundaries ON city_location WITH airport, region, city_boundary
This command instructs Elasticsearch to enrich the results retrieved from the airports
index, and perform an intersects
join between the city_location
field of the original index, and the city_boundary
field of the airport_city_boundaries
index, which we used in a few examples earlier. But some of this information is not clearly visible in this query. What we do see is the name of an enrich policy city_boundaries
, and the missing information is encapsulated within that policy definition.
{
"geo_match": {
"indices": "airport_city_boundaries",
"match_field": "city_boundary",
"enrich_fields": ["city", "airport", "region", "city_boundary"]
}
}
Here we can see that it will perform a geo_match
query (intersects
is the default), the field to match against is city_boundary
, and the enrich_fields
are the fields we want to add to the original document. One of those fields, the region
was actually used as the grouping key for the STATS
command, something we could not have done without this 'left join' capability. For more information on enrich policies, see the enrich documentation. While reading those documents, you will notice that they describe using the enrich indexes for enriching data at index time, by configuring ingest pipelines. This is not required for ES|QL, as the ENRICH
command works at query time. It is sufficient to prepare the enrich index with the necessary data and enrich policy, and then use the ENRICH
command in your ES|QL queries.
You may also notice that the most commonly found region was null
. What could this imply? Recall that I likened this command to a 'left join' in SQL, meaning if no matching city boundary is found for an airport, the airport is still returned but with null
values for the fields from the airport_city_boundaries
index. It turns out there were 89 airports that found no matching city_boundary
, and one airport with a match where the region
field was null
. This lead to a count of 90 airports with no region
in the results. Another interesting detail is the need for the MV_EXPAND
command. This is necessary because the ENRICH
command may return multiple results for each input row, and MV_EXPAND
helps to separate these results into multiple rows, one for each outcome. This also clarifies why "Hawaii" shows different min_wkt
and max_wkt
results: there were multiple regions with the same name but different boundaries.
Kibana Maps
Kibana has added support for Spatial ES|QL in the Maps application. This means that you can now use ES|QL to search for geospatial data in Elasticsearch, and visualize the results on a map.
There is a new layer option in the add layers menu, called "ES|QL". Like all of the geospatial features described so far, this is in "technical preview". Selecting this option allows you to add a layer to the map based on the results of an ES|QL query. For example, you could add a layer to the map that shows all the airports in the world.
Or you could add a layer that shows the polygons from the airport_city_boundaries
index, or even better, how about that complex ENRICH
query above that generates statistics for how many airports are in each region?
What's next
You might have noticed in two of the examples above we squeezed in yet another spatial function ST_CENTROID_AGG
. This is an aggregating function used in the STATS
command, and the first of many spatial analytics features we plan to add to ES|QL. We'll blog about it when we've got more to show!
Before that, we want to tell you more about a particularly exciting feature we've worked on: the ability to perform spatial distance searches, one of the most used spatial search features of Elasticsearch. Can you imagine what the syntax for distance searches might look like? Perhaps similar to an OGC function? Stay tuned for the next blog in this series to find out!
Spoiler alert: Elasticsearch 8.15 has just been released, and spatial distance search with ES|QL is included!
Ready to try this out on your own? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!