Ingest geospatial data into Elasticsearch with GDAL
Have you used Elastic Maps in Kibana yet? I am very excited about multiple layer support. Heat maps, vector layers from the Elastic Maps Service, and even individual documents all in the same interface! What a fantastic way to analyze and visualize your data.
But what about geospatial data that’s not in Elasticsearch? Maybe you want to overlay a shapefile of regional sales territories with sales aggregations. Maybe you have a CSV file of distribution center locations, and you want to get this data into Elasticsearch, but configuring Filebeat or Logstash is not ideal for ingesting static datasets. Well, we have the perfect solution for you: GDAL.
GDAL (Geospatial Data Abstraction Library) provides command line tools that can convert geospatial data between more than 75 formats, including Elasticsearch. GDAL can be compiled from source or installed via a package manager; on Mac, for example, it is available from Homebrew's OSGeo tap (brew tap osgeo/osgeo4mac && brew install osgeo-gdal). Note that you must have GDAL v3.1 or later to ingest data into Elasticsearch 7.x.
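You can check which version you have with GDAL's --version switch, which prints the release number:
ogrinfo --version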
Connecting to Elasticsearch
Once you’ve installed GDAL, open your command line or terminal window and try connecting to your Elasticsearch cluster using the ogrinfo tool. We prefix the URL with “ES:” to tell GDAL to use the Elasticsearch driver.
ogrinfo ES:http://localhost:9200
If you have security enabled on your Elasticsearch cluster (good for you), add a username and password to the front of the IP or domain in the form user:pass@. For example:
ogrinfo ES:https://elastic:mysecretpassword@example.com:9243
This command will print a list of indices in your Elasticsearch cluster and the geometry types (if any).
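If you want details about a single index, you can pass the index name as a layer along with ogrinfo’s -so (summary only) flag. The index name below is just a placeholder:
ogrinfo -so ES:http://localhost:9200 my-index
This should print the layer’s geometry type and field definitions.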
You can also store your credentials in a .netrc file in your HOME directory. GDAL uses curl to connect to Elasticsearch, so the credentials will be picked up automatically from this file. Using the example above, my .netrc file looks like:
machine example.com
login elastic
password mysecretpassword
And my GDAL command looks like:
ogrinfo ES:https://example.com:9243
Ingesting a shapefile into Elasticsearch
Now that you’ve confirmed you can connect to Elasticsearch, let’s try ingesting a shapefile. A shapefile is a binary geospatial file format that contains shapes and their properties. The quickest way to ingest a shapefile into Elasticsearch is with ogr2ogr.
ogr2ogr ES:http://localhost:9200 my_shapefile.shp
However, the defaults set by GDAL might not be ideal for our data. For example, GDAL assumes all our text fields should be mapped in Elasticsearch as “text” for full-text search. Usually, I want my text fields mapped as “keyword” so I can use them in terms aggregations. So I’m going to add the -lco NOT_ANALYZED_FIELDS={ALL} layer creation option to my command, which maps all text fields to “keyword”. I could also specify a comma-separated list of field names instead of {ALL}.
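Putting that together, the earlier command becomes:
ogr2ogr -lco NOT_ANALYZED_FIELDS={ALL} ES:http://localhost:9200 my_shapefile.shp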
We could also supply our own mapping file for ingestion into Elasticsearch. Fortunately, we don’t have to write it by hand, as GDAL can generate a mapping file for us. We’ll use the ogr2ogr tool again and add some extra parameters to generate the mapping. The command below only generates a mapping file; it does not ingest any data into Elasticsearch.
ogr2ogr -lco INDEX_NAME=gdal-data -lco NOT_ANALYZED_FIELDS={ALL} -lco WRITE_MAPPING=./mapping.json ES:http://localhost:9200 my_shapefile.shp
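The generated mapping.json is essentially the Elasticsearch mapping that GDAL will send when it creates the index. Its exact contents depend on your shapefile and your GDAL version, but as a rough illustration (with hypothetical field names), it will contain entries along these lines:
{
  "properties": {
    "geometry": { "type": "geo_shape" },
    "region": { "type": "keyword" },
    "sales": { "type": "double" }
  }
}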
We can edit the mapping.json file in a text editor to fine-tune the settings, then specify the path to the customized mapping when ingesting our shapefile:
ogr2ogr -lco INDEX_NAME=gdal-data -lco OVERWRITE_INDEX=YES -lco MAPPING=./mapping.json ES:http://localhost:9200 my_shapefile.shp
GDAL and mapping types
GDAL has a MAPPING_NAME parameter that can be set for earlier versions of Elasticsearch that support mapping types. If you are using Elasticsearch 6 or earlier, the default mapping type is “FeatureCollection”, and each document will have its attributes under a nested field called “properties”. Kibana does not currently support nested fields, so I strongly recommend changing the MAPPING_NAME parameter to something else, such as “doc”, which does not use nested fields. The MAPPING_NAME parameter has no effect on Elasticsearch 7 and later because mapping types have been removed.
ogr2ogr -lco MAPPING_NAME=doc -lco INDEX_NAME=gdal-data -lco MAPPING=./mapping.json ES:http://localhost:9200 my_shapefile.shp
More resources
GDAL is an immensely powerful tool, and finding the exact command line options to use involves reading the driver documentation and a lot of testing. Be sure to read the ogr2ogr documentation carefully. Also review the documentation for the appropriate input and output drivers, as each has its own configuration options.
BostonGIS.com has a good ogr2ogr cheat sheet that might be helpful, and there are several useful GDAL cheat sheets on GitHub. I have also created a list of example command line scripts for ingesting geospatial formats into Elasticsearch. If you want, give it a try in a free trial of Elasticsearch Service. And while you’re there, check out Elastic Maps (if you haven’t seen it yet).
And if you run into any issues, reach out on the Elastic Discuss forums or check out https://gis.stackexchange.com/. Have fun!