Web crawler (beta) API reference
The Elastic Enterprise Search web crawler is a beta feature. Beta features are subject to change and are not covered by the support SLA of general release (GA) features. Elastic plans to promote this feature to GA in a future release.
Each crawl performed by the Enterprise Search web crawler has an associated crawl request object. The crawl requests API allows operators to create new crawl requests and to view and control the state of existing crawl requests.
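The crawl requests API supports the following operations: getting the current active crawl request, canceling an active crawl, listing crawl requests, creating a new crawl request, and viewing the details of a crawl request.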
Crawler APIs are scoped to single App Search engines
All endpoints within the crawl requests API are scoped to a particular App Search engine. The engine is identified by the engine name provided in the URL of the request. If no engine with that name can be found, the API returns an empty HTTP 404 response.
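Because every endpoint embeds the engine name in its path, a client can derive all crawl request URLs from one helper. The Python sketch below illustrates the idea. The function name `crawl_requests_url`, the base URL `http://localhost:3002`, and the engine name `my-engine` are illustrative assumptions, not part of the API; only the path layout comes from this page.

```python
from urllib.parse import quote


def crawl_requests_url(base_url: str, engine_name: str, suffix: str = "") -> str:
    """Build a crawl requests URL scoped to one App Search engine.

    Hypothetical helper: only the path layout is taken from this reference.
    """
    # Percent-encode the engine name so it is safe as a single path segment.
    engine = quote(engine_name, safe="")
    url = f"{base_url.rstrip('/')}/api/as/v0/engines/{engine}/crawler/crawl_requests"
    return f"{url}/{suffix}" if suffix else url


# Assumed deployment address and engine name, for illustration only.
print(crawl_requests_url("http://localhost:3002", "my-engine", "active"))
# http://localhost:3002/api/as/v0/engines/my-engine/crawler/crawl_requests/active
```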
Get current active crawl request
Returns the crawl request object for the active crawl, or an HTTP 404 response if the given App Search engine has no active crawl.
GET /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests/active
A successful call returns a response like this:
# 200 OK
{
  "id": "601b21adbeae67679b3b760a",
  "status": "running",
  "created_at": "Wed, 03 Feb 2021 22:20:29 +0000",
  "begun_at": "Wed, 03 Feb 2021 22:20:31 +0000",
  "completed_at": null
}
When there is no active crawl for the engine, the API responds with a 404 error:
# 404 Not Found
{
  "error": "There are no active crawl requests for this engine"
}
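A client has to tell the two 404 cases apart: an empty body means the engine was not found, while a JSON error body means the engine exists but has no active crawl. The Python sketch below shows one way to do that with the standard library. The function name, the `opener` parameter (added so the call can be stubbed out), and the bearer-token `Authorization` header are assumptions for illustration; use whatever authentication your deployment requires.

```python
import json
import urllib.error
import urllib.request


def get_active_crawl(base_url, engine_name, api_key, opener=urllib.request.urlopen):
    """Return the active crawl request as a dict, or None if no crawl is active.

    Raises LookupError when the 404 response has an empty body, which this
    reference describes as the "engine not found" case.
    """
    url = f"{base_url}/api/as/v0/engines/{engine_name}/crawler/crawl_requests/active"
    # Assumption: the deployment accepts a bearer token in this header.
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        if not err.read().strip():
            raise LookupError(f"engine not found: {engine_name}") from err
        # A 404 with an error body: the engine exists but nothing is crawling.
        return None
```

Returning `None` for "no active crawl" keeps the common polling case free of exception handling, while a missing engine, which usually indicates a configuration mistake, still surfaces as an error.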
Cancel an active crawl
Cancels the active crawl for a given App Search engine, or returns an HTTP 404 response if the engine has no active crawl.
It may take some time for the crawler to detect the cancellation request and gracefully stop the crawl. During this time, the status of the crawl request remains canceling.
POST /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests/active/cancel
On success, the response contains a single crawl request object with a canceling status:
# 200 OK
{
  "id": "601b21adbeae67679b3b760a",
  "status": "canceling",
  "created_at": "Wed, 03 Feb 2021 22:20:29 +0000",
  "begun_at": "Wed, 03 Feb 2021 22:20:31 +0000",
  "completed_at": null
}
When there is no active crawl for the engine, the API responds with a 404 error:
# 404 Not Found
{
  "error": "There are no active crawl requests for this engine"
}
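Since a canceled crawl can stay in the canceling status for a while, a caller that needs to wait for the crawl to actually stop has to poll. The Python sketch below shows a simple polling loop. The function name, its parameters, and the default timings are assumptions for illustration; `fetch_status` is a caller-supplied function that returns the current status string, for example by reading "status" from the crawl request details endpoint.

```python
import time


def wait_for_cancellation(fetch_status, timeout_s=60.0, poll_s=2.0, sleep=time.sleep):
    """Poll until a crawl request leaves the "canceling" status.

    Returns the first status that is not "canceling". Raises TimeoutError if
    the crawl is still canceling after roughly timeout_s seconds of waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status != "canceling":
            return status
        if waited >= timeout_s:
            raise TimeoutError("crawl request is still canceling")
        # Pause between polls so the API is not hit in a tight loop.
        sleep(poll_s)
        waited += poll_s
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies.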
List crawl requests
Returns a list of the most recent crawl requests for a given engine. By default, 10 items are returned; use the limit argument to change this number.
GET /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests
GET /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests?limit=25
# 200 OK
[
  {
    "id": "601b21adbeae67679b3b760a",
    "status": "running",
    "created_at": "Wed, 03 Feb 2021 22:20:29 +0000",
    "begun_at": "Wed, 03 Feb 2021 22:20:31 +0000",
    "completed_at": null
  },
  {
    "id": "60147e93beae67bf7ef72e86",
    "status": "success",
    "created_at": "Fri, 29 Jan 2021 21:30:59 +0000",
    "begun_at": "Fri, 29 Jan 2021 21:31:00 +0000",
    "completed_at": "Fri, 29 Jan 2021 21:35:20 +0000"
  },
  {
    "id": "60146c07beae67f397300128",
    "status": "canceled",
    "created_at": "Fri, 29 Jan 2021 20:11:51 +0000",
    "begun_at": "Fri, 29 Jan 2021 20:11:52 +0000",
    "completed_at": "Fri, 29 Jan 2021 20:12:51 +0000"
  }
]
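The Python sketch below shows how a client might build the list URL with an optional limit and then pick an entry out of the parsed response. Both function names and the base URL are illustrative assumptions. `first_with_status` returns the first match in list order, which is the most recent match only if the list is ordered newest first, as it is in the sample response.

```python
from urllib.parse import urlencode


def list_crawl_requests_url(base_url, engine_name, limit=None):
    """Build the list URL, adding ?limit=N only when a limit is given."""
    url = f"{base_url}/api/as/v0/engines/{engine_name}/crawler/crawl_requests"
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url


def first_with_status(crawl_requests, status):
    """Return the first crawl request in the list with this status, or None."""
    return next((c for c in crawl_requests if c["status"] == status), None)


# Trimmed copy of the sample response shown on this page.
sample = [
    {"id": "601b21adbeae67679b3b760a", "status": "running"},
    {"id": "60147e93beae67bf7ef72e86", "status": "success"},
    {"id": "60146c07beae67f397300128", "status": "canceled"},
]
print(list_crawl_requests_url("http://localhost:3002", "my-engine", limit=25))
print(first_with_status(sample, "success")["id"])  # 60147e93beae67bf7ef72e86
```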
Create a new crawl request
Requests a new crawl for a given App Search engine. If there is already an active crawl, the request returns an HTTP 400 response with an error message.
POST /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests
On success, the response contains a single crawl request object with a pending status:
# 200 OK
{
  "id": "601b21adbeae67679b3b760a",
  "status": "pending",
  "created_at": "Wed, 03 Feb 2021 22:20:29 +0000",
  "begun_at": null,
  "completed_at": null
}
When there is already an active crawl, the API returns an HTTP 400 response:
# 400 Bad Request
{
  "error": "There is an active crawl for the engine \"your-engine\", please wait for it to finish or abort it before requesting another one"
}
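A caller that starts crawls on a schedule will routinely hit the "already active" case, so it can be convenient to treat the 400 response as a normal outcome rather than an exception. The Python sketch below takes that approach. The function name, the `opener` parameter (added so the call can be stubbed out), and the bearer-token `Authorization` header are assumptions for illustration, and the sketch sends no request body because none is documented here.

```python
import json
import urllib.error
import urllib.request


def request_crawl(base_url, engine_name, api_key, opener=urllib.request.urlopen):
    """Request a new crawl.

    Returns (crawl_request, None) on success, or (None, error_message) when
    the API answers HTTP 400 because a crawl is already active.
    """
    url = f"{base_url}/api/as/v0/engines/{engine_name}/crawler/crawl_requests"
    request = urllib.request.Request(
        url,
        method="POST",
        # Assumption: the deployment accepts a bearer token in this header.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with opener(request) as response:
            return json.load(response), None
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # The error body is JSON with a single "error" message.
            return None, json.load(err).get("error")
        raise
```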
View details for a crawl request
Returns the details of a given crawl request. The crawl request is identified by its unique crawl request ID.
GET /api/as/v0/engines/{ENGINE_NAME}/crawler/crawl_requests/{CRAWL_REQUEST_ID}
# 200 OK
{
  "id": "60147e93beae67bf7ef72e86",
  "status": "success",
  "created_at": "Fri, 29 Jan 2021 21:30:59 +0000",
  "begun_at": "Fri, 29 Jan 2021 21:31:00 +0000",
  "completed_at": "Fri, 29 Jan 2021 21:35:20 +0000"
}
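The timestamps in the sample responses follow the date format used in email headers, which Python's standard `email.utils.parsedate_to_datetime` can parse. The sketch below uses it to compute how long a crawl ran. The function name `crawl_duration_seconds` is an illustrative assumption, and the sketch assumes every response uses the same timestamp format as the samples on this page.

```python
from email.utils import parsedate_to_datetime


def crawl_duration_seconds(crawl_request):
    """Return the seconds between begun_at and completed_at.

    Returns None when either timestamp is null, as it is for pending or
    still-running crawls.
    """
    begun = crawl_request.get("begun_at")
    completed = crawl_request.get("completed_at")
    if begun is None or completed is None:
        return None
    delta = parsedate_to_datetime(completed) - parsedate_to_datetime(begun)
    return delta.total_seconds()


# The sample crawl request shown above.
crawl = {
    "id": "60147e93beae67bf7ef72e86",
    "status": "success",
    "created_at": "Fri, 29 Jan 2021 21:30:59 +0000",
    "begun_at": "Fri, 29 Jan 2021 21:31:00 +0000",
    "completed_at": "Fri, 29 Jan 2021 21:35:20 +0000",
}
print(crawl_duration_seconds(crawl))  # 260.0
```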