Custom sources indexing API reference

This is a technical API reference. Refer to the Custom API sources guide for a conceptual walkthrough.

Custom source API endpoints and operations

The Custom Source API supports traditional RESTful operations:

Creating or updating documents:

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_create

Listing documents:

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents

Deleting documents by ID:

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_destroy

Authenticating requests with the custom source API

Workplace Search APIs support multiple methods of authentication.

For simplicity, the examples from this page use a bearer token.

The TOKEN is a bearer token that grants API access (see API Authentication Reference). The ID identifies the Custom Source in which documents will be indexed, updated, or deleted.

curl -X POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '
  ...
'

Create a new Custom Source or navigate to the Details area of an existing Custom Source from the Workplace Search administrative dashboard to locate the access token and content source ID.

The token grants API access and is not associated with any content source. The id value is a unique identifier for each Custom Source.
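
As a rough illustration of the authentication pattern above, the following Python sketch builds (but does not send) an authenticated request using only the standard library. The helper name build_request, the http://localhost:3002 base URL, and the SOURCE_ID and TOKEN values are placeholders assumed for this example; they are not defined by the API.

```python
import json
import urllib.request


def build_request(base_url, source_id, token, path, payload, method="POST"):
    """Return an unsent request for a Custom Source documents endpoint.

    build_request is a hypothetical helper, not part of any Elastic client.
    """
    url = f"{base_url.rstrip('/')}/api/ws/v1/sources/{source_id}/documents{path}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # bearer token authentication
            "Content-Type": "application/json",
        },
        method=method,
    )


# Placeholder values; substitute your own deployment URL, source ID, and token.
req = build_request("http://localhost:3002", "SOURCE_ID", "TOKEN", "/bulk_create", [])
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch has no side effects.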

Schema management and configuration

Every Custom Source has its own unique schema, allowing you to create document repositories that truly represent the nature of the information you want your team to access via Workplace Search. Read the Custom API sources guide for a walkthrough of the process.

The following guidelines may help you create a maintainable and scalable schema:

  1. A Custom Source schema can be configured with up to 64 fields.
  2. Always index a field's values with the same type that the field has in existing documents.

    • For example, an existing date field should not receive geolocation data.
  3. Arrays are supported, but nested field objects are not supported.
  4. Fields cannot be deleted once they have been created.
  5. Reserved fields cannot be created:

    • _allow_permissions
    • _boost
    • _deny_permissions
    • _explanation
    • _id
    • _index
    • _score
    • _type
    • _version
    • all
    • and
    • any
    • content_source_id
    • engine_id
    • external_id
    • highlight
    • last_updated
    • none
    • not
    • or
    • source
    • updated_at
  6. A field name can only contain lowercase letters, numbers, and underscores.
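
Several of the guidelines above can be checked on the client before any documents are sent. The following Python sketch is one possible pre-flight check; the function name schema_problems and the wording of its messages are assumptions made for this example, and the API itself remains the authority on what it accepts.

```python
import re

# Reserved names copied from the list above.
RESERVED_FIELDS = {
    "_allow_permissions", "_boost", "_deny_permissions", "_explanation",
    "_id", "_index", "_score", "_type", "_version", "all", "and", "any",
    "content_source_id", "engine_id", "external_id", "highlight",
    "last_updated", "none", "not", "or", "source", "updated_at",
}
MAX_FIELDS = 64
FIELD_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def schema_problems(field_names):
    """Return a list of human-readable problems; an empty list means none found."""
    names = list(field_names)
    problems = []
    if len(names) > MAX_FIELDS:
        problems.append(f"schema has {len(names)} fields; the limit is {MAX_FIELDS}")
    for name in names:
        if name in RESERVED_FIELDS:
            problems.append(f"{name!r} is a reserved field")
        elif not FIELD_NAME_PATTERN.fullmatch(name):
            problems.append(
                f"{name!r} may only contain lowercase letters, numbers, and underscores"
            )
    return problems
```

For example, schema_problems(["title", "created_at"]) returns an empty list, while a name such as "Title" or "_score" produces one problem each.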

Schema data types

Custom Source fields can be one of four different types:

text Fields

Text fields are at the heart of search. They are analyzed fields and are used for full-text matching in information retrieval. Any group of characters or text that you want to search over should be text.

Example: A description of an object, the name of a product, the content of a review.

text is the default type for all new fields.

number Fields

number fields represent a single-precision, floating-point value (32 bits): 3.14 or 42.

Number fields enable fine-grained sorting, filtering, faceting, and boosting.

(If you need to represent a larger number, consider a text field as a workaround.)

Example: A price, a review score, the number of visitors, or a size.

date Fields

Dates must be in ISO 8601 format, for example "2013-02-27T18:09:19Z" or "2013-02-27T17:09:19+01:00".
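
One way to produce such strings is sketched below with Python's standard datetime module. The helper name to_iso8601 is an assumption for this example. It rejects naive datetimes so that every value carries an explicit UTC offset, as the examples in this reference do.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment):
    """Format a timezone-aware datetime as an ISO 8601 string with an offset."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is explicit")
    return moment.isoformat(timespec="seconds")


utc_stamp = datetime(2013, 2, 27, 18, 9, 19, tzinfo=timezone.utc)
# The same instant expressed in a UTC+01:00 zone.
plus_one_stamp = utc_stamp.astimezone(timezone(timedelta(hours=1)))

print(to_iso8601(utc_stamp))       # 2013-02-27T18:09:19+00:00
print(to_iso8601(plus_one_stamp))  # 2013-02-27T19:09:19+01:00
```

Note that isoformat writes UTC as +00:00 rather than Z; the indexing example later in this reference uses the same +00:00 form.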

Example: A product release or publish date, birth date, an air date.

geolocation Fields

Geolocation fields are latitude-longitude pairs, representing locations.

Examples: A store where a product is located; the location of a venue.

Specify a geolocation using any of the following formats:

Geo-point expressed as a string with the format "lat,lon":

"location": "41.12,-71.34"

Geo-point expressed as a geohash:

"location": "drm3btev3e86"

Geo-point expressed as an array with the format [ lon, lat ]:

"location": [ -71.34, 41.12 ]

Geo-point expressed as a well-known text POINT with the format "POINT (lon lat)":

"location": "POINT (-71.34 41.12)"

For more details, see Geo-point field type in the Elasticsearch documentation. Be aware, however, that Enterprise Search supports fewer formats than Elasticsearch: only the formats shown above are supported.
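
Because the string format puts latitude first while the array and well-known text formats put longitude first, it is easy to swap the two by accident. The Python sketch below derives three of the formats from one latitude and longitude pair. The function name geolocation_formats is an assumption for this example, and the geohash format is omitted because encoding one requires more than a few lines.

```python
def geolocation_formats(lat, lon):
    """Return the same point in the string, array, and WKT formats."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    return {
        "string": f"{lat},{lon}",       # "lat,lon": latitude first
        "array": [lon, lat],            # [ lon, lat ]: longitude first
        "wkt": f"POINT ({lon} {lat})",  # "POINT (lon lat)": longitude first
    }


print(geolocation_formats(41.12, -71.34))
```

Any one of the three values can be used as the value of a geolocation field such as "location".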

Indexing and updating documents

Index new documents into a Custom Source or update existing documents.

Request limits: Maximum 100 documents per request

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_create

Path parameters

id (required): Unique ID for a Custom Source, provided upon creation of a Custom Source.

Authentication parameters

token (required): Must be included in HTTP authorization headers.

Request body

id (optional): ID unique to a document, used to identify, modify, or delete the record at a later time. If you do not provide an id, a BSON id will be created for you. Learn more about document IDs in the Workplace Search API reference.

last_updated (optional): The date and time the document was last updated. If not provided, it defaults to the current date and time, but only when the document is first created.

_allow_permissions (optional): Used for document-level permissions. When a value is set within a document, only users with a matching permission will be able to view it.

_deny_permissions (optional): Used for document-level permissions. When a value is set within a document, users with the matching permission will be unable to view it. Read the Document permissions for custom sources guide to learn more.

When updating an existing document — identified by id — the values of the following fields are preserved unless replaced:

  • last_updated
  • _allow_permissions
  • _deny_permissions

To remove existing values, pass null for last_updated, or pass [] for _allow_permissions or _deny_permissions.
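
As a small sketch of the clearing rules just described, the Python snippet below builds an update body that removes last_updated and _allow_permissions while leaving _deny_permissions untouched by omitting it. The document values are invented for illustration. The other fields are sent in full because this reference only states that the three fields listed above are preserved.

```python
import json

update = {
    "id": 1234,                                # identifies the existing document
    "title": "The Meaning of Time (revised)",  # illustrative content field
    "last_updated": None,                      # serialized as null: clears the value
    "_allow_permissions": [],                  # empty array: clears allow permissions
    # "_deny_permissions" is omitted, so its existing value is preserved
}

body = json.dumps([update])  # bulk_create expects an array of documents
print(body)
```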

curl -X POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  {
    "_allow_permissions": ["permission1"],
    "_deny_permissions": [],
    "id" : 1234,
    "title" : "The Meaning of Time",
    "body" : "Not much. It is a made up thing.",
    "url" : "https://example.com/meaning/of/time",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  },
  {
    "_allow_permissions": [],
    "_deny_permissions": ["permission2"],
    "id" : 1235,
    "title" : "The Meaning of Sleep",
    "body" : "Rest, recharge, and connect to the Ether.",
    "url" : "https://example.com/meaning/of/sleep",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  },
  {
    "_allow_permissions": ["permission1"],
    "_deny_permissions": ["permission2"],
    "id" : 1236,
    "title" : "The Meaning of Life",
    "body" : "Be excellent to each other.",
    "url" : "https://example.com/meaning/of/life",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  }
]'

{
  "results": [
    {
      "id": "1234",
      "errors": []
    },
    {
      "id": "1235",
      "errors": []
    },
    {
      "id": "1236",
      "errors": []
    }
  ]
}
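
Given the limit of 100 documents per request, a client with a larger set has to split it into batches and inspect the per-document errors in each response. The Python sketch below shows one way to do both. The helper names batches and failed_documents are assumptions for this example, and the error message in the sample response is invented, since this reference only shows empty errors arrays.

```python
def batches(documents, size=100):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def failed_documents(response):
    """Map document id to its errors for every result that reported any."""
    return {r["id"]: r["errors"] for r in response["results"] if r["errors"]}


# 250 placeholder documents split into requests of 100, 100, and 50.
docs = [{"id": n, "title": f"Document {n}"} for n in range(250)]
print([len(batch) for batch in batches(docs)])

# Hypothetical response in the shape shown above, with one invented error.
sample = {"results": [{"id": "1234", "errors": []},
                      {"id": "1235", "errors": ["example error message"]}]}
print(failed_documents(sample))
```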

Listing documents

List documents in a Custom Source.

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents

Path parameters

id (required): Unique ID for a Custom Source, provided upon creation of a Custom Source.

Authentication parameters

token (required): Must be included in HTTP authorization headers.

Request body

page (optional): Object with the optional keys size and current, which specify the number of results per page and which page the API should return. page.current cannot be supplied if cursor is provided.

cursor (optional): Page cursor retrieved from the previous request. Cannot be supplied if page.current is provided.

sort (optional): Sorts results by a field in ascending (asc) or descending (desc) order.

filters (optional): Query modifiers used to refine a query. Refer to the Search API for allowed values.

curl -X POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '{
  "page": {
    "size": 2
  },
  "sort": {
    "id": "asc"
  },
  "filters": {
    "any": [
      { "states": "Texas" },
      { "world_heritage_site": "true" }
    ]
  }
}'

{
  "meta": {
    "page": {
      "current": 1,
      "total_pages": 8,
      "total_results": 16,
      "size": 2
    },
    "warnings": [],
    "cursor": {
      "current": null,
      "next": "eyJzb3J0Ijp7ImlkIjoiYXNjIn0sInNlYXJjaF9hZnRlciI6WyJwYXJrX2NhcmxzYmFkLWNhdmVybnMiLDEuMCwicGFya19jYXJsc2JhZC1jYXZlcm5zIl19"
    }
  },
  "results": [
    {
      "last_updated": "2021-10-14T23:00:42+00:00",
      "source": "custom",
      "title": "Big Bend",
      "states": [
        "Texas"
      ],
      "world_heritage_site": "false",
      "updated_at": "2021-10-14T23:00:42+00:00",
      "content_source_id": "6168b6991b09597d71076700",
      "id": "park_big-bend"
    },
    {
      "last_updated": "2021-10-14T23:00:42+00:00",
      "source": "custom",
      "title": "Carlsbad Caverns",
      "states": [
        "New Mexico"
      ],
      "world_heritage_site": "true",
      "updated_at": "2021-10-14T23:00:42+00:00",
      "content_source_id": "6168b6991b09597d71076700",
      "id": "park_carlsbad-caverns"
    }
  ]
}

curl -X POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '{
  "cursor": "eyJzb3J0Ijp7ImlkIjoiYXNjIn0sInNlYXJjaF9hZnRlciI6WyJwYXJrX2NhcmxzYmFkLWNhdmVybnMiLDEuMCwicGFya19jYXJsc2JhZC1jYXZlcm5zIl19",
  "page": {
    "size": 1
  },
  "filters": {
    "any": [
      { "states": "Texas" },
      { "world_heritage_site": "true" }
    ]
  }
}'

{
  "meta": {
    "page": {
      "current": null,
      "total_pages": 16,
      "total_results": 16,
      "size": 1
    },
    "warnings": [],
    "cursor": {
      "current": "eyJzb3J0Ijp7ImlkIjoiYXNjIn0sInNlYXJjaF9hZnRlciI6WyJwYXJrX2NhcmxzYmFkLWNhdmVybnMiLDEuMCwicGFya19jYXJsc2JhZC1jYXZlcm5zIl19",
      "next": "eyJzb3J0Ijp7ImlkIjoiYXNjIn0sInNlYXJjaF9hZnRlciI6WyJwYXJrX2V2ZXJnbGFkZXMiLDEuMCwicGFya19ldmVyZ2xhZGVzIl19"
    }
  },
  "results": [
    {
      "last_updated": "2021-10-14T23:00:42+00:00",
      "source": "custom",
      "title": "Everglades",
      "states": [
        "Florida"
      ],
      "world_heritage_site": "true",
      "updated_at": "2021-10-14T23:00:42+00:00",
      "content_source_id": "6168b6991b09597d71076700",
      "id": "park_everglades"
    }
  ]
}
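
The two requests above can be generalized into a loop that follows meta.cursor.next until no further page is offered. The Python sketch below takes the HTTP call as a parameter so it stays independent of any client library. The names iter_documents and fetch_page are assumptions for this example, as is the stopping rule: this reference does not show the last page, so the sketch assumes that next is null or results is empty at the end.

```python
def iter_documents(fetch_page, page_size, sort=None, filters=None):
    """Yield every document, following the cursor from page to page.

    fetch_page is a caller-supplied function that POSTs a request body to
    the documents endpoint and returns the decoded JSON response.
    """
    body = {"page": {"size": page_size}}
    if sort:
        body["sort"] = sort
    if filters:
        body["filters"] = filters
    while True:
        response = fetch_page(body)
        results = response["results"]
        yield from results
        next_cursor = response["meta"]["cursor"]["next"]
        if not results or not next_cursor:  # assumed end-of-results signal
            return
        # Mirror the second request above: cursor, page.size, and filters only.
        body = {"cursor": next_cursor, "page": {"size": page_size}}
        if filters:
            body["filters"] = filters
```

A caller could pass a function that wraps an authenticated HTTP request, or a stub that returns canned pages when testing.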

Deleting documents

Deleting documents by ID

Remove documents by ID from a Custom Source.

POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_destroy

Path parameters

id (required): Unique ID for a Custom Source, provided upon creation of a Custom Source.

Authentication parameters

token (required): Must be included in HTTP authorization headers.

Request body

ids (required): An array of IDs of the documents to delete.

curl -X POST <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents/bulk_destroy \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  [DOCUMENT_ID_1], [DOCUMENT_ID_2]
]'

{
  "results": [
    {
      "id":1234,
      "success":true
    },
    {
      "id":1235,
      "success":true
    }
  ]
}
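
Since each result carries its own success flag, a client may want to pick out the IDs that were not deleted. A minimal Python sketch follows; the helper name failed_deletes is an assumption for this example, and the second sample result is invented to show a failure.

```python
import json


def failed_deletes(response):
    """Return the ids whose result did not report success."""
    return [r["id"] for r in response["results"] if not r["success"]]


# The request body is a plain JSON array of document ids.
request_body = json.dumps([1234, 1235])

# Hypothetical response in the shape shown above.
sample = {"results": [{"id": 1234, "success": True},
                      {"id": 1235, "success": False}]}
print(failed_deletes(sample))
```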

Deleting documents by query

Remove documents by query from a Custom Source.

DELETE <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents

Path parameters

id (required): Unique ID for a Custom Source, provided upon creation of a Custom Source.

Authentication parameters

token (required): Must be included in HTTP authorization headers.

Request body

filters (optional): Query modifiers used to refine a query. A request without filters will delete all documents in the custom source.

field: Name of the field to filter on, used as the key inside filters. Currently, only last_updated is supported.

from (optional): Inclusive lower bound of the range. Required if to is not provided. Must be in RFC 3339 format.

to (optional): Exclusive upper bound of the range. Required if from is not provided. Must be in RFC 3339 format.

curl -X DELETE <ENTERPRISE_SEARCH_BASE_URL>/api/ws/v1/sources/[ID]/documents \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '
{
  "filters" : {
    "last_updated" : {
      "from": "2020-06-01T12:00:00+00:00"
    }
  }
}
'

{
  "total" : 234,
  "deleted" : 234,
  "failures" : []
}
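
Because a request without filters deletes every document in the source, it can be worth building the body through a helper that refuses to produce an empty range. The Python sketch below is one such guard. The function name last_updated_filter and its start and end parameter names are assumptions for this example.

```python
from datetime import datetime, timezone


def last_updated_filter(start=None, end=None):
    """Build a delete-by-query body for a last_updated range.

    start maps to "from" (inclusive) and end maps to "to" (exclusive).
    """
    if start is None and end is None:
        raise ValueError("provide at least one of start (from) or end (to)")
    bounds = {}
    for key, moment in (("from", start), ("to", end)):
        if moment is None:
            continue
        if moment.tzinfo is None:
            raise ValueError("use timezone-aware datetimes so the offset is explicit")
        bounds[key] = moment.isoformat(timespec="seconds")
    return {"filters": {"last_updated": bounds}}


print(last_updated_filter(start=datetime(2020, 6, 1, 12, tzinfo=timezone.utc)))
```

The printed body matches the one used in the curl example above.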

Understanding document IDs

Each document within a content source must have a unique id. If you do not provide an id, a BSON id will be created for you. Two documents in two separate content sources may have the same id.

You can update an existing document by issuing a POST request that includes its id.

If the id does not exist, a new document is created. It is up to you to maintain the integrity of your id for each document within each Custom API Source.

We recommend that you avoid SHAs or any identifier derived from the content of a document. Any modification of the original data will alter the value, making it difficult to identify the document in the search index. This can lead to record duplication.
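
The difference can be seen in a short Python sketch that compares an ID built from a stable key in the originating system with one derived from a hash of the content. The helper names, the "park" prefix, and the sample records are assumptions for this example.

```python
import hashlib


def stable_id(system, record_key):
    """Build an id from a key that does not change when the content does."""
    return f"{system}_{record_key}"


def content_hash_id(document):
    """Derive an id from the body text (the approach advised against)."""
    return hashlib.sha256(document["body"].encode("utf-8")).hexdigest()


before = {"key": "big-bend", "body": "A park in Texas."}
after = {"key": "big-bend", "body": "A national park in Texas."}

# The stable id survives the edit, so the update targets the same document.
print(stable_id("park", before["key"]) == stable_id("park", after["key"]))

# The hash changes with the body, so the edited record would be indexed
# as a new document alongside the old one.
print(content_hash_id(before) == content_hash_id(after))
```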

Synchronizing document-level permissions for custom sources

Custom sources allow you to define, at the document level, which users may or may not see a document in their search results. Two reserved fields (_allow_permissions and _deny_permissions) accept array-type values. Using proper user mapping, you can generate sophisticated document access controls.

Deny permissions take precedence.

Read more in the Document permissions for custom sources guide.
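
To make the precedence rule concrete, the Python sketch below models the visibility decision for a single user and document. It is a simplified illustration, not the service's implementation. The function name can_view is an assumption, and so is the treatment of a document with no allow permissions as unrestricted, which is inferred from the statement that the allow restriction applies when a value is set.

```python
def can_view(user_permissions, document):
    """Model whether a user with the given permissions may see a document."""
    granted = set(user_permissions)
    allow = set(document.get("_allow_permissions") or [])
    deny = set(document.get("_deny_permissions") or [])
    if granted & deny:
        return False  # deny permissions take precedence
    if allow:
        return bool(granted & allow)  # must match at least one allow permission
    return True  # assumption: no allow permissions set means no allow restriction


doc = {"_allow_permissions": ["permission1"], "_deny_permissions": ["permission2"]}
print(can_view(["permission1"], doc))                 # True
print(can_view(["permission1", "permission2"], doc))  # False
```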