Limitations

The following limitations and known problems apply to the 8.17.0 release of the Elastic natural language processing trained models feature.

Document size limitations when using semantic_text fields

When you use semantic_text fields to ingest documents, chunking takes place automatically. The number of chunks is limited by the index.mapping.nested_objects.limit index setting, which defaults to 10,000. Documents that are too large cause errors during ingestion. To avoid this issue, split your documents into parts of roughly 1MB before ingestion.
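One way to do the pre-ingestion split is sketched below in Python. The function name `split_by_bytes` and the 1,000,000-byte default are illustrative assumptions, not part of any Elastic API. The sketch splits on whitespace, so it normalizes runs of whitespace to single spaces, and a single word longer than the limit still ends up in its own oversized part.

```python
def split_by_bytes(text: str, max_bytes: int = 1_000_000) -> list[str]:
    """Split text into parts whose UTF-8 encoding is at most max_bytes each.

    Hypothetical helper: splits on whitespace only. A single word larger
    than max_bytes is kept whole and therefore exceeds the limit.
    """
    parts: list[str] = []
    current: list[str] = []
    current_size = 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        # +1 accounts for the joining space when the part is not empty.
        extra = word_size + (1 if current else 0)
        if current and current_size + extra > max_bytes:
            # Adding this word would exceed the limit: close the part.
            parts.append(" ".join(current))
            current, current_size = [], 0
            extra = word_size
        current.append(word)
        current_size += extra
    if current:
        parts.append(" ".join(current))
    return parts
```

Each returned part can then be indexed as its own document, ideally with a shared identifier field so that the parts can be related back to the original.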

ELSER semantic search is limited to 512 tokens per field that inference is applied to

When you use ELSER for semantic search, only the first 512 tokens extracted from each field that ELSER is applied to are taken into account during search. If your data set contains long documents and you need the full text to be searchable, divide them into smaller segments before ingestion.
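As a rough illustration of segmenting long text before ingestion, the Python sketch below cuts a document into windows of words, optionally overlapping so that sentences at a boundary appear in both neighbors. The function `segment_words` and the default of 300 words are assumptions made for this example: word counts only approximate model tokens, which depend on the model's tokenizer, so the budget is deliberately set below 512 and may need tuning for your data.

```python
def segment_words(text: str, max_words: int = 300, overlap: int = 0) -> list[str]:
    """Split text into segments of at most max_words whitespace-separated words.

    Hypothetical helper: max_words is a word budget, not a token count.
    Consecutive segments share `overlap` words.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    segments: list[str] = []
    for start in range(0, len(words), step):
        segments.append(" ".join(words[start:start + max_words]))
        # Stop once the current window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return segments
```

For example, `segment_words(body, max_words=300, overlap=30)` yields segments that can each be stored in a separate field or document, so that no single field relies on text beyond the token limit.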