Elastic web crawler known issues

The Elastic web crawler has the following known issues:
- The crawler does not crawl pure JavaScript single-page applications (SPAs).

  We recommend looking at dynamic rendering to help your crawler properly index your JavaScript websites. Another option is to serve a static HTML version of your JavaScript website, using a solution such as Prerender. (See the routing sketch after this list for one way to set this up.)
- The crawler does not support dynamic content.

  The crawler does not execute JavaScript; it only extracts text from HTML elements.
- URLs are indexed despite having duplicate content and a canonical URL set.

  Canonical URL link tags are embedded within the HTML source of pages that duplicate the content of other pages. Refer to Duplicate document handling for details. The crawler identifies duplicate content by hashing the content of default deduplication fields derived from the page. These fields are defined by the configuration setting `connector.crawler.extraction.default_deduplication_fields`. The web crawler then checks your index for an existing document with the same content hash. Users have encountered cases where they set canonical link tags for pages whose content looks identical on inspection, yet the computed hashes differ, so the pages are indexed anyway.

  Use the following workaround: you can manage which fields the web crawler uses to create the content hash. If your pages all define canonical URLs, you can safely change your deduplication fields setting to include only the `url` field. Otherwise, you may need more fields to help check for duplicates. By default, the web crawler checks the `body_content`, `headings`, `links`, `meta_description`, `meta_keywords`, and `title` fields. (See the configuration sketch after this list.)
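To make the workaround concrete, here is a minimal sketch of the relevant setting in enterprise-search.yml, assuming all of your pages define canonical URLs. The setting name and default field list come from this page; verify the exact syntax against the configuration reference for your version.

```yaml
# enterprise-search.yml (minimal sketch; verify syntax for your version)
# Hash only the url field when checking for duplicate documents, so pages
# that declare canonical URLs are not flagged as near-duplicates.
connector.crawler.extraction.default_deduplication_fields: ["url"]

# For reference, the default is equivalent to:
# connector.crawler.extraction.default_deduplication_fields:
#   ["body_content", "headings", "links", "meta_description", "meta_keywords", "title"]
```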
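Returning to the first issue: dynamic rendering usually means detecting the crawler's user agent and routing those requests to a service that returns a static HTML snapshot, while regular visitors get the normal SPA. The following nginx sketch illustrates the idea; the user-agent substring, upstream address, and SPA fallback are all assumptions to adapt to your own crawler configuration and prerendering service.

```nginx
# Hypothetical sketch: serve prerendered HTML to the crawler, the SPA to everyone else.
location / {
    set $prerender 0;

    # Assumed user-agent substring; check what your crawler actually sends.
    if ($http_user_agent ~* "Elastic-Crawler") {
        set $prerender 1;
    }

    # Assumed prerendering service that returns static HTML for the requested path.
    if ($prerender = 1) {
        proxy_pass http://127.0.0.1:3000;
    }

    # Normal SPA fallback for regular visitors.
    try_files $uri $uri/ /index.html;
}
```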