Exported fields

edit

The fields which might be extracted from a document are:

  • content,
  • title,
  • author,
  • keywords,
  • date,
  • content_type,
  • content_length,
  • language,
  • modified,
  • format,
  • identifier,
  • contributor,
  • coverage,
  • modifier,
  • creator_tool,
  • publisher,
  • relation,
  • rights,
  • source,
  • type,
  • description,
  • print_date,
  • metadata_date,
  • latitude,
  • longitude,
  • altitude,
  • rating,
  • comments

To extract only certain attachment fields, specify the properties array:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ]
      }
    }
  ]
}

Extracting contents from binary data is a resource intensive operation and consumes a lot of resources. It is highly recommended to run pipelines using this processor in a dedicated ingest node.