Troubleshooting anomaly detection

Use the information in this section to troubleshoot common problems and known issues.

Incorrect mappings in 7.9.0 or higher

This problem occurs when you upgrade to 7.9.0 and incorrect mappings are added to the machine learning annotations index or the machine learning config index.

This problem can also affect the machine learning config index when you upgrade to 7.9.1 or higher on a cluster that was previously upgraded through several earlier 7.x versions. If you skip version 7.9.0 and upgrade directly to version 7.9.1 or higher, the mappings on the machine learning annotations index are correct. However, if you upgraded to version 7.9.0 and incorrect mappings were added, upgrading to 7.9.1 does not fix them; you must follow the procedure below.

Symptoms:

  • Some pages in the Machine Learning UI do not display correctly. For example, the Anomaly Explorer fails to load.
  • The following error occurs in Kibana when you try to view annotations for anomaly detection jobs: Error loading the list of annotations for this job
  • You cannot create or update any machine learning jobs. The error messages in this case are illegal argument exceptions such as: mapper [model_plot_config.annotations_enabled] cannot be changed from type [keyword] to [boolean]. This symptom is most likely to occur if, after upgrading to 7.9.0, you open an existing anomaly detection job before you create or update a job.

Resolution:

To avoid this problem, manually update the mappings on the machine learning annotations and config indices in your old Elasticsearch version before you upgrade to 7.9.0. For example:

PUT .ml-annotations-6/_mapping
{
  "properties": {
    "event" : {
      "type" : "keyword"
    },
    "detector_index" : {
      "type" : "integer"
    },
    "partition_field_name" : {
      "type" : "keyword"
    },
    "partition_field_value" : {
      "type" : "keyword"
    },
    "over_field_name" : {
      "type" : "keyword"
    },
    "over_field_value" : {
      "type" : "keyword"
    },
    "by_field_name" : {
      "type" : "keyword"
    },
    "by_field_value" : {
      "type" : "keyword"
    }
  }
}

PUT .ml-config/_mapping
{
  "properties": {
    "analysis_config": {
      "properties": {
        "per_partition_categorization" : {
          "properties" : {
            "enabled" : {
              "type" : "boolean"
            },
            "stop_on_warn" : {
              "type" : "boolean"
            }
          }
        }
      }
    },
    "max_num_threads" : {
      "type" : "integer"
    },
    "model_plot_config" : {
      "properties" : {
        "annotations_enabled" : {
          "type" : "boolean"
        }
      }
    }
  }
}

If security features are enabled, you must have the superuser role to alter the .ml-config index.
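
To confirm whether the config index mappings match the request above, you can compare them programmatically. The following is a minimal sketch, not an official tool: the helper names (field_type, find_mismatches) are hypothetical, and it assumes you have already fetched the index mapping (for example with GET .ml-config/_mapping) and extracted its "properties" object as a Python dictionary.

```python
# Expected field types, copied from the PUT .ml-config/_mapping request above.
EXPECTED_CONFIG_TYPES = {
    "analysis_config.per_partition_categorization.enabled": "boolean",
    "analysis_config.per_partition_categorization.stop_on_warn": "boolean",
    "max_num_threads": "integer",
    "model_plot_config.annotations_enabled": "boolean",
}


def field_type(properties, dotted_path):
    """Return the mapped type of a dotted field path, or None if it is unmapped."""
    node = {"properties": properties}
    for part in dotted_path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("type")


def find_mismatches(properties, expected):
    """Return {field: (expected_type, actual_type)} for each field that differs.

    A field with no mapping at all is reported with an actual type of None.
    """
    mismatches = {}
    for path, want in expected.items():
        got = field_type(properties, path)
        if got != want:
            mismatches[path] = (want, got)
    return mismatches
```

An empty result from find_mismatches(properties, EXPECTED_CONFIG_TYPES) suggests the listed fields are mapped as expected. An entry such as ("boolean", "keyword") corresponds to the kind of conflict described in the symptoms.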

If you did not manually update the mappings before the upgrade, you can still try to update them after the upgrade. If either update fails, you must reindex the affected index. For example, follow these steps:

  1. To reindex the machine learning annotations index:

    1. Enable upgrade mode:

      POST _ml/set_upgrade_mode?enabled=true&timeout=10m
    2. Create a temporary index:

      PUT temp_ml_annotations
    3. Reindex the .ml-annotations-6 index into the temporary index:

      POST _reindex
      {
        "source": { "index": ".ml-annotations-6" },
        "dest": { "index": "temp_ml_annotations" }
      }
    4. Delete the .ml-annotations-6 index:

      DELETE .ml-annotations-6
    5. Disable upgrade mode:

      POST _ml/set_upgrade_mode?enabled=false&timeout=10m
    6. Wait for .ml-annotations-6 to be recreated.
    7. Reindex the temporary index into the .ml-annotations-6 index:

      POST _reindex
      {
        "source": { "index": "temp_ml_annotations" },
        "dest": { "index": ".ml-annotations-6" }
      }
    8. Delete the temporary index:

      DELETE temp_ml_annotations
  2. To reindex the machine learning config index:

    1. Enable upgrade mode:

      POST _ml/set_upgrade_mode?enabled=true&timeout=10m
    2. Create a temporary index:

      PUT temp_ml_config
    3. Reindex the .ml-config index into the temporary index:

      POST _reindex
      {
        "source": { "index": ".ml-config" },
        "dest": { "index": "temp_ml_config" }
      }
    4. Delete the .ml-config index:

      DELETE .ml-config
    5. Create the .ml-config index:

      PUT .ml-config
      {
        "settings": { "auto_expand_replicas": "0-1"}
      }
    6. Reindex the temporary index into the .ml-config index:

      POST _reindex
      {
        "source": { "index": "temp_ml_config" },
        "dest": { "index": ".ml-config" }
      }
    7. Disable upgrade mode:

      POST _ml/set_upgrade_mode?enabled=false&timeout=10m
    8. Delete the temporary index:

      DELETE temp_ml_config
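
The two procedures above differ only in how the original index is recreated and in when upgrade mode is disabled. The following sketch makes that ordering explicit by building the request sequence as data. The function name reindex_steps and the (method, path, body) tuple layout are hypothetical choices for illustration; the sketch does not send any requests.

```python
def reindex_steps(index, temp_index, create_body=None):
    """Build the ordered (method, path, body) requests for reindexing `index`.

    If create_body is None, the index is assumed to be recreated automatically
    after upgrade mode is disabled (the .ml-annotations-6 procedure). The
    caller must wait for that recreation before sending the copy-back request;
    the wait is not represented as a request here.

    Otherwise the index is created explicitly with create_body, and upgrade
    mode is disabled only after the data is copied back (the .ml-config
    procedure).
    """
    enable = ("POST", "_ml/set_upgrade_mode?enabled=true&timeout=10m", None)
    disable = ("POST", "_ml/set_upgrade_mode?enabled=false&timeout=10m", None)

    def copy(src, dst):
        body = {"source": {"index": src}, "dest": {"index": dst}}
        return ("POST", "_reindex", body)

    steps = [
        enable,
        ("PUT", temp_index, None),
        copy(index, temp_index),
        ("DELETE", index, None),
    ]
    if create_body is None:
        steps += [disable, copy(temp_index, index)]
    else:
        steps += [("PUT", index, create_body), copy(temp_index, index), disable]
    steps.append(("DELETE", temp_index, None))
    return steps
```

For example, reindex_steps(".ml-config", "temp_ml_config", {"settings": {"auto_expand_replicas": "0-1"}}) yields the eight requests of the second procedure in order, which you could print as a checklist or send with the HTTP client of your choice.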

Suboptimal job assignment on Debian 8

Where possible, machine learning jobs are assigned to nodes based on the memory requirement of the job and the memory available on the node. However, in certain cases, the amount of memory on a node cannot be accurately determined and jobs are instead assigned by balancing the number of jobs per machine learning node. This can lead to a situation where all the jobs with high memory requirements are on one node and the less memory-intensive jobs are on another.

One particular case of this problem is that Elasticsearch fails to determine the amount of memory on a machine that is running Debian 8 with the default Cgroups setup and certain updates of Java versions earlier than Java 15. For example, Java 8u271 is known to be affected while Java 8u272 is not. Java 15 is not affected; it has included the fix since its initial release.

If you are running Elasticsearch on Debian 8 with an old version of Java and have not already modified the Cgroups setup, do one of the following:

  • Upgrade Java to version 15.
  • Upgrade to the latest Java update for the version of Java you are running.
  • Enable the "memory" Cgroup by editing /etc/default/grub and adding:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1"

    Update your GRUB configuration by running sudo update-grub, then reboot the machine.
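
After the reboot, one way to check that the "memory" Cgroup is active is to read /proc/cgroups, which on Linux lists each controller with an "enabled" column. The following is a minimal sketch under that assumption; the function name memory_cgroup_enabled is hypothetical, and it takes the file contents as a string so that it can be checked without touching the system.

```python
def memory_cgroup_enabled(proc_cgroups_text):
    """Report whether the 'memory' controller is enabled.

    Expects the contents of /proc/cgroups, whose columns are:
    subsys_name, hierarchy, num_cgroups, enabled.
    Returns True or False, or None if no 'memory' row is present.
    """
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the '#subsys_name ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return None
```

To use it on a real machine, pass the file contents, for example memory_cgroup_enabled(open("/proc/cgroups").read()). A result of False suggests the GRUB change has not taken effect.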