IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Scan/Scroll

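The Scan/Scroll functionality of Elasticsearch resembles a normal search, but it differs in how results are ordered and delivered. It works by executing a search query with a search_type of scan. This opens a "scan window", a snapshot of the index as it was when the initial request ran, which stays open for the duration of the scan. Because every batch is read from that same snapshot, pagination stays consistent even if documents change while you export.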

Once a scan window is open, you can start scrolling over it. Each scroll request returns the next batch of results matching your query, but in no particular order. Skipping the sort is what makes this fast. Deep pagination is expensive when a sorted, consistent order must be maintained across shards, because every shard has to return, and the coordinating node has to sort, all of the hits that come before the requested page. By removing this obligation, Scan/Scroll can efficiently export all the data from your index.
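To make the cost of sorted deep pagination concrete, the sketch below uses a simplified model of from/size paging (an assumption for illustration, not the exact internals): each shard returns its own top from + size hits, and the coordinating node sorts all of them to pick one page. The function name entriesToSort and the 5-shard index are hypothetical.

```php
<?php
// Simplified model (assumption): with from/size paging, each shard sends back
// its top ($from + $size) hits and the coordinating node sorts all of them.
function entriesToSort(int $shards, int $from, int $size): int
{
    return $shards * ($from + $size);
}

// A 10-result page on a hypothetical 5-shard index.
echo entriesToSort(5, 0, 10), "\n";      // first page: 50 entries
echo entriesToSort(5, 10000, 10), "\n";  // deep page: 50050 entries
```

The work grows with the page offset, while each Scan/Scroll batch costs roughly the same no matter how far into the export you are.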

The following example can be used as a template for more advanced operations:

require 'vendor/autoload.php';      // Composer autoloader (assumes a Composer install; adjust the path)

$client = new Elasticsearch\Client();
$params = array(
    "search_type" => "scan",    // use search_type=scan
    "scroll" => "30s",          // how long to keep the scan window open between requests; only needs to cover one batch
    "size" => 50,               // how many results *per shard* you want back in each batch
    "index" => "my_index",
    "body" => array(
        "query" => array(
            "match_all" => array()
        )
    )
);

$docs = $client->search($params);   // Execute the search
$scroll_id = $docs['_scroll_id'];   // The response contains no hits, just a _scroll_id

// Now loop until the scroll cursor is exhausted
while (true) {

    // Execute a scroll request
    $response = $client->scroll(
        array(
            "scroll_id" => $scroll_id,  // ...using the most recent _scroll_id
            "scroll" => "30s"           // and the same keep-alive window
        )
    );

    // Check whether the scroll returned any hits
    if (count($response['hits']['hits']) > 0) {
        // If yes, do your work here, e.g. iterate over $response['hits']['hits']

        // Always continue with the _scroll_id from the latest response; it can change between requests
        $scroll_id = $response['_scroll_id'];
    } else {
        // No hits: the scroll cursor is exhausted and all the data has been exported
        break;
    }
}
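If the loop above is reused in several places, it can be wrapped in a PHP generator so that callers simply iterate over hits. The sketch below is one possible design, not part of the client library: scrollHits is a hypothetical helper, and its $scroll callable stands in for $client->scroll(...). It must return an array shaped like a scroll response ('_scroll_id' plus 'hits' => ['hits' => [...]]). A fake scroll function is used so the sketch runs without a cluster.

```php
<?php
// Hypothetical helper: yields every hit from a scan/scroll cursor.
// $scroll stands in for $client->scroll() and must return an array with
// '_scroll_id' and 'hits' => ['hits' => [...]], like a scroll response.
function scrollHits(callable $scroll, string $scrollId, string $keepAlive = '30s'): Generator
{
    while (true) {
        $response = $scroll([
            'scroll_id' => $scrollId,
            'scroll'    => $keepAlive,
        ]);

        $hits = $response['hits']['hits'];
        if (count($hits) === 0) {
            return; // cursor exhausted: every matching document has been yielded
        }

        foreach ($hits as $hit) {
            yield $hit;
        }

        // Always continue with the most recent _scroll_id.
        $scrollId = $response['_scroll_id'];
    }
}

// Fake scroll function for demonstration: two batches, then an empty one.
$batches = [
    ['_scroll_id' => 'id-2', 'hits' => ['hits' => [['_id' => 'a'], ['_id' => 'b']]]],
    ['_scroll_id' => 'id-3', 'hits' => ['hits' => [['_id' => 'c']]]],
    ['_scroll_id' => 'id-4', 'hits' => ['hits' => []]],
];
$seenIds = [];
$fakeScroll = function (array $params) use (&$batches, &$seenIds) {
    $seenIds[] = $params['scroll_id']; // record which scroll ID each request used
    return array_shift($batches);
};

$ids = [];
foreach (scrollHits($fakeScroll, 'id-1') as $hit) {
    $ids[] = $hit['_id'];
}
echo implode(',', $ids), "\n"; // a,b,c
```

Against a real cluster you would pass something like function (array $p) use ($client) { return $client->scroll($p); } together with the _scroll_id from the initial scan search. Because size is applied per shard, each non-empty batch can contain up to size × number of primary shards hits (for example, 50 × 5 = 250 on a 5-shard index).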