Create an index (advanced example)

edit

Create an index (advanced example)

edit

This is a more complicated example of creating an index, showing how to define analyzers, tokenizers, filters and index settings. Although essentially the same as the previous example, the more complicated example can be helpful for "real world" usage of the client, since this particular syntax is easy to mess up.

For brevity, the example is presented using PHP 5.4+ short array syntax ([] as opposed to array()).

$params = [
    'index' => 'reuters',
    'body' => [
        'settings' => [ 
            'number_of_shards' => 1,
            'number_of_replicas' => 0,
            'analysis' => [ 
                'filter' => [
                    'shingle' => [
                        'type' => 'shingle'
                    ]
                ],
                'char_filter' => [
                    'pre_negs' => [
                        'type' => 'pattern_replace',
                        'pattern' => '(\\w+)\\s+((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\b',
                        'replacement' => '~$1 $2'
                    ],
                    'post_negs' => [
                        'type' => 'pattern_replace',
                        'pattern' => '\\b((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\s+(\\w+)',
                        'replacement' => '$1 ~$2'
                    ]
                ],
                'analyzer' => [
                    'reuters' => [
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => ['lowercase', 'stop', 'kstem']
                    ]
                ]
            ]
        ],
        'mappings' => [ 
            '_default_' => [    
                'properties' => [
                    'title' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'body' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'combined' => [
                        'type' => 'string',
                        'analyzer' => 'reuters',
                        'term_vector' => 'yes'
                    ],
                    'topics' => [
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    ],
                    'places' => [
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    ]
                ]
            ],
            'my_type' => [  
                'properties' => [
                    'my_field' => [
                        'type' => 'string'
                    ]
                ]
            ]
        ]
    ]
];
$client->indices()->create($params);

The top level settings contains config about the index (# of shards, etc) as well as analyzers

analysis is nested inside of settings, and contains tokenizers, filters, char filters and analyzers

mappings is another top level element nested inside of body, and contains the mappings for various types

The _default_ type is a dynamic template that is applied to all fields that don’t have an explicit mapping

The my_type type is an example of a user-defined type that holds a single field, my_field