Modelling documents with types
Elasticsearch provides search and aggregation capabilities on the documents that it is sent and indexes. These documents are sent as JSON objects within the request body of an HTTP request. It is natural to model documents within NEST and Elasticsearch.Net using POCOs (Plain Old CLR Objects).
This section provides an overview of how types and type hierarchies can be used to model documents.
Default behaviour
NEST’s default behaviour is to serialize type property names as camelcase JSON object members. Given the POCO
public class MyDocument
{
public string StringProperty { get; set; }
}
The following example demonstrates this behaviour
var indexResponse = Client.Index(
new MyDocument { StringProperty = "value" },
i => i.Index("my_documents"));
serializing the POCO property named StringProperty to the JSON object member named stringProperty
{
"stringProperty": "value"
}
DefaultFieldNameInferrer setting
Many different systems may be indexing documents into Elasticsearch, using a different
convention than camelcase for JSON object members. How NEST serializes
POCO property names can be globally controlled using DefaultFieldNameInferrer on
ConnectionSettings. The following example defines a function that applies snake casing
to a passed string, with the function called inside a delegate passed to DefaultFieldNameInferrer
var settings = new ConnectionSettings();

static string ToSnakeCase(string s)
{
    var builder = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        var c = s[i];
        if (char.IsUpper(c))
        {
            if (i == 0)
                builder.Append(char.ToLowerInvariant(c));
            else if (char.IsUpper(s[i - 1]))
                builder.Append(char.ToLowerInvariant(c));
            else
            {
                builder.Append("_");
                builder.Append(char.ToLowerInvariant(c));
            }
        }
        else
            builder.Append(c);
    }
    return builder.ToString();
}

settings.DefaultFieldNameInferrer(p => ToSnakeCase(p));
var client = new ElasticClient(settings);
var indexResponse = client.Index(
    new MyDocument { StringProperty = "value" },
    i => i.Index("my_documents"));
The above example serializes the MyDocument POCO to
{
"string_property": "value"
}
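To see how the snake casing function treats different inputs without a cluster to index into, the same algorithm can be run on its own. The following is a minimal sketch that does not depend on NEST; the SnakeCaseDemo class name and the sample inputs are illustrative assumptions, and the branching is a condensed but equivalent form of the ToSnakeCase function above.

```csharp
using System;
using System.Text;

public static class SnakeCaseDemo
{
    // Condensed form of the ToSnakeCase function from the example above.
    public static string ToSnakeCase(string s)
    {
        var builder = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            var c = s[i];
            if (!char.IsUpper(c))
            {
                builder.Append(c);
                continue;
            }
            // An underscore is inserted only where an uppercase letter
            // follows a character that is not uppercase.
            if (i > 0 && !char.IsUpper(s[i - 1]))
                builder.Append('_');
            builder.Append(char.ToLowerInvariant(c));
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToSnakeCase("StringProperty")); // string_property
        Console.WriteLine(ToSnakeCase("HTTPRequest"));    // httprequest
        Console.WriteLine(ToSnakeCase("already_snake"));  // already_snake
    }
}
```

Note that a run of consecutive uppercase letters is lowercased without separators, so an acronym followed by a word (as in HTTPRequest) is not split. Whether that is acceptable depends on the convention used by the other systems indexing into the same index.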
PropertyName attribute
Sometimes there may be a need to change only how specific POCO properties are serialized. The
PropertyName attribute can be applied to POCO properties to control the name that the POCO
property will serialize to and deserialize from. The following example uses the PropertyName attribute
to control how the POCO property named StringProperty is serialized
public class MyDocumentWithPropertyName
{
[PropertyName("string_property")]
public string StringProperty { get; set; }
}
var indexResponse = Client.Index(
new MyDocumentWithPropertyName { StringProperty = "value" },
i => i.Index("my_documents"));
The above example serializes the MyDocumentWithPropertyName POCO to
{
"string_property": "value"
}
NEST property attributes
The PropertyName attribute can be used to control how a POCO property is serialized. NEST contains
a collection of other attributes, such as Text attribute, that not only control how a POCO property is serialized,
but also control how a POCO property is mapped when using Attribute mapping. The Name property of
these attributes controls how a POCO property is serialized in a similar fashion to PropertyName attribute.
The following example uses the Text attribute to control how the POCO property named StringProperty is serialized
public class MyDocumentWithTextProperty
{
[Text(Name = "string_property")]
public string StringProperty { get; set; }
}
var indexResponse = Client.Index(
new MyDocumentWithTextProperty { StringProperty = "value" },
i => i.Index("my_documents"));
The above example serializes the MyDocumentWithTextProperty POCO to
{
"string_property": "value"
}
DataMember attribute
The System.Runtime.Serialization.DataMember attribute can be used to control how a POCO property is serialized, in a similar
fashion to PropertyName attribute. The DataMember attribute may be preferred over PropertyName attribute in situations where
the project in which the POCOs are defined does not have a dependency on NEST.
The following example uses the DataMember attribute to control how the POCO property
named StringProperty is serialized
public class MyDocumentWithDataMember
{
[DataMember(Name = "string_property")]
public string StringProperty { get; set; }
}
var indexResponse = Client.Index(
new MyDocumentWithDataMember { StringProperty = "value" },
i => i.Index("my_documents"));
The above example serializes the MyDocumentWithDataMember POCO to
{
"string_property": "value"
}
DefaultMappingFor<TDocument> setting
Whilst DefaultFieldNameInferrer applies a convention to all POCO properties, there may be occasions where
only particular properties of a specific POCO are serialized differently. The DefaultMappingFor<TDocument> setting
on ConnectionSettings can be used to change how properties are mapped for a type. The following example
changes how the StringProperty is serialized for the MyDocument type
var settings = new ConnectionSettings();
settings.DefaultMappingFor<MyDocument>(d => d
.PropertyName(p => p.StringProperty, nameof(MyDocument.StringProperty))
);
var client = new ElasticClient(settings);
var indexResponse = client.Index(
new MyDocument { StringProperty = "value" },
i => i.Index("my_documents"));
The above example serializes the MyDocument POCO to
{
"StringProperty": "value"
}
DefaultMappingFor<TDocument>'s behaviour can be somewhat surprising when class hierarchies are involved. Consider the following
POCOs
public class MyBaseDocument
{
public string StringProperty { get; set; }
}
public class MyDerivedDocument : MyBaseDocument
{
public int IntProperty { get; set; }
}
When serializing an instance of MyDerivedDocument with
var indexResponse = Client.Index(
new MyDerivedDocument { StringProperty = "value", IntProperty = 2 },
i => i.Index("my_documents"));
it serializes to
{
"intProperty": 2,
"stringProperty": "value"
}
Now, consider what happens when DefaultMappingFor<TDocument> is used to control how MyDerivedDocument
is mapped
var settings = new ConnectionSettings();
settings.DefaultMappingFor<MyDerivedDocument>(d => d
.PropertyName(p => p.IntProperty, nameof(MyDerivedDocument.IntProperty))
.Ignore(p => p.StringProperty)
);
var client = new ElasticClient(settings);
var indexResponse = client.Index(
new MyDerivedDocument { StringProperty = "value", IntProperty = 2 },
i => i.Index("my_documents"));
MyDerivedDocument serializes to
{
"IntProperty": 2
}
showing that the POCO property named IntProperty is serialized to the JSON object member named "IntProperty" and
that StringProperty has not been serialized (ignored). This shouldn’t be surprising.
Now, index an instance of the base class, MyBaseDocument
var indexResponse2 = client.Index(
new MyBaseDocument { StringProperty = "value" },
i => i.Index("my_documents"));
This serializes to an empty JSON object
{}
The StringProperty has not been serialized (ignored) for the base class, even though DefaultMappingFor<TDocument>
was used with the derived class, MyDerivedDocument.
This happens because MyBaseDocument is the declaring type for the StringProperty member; when the MemberInfo for
the StringProperty is retrieved from the expression p => p.StringProperty, the DeclaringType is MyBaseDocument.
Since DefaultMappingFor<TDocument> persists property mappings for types in a dictionary keyed on MemberInfo, the
Ignore() mapping for StringProperty defined using DefaultMappingFor<MyDerivedDocument> also applies to the base type, MyBaseDocument.
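The declaring type of a member taken from an expression can be checked directly, without NEST. The following is a minimal sketch; the DeclaringTypeDemo class and its MemberOf helper are hypothetical names introduced for illustration and are not part of NEST.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class MyBaseDocument
{
    public string StringProperty { get; set; }
}

public class MyDerivedDocument : MyBaseDocument
{
    public int IntProperty { get; set; }
}

public static class DeclaringTypeDemo
{
    // Hypothetical helper: extracts the MemberInfo from a simple
    // property access expression such as p => p.StringProperty.
    public static MemberInfo MemberOf<T, TValue>(Expression<Func<T, TValue>> expression) =>
        ((MemberExpression)expression.Body).Member;

    public static void Main()
    {
        // Both expressions take a MyDerivedDocument parameter.
        var inherited = MemberOf((MyDerivedDocument p) => p.StringProperty);
        var declared = MemberOf((MyDerivedDocument p) => p.IntProperty);

        Console.WriteLine(inherited.DeclaringType.Name); // MyBaseDocument
        Console.WriteLine(declared.DeclaringType.Name);  // MyDerivedDocument
    }
}
```

Although the expression is written against MyDerivedDocument, the member for StringProperty belongs to MyBaseDocument, which is why a mapping registered for it also affects instances of the base type.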
Consider a more involved example where the base type defines a member as virtual, and the derived type provides an
override for the member
public class MyBaseDocumentVirtualProperty
{
public virtual string StringProperty { get; set; }
}
public class MyDerivedDocumentOverrideProperty : MyBaseDocumentVirtualProperty
{
public override string StringProperty { get; set; }
public int IntProperty { get; set; }
}
With a similar scenario to the last example, DefaultMappingFor<TDocument> is defined for the
derived type, MyDerivedDocumentOverrideProperty
var settings = new ConnectionSettings();
settings.DefaultMappingFor<MyDerivedDocumentOverrideProperty>(d => d
.PropertyName(p => p.IntProperty, nameof(MyDerivedDocumentOverrideProperty.IntProperty))
.Ignore(p => p.StringProperty)
);
var client = new ElasticClient(settings);
var indexResponse = client.Index(
new MyDerivedDocumentOverrideProperty { StringProperty = "value", IntProperty = 2 },
i => i.Index("my_documents"));
The instance of MyDerivedDocumentOverrideProperty serializes to
{
"stringProperty": "value",
"IntProperty": 2
}
Notably, the StringProperty member has been serialized and not ignored, even though the
DefaultMappingFor<MyDerivedDocumentOverrideProperty> configuration specifies to ignore it.
Serializing an instance of the base type, MyBaseDocumentVirtualProperty
var indexResponse2 = client.Index(
new MyBaseDocumentVirtualProperty { StringProperty = "value" },
i => i.Index("my_documents"));
serializes to an empty JSON object
{}
This may be surprising.
There is a difference between how the MemberInfo instances that represent the members of a type are retrieved using reflection
and how they are determined from expressions.
As an example, when retrieving the StringProperty member on MyDerivedDocumentOverrideProperty using reflection, both
DeclaringType and ReflectedType are MyDerivedDocumentOverrideProperty
var memberInfo = typeof(MyDerivedDocumentOverrideProperty).GetProperty("StringProperty");
Console.WriteLine($"DeclaringType: {memberInfo.DeclaringType.Name}");
Console.WriteLine($"ReflectedType: {memberInfo.ReflectedType.Name}");
In contrast, when retrieving the StringProperty member on MyDerivedDocumentOverrideProperty using an expression, both
DeclaringType and ReflectedType are MyBaseDocumentVirtualProperty
public class MemberVisitor : ExpressionVisitor
{
protected override Expression VisitMember(MemberExpression node)
{
Console.WriteLine($"DeclaringType: {node.Member.DeclaringType.Name}");
Console.WriteLine($"ReflectedType: {node.Member.ReflectedType.Name}");
return base.VisitMember(node);
}
}
Expression<Func<MyDerivedDocumentOverrideProperty, string>> memberExpression =
p => p.StringProperty;
var visitor = new MemberVisitor();
visitor.Visit(memberExpression);
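Crucially, this difference in how MemberInfo instances are retrieved explains the result of the previous example.
The serialization implementation determines the members for a given type using reflection, whereas DefaultMappingFor<TDocument>
determines the member in PropertyName() and Ignore() using the expression passed. The Ignore() mapping is therefore registered
against the member declared on the base type, so it applies to MyBaseDocumentVirtualProperty but does not match the overriding
member that reflection finds on MyDerivedDocumentOverrideProperty.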
As another example, consider a derived type that hides a base type member, using the new keyword
public class MyDerivedDocumentShadowProperty : MyBaseDocument
{
public new string StringProperty { get; set; }
}
Now when configuring DefaultMappingFor<TDocument> for MyDerivedDocumentShadowProperty
var settings = new ConnectionSettings();
settings.DefaultMappingFor<MyDerivedDocumentShadowProperty>(d => d
.Ignore(p => p.StringProperty)
);
var client = new ElasticClient(settings);
var indexResponse = client.Index(
new MyDerivedDocumentShadowProperty { StringProperty = "value" },
i => i.Index("my_documents"));
an instance of MyDerivedDocumentShadowProperty serializes to
{}
Whilst the base type MyBaseDocument
var indexResponse2 = client.Index(
new MyBaseDocument { StringProperty = "value" },
i => i.Index("my_documents"));
serializes to
{
"stringProperty": "value"
}
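The shadowing result is consistent with the earlier explanation: a member declared with the new keyword is a separate member on the derived type, so an expression written against the derived type resolves to the derived declaration rather than the base one. The following is a minimal sketch that contrasts hiding with overriding, without NEST; the ShadowVersusOverrideDemo class and its MemberOf helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class MyBaseDocument
{
    public string StringProperty { get; set; }
}

public class MyDerivedDocumentShadowProperty : MyBaseDocument
{
    // Hides the base member; this is a distinct property.
    public new string StringProperty { get; set; }
}

public class MyBaseDocumentVirtualProperty
{
    public virtual string StringProperty { get; set; }
}

public class MyDerivedDocumentOverrideProperty : MyBaseDocumentVirtualProperty
{
    // Overrides the base member rather than hiding it.
    public override string StringProperty { get; set; }
}

public static class ShadowVersusOverrideDemo
{
    // Hypothetical helper: extracts the MemberInfo from a simple
    // property access expression such as p => p.StringProperty.
    public static MemberInfo MemberOf<T, TValue>(Expression<Func<T, TValue>> expression) =>
        ((MemberExpression)expression.Body).Member;

    public static void Main()
    {
        var shadowed = MemberOf((MyDerivedDocumentShadowProperty p) => p.StringProperty);
        var overridden = MemberOf((MyDerivedDocumentOverrideProperty p) => p.StringProperty);

        // Hiding: the expression resolves to the derived declaration.
        Console.WriteLine(shadowed.DeclaringType.Name);   // MyDerivedDocumentShadowProperty
        // Overriding: the expression resolves to the base declaration.
        Console.WriteLine(overridden.DeclaringType.Name); // MyBaseDocumentVirtualProperty
    }
}
```

Because the hidden member resolves to the derived type, an Ignore() mapping configured for MyDerivedDocumentShadowProperty leaves MyBaseDocument untouched, which matches the serialized output shown above.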
In summary, careful consideration should be made when using type hierarchies to represent documents that are indexed in Elasticsearch. It is generally recommended to stick to simple POCOs, where possible.