Custom Serialization

edit

After internalizing the serialization routines, and IL-merging the Newtonsoft.Json package in 6.x, we are pleased to announce that the next stage of serialization improvements have been completed in 7.0.

Both SimpleJson and Newtonsoft.Json have been completely removed and replaced with an implementation of Utf8Json, a fast serializer that works directly with UTF-8 binary.

With the move to Utf8Json, we have removed some features that were available in the previous JSON libraries that have proven too onerous to carry forward at this stage.

  • JSON in the request is never indented, even if SerializationFormatting.Indented is specified. The serialization routines generated by Utf8Json never generate an IJsonFormatter<T> that will indent JSON, for performance reasons. We are considering options for exposing indented JSON for development and debugging purposes.
  • NEST types cannot be extended by inheritance. With NEST 6.x, additional properties can be included for a type by deriving from that type and annotating these new properties. With the current implementation of serialization with Utf8Json, this approach will not work.
  • Serializer uses Reflection.Emit. Utf8Json uses Reflection.Emit to generate efficient formatters for serializing types that it sees. Reflection.Emit is not supported on all platforms, for example, UWP, Xamarin.iOS, and Xamarin.Android.
  • Elasticsearch.Net.DynamicResponse deserializes JSON arrays to List<object>. SimpleJson deserialized JSON arrays to object[], but Utf8Json deserializes them to List<object>. This change is preferred for allocation and performance reasons.
  • Utf8Json is much stricter when deserializing JSON object field names to C# POCO properties. With the internal Json.NET serializer in 6.x, JSON object field names would attempt to be matched with C# POCO property names first by an exact match, falling back to a case insensitive match. With Utf8Json in 7.x however, JSON object field names must match exactly the name configured for the C# POCO property name.

Injecting a new serializer

edit

You can inject a serializer that is isolated to only be called for the (de)serialization of _source, _fields, or wherever a user provided value is expected to be written and returned.

Within NEST, we refer to this serializer as the SourceSerializer.

Another serializer also exists within NEST known as the RequestResponseSerializer. This serializer is internal and is responsible for serializing the request and response types that are part of NEST.

If SourceSerializer is left unconfigured, the internal RequestResponseSerializer is the SourceSerializer as well.

Implementing IElasticsearchSerializer is technically enough to inject your own SourceSerializer

public class VanillaSerializer : IElasticsearchSerializer
{
    public T Deserialize<T>(Stream stream) => throw new NotImplementedException();

    public object Deserialize(Type type, Stream stream) => throw new NotImplementedException();

    public Task<T> DeserializeAsync<T>(Stream stream, CancellationToken cancellationToken = default(CancellationToken)) =>
        throw new NotImplementedException();

    public Task<object> DeserializeAsync(Type type, Stream stream, CancellationToken cancellationToken = default(CancellationToken)) =>
        throw new NotImplementedException();

    public void Serialize<T>(T data, Stream stream, SerializationFormatting formatting = SerializationFormatting.Indented) =>
        throw new NotImplementedException();

    public Task SerializeAsync<T>(T data, Stream stream, SerializationFormatting formatting = SerializationFormatting.Indented,
        CancellationToken cancellationToken = default(CancellationToken)) =>
        throw new NotImplementedException();
}

Hooking up the serializer is performed in the ConnectionSettings constructor

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(
    pool,
    sourceSerializer: (builtin, settings) => new VanillaSerializer()); 
var client = new ElasticClient(connectionSettings);

what the Func?

If implementing IElasticsearchSerializer is enough, why do we need to provide an instance wrapped in a factory Func?

There are various cases where you might have a POCO type that contains a NEST type as one of its properties. For example, consider if you want to use percolation; you need to store Elasticsearch queries as part of the _source of your document, which means you need to have a POCO that looks something like this

public class MyPercolationDocument
{
    public QueryContainer Query { get; set; }
    public string Category { get; set; }
}

A custom serializer would not know how to serialize QueryContainer or other NEST types that could appear as part of the _source of a document, therefore a custom serializer needs to have a way to delegate serialization of NEST types back to NEST’s built-in serializer.

JsonNetSerializer

edit

We ship a separate NEST.JsonNetSerializer package that helps in composing a custom SourceSerializer using Json.NET, that is smart enough to delegate the serialization of known NEST types back to the built-in RequestResponseSerializer. This package is also useful if

  1. You want to control how your documents and values are stored and retrieved from Elasticsearch using Json.NET
  2. You want to use Newtonsoft.Json.Linq types such as JObject within your documents

The easiest way to hook this custom source serializer up is as follows

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings =
    new ConnectionSettings(pool, sourceSerializer: JsonNetSerializer.Default);
var client = new ElasticClient(connectionSettings);

JsonNetSerializer.Default is just syntactic sugar for passing a delegate like

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(
    pool,
    sourceSerializer: (builtin, settings) => new JsonNetSerializer(builtin, settings));
var client = new ElasticClient(connectionSettings);

JsonNetSerializer's constructor takes several methods that allow you to control the JsonSerializerSettings and modify the contract resolver from Json.NET.

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings =
    new ConnectionSettings(pool, sourceSerializer: (builtin, settings) => new JsonNetSerializer(
        builtin, settings,
        () => new JsonSerializerSettings { NullValueHandling = NullValueHandling.Include },
        resolver => resolver.NamingStrategy = new SnakeCaseNamingStrategy()
    ));
var client = new ElasticClient(connectionSettings);

Derived serializers

edit

If you’d like to be more explicit, you can also derive from ConnectionSettingsAwareSerializerBase and override the CreateJsonSerializerSettings and ModifyContractResolver methods

public class MyFirstCustomJsonNetSerializer : ConnectionSettingsAwareSerializerBase
{
    public MyFirstCustomJsonNetSerializer(IElasticsearchSerializer builtinSerializer, IConnectionSettingsValues connectionSettings)
        : base(builtinSerializer, connectionSettings) { }

    protected override JsonSerializerSettings CreateJsonSerializerSettings() =>
        new JsonSerializerSettings
        {
            NullValueHandling = NullValueHandling.Include
        };

    protected override void ModifyContractResolver(ConnectionSettingsAwareContractResolver resolver) =>
        resolver.NamingStrategy = new SnakeCaseNamingStrategy();
}

Using MyFirstCustomJsonNetSerializer, we can serialize using

  • a Json.NET NamingStrategy that snake cases property names
  • JsonSerializerSettings that includes null properties

without affecting how NEST’s own types are serialized. Furthermore, because this serializer is aware of the built-in serializer, we can automatically inject a JsonConverter to handle known NEST types that could appear as part of the source, such as the aformentioned QueryContainer.

Let’s demonstrate with an example document type

public class MyDocument
{
    public int Id { get; set; }

    public string Name { get; set; }

    public string FilePath { get; set; }

    public int OwnerId { get; set; }

    public IEnumerable<MySubDocument> SubDocuments { get; set; }
}

public class MySubDocument
{
    public string Name { get; set; }
}

Hooking up the serializer and using it is as follows

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(
    pool,
    connection: new InMemoryConnection(), 
    sourceSerializer: (builtin, settings) => new MyFirstCustomJsonNetSerializer(builtin, settings))
    .DefaultIndex("my-index");

var client = new ElasticClient(connectionSettings);

an in-memory connection is used here for example purposes. In your production application, you would use an IConnection implementation that actually sends a request.

Now, if we index an instance of our document type

var document = new MyDocument
{
    Id = 1,
    Name = "My first document",
    OwnerId = 2
};

var ndexResponse = client.IndexDocument(document);

it serializes to

{
  "id": 1,
  "name": "My first document",
  "file_path": null,
  "owner_id": 2,
  "sub_documents": null
}

which adheres to the conventions of our configured MyCustomJsonNetSerializer serializer.

Serializing Type Information

edit

Here’s another example that implements a custom contract resolver. The custom contract resolver will include the type name within the serialized JSON for the document, which can be useful when returning covariant document types within a collection.

public class MySecondCustomContractResolver : ConnectionSettingsAwareContractResolver
{
    public MySecondCustomContractResolver(IConnectionSettingsValues connectionSettings)
        : base(connectionSettings) { }

    protected override JsonContract CreateContract(Type objectType)
    {
        var contract = base.CreateContract(objectType);
        if (contract is JsonContainerContract containerContract)
        {
            if (containerContract.ItemTypeNameHandling == null)
                containerContract.ItemTypeNameHandling = TypeNameHandling.None;
        }

        return contract;
    }
}

public class MySecondCustomJsonNetSerializer : ConnectionSettingsAwareSerializerBase
{
    public MySecondCustomJsonNetSerializer(IElasticsearchSerializer builtinSerializer, IConnectionSettingsValues connectionSettings)
        : base(builtinSerializer, connectionSettings) { }

    protected override JsonSerializerSettings CreateJsonSerializerSettings() =>
        new JsonSerializerSettings
        {
            TypeNameHandling = TypeNameHandling.All,
            NullValueHandling = NullValueHandling.Ignore,
            TypeNameAssemblyFormatHandling = TypeNameAssemblyFormatHandling.Simple
        };

    protected override ConnectionSettingsAwareContractResolver CreateContractResolver() =>
        new MySecondCustomContractResolver(ConnectionSettings); 
}

override the contract resolver

Now, hooking up this serializer

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(
        pool,
        connection: new InMemoryConnection(),
        sourceSerializer: (builtin, settings) => new MySecondCustomJsonNetSerializer(builtin, settings))
    .DefaultIndex("my-index");

var client = new ElasticClient(connectionSettings);

and indexing an instance of our document type

var document = new MyDocument
{
    Id = 1,
    Name = "My first document",
    OwnerId = 2,
    SubDocuments = new []
    {
        new MySubDocument { Name = "my first sub document" },
        new MySubDocument { Name = "my second sub document" },
    }
};

var ndexResponse = client.IndexDocument(document);

serializes to

{
  "$type": "Tests.ClientConcepts.HighLevel.Serialization.GettingStarted+MyDocument, Tests",
  "id": 1,
  "name": "My first document",
  "ownerId": 2,
  "subDocuments": [
    {
      "name": "my first sub document"
    },
    {
      "name": "my second sub document"
    }
  ]
}

the type information is serialized for the outer MyDocument instance, but not for each MySubDocument instance in the SubDocuments collection.

When implementing a custom contract resolver derived from ConnectionSettingsAwareContractResolver, be careful not to change the behaviour of the resolver for NEST types; doing so will result in unexpected behaviour.

Per the Json.NET documentation on TypeNameHandling, it should be used with caution when your application deserializes JSON from an external source.