Building multilingual RAG with Elastic and Mistral

Building a multilingual RAG application using Elastic and the Mixtral 8x22B model

Mixtral 8x22B is one of the most performant open models, and one of its most powerful features is fluency in many languages, including English, Spanish, French, Italian, and German.

Imagine a multinational company that has support tickets and solutions in different languages and wants to take advantage of that knowledge across divisions. Currently, each agent can only use the knowledge written in a language they speak. Let's fix that!

In this article, I’m going to show you how to test Mixtral’s language capabilities by creating a multilingual RAG system.

You can follow the notebook to reproduce this article's example here

Steps

  1. Creating embeddings endpoint
  2. Creating mappings
  3. Indexing data
  4. Asking questions

Creating embeddings endpoint

Our support tickets for this example will come in English, Spanish, and German. The Mistral embeddings model is not multilingual, but we can generate multilingual embeddings using the e5 model. This lets us index text in different languages and manage it as a single source, giving us a much richer context.

To create e5 multilingual embeddings, you can use Kibana or the _inference API:

PUT _inference/text_embedding/multilingual-embeddings
{
    "service": "elasticsearch",
    "service_settings": {
        "model_id": ".multilingual-e5-small",
        "num_allocations": 1,
        "num_threads": 1
    }
}
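
If you would rather create the endpoint from Python than from Kibana Dev Tools, the same request could be built with the standard library alone. This is a minimal sketch, not the notebook's code: `ES_URL`, the helper name `build_es_request`, and the optional API key are assumptions made for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your deployment URL


def build_es_request(method, path, body=None, api_key=None):
    """Build (but do not send) an HTTP request for an Elasticsearch REST call."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Elasticsearch accepts API keys through the "ApiKey" authorization scheme.
        headers["Authorization"] = f"ApiKey {api_key}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{ES_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method
    )


# Same body as the PUT _inference request shown above.
inference_body = {
    "service": "elasticsearch",
    "service_settings": {
        "model_id": ".multilingual-e5-small",
        "num_allocations": 1,
        "num_threads": 1,
    },
}

req = build_es_request(
    "PUT", "_inference/text_embedding/multilingual-embeddings", inference_body
)

# Sending is left commented out so the sketch runs without a cluster:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))
```

The same helper could carry the mapping and search requests that follow, since each is a method, a path, and a JSON body.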

Creating Mappings

For the mappings we will use semantic_text mapping type, which is one of my favorite features. It handles the process of chunking the data, generating embeddings, and querying embeddings for you!

PUT multilingual-mistral
{
  "mappings": {
    "properties": {
      "super_body": {
        "type": "semantic_text",
        "inference_id": "multilingual-embeddings"
      }
    }
  }
}

We call the text field super_body because this single field handles both the chunks and the embeddings.

Indexing data

We will index a couple of support tickets with problems and solutions in two languages (English and German), and then ask a question in a third language (Spanish) about problems that are covered in more than one document.

The following documents will be added to the index:

1. English Support Ticket: Calendar Sync Issue

Support Ticket #EN1234 Subject: Calendar sync not working with Google Calendar

Description: I'm having trouble syncing my project deadlines with Google Calendar. Whenever I try to sync, I get an error message saying "Unable to connect to external calendar service."

Resolution: The issue was resolved by following these steps:

  1. Go to Settings > Integrations
  2. Disconnect the Google Calendar integration
  3. Clear browser cache and cookies
  4. Reconnect the Google Calendar integration
  5. Authorize the app again in Google's security settings

The sync should now work correctly. If problems persist, ensure that third-party cookies are enabled in your browser settings.

2. German Support Ticket: File Upload Problem

Support-Ticket #DE5678 Betreff: Datei-Upload funktioniert nicht

Beschreibung: Ich kann keine Dateien mehr in meine Projekte hochladen. Jedes Mal, wenn ich es versuche, bleibt der Ladebalken bei 99% stehen und dann erscheint eine Fehlermeldung.

Lösung: Das Problem wurde durch folgende Schritte gelöst:

  1. Überprüfen Sie die Dateigröße. Die maximale Uploadgröße beträgt 100 MB.
  2. Deaktivieren Sie vorübergehend den Virenschutz oder die Firewall.
  3. Versuchen Sie, die Datei im Inkognito-Modus hochzuladen.
  4. Wenn das nicht funktioniert, leeren Sie den Browser-Cache und die Cookies.
  5. Als letzten Ausweg, versuchen Sie einen anderen Browser zu verwenden.

In den meisten Fällen lag das Problem an zu großen Dateien oder an Interferenzen durch Sicherheitssoftware. Nach Anwendung dieser Schritte sollte der Upload funktionieren.

3. Marketing Campaign Ideas (noise)

Q3 Marketing Campaign Ideas

  1. Social media contest: "Share Your Productivity Hack"
    • Users share tips using our software, best entry wins a premium subscription.
  2. Webinar series: "Mastering Project Management"
    • Invite industry experts to share insights using our tool.
  3. Email campaign: "Unlock Hidden Features"
    • Series of emails highlighting lesser-known but powerful features.
  4. Partner with a productivity podcast for sponsored content.
  5. Create a "Project Management Memes" social media account for lighter, shareable content.

4. Mitarbeiter des Monats (noise)

Mitarbeiter des Monats: Juli 2023

Wir freuen uns, bekannt zu geben, dass Sarah Schmidt zur Mitarbeiterin des Monats Juli gewählt wurde!

Sarah hat außergewöhnliche Leistungen in folgenden Bereichen gezeigt:

  • Kundenbetreuung: Sarah hat durchschnittlich 95% positive Bewertungen erhalten.

  • Teamarbeit: Sie hat maßgeblich zur Verbesserung unseres internen Wissensmanagementsystems beigetragen.

  • Innovation: Sarah hat eine neue Methode zur Priorisierung von Support-Tickets vorgeschlagen, die unsere Reaktionszeiten um 20% verbessert hat.

Bitte gratulieren Sie Sarah zu dieser wohlverdienten Anerkennung!
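
The indexing requests themselves are not shown in this article. One possible way to load the four documents is a single `_bulk` call; the sketch below only builds the newline-delimited JSON payload. The helper name `build_bulk_payload` is hypothetical, and the shortened texts are placeholders for the full documents listed above.

```python
import json

INDEX_NAME = "multilingual-mistral"

# Placeholder texts: substitute the full documents shown above.
documents = [
    "Support Ticket #EN1234 Subject: Calendar sync not working with Google Calendar ...",
    "Support-Ticket #DE5678 Betreff: Datei-Upload funktioniert nicht ...",
    "Q3 Marketing Campaign Ideas ...",
    "Mitarbeiter des Monats: Juli 2023 ...",
]


def build_bulk_payload(texts, index_name=INDEX_NAME, field="super_body"):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, text in enumerate(texts, start=1):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # ensure_ascii=False keeps characters such as "ö" readable in the payload.
        lines.append(json.dumps({field: text}, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(documents)
# POST this payload to /_bulk with the header Content-Type: application/x-ndjson.
```

Because super_body is a semantic_text field, a plain string is all each document needs to provide; chunking and embedding happen at ingest time.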

This is what a document looks like inside Elasticsearch:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.9155389,
        "hits": [
            {
                "_index": "multilingual-mistral",
                "_id": "1",
                "_score": 0.9155389,
                "_source": {
                    "super_body": {
                        "text": "\n        _Support Ticket #EN1234_\n        **Subject**: Calendar sync not working with Google Calendar\n\n        **Description**:\n        I'm having trouble syncing my project deadlines with Google Calendar. Whenever I try to sync, I get an error message saying \"Unable to connect to external calendar service.\"\n\n        **Resolution**:\n        The issue was resolved by following these steps:\n        1. Go to Settings > Integrations\n        2. Disconnect the Google Calendar integration\n        3. Clear browser cache and cookies\n        4. Reconnect the Google Calendar integration\n        5. Authorize the app again in Google's security settings\n\n        The sync should now work correctly. If problems persist, ensure that third-party cookies are enabled in your browser settings.\n    ",
                        "inference": {
                            "inference_id": "multilingual-embeddings",
                            "model_settings": {
                                "task_type": "text_embedding",
                                "dimensions": 384,
                                "similarity": "cosine",
                                "element_type": "float"
                            },
                            "chunks": [
                                {
                                    "text": "passage: \n        _Support Ticket #EN1234_\n        **Subject**: Calendar sync not working with Google Calendar\n\n        **Description**:\n        I'm having trouble syncing my project deadlines with Google Calendar. Whenever I try to sync, I get an error message saying \"Unable to connect to external calendar service.\"\n\n        **Resolution**:\n        The issue was resolved by following these steps:\n        1. Go to Settings > Integrations\n        2. Disconnect the Google Calendar integration\n        3. Clear browser cache and cookies\n        4. Reconnect the Google Calendar integration\n        5. Authorize the app again in Google's security settings\n\n        The sync should now work correctly. If problems persist, ensure that third-party cookies are enabled in your browser settings.",
                                    "embeddings": [
                                        0.0059651174,
                                        0.0016363655,
                                        -0.064753555,
                                        0.0093298275,
                                        0.05689768,
                                        -0.049640983,
                                        0.02504726,
                                        0.0048340675,
                                        0.08093895,
                                        ...
                                    ]
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

Asking questions

Now, we are going to ask a question in Spanish:

Hola, estoy teniendo problemas para ocupar su aplicación, estoy teniendo problemas para sincronizar mi calendario, y encima al intentar subir un archivo me da error.

We expect to retrieve documents #1 and #2, send them to the LLM as additional context, and finally get an answer in Spanish.

Retrieving documents

To retrieve the relevant documents, we can use this nice and short query that will run a search on the embeddings, and return the support tickets most relevant to the question.

GET multilingual-mistral/_search
{
  "size": 2,
  "_source": {
    "excludes": ["*embeddings", "*chunks"]
  },
  "query": {
    "semantic": {
      "field": "super_body",
      "query": "Hola, estoy teniendo problemas para ocupar su aplicación, estoy teniendo problemas para sincronizar mi calendario, y encima al intentar subir un archivo me da error."
    }
  }
}

Notes about the parameters set:

  • size: 2, because we know we want the top 2 documents.
  • excludes, for clarity in the response. Documents are short, so each one will be one chunk long.
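
To feed the results to the LLM, the ticket text has to be pulled out of the search response. The following is a rough sketch of how that could look in Python; `build_semantic_query` and `extract_texts` are hypothetical helper names, and the sample response is a trimmed-down stand-in for a real one.

```python
def build_semantic_query(question, field="super_body", size=2):
    """Build the body for GET <index>/_search using the semantic query type."""
    return {
        "size": size,
        "_source": {"excludes": ["*embeddings", "*chunks"]},
        "query": {"semantic": {"field": field, "query": question}},
    }


def extract_texts(search_response, field="super_body"):
    """Pull the original text of each hit out of a search response."""
    texts = []
    for hit in search_response["hits"]["hits"]:
        value = hit["_source"][field]
        # Assumption: the field is either an object with a "text" key (the
        # layout shown earlier in this article) or a plain string.
        texts.append(value["text"] if isinstance(value, dict) else value)
    return texts


# Trimmed-down response in the shape shown earlier, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"super_body": {"text": "Support Ticket #EN1234 ..."}}},
            {"_id": "2", "_source": {"super_body": {"text": "Support-Ticket #DE5678 ..."}}},
        ]
    }
}

retrieved = extract_texts(sample_response)
```

Handling both shapes of the field is a defensive choice: how a semantic_text field appears in _source may differ between Elasticsearch versions, so check what your cluster returns.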

Answering the question

Now we can call the Mistral completion API using the Python library to answer the question.

import os

# Note: these import paths match the 0.x releases of the mistralai package.
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

api_key = os.environ["MISTRAL_API_KEY"]
model = "open-mixtral-8x22b"

client = MistralClient(api_key=api_key)

system_message = "You are a helpful multilingual agent that helps users with their problems. You have access to a knowledge base of different languages and you must answer in the same language the question was asked."

user_message = """
## Question:

Hola, estoy teniendo problemas para ocupar su aplicación, estoy teniendo problemas para sincronizar mi calendario, y encima al intentar subir un archivo me da error. 

## Related knowledge:

Support Ticket #EN1234 Subject: Calendar sync not working with Google Calendar...
(the rest of the content of the document)

\n

Support-Ticket #DE5678 Betreff: Datei-Upload funktioniert nicht... 
(the rest of the content of the document)

ANSWER:

"""

messages = [
    ChatMessage(role="system", content=system_message),
    ChatMessage(role="user", content=user_message)
]

chat_response = client.chat(
    model=model,
    messages=messages,
)

print(chat_response.choices[0].message.content)

The answer is in perfect Spanish and on point!


Hola, me alegra que te hayas comunicado con nosotros. Parece que hay dos problemas distintos.

En cuanto a la sincronización del calendario, puedes seguir estos pasos para resolver el problema:

  1. Ve a Configuración > Integraciones
  2. Desconecta la integración del Calendario de Google
  3. Borra la caché y las cookies del navegador
  4. Vuelve a conectar la integración del Calendario de Google
  5. Autoriza de nuevo la aplicación en la configuración de seguridad de Google

Si sigues teniendo problemas, asegúrate de que las cookies de terceros están habilitadas en la configuración de tu navegador.

En cuanto al problema de subir un archivo, hay varias cosas que puedes probar:

  1. Comprueba el tamaño del archivo. El tamaño máximo de carga es de 100 MB.
  2. Desactiva temporalmente el antivirus o el cortafuegos.
  3. Intenta cargar el archivo en modo incógnito.
  4. Si eso no funciona, borra la caché y las cookies del navegador.
  5. Como último recurso, prueba a usar un navegador diferente.

En la mayoría de los casos, el problema se debe a archivos demasiado grandes o a interferencias causadas por software de seguridad. Al seguir estos pasos, deberías poder cargar el archivo correctamente.

¡Espero que esto te ayude a resolver tus problemas! Si tienes alguna otra pregunta, no dudes en preguntar.
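
The user message in the example is written out by hand. In a real application it would be assembled from the retrieved documents; one way to do that, keeping the same layout, is sketched below. The function name `build_user_message` is hypothetical, and the ticket texts are truncated placeholders.

```python
def build_user_message(question, related_docs):
    """Assemble the user prompt in the same layout as the hand-written example."""
    knowledge = "\n\n".join(doc.strip() for doc in related_docs)
    return (
        "## Question:\n\n"
        f"{question.strip()}\n\n"
        "## Related knowledge:\n\n"
        f"{knowledge}\n\n"
        "ANSWER:\n"
    )


question = (
    "Hola, estoy teniendo problemas para ocupar su aplicación, estoy teniendo "
    "problemas para sincronizar mi calendario, y encima al intentar subir un "
    "archivo me da error."
)

# Placeholders: in practice these are the texts returned by the semantic search.
related_docs = [
    "Support Ticket #EN1234 Subject: Calendar sync not working with Google Calendar...",
    "Support-Ticket #DE5678 Betreff: Datei-Upload funktioniert nicht...",
]

user_message = build_user_message(question, related_docs)
```

The resulting string can be passed as the content of the user ChatMessage in place of the hard-coded prompt.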

Conclusion

Mixtral 8x22B is a powerful model that enables us to leverage data sources in different languages: it can understand, answer, and translate in many languages. This ability, together with multilingual embeddings, gives you multilingual support in both the data retrieval and the answer generation stages, removing language barriers.

If you are interested in reproducing the examples in this article, you can find the Python Notebook with the requests here
