Jina AI is now part of Elastic, bringing its high-performance multilingual and multimodal search AI to Elasticsearch’s powerful data storage, retrieval, and indexing capabilities. Jina AI models can be integrated with Elasticsearch via a public API, which includes 10 million free tokens for testing.
jina-embeddings-v4 is a multilingual and multimodal embedding model that supports images and texts in 30 major languages. With 3.8 billion parameters, it achieves state-of-the-art performance among models of comparable size and excels not only at text-to-image retrieval, but also text-to-text tasks. It has especially strong performance at visual document retrieval, handling common image types such as charts, slides, maps, screenshots, scans, and diagrams, areas where most computer vision models fall short.
The model supports input of up to 32,768 tokens of text and images of up to 20 megapixels. One of this model’s key innovations is its two output modes:
- Single-vector embeddings – Compact document embeddings for texts and images in a common semantic space. Users can choose embedding vector sizes ranging from 128 to 2048 dimensions, with minimal loss of precision at the smaller sizes. Shorter embeddings save storage space and increase indexing and retrieval speed, but are less precise, so users can decide for themselves the trade-off between speed, computing resources, and retrieval accuracy.
- Multi-vector embeddings – Multi-vector embeddings scale with the input: one 128-dimensional vector per text token, and proportionally more vectors for larger images. They are useful in “late interaction” similarity measures. These embeddings are larger, and comparisons are more computationally expensive than with single-vector embeddings, but they result in higher-precision matching.
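The difference between the two modes comes down to how similarity is scored. Here is a minimal sketch with toy random vectors standing in for real embeddings: single-vector mode uses one cosine similarity per document, while multi-vector “late interaction” scoring (the ColBERT-style MaxSim used with per-token embeddings) matches every query token vector against its best-matching document token vector and sums the maxima.

```python
import numpy as np

def cosine(a, b):
    # Single-vector similarity: one dot product per document.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def late_interaction(query_vecs, doc_vecs):
    # MaxSim late interaction: each query token vector is matched
    # against its best-matching document token vector, and the
    # per-token maxima are summed into a single relevance score.
    sims = query_vecs @ doc_vecs.T  # shape: (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
# Toy stand-ins for per-token 128-dimensional multi-vector embeddings.
query = rng.standard_normal((4, 128))   # a 4-token query
doc = rng.standard_normal((12, 128))    # a 12-token document
print(late_interaction(query, doc))
```

Late interaction costs one matrix multiplication per query-document pair instead of one dot product, which is why it is usually reserved for reranking a short candidate list rather than scanning a whole index.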
Jina AI has optimized this model for several tasks, with compact and selectable LoRA extension modules supporting three different uses:
- Asymmetric Retrieval — Embeddings-based retrieval performs better when documents and query texts are encoded differently, and Jina Embeddings v4 supports this through two separate LoRA extensions trained to work together: one for documents to be indexed, and one for queries.
- Semantic Similarity — Measuring how closely two texts align in meaning or topic. Related document discovery, deduplication, and translation alignment are common applications of semantic similarity.
- Code-Specific Tasks – Special behavior and training for computer technology and programming language similarity.
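In the public API, the LoRA adapter is selected with a task field on the embeddings request. The sketch below builds the two request bodies for asymmetric retrieval; the endpoint URL and the `retrieval.query`/`retrieval.passage` task names follow Jina AI’s published API docs, but you should verify them against the current API reference before use.

```python
import json

JINA_API_URL = "https://api.jina.ai/v1/embeddings"

def embedding_request(texts, task, model="jina-embeddings-v4"):
    # Build the JSON body for Jina's embeddings endpoint. The "task"
    # field selects the LoRA adapter; "retrieval.passage" and
    # "retrieval.query" are the asymmetric-retrieval pair.
    return {"model": model, "task": task, "input": texts}

# Documents are embedded once with the passage adapter at index time...
doc_body = embedding_request(["Elasticsearch stores JSON documents."],
                             task="retrieval.passage")
# ...while each incoming query uses the query adapter at search time.
query_body = embedding_request(["how does Elasticsearch store data?"],
                               task="retrieval.query")
print(json.dumps(query_body, indent=2))
```

Because the two adapters were trained together, a query embedding lands near the embeddings of the documents that answer it, even though the two texts look nothing alike on the surface.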
jina-embeddings-v3 is a multilingual, multi-purpose text-only embedding model supporting up to 8192 tokens of text input and producing user-selected variable-length embeddings from 64 to 1024 dimensions. This compact model has under 600 million parameters and delivers strong performance for its size, even though it was released in 2024.
Jina AI has trained five LoRA extension modules to support four tasks: one for semantic similarity and two for asymmetric retrieval, similar to jina-embeddings-v4 above, as well as two additional ones:
- Classification — Sorting texts into categories. You can use it for sentiment analysis, spam filtering, content moderation, and fraud identification, among others.
- Clustering — Letting the distribution of texts determine the categories they fall into. It’s often used for recommendation systems, news aggregation, and similar kinds of tasks.
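Once texts are embedded with the classification adapter, a downstream classifier can be as simple as nearest-centroid matching. This is an illustrative sketch with random toy vectors standing in for real jina-embeddings-v3 output (real embeddings would come from the API at 64 to 1024 dimensions):

```python
import numpy as np

def nearest_centroid(embedding, centroids):
    # Assign the embedding to the class whose centroid is closest,
    # a common lightweight downstream use of classification-task
    # embeddings (e.g. spam filtering or sentiment analysis).
    names = list(centroids)
    mat = np.stack([centroids[n] for n in names])
    dists = np.linalg.norm(mat - embedding, axis=1)
    return names[int(dists.argmin())]

rng = np.random.default_rng(1)
# Toy stand-ins for class centroids computed from labeled examples.
centroids = {"spam": rng.standard_normal(64),
             "ham": rng.standard_normal(64)}
# A new text whose embedding falls close to the "spam" centroid.
sample = centroids["spam"] + 0.01 * rng.standard_normal(64)
print(nearest_centroid(sample, centroids))  # prints "spam"
```

Clustering works the same way in reverse: instead of fixing the centroids from labeled data, an algorithm such as k-means discovers them from the distribution of the embeddings themselves.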
jina-code-embeddings (0.5b & 1.5b) are a pair of specialized embedding models – one with a half-billion parameters, one with 1.5 billion – for programming languages and frameworks. Both models generate embeddings for natural language texts and for 15 different programming schemes, on inputs of up to 32,768 tokens. Users can select their own output embedding size, from 64 dimensions to 896 for the smaller model, and 128 to 1536 dimensions for the larger.
They have five task-specific retrieval modes, producing optimized query and document embeddings for each task:
- Code to Code – Retrieve similar code across programming languages. This is used for code alignment, code deduplication, and support for porting and refactoring.
- Natural Language to Code – Retrieve code matching natural language queries, comments, descriptions, and documentation.
- Code to Natural Language – Match code to documentation or other natural language texts.
- Code to Code Completion – Used to suggest relevant code to complete or enhance existing code.
- Technical Q&A – Identifying natural language answers to questions about information technologies, ideally suited for technical support tasks.
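As with the other embedding models, the retrieval mode would be selected per request. The sketch below builds paired request bodies for natural-language-to-code search; the `nl2code.query`/`nl2code.passage` task identifiers here are illustrative placeholders, not confirmed API values, so check the model card for the exact names.

```python
import json

JINA_API_URL = "https://api.jina.ai/v1/embeddings"

def code_search_bodies(question, snippets,
                       model="jina-code-embeddings-0.5b"):
    # One request embeds the natural language question, the other the
    # candidate code snippets. The "task" values below are illustrative
    # placeholders -- consult the model card for the real identifiers.
    query = {"model": model, "task": "nl2code.query",
             "input": [question]}
    docs = {"model": model, "task": "nl2code.passage",
            "input": snippets}
    return query, docs

q_body, d_body = code_search_bodies(
    "reverse a linked list in place",
    ["def reverse(head): ...", "fn reverse(head: Node) { ... }"])
print(json.dumps(q_body, indent=2))
```

The same pattern covers the other four modes: only the task identifier changes, while the indexing and querying flow stays identical.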
jina-clip-v2 is a multimodal embedding model supporting both texts and images. It has been trained so that texts and images produce similar embeddings when the text describes the image content. This makes multimodal matching possible, and any database that already supports text embeddings can use this model out of the box to support image retrieval from text queries.
This model has been trained to also serve as a high-performance text embedding model, with broad multilingual support and 8,192 token input context for text. This reduces costs for users, eliminating the need for separate models for text-to-text and text-to-image retrieval.
Image input is rescaled to 512x512 pixels.
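Because texts and images share one embedding space, a single request can mix both input types. This sketch builds such a mixed batch; the `{"text": ...}`/`{"image": ...}` item format follows Jina AI’s published docs as best I recall, and the image URL is a hypothetical placeholder, so verify both against the current API reference.

```python
import json

JINA_API_URL = "https://api.jina.ai/v1/embeddings"

def clip_request(items, model="jina-clip-v2"):
    # Mixed text/image batch for the embeddings endpoint. Each item is
    # either {"text": ...} or {"image": <URL or base64 data>}; the
    # resulting vectors all live in one shared semantic space.
    return {"model": model, "input": items}

body = clip_request([
    {"text": "a red bicycle leaning against a brick wall"},
    {"image": "https://example.com/bike.jpg"},  # hypothetical URL
])
print(json.dumps(body, indent=2))
```

The embeddings that come back are interchangeable: a text query vector can be compared against stored image vectors (and vice versa) with the same cosine similarity used for text-only search.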
jina-reranker-m0 is a multilingual and multimodal pairwise document reranker that uses a more fine-grained “late interaction” analysis to improve retrieval precision. The reranker receives a textual query and two candidates, which could be texts, images, or one of each, and tells you which one better matches the query. This model has been trained to support a wide variety of print and computer-generated graphic materials, such as slides, screenshots, and diagrams. It provides a powerful way to enhance precision in challenging search environments. Images must be at least 56 pixels on each side, and very large images will be resized until they yield no more than 768 patches of 28x28 pixels. Query texts and candidate documents must be no more than 10,240 tokens in total.
jina-reranker-v3 is a listwise multilingual text document reranker that uses the same “late interaction” approach as jina-reranker-m0, but reorders an entire list of documents by how well they match a query. Listwise reranking with AI models is compatible with any search scheme that produces a limited candidate match list, not just AI-based ones, and as a supplement to an existing search scheme, it improves accuracy across the board. This makes it ideal as a drop-in enhancement for hybrid and legacy search systems.
This reranker applies only to texts and accepts a total of 131,000 tokens of input, including the query and all candidate documents to rerank.
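A typical drop-in integration sends the top results from an existing search (BM25, vector, or hybrid) to the rerank endpoint and displays them in the returned order. The sketch below builds such a request body; the endpoint URL and the `model`/`query`/`documents`/`top_n` fields follow Jina AI’s public rerank API docs, but confirm them against the current reference.

```python
import json

JINA_RERANK_URL = "https://api.jina.ai/v1/rerank"

def rerank_request(query, documents, top_n=3,
                   model="jina-reranker-v3"):
    # Body for Jina's rerank endpoint: the model scores every candidate
    # document against the query and returns them ordered by relevance,
    # truncated to the top_n best matches.
    return {"model": model, "query": query,
            "documents": documents, "top_n": top_n}

# Candidates here would normally come from an upstream search system.
body = rerank_request(
    "how do I tune Elasticsearch relevance?",
    ["BM25 parameter tuning guide",
     "Kibana dashboard basics",
     "Boosting and function_score queries"],
)
print(json.dumps(body, indent=2))
```

Since the reranker only ever sees the short candidate list, the cost of the extra model call stays fixed no matter how large the underlying index is.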
ReaderLM-v2 is a small generative language model that converts HTML, including DOM-tree dumps of web pages, into Markdown or into JSON, according to user-provided output schemas and natural language instructions. This tool brings AI to your data preprocessing, intelligently handling the chaotic structure of web-scraped data. This compact model outperforms GPT-4 on the narrow data conversion tasks it was made for.
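In practice this means pairing scraped HTML with an output schema and an instruction in a single prompt. The sketch below only assembles such a prompt as a chat-style message; the actual prompt template is model-specific and the wording here is illustrative, so consult the ReaderLM-v2 model card for the supported format.

```python
import json

# Messy, web-scraped HTML of the kind the model is built to handle.
html = ("<html><body><h1>ACME Widget</h1>"
        "<p>Price: $9.99</p></body></html>")

# A user-provided JSON schema describing the desired output structure.
schema = {"type": "object",
          "properties": {"title": {"type": "string"},
                         "price": {"type": "string"}}}

# Illustrative chat-style prompt combining instruction, schema, and HTML;
# the real template is model-specific (see the ReaderLM-v2 model card).
messages = [{"role": "user",
             "content": ("Extract JSON matching this schema:\n"
                         f"{json.dumps(schema)}\n\nHTML:\n{html}")}]
print(messages[0]["content"])
```

The same pattern with the schema omitted and a “convert to Markdown” instruction covers the HTML-to-Markdown use case.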
Getting Started
Check out the Jina AI website for access to the models and instructions on using the web APIs or downloading and using them yourself.
Tutorials and Notebooks
These tutorials refer to older Jina AI models, with new tutorials on the way.