Retriever®

Retrieval Engine Deep Dive

Understanding the tech powering your LLM's search.

How Our Retrieval Engine Works

Our intelligent retrieval layer sits at the core of our LLM pipeline, enabling rapid and robust search over massive document collections. By leveraging purpose-built vector databases, we deliver fast, highly relevant results that augment large language model workflows.

Central to our approach is the use of FAISS, Qdrant, and Weaviate—each offering unique strengths for indexing, managing, and querying high-dimensional vectors. Their integration enables flexibility in scaling, storage options, and advanced filtering, adapting to diverse and evolving data needs.

FAISS: Fast, Efficient Indexing

FAISS, developed by Meta AI, is a gold standard for similarity search and clustering of dense vectors. It’s renowned for its speed and modularity. We use FAISS for in-memory, high-throughput scenarios—powering the bulk of our rapid query response.
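The snippet below is a minimal sketch of this in-memory pattern; the dimensionality, index type, and random data are illustrative rather than our production configuration.

```python
# Minimal in-memory similarity search with FAISS (illustrative values, not production config).
import numpy as np
import faiss

dim = 384                                                      # assumed embedding dimensionality
doc_vectors = np.random.rand(10_000, dim).astype("float32")    # stand-in document embeddings
query_vectors = np.random.rand(3, dim).astype("float32")       # stand-in query embeddings

index = faiss.IndexFlatL2(dim)       # exact L2 search; IVF/HNSW variants trade accuracy for speed
index.add(doc_vectors)               # load vectors into the in-memory index

distances, ids = index.search(query_vectors, 5)    # top-5 nearest neighbours per query
print(ids[0], distances[0])
```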

Weaviate: Hybrid Retrieval

Weaviate is a full-featured vector database with an easy-to-use API and built-in support for generating vectors using models like OpenAI or Hugging Face. It allows filtering results by metadata (e.g., "show only items in the tech category") and supports hybrid search combining keywords and vectors.
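As a rough sketch of what that looks like in practice (using the v4 Python client; the "Article" collection, the "category" property, and the local connection are assumptions, not our actual schema):

```python
# Hybrid keyword + vector search with the Weaviate Python client (v4).
# "Article", "category", and the local connection are hypothetical placeholders.
import weaviate
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()            # assumes a locally running Weaviate instance
try:
    articles = client.collections.get("Article")
    results = articles.query.hybrid(
        query="vector database indexing",       # keyword side of the hybrid query
        alpha=0.5,                              # 0 = pure BM25, 1 = pure vector search
        limit=5,
        filters=Filter.by_property("category").equal("tech"),   # metadata filter
    )
    for obj in results.objects:
        print(obj.properties)
finally:
    client.close()
```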

Qdrant: API-based Search

Qdrant is a vector database built in Rust for speed and scalability. Like Weaviate, it supports fast vector search with metadata filtering, but focuses more on performance and real-time updates. It doesn’t create vectors for you, but integrates easily into existing pipelines.
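A minimal sketch with the Qdrant Python client might look like the following; the collection name, payload schema, and toy vectors are purely illustrative, and the embeddings are assumed to come from an upstream model:

```python
# Filtered vector search with the Qdrant Python client ("docs" collection and payloads are illustrative).
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

client = QdrantClient(":memory:")               # in-process instance for demonstration

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Qdrant stores vectors you bring; embedding happens upstream in the pipeline.
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"category": "tech"}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"category": "news"}),
    ],
)

hits = client.search(
    collection_name="docs",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(must=[FieldCondition(key="category", match=MatchValue(value="tech"))]),
    limit=3,
)
print(hits)
```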

Benefits of Our Hybrid Approach

By integrating multiple vector databases, our retrieval engine provides best-in-class speed, reliability, and search relevance across varied data types. This multi-pronged architecture lets us tailor strategies for different use cases—outperforming single-system solutions in both accuracy and scalability.

We support strict, type-safe filtering, rapid updates, and seamless horizontal scaling. Each component can scale independently—ensuring uptime and performance are never compromised, even for demanding workloads.
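As a hypothetical illustration of how per-use-case routing can work (the backend interface and selection rules below are illustrative, not our actual policy):

```python
# Hypothetical sketch of routing requests to the best-suited vector store; rules are illustrative.
from typing import Any, Protocol

class VectorBackend(Protocol):
    def search(self, query_vector: list[float], top_k: int, **kwargs: Any) -> list[dict]: ...

def pick_backend(
    latency_critical: bool,
    needs_metadata_filter: bool,
    needs_hybrid_keywords: bool,
    faiss_backend: VectorBackend,
    qdrant_backend: VectorBackend,
    weaviate_backend: VectorBackend,
) -> VectorBackend:
    """Route a query to the store whose strengths match its requirements."""
    if needs_hybrid_keywords:
        return weaviate_backend     # hybrid keyword + vector retrieval
    if needs_metadata_filter:
        return qdrant_backend       # fast filtered search with real-time updates
    if latency_critical:
        return faiss_backend        # pure in-memory ANN for the hottest paths
    return faiss_backend            # default to the in-memory index
```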

The result is a flexible, future-proof system designed for the next wave of language model applications.