Traditional keyword search breaks when users do not use the exact terms that exist in your data. A customer searching "comfortable shoes for standing all day" finds nothing if your product descriptions say "ergonomic footwear with cushioned insoles." Semantic search understands meaning, not just keywords. Powered by embedding models and vector databases, it matches queries to content based on conceptual similarity — transforming search from a string-matching exercise into genuine information retrieval.
From Keywords to Vectors
Semantic search works by converting text into high-dimensional vectors (embeddings) that capture meaning. Similar concepts end up close together in vector space regardless of the specific words used. An embedding model like OpenAI's text-embedding-3, Cohere's embed-v3, or the open-source E5 family maps both queries and documents into this shared space. At search time, finding relevant results becomes a nearest-neighbour search in vector space — mathematically simple but conceptually powerful.
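The nearest-neighbour step can be sketched in a few lines of NumPy. The four-dimensional vectors below are toy stand-ins for real model output (production embeddings have hundreds or thousands of dimensions), but the ranking logic is the same:

```python
import numpy as np

def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of document vectors."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Toy embeddings standing in for real model output.
docs = np.array([
    [0.9, 0.1, 0.0, 0.1],   # "ergonomic footwear with cushioned insoles"
    [0.1, 0.9, 0.1, 0.0],   # "waterproof hiking boots"
    [0.8, 0.2, 0.1, 0.0],   # "comfortable shoes for standing"
])
query = np.array([0.85, 0.15, 0.05, 0.05])

scores = cosine_scores(query, docs)
ranking = np.argsort(-scores)  # indices of documents, best match first
```

Note that the "comfortable shoes" query lands near the "ergonomic footwear" document despite sharing no keywords with it, which is exactly the property keyword search lacks.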
- Dense embeddings: Neural models encode the full semantic meaning of text into fixed-size vectors (768–3072 dimensions). They understand synonyms, paraphrases, and conceptual relationships that keyword search misses entirely.
- Sparse embeddings: Models like SPLADE produce sparse vectors that retain the precision of keyword matching while adding learned term expansion — bridging the gap between lexical and semantic search.
- Hybrid search: Combining dense and sparse retrieval captures both semantic similarity and exact keyword matches. This is the production standard because neither approach alone covers all query types optimally.
- Cross-encoders for re-ranking: After initial retrieval, cross-encoder models score each candidate against the query with much higher accuracy than bi-encoder similarity, reordering results for maximum relevance.
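One common way to combine dense and sparse result lists is Reciprocal Rank Fusion (RRF), which needs only the two rankings, not their raw scores. A minimal sketch (the document IDs are illustrative):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from multiple retrievers.
    Each document scores 1 / (k + rank) per list; documents that appear
    high in several lists accumulate the largest totals."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7", "d2"]   # semantic nearest neighbours
sparse = ["d1", "d5", "d3", "d9"]   # BM25 / SPLADE keyword matches
fused = rrf_fuse([dense, sparse])
```

Documents retrieved by both methods ("d1", "d3") rise to the top of the fused list, which is why hybrid retrieval makes a strong candidate set to hand to the cross-encoder re-ranker.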
Vector Databases: The Infrastructure Layer
Vector databases are purpose-built to store, index, and search high-dimensional vectors at scale. Unlike relational databases optimised for exact lookups, vector databases use approximate nearest-neighbour (ANN) algorithms — HNSW, IVF, or DiskANN — to find similar vectors in milliseconds even across billions of entries. The choice of vector database shapes your system's performance, scalability, and operational complexity.
Dedicated vector databases like Pinecone, Weaviate, Qdrant, and Milvus offer rich features: metadata filtering, multi-tenancy, hybrid search, and managed infrastructure. PostgreSQL with pgvector provides vector search within your existing database, reducing operational complexity at the cost of scale and performance. For many applications — particularly those under 10 million vectors — pgvector offers the best balance of simplicity and capability, and it aligns well with data residency requirements for EU-based businesses.
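With pgvector, the schema, index, and query all live in ordinary SQL. A minimal sketch, assuming pgvector 0.5 or later for HNSW support; the table and column names are illustrative, and the embedding dimension must match your model:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
    id          bigserial PRIMARY KEY,
    description text,
    embedding   vector(1536)  -- dimension must match your embedding model
);

-- HNSW index for approximate nearest-neighbour search over cosine distance
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

-- <=> is pgvector's cosine-distance operator; smaller means more similar
SELECT id, description
FROM products
ORDER BY embedding <=> '[0.12, -0.03, ...]'
LIMIT 10;
```

Because this runs inside your existing PostgreSQL instance, metadata filtering is just a `WHERE` clause on the same table, and the data never leaves infrastructure you already control.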
Building a Production Semantic Search System
A production search system requires more than embeddings and a vector database. The full pipeline includes query understanding, retrieval, re-ranking, and result presentation. Query understanding handles spelling correction, query expansion, and intent classification — determining whether the user wants a product, a help article, or a category page. Retrieval fetches candidates using hybrid search. Re-ranking applies a cross-encoder to reorder by fine-grained relevance. Result presentation adds facets, filters, and grouping.
- Index management: As your catalogue changes, embeddings must be updated. Build pipelines that detect new, modified, and deleted content and update the vector index incrementally rather than rebuilding from scratch.
- Query analytics: Log every search query, the results returned, and user engagement (clicks, conversions, refinements). This data reveals gaps in your search quality and guides embedding model fine-tuning.
- Latency budgets: Users expect search results in under 200ms. Allocate your latency budget across embedding generation (20–50ms), vector search (10–30ms), re-ranking (50–100ms), and result assembly. Optimise each stage independently.
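A lightweight way to enforce per-stage budgets is to time each stage as it runs and flag overruns. The sketch below uses the budget split from the list above; the stage functions are hypothetical stand-ins for real implementations:

```python
import time
from dataclasses import dataclass, field

# Per-stage budgets in milliseconds, following the split described above.
BUDGET_MS = {"embed": 50, "vector_search": 30, "rerank": 100, "assemble": 20}

@dataclass
class SearchTimer:
    """Track how each pipeline stage spends the overall latency budget."""
    timings: dict = field(default_factory=dict)

    def run(self, stage: str, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.timings[stage] = (time.perf_counter() - start) * 1000.0
        return result

    def over_budget(self) -> list[str]:
        """Names of stages that exceeded their allocation."""
        return [s for s, ms in self.timings.items() if ms > BUDGET_MS.get(s, 0)]

# Hypothetical stage functions standing in for the real implementations.
timer = SearchTimer()
candidates = timer.run("vector_search", lambda q: ["d1", "d2", "d3"], "query")
ranked = timer.run("rerank", sorted, candidates)
```

Logging `timer.timings` per request gives you the per-stage latency distributions you need to decide which stage to optimise first.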
Domain-Specific Embedding Models
General-purpose embedding models work well for common language but may underperform on specialised domains — legal terminology, medical jargon, iGaming-specific terms, or highly technical product catalogues. Fine-tuning an embedding model on your domain data improves retrieval quality significantly. The process requires pairs of queries and relevant documents from your domain, which can be generated from search logs, click data, or synthetic generation using an LLM.
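Mining those query-document pairs from click data can be as simple as keeping pairs with enough clicks to be a reliable relevance signal. A minimal sketch; the log rows and threshold are illustrative:

```python
from collections import Counter

# Hypothetical click-log rows: (query, doc_id, clicked)
click_log = [
    ("standing shoes", "doc_ergonomic", True),
    ("standing shoes", "doc_hiking", False),
    ("standing shoes", "doc_ergonomic", True),
    ("casino bonus terms", "doc_bonus_policy", True),
]

def mine_training_pairs(log, min_clicks: int = 2):
    """Keep (query, doc) pairs clicked often enough to trust as relevance labels."""
    clicks = Counter((q, d) for q, d, clicked in log if clicked)
    return [(q, d) for (q, d), n in clicks.items() if n >= min_clicks]

pairs = mine_training_pairs(click_log)
```

Pairs like these can then feed a contrastive fine-tuning objective such as sentence-transformers' `MultipleNegativesRankingLoss`, which treats the other documents in each training batch as negatives.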
Multilingual embedding models like multilingual-e5 and Cohere's multilingual embed handle cross-language search natively — a user searching in Maltese can find content written in English, and vice versa. This is particularly valuable for businesses operating across EU markets where customers search in their native language but product catalogues may not be fully localised. Fine-tuning on multilingual query-document pairs from your specific domain further improves cross-language retrieval accuracy.
Measuring and Improving Search Quality
Search quality measurement starts with standard information retrieval metrics: NDCG (normalised discounted cumulative gain), MRR (mean reciprocal rank), and precision at k. But business metrics matter more: search conversion rate, null result rate (searches that return nothing), and search exit rate (users who leave after searching). A search system with perfect NDCG but poor conversion is not serving the business. Build evaluation pipelines that track both retrieval quality and business outcomes, running automated evaluations against a golden test set after every system change.
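Both NDCG and MRR are short enough to implement directly for an evaluation pipeline. The relevance lists below are illustrative: graded labels (0-3) for NDCG, binary labels for MRR, in the order your system returned the results:

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: relevance discounted by log2 of position."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal else 0.0

def mrr(results_per_query: list[list[int]]) -> float:
    """Mean reciprocal rank over binary relevance lists (1 = relevant)."""
    total = 0.0
    for rels in results_per_query:
        rank = next((i + 1 for i, r in enumerate(rels) if r), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(results_per_query)

# Graded relevance of the top-5 results for one query in a golden test set.
quality = ndcg_at_k([3, 2, 0, 1, 0], k=5)
```

Running these against the golden test set after every system change, alongside null result rate and search conversion from your query analytics, gives you both halves of the picture the paragraph above describes.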
At Born Digital, we build AI-powered search systems that understand what your users mean, not just what they type. From eCommerce product search to internal knowledge retrieval, we implement semantic search infrastructure using vector databases, hybrid retrieval, and custom embedding models — helping businesses across Malta and Europe deliver search experiences that drive engagement and revenue.