
Vector Search Quickstart


This notebook shows the core vector search workflow in ApertureDB:

  1. Create a DescriptorSet (vector index)
  2. Add Descriptors (embeddings with metadata)
  3. Run KNN search to find similar items

The vectors used here are synthetic and 4-dimensional. For real embeddings and end-to-end examples, see the notebooks listed under Next Steps.

Connect to ApertureDB

Option A: ApertureDB Cloud (recommended)
Sign up for a free 30-day trial, then generate a key under Connect > Generate API Key and add it to a .env file in this directory:

APERTUREDB_KEY=your_key_here

Option B: Community Edition (local Docker)
Run this in a terminal before starting the notebook:

docker run -d --name aperturedb \
-p 55555:55555 -e ADB_MASTER_KEY=admin -e ADB_FORCE_SSL=false \
aperturedata/aperturedb-community

See client configuration options for all connection methods and server setup options for deployment choices.

%pip install --upgrade --quiet aperturedb python-dotenv
# Option A: ApertureDB Cloud
from dotenv import load_dotenv
load_dotenv() # loads APERTUREDB_KEY from .env into the environment
True
# Option B: Community Edition (local Docker)
# !adb config create localdb --active \
# --host localhost --port 55555 \
# --username admin --password admin \
# --no-use-ssl --no-interactive
from aperturedb.CommonLibrary import create_connector

client = create_connector()
response, _ = client.query([{"GetStatus": {}}])
client.print_last_response()
[
    {
        "GetStatus": {
            "info": "OK",
            "status": 0,
            "system": "ApertureDB",
            "version": "0.19.6"
        }
    }
]
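The cells that follow print responses but never check them, so a failed command (for example, re-running AddDescriptorSet, which may fail if the set already exists) could go unnoticed. Below is a minimal sketch of a check, assuming every response is a list of single-key dicts whose body carries a "status" field equal to 0 on success, as in the outputs shown in this notebook. check_all_ok is a hypothetical helper, not part of the ApertureDB SDK.

```python
from typing import Any


def check_all_ok(response: list[dict[str, Any]]) -> None:
    """Raise RuntimeError if any command in the response reports a nonzero status.

    Hypothetical helper: assumes the shape [{CommandName: {"status": int, ...}}, ...]
    seen in the outputs above. A missing "status" is treated as a failure.
    """
    for i, entry in enumerate(response):
        for command, body in entry.items():
            status = body.get("status")
            if status != 0:
                raise RuntimeError(f"command {i} ({command}) failed with status {status}: {body}")
```

Usage, after any query: `response, _ = client.query([{"GetStatus": {}}])` followed by `check_all_ok(response)`.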

Create a Vector Index

A DescriptorSet is a named, indexed collection of vectors. All vectors in a set must have the same number of dimensions.

SET_NAME = "recipe_search"

client.query([{
    "AddDescriptorSet": {
        "name": SET_NAME,
        "dimensions": 4,         # use 384, 512, 1024, etc. for real models
        "engine": "FaissFlat",   # exact search; use HNSW for large-scale ANN
        "metric": "CS",          # cosine similarity; or "L2" for Euclidean
    }
}])
client.print_last_response()
[
    {
        "AddDescriptorSet": {
            "status": 0
        }
    }
]
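The metric decides what "closest" means. Cosine similarity compares only the direction of two vectors, so rescaling a vector leaves it unchanged, while L2 (Euclidean) distance also reflects magnitude. The NumPy sketch below illustrates the difference locally; it does not call ApertureDB, and the helper names are illustrative.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between a and b (0.0 = identical vectors)."""
    return float(np.linalg.norm(a - b))


a = np.array([0.9, 0.1, 0.8, 0.2])
b = 2 * a  # same direction, twice the magnitude

print(cosine_similarity(a, b))  # approximately 1.0: the direction is identical
print(l2_distance(a, b))        # the norm of a, about 1.2247: magnitude differs
```

If an embedding model already outputs unit-length vectors, the two metrics produce the same ranking; otherwise, pick the one the model was trained for.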

Add Vectors

Each Descriptor is a float32 vector plus optional metadata properties. The vector is passed as a binary blob.
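A float32 blob is simply the raw bytes of the vector, 4 bytes per dimension, so a 4-dimensional descriptor is 16 bytes. Below is a minimal sketch of encoding a vector and rejecting one whose length does not match the set's dimensions before it is sent; to_blob is a hypothetical helper, not part of the ApertureDB SDK.

```python
import numpy as np

DIMENSIONS = 4  # must match the "dimensions" of the DescriptorSet


def to_blob(values, dims: int = DIMENSIONS) -> bytes:
    """Encode a vector as raw float32 bytes, rejecting wrong-sized vectors.

    Hypothetical helper for illustration only.
    """
    vec = np.asarray(values, dtype=np.float32)
    if vec.shape != (dims,):
        raise ValueError(f"expected shape ({dims},), got {vec.shape}")
    return vec.tobytes()


blob = to_blob([0.9, 0.1, 0.8, 0.2])
print(len(blob))                               # 16: 4 dimensions x 4 bytes per float32
print(np.frombuffer(blob, dtype=np.float32))   # decodes back to the same 4 values
```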

import numpy as np

dishes = [
    {"name": "Butter Chicken", "cuisine": "Indian", "vec": [0.9, 0.1, 0.8, 0.2]},
    {"name": "Rajma Chawal", "cuisine": "Indian", "vec": [0.8, 0.2, 0.9, 0.1]},
    {"name": "Ramen", "cuisine": "Japanese", "vec": [0.1, 0.9, 0.2, 0.8]},
    {"name": "Sushi", "cuisine": "Japanese", "vec": [0.2, 0.8, 0.1, 0.9]},
    {"name": "Focaccia", "cuisine": "Italian", "vec": [0.5, 0.5, 0.6, 0.4]},
]

for dish in dishes:
    vec = np.array(dish["vec"], dtype="float32")
    client.query([{
        "AddDescriptor": {
            "set": SET_NAME,
            "properties": {"name": dish["name"], "cuisine": dish["cuisine"]},
        }
    }], [vec.tobytes()])

print(f"Added {len(dishes)} descriptors")
Added 5 descriptors
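The loop above makes one round trip per dish. Because client.query takes a list of commands and a list of blobs, the same descriptors could be sent in a single query. The sketch below only builds that query; it assumes the server pairs the i-th blob with the i-th blob-consuming command, so commands and blobs are appended in the same order. build_batch_add is a hypothetical helper, and for large ingestion jobs the Bulk Embeddings notebook's ParallelLoader is the better fit.

```python
import numpy as np


def build_batch_add(set_name: str, items: list[dict]) -> tuple[list[dict], list[bytes]]:
    """Build one query that adds every item as a Descriptor.

    Hypothetical helper. Assumes blobs are matched to AddDescriptor commands
    by position, so both lists are built in the same order.
    """
    commands: list[dict] = []
    blobs: list[bytes] = []
    for item in items:
        commands.append({
            "AddDescriptor": {
                "set": set_name,
                "properties": {"name": item["name"], "cuisine": item["cuisine"]},
            }
        })
        blobs.append(np.asarray(item["vec"], dtype=np.float32).tobytes())
    return commands, blobs
```

With the objects defined earlier in the notebook, usage would be `commands, blobs = build_batch_add(SET_NAME, dishes)` followed by `client.query(commands, blobs)`.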

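Find Similar Vectors

FindDescriptor takes a query vector and returns its k nearest neighbors under the set's distance metric. Because this set uses "CS", the _distance returned for each match is the cosine similarity, so higher values mean closer matches.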

query_vec = np.array([0.85, 0.15, 0.85, 0.15], dtype="float32")  # close to Indian dishes

response, _ = client.query([{
    "FindDescriptor": {
        "set": SET_NAME,
        "k_neighbors": 3,
        "distances": True,
        "results": {"all_properties": True},
    }
}], [query_vec.tobytes()])

client.print_last_response()
[
    {
        "FindDescriptor": {
            "entities": [
                {
                    "_distance": 0.9966610670089722,
                    "_set_name": "recipe_search",
                    "_uniqueid": "3.192.488740",
                    "cuisine": "Indian",
                    "name": "Butter Chicken"
                },
                {
                    "_distance": 0.9966610670089722,
                    "_set_name": "recipe_search",
                    "_uniqueid": "3.193.488760",
                    "cuisine": "Indian",
                    "name": "Rajma Chawal"
                },
                {
                    "_distance": 0.867941677570343,
                    "_set_name": "recipe_search",
                    "_uniqueid": "3.196.488820",
                    "cuisine": "Italian",
                    "name": "Focaccia"
                }
            ],
            "returned": 3,
            "status": 0
        }
    }
]
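Since FaissFlat performs exact search, a brute-force scan over the same five vectors should reproduce this result. The NumPy sketch below ranks every vector by cosine similarity to the query; it runs locally without ApertureDB, and knn_cosine is an illustrative name. Its top three are Butter Chicken and Rajma Chawal at about 0.9967, then Focaccia at about 0.8679, matching the _distance values above. The two Indian dishes have exactly the same similarity, so their relative order is arbitrary.

```python
import numpy as np

# Same synthetic data as above, repeated so this cell runs on its own.
names = ["Butter Chicken", "Rajma Chawal", "Ramen", "Sushi", "Focaccia"]
vectors = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
    [0.5, 0.5, 0.6, 0.4],
])
query = np.array([0.85, 0.15, 0.85, 0.15])


def knn_cosine(vectors: np.ndarray, query: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Exact k-nearest neighbors by cosine similarity, highest similarity first."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims, kind="stable")[:k]  # sort descending by similarity
    return [(int(i), float(sims[i])) for i in order]


for i, sim in knn_cosine(vectors, query, k=3):
    print(f"{names[i]:<20} {sim:.4f}")
```

This scan costs one dot product per stored vector, which is fine for small sets; approximate engines such as HNSW avoid the full scan at the price of occasionally missing a true neighbor.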

Python SDK: Descriptors Wrapper

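The Descriptors class in the Python SDK wraps these query-language commands and adds Maximal Marginal Relevance (MMR) reranking, which trades some similarity to the query for diversity among the returned results.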

from aperturedb.Descriptors import Descriptors

descriptors = Descriptors(client)

# Basic similarity search: distances available
descriptors.find_similar(
    set=SET_NAME,
    vector=query_vec,
    k_neighbors=3,
    distances=True,
)
print("find_similar:")
for r in descriptors.response:
    print(f" {r['name']:<20} distance={r['_distance']:.4f}")

print()

# MMR: diversify results (avoids near-duplicates)
# Note: find_similar_mmr uses blobs internally for reranking;
# _distance is not available in the output.
descriptors.find_similar_mmr(
    set=SET_NAME,
    vector=query_vec,
    k_neighbors=3,
    fetch_k=5,
    lambda_mult=0.5,  # 0.0 = max diversity, 1.0 = similarity only
)
print("find_similar_mmr (diversified):")
for r in descriptors.response:
    print(f" {r['name']:<20} cuisine={r['cuisine']}")
find_similar:
 Butter Chicken       distance=0.9967
 Rajma Chawal         distance=0.9967
 Focaccia             distance=0.8679

find_similar_mmr (diversified):
 Butter Chicken       cuisine=Indian
 Rajma Chawal         cuisine=Indian
 Focaccia             cuisine=Italian
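Both searches return the same three dishes here. To see why, the sketch below implements the standard MMR formulation: take the fetch_k most similar candidates, then repeatedly pick the one maximizing lambda_mult × (similarity to the query) − (1 − lambda_mult) × (highest similarity to anything already picked). This is an assumption about the general technique; the SDK's find_similar_mmr may differ in details such as tie-breaking or result order. With lambda_mult=0.5, Ramen and Sushi are too far from the query to be chosen, so the textbook version selects the same three dishes as plain KNN, consistent with the output above. Setting lambda_mult=0.0 ignores the query after the first pick, and a Japanese dish enters the results.

```python
import numpy as np

# Same synthetic data as above, repeated so this cell runs on its own.
names = ["Butter Chicken", "Rajma Chawal", "Ramen", "Sushi", "Focaccia"]
vectors = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
    [0.5, 0.5, 0.6, 0.4],
])
query = np.array([0.85, 0.15, 0.85, 0.15])


def mmr(vectors: np.ndarray, query: np.ndarray, k: int, fetch_k: int,
        lambda_mult: float) -> list[int]:
    """Maximal Marginal Relevance over cosine similarity (textbook formulation)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query_sims = unit @ (query / np.linalg.norm(query))
    # Step 1: plain KNN shortlist of the fetch_k most similar candidates.
    candidates = [int(i) for i in np.argsort(-query_sims, kind="stable")[:fetch_k]]
    selected: list[int] = []
    # Step 2: greedily add the candidate with the best relevance/redundancy trade-off.
    while candidates and len(selected) < k:
        best, best_score = candidates[0], -np.inf
        for c in candidates:
            redundancy = max((float(unit[c] @ unit[s]) for s in selected), default=0.0)
            score = lambda_mult * float(query_sims[c]) - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
        candidates.remove(best)
    return selected


print([names[i] for i in mmr(vectors, query, k=3, fetch_k=5, lambda_mult=0.5)])
print([names[i] for i in mmr(vectors, query, k=3, fetch_k=5, lambda_mult=0.0)])
```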

Cleanup

client.query([{"DeleteDescriptorSet": {"with_name": SET_NAME}}])
client.print_last_response()
[
    {
        "DeleteDescriptorSet": {
            "count": 1,
            "status": 0
        }
    }
]

Next Steps

Replace the synthetic vectors above with real embeddings from your data:

| Data type | Notebook |
| --- | --- |
| Text / documents | Recipe Text Search: sentence-transformers on Cookbook dish descriptions |
| PDF | Work with PDFs: chunk, embed, and search a PDF blob |
| Images | Image Vector Search: CLIP embeddings on dish images, text-to-image search |
| Video frames | Video Vector Search: CLIP frame embeddings, text-to-frame search |
| Audio | Audio Vector Search: audio embedding and search |
| Bulk loading | Bulk Embeddings: ParallelLoader for large-scale ingestion |
| Hybrid search | Hybrid Search: combine KNN with metadata filters |