Image Embedding Models
ApertureDB stores images and their embeddings together, linked by a graph edge. A KNN query can traverse from matching descriptors directly to image blobs — no separate fetch step.
- Image Vector Search — CLIP embeddings on Cookbook dish images, text-to-image search
- Quick Start — includes CLIP food image search (section 5c)
For setup and client configuration, see Client Configuration. For server setup options, see Server Setup.
CLIP
CLIP embeds images and text into the same vector space, enabling text-to-image and image-to-image search. The clip-ViT-B-32 model from sentence-transformers is the simplest way to use it without PyTorch boilerplate:
```bash
pip install -U aperturedb sentence-transformers Pillow requests
```
```python
import requests
import numpy as np
from PIL import Image
from io import BytesIO
from sentence_transformers import SentenceTransformer
from aperturedb.CommonLibrary import create_connector

client = create_connector()
model = SentenceTransformer("clip-ViT-B-32")  # 512-dimensional embeddings

# Create the DescriptorSet (HNSW index, cosine similarity)
client.query([{"AddDescriptorSet": {
    "name": "food_image_search",
    "dimensions": 512,
    "engine": "HNSW",
    "metric": "CS",
}}])

# Add image + embedding in one transaction
image_url = "https://example.com/butter_chicken.jpg"
resp = requests.get(image_url, timeout=10)
img = Image.open(BytesIO(resp.content)).convert("RGB")
emb = model.encode(img, normalize_embeddings=True).astype("float32")

client.query(
    [
        {"AddImage": {"_ref": 1, "properties": {"dish": "Butter Chicken", "cuisine": "Indian", "url": image_url}}},
        {"AddDescriptor": {"set": "food_image_search", "connect": {"ref": 1, "class": "has_embedding"}, "properties": {"dish": "Butter Chicken"}}},
    ],
    # Blobs are consumed in command order: the image bytes for AddImage,
    # then the raw float32 vector for AddDescriptor.
    [resp.content, emb.tobytes()],
)
```
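The blob-ordering rule generalizes to batches: each `AddImage`'s bytes must be followed by its descriptor's bytes, and `_ref` values must be unique within a transaction. A minimal sketch of assembling such a batch (`build_batch` is an illustrative helper, not part of the SDK):

```python
import numpy as np

def build_batch(items, set_name="food_image_search"):
    """Build a query + blobs pair from (properties, image_bytes, embedding) triples.

    Blobs must appear in the same order as the commands that consume them:
    image bytes for each AddImage, then that image's float32 vector for
    its AddDescriptor.
    """
    query, blobs = [], []
    for i, (props, image_bytes, emb) in enumerate(items, start=1):
        query.append({"AddImage": {"_ref": i, "properties": dict(props)}})
        query.append({"AddDescriptor": {
            "set": set_name,
            "connect": {"ref": i, "class": "has_embedding"},
            "properties": dict(props),
        }})
        blobs.append(image_bytes)
        blobs.append(np.asarray(emb, dtype=np.float32).tobytes())
    return query, blobs

items = [
    ({"dish": "Butter Chicken"}, b"<jpeg bytes>", np.ones(512)),
    ({"dish": "Palak Paneer"}, b"<jpeg bytes>", np.zeros(512)),
]
query, blobs = build_batch(items)
# query holds 4 commands (2 AddImage + 2 AddDescriptor); blobs holds 4 entries
```

The whole batch then goes to the server as `client.query(query, blobs)`, one transaction per call.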
Text-to-image search — CLIP text and image embeddings are comparable, so a text query returns visually matching images:
```python
# Embed the text query in the same CLIP space as the images
query_emb = model.encode("creamy curry", normalize_embeddings=True).astype("float32")

q = [
    {"FindDescriptor": {"set": "food_image_search", "k_neighbors": 5, "distances": True, "_ref": 1}},
    {"FindImage": {"is_connected_to": {"ref": 1, "class": "has_embedding"}, "blobs": True, "results": {"all_properties": True}}},
]
response, blobs = client.query(q, [query_emb.tobytes()])
```
The FindDescriptor → FindImage traversal returns matched images and metadata in one round trip.
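One way to pair the returned pieces is to zip the two result lists with the blobs. A hedged sketch, using a mock response for illustration; the `_distance` key and the parallel ordering of entities and blobs are assumptions to verify against your server version:

```python
def top_matches(response, blobs):
    """Pair each returned image blob with its properties and KNN distance.

    Assumes descriptor entities, image entities, and image blobs come back
    in the same (nearest-first) order.
    """
    descriptors = response[0]["FindDescriptor"]["entities"]
    images = response[1]["FindImage"]["entities"]
    return [
        {"dish": img.get("dish"), "distance": desc.get("_distance"), "jpeg": blob}
        for desc, img, blob in zip(descriptors, images, blobs)
    ]

# Mock response standing in for a real client.query() result
mock_response = [
    {"FindDescriptor": {"entities": [{"_distance": 0.12}], "returned": 1}},
    {"FindImage": {"entities": [{"dish": "Butter Chicken", "cuisine": "Indian"}], "returned": 1}},
]
matches = top_matches(mock_response, [b"<jpeg bytes>"])
```

Each `jpeg` entry holds raw image bytes, ready for `Image.open(BytesIO(...))`.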
Ingesting the Cookbook Dataset
The Cookbook dataset (20+ dish photos) can be ingested with CLIP embeddings in one command using the ApertureDB CLI:
```bash
wget https://github.com/aperture-data/Cookbook/raw/refs/heads/main/scripts/load_cookbook_data.sh
bash load_cookbook_data.sh
```
This ingests all dish images with CLIP ViT-B/16 embeddings stored in a ViT-B/16 DescriptorSet. After ingestion, the Quick Start notebook's section 5c runs text-to-image search over all dish photos.
FaceNet
For a large-scale example with a different model, the CelebA Face Similarity Search walkthrough uses FaceNet embeddings on 200k+ celebrity images with metadata-filtered KNN search (hair color, glasses, age).
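Metadata-filtered KNN of this kind combines `FindDescriptor` with property constraints. A hedged sketch of what such a query could look like; the set name, property names, and values below are illustrative, and the constraints syntax should be checked against your server version:

```python
# Illustrative query: KNN over face embeddings restricted by metadata.
# "celeba_faces", "eyeglasses", and "hair_color" are assumed names, not
# guaranteed to match the walkthrough's actual schema.
q = [
    {"FindDescriptor": {
        "set": "celeba_faces",
        "k_neighbors": 10,
        "distances": True,
        "constraints": {"eyeglasses": ["==", True], "hair_color": ["==", "Black"]},
        "_ref": 1,
    }},
    {"FindImage": {
        "is_connected_to": {"ref": 1},
        "blobs": True,
        "results": {"all_properties": True},
    }},
]
# response, blobs = client.query(q, [face_embedding.tobytes()])
```

The constraints prune candidates by property before the nearest-neighbor ranking is returned, so the 10 results all satisfy the filters.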
Structured Ingestion with DataModels
For bulk ingestion using typed Pydantic schemas, see Structured Ingestion with DataModels. This approach is used in the Cookbook dataset loader and the CelebA similarity search example.
What's Next
- Image Vector Search notebook — end-to-end with the Cookbook dataset
- Bulk Embedding Ingestion — parallel ingestion with ParallelLoader
- Structured Ingestion with DataModels — Pydantic schemas for typed ingestion pipelines