
AI/ML Models

As a database, ApertureDB is agnostic to which machine learning models our users choose to:

  • Infer content (e.g. classification, detection) about data that is already in ApertureDB or about to be added
  • Query data to generate new outputs in response to a given query
  • Fine-tune a model to work well on an organization-specific dataset

While building various application examples, we have used a few models ourselves, as shown below.

Semantic Search and Generative Models

Vector (semantic) search is a key piece in generating responses to user questions. Step 1 is usually to index embeddings for the data that queries will be matched against or answered from. Step 2 is to feed the K-nearest-neighbor results to large (text, vision, or multimodal) models to create the responses. With ApertureDB, the indexed data, queries, and responses can be text, images, videos, documents, or other multimodal data types. Below are a few examples with various data types and models.
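For illustration, here is a minimal sketch of that two-step flow against an ApertureDB instance. The descriptor set name "wikipedia", the embed helper, the llm callable, and the "text" property on each descriptor are assumptions made for this sketch; the sections below show concrete choices for each piece.

from aperturedb.CommonLibrary import create_connector
from aperturedb.Descriptors import Descriptors

client = create_connector()
question = "Who invented the telephone?"

# Step 1: embed the question and retrieve the K nearest indexed chunks
query_embedding = embed(question)  # hypothetical embedding helper (e.g. Cohere, below)
descriptors = Descriptors(client)
descriptors.find_similar("wikipedia", query_embedding, k_neighbors=5, distances=True)

# Step 2: hand the retrieved chunks to a generative model as context
context = "\n\n".join(d["text"] for d in descriptors)  # assumes each descriptor carries a "text" property
answer = llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer)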

Text

We use the Hugging Face Datasets library to load a dataset provided by Cohere. This contains the content of Wikipedia (from November 2023), already cleaned up, chunked, and with pre-generated embeddings.
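A minimal sketch of loading that dataset with Hugging Face Datasets; the dataset identifier and column names shown here are our assumptions about the Cohere release, so check the dataset card before relying on them.

from datasets import load_dataset

# Stream the English split so we do not download the full dump up front
docs = load_dataset(
    "Cohere/wikipedia-2023-11-embed-multilingual-v3",  # assumed dataset id
    "en",
    split="train",
    streaming=True,
)

for doc in docs.take(1):
    print(doc["title"], len(doc["emb"]))  # assumed column names: "title", "emb"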

Cohere Embed-multilingual-v3.0

The Cohere embeddings are 1024-dimensional. See AddDescriptorSet for more information about selecting an engine and metric. Here is a quick code snippet (full example here):

from langchain_cohere import CohereEmbeddings

# Requires a Cohere API key (COHERE_API_KEY environment variable)
embeddings = CohereEmbeddings(model="embed-multilingual-v3.0")

# Embed a query string and inspect the 1024-dimensional result
emb = embeddings.embed_query("Hello, world!")
print(emb[:10], len(emb))
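Before storing such embeddings, a matching descriptor set needs to exist. Below is a sketch of creating one with AddDescriptorSet; the set name and the particular engine and metric are assumptions for this example, so consult the AddDescriptorSet documentation for the options that fit your workload.

from aperturedb.CommonLibrary import create_connector

client = create_connector()

query = [{
    "AddDescriptorSet": {
        "name": "cohere_wikipedia",   # assumed set name for this example
        "dimensions": 1024,           # must match the Cohere embedding size
        "engine": "FaissFlat",        # example engine; see AddDescriptorSet for options
        "metric": "CS"                # cosine similarity; "L2" and "IP" are alternatives
    }
}]
response, _ = client.query(query)
print(response)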

Meta-Llama-3-8B-Instruct.Q4

For generating responses, there is no need to use the same provider that produced the embeddings used for vector search. We use GPT4All for our website crawl demo:

from langchain_community.llms import GPT4All

llm = GPT4All(model="Meta-Llama-3-8B-Instruct.Q4_0.gguf", allow_download=True)
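Once loaded, the model can be prompted directly through the standard LangChain interface; a minimal usage sketch (the prompt is illustrative):

# Combine retrieved context with the user question and generate an answer
prompt = "Summarize what ApertureDB is in one sentence."
print(llm.invoke(prompt))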

Images

As you load images into ApertureDB, you can use the command line tool "adb" not only to ingest images and run additional queries that add attributes like width and height through pretrained models, but also to extract embeddings that are stored alongside the images to allow for semantic search later.

CLIP Models

https://openai.com/index/clip/

adb ingest from-csv dishes.adb.csv --ingest-type IMAGE --transformer image_properties --transformer clip_pytorch_embeddings

See the how-to guide on ingestion updates for more details.

Embeddings extraction

The following CLIP model variants are available for embedding extraction:

"RN50", "RN101", "RN50x4", "RN50x16", "RN50x64", "ViT-B/32", "ViT-B/16", "ViT-L/14", "ViT-L/14@336px"

Videos

Twelve Labs Marengo

We use the Twelve Labs Marengo model to generate embeddings for video clips, and then use the same retrieval model to embed the text or image the user wants to search with, so that a semantic search can find matching videos, as shown in this example.

Here is a snippet of how you can generate the embeddings and store them in ApertureDB.

# generate_embedding, generate_add_query, execute_query, create_video_object_with_clips,
# and DescriptorSetDataModel are defined/imported in the full Twelve Labs example linked above.
from aperturedb.Connector import Connector

video_url = "https://storage.googleapis.com/ad-demos-datasets/videos/Ecommerce%20v2.5.mp4"

# Generate embeddings for the video
embeddings, task_result = generate_embedding(video_url)
clips = embeddings

# Instantiate an ApertureDB client (ADB_PASSWORD is assumed to be set earlier)
aperturedb_client = Connector(
    host="workshop.datasets.gcp.cloud.aperturedata.io",
    user="admin",
    password=ADB_PASSWORD
)

# Create a descriptor set (collection) sized to the Marengo embeddings
collection = DescriptorSetDataModel(
    name="marengo26", dimensions=len(clips[0]['embedding']))
q, blobs, c = generate_add_query(collection)
result, response, blobs = execute_query(query=q, blobs=blobs, client=aperturedb_client)
print(f"Descriptor set creation: {result=}, {response=}")

# Create and insert the video object with clips and embeddings
video = create_video_object_with_clips(video_url, clips, collection)
q, blobs, c = generate_add_query(video)
result, response, blobs = execute_query(query=q, blobs=blobs, client=aperturedb_client)
print(f"Video insertion: {result=}, {response=}")

Here is how you can retrieve the video clips or the original video by performing a semantic search in ApertureDB using embeddings generated from the text query.

from aperturedb.Descriptors import Descriptors
from aperturedb.Query import ObjectType
# display_video_mp4 and display are used in the full example to render the retrieved clips
from aperturedb.NotebookHelpers import display_video_mp4
from IPython.display import display

# Generate a text embedding for our search query
text_embedding = twelvelabs_client.embed.create(
    engine_name="Marengo-retrieval-2.6",
    text="Show me the part which has lot of outfits being displayed",
    text_truncate="none"
)

print("Created a text embedding")
print(f" Engine: {text_embedding.engine_name}")
print(f" Embedding: {text_embedding.text_embedding.float[:5]}...")  # Display first 5 values

# Define the descriptor set we'll search in
descriptorset = "marengo26"

# Find similar descriptors to the text embedding
descriptors = Descriptors(aperturedb_client)
descriptors.find_similar(
    descriptorset,
    text_embedding.text_embedding.float,
    k_neighbors=3,
    distances=True
)

# Find the clips connected to those descriptors
clip_descriptors = descriptors.get_connected_entities(ObjectType.CLIP)

print(f"Found {len(clip_descriptors)} relevant clips")

Inference or Classification

Very often, you need to automatically extract information from your datasets, and models can help you do that.

Object detection

We've used models to detect bounding boxes and polygons, and to classify the content of images and videos.
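Detections are typically stored back into ApertureDB next to the source image. Below is a minimal sketch of that step, assuming an image already identified by a unique property; the property name, label, and coordinates are illustrative.

from aperturedb.CommonLibrary import create_connector

client = create_connector()

# Find the source image, then attach a bounding box produced by a detector
query = [
    {"FindImage": {
        "_ref": 1,
        "constraints": {"id": ["==", "dish_0001"]}   # illustrative unique property
    }},
    {"AddBoundingBox": {
        "image_ref": 1,
        "rectangle": {"x": 40, "y": 60, "width": 220, "height": 180},  # detector output
        "label": "plate"
    }}
]
response, _ = client.query(query)
print(response)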

SAM Classifier

The Segment Anything Model (SAM) works very well at identifying detailed shapes in images.

print("\nRunning inference on images from data source: " + data_source + "\n")

imgs = Images(client)

# Benefit of doing this is - data is downloaded as needed, avoiding unnecessary slowdowns

query = {
"FindImage": {
"blobs": True,
"constraints": {
"adb_data_source": ["==", data_source]
},
"operations": [
{ "type": "resize", "width": 400, "height": 400},
],
}
}
dataset = ApertureDBDataset(client = client, query = [query])

total = len(dataset)
print("Total images in the dataset: ", total)

# Choose a random image
test_index = random.randint(0, total-1)

# You can access any image within this new dataset created above.
img, inference_label = dataset[test_index]
# Using the image queried above
SAMClassifier.display_image(img)

checkpoint = f"{os.path.expanduser('~')}/.cache/SAM/sam_vit_h_4b8939.pth"
model_type = "vit_h"
sam = sam_model_registry[model_type](checkpoint=checkpoint)

mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(img)

R-CNN and RetinaNet

You can also use the more accurate but slower "frcnn-resnet" or the faster but less accurate "frcnn-mobilenet". We have also used RetinaNet to strike a balance between the two.
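One common way to use these detectors is through torchvision; a minimal sketch, assuming the names refer to the usual Faster R-CNN and RetinaNet variants:

import torchvision

# Slower but more accurate: Faster R-CNN with a ResNet-50 FPN backbone ("frcnn-resnet")
frcnn_resnet = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Faster but less accurate: Faster R-CNN with a MobileNetV3 backbone ("frcnn-mobilenet")
frcnn_mobilenet = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights="DEFAULT")

# A middle ground: RetinaNet with a ResNet-50 FPN backbone
retinanet = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT")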

Face Detection

Facenet

You can detect faces using the Facenet model, as we show in the example with celebrity faces from the CelebA dataset on Kaggle.

from aperturedb.CommonLibrary import create_connector
from aperturedb.Utils import Utils
from aperturedb.ParallelLoader import ParallelLoader
from aperturedb.transformers.facenet import generate_embedding
from CelebADataKaggle import CelebADataKaggle

# Name of the descriptor set that will hold the face embeddings.
search_set_name = "facenet_pytorch_embeddings"

# Connect to the ApertureDB instance.
con = create_connector()
utils = Utils(con)

# Create a new empty descriptor set (Facenet embeddings are 512-dimensional).
utils.add_descriptorset(search_set_name, 512,
                        metric=["L2"], engine="FaissFlat")

# Load the CelebA dataset from Kaggle.
dataset = CelebADataKaggle(
    records_count=10000,                     # in the interest of time, only pick the first 10k images (of ~200k total)
    embedding_generator=generate_embedding,  # use Facenet to generate embeddings (i.e. descriptors)
    search_set_name=search_set_name
)

# Ingest the dataset created above using a ParallelLoader.
loader = ParallelLoader(create_connector())
loader.ingest(dataset, stats=True)

Detecting in Videos

Yolov4

ApertureDB allows you to store not just videos but also clips from videos and frames (simple example). You can use the Yolo models to extract interesting clips from a video, as sketched below.
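Here is a minimal sketch of that idea, using a YOLO model from the ultralytics package as a stand-in for Yolov4 and assuming the video is already stored in ApertureDB under an illustrative "name" property; the AddClip parameters shown should be checked against the AddClip reference.

import cv2
from ultralytics import YOLO
from aperturedb.CommonLibrary import create_connector

client = create_connector()
model = YOLO("yolov8n.pt")  # stand-in for Yolov4; ultralytics ships recent YOLO weights

# Scan a local copy of the video and note the frames where a person is detected
cap = cv2.VideoCapture("store_camera.mp4")  # illustrative local file
person_frames = []
frame_number = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    if any(int(box.cls) == 0 for box in results[0].boxes):  # class 0 == "person" in COCO
        person_frames.append(frame_number)
    frame_number += 1
cap.release()

# Store one interesting interval back as a clip attached to the stored video
if person_frames:
    query = [
        {"FindVideo": {"_ref": 1, "constraints": {"name": ["==", "store_camera"]}}},
        {"AddClip": {
            "video_ref": 1,
            "frame_number_range": {"start": person_frames[0], "stop": person_frames[-1]}
        }}
    ]
    response, _ = client.query(query)
    print(response)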

Fine Tuning

MoViNets

Mobile video networks, or MoViNets, operate on videos to recognize actions. We have trained one on the HMDB51 dataset to make it recognize actions in a retail store.
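For a rough idea of what such a fine-tuning loop looks like, here is a sketch using torchvision's HMDB51 dataset wrapper and, as a stand-in backbone (MoViNet weights are not bundled with torchvision), its r3d_18 video model; the paths, batch size, and hyperparameters are illustrative.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import HMDB51
from torchvision.models.video import r3d_18

# Illustrative paths to the extracted HMDB51 videos and split files
dataset = HMDB51(root="hmdb51/videos", annotation_path="hmdb51/splits",
                 frames_per_clip=16, step_between_clips=16, train=True)
# batch_size=1 avoids collating clips of different resolutions; add a resize transform for larger batches
loader = DataLoader(dataset, batch_size=1, shuffle=True)

# Load a pretrained video backbone and replace the classification head
model = r3d_18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 51)  # HMDB51 has 51 action classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for video, _, label in loader:                           # HMDB51 items are (video, audio, label)
    video = video.permute(0, 4, 1, 2, 3).float() / 255.0  # (N, T, H, W, C) -> (N, C, T, H, W), rough normalization
    logits = model(video)
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()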

Other Models and Use Cases

Our users have, of course, used a variety of models like Gemini, ResNet50, Claude, and others. Because ApertureDB makes it easy to store, query, and update different modalities of data, it is straightforward to set up pipelines that extract, enhance, and load (or reload) information produced by any model.