In this tutorial, we present how to use classical vision models (i.e.,
SIFT feature extractor) in EvaDB to search for similar images powered by a
vector index. In particular, we focus on retrieving similar images from the
motifs. EvaDB makes it easy to do image search using its built-in support for vision models and vector database systems (e.g.,
To follow along, you will need to set up a local instance of EvaDB via pip.
Connect to EvaDB#
After installing EvaDB, use the following Python code to establish a connection and obtain a
cursor for running
import evadb cursor = evadb.connect().cursor()
We will assume that the input
EvaDB. To download this image dataset and load it into
EvaDB, see the complete image search notebook on Colab.
Create Image Feature Extraction Function#
To create a custom
SiftFeatureExtractor function, use the
CREATE FUNCTION statement. We will assume that the file is downloaded and stored as
sift_feature_extractor.py. Now, run the following query to register this function:
CREATE FUNCTION IF NOT EXISTS SiftFeatureExtractor IMPL 'evadb/udfs/sift_feature_extractor.py'
Create Vector Index for Similar Image Search#
To locate images with similar appearance, we next create an index based on the feature vectors returned by
SiftFeatureExtractor on the loaded images. EvaDB will later use this vector index to quickly returns similar images.
EvaDB lets you connect to your favorite vector database via the
CREATE INDEX statement. In this query, we will create a new index using the
FAISS vector index framework from
The following EvaQL statement creates a vector index on the
SiftFeatureExtractor(data) column in the
CREATE INDEX reddit_sift_image_index ON reddit_dataset (SiftFeatureExtractor(data)) USING FAISS;
Similar Image Search Powered By Vector Index#
EvaQL supports the
ORDER BY and
LIMIT clauses to retrieve the
top-k most similar images for a given image.
EvaDB contains a built-in
Similarity(x, y) function that computes the Euclidean distance between
y. We will use this function to compare the feature vector of image being search (i.e., the given image) and the feature vectors of all the images in the dataset that is stored in the vector index.
EvaDB’s query optimizer automatically picks the correct vector index to accelerate a given EvaQL query. It uses the vector index created in the prior step to accelerate the following image search query:
SELECT name FROM reddit_dataset ORDER BY Similarity( SiftFeatureExtractor(Open('reddit-images/g1074_d4mxztt.jpg')), SiftFeatureExtractor(data) ) LIMIT 5
This query returns the top-5 most similar images in a
+---------------------------------+ | reddit_dataset.name | |---------------------------------| | reddit-images/g1074_d4mxztt.jpg | | reddit-images/g348_d7ju7dq.jpg | | reddit-images/g1209_ct6bf1n.jpg | | reddit-images/g1190_cln9xzr.jpg | | reddit-images/g1190_clna2x2.jpg | +---------------------------------+
👋 If you are excited about our vision of bringing AI inside databases, consider: