Image Search
Introduction
In this tutorial, we show how to use classical vision models (e.g., the SIFT feature extractor) in EvaDB to search for similar images, powered by a vector index. In particular, we focus on retrieving images from the Reddit dataset that contain similar motifs. EvaDB makes image search easy through its built-in support for vision models and vector database systems (e.g., FAISS).
Prerequisites
To follow along, you will need to set up a local instance of EvaDB via pip.
Connect to EvaDB
After installing EvaDB, use the following Python code to establish a connection and obtain a cursor for running EvaQL queries.
import evadb

# Connect to a local EvaDB instance and obtain a cursor for running EvaQL queries
cursor = evadb.connect().cursor()
We assume that the input Reddit image collection is already loaded into EvaDB. To download this image dataset and load it into EvaDB, see the complete image search notebook on Colab.
Create Image Feature Extraction Function
To create a custom SiftFeatureExtractor function, use the CREATE FUNCTION statement. We assume that the file implementing this function, sift_feature_extractor.py, has been downloaded and is stored at evadb/udfs/sift_feature_extractor.py. Run the following query to register the function:
CREATE FUNCTION
IF NOT EXISTS SiftFeatureExtractor
IMPL 'evadb/udfs/sift_feature_extractor.py';
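A feature extractor turns each image into a fixed-length vector that can later be compared and indexed. SIFT builds its descriptor from histograms of gradient orientations weighted by gradient magnitude (the classic descriptor uses a 4x4 grid of 8-bin histograms, i.e. 128 values). The sketch below is a heavily simplified, hypothetical stand-in: a single 8-bin orientation histogram over a whole grayscale image given as a 2D list of intensities, L2-normalized. The function name orientation_histogram is our own; this is not the code in sift_feature_extractor.py, and it skips keypoint detection, scale space, and the spatial grid.

```python
import math
from typing import List


def orientation_histogram(image: List[List[float]], bins: int = 8) -> List[float]:
    """Hypothetical toy SIFT-style descriptor (not EvaDB's SiftFeatureExtractor):
    one histogram of gradient orientations over the whole grayscale image,
    where each pixel votes with its gradient magnitude."""
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    # Central differences on interior pixels only; border pixels are skipped.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat region: no orientation to vote for
            # Map the angle from (-pi, pi] to [0, 2*pi), then to a bin index.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += mag
    # L2-normalize so images with different contrast yield comparable vectors.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

For example, an image whose intensity increases left to right puts all of its weight in bin 0 (gradients pointing along +x), while a top-to-bottom ramp lands in bin 2 (+y, a quarter turn).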
Create Vector Index for Similar Image Search
To locate images with a similar appearance, we next create an index on the feature vectors that SiftFeatureExtractor returns for the loaded images. EvaDB later uses this vector index to quickly return similar images.
EvaDB lets you connect to your favorite vector database via the CREATE INDEX statement. In this query, we create a new index using the FAISS vector index framework from Meta.
The following EvaQL statement creates a vector index on SiftFeatureExtractor(data), i.e., on the feature vectors computed from the data column of the reddit_dataset table:
CREATE INDEX reddit_sift_image_index
ON reddit_dataset (SiftFeatureExtractor(data))
USING FAISS;
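Conceptually, a vector index stores one feature vector per row and answers the question "which k stored vectors are closest to this query vector?". The sketch below is a hypothetical exact (brute-force) index in the spirit of a flat, exhaustive L2 index; the class FlatL2Index and its methods are our own illustration, not how EvaDB integrates FAISS, and real FAISS indexes add optimized kernels and, optionally, approximate search structures.

```python
import heapq
import math
from typing import List, Sequence, Tuple


class FlatL2Index:
    """Hypothetical exact vector index: stores (row_id, vector) pairs and
    answers k-nearest-neighbour queries by scanning every stored vector."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._ids: List[int] = []
        self._vectors: List[Sequence[float]] = []

    def _check(self, vector: Sequence[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")

    def add(self, row_id: int, vector: Sequence[float]) -> None:
        """Store the feature vector of one table row under its row id."""
        self._check(vector)
        self._ids.append(row_id)
        self._vectors.append(vector)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (row_id, euclidean_distance) pairs, nearest first."""
        self._check(query)
        scored = (
            (math.dist(query, vec), row_id)
            for row_id, vec in zip(self._ids, self._vectors)
        )
        # nsmallest keeps only the k best candidates instead of sorting everything.
        return [(row_id, d) for d, row_id in heapq.nsmallest(k, scored)]
```

After adding one vector per row, search(query_vector, k=5) returns the five row ids whose vectors are closest to the query vector, which a query engine can then map back to rows.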
Similar Image Search Powered by a Vector Index
EvaQL supports the ORDER BY and LIMIT clauses to retrieve the top-k most similar images for a given image.
EvaDB contains a built-in Similarity(x, y) function that computes the Euclidean distance between x and y. We use this function to compare the feature vector of the image being searched for (i.e., the given image) with the feature vectors of all the images in the dataset, which are stored in the vector index.
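For two feature vectors x and y of the same dimension n, the Euclidean distance that Similarity(x, y) computes is given below. A smaller value means the two images are more alike, so sorting by this value in ascending order (the SQL default for ORDER BY, which we assume EvaQL follows) puts the most similar images first.

```latex
\mathrm{Similarity}(x, y) = \lVert x - y \rVert_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```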
EvaDB's query optimizer automatically picks the correct vector index to accelerate a given EvaQL query. It uses the vector index created in the prior step to accelerate the following image search query:
SELECT name FROM reddit_dataset ORDER BY
Similarity(
  SiftFeatureExtractor(Open('reddit-images/g1074_d4mxztt.jpg')),
  SiftFeatureExtractor(data)
)
LIMIT 5;
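Without a vector index, the query above amounts to computing the distance between the query image's feature vector and every row's feature vector, sorting in ascending order, and keeping the first five rows. The sketch below spells out that naive plan in plain Python over a hypothetical in-memory table of (name, feature_vector) rows; the function name order_by_similarity_limit and the toy 2-D vectors are made up for illustration. It also shows why the query image itself tends to come back first: its distance to its own feature vector is 0.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical row layout: (image name, feature vector)
Row = Tuple[str, Sequence[float]]


def order_by_similarity_limit(
    rows: List[Row], query_vec: Sequence[float], limit: int
) -> List[str]:
    """Naive equivalent of
        SELECT name FROM t ORDER BY Similarity(query_vec, features) LIMIT limit
    assuming Similarity is Euclidean distance and ORDER BY sorts ascending."""
    # Full scan: score every row, then sort by distance (stable for ties).
    ranked = sorted(rows, key=lambda row: math.dist(query_vec, row[1]))
    return [name for name, _ in ranked[:limit]]
```

The vector index lets EvaDB avoid this full scan; the sketch only pins down which result the query is meant to produce, up to any approximation the index may introduce.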
This query returns the top-5 most similar images in a DataFrame:
+---------------------------------+
| reddit_dataset.name             |
|---------------------------------|
| reddit-images/g1074_d4mxztt.jpg |
| reddit-images/g348_d7ju7dq.jpg  |
| reddit-images/g1209_ct6bf1n.jpg |
| reddit-images/g1190_cln9xzr.jpg |
| reddit-images/g1190_clna2x2.jpg |
+---------------------------------+
What's Next?
If you are excited about our vision of bringing AI inside databases, consider:
joining our Slack: https://evadb.ai/slack
following us on GitHub: https://evadb.ai/github
following us on Twitter: https://evadb.ai/twitter
following us on Medium: https://evadb.ai/blog
contributing to EvaDB: https://evadb.ai/github

Language Models and Databases