EvaDB 0.3.6

Image Search


Introduction

In this tutorial, we show how to use a classical vision model (the SIFT feature extractor) in EvaDB to search for similar images with the help of a vector index. In particular, we retrieve images from the Reddit dataset that contain similar motifs. EvaDB makes image search easy through its built-in support for vision models and vector database systems (e.g., FAISS).

Prerequisites

To follow along, you will need to set up a local instance of EvaDB via pip.

Connect to EvaDB

After installing EvaDB, use the following Python code to establish a connection and obtain a cursor for running EvaQL queries.

import evadb
cursor = evadb.connect().cursor()

We will assume that the input Reddit image collection is loaded into EvaDB. To download this image dataset and load it into EvaDB, see the complete image search notebook on Colab.

Create Image Feature Extraction Function

To register a custom SiftFeatureExtractor function, use the CREATE FUNCTION statement. We assume that the implementation file has been downloaded and stored as sift_feature_extractor.py; the IMPL clause must point to the location of that file. Run the following query to register the function:

cursor.query("""
    CREATE FUNCTION
    IF NOT EXISTS SiftFeatureExtractor
    IMPL  'evadb/udfs/sift_feature_extractor.py'
""").df()

Create Vector Index for Similar Image Search

To locate images with a similar appearance, we next create an index over the feature vectors that SiftFeatureExtractor returns for the loaded images. EvaDB will later use this vector index to quickly return similar images.

EvaDB lets you connect to your favorite vector database via the CREATE INDEX statement. In this query, we will create a new index using the FAISS vector index framework from Meta.

The following EvaQL statement creates a vector index on the SiftFeatureExtractor(data) column in the reddit_dataset table:

cursor.query("""
    CREATE INDEX reddit_sift_image_index
    ON reddit_dataset (SiftFeatureExtractor(data))
    USING FAISS;
""").df()

Similar Image Search Powered By Vector Index

EvaQL supports the ORDER BY and LIMIT clauses to retrieve the top-k most similar images for a given image.

EvaDB contains a built-in Similarity(x, y) function that computes the Euclidean distance between x and y. We will use this function to compare the feature vector of the image being searched for (i.e., the given image) with the feature vectors of all the images in the dataset, which are stored in the vector index.
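To make the ranking concrete, here is a minimal conceptual sketch in plain Python of what ordering by Euclidean distance and keeping the first k rows computes. It is a brute-force illustration, not EvaDB's implementation: the helper names (euclidean_distance, top_k_similar) and the three-dimensional toy vectors are hypothetical, and a vector index exists precisely to avoid this exhaustive scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def top_k_similar(
    query: Sequence[float],
    features: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (name, distance) pairs closest to `query`.

    Mirrors ORDER BY <distance> LIMIT k: smaller distance means more similar.
    """
    scored = [(name, euclidean_distance(query, vec)) for name, vec in features.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending distance
    return scored[:k]


# Hypothetical toy feature vectors; real SIFT descriptors are much longer.
toy_features = {
    "a.jpg": [0.0, 0.0, 1.0],
    "b.jpg": [0.0, 3.0, 5.0],
    "c.jpg": [1.0, 0.0, 1.0],
}
query_vector = [0.0, 0.0, 1.0]  # identical to a.jpg, so a.jpg is at distance 0

nearest = top_k_similar(query_vector, toy_features, k=2)
print(nearest)
```

Because the distance from a vector to itself is zero, a query image that is also part of the dataset ranks first in its own search results.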

EvaDB’s query optimizer automatically picks the correct vector index to accelerate a given EvaQL query. It uses the vector index created in the prior step to accelerate the following image search query:

query = cursor.query("""
    SELECT name FROM reddit_dataset ORDER BY
    Similarity(
        SiftFeatureExtractor(Open('reddit-images/g1074_d4mxztt.jpg')),
        SiftFeatureExtractor(data)
    )
    LIMIT 5
""").df()

This query returns the top-5 most similar images in a DataFrame:

+---------------------------------+
| reddit_dataset.name             |
|---------------------------------|
| reddit-images/g1074_d4mxztt.jpg |
| reddit-images/g348_d7ju7dq.jpg  |
| reddit-images/g1209_ct6bf1n.jpg |
| reddit-images/g1190_cln9xzr.jpg |
| reddit-images/g1190_clna2x2.jpg |
+---------------------------------+
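To search with a different query image or a different value of k, the statement above can be generated by a small helper. The following is a sketch only: build_image_search_query is a hypothetical name that is not part of the EvaDB API, and the table and function names are the ones assumed throughout this tutorial. The helper only builds the query text; running it still requires the cursor obtained earlier.

```python
def build_image_search_query(
    image_path: str,
    k: int = 5,
    table: str = "reddit_dataset",
) -> str:
    """Build the top-k similar image search statement used in this tutorial.

    Hypothetical helper: it only formats text and does not contact EvaDB.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Reject quotes rather than guess at an escaping rule for string literals.
    if "'" in image_path:
        raise ValueError("image_path must not contain single quotes")
    return (
        f"SELECT name FROM {table} ORDER BY\n"
        f"Similarity(\n"
        f"    SiftFeatureExtractor(Open('{image_path}')),\n"
        f"    SiftFeatureExtractor(data)\n"
        f")\n"
        f"LIMIT {k}"
    )


query_text = build_image_search_query("reddit-images/g1074_d4mxztt.jpg", k=5)
print(query_text)

# With a live connection, the text would be executed as in the steps above:
# result = cursor.query(query_text).df()
```

Keeping the query construction separate from its execution makes it easy to inspect the generated statement before sending it to EvaDB.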

What’s Next?

👋 EvaDB’s vision is to bring AI inside your database system and make it easy to build fast AI-powered apps. If you liked this tutorial and are excited about our vision, show some ❤️ by:

  • 🐙 giving a ⭐ to the EvaDB repository on GitHub: https://github.com/georgia-tech-db/evadb

  • 📟 engaging with the EvaDB community on Slack to ask questions and share your ideas and thoughts: https://evadb.ai/community

  • 🎉 contributing to EvaDB by developing cool applications/integrations: https://github.com/georgia-tech-db/evadb/issues

  • 🐦 following us on Twitter: https://twitter.com/evadb_ai

  • 📝 following us on Medium: https://medium.com/evadb-blog

Copyright © 2023, EvaDB.