Similarity search for motif mining
In this tutorial, we show how to use EvaDB's similarity search to discover images with similar motifs in a collection of Reddit images. We use the classic SIFT feature to find images that look strikingly similar as a whole (image-level similarity).
We then extend the pipeline with an object detection model, YOLO, and describe each detected object with the same SIFT feature. This lets us find objects within the images that look similar (object-level similarity).
To show how different vector stores integrate with EvaDB, we build the indexes with two of them, FAISS and Qdrant. As the queries below show, switching vector stores only requires changing the USING clause of CREATE INDEX.
Connect to EvaDB
%pip install evadb
import evadb
cursor = evadb.connect().cursor()
Download the Reddit dataset
!wget -nc https://www.dropbox.com/scl/fo/fcj6ojmii0gw92zg3jb2s/h\?dl\=1\&rlkey\=j3kj1ox4yn5fhonw06v0pn7r9 -O reddit-images.zip
!unzip -o reddit-images.zip -d reddit-images
File ‘reddit-images.zip’ already there; not retrieving.
Archive: reddit-images.zip
warning: stripped absolute path spec from /
mapname: conversion of failed
extracting: reddit-images/g348_d7jgzgf.jpg
extracting: reddit-images/g348_d7jphyc.jpg
extracting: reddit-images/g348_d7ju7dq.jpg
extracting: reddit-images/g348_d7jhhs3.jpg
extracting: reddit-images/g1074_d4n1lmn.jpg
extracting: reddit-images/g1074_d4mxztt.jpg
extracting: reddit-images/g1074_d4n60oy.jpg
extracting: reddit-images/g1074_d4n6fgs.jpg
extracting: reddit-images/g1190_cln9xzr.jpg
extracting: reddit-images/g1190_cln97xm.jpg
extracting: reddit-images/g1190_clna260.jpg
extracting: reddit-images/g1190_clna2x2.jpg
extracting: reddit-images/g1190_clna91w.jpg
extracting: reddit-images/g1190_clnad42.jpg
extracting: reddit-images/g1190_clnajd7.jpg
extracting: reddit-images/g1190_clnapoy.jpg
extracting: reddit-images/g1190_clnarjl.jpg
extracting: reddit-images/g1190_clnavnu.jpg
extracting: reddit-images/g1190_clnbalu.jpg
extracting: reddit-images/g1190_clnbf07.jpg
extracting: reddit-images/g1190_clnc4uy.jpg
extracting: reddit-images/g1190_clncot0.jpg
extracting: reddit-images/g1190_clndsnu.jpg
extracting: reddit-images/g1190_clnce4b.jpg
extracting: reddit-images/g1209_ct65pvl.jpg
extracting: reddit-images/g1209_ct66erw.jpg
extracting: reddit-images/g1209_ct67oqk.jpg
extracting: reddit-images/g1209_ct6a0g5.jpg
extracting: reddit-images/g1209_ct6bf1n.jpg
extracting: reddit-images/g1418_cj3o1h6.jpg
extracting: reddit-images/g1418_cj3om3h.jpg
extracting: reddit-images/g1418_cj3qysz.jpg
extracting: reddit-images/g1418_cj3r4gw.jpg
extracting: reddit-images/g1418_cj3z7jw.jpg
Load all images into EvaDB
response = cursor.query("DROP TABLE IF EXISTS reddit_dataset;").df()
cursor.query(
"LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;"
).df()
06-07-2023 18:29:29 WARNING[drop_executor:drop_executor.py:exec:0045] Table: reddit_dataset does not exist
0 | |
---|---|
0 | Number of loaded IMAGE: 34 |
Register a SIFT feature extractor
The SiftFeatureExtractor UDF uses the kornia library to extract SIFT features from each image.
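SIFT describes an image patch by histograms of local gradient orientations. The sketch below is a deliberately simplified, pure-Python illustration of that idea: a single 8-bin orientation histogram over the whole grayscale patch, weighted by gradient magnitude and L2-normalized. It is not kornia's implementation (a full SIFT descriptor uses a grid of spatial cells, each with its own orientation histogram, giving 128 values in the standard configuration), and how EvaDB's SiftFeatureExtractor turns an image into a vector is not shown here. The function name `orientation_histogram` is hypothetical.

```python
import math


def orientation_histogram(patch, num_bins=8):
    """Simplified SIFT-like descriptor (illustrative only, not kornia's).

    `patch` is a 2-D list of grayscale intensities. Returns one
    magnitude-weighted histogram of gradient orientations, L2-normalized.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * num_bins
    # Central differences, computed on interior pixels only.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue
            # Map the angle from (-pi, pi] to [0, 2*pi), then to a bin index.
            angle = math.atan2(dy, dx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * num_bins) % num_bins
            hist[bin_index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    # A flat patch has no gradients; return the all-zero histogram as-is.
    return [v / norm for v in hist] if norm > 0 else hist


# A horizontal ramp: intensity grows left to right, so every gradient
# points along +x and all of the weight lands in bin 0.
ramp = [[float(x) for x in range(6)] for _ in range(6)]
print(orientation_histogram(ramp))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Normalizing the histogram makes the descriptor insensitive to overall contrast, which is one reason SIFT-style features work for spotting the same motif under different lighting.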
%pip install kornia --quiet
Note: you may need to restart the kernel to use updated packages.
cursor.query("""CREATE UDF IF NOT EXISTS SiftFeatureExtractor
IMPL '../evadb/udfs/sift_feature_extractor.py'""").df()
0 | |
---|---|
0 | UDF SiftFeatureExtractor successfully added to... |
# Keep track of which image gets the most votes
from collections import Counter
vote = Counter()
Image-level similarity search pipeline
This pipeline creates one feature vector per image. The steps below build an index over these vectors and then use it to search for similar images.
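Conceptually, a query of the form `ORDER BY Similarity(query, data) LIMIT k` ranks rows by the distance between the query's feature vector and each row's vector and keeps the k closest; a vector index such as FAISS exists to answer this without scanning every row. The sketch below is an exact brute-force version in plain Python. It assumes Euclidean (L2) distance, which is an assumption about what EvaDB's `Similarity` uses, and the image names and 3-dimensional vectors are made up for illustration.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, index, k):
    """Return the k (name, distance) pairs closest to `query`, nearest first.

    `index` maps an image name to its single feature vector
    (one vector per image, as in the image-level pipeline).
    """
    return heapq.nsmallest(
        k,
        ((name, l2_distance(query, vec)) for name, vec in index.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy vectors standing in for per-image SIFT features.
index = {
    "a.jpg": [0.0, 0.0, 1.0],
    "b.jpg": [0.0, 1.0, 0.0],
    "c.jpg": [0.1, 0.0, 0.9],
}
print([name for name, _ in top_k([0.0, 0.0, 1.0], index, k=2)])  # ['a.jpg', 'c.jpg']
```

A query image that is also stored in the table matches itself at distance 0, which is consistent with g1190_clna260.jpg appearing in its own top-5 results below.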
# 1. Create an index over the SIFT features of each whole image
cursor.query("""CREATE INDEX reddit_sift_image_index
ON reddit_dataset (SiftFeatureExtractor(data))
USING FAISS""").df()
0 | |
---|---|
0 | Index reddit_sift_image_index successfully add... |
# 2. Search for similar vectors
response = cursor.query("""SELECT name FROM reddit_dataset ORDER BY
Similarity(
SiftFeatureExtractor(Open('reddit-images/g1190_clna260.jpg')),
SiftFeatureExtractor(data)
)
LIMIT 5""").df()
# 3. Update votes: each image in the top-5 result gets one vote
for name in response["reddit_dataset.name"]:
    vote[name] += 1
print(vote)
Counter({'reddit-images/g1190_clna260.jpg': 1, 'reddit-images/g1190_clndsnu.jpg': 1, 'reddit-images/g1190_clna91w.jpg': 1, 'reddit-images/g1190_clnc4uy.jpg': 1, 'reddit-images/g1190_cln97xm.jpg': 1})
Object-level similarity search pipeline
This pipeline detects objects within the images and computes feature vectors only from the cropped objects; the index is then built over these object vectors. To show EvaDB's flexibility, we build this index with the Qdrant vector store instead of FAISS, using the same CREATE INDEX syntax.
1. Extract all objects from the images using YOLO
# Materialize one row per detected object: UNNEST expands each image's
# list of YOLO detections into separate rows
create_view_query = """
CREATE MATERIALIZED VIEW IF NOT EXISTS
reddit_object_table (name, data, bboxes, labels)
AS SELECT name, data, bboxes, labels FROM reddit_dataset
JOIN LATERAL UNNEST(Yolo(data)) AS Obj(labels, bboxes, scores)"""
cursor.query(create_view_query).df()
2023-06-07 18:29:40,013 INFO worker.py:1625 -- Started a local Ray instance.
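`JOIN LATERAL UNNEST(Yolo(data))` turns each image row, whose `Yolo` output holds a list of detections, into one row per detected object, repeating the image's `name` and `data`. Because every object row keeps its source `name`, an object-level match can later be credited back to the image it came from. Below is a plain-Python sketch of that flattening with made-up detections; the tuple layout is illustrative, not EvaDB's internal representation. In standard SQL, a lateral join against an empty list produces no rows, and the sketch assumes the same behavior.

```python
def unnest_detections(rows):
    """Yield one (name, label, bbox) row per detected object.

    `rows` is an iterable of (name, detections) pairs, where `detections` is a
    list of (label, bbox, score) tuples, mirroring the
    Obj(labels, bboxes, scores) aliases in the view definition.
    """
    for name, detections in rows:
        for label, bbox, _score in detections:
            yield name, label, bbox


# Hypothetical detections; bboxes are (xmin, ymin, xmax, ymax).
images = [
    ("img1.jpg", [("person", (10, 20, 50, 80), 0.9), ("dog", (60, 40, 90, 70), 0.8)]),
    ("img2.jpg", []),  # no detections: contributes no rows
    ("img3.jpg", [("car", (0, 0, 30, 15), 0.7)]),
]
for row in unnest_detections(images):
    print(row)
```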
2. Build an index on the feature vectors of the extracted objects
cursor.query("""CREATE INDEX reddit_sift_object_index
ON reddit_object_table (SiftFeatureExtractor(Crop(data, bboxes)))
USING QDRANT""").df()
0 | |
---|---|
0 | Index reddit_sift_object_index successfully ad... |
# Crop the detected objects out of the search image in Python. (We are actively
# working on features that will let this step run inside SQL.)
response = (
cursor.query(
"LOAD IMAGE 'reddit-images/g1190_clna260.jpg' INTO reddit_search_image_dataset"
)
.df()
)
print(response)
response = (
cursor.query("SELECT Yolo(data).bboxes FROM reddit_search_image_dataset;")
.df()
)
print(response)
import cv2
import pathlib
bboxes = response["yolo.bboxes"][0]
img = cv2.imread("reddit-images/g1190_clna260.jpg")
pathlib.Path("reddit-images/search-object/").mkdir(parents=True, exist_ok=True)
# Save each detected object as its own image for the object-level search
for i, bbox in enumerate(bboxes):
    xmin, ymin, xmax, ymax = bbox
    xmin, ymin, xmax, ymax = int(xmin), int(ymin), int(xmax), int(ymax)
    cropped_img = img[ymin:ymax, xmin:xmax]  # rows (y) first, then columns (x)
    cv2.imwrite(f"reddit-images/search-object/search-{i}.jpg", cropped_img)
0
0 Number of loaded IMAGE: 1
yolo.bboxes
0 [[257.2467956542969, 256.8749084472656, 457.67...
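The crop above casts YOLO's floating-point `(xmin, ymin, xmax, ymax)` box to integers and slices `img[ymin:ymax, xmin:xmax]`, rows first, then columns. If a box ever extends past the image border (whether it can depends on the detector's post-processing), a negative start index would make NumPy slicing count from the end of the axis. The hypothetical helper `clamped_crop` below clamps the box before slicing; it works on a nested list so it runs without OpenCV.

```python
def clamped_crop(img, bbox):
    """Crop `img` (a list of rows) to `bbox` = (xmin, ymin, xmax, ymax).

    Coordinates are truncated to int, as in the tutorial's cv2 code, then
    clamped to the image bounds so slicing never wraps around.
    """
    height, width = len(img), len(img[0])
    xmin, ymin, xmax, ymax = (int(v) for v in bbox)
    xmin, xmax = max(0, xmin), min(width, xmax)
    ymin, ymax = max(0, ymin), min(height, ymax)
    if xmin >= xmax or ymin >= ymax:
        return []  # the box lies entirely outside the image
    return [row[xmin:xmax] for row in img[ymin:ymax]]


# A 4x5 "image" whose pixel values encode (row, column) as 10 * row + column.
img = [[10 * r + c for c in range(5)] for r in range(4)]
print(clamped_crop(img, (-2.7, 1.2, 2.9, 9.0)))  # [[10, 11], [20, 21], [30, 31]]
```

With NumPy arrays the same clamping applies unchanged; only the final slice becomes `img[ymin:ymax, xmin:xmax]`.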
3. Retrieve using object-level similarity search
# Query the object index with each cropped search object; the closest
# match earns its source image half a vote
import os
for path in os.listdir("reddit-images/search-object/"):
    path = os.path.join("reddit-images/search-object/", path)
    query = f"""SELECT name FROM reddit_object_table ORDER BY
    Similarity(
    SiftFeatureExtractor(Open('{path}')),
    SiftFeatureExtractor(data)
    )
    LIMIT 1"""
    response = cursor.query(query).df()
    for name in response["reddit_object_table.name"]:
        vote[name] += 0.5
    print(response)
reddit_object_table.name
0 reddit-images/g1190_cln9xzr.jpg
reddit_object_table.name
0 reddit-images/g1190_cln9xzr.jpg
reddit_object_table.name
0 reddit-images/g348_d7jgzgf.jpg
Combine the image-level and object-level votes to show the most similar images
# !pip install matplotlib
import cv2
import matplotlib.pyplot as plt
# Display top images, ranked by total votes (highest first)
vote_list = list(reversed(sorted(vote.items(), key=lambda x: x[1])))
img_list = [path for path, _ in vote_list]
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=[18, 10])
# OpenCV loads images in BGR order; matplotlib expects RGB
ax[0].imshow(cv2.cvtColor(cv2.imread("reddit-images/g1190_clna260.jpg"), cv2.COLOR_BGR2RGB))
ax[0].set_title("Search")
for i in range(5):
    axi = ax[i + 1]
    img = cv2.imread(img_list[i])
    axi.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    axi.set_title(f"Top-{i + 1}")
plt.show()
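The ranking above is a simple weighted vote: each image returned by the image-level search receives 1 vote, and each image returned by an object-level search receives 0.5, so a whole-image match counts twice as much as a single object match. The sketch below reproduces that combination with hypothetical result lists rather than the tutorial's actual outputs; `combine_votes` is a hypothetical helper, and the weights are the tutorial's choice exposed as parameters.

```python
from collections import Counter


def combine_votes(image_hits, object_hits, image_weight=1.0, object_weight=0.5):
    """Merge hit lists into one weighted vote count per image name."""
    votes = Counter()
    for name in image_hits:
        votes[name] += image_weight
    for name in object_hits:
        votes[name] += object_weight
    return votes


# Hypothetical results: two image-level hits and three object-level hits
# (one search object matched x.jpg, two matched y.jpg).
votes = combine_votes(["x.jpg", "z.jpg"], ["x.jpg", "y.jpg", "y.jpg"])
print(votes.most_common())  # [('x.jpg', 1.5), ('z.jpg', 1.0), ('y.jpg', 1.0)]
```

`Counter.most_common` keeps tied images in insertion order, whereas the tutorial's `reversed(sorted(...))` reverses the order among ties, so tied images may be displayed in a different order.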