EVA DATABASE SYSTEM#

AI-Relational Database System | SQL meets Deep Learning


What is EVA?#

EVA is an open-source AI-relational database with first-class support for deep learning models. It aims to support AI-powered database applications that operate on both structured (tables) and unstructured data (videos, text, podcasts, PDFs, etc.) with deep learning models.

EVA accelerates AI pipelines using a collection of optimizations inspired by relational database systems including function caching, sampling, and cost-based operator reordering. It comes with a wide range of models for analyzing unstructured data including image classification, object detection, OCR, face detection, etc. It is fully implemented in Python, and licensed under the Apache license.

EVA supports an AI-oriented query language for analyzing unstructured data; several illustrative applications are shown below.

If you are wondering why you might need a video database system, start with the page on Video Database Systems. It describes how EVA lets users easily make use of deep learning models and how EVA can reduce the money spent on inference over large image or video datasets.

The Getting Started page shows how you can use EVA for different computer vision tasks, and how you can easily extend EVA to support your custom deep learning model in the form of user-defined functions.

The User Guides section contains Jupyter Notebooks that demonstrate how to use various features of EVA. Each notebook includes a link to Google Colab, where you can run the code by yourself.

Key Features#

  1. With EVA, you can easily combine SQL and deep learning models to build next-generation database applications. EVA treats deep learning models as functions, similar to traditional SQL functions like SUM() (see the sketch after this list).

  2. EVA is extensible by design. You can write a user-defined function (UDF) that wraps around your custom deep learning model. In fact, all the built-in models that are included in EVA are written as user-defined functions.

  3. EVA comes with a collection of built-in sampling, caching, and filtering optimizations inspired by relational database systems. These optimizations help speed up queries on large datasets and save money spent on model inference.
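
As a rough sketch of the first point above, the query below invokes an object detection model like an ordinary SQL function inside a SELECT statement. It assumes a client cursor created as shown in the Getting Started section, that the built-in FastRCNNObjectDetector UDF has been registered, and a hypothetical TrafficVideo table that has already been loaded.

# Hedged sketch: a deep learning model invoked like a SQL function.
# Assumes `cursor` was created as in the Getting Started section and that a
# video has already been loaded into a (hypothetical) TrafficVideo table.
cursor.execute("""SELECT id, FastRCNNObjectDetector(data).labels
                  FROM TrafficVideo
                  WHERE id < 10;""")
response = cursor.fetch_all()
print(response)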

Next Steps#

A step-by-step guide to installing EVA and running queries

List of all the query commands supported by EVA

A step-by-step tour of registering a user-defined function that wraps around a custom deep learning model


Illustrative EVA Applications#

Traffic Analysis Application using Object Detection Model#

Source Video Query Result

MNIST Digit Recognition using Image Classification Model#

Source Video Query Result

Movie Analysis Application using Face Detection + Emotion Classification Models#

Source Video Query Result


Community#

Join the EVA community on Slack to ask questions and to share your ideas for improving EVA.

EVA Slack Channel

Video Database System#

Over the last decade, deep learning models have radically changed the world of computer vision. They are accurate on a variety of tasks ranging from image classification to emotion detection. However, there are two challenges that prevent a lot of users from benefiting from these models.

Usability and Application Maintainability#

To use a vision model, the user must do a lot of imperative programming across low-level libraries, like OpenCV and PyTorch. This is a tedious process that often leads to a complex program or Jupyter Notebook that glues together these libraries to accomplish the given task. This programming complexity prevents a lot of people who are experts in other domains from benefiting from these models.

Historically, database systems have been successful because the query language is simple enough in its basic structure that users without prior experience are able to learn a usable subset of the language on their first sitting. EVA supports a declarative SQL-like query language, called EVAQL, that is designed to make it easier for users to leverage these models. With this query language, the user may compose multiple models in a single query to accomplish complicated tasks with minimal programming.

Here is an illustrative query that examines the emotions of actors in a movie by leveraging multiple deep learning models that take care of detecting faces and analyzing the emotions of the detected bounding boxes:

SELECT id, bbox, EmotionDetector(Crop(data, bbox))
FROM Interstellar
     JOIN LATERAL UNNEST(FaceDetector(data)) AS Face(bbox, conf)
WHERE id < 15;

By using a declarative language, the complexity of the program or Jupyter Notebook is significantly reduced. This in turn leads to more maintainable code that allows users to build on top of each other’s queries.

GPU Cost and Human Time#

From a cost standpoint, it is very expensive to run these deep learning models on large image or video datasets. For example, the state-of-the-art object detection model takes multiple GPU-decades to process just a year’s worth of videos from a single traffic monitoring camera. Besides the money spent on hardware, this also increases the time that the user spends waiting for the model inference process to finish.

EVA automatically optimizes the queries to reduce inference cost and query execution time using its Cascades-style query optimizer. EVA’s optimizer is tailored for video analytics. The Cascades-style extensible query optimization framework has worked very well for several decades in SQL database systems. Query optimization is one of the signature components of database systems — the bridge that connects the declarative query language to efficient execution.
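
To inspect what the optimizer plans to do for a given query, you can prepend EXPLAIN to it (see the EVA Query Language Reference below). Here is a hedged sketch, assuming the Interstellar video has been loaded, the two UDFs above have been registered, and a client cursor has been created as shown in the Getting Started section:

# Hedged sketch: ask the optimizer for the plan of the emotion-analysis query.
cursor.execute("""EXPLAIN SELECT id, bbox, EmotionDetector(Crop(data, bbox))
                  FROM Interstellar
                       JOIN LATERAL UNNEST(FaceDetector(data)) AS Face(bbox, conf)
                  WHERE id < 15;""")
response = cursor.fetch_all()
print(response)  # prints the query plan chosen by the optimizer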

Getting Started#

Part 1: Install EVA#

EVA supports Python (versions >= 3.7). To install EVA, we recommend using the pip package manager:

pip install evadb

Launch EVA server#

EVA is based on a client-server architecture. To launch the EVA server, run the following command on the terminal:

eva_server &

Part 2: Start a Jupyter Notebook Client#

Here is an illustrative Jupyter notebook focusing on MNIST image classification using EVA. The notebook works on Google Colab.

Connect to the EVA server#

To connect to the EVA server in the notebook, use the following Python code:

# allow nested asyncio calls for client to connect with server
import nest_asyncio
nest_asyncio.apply()
from eva.server.db_api import connect

# hostname and port of the server where EVA is running
connection = connect(host = '0.0.0.0', port = 5432)

# cursor allows the notebook client to send queries to the server
cursor = connection.cursor()

Load video for analysis#

Download the MNIST video for analysis.

!wget -nc https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4

The LOAD statement is used to load a video into a table on the EVA server.

cursor.execute('LOAD VIDEO "mnist.mp4" INTO MNISTVideoTable;')
response = cursor.fetch_all()
print(response)

Part 3: Register a user-defined function (UDF)#

User-defined functions allow us to combine SQL with deep learning models: each function wraps around a deep learning model.

Download the user-defined function for classifying MNIST images.

!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/apps/mnist/eva_mnist_udf.py
cursor.execute("""CREATE UDF IF NOT EXISTS MnistCNN
                  INPUT  (data NDARRAY (3, 28, 28))
                  OUTPUT (label TEXT(2))
                  TYPE  Classification
                  IMPL  'eva_mnist_udf.py';
                """)
response = cursor.fetch_all()
print(response)

Run a query using the newly registered UDF!#

cursor.execute("""SELECT data, MnistCNN(data).label
                  FROM MNISTVideoTable
                  WHERE id = 30;""")
response = cursor.fetch_all()

Visualize the output#

The output of the query is visualized in the notebook.
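
A minimal sketch of such a visualization, assuming matplotlib is installed and mirroring the visualization cell in the MNIST tutorial later in this guide (the column names are assumptions that follow the <table>.<column> pattern shown in the outputs below):

# Hedged sketch: render the queried frame together with its predicted label.
import matplotlib.pyplot as plt

df = response.batch.frames
plt.imshow(df['mnistvideotable.data'].iloc[0])
plt.title(f"label: {df['mnistcnn.label'].iloc[0]}")
plt.show()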

Part 4: Start a Command Line Client#

Besides the notebook interface, EVA also exports a command line interface for querying the server. This interface allows for quick querying from the terminal:

>>> eva_client
eva=# LOAD VIDEO "mnist.mp4" INTO MNISTVid;
@status: ResponseStatus.SUCCESS
@batch:

0 Video successfully added at location: mnist.mp4
@query_time: 0.045

eva=# SELECT id, data FROM MNISTVid WHERE id < 1000;
@status: ResponseStatus.SUCCESS
@batch:
            mnistvid.id     mnistvid.data
    0          0             [[[ 0 2 0]\n [0 0 0]\n...
    1          1             [[[ 2 2 0]\n [1 1 0]\n...
    2          2             [[[ 2 2 0]\n [1 2 2]\n...
    ..       ...
    997        997           [[[ 0 2 0]\n [0 0 0]\n...
    998        998           [[[ 0 2 0]\n [0 0 0]\n...
    999        999           [[[ 2 2 0]\n [1 1 0]\n...

[1000 rows x 2 columns]
@query_time: 0.216

eva=# exit

List of Notebooks#

This section contains Jupyter Notebooks that demonstrate how to use various features of EVA. Each notebook includes a link to Google Colab, where you can run the code by yourself.

Start EVA Server#


Launch EVA server#

We use this notebook for launching the EVA server.

## Install EVA package if needed
# %pip install "evadb[dev]" --quiet

import os
import time
from psutil import process_iter
from signal import SIGTERM
import re
import itertools

def shell(command):
    print(command)
    os.system(command)

def stop_eva_server():
    for proc in process_iter():
        if proc.name() == "eva_server":
            proc.send_signal(SIGTERM)

def launch_eva_server():
    # Stop EVA server if it is running
    stop_eva_server()

    os.environ['GPU_DEVICES'] = '0'

    # Start EVA server
    shell("nohup eva_server >> eva.log 2>&1 &")

    last_few_lines_count = 3
    try:
        with open('eva.log', 'r') as f:
            for lines in itertools.zip_longest(*[f]*last_few_lines_count):
                print(lines)
    except FileNotFoundError:
        pass

    # Wait for server to start
    time.sleep(10)

def connect_to_server():
    from eva.server.db_api import connect
    %pip install nest_asyncio --quiet
    import nest_asyncio
    nest_asyncio.apply()

    # Connect client with server
    connection = connect(host = '127.0.0.1', port = 5432) 
    cursor = connection.cursor()

    return cursor

# Launch server
launch_eva_server()
nohup eva_server > eva.log 2>&1 &

MNIST TUTORIAL#


Start EVA server#

We are reusing the start server notebook for launching the EVA server.

!wget -nc "https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/00-start-eva-server.ipynb"
%run 00-start-eva-server.ipynb
cursor = connect_to_server()
File ‘00-start-eva-server.ipynb’ already there; not retrieving.
nohup eva_server > eva.log 2>&1 &
Note: you may need to restart the kernel to use updated packages.

Downloading the videos#

# Getting MNIST as a video
!wget -nc https://www.dropbox.com/s/yxljxz6zxoqu54v/mnist.mp4
# Getting a udf
!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/apps/mnist/eva_mnist_udf.py
File ‘mnist.mp4’ already there; not retrieving.
File ‘eva_mnist_udf.py’ already there; not retrieving.

Upload the video for analysis#

cursor.execute('DROP TABLE MNISTVid')
response = cursor.fetch_all()
print(response)
cursor.execute('LOAD VIDEO "mnist.mp4" INTO MNISTVid')
response = cursor.fetch_all()
print(response)
02-08-2023 12:40:03 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: DROP TABLE MNISTVid;
02-08-2023 12:40:03 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: LOAD VIDEO "mnist.mp4" INTO MNISTVid;
@status: ResponseStatus.SUCCESS
@batch: 
                                       0
0  Table Successfully dropped: MNISTVid
@query_time: 0.038727404084056616
@status: ResponseStatus.SUCCESS
@batch: 
                            0
0  Number of loaded VIDEO: 1
@query_time: 0.06973895477131009

Visualize Video#

from IPython.display import Video
Video("mnist.mp4", embed=True)

Create a user-defined function (UDF) for analyzing the frames#

cursor.execute("""CREATE UDF IF NOT EXISTS MnistCNN 
                  INPUT  (data NDARRAY (3, 28, 28))
                  OUTPUT (label TEXT(2))
                  TYPE  Classification
                  IMPL  'eva_mnist_udf.py'
        """)
response = cursor.fetch_all()
print(response)
02-08-2023 12:40:04 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: CREATE UDF IF NOT EXISTS MnistCNN                    INPUT  (data NDARRAY (3, 28, 28))                   OUTPUT (label TEXT(2))                   TYPE  Classification                   IMPL  'eva_mnist_udf.py';
@status: ResponseStatus.SUCCESS
@batch: 
                                              0
0  UDF MnistCNN already exists, nothing added.
@query_time: 0.010083530098199844

Run the Image Classification UDF on video#

cursor.execute("""SELECT data, MnistCNN(data).label 
                  FROM MNISTVid
                  WHERE id = 30 OR id = 50 OR id = 70 OR id = 0 OR id = 140""")
response = cursor.fetch_all()
print(response.batch)
02-08-2023 12:40:04 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT data, MnistCNN(data).label                    FROM MNISTVid                   WHERE id = 30 OR id = 50 OR id = 70 OR id = 0 OR id = 140;
                                                                                         mnistvid.data  \
0  [[[ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0...   
1  [[[2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2 2]\n [2 2...   
2  [[[13 13 13]\n [ 2  2  2]\n [ 2  2  2]\n [13 13 13]\n [ 6  6  6]\n [ 0  0  0]\n [ 5  5  5]\n [22...   
3  [[[ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 1  1  1]\n [ 3...   
4  [[[ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0  0  0]\n [ 0...   

  mnistcnn.label  
0              6  
1              2  
2              3  
3              7  
4              5  

Visualize output of query on the video#

# !pip install matplotlib
import matplotlib.pyplot as plt
import numpy as np

# create figure (fig), and array of axes (ax)
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=[6,8])

df = response.batch.frames
for axi in ax.flat:
    idx = np.random.randint(len(df))
    img = df['mnistvid.data'].iloc[idx]
    label = df['mnistcnn.label'].iloc[idx]
    axi.imshow(img)
    
    axi.set_title(f'label: {label}')

plt.show()

Object Detection Tutorial#


Start EVA server#

We are reusing the start server notebook for launching the EVA server.

!wget -nc "https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/00-start-eva-server.ipynb"
%run 00-start-eva-server.ipynb
cursor = connect_to_server()
File '00-start-eva-server.ipynb' already there; not retrieving.

nohup eva_server > eva.log 2>&1 &
Note: you may need to restart the kernel to use updated packages.

Download the Videos#

# Getting the video files
!wget -nc https://www.dropbox.com/s/k00wge9exwkfxz6/ua_detrac.mp4?raw=1 -O ua_detrac.mp4
# Getting the Yolo object detector
!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/eva/udfs/yolo_object_detector.py
File 'ua_detrac.mp4' already there; not retrieving.
File 'yolo_object_detector.py' already there; not retrieving.

Load the surveillance videos for analysis#

We use a regular expression to load all the videos into the table#

cursor.execute('DROP TABLE ObjectDetectionVideos')
response = cursor.fetch_all()
print(response)
cursor.execute('LOAD VIDEO "ua_detrac.mp4" INTO ObjectDetectionVideos;')
response = cursor.fetch_all()
print(response)
@status: ResponseStatus.SUCCESS
@batch: 
                                                    0
0  Table Successfully dropped: ObjectDetectionVideos
@query_time: 0.03936416003853083
@status: ResponseStatus.SUCCESS
@batch: 
                            0
0  Number of loaded VIDEO: 1
@query_time: 0.07350237295031548

Visualize Video#

from IPython.display import Video
Video("ua_detrac.mp4", embed=True)

Register the YOLO Object Detector as a User-Defined Function (UDF) in EVA#

cursor.execute("""CREATE UDF IF NOT EXISTS YoloV5
      INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
      OUTPUT (labels NDARRAY STR(ANYDIM), bboxes NDARRAY FLOAT32(ANYDIM, 4),
                scores NDARRAY FLOAT32(ANYDIM))
      TYPE  Classification
      IMPL  'yolo_object_detector.py';
      """)
response = cursor.fetch_all()
print(response)
@status: ResponseStatus.SUCCESS
@batch: 
                                            0
0  UDF YoloV5 already exists, nothing added.
@query_time: 0.014365370385348797

Run Object Detector on the video#

cursor.execute("""SELECT id, YoloV5(data)
                  FROM ObjectDetectionVideos 
                  WHERE id < 20""")
response = cursor.fetch_all()
print(response)
@status: ResponseStatus.SUCCESS
@batch: 
     objectdetectionvideos.id  \
0                          0   
1                          1   
2                          2   
3                          3   
4                          4   
5                          5   
6                          6   
7                          7   
8                          8   
9                          9   
10                        10   
11                        11   
12                        12   
13                        13   
14                        14   
15                        15   
16                        16   
17                        17   
18                        18   
19                        19   

                                                                                          yolov5.labels  \
0   [car, car, car, car, car, car, car, car, car, person, car, motorcycle, car, truck, car, car, car...   
1   [car, car, car, car, car, car, car, car, car, person, car, car, truck, car, car, car, car, motor...   
2   [car, car, car, car, car, car, car, car, car, car, person, car, truck, car, motorcycle, car, car...   
3   [car, car, car, car, car, car, car, car, car, car, car, car, truck, person, motorcycle, truck, c...   
4   [car, car, car, car, car, car, car, car, car, car, car, person, car, car, motorcycle, truck, car...   
5   [car, car, car, car, car, car, car, car, car, car, car, person, car, car, car, car, motorcycle, ...   
6   [car, car, car, car, car, car, car, car, person, car, car, car, car, car, motorcycle, car, car, ...   
7   [car, car, car, car, car, car, car, car, car, car, person, car, car, car, motorcycle, car, car, ...   
8   [car, car, car, car, car, car, car, car, car, person, car, motorcycle, car, car, car, car, truck...   
9   [car, car, car, car, car, car, car, car, car, car, person, car, car, car, car, truck, car, perso...   
10  [car, car, car, car, car, car, car, car, car, person, car, car, car, car, truck, car, car, car, ...   
11  [car, car, car, car, car, car, car, car, person, car, truck, car, truck, car, car, car, car, car...   
12  [car, car, car, car, car, car, car, car, car, person, car, car, motorcycle, car, car, car, truck...   
13  [car, car, car, car, car, car, car, car, car, truck, car, person, motorcycle, truck, car, car, m...   
14  [car, car, car, car, car, car, car, person, car, car, car, truck, truck, motorcycle, motorcycle,...   
15  [car, car, car, car, car, car, car, person, car, car, car, motorcycle, car, car, car, motorcycle...   
16  [car, car, car, car, car, car, person, car, car, car, car, motorcycle, motorcycle, truck, car, c...   
17  [car, car, car, car, car, car, car, person, car, car, car, car, car, truck, motorcycle, motorcyc...   
18  [car, car, car, car, car, car, car, car, car, person, car, car, car, bus, truck, motorcycle, car...   
19  [car, car, car, car, car, car, car, car, car, person, car, car, car, car, truck, bus, car, car, ...   

                                                                                          yolov5.bboxes  \
0   0                     [830.513916015625, 276.9407958984375, 960.0, 360.9553527832031]
1         ...   
1   0                  [833.4142456054688, 277.43121337890625, 960.0, 363.24542236328125]
1      [61...   
2   0                          [837.3671875, 278.4643859863281, 960.0, 363.9206848144531]
1         ...   
3   0                      [840.7059326171875, 279.1480712890625, 960.0, 360.564208984375]
1        ...   
4   0           [623.8599853515625, 218.58824157714844, 729.417724609375, 280.35791015625]
1        ...   
5   0             [624.716796875, 217.99139404296875, 731.593994140625, 280.8290100097656]
1        ...   
6   0         [625.336669921875, 217.91189575195312, 734.4633178710938, 281.4993896484375]
1        ...   
7   0       [626.2813110351562, 218.29205322265625, 736.9923095703125, 282.62371826171875]
1        ...   
8   0         [628.342041015625, 219.6263427734375, 738.7610473632812, 284.00531005859375]
1        ...   
9   0            [630.125732421875, 221.2857666015625, 740.33349609375, 284.7383728027344]
1        ...   
10  0       [632.170654296875, 222.32720947265625, 743.0787353515625, 284.72711181640625]
1        [...   
11  0          [634.2802734375, 222.06613159179688, 747.0035400390625, 287.9921569824219]
1         ...   
12  0         [636.1023559570312, 222.5532989501953, 748.899658203125, 288.3439636230469]
1         ...   
13  0         [636.18408203125, 222.69622802734375, 751.6583862304688, 287.90509033203125]
1        ...   
14  0        [638.757080078125, 223.4717559814453, 753.6521606445312, 288.96051025390625]
1       [1...   
15  0        [641.4051513671875, 224.3956298828125, 757.6414794921875, 290.0587158203125]
1         ...   
16  0          [644.2322387695312, 225.87051391601562, 761.1083984375, 290.7424011230469]
1         ...   
17  0           [646.2791137695312, 225.9928436279297, 763.2777709960938, 291.8134765625]
1         ...   
18  0      [647.1568603515625, 226.51368713378906, 765.8187866210938, 292.62884521484375]
1         ...   
19  0          [648.61767578125, 227.11984252929688, 768.1045532226562, 293.3794250488281]
1        ...   

                                                                                          yolov5.scores  
0   [0.9019389748573303, 0.8878238201141357, 0.8540899157524109, 0.8021702766418457, 0.7838875651359...  
1   [0.8980059623718262, 0.8685558438301086, 0.8364439606666565, 0.8237311244010925, 0.7718729972839...  
2   [0.8956702351570129, 0.8518660068511963, 0.8487427234649658, 0.8453817963600159, 0.7860085368156...  
3   [0.8803829550743103, 0.866111695766449, 0.8498753905296326, 0.8373444676399231, 0.82564580440521...  
4   [0.8975156545639038, 0.8809236884117126, 0.8435657024383545, 0.8381263613700867, 0.7975137829780...  
5   [0.9156807661056519, 0.8994482159614563, 0.8464253544807434, 0.8432754278182983, 0.7775278091430...  
6   [0.9120901226997375, 0.9068282246589661, 0.8413792848587036, 0.8137449026107788, 0.7970511317253...  
7   [0.9046720266342163, 0.8707322478294373, 0.8454322218894958, 0.8174154162406921, 0.7925598621368...  
8   [0.8893280625343323, 0.8552685379981995, 0.8536511063575745, 0.8232781887054443, 0.7951847910881...  
9   [0.8876322507858276, 0.8557678461074829, 0.8532631993293762, 0.7946798205375671, 0.7720419764518...  
10  [0.8991596102714539, 0.835357129573822, 0.8304201364517212, 0.8152026534080505, 0.78578490018844...  
11  [0.9085388779640198, 0.8469469547271729, 0.8388872742652893, 0.7947012186050415, 0.7883725166320...  
12  [0.9089047908782959, 0.861946165561676, 0.8548004031181335, 0.8538579344749451, 0.80343031883239...  
13  [0.8987026214599609, 0.8819432854652405, 0.8584625720977783, 0.8529505729675293, 0.8020280003547...  
14  [0.8915325999259949, 0.8549790978431702, 0.8541241884231567, 0.8505778312683105, 0.8061172962188...  
15  [0.8875617980957031, 0.863188624382019, 0.8474203944206238, 0.8387583494186401, 0.82478755712509...  
16  [0.8969656229019165, 0.8630248308181763, 0.8239783644676208, 0.8124073147773743, 0.8089075684547...  
17  [0.8984967470169067, 0.8612021207809448, 0.8340322971343994, 0.8170910477638245, 0.8132005929946...  
18  [0.8983427882194519, 0.8600170016288757, 0.8409520983695984, 0.8305366635322571, 0.8134506940841...  
19  [0.8998425006866455, 0.8530007004737854, 0.8488078713417053, 0.8350291848182678, 0.8190277814865...  
@query_time: 7.09188617952168

Visualizing output of the Object Detector on the video#

import cv2
from pprint import pprint
from matplotlib import pyplot as plt

def annotate_video(detections, input_video_path, output_video_path):
    color1=(207, 248, 64)
    color2=(255, 49, 49)
    thickness=4

    vcap = cv2.VideoCapture(input_video_path)
    width = int(vcap.get(3))
    height = int(vcap.get(4))
    fps = vcap.get(5)
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') #codec
    video=cv2.VideoWriter(output_video_path, fourcc, fps, (width,height))

    frame_id = 0
    # Capture frame-by-frame
    # ret = 1 if the video is captured; frame is the image
    ret, frame = vcap.read() 

    while ret:
        df = detections
        df = df[['yolov5.bboxes', 'yolov5.labels']][df.index == frame_id]
        if df.size:
            dfLst = df.values.tolist()
            for bbox, label in zip(dfLst[0][0], dfLst[0][1]):
                x1, y1, x2, y2 = bbox
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                # object bbox
                frame=cv2.rectangle(frame, (x1, y1), (x2, y2), color1, thickness) 
                # object label
                cv2.putText(frame, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color1, thickness) 
                # frame label
                cv2.putText(frame, 'Frame ID: ' + str(frame_id), (700, 500), cv2.FONT_HERSHEY_SIMPLEX, 1.2, color2, thickness) 
            video.write(frame)

            # Stop after twenty frames (id < 20 in previous query)
            if frame_id == 20:
                break

            # Show every fifth frame
            if frame_id % 5 == 0:
                plt.imshow(frame)
                plt.show()

        
        frame_id+=1
        ret, frame = vcap.read()

    video.release()
    vcap.release()
from ipywidgets import Video, Image
input_path = 'ua_detrac.mp4'
output_path = 'video.mp4'

dataframe = response.batch.frames
annotate_video(dataframe, input_path, output_path)
Video.from_file(output_path)

Dropping a User-Defined Function (UDF)#

cursor.execute("DROP UDF YoloV5;")
response = cursor.fetch_all()
print(response)
@status: ResponseStatus.SUCCESS
@batch: 
                                  0
0  UDF YoloV5 successfully dropped
@query_time: 0.019077396020293236

EMOTION ANALYSIS#


Start EVA Server#

We are reusing the start server notebook for launching the EVA server.

!wget -nc "https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/00-start-eva-server.ipynb"
%run 00-start-eva-server.ipynb
cursor = connect_to_server()
File ‘00-start-eva-server.ipynb’ already there; not retrieving.
nohup eva_server > eva.log 2>&1 &
Note: you may need to restart the kernel to use updated packages.

Video Files#

Getting some video files to test.

# A video of a happy person
!wget -nc https://www.dropbox.com/s/gzfhwmib7u804zy/defhappy.mp4?raw=1 -O defhappy.mp4

# Adding Emotion detection
!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/eva/udfs/emotion_detector.py

# Adding Face Detector
!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/eva/udfs/face_detector.py
File ‘defhappy.mp4’ already there; not retrieving.
File ‘emotion_detector.py’ already there; not retrieving.
File ‘face_detector.py’ already there; not retrieving.

Adding the video file to EVADB for analysis#

cursor.execute('DROP TABLE HAPPY')
response = cursor.fetch_all()
print(response)
cursor.execute('LOAD VIDEO "defhappy.mp4" INTO HAPPY')
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:00 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: DROP TABLE HAPPY;
02-08-2023 12:41:00 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: LOAD VIDEO "defhappy.mp4" INTO HAPPY;
@status: ResponseStatus.SUCCESS
@batch: 
                                    0
0  Table Successfully dropped: HAPPY
@query_time: 0.039577786810696125
@status: ResponseStatus.SUCCESS
@batch: 
                            0
0  Number of loaded VIDEO: 1
@query_time: 0.09385837288573384

Visualize Video#

from IPython.display import Video
Video("defhappy.mp4", height=450, width=800, embed=True)

Create a user-defined function (UDF) for analyzing the frames#

cursor.execute("""CREATE UDF IF NOT EXISTS EmotionDetector 
        INPUT (frame NDARRAY UINT8(3, ANYDIM, ANYDIM)) 
        OUTPUT (labels NDARRAY STR(ANYDIM), scores NDARRAY FLOAT32(ANYDIM)) 
        TYPE  Classification IMPL 'emotion_detector.py';
        """)
response = cursor.fetch_all()
print(response)
cursor.execute("""CREATE UDF IF NOT EXISTS FaceDetector
                  INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
                  OUTPUT (bboxes NDARRAY FLOAT32(ANYDIM, 4),
                          scores NDARRAY FLOAT32(ANYDIM))
                  TYPE  FaceDetection
                  IMPL  'face_detector.py';
        """)
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:01 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: CREATE UDF IF NOT EXISTS EmotionDetector          INPUT (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))          OUTPUT (labels NDARRAY STR(ANYDIM), scores NDARRAY FLOAT32(ANYDIM))          TYPE  Classification IMPL 'emotion_detector.py';
02-08-2023 12:41:01 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: CREATE UDF IF NOT EXISTS FaceDetector                   INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))                   OUTPUT (bboxes NDARRAY FLOAT32(ANYDIM, 4),                           scores NDARRAY FLOAT32(ANYDIM))                   TYPE  FaceDetection                   IMPL  'face_detector.py';
@status: ResponseStatus.SUCCESS
@batch: 
                                                     0
0  UDF EmotionDetector already exists, nothing added.
@query_time: 0.010804984718561172
@status: ResponseStatus.SUCCESS
@batch: 
                                                  0
0  UDF FaceDetector already exists, nothing added.
@query_time: 0.0055634258314967155

Run the Face Detection UDF on video#

cursor.execute("""SELECT id, FaceDetector(data)
                  FROM HAPPY WHERE id<10""")
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:01 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT id, FaceDetector(data)                   FROM HAPPY WHERE id<10;
@status: ResponseStatus.SUCCESS
@batch: 
    happy.id                          facedetector.bboxes facedetector.scores
0         0      [[493.4729  89.1079 769.5677 441.7005]]         [0.9997701]
1         1  [[501.57224  89.75874 773.5567  442.99564]]        [0.99984527]
2         2  [[503.33987  92.72798 773.9184  444.89667]]         [0.9998871]
3         3  [[506.64032  91.81204 774.7923  446.76813]]         [0.9994814]
4         4  [[508.46338  93.29222 777.00275 448.40146]]        [0.99958366]
5         5  [[506.592    99.09416 772.56396 445.32654]]        [0.99950814]
6         6  [[508.85898  98.99975 774.42487 450.29272]]          [0.999731]
7         7  [[512.3384   98.99459 781.2488  451.23007]]         [0.9997571]
8         8  [[513.3356   97.60012 783.34937 451.96744]]        [0.99983895]
9         9  [[514.25696  98.21023 784.0434  452.2896 ]]         [0.9998286]
@query_time: 0.7188037186861038

Run the Emotion Detection UDF on the outputs of the Face Detection UDF#

cursor.execute("""SELECT id, bbox, EmotionDetector(Crop(data, bbox)) 
                  FROM HAPPY JOIN LATERAL  UNNEST(FaceDetector(data)) AS Face(bbox, conf)  
                  WHERE id < 15;""")
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:01 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT id, bbox, EmotionDetector(Crop(data, bbox))                    FROM HAPPY JOIN LATERAL  UNNEST(FaceDetector(data)) AS Face(bbox, conf)                     WHERE id < 15;
@status: ResponseStatus.SUCCESS
@batch: 
     happy.id                                    Face.bbox  \
0          0      [493.4729, 89.1079, 769.5677, 441.7005]   
1          1   [501.57224, 89.75874, 773.5567, 442.99564]   
2          2     [503.33987, 92.728, 773.9184, 444.89667]   
3          3   [506.64032, 91.81204, 774.7923, 446.76813]   
4          4  [508.46338, 93.29221, 777.00275, 448.40146]   
5          5   [506.592, 99.094154, 772.56396, 445.32654]   
6          6  [508.85898, 98.99975, 774.42487, 450.29272]   
7          7    [512.3384, 98.99459, 781.2488, 451.23007]   
8          8   [513.3356, 97.60013, 783.34937, 451.96744]   
9          9    [514.25696, 98.21022, 784.0434, 452.2896]   
10        10    [515.7314, 97.13689, 786.78296, 452.3648]   
11        10  [50.10141, 524.19183, 120.75999, 599.16064]   
12        11    [513.3706, 96.37443, 784.8802, 452.24167]   
13        12      [512.84, 95.54421, 785.3812, 453.33618]   
14        13    [512.6082, 94.14093, 785.8945, 452.27637]   
15        14   [512.9006, 93.912674, 786.7881, 453.10126]   

   emotiondetector.labels  emotiondetector.scores  
0                   happy                0.999640  
1                   happy                0.999674  
2                   happy                0.999696  
3                   happy                0.999686  
4                   happy                0.999690  
5                   happy                0.999709  
6                   happy                0.999732  
7                   happy                0.999721  
8                   happy                0.999709  
9                   happy                0.999718  
10                  happy                0.999701  
11                neutral                0.998291  
12                  happy                0.999687  
13                  happy                0.999676  
14                  happy                0.999639  
15                  happy                0.999649  
@query_time: 1.141005412209779
import cv2
from pprint import pprint
from matplotlib import pyplot as plt

def annotate_video(detections, input_video_path, output_video_path):
    color1=(207, 248, 64)
    color2=(255, 49, 49)
    thickness=4

    vcap = cv2.VideoCapture(input_video_path)
    width = int(vcap.get(3))
    height = int(vcap.get(4))
    fps = vcap.get(5)
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') #codec
    video=cv2.VideoWriter(output_video_path, fourcc, fps, (width,height))

    frame_id = 0
    # Capture frame-by-frame
    # ret = 1 if the video is captured; frame is the image
    ret, frame = vcap.read() 

    while ret:
        df = detections
        df = df[['Face.bbox', 'emotiondetector.labels', 'emotiondetector.scores']][df.index == frame_id]
        if df.size:
            
            x1, y1, x2, y2 = df['Face.bbox'].values[0]
            label = df['emotiondetector.labels'].values[0]
            score = df['emotiondetector.scores'].values[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            # object bbox
            frame=cv2.rectangle(frame, (x1, y1), (x2, y2), color1, thickness) 
            # object label
            cv2.putText(frame, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color1, thickness)
            # object score
            cv2.putText(frame, str(round(score, 5)), (x1+120, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color1, thickness)
            # frame label
            cv2.putText(frame, 'Frame ID: ' + str(frame_id), (700, 500), cv2.FONT_HERSHEY_SIMPLEX, 1.2, color2, thickness) 
        
            video.write(frame)
            # Show every fifth frame
            if frame_id % 5 == 0:
                plt.imshow(frame)
                plt.show()
        
        frame_id+=1
        ret, frame = vcap.read()

    video.release()
    vcap.release()
from ipywidgets import Video, Image
input_path = 'defhappy.mp4'
output_path = 'video.mp4'

dataframe = response.batch.frames
annotate_video(dataframe, input_path, output_path)

Custom Model Tutorial#


Start EVA server#

We are reusing the start server notebook for launching the EVA server.

!wget -nc "https://raw.githubusercontent.com/georgia-tech-db/eva/master/tutorials/00-start-eva-server.ipynb"
%run 00-start-eva-server.ipynb
cursor = connect_to_server()
File ‘00-start-eva-server.ipynb’ already there; not retrieving.
nohup eva_server > eva.log 2>&1 &
Note: you may need to restart the kernel to use updated packages.

Download custom user-defined function (UDF), model, and video#

# Download UDF
!wget -nc https://www.dropbox.com/s/nht5lwjemmclqx3/gender.py?raw=1 -O gender.py

# Download built-in Face Detector
!wget -nc https://raw.githubusercontent.com/georgia-tech-db/eva/master/eva/udfs/face_detector.py

# Download models
!wget -nc https://www.dropbox.com/s/yyp7awyczv7esf4/gender.pth?raw=1 -O gender.pth

# Download videos
!wget -nc https://www.dropbox.com/s/f5rorxf0840ajjd/short.mp4?raw=1 -O short.mp4
File ‘gender.py’ already there; not retrieving.
File ‘face_detector.py’ already there; not retrieving.
File ‘gender.pth’ already there; not retrieving.
File ‘short.mp4’ already there; not retrieving.

Load video for analysis#

cursor.execute("DROP TABLE TIKTOK;")
response = cursor.fetch_all()
print(response)
cursor.execute("LOAD VIDEO 'short.mp4' INTO TIKTOK;")
response = cursor.fetch_all()
print(response)
cursor.execute("""SELECT id FROM TIKTOK WHERE id < 5""")
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:23 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: DROP TABLE TIKTOK;
02-08-2023 12:41:23 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: LOAD VIDEO 'short.mp4' INTO TIKTOK;
@status: ResponseStatus.SUCCESS
@batch: 
                                     0
0  Table Successfully dropped: TIKTOK
@query_time: 0.04090874316170812
02-08-2023 12:41:23 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT id FROM TIKTOK WHERE id < 5;
@status: ResponseStatus.SUCCESS
@batch: 
                            0
0  Number of loaded VIDEO: 1
@query_time: 0.0753261181525886
@status: ResponseStatus.SUCCESS
@batch: 
    tiktok.id
0          0
1          1
2          2
3          3
4          4
@query_time: 0.15638999687507749

Visualize Video#

from IPython.display import Video
Video("short.mp4", embed=True)

Create GenderCNN and FaceDetector UDFs#

cursor.execute("""DROP UDF GenderCNN;""")
response = cursor.fetch_all()
print(response)

cursor.execute("""CREATE UDF IF NOT EXISTS 
                  GenderCNN
                  INPUT (data NDARRAY UINT8(3, 224, 224)) 
                  OUTPUT (label TEXT(10)) 
                  TYPE  Classification 
                  IMPL 'gender.py';
        """)
response = cursor.fetch_all()
print(response)

cursor.execute("""CREATE UDF IF NOT EXISTS
                  FaceDetector
                  INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
                  OUTPUT (bboxes NDARRAY FLOAT32(ANYDIM, 4),
                          scores NDARRAY FLOAT32(ANYDIM))
                  TYPE  FaceDetection
                  IMPL  'face_detector.py';
        """)
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:23 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: DROP UDF GenderCNN;
02-08-2023 12:41:23 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: CREATE UDF IF NOT EXISTS                    GenderCNN                   INPUT (data NDARRAY UINT8(3, 224, 224))                    OUTPUT (label TEXT(10))                    TYPE  Classification                    IMPL 'gender.py';
@status: ResponseStatus.SUCCESS
@batch: 
                                     0
0  UDF GenderCNN successfully dropped
@query_time: 0.017597425263375044
02-08-2023 12:41:26 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: CREATE UDF IF NOT EXISTS                   FaceDetector                   INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))                   OUTPUT (bboxes NDARRAY FLOAT32(ANYDIM, 4),                           scores NDARRAY FLOAT32(ANYDIM))                   TYPE  FaceDetection                   IMPL  'face_detector.py';
@status: ResponseStatus.SUCCESS
@batch: 
                                                    0
0  UDF GenderCNN successfully added to the database.
@query_time: 2.812693185172975
@status: ResponseStatus.SUCCESS
@batch: 
                                                  0
0  UDF FaceDetector already exists, nothing added.
@query_time: 0.005886779632419348

Run Face Detector on video#

cursor.execute("""SELECT id, FaceDetector(data).bboxes 
                  FROM TIKTOK WHERE id < 10""")
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:26 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT id, FaceDetector(data).bboxes                    FROM TIKTOK WHERE id < 10;
@status: ResponseStatus.SUCCESS
@batch: 
    tiktok.id                              facedetector.bboxes
0          0      [[ 90.70622 208.44966 281.64642 457.68872]]
1          1      [[ 91.01816 208.27583 281.0808  457.91995]]
2          2  [[ 90.358536 207.3743   283.4399   457.96234 ]]
3          3  [[ 90.694214 207.56027  284.37817  458.6282  ]]
4          4  [[ 90.684944 208.98653  282.1281   460.90894 ]]
5          5      [[ 89.47423 209.38083 283.45938 460.58548]]
6          6      [[ 88.50081 208.31546 283.29172 461.8374 ]]
7          7  [[ 89.838646 206.07619  282.93942  464.7494  ]]
8          8      [[ 90.18522 224.35588 281.29733 469.89603]]
9          9      [[ 94.34447 234.13255 279.6476  468.85303]]
@query_time: 1.1357362917624414

Composing UDFs in a query#

Detect the gender of the faces detected in the video by composing a set of UDFs (GenderCNN, FaceDetector, and Crop).

cursor.execute("""SELECT id, bbox, GenderCNN(Crop(data, bbox)) 
                  FROM TIKTOK JOIN LATERAL  UNNEST(FaceDetector(data)) AS Face(bbox, conf)  
                  WHERE id < 50;""")
response = cursor.fetch_all()
print(response)
02-08-2023 12:41:27 INFO  [db_api:db_api.py:_multiline_query_transformation:0084] Query: SELECT id, bbox, GenderCNN(Crop(data, bbox))                    FROM TIKTOK JOIN LATERAL  UNNEST(FaceDetector(data)) AS Face(bbox, conf)                     WHERE id < 50;
@status: ResponseStatus.SUCCESS
@batch: 
     tiktok.id                                     Face.bbox gendercnn.label
0           0   [90.70624, 208.44968, 281.64642, 457.68872]          female
1           1    [91.01816, 208.27583, 281.0808, 457.91992]          female
2           2   [90.358536, 207.3743, 283.43994, 457.96234]          female
3           3   [90.694214, 207.56027, 284.37817, 458.6282]          female
4           4   [90.684944, 208.98653, 282.1281, 460.90894]          female
5           5   [89.47423, 209.38083, 283.45938, 460.58545]          female
6           6   [88.50081, 208.31546, 283.29172, 461.83743]          female
7           7    [89.83865, 206.07619, 282.93942, 464.7494]          female
8           8    [90.18519, 224.35585, 281.2973, 469.89606]          female
9           9    [94.34447, 234.13254, 279.6476, 468.85303]          female
10         10   [94.53462, 231.94533, 280.37552, 469.60095]          female
11         11   [93.62811, 232.48692, 278.80774, 470.71677]          female
12         12    [94.5706, 232.88577, 280.19693, 469.20734]          female
13         13    [94.18951, 226.97621, 281.2876, 468.45206]          female
14         14  [93.782196, 225.13283, 281.57428, 469.45212]          female
15         15   [92.72016, 222.57924, 281.52145, 471.10934]          female
16         16   [91.76486, 220.04295, 282.50293, 472.32422]          female
17         17     [91.180595, 219.383, 282.56488, 472.7332]          female
18         18   [91.45817, 224.86871, 280.40808, 471.70938]          female
19         19   [91.75995, 229.18222, 278.51724, 470.76422]          female
20         20   [90.86253, 228.00526, 277.29852, 469.97522]          female
21         21    [86.87827, 220.22151, 278.28793, 474.4017]          female
22         22    [86.17063, 220.2833, 277.47998, 473.55865]          female
23         23     [87.24197, 223.13232, 276.06287, 472.329]          female
24         24      [85.91275, 221.83832, 276.25464, 473.94]          female
25         25   [86.46627, 223.12836, 276.40482, 473.91782]          female
26         26   [87.90794, 222.48033, 277.04114, 472.63095]          female
27         27   [87.26338, 222.81485, 277.85394, 472.65347]          female
28         28   [89.96093, 222.24153, 278.90247, 471.04422]          female
29         29   [92.93111, 221.20155, 279.88617, 468.67712]          female
30         30     [95.86487, 222.3673, 280.08804, 468.6138]          female
31         31   [97.352905, 222.22885, 282.08548, 470.4421]          female
32         32     [98.23183, 219.5644, 286.48532, 472.6992]          female
33         33     [99.83777, 223.5303, 286.11328, 472.0794]          female
34         34    [98.918564, 224.1231, 287.2161, 471.09912]          female
35         35     [99.63552, 223.40047, 288.5786, 471.4875]          female
36         36   [102.69069, 223.99548, 288.7781, 471.87228]          female
37         37    [101.27347, 223.9684, 290.67612, 472.4894]          female
38         38  [100.11153, 213.72807, 292.49606, 472.19666]          female
39         39   [98.218315, 210.41318, 293.1625, 473.68002]          female
40         40    [98.33731, 211.89398, 292.41443, 472.6961]          female
41         41    [97.3301, 211.76442, 291.71097, 472.21356]          female
42         42    [96.33257, 211.30165, 291.4119, 472.43082]          female
43         43  [96.392715, 212.37268, 290.87326, 471.60538]          female
44         44    [96.0622, 212.98589, 289.65698, 470.89297]          female
45         45  [94.273346, 213.00847, 289.17032, 470.96674]          female
46         46    [94.76167, 213.07986, 289.17407, 470.6863]          female
47         47   [94.89173, 214.84763, 288.04193, 470.53317]          female
48         48    [95.12451, 217.9163, 285.73044, 470.59174]          female
49         49  [95.080414, 221.98688, 285.05048, 471.19278]          female
@query_time: 1.30877415696159

Visualize Output#

import cv2
from matplotlib import pyplot as plt

def annotate_video(detections, input_video_path, output_video_path):
    color=(207, 248, 64)
    thickness=4

    vcap = cv2.VideoCapture(input_video_path)
    width = int(vcap.get(3))
    height = int(vcap.get(4))
    fps = vcap.get(5)
    fourcc = cv2.VideoWriter_fourcc(*'MP4V') #codec
    video=cv2.VideoWriter(output_video_path, fourcc, fps, (width,height))

    frame_id = 0
    # Capture frame-by-frame
    ret, frame = vcap.read()  # ret = 1 if the video is captured; frame is the image

    while ret:
        df = detections
        df = df[['Face.bbox', 'gendercnn.label']][df['tiktok.id'] == frame_id]
        
        if df.size:
            for bbox, label in df.values:
                x1, y1, x2, y2 = bbox
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                frame=cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness) # object bbox
                cv2.putText(frame, str(label), (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, thickness-1) # object label
            
            video.write(frame)
            # Show every fifth frame
            if frame_id % 5 == 0:
                plt.imshow(frame)
                plt.show()

        # Stop after the 50 frames analyzed in the previous query
        if frame_id == 50:
            break

        frame_id+=1
        ret, frame = vcap.read()

    video.release()
    vcap.release()
#!pip install ipywidgets
from ipywidgets import Video
input_path = 'short.mp4'
output_path = 'annotated_short.mp4'

dataframe = response.batch.frames
annotate_video(dataframe, input_path, output_path)

EVA Query Language Reference#

EVA Query Language (EVAQL) is derived from SQL. It is tailored for video analytics. EVAQL allows users to invoke deep learning models in the form of user-defined functions (UDFs).

Here is an example where we first define a UDF wrapping around the FastRCNN object detection model. We then issue a query with this function to detect objects.

--- Create a user-defined function wrapping around the FastRCNN object detector
CREATE UDF IF NOT EXISTS FastRCNNObjectDetector
INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
OUTPUT (labels NDARRAY STR(ANYDIM), bboxes NDARRAY FLOAT32(ANYDIM, 4),
        scores NDARRAY FLOAT32(ANYDIM))
TYPE  Classification
IMPL  'eva/udfs/fastrcnn_object_detector.py';

--- Use the function to retrieve frames that contain more than 3 cars
SELECT id FROM MyVideo
WHERE ArrayCount(FastRCNNObjectDetector(data).label, 'car') > 3
ORDER BY id;

This page presents a list of all the EVAQL statements that you can leverage in your Jupyter Notebooks.

LOAD#

LOAD VIDEO FROM FILESYSTEM#

LOAD VIDEO 'test_video.mp4' INTO MyVideo;
  • test_video.mp4 is the location of the video file in the filesystem on the client.

  • MyVideo is the name of the table in EVA where this video is loaded. Subsequent queries over the video must refer to this table name.

When a video is loaded, there is no need to specify the schema for the video table. EVA automatically generates a schema with two columns, id and data, which correspond to the frame id and the frame content (in NumPy format).
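
For example, the auto-generated columns can be referenced directly in queries. A hedged sketch, assuming a cursor created as in the Getting Started section:

# Hedged sketch: query the auto-generated id and data columns of a loaded video.
cursor.execute('SELECT id, data FROM MyVideo WHERE id < 5;')
response = cursor.fetch_all()
print(response)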

LOAD VIDEO FROM S3#

LOAD VIDEO 's3://bucket/dummy.avi' INTO MyVideo;
LOAD VIDEO 's3://bucket/eva_videos/*.mp4' INTO MyVideos;

The videos are downloaded to a directory that can be configured in the EVA configuration file under storage:s3_download_dir. The default directory is ~/.eva/s3_downloads.

LOAD CSV#

To LOAD a CSV file, we need to first specify the table schema.

CREATE TABLE IF NOT EXISTS MyCSV (
                id INTEGER UNIQUE,
                frame_id INTEGER,
                video_id INTEGER,
                dataset_name TEXT(30),
                label TEXT(30),
                bbox NDARRAY FLOAT32(4),
                object_id INTEGER
            );

LOAD CSV 'test_metadata.csv' INTO MyCSV;
  • test_metadata.csv needs to be loaded onto the server using the LOAD statement.

  • The CSV file may contain additional columns. EVA will only load the columns listed in the defined schema.
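
Once loaded, the CSV-backed table can be queried like any other table. A hedged sketch using the schema above, assuming a cursor created as in the Getting Started section:

# Hedged sketch: query the metadata loaded from the CSV file.
cursor.execute('SELECT frame_id, label, bbox FROM MyCSV WHERE video_id = 1;')
response = cursor.fetch_all()
print(response)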

SELECT#

SELECT FRAMES WITH PREDICATES#

Search for frames with a car

SELECT id, frame
FROM MyVideo
WHERE ['car'] <@ FastRCNNObjectDetector(frame).labels
ORDER BY id;

Search frames with a pedestrian and a car

SELECT id, frame
FROM MyVideo
WHERE ['pedestrian', 'car'] <@ FastRCNNObjectDetector(frame).labels;

Search for frames containing greater than 3 cars

SELECT id FROM MyVideo
WHERE ArrayCount(FastRCNNObjectDetector(data).label, 'car') > 3
ORDER BY id;

SELECT WITH MULTIPLE UDFS#

Compose multiple user-defined functions in a single query to construct semantically complex queries.

SELECT id, bbox, EmotionDetector(Crop(data, bbox))
FROM HAPPY JOIN LATERAL UNNEST(FaceDetector(data)) AS Face(bbox, conf)
WHERE id < 15;

EXPLAIN#

EXPLAIN QUERY#

List the query plan associated with an EVAQL query

Prepend EXPLAIN to the query to retrieve the plan.

EXPLAIN SELECT CLASS FROM TAIPAI;

SHOW#

SHOW UDFS#

List the registered user-defined functions

SHOW UDFS;

CREATE#

CREATE TABLE#

To create a table, specify the schema of the table.

CREATE TABLE IF NOT EXISTS MyCSV (
                id INTEGER UNIQUE,
                frame_id INTEGER,
                video_id INTEGER,
                dataset_name TEXT(30),
                label TEXT(30),
                bbox NDARRAY FLOAT32(4),
                object_id INTEGER
 );

CREATE UDF#

To register a user-defined function, specify the implementation details of the UDF.

CREATE UDF IF NOT EXISTS FastRCNNObjectDetector
INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
OUTPUT (labels NDARRAY STR(ANYDIM), bboxes NDARRAY FLOAT32(ANYDIM, 4),
        scores NDARRAY FLOAT32(ANYDIM))
TYPE  Classification
IMPL  'eva/udfs/fastrcnn_object_detector.py';

CREATE MATERIALIZED VIEW#

To create a view with materialized results, such as the outputs of a deep learning model, use the following template:

CREATE MATERIALIZED VIEW UADETRAC_FastRCNN (id, labels) AS
SELECT id, FastRCNNObjectDetector(frame).labels
FROM UADETRAC
WHERE id<5;
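
Once materialized, the view can be queried like a regular table. A hedged sketch using the view defined above, assuming a cursor created as in the Getting Started section:

# Hedged sketch: read back the cached model outputs from the materialized view.
cursor.execute('SELECT id, labels FROM UADETRAC_FastRCNN WHERE id < 5;')
response = cursor.fetch_all()
print(response)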

DROP#

DROP TABLE#

DROP TABLE DETRACVideo;

DROP UDF#

DROP UDF FastRCNNObjectDetector;

INSERT#

TABLE MyVideo#

MyVideo Table schema

CREATE TABLE MyVideo
(id INTEGER,
data NDARRAY FLOAT32(ANYDIM));

INSERT INTO TABLE#

Insert a tuple into a table.

INSERT INTO MyVideo (id, data) VALUES
    (1,
        [[[40, 40, 40] , [40, 40, 40]],
         [[40, 40, 40] , [40, 40, 40]]]);

DELETE#

DELETE FROM TABLE#

Delete a tuple from a table based on a predicate.

DELETE FROM MyVideo WHERE id<10;

RENAME#

RENAME TABLE#

RENAME TABLE MyVideo TO MyVideo1;

User-Defined Functions#

This section provides an overview of how you can create and use a custom user-defined function (UDF) in your queries. For example, you could write a UDF that wraps around a PyTorch model.

Part 1: Writing a custom UDF#

Illustrative UDF implementation

During each step, use this UDF implementation as a reference.

  1. Create a new file under the udfs/ folder and give it a descriptive name, e.g., fastrcnn_object_detector.py or midas_depth_estimator.py.

Note

UDFs packaged along with EVA are located inside the udfs folder.

  2. Create a Python class that inherits from PytorchClassifierAbstractUDF.

  • The PytorchClassifierAbstractUDF is a parent class that defines and implements standard methods for model inference.

  • _get_predictions() - an abstract method that needs to be implemented in your child class.

  • classify() - A method that receives the frames and calls the _get_predictions() implemented in your child class. Based on GPU batch size, it also decides whether to split frames into chunks and performs the accumulation.

  • Additionally, it contains methods that help in:

    • Moving tensors to GPU

    • Converting tensors to numpy arrays.

    • Defining the forward() function that gets called when the model is invoked.

    • Basic transformations.

You can however choose to override these methods depending on your requirements.

  3. A typical UDF class has the following components (a complete sketch follows this description):

  • __init__() constructor:

    • Define variables here that will be required across all methods.

    • Model loading happens here. You can choose to load custom models or models from torch.

      • Example of loading a custom model:
        custom_model_path = os.path.join(EVA_DIR, "data", "models", "vehicle_make_predictor", "car_recognition.pt")
        self.car_make_model = CarRecognitionModel()
        self.car_make_model.load_state_dict(torch.load(custom_model_path))
        self.car_make_model.eval()
        
      • Example of loading a torch model:
        self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
        self.model.eval()
        
  • labels() method:

    • This should return a list of the label strings that your model can predict.

    • The index of a label in this list is the numeric value predicted by your model for that class.

  • _get_predictions() method:

    • This is where all your model inference logic goes.

    • Keep in mind that each call to this method receives a batch of frames, not a single frame.

    • The output from each invocation of the model needs to be appended to a dataframe and returned as follows:
      predictions = self.model(frames)
      outcome = pd.DataFrame()
      for prediction in predictions:
      
          ## YOUR INFERENCE LOGIC
      
          # column names depend on your implementation
          outcome = outcome.append(
              {
                  "labels": pred_class,
                  "scores": pred_score,
                  "boxes": pred_boxes
              },
              ignore_index=True)
      return outcome
      

In case you have any other functional requirements (e.g., defining custom transformations), you can add more methods. Make sure each method you write is clear, concise, and well-documented.
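
Putting these pieces together, here is a minimal sketch of a custom UDF that wraps a pretrained torchvision Faster R-CNN model. The import path for PytorchClassifierAbstractUDF, the class name MyObjectDetector, and the truncated label list are illustrative assumptions; whether labels is a plain method or a property may also differ across EVA versions, so use the UDFs shipped in the udfs/ folder as the authoritative reference.

import pandas as pd
import torchvision
from torch import Tensor

# Assumed import path -- check the udfs/ folder for the exact module name.
from eva.udfs.pytorch_abstract_udf import PytorchClassifierAbstractUDF


class MyObjectDetector(PytorchClassifierAbstractUDF):
    """Illustrative UDF sketch that wraps a pretrained Faster R-CNN model."""

    def __init__(self):
        super().__init__()
        # Load a pretrained torchvision model and switch it to inference mode.
        self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
        self.model.eval()

    def labels(self):
        # The index in this list corresponds to the class id predicted by the
        # model. Truncated here for brevity -- list the full label set for your model.
        return ["__background__", "person", "bicycle", "car"]

    def _get_predictions(self, frames: Tensor) -> pd.DataFrame:
        # Each call receives a batch of frames; collect one output row per frame.
        predictions = self.model(frames)
        labels = self.labels()
        rows = []
        for prediction in predictions:
            # Map numeric class ids to label strings, falling back to the raw
            # id when it lies outside this sketch's truncated label list.
            pred_class = [
                labels[i] if i < len(labels) else str(i)
                for i in prediction["labels"].tolist()
            ]
            pred_boxes = prediction["boxes"].detach().cpu().numpy().tolist()
            pred_score = prediction["scores"].detach().cpu().numpy().tolist()
            rows.append({"labels": pred_class, "scores": pred_score, "boxes": pred_boxes})
        # Equivalent to the DataFrame.append pattern shown above.
        return pd.DataFrame(rows)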


Part 2: Registering and using the UDF in queries#

Now that you have implemented your UDF, you need to register it in EVA. You can then use the function in any query.

  1. Register the UDF with a query that follows this template:

    CREATE UDF [ IF NOT EXISTS ] <name>
    INPUT  ( [ <arg_name> <arg_data_type> ] [ , … ] )
    OUTPUT ( [ <result_name> <result_data_type> ] [ , … ] )
    TYPE <udf_type_name>
    IMPL '<path_to_implementation>'

    where,

    • <name> - specifies the unique identifier for the UDF.

    • [ <arg_name> <arg_data_type> ] [ , … ] - specifies the name and data type of the UDF input arguments. The name is kept for consistency (EVA currently ignores it); the data type is required. ANYDIM means the shape is inferred at runtime.

    • [ <result_name> <result_data_type> ] [ , … ] - specifies the name and data type of the UDF output arguments. Users can access a specific output of the UDF just like a column of a table, e.g., <name>.<result_name>.

    • <udf_type_name> - specifies the identifier for the type of the UDF. UDFs of the same type are assumed to be interchangeable. They should all have identical input and output arguments. For example, object classification can be one type.

    • <path_to_implementation> - specifies the path to the implementation class for the UDF

    Here is an example query that registers a UDF that wraps around the FastRCNNObjectDetector model, which performs object detection.

    CREATE UDF IF NOT EXISTS FastRCNNObjectDetector
    INPUT  (frame NDARRAY UINT8(3, ANYDIM, ANYDIM))
    OUTPUT (labels NDARRAY STR(ANYDIM), bboxes NDARRAY FLOAT32(ANYDIM, 4),
            scores NDARRAY FLOAT32(ANYDIM))
    TYPE  Classification
    IMPL  'eva/udfs/fastrcnn_object_detector.py';
    
    • The input is a frame of type NDARRAY with shape (3, ANYDIM, ANYDIM): 3 channels and any width or height.

    • The UDF returns three outputs:
      • labels: Predicted label

      • bboxes: Bounding box of this object (rectangle coordinates)

      • scores: Confidence scores for this prediction

    A status of 0 in the response denotes the successful registration of this UDF.

  2. Now you can execute your UDF on any video:

SELECT id, Unnest(FastRCNNObjectDetector(data)) FROM MyVideo;

  3. You can drop the UDF when you no longer need it.

DROP UDF IF EXISTS FastRCNNObjectDetector;

Configure GPU#

  1. Queries in EVA use deep learning models that run much faster on a GPU than on a CPU. If your workstation has a GPU, you can configure EVA to use it during query execution. Use the following commands to check your hardware capabilities:

ubuntu-drivers devices
nvidia-smi

A valid output from these commands indicates that your GPU is configured and ready to use. If not, you will need to install the appropriate GPU driver. This page provides a step-by-step guide on installing and configuring the GPU driver on the Ubuntu operating system.

  • When installing an NVIDIA driver, ensure that the version of the GPU driver is correct to avoid compatibility issues.

  • When installing cuDNN, you will need to create an account and ensure that you get the correct deb files for your operating system and architecture.

  2. You can run the following code in a Jupyter notebook to verify that your GPU is detected by PyTorch:

import torch
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

An output of cuda:0 indicates the presence of a GPU; 0 is the index of the GPU in the system. If you have multiple GPUs on your workstation, update the index accordingly.

  3. Now configure the executor section in eva.yml as follows:

executor:
    gpus: {'127.0.1.1': [0]}

Here, 127.0.1.1 is the loopback address on which the EVA server is running. 0 refers to the GPU index to be used.
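
If you want to double-check this configuration programmatically, the short sketch below (not part of EVA) loads the executor section and verifies that each configured GPU index is visible to PyTorch. It assumes PyYAML is installed and that the configuration file lives at ~/.eva/eva.yml.

from pathlib import Path

import torch
import yaml

# Location of the EVA configuration file (assumed default).
config_path = Path.home() / ".eva" / "eva.yml"
with open(config_path) as f:
    config = yaml.safe_load(f)

gpu_map = config.get("executor", {}).get("gpus", {}) or {}
available = torch.cuda.device_count()
for host, indices in gpu_map.items():
    for index in indices:
        status = "ok" if index < available else "NOT visible to PyTorch"
        print(f"{host}: GPU {index} -> {status}")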

EVA Internals#

Path of a Query#

The following code represents the sequence of operations used to execute a query in an EVAQL database; it can be found in eva/server/command_handler.py.

Parse the query using the Parser() function provided by the evaql library. The result of this step will be a parsed representation of the query in the form of an abstract syntax tree (AST).

stmt = Parser().parse(query)[0]

Bind the parsed AST to a statement context using the StatementBinder() function. This step resolves references to schema objects and performs other semantic checks on the query.

StatementBinder(StatementBinderContext()).bind(stmt)

Convert the bound AST to a logical plan using the StatementToPlanConvertor() function. This step generates a logical plan that specifies the sequence of operations needed to execute the query.

l_plan = StatementToPlanConvertor().visit(stmt)

Generate a physical plan from the logical plan using the plan_generator.build() function. This step optimizes the logical plan and generates a physical plan that specifies how the query will be executed.

p_plan = plan_generator.build(l_plan)

Execute the physical plan using the PlanExecutor() function. This step retrieves the data from the database and produces the final output of the query.

output = PlanExecutor(p_plan).execute_plan()

Overall, this sequence of operations represents the path of query execution in an EVAQL database, from parsing the query to producing the final output.

Topics#

Catalog#

Catalog Manager#

Explanation for developers on how to use the eva catalog_manager.

The CatalogManager class provides a set of services to interact with a database that stores metadata about tables, columns, and user-defined functions (UDFs) – for example, the data type of a column, the type of a table, and its name. It contains functions to get, insert, and delete catalog entries for tables, UDFs, UDF IOs, columns, and indexes.

This data is stored in the eva_catalog.db file which can be found in ~/.eva/<version>/ folder.

The catalog manager currently provides five services:

TableCatalogService()
ColumnCatalogService()
UdfCatalogService()
UdfIOCatalogService()
IndexCatalogService()

Catalog Services#

Each of these service classes provides functionality related to its part of the catalog, including inserting, getting, deleting, and renaming entries, as well as retrieving all entries. For example, the TableCatalogService contains the code to get, insert, and delete table entries.

Catalog Models#

These contain the data model that is used by the catalog services. Each model represents a table in the underlying database.
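
As an illustration of this pattern, the sketch below shows what a catalog data model looks like conceptually: one ORM class per table in the underlying metadata database. SQLAlchemy is assumed here because the catalog is backed by a relational store; the class and column names are hypothetical, not the actual EVA models.

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TableCatalogEntry(Base):
    """Hypothetical model describing one registered table."""

    __tablename__ = "table_catalog"

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    file_url = Column(String, nullable=False)  # location of the data on disk
    table_type = Column(String)                # e.g. video table vs. structured table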

Catalog Diagram

Contributing#

We welcome all kinds of contributions to EVA.

Setting up the Development Environment#

First, you will need to check out the repository from GitHub and build EVA from source. Follow these instructions to build EVA locally. We recommend using a virtual environment and the pip package manager.

git clone https://github.com/georgia-tech-db/eva.git && cd eva
python3 -m venv test_eva_db       # create a virtual environment
source test_eva_db/bin/activate   # activate the virtual environment
pip install --upgrade pip         # upgrade pip
pip install -e ".[dev]"           # build and install the EVA package
bash script/test/test.sh          # run the EVA test suite

After installing the package locally, you can make changes and run the test cases to check their impact.

pip install .         # reinstall EVA package to include local changes
pkill -9 eva_server   # kill running EVA server (if any)
eva_server&           # launch EVA server with newly installed package

Testing#

Check if your local changes broke any unit or integration tests by running the following script:

bash script/test/test.sh

If you want to run a specific test file, use the following command.

python -m pytest test/integration_tests/test_select_executor.py

Use the following command to run a specific test case within a specific test file.

python -m pytest test/integration_tests/test_select_executor.py -k 'test_should_load_and_select_in_table'

Submitting a Contribution#

Follow these steps to contribute to EVA:

  • Merge the most recent changes from the master branch

git remote add origin git@github.com:georgia-tech-db/eva.git
git pull . origin/master

  • Run the test script to ensure that all the test cases pass.

  • If you are adding a new EVAQL command, add an illustrative example usage in the documentation.

  • Run the following command to ensure that code is properly formatted.

python script/formatting/formatter.py

Code Style#

We use the black code style for formatting the Python code. For docstrings and documentation, we use Google Pydoc format.

def function_with_types_in_docstring(param1, param2) -> bool:
    """Example function with types documented in the docstring.

    Additional explanatory text can be added in paragraphs.

    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.

    Returns:
        bool: The return value. True for success, False otherwise.
    """
    return True

Debugging#

We recommend using Visual Studio Code with a debugger for developing EVA. The Debugging section below walks through setting up the debugger step by step.

Architecture Diagram#

_images/eva-arch.png

Troubleshooting#

If the test suite fails with a PermissionDenied exception, update the path_prefix attribute under the storage section in the EVA configuration file (~/.eva/eva.yml) to a directory where you have write privileges.

Debugging#

We recommend Visual Studio Code with a debugger for debugging EVA. This tutorial presents a detailed step-by-step process of using the debugger.

Setup debugger#

  1. Install the Python extension in Visual Studio Code.

  2. Install the Python Test Explorer extension.

  3. Follow these instructions to run a particular test case from the file: Getting started.

_images/eva-debug-1.jpg _images/eva-debug-2.jpg

Alternative: Manually Setup Debugger for EVA#

When you press the debug icon, you will be given an option to create a launch.json file.

While creating the JSON file, you will be prompted to select the environment to be used. Select the Python environment from the command palette at the top. If the Python environment does not appear in the drop-down menu, try installing the Python extension and repeat the process.

Once you select the Python environment, a launch.json file will be created with the default configurations set to debug a simple .py file.

More configurations can be added to the file to modify environment variables or to debug an entire folder or workspace directory. Use the following configuration in the JSON file:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: test_pytorch.py",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/test/integration_tests/test_pytorch.py",
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}",
            "env": {"PYTHONPATH": "${workspaceRoot}"}
        }
    ]
}

You can modify the fields of the above JSON file as follows:

name: The reader-friendly name that appears in the Debug launch configuration dropdown.

type: The type of debugger to use for this launch configuration.

program: The executable or file to run when launching the debugger. In the above example, test_pytorch.py will be executed by the debugger.

env: Specifies the environment variables. In the above example, PYTHONPATH is set to the workspace root.

Using these configuration variables, you can run the debugger both locally as well as on a remote server.

Extending EVA#

This document details the steps involved in adding support for a new operator (or command) in EVA. We illustrate the process using a DDL command.

Command Handler#

An input query string is handled by the Parser, StatementToPlanConvertor, PlanGenerator, and PlanExecutor. We discuss each part separately.

def execute_query(query) -> Iterator[Batch]:
    """
    Execute the query and return a result generator.
    """
    #1. parser
    stmt = Parser().parse(query)[0]
    #2. statement to logical plan
    l_plan = StatementToPlanConvertor().visit(stmt)
    #3. logical to physical plan
    p_plan = PlanGenerator().build(l_plan)
    #4. execute physical plan
    return PlanExecutor(p_plan).execute_plan()

1. Parser#

The parser first generates a syntax tree from the input string, and then transforms the syntax tree into a statement.

The first part of the parser is built from a LARK grammar file.

parser/eva#

  • eva.lark - add new keywords (e.g., CREATE, TABLE) under Common Keywords

    • Add a new grammar rule (e.g., create_table)

    • Write the new grammar rule, for example:

    create_table: CREATE TABLE if_not_exists? table_name create_definitions
    

The second part of the parser is implemented as a parser visitor.

parser/parser_visitor#

  • _[cmd]_statement.py - e.g., class CreateTable(evaql_parserVisitor)

    • Write functions to transform each piece of input data from the syntax tree into the desired type (e.g., transform column information into a list of ColumnDefinition objects).

    • Write a function to construct [cmd]Statement and return it.

  • __init__.py - import _[cmd]_statement.py and add its class to ParserVisitor’s parent classes.

from src.parser.parser_visitor._create_statement import CreateTable
class ParserVisitor(CommonClauses, CreateTable, Expressions,
                    Functions, Insert, Select, TableSources,
                    Load, Upload):

parser/#

  • [cmd]_statement.py - class [cmd]Statement, whose constructor is called in _[cmd]_statement.py (see the sketch after this list)

  • types.py - register new StatementType
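
As an illustration, a simplified, self-contained sketch of this pattern is shown below. The class and field names are hypothetical stand-ins; the real statement classes live under parser/ and derive from EVA's own AbstractStatement, and the statement type is registered in parser/types.py.

from enum import Enum, auto
from typing import List


class StatementType(Enum):
    # Stand-in for the entry registered in parser/types.py.
    CREATE = auto()


class CreateTableStatement:
    """Holds the data that the parser visitor extracts from the syntax tree."""

    def __init__(self, table_name: str, if_not_exists: bool, column_list: List):
        self.stmt_type = StatementType.CREATE
        self.table_name = table_name
        self.if_not_exists = if_not_exists
        self.column_list = column_list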

2. Statement To Plan Convertor#

This part transforms the statement into the corresponding logical plan.

Optimizer#

  • operators.py

    • Define class Logical[cmd], which is the logical node for the specific type of command.

    class LogicalCreate(Operator):
        def __init__(self, video: TableRef, column_list: List[DataFrameColumn], if_not_exists: bool = False, children=None):
            super().__init__(OperatorType.LOGICALCREATE, children)
            self._video = video
            self._column_list = column_list
            self._if_not_exists = if_not_exists
            # ...
    
    • Register the new operator type in class OperatorType. Note that it must be added before LOGICALDELIMITER (see the sketch below).
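
    For instance, the registration step might look like the following sketch (illustrative: in EVA the enum lives in operators.py, and the member names other than LOGICALCREATE and LOGICALDELIMITER are placeholders):

    from enum import IntEnum, auto

    class OperatorType(IntEnum):
        LOGICALGET = auto()
        LOGICALFILTER = auto()
        LOGICALCREATE = auto()     # <-- new logical operator registered here
        LOGICALDELIMITER = auto()  # logical operator types must appear before this marker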

  • statement_to_opr_convertor.py

    • Import the required classes:

    from src.optimizer.operators import LogicalCreate
    from src.parser.rename_statement import CreateTableStatement
    
    • Implement the visit_[cmd]() function, which converts the statement into an operator:

    # You may need to convert the statement into another data type.
    # The new data type should be usable for executing the command.
    # For example, column_list -> column_metadata_list
    
    def visit_create(self, statement: AbstractStatement):
        video_ref = statement.table_ref
        if video_ref is None:
            LoggingManager().log("Missing Table Name In Create Statement",
                                 LoggingLevel.ERROR)
    
        if_not_exists = statement.if_not_exists
        column_metadata_list = create_column_metadata(statement.column_list)
    
        create_opr = LogicalCreate(
            video_ref, column_metadata_list, if_not_exists)
        self._plan = create_opr
    
    • Modify the visit() function to call the right visit_[cmd]() function:

    def visit(self, statement: AbstractStatement):
        if isinstance(statement, SelectStatement):
            self.visit_select(statement)
        #...
        elif isinstance(statement, CreateTableStatement):
            self.visit_create(statement)
        return self._plan
    

3. Plan Generator#

This part transforms the logical plan into a physical plan. The modified files are stored under the Optimizer and Planner folders.

planner/#

  • [cmd]_plan.py - class [cmd]Plan, which stores the information required to run the command.

class CreatePlan(AbstractPlan):
    def __init__(self, video_ref: TableRef,
                 column_list: List[DataFrameColumn],
                 if_not_exists: bool = False):
        super().__init__(PlanOprType.CREATE)
        self._video_ref = video_ref
        self._column_list = column_list
        self._if_not_exists = if_not_exists
    #...

  • types.py - register the new plan operator type in PlanOprType

optimizer/rules#

  • rules.py -

    • Import the operators

    • Register the new rule type in RuleType and Promise (it must be placed before IMPLEMENTATION_DELIMETER)

    • Implement class Logical[cmd]ToPhysical; its member function apply() will construct a corresponding [cmd]Plan object.

    class LogicalCreateToPhysical(Rule):
        def __init__(self):
            pattern = Pattern(OperatorType.LOGICALCREATE)
            super().__init__(RuleType.LOGICAL_CREATE_TO_PHYSICAL, pattern)

        def promise(self):
            return Promise.LOGICAL_CREATE_TO_PHYSICAL

        def check(self, before: Operator, context: OptimizerContext):
            return True

        def apply(self, before: LogicalCreate, context: OptimizerContext):
            after = CreatePlan(before.video, before.column_list, before.if_not_exists)
            return after
    

4. Plan Executor#

PlanExecutor uses data stored in physical plan to run the command.

executor/#

  • [cmd]_executor.py - implement an executor that makes changes in the catalog, metadata, or storage engine to run the command.

    • You may need to create helper functions in CatalogManager, DatasetService, DataFrameMetadata, etc.

    class CreateExecutor(AbstractExecutor):
        def exec(self):
            if (self.node.if_not_exists):
                # check catalog if we already have this table
                return
    
            table_name = self.node.video_ref.table_info.table_name
            file_url = str(generate_file_path(table_name))
            metadata = CatalogManager().create_metadata(table_name, file_url, self.node.column_list)
    
            StorageEngine.create(table=metadata)
    

Additional Notes#

Key data structures in EVA:

  • Catalog: Records DataFrameMetadata for all tables.

    • data stored in DataFrameMetadata: name, file_url, identifier_id, schema

      • file_url - used to access the real table in the storage engine.

    • For the RENAME TABLE command, we use the old table name to access the corresponding entry in the metadata table and update it with the modified name of the table.

  • Storage Engine:

    • The API is defined in src/storage; it currently supports only create, read, and write (see the sketch below).
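
The sketch below illustrates that create/read/write surface. Method names and signatures are hypothetical; consult src/storage for the actual interface.

from abc import ABC, abstractmethod
from typing import Any, Iterator


class AbstractStorageEngine(ABC):
    """Hypothetical sketch of the create/read/write surface described above."""

    @abstractmethod
    def create(self, table: Any) -> None:
        """Allocate storage (e.g. a file at table.file_url) for a new table."""

    @abstractmethod
    def write(self, table: Any, rows: Any) -> None:
        """Append a batch of rows to the table."""

    @abstractmethod
    def read(self, table: Any) -> Iterator[Any]:
        """Iterate over the batches stored for the table."""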

EVA Release Guide#

Part 1: Before You Start#

Make sure you have a PyPI account with maintainer access to the EVA project. Create a .pypirc file in your home directory. It should look like this:

[distutils]
index-servers =
pypi
pypitest

[pypi]
username=YOUR_USERNAME
password=YOUR_PASSWORD

Then run chmod 600 ~/.pypirc so that only you can read/write the file.

Part 2: Release Steps#

  1. Ensure that you’re in the top-level eva directory.

  2. Ensure that your branch is in sync with the master branch:

$ git pull origin master

  3. Add a new entry in the Changelog for the release:

##  [0.0.6]
### [Breaking Changes]
### [Added]
### [Changed]
### [Deprecated]
### [Removed]

Make sure CHANGELOG.md is up to date for the release: compare against PRs merged since the last release.

  4. Update version to, e.g., 0.0.6 (remove the +dev label) in eva/version.py.

  5. Commit these changes and create a PR:

git checkout -b release-v0.0.6
git add . -u
git commit -m "[RELEASE]: v0.0.6"
git push --set-upstream origin release-v0.0.6

  6. Once the PR is approved, merge it and pull master locally.

  7. Tag the release:

git tag -a v0.0.6 -m "v0.0.6 release"
git push origin v0.0.6

  8. Build the source and wheel distributions:

rm -rf dist build  # clean old builds & distributions
python3 setup.py sdist  # create a source distribution
python3 setup.py bdist_wheel  # create a universal wheel

  9. Check that everything looks correct by installing the wheel locally and checking the version:

python3 -m venv test_evadb  # create a virtualenv for testing
source test_evadb/bin/activate  # activate virtualenv
python3 -m pip install dist/evadb-0.9.1-py3-none-any.whl
python3 -c "import eva; print(eva.__version__)"

  10. Publish to PyPI:

pip install twine  # if not installed
twine upload dist/* -r pypi

  11. A PR is automatically submitted (this will take a few hours) on conda-forge/eva-feedstock (https://github.com/conda-forge/eva-feedstock) to update the version.
    • A maintainer needs to accept and merge those changes.

  12. Create a new release on GitHub.
    • Input the recently-created tag version: v0.0.6
    • Copy the release notes in CHANGELOG.md to the GitHub tag.
    • Attach the resulting binaries (dist/evadb-x.x.x.*) to the release.
    • Publish the release.

  13. Update version to, e.g., 0.9.1+dev in eva/version.py.

  14. Add a new changelog entry for the unreleased version in CHANGELOG.md:

##  [Unreleased]
### [Breaking Changes]
### [Added]
### [Changed]
### [Deprecated]
### [Removed]

  15. Commit these changes and create a PR:

git checkout -b bump-v0.9.1+dev
git add . -u
git commit -m "[BUMP]: v0.9.1+dev"
git push --set-upstream origin bump-v0.9.1+dev

  16. Add the new tag to the EVA project on ReadTheDocs:

    • Trigger a build for main to pull new tags.

    • Go to the Versions tab, and Activate the new tag.

    • Go to Admin/Advanced to set this tag as the new default version.

    • In Overview, make sure a build is triggered:
      • For the tag v0.9.1

      • For latest

Credits: Snorkel

Packaging#

This section describes practices to follow when packaging your own models or datasets to be used along with EVA.

Models#

Please follow these steps to package models:

  • Create a folder with a descriptive name. This folder name will be used by the UDF that is invoking your model.

  • Place all files used by the UDF inside this folder. These are typically:
    • Model weights (The .pt files that contain the actual weights)

    • Model architectures (The .pt files that contain model architecture information)

    • Label files (Extra files that are used in the process of model inference for outputting labels.)

    • Other config files (Any other config files required for model inference)

  • Zip this folder.

  • Upload the zipped folder to this link inside the models folder.

Datasets#

Please follow these steps to package datasets:

  • Create a folder for your dataset and give it a descriptive name.

  • This dataset folder should contain 2 sub-folders named ‘info’ and ‘videos’. For each video entry in the videos folder, there should be a corresponding CSV file in the info folder with the same name. The structure should look like:

    _images/packaging_folderstructure.png
  • The videos folder should contain the raw videos in a standard format like mp4 or mov.

  • The info folder should contain the meta information corresponding to each video in CSV format. Each row of this CSV file should correspond to one unique object in a given frame. Please make sure the columns in your CSV file exactly match these names (a sketch for generating such a file is shown after this list). Here is a snapshot of a sample CSV file:

    _images/packaging_metafile.png
    The columns represent the following:
    • id - (Integer) Auto incrementing index that is unique across all files (Since the CSV files are written to the same meta table, we want it to be unique across all files)

    • frame_id - (Integer) id of the frame this row corresponds to.

    • video_id - (Integer) id of the video this file corresponds to.

    • dataset_name - (String) Name of the dataset (should match the folder name)

    • label - (String) label of the object this row corresponds to.

    • bbox - (String) comma separated float values representing x1, y1, x2, y2 (top left and bottom right) coordinates of the bounding box

    • object_id - (Integer) unique id for the object corresponding to this row.

  • Zip this folder.

  • Upload the zipped folder to this link inside the datasets folder.
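
To double-check the expected metadata layout, the following sketch (pandas assumed, all values and paths made up) writes an info CSV with exactly the columns listed above:

import pandas as pd

row = {
    "id": 1,                       # unique across all info files
    "frame_id": 10,                # frame this annotation belongs to
    "video_id": 1,                 # video this file corresponds to
    "dataset_name": "my_dataset",  # must match the dataset folder name
    "label": "car",
    "bbox": "100.0, 120.0, 240.0, 260.0",  # x1, y1, x2, y2
    "object_id": 7,
}

# Hypothetical path: <dataset folder>/info/<video name>.csv
pd.DataFrame([row]).to_csv("my_dataset/info/video1.csv", index=False)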

Note: In the future, we will provide utility scripts along with EVA to download models and datasets easily and place them in the appropriate locations.