Structured Data Source Integration

This document describes the steps for adding a new structured data source integration to EvaDB.

Example Data Source Integration In EvaDB

Create Data Source Handler

1. Create a new directory at evadb/third_party/databases/

Note

The directory name is also the engine name used in CREATE DATABASE mydb_source WITH ENGINE = "…";. In this document, we use mydb as the example data source to integrate into EvaDB.

The directory should contain three files:

  • __init__.py

  • requirements.txt

  • mydb_handler.py

The __init__.py file can contain copyright information. The requirements.txt file lists the extra Python libraries that must be installed via pip for the mydb data source.
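The layout above can be scaffolded with a few lines of Python. This is only a convenience sketch: scaffold_handler_dir is a hypothetical helper, not part of EvaDB, and the demo writes to a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path


def scaffold_handler_dir(databases_root: Path, engine: str) -> Path:
    """Create <databases_root>/<engine>/ containing the three expected files.

    Hypothetical helper for illustration; EvaDB does not ship this function.
    """
    handler_dir = databases_root / engine
    handler_dir.mkdir(parents=True, exist_ok=True)
    for filename in ("__init__.py", "requirements.txt", f"{engine}_handler.py"):
        # touch() creates an empty file and leaves an existing one untouched.
        (handler_dir / filename).touch(exist_ok=True)
    return handler_dir


# Demo in a throwaway directory; in a real checkout the root would be
# Path("evadb/third_party/databases").
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_handler_dir(Path(tmp), "mydb")
    created_files = sorted(p.name for p in created.iterdir())
```

The files start out empty; requirements.txt and mydb_handler.py still need to be filled in by hand.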

Note

EvaDB installs a data source’s dependency libraries only when the user creates a connection to that data source, e.g., via CREATE DATABASE mydb_source WITH ENGINE = "mydb";.

2. Implement the data source handler

In mydb_handler.py, implement the DBHandler interface declared in evadb/third_party/databases/types.py. There are 7 methods to implement:

from evadb.third_party.databases.types import (
    DBHandler,
    DBHandlerResponse,
    DBHandlerStatus,
)


class MydbHandler(DBHandler):
    def __init__(self, name: str, **kwargs):
        ...

    def connect(self):
        ...

    def disconnect(self):
        ...

    def check_connection(self) -> DBHandlerStatus:
        ...

    def get_tables(self) -> DBHandlerResponse:
        ...

    def get_columns(self, table_name: str) -> DBHandlerResponse:
        ...

    def execute_native_query(self, query_string: str) -> DBHandlerResponse:
        ...

get_tables retrieves the list of tables from the data source. get_columns retrieves the columns of a specified table. execute_native_query runs a query string through the data source’s engine. For more details, see the method signatures and documentation in evadb/third_party/databases/types.py.
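To make the seven methods concrete, here is a minimal sketch that backs them with Python's built-in sqlite3 module. Several things in it are assumptions made so that the example runs without EvaDB installed: DBHandler, DBHandlerStatus, and DBHandlerResponse are simplified stand-ins for the real types in evadb/third_party/databases/types.py (whose fields may differ), the "database" parameter name is invented for this example, and results are returned as plain lists of tuples.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins; the real classes live in
# evadb/third_party/databases/types.py and may have different fields.
@dataclass
class DBHandlerStatus:
    status: bool
    error: Optional[str] = None


@dataclass
class DBHandlerResponse:
    data: Any
    error: Optional[str] = None


class DBHandler:
    def __init__(self, name: str, **kwargs):
        self.name = name


class MydbHandler(DBHandler):
    def __init__(self, name: str, **kwargs):
        super().__init__(name)
        # "database" is an assumed parameter name for this sketch.
        self.database = kwargs.get("database", ":memory:")
        self.connection = None

    def connect(self):
        self.connection = sqlite3.connect(self.database)

    def disconnect(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def check_connection(self) -> DBHandlerStatus:
        if self.connection is None:
            return DBHandlerStatus(status=False, error="Not connected.")
        return DBHandlerStatus(status=True)

    def get_tables(self) -> DBHandlerResponse:
        # sqlite_master is SQLite's catalog of schema objects.
        return self.execute_native_query(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )

    def get_columns(self, table_name: str) -> DBHandlerResponse:
        # PRAGMA statements cannot take bound parameters, so the identifier
        # is double-quoted with embedded quotes escaped.
        quoted = table_name.replace('"', '""')
        response = self.execute_native_query(f'PRAGMA table_info("{quoted}")')
        if response.error is not None:
            return response
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        return DBHandlerResponse(data=[(row[1], row[2]) for row in response.data])

    def execute_native_query(self, query_string: str) -> DBHandlerResponse:
        if self.connection is None:
            return DBHandlerResponse(data=None, error="Not connected.")
        try:
            cursor = self.connection.execute(query_string)
            rows = cursor.fetchall()
            self.connection.commit()
            return DBHandlerResponse(data=rows)
        except sqlite3.Error as e:
            # Report failures in the response instead of raising.
            return DBHandlerResponse(data=None, error=str(e))
```

A short usage example under the same assumptions: after handler = MydbHandler("mydb") and handler.connect(), calling handler.execute_native_query("CREATE TABLE t (id INTEGER, name TEXT)") followed by handler.get_tables() returns a response whose data lists the table t. Returning errors in the response object rather than raising is a design choice of this sketch; check types.py for the contract EvaDB actually expects.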

The data source’s configuration parameters are passed to __init__(self, name: str, **kwargs) as keyword arguments. Below is an example:

def __init__(self, name: str, **kwargs):
    super().__init__(name)
    self.host = kwargs.get("host")
    self.port = kwargs.get("port")
    self.user = kwargs.get("user")
    self.password = kwargs.get("password")

Note

These parameters are specified when the user creates a connection to the data source: CREATE DATABASE mydb_source WITH ENGINE = "mydb", PARAMETERS = {"host": "localhost", "port": "5432", "user": "eva", "password": "password"};.
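The sketch below illustrates how such a PARAMETERS map could reach the constructor, assuming it arrives as keyword arguments with its values unchanged. MydbConfig is a hypothetical stand-in for the handler's __init__, and converting the port to an integer is a choice made for this example, not something EvaDB requires.

```python
from typing import Optional


class MydbConfig:
    """Hypothetical stand-in that mirrors the handler's __init__ pattern."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        self.host: Optional[str] = kwargs.get("host")
        # The example statement passes the port as the string "5432", so
        # normalize it to an int when present.
        port = kwargs.get("port")
        self.port: Optional[int] = int(port) if port is not None else None
        self.user: Optional[str] = kwargs.get("user")
        self.password: Optional[str] = kwargs.get("password")


# The PARAMETERS map from the CREATE DATABASE statement, written as a dict.
parameters = {
    "host": "localhost",
    "port": "5432",
    "user": "eva",
    "password": "password",
}
config = MydbConfig("mydb", **parameters)
```

Using kwargs.get means a missing key yields None instead of raising, so a handler may want to validate required parameters before connecting.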

For a working reference, see the PostgreSQL handler at evadb/third_party/databases/postgres/postgres_handler.py.

Register the Data Source Handler

Add your data source handler to the get_database_handler function in evadb/third_party/databases/interface.py. Below is an example of registering the mydb data source:

...
elif engine == "mydb":
    return mod.MydbHandler(engine, **kwargs)
...
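The dispatch pattern behind this branch can be sketched in isolation. The function below is a simplified stand-in written for this document, not the actual code in interface.py: it omits the module lookup that produces mod, uses placeholder handler classes, and the error raised for an unknown engine is an assumption.

```python
class PostgresHandler:
    """Placeholder handler used only in this sketch."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        self.kwargs = kwargs


class MydbHandler:
    """Placeholder for the handler defined in mydb_handler.py."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        self.kwargs = kwargs


def get_database_handler(engine: str, **kwargs):
    # Simplified stand-in for the function in interface.py: one branch per
    # engine name, each forwarding the connection parameters unchanged.
    if engine == "postgres":
        return PostgresHandler(engine, **kwargs)
    elif engine == "mydb":
        return MydbHandler(engine, **kwargs)
    else:
        # Assumed behavior for unknown engines in this sketch.
        raise NotImplementedError(f"Engine {engine} is not supported")
```

Because the engine string is matched literally, it must be identical to the directory name created in step 1 and to the value users pass in WITH ENGINE.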

Add the Data Source in Documentation

Add your new data source to the Data Sources section of the documentation for reference.