Extending EvaDB#
This document details the steps involved in adding support for a new operator (or command) in EvaDB. We illustrate the process using a DDL command.
Command Handler#
An input query string is handled by Parser, StatementToPlanConverter, PlanGenerator, and PlanExecutor, in that order. We discuss each part separately.
def execute_query(query) -> Iterator[Batch]:
    """
    Execute the query and return a result generator.
    """
    # 1. parser
    stmt = Parser().parse(query)[0]
    # 2. statement to logical plan
    l_plan = StatementToPlanConverter().visit(stmt)
    # 3. logical to physical plan
    p_plan = PlanGenerator().build(l_plan)
    # 4. executor
    return PlanExecutor(p_plan).execute_plan()
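The four stages compose as a simple pipeline. The following is a minimal, self-contained sketch of that call sequence. Every `Toy*` class and its behaviour is hypothetical and exists only to make the flow runnable; none of it reflects EvaDB's real implementations.

```python
from typing import Iterator, List


class ToyParser:
    """Hypothetical stand-in: splits the input on ';' into statement strings."""

    def parse(self, query: str) -> List[str]:
        return [s.strip() for s in query.split(";") if s.strip()]


class ToyStatementToPlanConverter:
    """Hypothetical stand-in: wraps a statement in a 'logical plan' dict."""

    def visit(self, stmt: str) -> dict:
        return {"kind": "logical", "stmt": stmt}


class ToyPlanGenerator:
    """Hypothetical stand-in: turns a logical plan into a 'physical plan'."""

    def build(self, l_plan: dict) -> dict:
        return {"kind": "physical", "stmt": l_plan["stmt"]}


class ToyPlanExecutor:
    """Hypothetical stand-in: yields one result row for the plan."""

    def __init__(self, p_plan: dict):
        self._plan = p_plan

    def execute_plan(self) -> Iterator[str]:
        yield f"executed: {self._plan['stmt']}"


def execute_query(query: str) -> Iterator[str]:
    stmt = ToyParser().parse(query)[0]                  # 1. parser
    l_plan = ToyStatementToPlanConverter().visit(stmt)  # 2. statement -> logical plan
    p_plan = ToyPlanGenerator().build(l_plan)           # 3. logical -> physical plan
    return ToyPlanExecutor(p_plan).execute_plan()       # 4. executor


results = list(execute_query("CREATE TABLE t (id INTEGER);"))
```

Because the last stage returns a generator, nothing is executed until the caller iterates over the result.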
1. Parser#
The parser first generates a syntax tree from the input string, and then transforms the syntax tree into a statement.
The first part of the parser is built from a Lark grammar file.
parser/evadb#
evadb.lark
- Add keywords (e.g. CREATE, TABLE) under Common Keywords.
- Add a new grammar rule (e.g. create_table).
Write a new grammar, for example:
create_table: CREATE TABLE if_not_exists? table_name create_definitions
The second part of the parser is implemented as a parser visitor.
parser/lark_visitor#
_[cmd]_statement.py
- e.g. class CreateTable(evaql_parserVisitor)
- Write functions to transform each input from the syntax tree into the desired type (e.g. transform column information into a list of ColumnDefinition).
- Write a function to construct [cmd]Statement and return it.
__init__.py
- Import _[cmd]_statement.py and add its class to the parent classes of ParserVisitor.
from src.parser.parser_visitor._create_statement import CreateTable
class ParserVisitor(CommonClauses, CreateTable, Expressions,
Functions, Insert, Select, TableSources,
Load, Upload):
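The mixin arrangement above can be sketched as follows: each command contributes one class with its visitor method, and ParserVisitor inherits from all of them. This is a simplified illustration that assumes the syntax tree is a plain dict. The dict layout, the `visit` dispatch, and the simplified `ColumnDefinition` and `CreateTableStatement` dataclasses are hypothetical and do not reproduce EvaDB's or Lark's actual classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnDefinition:          # hypothetical, simplified
    name: str
    type: str


@dataclass
class CreateTableStatement:      # hypothetical, simplified
    table_name: str
    if_not_exists: bool
    column_list: List[ColumnDefinition]


class CreateTable:
    """Mixin holding the visitor method for the create_table rule."""

    def create_table(self, tree: dict) -> CreateTableStatement:
        # Transform raw (name, type) pairs into typed column definitions.
        columns = [ColumnDefinition(n, t) for n, t in tree["create_definitions"]]
        return CreateTableStatement(
            table_name=tree["table_name"],
            if_not_exists=tree.get("if_not_exists", False),
            column_list=columns,
        )


class Select:
    """Placeholder mixin standing in for another command."""

    def select(self, tree: dict) -> str:
        return "select-statement"


class ParserVisitor(CreateTable, Select):
    """Inherits one method per grammar rule from the mixins."""

    def visit(self, tree: dict):
        # Dispatch on the rule name, as a tree-walking visitor would.
        return getattr(self, tree["rule"])(tree)


tree = {
    "rule": "create_table",
    "table_name": "MyVideo",
    "if_not_exists": True,
    "create_definitions": [("id", "INTEGER"), ("data", "NDARRAY")],
}
stmt = ParserVisitor().visit(tree)
```

Adding a new command then only requires a new mixin class and one extra entry in the parent class list.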
parser/#
[cmd]_statement.py
- class [cmd]Statement. Its constructor is called in _[cmd]_statement.py.
types.py
- register new StatementType
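As a rough sketch of these two steps, a statement class typically passes its own type tag to a shared base class. The enum member names and the constructor signature below are assumptions made for illustration and are not taken from the EvaDB source.

```python
from enum import Enum, auto, unique
from typing import List


@unique
class StatementType(Enum):       # member names are hypothetical
    SELECT = auto()
    INSERT = auto()
    CREATE = auto()              # the newly registered statement type


class AbstractStatement:
    def __init__(self, stmt_type: StatementType):
        self._stmt_type = stmt_type

    @property
    def stmt_type(self) -> StatementType:
        return self._stmt_type


class CreateTableStatement(AbstractStatement):
    def __init__(self, table_name: str, if_not_exists: bool,
                 column_list: List[str]):
        # Tag the statement so later stages can identify it.
        super().__init__(StatementType.CREATE)
        self.table_name = table_name
        self.if_not_exists = if_not_exists
        self.column_list = column_list


stmt = CreateTableStatement("MyVideo", False, ["id", "data"])
```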
2. Statement To Plan Converter#
This part transforms the statement into the corresponding logical plan.
Optimizer#
operators.py
Define class Logical[cmd], which is the logical node for the specific type of command.
class LogicalCreate(Operator):
    def __init__(self, video: TableRef, column_list: List[DataFrameColumn],
                 if_not_exists: bool = False, children=None):
        super().__init__(OperatorType.LOGICALCREATE, children)
        self._video = video
        self._column_list = column_list
        self._if_not_exists = if_not_exists
        # ...
Register the new operator type in class OperatorType. Note that it must be added before LOGICALDELIMITER!
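The text does not say why the ordering matters. One plausible reading, offered here only as an assumption, is that LOGICALDELIMITER acts as a sentinel so that a simple comparison can tell logical operator types apart from anything declared after it. The sketch below illustrates that idea; the member list and the `is_logical` helper are hypothetical.

```python
from enum import IntEnum, auto


class OperatorType(IntEnum):
    LOGICALGET = auto()
    LOGICALFILTER = auto()
    LOGICALCREATE = auto()     # new operator type goes before the delimiter
    LOGICALDELIMITER = auto()  # sentinel: stays last among the logical types


def is_logical(opr_type: OperatorType) -> bool:
    # Assumed check: everything numbered below the sentinel is logical.
    return opr_type < OperatorType.LOGICALDELIMITER
```

Under this assumption, a type appended after the delimiter would silently fail the check, which would explain the emphasis in the instruction.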
statement_to_opr_convertor.py
Import the required resources:
from src.optimizer.operators import LogicalCreate
from src.parser.create_statement import CreateTableStatement
implement visit_[cmd]() function, which converts statement to operator
# May need to convert the statement into another data type.
# The new data type is usable for executing command.
# For example, column_list -> column_metadata_list
def visit_create(self, statement: AbstractStatement):
    video_ref = statement.table_ref
    if video_ref is None:
        LoggingManager().log("Missing Table Name In Create Statement",
                             LoggingLevel.ERROR)
    if_not_exists = statement.if_not_exists
    column_metadata_list = create_column_metadata(statement.column_list)
    create_opr = LogicalCreate(
        video_ref, column_metadata_list, if_not_exists)
    self._plan = create_opr
modify visit function to call the right visit_[cmd] function
def visit(self, statement: AbstractStatement):
    if isinstance(statement, SelectStatement):
        self.visit_select(statement)
    #...
    elif isinstance(statement, CreateTableStatement):
        self.visit_create(statement)
    return self._plan
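To show the conversion step end to end, here is a self-contained sketch of the same isinstance-based dispatch with stub classes. The stubs are simplified assumptions: table references and columns are plain strings, and the sketch raises an exception for a missing table name or an unsupported statement instead of logging, which is a deliberate departure from the snippet above.

```python
from typing import List, Optional


class AbstractStatement:
    pass


class SelectStatement(AbstractStatement):
    pass


class CreateTableStatement(AbstractStatement):
    def __init__(self, table_ref: Optional[str], column_list: List[str],
                 if_not_exists: bool = False):
        self.table_ref = table_ref
        self.column_list = column_list
        self.if_not_exists = if_not_exists


class LogicalCreate:
    """Simplified logical operator holding what later stages need."""

    def __init__(self, video: str, column_list: List[str], if_not_exists: bool):
        self.video = video
        self.column_list = column_list
        self.if_not_exists = if_not_exists


class StatementToPlanConverter:
    def __init__(self):
        self._plan = None

    def visit_select(self, statement: SelectStatement) -> None:
        self._plan = "logical-select-placeholder"

    def visit_create(self, statement: CreateTableStatement) -> None:
        if statement.table_ref is None:
            raise ValueError("Missing table name in create statement")
        self._plan = LogicalCreate(statement.table_ref,
                                   list(statement.column_list),
                                   statement.if_not_exists)

    def visit(self, statement: AbstractStatement):
        # Route each statement class to its visit_[cmd] method.
        if isinstance(statement, SelectStatement):
            self.visit_select(statement)
        elif isinstance(statement, CreateTableStatement):
            self.visit_create(statement)
        else:
            raise NotImplementedError(type(statement).__name__)
        return self._plan


plan = StatementToPlanConverter().visit(
    CreateTableStatement("MyVideo", ["id", "data"], if_not_exists=True))
```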
3. Plan Generator#
This part transforms the logical plan into a physical plan. The modified files are stored under the Optimizer and Planner folders.
plan_nodes/#
[cmd]_plan.py
- class [cmd]Plan, which stores the information required to execute the command (here, creating a table).
class CreatePlan(AbstractPlan):
    def __init__(self, video_ref: TableRef,
                 column_list: List[DataFrameColumn],
                 if_not_exists: bool = False):
        super().__init__(PlanOprType.CREATE)
        self._video_ref = video_ref
        self._column_list = column_list
        self._if_not_exists = if_not_exists
        #...
types.py
- register new plan operator type to PlanOprType
optimizer/rules#
rules.py
- Import operators.
- Register the new rule type in RuleType and Promise (place it before IMPLEMENTATION_DELIMITER!).
- Implement class Logical[cmd]ToPhysical. Its member function apply() constructs the corresponding [cmd]Plan object.
class LogicalCreateToPhysical(Rule):
    def __init__(self):
        pattern = Pattern(OperatorType.LOGICALCREATE)
        super().__init__(RuleType.LOGICAL_CREATE_TO_PHYSICAL, pattern)

    def promise(self):
        return Promise.LOGICAL_CREATE_TO_PHYSICAL

    def check(self, before: Operator, context: OptimizerContext):
        return True

    def apply(self, before: LogicalCreate, context: OptimizerContext):
        after = CreatePlan(before.video, before.column_list,
                           before.if_not_exists)
        return after
rules_base.py
- Register the new rule type in RuleType and Promise (place it before IMPLEMENTATION_DELIMITER!).
rules_manager.py
- Import the rules created in rules.py.
- Add the imported logical-to-physical rules to self._implementation_rules.
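The way a registered implementation rule gets picked up can be sketched as follows. This is a heavily simplified model: patterns are plain strings, there is no promise ordering or optimizer context, and the `implement` helper is hypothetical. It only illustrates that a rule must appear in the implementation rule list for its operator to be converted.

```python
from typing import List, Optional


class LogicalCreate:
    opr_type = "LOGICALCREATE"

    def __init__(self, video: str, column_list: List[str], if_not_exists: bool):
        self.video = video
        self.column_list = column_list
        self.if_not_exists = if_not_exists


class CreatePlan:
    def __init__(self, video_ref: str, column_list: List[str],
                 if_not_exists: bool):
        self.video_ref = video_ref
        self.column_list = column_list
        self.if_not_exists = if_not_exists


class Rule:
    def __init__(self, pattern: str):
        self.pattern = pattern

    def check(self, before) -> bool:
        return True

    def apply(self, before):
        raise NotImplementedError


class LogicalCreateToPhysical(Rule):
    def __init__(self):
        super().__init__(LogicalCreate.opr_type)

    def apply(self, before: LogicalCreate) -> CreatePlan:
        return CreatePlan(before.video, before.column_list,
                          before.if_not_exists)


class RulesManager:
    def __init__(self, rules: Optional[List[Rule]] = None):
        # New logical-to-physical rules are appended to this list.
        self._implementation_rules: List[Rule] = (
            [LogicalCreateToPhysical()] if rules is None else rules)

    @property
    def implementation_rules(self) -> List[Rule]:
        return self._implementation_rules


def implement(opr, manager: RulesManager):
    """Return the physical plan from the first matching rule, else None."""
    for rule in manager.implementation_rules:
        if rule.pattern == opr.opr_type and rule.check(opr):
            return rule.apply(opr)
    return None


opr = LogicalCreate("MyVideo", ["id", "data"], False)
plan = implement(opr, RulesManager())
unregistered = implement(opr, RulesManager(rules=[]))
```

With an empty rule list the operator has no physical counterpart, which is the failure mode to expect when the registration step is forgotten.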
4. Plan Executor#
PlanExecutor uses the data stored in the physical plan to run the command.
executor/#
[cmd]_executor.py
- Implement an executor that makes changes in the catalog, metadata, or storage engine to run the command.
- You may need to create helper functions in CatalogManager, DatasetService, DataFrameMetadata, etc.
class CreateExecutor(AbstractExecutor):
    def exec(self):
        if (self.node.if_not_exists):
            # check catalog if we already have this table
            return
        table_name = self.node.video_ref.table_info.table_name
        file_url = str(generate_file_path(table_name))
        metadata = CatalogManager().create_metadata(table_name, file_url,
                                                    self.node.column_list)
        StorageEngine.create(table=metadata)
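The catalog check that the comment in the executor leaves open could look roughly like the sketch below. It assumes an in-memory catalog with hypothetical `exists` and `create_metadata` methods and a simplified plan node. It also assumes that creating an existing table without IF NOT EXISTS should raise an error, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CreatePlan:                      # simplified, hypothetical plan node
    table_name: str
    column_list: List[str]
    if_not_exists: bool = False


class InMemoryCatalog:                 # hypothetical stand-in for the catalog
    def __init__(self):
        self._tables: Dict[str, List[str]] = {}

    def exists(self, name: str) -> bool:
        return name in self._tables

    def create_metadata(self, name: str, column_list: List[str]) -> None:
        self._tables[name] = list(column_list)


class CreateExecutor:
    def __init__(self, node: CreatePlan, catalog: InMemoryCatalog):
        self.node = node
        self.catalog = catalog

    def exec(self) -> bool:
        """Return True if the table was created, False if it was skipped."""
        name = self.node.table_name
        if self.catalog.exists(name):
            if self.node.if_not_exists:
                return False           # table is already there: nothing to do
            raise ValueError(f"Table {name} already exists")
        self.catalog.create_metadata(name, self.node.column_list)
        return True


catalog = InMemoryCatalog()
created = CreateExecutor(CreatePlan("MyVideo", ["id"]), catalog).exec()
skipped = CreateExecutor(
    CreatePlan("MyVideo", ["id"], if_not_exists=True), catalog).exec()
```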
Additional Notes#
Key data structures in EvaDB:
Catalog: Records DataFrameMetadata for all tables.
- Data stored in DataFrameMetadata: name, file_url, identifier_id, schema.
- file_url is used to access the real table in the storage engine.
- For the RENAME table command, we use the old_table_name to access the corresponding entry in the metadata table, and update it with the modified name of the table.
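The rename lookup described above can be sketched with a dictionary-backed catalog. The field types, the `Catalog` class, and its method names are assumptions made for illustration; only the four field names come from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataFrameMetadata:
    name: str
    file_url: str
    identifier_id: str        # type assumed for illustration
    schema: List[str]         # type assumed for illustration


class Catalog:                # hypothetical in-memory catalog
    def __init__(self):
        self._tables: Dict[str, DataFrameMetadata] = {}

    def add(self, metadata: DataFrameMetadata) -> None:
        self._tables[metadata.name] = metadata

    def get(self, name: str) -> DataFrameMetadata:
        return self._tables[name]

    def rename_table(self, old_table_name: str, new_name: str) -> None:
        # Look up the entry by its old name (KeyError if it is missing),
        # then store it again under the modified name.
        metadata = self._tables.pop(old_table_name)
        metadata.name = new_name
        self._tables[new_name] = metadata


catalog = Catalog()
catalog.add(DataFrameMetadata("MyVideo", "data/MyVideo", "id", ["id", "data"]))
catalog.rename_table("MyVideo", "MyClip")
renamed = catalog.get("MyClip")
```

In this sketch the rename touches only the metadata entry; `file_url` is left unchanged, so the stored data stays where it is.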
Storage Engine: The API is defined in src/storage and currently supports only create, read, and write.
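A storage engine limited to these three operations might expose an interface along the following lines. The class names, method signatures, and the in-memory backend are all hypothetical; the actual API lives in src/storage and may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class AbstractStorageEngine(ABC):
    """Hypothetical interface covering create, write, and read."""

    @abstractmethod
    def create(self, table: str) -> None: ...

    @abstractmethod
    def write(self, table: str, rows: List[dict]) -> None: ...

    @abstractmethod
    def read(self, table: str) -> Iterator[dict]: ...


class InMemoryStorageEngine(AbstractStorageEngine):
    def __init__(self):
        self._data: Dict[str, List[dict]] = {}

    def create(self, table: str) -> None:
        if table in self._data:
            raise ValueError(f"Table {table} already exists")
        self._data[table] = []

    def write(self, table: str, rows: List[dict]) -> None:
        # Append rows to an existing table (KeyError if it was never created).
        self._data[table].extend(rows)

    def read(self, table: str) -> Iterator[dict]:
        yield from self._data[table]


engine = InMemoryStorageEngine()
engine.create("MyVideo")
engine.write("MyVideo", [{"id": 0}, {"id": 1}])
rows = list(engine.read("MyVideo"))
```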