Basic API

To begin a querying session, connect to EvaDB and obtain a cursor using the connect and cursor functions:

connect([evadb_dir, sql_backend])

Connects to the EvaDB server and returns a connection object.

cursor()

Retrieves a cursor associated with the connection.

import evadb

cursor = evadb.connect().cursor()

After getting a cursor, you can load documents and run queries using the EvaDBCursor interface. To construct queries with a pandas-like API, use the EvaDBQuery interface.

# load the pdfs in a given folder into the "office_data_table" table
cursor.load(
    file_regex="office_data/*.pdf", format="PDF", table_name="office_data_table"
).df()

# load a given video into the "youtube_videos" table
cursor.load("movie.mp4", "youtube_videos", "video").df()

Warning

You must call df() to actually run the query and get the result DataFrame. EvaDB executes queries lazily to improve performance.

Calling cursor.query("...") will only construct and not run the query. Calling cursor.query("...").df() will both construct and run the query.
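The difference between constructing and running a query can be illustrated with a toy model. The sketch below is not EvaDB's implementation: LazyQuery and fake_executor are hypothetical names invented for illustration, and the result is a plain list rather than a DataFrame. It only shows the general pattern in which work is deferred until df() is called.

```python
class LazyQuery:
    """Toy stand-in for a lazily executed query (hypothetical, not EvaDB code)."""

    def __init__(self, sql, executor):
        self.sql = sql              # the query text is stored; nothing runs yet
        self._executor = executor   # callable that actually runs the query

    def df(self):
        # Execution happens only here, when the result is requested.
        return self._executor(self.sql)


executed = []  # records every statement that actually ran


def fake_executor(sql):
    # Hypothetical executor: logs the statement and returns placeholder rows.
    executed.append(sql)
    return [{"id": 1}]


query = LazyQuery("SELECT id FROM office_data_table;", fake_executor)
runs_after_construct = len(executed)  # constructing the query ran nothing
result = query.df()
runs_after_df = len(executed)         # df() triggered exactly one execution
```

With a real cursor, the same distinction applies to cursor.query("...") versus cursor.query("...").df(), as described above.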

EvaDBCursor Interface

Using the cursor, you can refer to a table, load documents, create functions and vector indexes, and perform several other tasks.

After referring to a table with table, you can construct a complex query using the EvaDBQuery interface.

table(table_name[, chunk_size, chunk_overlap])

Retrieves data from a table in the database.

load(file_regex, table_name, format, **kwargs)

Loads data from files into a table.

query(sql_query)

Constructs a SQL query; call df() on the result to run it.

create_function(udf_name[, if_not_exists, ...])

Creates a function (UDF) in the database.

create_table(table_name[, if_not_exists, ...])

Creates a table in the database.

create_vector_index(index_name, table_name, ...)

Creates a vector index using the provided expr on the table.

drop_table(table_name[, if_exists])

Drop a table in the database.

drop_function(udf_name[, if_exists])

Drop a udf in the database.

drop_index(index_name[, if_exists])

Drop an index in the database.

df()

Returns the result as a pandas DataFrame.

show(object_type, **kwargs)

Shows all entries of the current object_type.

insert(table_name, columns, values, **kwargs)

Executes an INSERT query.

explain(sql_query)

Executes an EXPLAIN query.

rename(table_name, new_table_name, **kwargs)

Executes a RENAME query.
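As a sketch of how several cursor calls might be combined, the helper below drops a table if it exists and then reloads it from a folder of PDFs. The helper name reload_pdf_table and its parameters are hypothetical. It assumes that drop_table, like load, returns an object whose df() runs the statement, and that if_exists=True makes the drop harmless when the table is missing.

```python
def reload_pdf_table(cursor, folder, table_name):
    """Drop `table_name` if present, then load every PDF in `folder` into it.

    `cursor` is expected to be an EvaDB cursor, e.g. evadb.connect().cursor().
    """
    # Assumption: if_exists=True turns the drop into a no-op for a missing table.
    cursor.drop_table(table_name, if_exists=True).df()
    # df() is called so that the lazily constructed load actually runs.
    return cursor.load(
        file_regex=f"{folder}/*.pdf", format="PDF", table_name=table_name
    ).df()
```

For example, reload_pdf_table(cursor, "office_data", "office_data_table") would repeat the PDF load shown earlier, starting from an empty table.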

EvaDBQuery Interface

select(expr)

Projects a set of expressions and returns a new EvaDBQuery.

cross_apply(expr, alias)

Executes an expr on all the rows of the relation.

filter(expr)

Filters rows using the given condition.

df()

Executes the query and fetches all rows as a pandas DataFrame.

alias(alias)

Returns a new Relation with an alias set.

limit(num)

Limits the result count to the number specified.

order(order_expr)

Reorders the relation based on the order_expr.

show()

Executes the query and fetches all rows as a pandas DataFrame.

sql_query()

Returns the SQL query that is equivalent to the relation.

execute()

Transforms the relation into a result set.
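The methods above are meant to be chained, each returning a new query object. The following is a minimal sketch of that pandas-like style. The helper name first_rows and its parameters are hypothetical, and the expressions are assumed to be passed as strings.

```python
def first_rows(cursor, table_name, columns, condition, num):
    """Build a select/filter/limit query on `table_name`.

    Returns the equivalent SQL text together with the query object, so the
    caller can inspect the SQL before deciding to run it with df().
    """
    query = (
        cursor.table(table_name)  # start from a table
        .select(columns)          # project the requested expressions
        .filter(condition)        # keep only the matching rows
        .limit(num)               # cap the number of returned rows
    )
    # sql_query() only reports the equivalent SQL; the query has not run yet.
    return query.sql_query(), query
```

A call such as sql, query = first_rows(cursor, "office_data_table", "name, data", "page < 3", 5) builds the query without running it; the column names here are placeholders, not taken from a real schema. Calling query.df() afterwards executes it and returns the rows as a DataFrame.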