Prediction#
Introduction#
In this tutorial, we present how to to create and train a machine learning model for predicting the rental price of a home using the built-in Prediction AI Engines in EvaDB. EvaDB makes it easy to do prediction queries over data in your database using its built-in Prediction
engines.
Prerequisites#
To follow along, you will need to set up a local instance of EvaDB via pip.
Connect to EvaDB#
After installing EvaDB, use the following Python code to establish a connection and obtain a cursor
for running EvaQL
queries.
import evadb
cursor = evadb.connect().cursor()
Connect EvaDB to PostgreSQL Database Server#
We will assume that you have a PostgreSQL database running locally that contains the data needed for analysis. Follow these instructions to install PostgreSQL.
Note
If find it challenging to install the PostgreSQL
database on your machine, here is an alternative for quick prototyping.
You can use an embedded SQLite database. If you go with the sqlite
database, alter the SQL commands in this tutorial to use the sqlite
engine and the evadb.db
SQLite database file as explained in the SQLite page.
EvaDB lets you connect to your favorite databases, data warehouses, data lakes, etc., via the CREATE DATABASE
statement. In this query, we connect EvaDB to an existing PostgreSQL
server:
CREATE DATABASE postgres_data
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "eva",
"password": "password",
"host": "localhost",
"port": "5432",
"database": "evadb"
}
We will assume that the input data is loaded into a PostgreSQL
database.
To load the home rental data into your database, see the complete home rental prediction notebook on Colab.
Preview the Home Rental Price Data#
The home_rentals
table contains eight columns: number_of_rooms
, number_of_bathrooms
, sqft
, location
, days_on_market
, initial_price
, neighborhood
, and rental_price
.
SELECT * FROM postgres_data.home_rentals LIMIT 3;
This query presents a subset of data in the home_rentals
table:
+------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+
| home_rentals.number_of_rooms | home_rentals.number_of_bathrooms | home_rentals.sqft | home_rentals.location | home_rentals.days_on_market | home_rentals.initial_price | home_rentals.neighborhood | home_rentals.rental_price |
|------------------------------|----------------------------------|-------------------|-----------------------|-----------------------------|----------------------------|---------------------------|---------------------------|
| 1 | 1 | 674 | good | 1 | 2167 | downtown | 2167 |
| 1 | 1 | 554 | poor | 19 | 1883 | westbrae | 1883 |
| 0 | 1 | 529 | great | 3 | 2431 | south_side | 2431 |
+------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+
Train a Home Rental Price Prediction Model#
Let’s next train a prediction model over the home_rental
table using EvaDB’s CREATE FUNCTION
statement. We will use the built-in Ludwig engine for this task.
CREATE OR REPLACE FUNCTION PredictHouseRent FROM
( SELECT * FROM postgres_data.home_rental )
TYPE Ludwig
PREDICT 'rental_price'
TIME_LIMIT 3600;
In the above query, we use all the columns (except rental_price
) from home_rental
table to predict the rental_price
column.
We set the training time out to be 3600
seconds.
Note
The Model Training with Ludwig page lists all the configurable parameters for the model training framework.
This query returns the trained model:
+----------------------------------------------+
| Function PredictHouseRent successfully added |
+----------------------------------------------+
Predict using the Trained Model#
Next we use the trained PredictHouseRent
to predict the home rental price.
SELECT PredictHouseRent(*)
FROM postgres_data.home_rentals
LIMIT 3;
We use *
to pass all the columns in the table as input into the PredictHouseRent
function.
+-------------------------------------------+
| predicthouserent.rental_price_predictions |
+-------------------------------------------+
| 2087.763672 |
| 1793.570190 |
| 2346.319824 |
+-------------------------------------------+
We next do a LATERAL JOIN
to compare the actual rental prices in the home_rentals
dataset against the predicted rental prices generated by the trained model, PredictHouseRent
.
SELECT rental_price, predicted_rental_price
FROM postgres_data.home_rentals
JOIN LATERAL PredictHouseRent(*) AS Predicted(predicted_rental_price)
LIMIT 3;
Here is the query’s output:
+---------------------------+----------------------------------+
| home_rentals.rental_price | Predicted.predicted_rental_price |
+---------------------------+----------------------------------+
| 2167 | 2087.763672 |
| 1883 | 1793.570190 |
| 2431 | 2346.319824 |
+------------------ --------+----------------------------------+
What’s Next?#
👋 If you are excited about our vision of bringing AI inside databases, consider:
📟 joining our Slack: https://evadb.ai/slack
🐙 following us on Github: https://evadb.ai/github
🐦 following us on Twitter: https://evadb.ai/twitter
📝 following us on Medium: https://evadb.ai/blog
🖥️ contributing to EvaDB: https://evadb.ai/github

Language Models (🦙) and Databases#