Prediction#



Introduction#

In this tutorial, we show how to create and train a machine learning model for predicting the rental price of a home using EvaDB's built-in Prediction AI engines. EvaDB makes it easy to run prediction queries directly over the data in your database.

Prerequisites#

To follow along, you will need a local instance of EvaDB, which you can install via pip (pip install evadb).

Connect to EvaDB#

After installing EvaDB, use the following Python code to establish a connection and obtain a cursor for running EvaQL queries.

import evadb
cursor = evadb.connect().cursor()
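
Once you have the cursor, every EvaQL statement in this tutorial can be executed through it. Below is a minimal sketch of the pattern, assuming EvaDB's Python API with query() and df(); it uses the table-preview query that appears later in the tutorial, so run it only after the PostgreSQL connection below has been created.

# Pass any EvaQL statement to query() as a string; df() materializes
# the result as a pandas DataFrame.
preview = cursor.query("SELECT * FROM postgres_data.home_rentals LIMIT 3;").df()
print(preview)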

Connect EvaDB to PostgreSQL Database Server#

We will assume that you have a PostgreSQL database running locally that contains the data needed for analysis. Follow these instructions to install PostgreSQL.

Note

If you find it challenging to install the PostgreSQL database on your machine, here is an alternative for quick prototyping.

You can use an embedded SQLite database instead. If you go with SQLite, alter the SQL commands in this tutorial to use the sqlite engine and the evadb.db SQLite database file, as explained in the SQLite page (an equivalent connection statement is sketched after the PostgreSQL example below).

EvaDB lets you connect to your favorite databases, data warehouses, data lakes, etc., via the CREATE DATABASE statement. In this query, we connect EvaDB to an existing PostgreSQL server:

CREATE DATABASE postgres_data
WITH ENGINE = 'postgres',
PARAMETERS = {
    "user": "eva",
    "password": "password",
    "host": "localhost",
    "port": "5432",
    "database": "evadb"
};
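
If you opted for the embedded SQLite alternative from the note above, the connection statement is analogous. Below is a hedged sketch using the cursor pattern shown earlier; the database name sqlite_data and the evadb.db file path are illustrative, so check the SQLite page for the exact parameters.

# Connect EvaDB to an embedded SQLite database file instead of PostgreSQL.
cursor.query("""
    CREATE DATABASE sqlite_data
    WITH ENGINE = 'sqlite',
    PARAMETERS = {
        "database": "evadb.db"
    };
""").df()

If you take this route, substitute sqlite_data for postgres_data in the later queries.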

We will assume that the input data is loaded into a PostgreSQL database. To load the home rental data into your database, see the complete home rental prediction notebook on Colab.

Preview the Home Rental Price Data#

The home_rentals table contains eight columns: number_of_rooms, number_of_bathrooms, sqft, location, days_on_market, initial_price, neighborhood, and rental_price.

SELECT * FROM postgres_data.home_rentals LIMIT 3;

This query presents a subset of data in the home_rentals table:

+------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+
| home_rentals.number_of_rooms | home_rentals.number_of_bathrooms | home_rentals.sqft | home_rentals.location | home_rentals.days_on_market | home_rentals.initial_price | home_rentals.neighborhood | home_rentals.rental_price |
|------------------------------|----------------------------------|-------------------|-----------------------|-----------------------------|----------------------------|---------------------------|---------------------------|
|                            1 |                                1 |               674 |                  good |                           1 |                       2167 |                  downtown |                      2167 |
|                            1 |                                1 |               554 |                  poor |                          19 |                       1883 |                  westbrae |                      1883 |
|                            0 |                                1 |               529 |                 great |                           3 |                       2431 |                south_side |                      2431 |
+------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+

Train a Home Rental Price Prediction Model#

Let’s next train a prediction model over the home_rentals table using EvaDB’s CREATE FUNCTION statement. We will use the built-in Ludwig engine for this task.

CREATE OR REPLACE FUNCTION PredictHouseRent FROM
( SELECT * FROM postgres_data.home_rentals )
TYPE Ludwig
PREDICT 'rental_price'
TIME_LIMIT 3600;

In the above query, we use all the columns (except rental_price) from the home_rentals table to predict the rental_price column. We set the training time limit to 3600 seconds.

Note

The Model Training with Ludwig page lists all the configurable parameters for the model training framework.

Once training completes, the query confirms that the function has been added:

+----------------------------------------------+
| Function PredictHouseRent successfully added |
+----------------------------------------------+
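
You can also restrict the features the model trains on by narrowing the inner SELECT, since that clause defines the training data. Below is a hedged sketch using the cursor pattern from earlier; the column subset and the shorter time limit are illustrative choices, not recommendations.

# Train on a subset of columns; the PREDICT target (rental_price)
# must still appear in the selected columns.
cursor.query("""
    CREATE OR REPLACE FUNCTION PredictHouseRent FROM
    ( SELECT sqft, location, neighborhood, rental_price
      FROM postgres_data.home_rentals )
    TYPE Ludwig
    PREDICT 'rental_price'
    TIME_LIMIT 1200;
""").df()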

Predict using the Trained Model#

Next, we use the trained PredictHouseRent function to predict home rental prices.

SELECT PredictHouseRent(*)
FROM postgres_data.home_rentals
LIMIT 3;

We use * to pass all the columns of the table as input to the PredictHouseRent function.

+-------------------------------------------+
| predicthouserent.rental_price_predictions |
+-------------------------------------------+
|                               2087.763672 |
|                               1793.570190 |
|                               2346.319824 |
+-------------------------------------------+

We next do a LATERAL JOIN to compare the actual rental prices in the home_rentals dataset against the predicted rental prices generated by the trained model, PredictHouseRent.

SELECT rental_price, predicted_rental_price
FROM postgres_data.home_rentals
JOIN LATERAL PredictHouseRent(*) AS Predicted(predicted_rental_price)
LIMIT 3;

Here is the query’s output:

+---------------------------+----------------------------------+
| home_rentals.rental_price | Predicted.predicted_rental_price |
+---------------------------+----------------------------------+
|                      2167 |                      2087.763672 |
|                      1883 |                      1793.570190 |
|                      2431 |                      2346.319824 |
+---------------------------+----------------------------------+
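
To go beyond spot-checking a few rows, you can pull the full joined result into a pandas DataFrame through the Python cursor and compute an error metric. The sketch below uses mean absolute error as an illustrative choice; the DataFrame column names are assumed to follow the qualified headers shown in the output above.

# Fetch actual and predicted prices for every home, then compute
# the mean absolute error of the predictions.
df = cursor.query("""
    SELECT rental_price, predicted_rental_price
    FROM postgres_data.home_rentals
    JOIN LATERAL PredictHouseRent(*) AS Predicted(predicted_rental_price);
""").df()

mae = (df["home_rentals.rental_price"]
       - df["Predicted.predicted_rental_price"]).abs().mean()
print(f"Mean absolute error: {mae:.2f}")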

What’s Next?#

👋 If you are excited about our vision of bringing AI inside databases, consider:

Language Models (🦙) and Databases
