Home Sale Forecasting#

Run on Google Colab View source on GitHub Download notebook


Introduction#

In this tutorial, we present how to use Forecasting AI Engines in EvaDB to predict home sale price. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function.

Prerequisites#

To follow along, you will need to set up a local instance of EvaDB via pip.

Connect to EvaDB#

After installing EvaDB, use the following Python code to establish a connection and obtain a cursor for running EvaQL queries.

import evadb
cursor = evadb.connect().cursor()

Connect EvaDB to PostgreSQL Database Server#

We will assume that you have a PostgreSQL database server running locally that contains the data needed for analysis. Follow these instructions to install PostgreSQL.

EvaDB lets you connect to your favorite databases, data warehouses, data lakes, etc., via the CREATE DATABASE statement. In this query, we connect EvaDB to an existing PostgreSQL server:

CREATE DATABASE postgres_data
WITH ENGINE = 'postgres',
PARAMETERS = {
    "user": "eva",
    "password": "password",
    "host": "localhost",
    "port": "5432",
    "database": "evadb"
}

We will assume that the input data is loaded into a PostgreSQL database. To load the home sales data into your database, see the complete home sale forecasting notebook on Colab.

Preview the Home Sales Data#

We use the raw_sales.csv of the House Property Sales Time Series in this usecase. The data contains five columns: postcode, price, bedrooms, datesold, and propertytype.

SELECT * FROM postgres_data.home_sales LIMIT 3;

This query previews the data in the home_sales table:

+---------------------+------------------+---------------------+---------------------+-------------------------+
| home_sales.postcode | home_sales.price | home_sales.bedrooms | home_sales.datesold | home_sales.propertytype |
|---------------------|------------------|---------------------|---------------------|-------------------------|
|                2607 |           525000 |                   4 |          2007-02-07 |                   house |
|                2906 |           290000 |                   3 |          2007-02-27 |                   house |
|                2905 |           328000 |                   3 |          2007-03-07 |                   house |
+---------------------+------------------+---------------------+---------------------+-------------------------+

Train a Home Sale Forecasting Model#

Let’s next train a time-series forecasting model from the home_sales table using EvaDB’s CREATE FUNCTION query. Particularly, we are interested in the price of the properties that have three bedrooms and are in the postcode 2607 area.

CREATE FUNCTION IF NOT EXISTS HomeSaleForecast FROM
  (
    SELECT propertytype, datesold, price
    FROM postgres_data.home_sales
    WHERE bedrooms = 3 AND postcode = 2607
  )
TYPE Forecasting
PREDICT 'price'
TIME 'datesold'
ID 'propertytype'
FREQUENCY 'W';

In the home_sales dataset, we have two different property types, houses and units, and price gap between them are large. We’d like to ask EvaDB to analyze the price of houses and units independently. To do so, we specify the propertytype column as the ID of the time series data, which represents an identifier for the series. Here is the query’s output DataFrame:

Note

Go over Time Series Forecasting page on exploring all configurable paramters for the forecast model.

+----------------------------------------------+
| Function HomeSaleForecast successfully added |
+----------------------------------------------+

Predict the Home Price using the Trained Model#

Next we use the trained HomeSaleForecast to predict the home sale price for next 3 weeks.

SELECT HomeSaleForecast(3);

The input of the trained model is the horizon (i.e., week in this case), the steps we want to forecast in the future. Here is the query’s output DataFrame:

+-------------------------------+---------------------------+------------------------+
| homesaleforecast.propertytype | homesaleforecast.datesold | homesaleforecast.price |
+-------------------------------+---------------------------+------------------------+
|                         house |                2019-07-21 |                 766572 |
|                         house |                2019-07-28 |                 766572 |
|                         house |                2019-08-04 |                 766572 |
|                          unit |                2018-12-23 |                 417229 |
|                          unit |                2018-12-30 |                 409601 |
|                          unit |                2019-01-06 |                 402112 |
+-------------------------------+---------------------------+------------------------+

We can further use ORDER BY to find out which month in the following year has the lower price.

SELECT *
FROM (SELECT HomeSaleForecast(12)) AS HomeSale
ORDER BY price
LIMIT 1;

Here is the query’s output:

+-----------------------+-------------------+----------------+
| HomeSale.propertytype | HomeSale.datesold | HomeSale.price |
+-----------------------+-------------------+----------------+
|                  unit |        2019-03-10 |         340584 |
+-----------------------|-------------------|----------------|

What’s Next?#

👋 EvaDB’s vision is to bring AI inside your database system and make it easy to build fast AI-powered apps. If you liked this tutorial and are excited about our vision, show some ❤️ by: