Automate Materialized View Refresh

Kuldeep Kumar
4 min readJun 13, 2024

--

Have you ever struggled with long-running queries? I understand the frustration. It’s not always possible to increase your database’s CPU capacity, and query optimization can only go so far. In this article, I’ll share my approach to reducing the execution time of lengthy queries in Postgres.

Implementing business logic in SQL can be challenging, especially when your query involves joining multiple tables, leading to slow performance.

Without Materialized Views

Here’s how I addressed this issue—

In various datasets, I noticed several common CTEs (Common Table Expressions) performing similar tasks, with only minor differences in the columns used. To optimize this, I grouped these CTEs and converted them into tables or materialized views. The main challenge was that these CTEs were executing the same joins on tables with slight column variations.

I created unified CTEs containing all the necessary columns and experimented with different methods to create tables and populate them using triggers. However, this approach was ineffective due to the frequent updates in our raw tables. Ultimately, I resolved this by using materialized views. Advantages?

  1. You don’t have to rewrite CTEs with extensive joins in every dataset.
  2. Allows materialized views and datasets to be refreshed on different schedules.
  3. You can create indexes in Materialized Views.

Materialized views in PostgreSQL use the rule system like views do, but persist the results in a table-like form. The main differences between Materialized views and tables are that you cannot directly update or insert data into a materialized view. Instead, you must regenerate and refresh the entire view to update the data.

Create Materialized views —

CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM mytable;

Refresh Materialized views —

REFRESH MATERIALIZED VIEW mymatview;

The data will be updated as per your provided query whenever you run the Refresh Materialized View query.

Now another task was to schedule the refresh of these Materialized views. I used crontab for this. Commands used —

crontab -e

# m h  dom mon dow   command
0 0 * * * psql -h HOST -U USER -d DB -c 'REFRESH MATERIALIZED VIEW mymatview;'

I initially scheduled the refresh of four materialized views using Crontab, but this approach had several drawbacks:

  1. Manually adding crons for each new view.
  2. Scheduling each refresh at different times to avoid overloading the database.
  3. Lacking automated verification of whether the refreshes ran successfully.
  4. Difficulty in tracking the last successful refresh and the time taken for each refresh when dealing with numerous views.

Solution —
Create a log table and push the data for every refresh. I created a bash script for this task. Named it as ‘refresh_mat_views.sh’.

#!/bin/bash

# Database connection details
HOST=""
USER=""
DB=""

# Fetch the list of materialized views in the schemaname schema
views=$(psql -h $HOST -U $USER -d $DB -t -c "SELECT matviewname FROM pg_matviews WHERE schemaname = 'schemaname';")

# Loop through each materialized view
for view in $views
do
# Log the start time
start_time=$(psql -h $HOST -U $USER -d $DB -t -c "SELECT CURRENT_TIMESTAMP;")

# Refresh the materialized view
psql -h $HOST -U $USER -d $DB -c "REFRESH MATERIALIZED VIEW schemaname.$view;"

# Log the end time
end_time=$(psql -h $HOST -U $USER -d $DB -t -c "SELECT CURRENT_TIMESTAMP;")

# Calculate the duration
duration=$(psql -h $HOST -U $USER -d $DB -t -c "SELECT '$end_time'::timestamptz - '$start_time'::timestamptz;")

# Insert the log into the database
psql -h $HOST -U $USER -d $DB -c "INSERT INTO materialized_view_refresh_log (view_name, refresh_date, start_time, end_time, duration) VALUES ('$view', '$start_time'::date, '$start_time', '$end_time', '$duration');"
done

I kept all my Materialized views in a specific schema and used a loop to Refresh all the views one by one. This automated a few things —

  1. Just create a Materialized view in the specified schema and that will be registered to refresh.
  2. No need to schedule the refresh for every view.
  3. No need to worry about scheduling the refresh at different times. Views will be refreshed one by one.
  4. Refresh logs will be present in materialized_view_refresh_log. you can create an analysis on that as well.
  5. Directly schedule this bash script and the refresh will be taken care of.
0 0 * * * bash refresh_mat_views.sh

Now my dataset queries run faster as they are not picking data from views but from Materialized views with proper indexes.

With Materialized Views

Happy Learning!!! ✌🏻

--

--

Kuldeep Kumar
Kuldeep Kumar

Written by Kuldeep Kumar

Data Engineer. Expert in architecting scalable data solutions on AWS platform.Passionate about driving actionable insights through innovative cloud technologies

No responses yet