# Sync PSQL Materialized Views After Debounce Period

## Sync PSQL Materialized View After Updates and Debounce Period

Abstract:
You have a PSQL database utilizing a “MV” materialized view and want an automated process to determine if the MV should be refreshed because the underlying tables have changed. It should also wait a period of time between the last known update to a table and then execute the REFRESH MATERIAL VIEW command.

This article will show you how to set up a couple of functions to handle that process on as many MVs as you need. Materialized Views are like actual tables in that their data is stored in a table rather than queried each time it is used, and they are like views in that you can’t modify the data in the MV… and so in order for a materialized view to properly reflect the data in the tables it gets the original data from there has to be an explicit instruction to refresh. Refreshing takes both time and resources, refreshing a materialized view may take long seconds or minutes to complete. In older versions of Postgresql this would even cause data to become temporarily inaccessible (and still will without the use of the CONCURRENTLY parameter circa Postgresql >= 9.4).

Note on Postgresql versions: I am using PostgreSQL 9.5 while writing this article, I’ll assume you’re using Postgresql ≥9.5. The reason for this is that the REFRESH MATERIALIZED VIEW parameter CONCURRENTLY wasn’t added until approx 9.4 I believe and ON CONFLICT wasn’t available until 9.5 as I’ve come to understand. The ON CONFLICT can be worked around super simple-easy enough read my notes at end of article

Note See my notes at the bottom of the page for a JavaScript function that can build out the entire SQL for everything, including setting up triggers on all your tables. All it requires is a JavaScript array of your table names and the material views they effect. (this is optional, you can write it all by hand too)

### The Solution (step by step):

We’ll want to take into account:

• Does the materialized view need to be refreshed?
• How long since the last event occurred that might cause the materialized view to becoming out of sync with the data it represents?
• Is that amount of time sufficient to conclude that changes are not likely to be made again? Because more changes mean more refreshes, more resources measurable by units of time/CPU/money. (We’ll label the minimum amount of time to wait after the most recent change was made to initiate a refresh as the “debounce” time)
• Refresh the materialized view.
• Whenever changes are made to relevant tables, the timestamp will be recorded as an update to a table we’ll build with the sole purpose of reaching the above conclusions.

The project I used this technique for has two materialized views in our project, but one uses the other and so I don’t need to track them separately. I appreciate that is not the case for everyone and so I wrote this to account for the possibility of multiple independent materialized views.

##### Step 1. Create a Table To Record Latest Updates

We’ll create a table to keep track of when specific tables have been changed. This table “table_update_records” will be an audit of the latest changes to all tables that contain data our materialized view/s are meant to represent (as well as the update time of the materialized view). We’ll use these times later to determine if the debounce time has passed, which is our buffer to reduce the likelihood that we’ll need to make subsequent refreshes to an MV.

Columns in the table_update_records table:

id table_name rel udpated
Type: SERIAL VARCHAR(50) VARCHAR(50) TIMESTAMP
Purpose: Primary Key When was the update Table that was updated MV that would be effected by update
Notes: only required to track multiple MV refresh plans

If you’re not using Postgresql >= 9.5 then you will need to insert a row for each table that is a dependency of some materialized view (and the MV needs a row as well). Otherwise, you don’t need to do this manually as we’ll be using “upserts”. I feel having to do it manually is more prone to error, however, let’s insert some rows by hand to the table_update_records table in order to check out that it’s what we expect it to be and quickly review the difference between an update and an upsert before continuing. Imagine 2 tables and a materialized view respectively as (employees, profiles, and mv_profiles)

##### Step 2. Create a Function To Trigger Record Updates

The table table_update_records won’t update itself, let’s write a trigger function to do the updating for us. This function will trigger on any tables related to specific materialized views after tables perform some INSERT, UPDATE or DELETE.

I wrote two versions of this function. Both versions take a single argument (the materialized views table name), PSQL Trigger Functions can’t have explicit arguments so we’ll pass the argument when we set the trigger and access it on the TG_ARGV[] arguments array at “runtime”. Both also run after any INSERT, UPDATE, DELETE query and update the appropriate table_update_records.updated value to current_time.

Version A just assumes that if an UPDATE/INSERT/DELETE was made to a table then something was actually changed (no need to compare OLD and NEW row values for each row:

ALTERNATIVE

Version B will run after any INSERT, UPDATE, DELETE query, but it will check each row that was touched by the query and test that something is actually different.

If you want to make sure that a change was actually made, and that an update didn’t occur which simply left the table in the same state it was in before, then you can replace the above function with the following alternative version.

version B doesn’t assume that every UPDATE/INSERT/DELETE that is ran actually changes something, it checks each row involved for changes (I didn’t use this version because the likelihood of it being useful in my apps case is almost zero, if you feel that false positives “ie: refreshing materialized views for no reason” could be a problem then this version is for you):

#### Step 4. Create a function to compare update times and refresh materialized views if needed

Basically, we are going to write a Postgresql function that will check the table_update_records that audits most-recent updates and for some materialized view mat_view:

• Query the table_update_records table and see if the last updated table with rel = mat_view was a table, not the actual materialized view mat_view.
• If so… has it has been longer than the debounce time (ie: 1 minute, thirty-minutes, whatever) from the time of the last update and current time now?
• If so… Refresh the materialized view mat_view, and then update table_update_records.updated value where table_update_records.table_name is mat_view. So essentially after a refresh, we record an updated time of a change in the Materialized View row on table_update_records,

The way we know if the materialized view is out of sync is if the time of its last update is earlier than the update of a table with rel equal to the materialized view. So this means that we’ll need a row in the table_update_records row for the materialized view. I’ve made sure this will be added if not present with an upsert in the below function, you may enter it manually if you need to replace the upsert in the function with a normal insert.

The following function is not a trigger, it’s just a function, you will need to set up some mechanism to run it on a schedule. You may want to use a cron job, or if running on AWS a Lambda function.

GOTCHAS:

• Use SECURITY DEFINER in the declaration so that whichever user you use to
execute the function will have correct permissions to refresh tables. In
other words, you want a role/user on the cron job with low low priviledges,
but you need those priviledges to be able to run the commands inside the
function (duh’), so SECURITY DEFINER runs the function as the user who
defined it.
• Make sure to watch the order of the age() function, think of it like subtraction, the left value must be larger than the right or you’ll end up with a negative. Since the current_timestamp will always be greater than a moment from the past, age(current_timestamp, some_old_timestamp) should always be a positive time value. You can compare it directly to a string representation of some time unit, like “> ‘10 minutes’” or “< 1 hour”.
• Materialized Views cannot REFRESH with the CONCURRENTLY parameter if the materialized view hasn’t been populated yet or if it doesn’t have a unique index. The latter being the more likely point of failure. So if you didn’t include a unique index on your materialized view, then you can add one now. It can be a combination of columns, but it cannot include a WHERE clause. You might even want to alter the view, add a new column that stands as a unique index if the alternative is making a 5 column unique index.
So something like:

##### Step 5. Creating a cron job.

Now all that is left is to schedule the following line of SQL to run every 10 minutes, or 1 hour, or once a day. The two arguments to the function should be the ‘debounce_time’ and the ‘rel’ value (materialized view table name).

Remember, the debounce time isn’t the same as the cron job time. You might have this function run every 1 minutes with a debounce time of 15 minutes, which results in a 15 - 16-minute window between when the last change related to the data in our materialized view was made and the view refreshing.

I will leave the “how” up to you, basically, you need to create some script (bash, python, javascript) and a cron job or pgAdmin scheduled procedure. Since my app is on AWS I don’t have pgAdmin setup and so I chose AWS Lambda and a Python function. Easy peasy lemon squeezy.

###### Tracking multiple tables

You can set up multiple cron jobs or just make multiple SQL queries that change the arguments passed into the fn_cron_smart_mv_sync so it can track multiple tables.

###### The output from function return

Expect a call like select fn_cron_smart_mv_sync('10 minutes', 'mv_profiles'); will return results such as: