You have a PSQL database utilizing a “MV” materialized view and want an automated process to determine if the MV should be refreshed because the underlying tables have changed. It should also wait a period of time between the last known update to a table and then execute the
REFRESH MATERIAL VIEW command.
This article will show you how to set up a couple of functions to handle that process on as many MVs as you need. Materialized Views are like actual tables in that their data is stored in a table rather than queried each time it is used, and they are like views in that you can’t modify the data in the MV… and so in order for a materialized view to properly reflect the data in the tables it gets the original data from there has to be an explicit instruction to refresh. Refreshing takes both time and resources, refreshing a materialized view may take long seconds or minutes to complete. In older versions of Postgresql this would even cause data to become temporarily inaccessible (and still will without the use of the
CONCURRENTLY parameter circa Postgresql >= 9.4).
Note on Postgresql versions: I am using PostgreSQL 9.5 while writing this article, I’ll assume you’re using Postgresql ≥9.5. The reason for this is that the
REFRESH MATERIALIZED VIEWparameter
CONCURRENTLYwasn’t added until approx 9.4 I believe and
ON CONFLICTwasn’t available until 9.5 as I’ve come to understand. The
ON CONFLICTcan be worked around super simple-easy enough read my notes at end of article
We’ll want to take into account:
- Does the materialized view need to be refreshed?
- How long since the last event occurred that might cause the materialized view to becoming out of sync with the data it represents?
- Is that amount of time sufficient to conclude that changes are not likely to be made again? Because more changes mean more refreshes, more resources measurable by units of time/CPU/money. (We’ll label the minimum amount of time to wait after the most recent change was made to initiate a refresh as the “debounce” time)
- Refresh the materialized view.
- Whenever changes are made to relevant tables, the timestamp will be recorded as an update to a table we’ll build with the sole purpose of reaching the above conclusions.
The project I used this technique for has two materialized views in our project, but one uses the other and so I don’t need to track them separately. I appreciate that is not the case for everyone and so I wrote this to account for the possibility of multiple independent materialized views.
We’ll create a table to keep track of when specific tables have been changed. This table “table_update_records” will be an audit of the latest changes to all tables that contain data our materialized view/s are meant to represent (as well as the update time of the materialized view). We’ll use these times later to determine if the debounce time has passed, which is our buffer to reduce the likelihood that we’ll need to make subsequent refreshes to an MV.
Columns in the table_update_records table:
|Purpose:||Primary Key||When was the update||Table that was updated||MV that would be effected by update|
|Notes:||only required to track multiple MV refresh plans|
If you’re not using Postgresql >= 9.5 then you will need to insert a row for each table that is a dependency of some materialized view (and the MV needs a row as well). Otherwise, you don’t need to do this manually as we’ll be using “upserts”. I feel having to do it manually is more prone to error, however, let’s insert some rows by hand to the
table_update_records table in order to check out that it’s what we expect it to be and quickly review the difference between an update and an upsert before continuing. Imagine 2 tables and a materialized view respectively as (employees, profiles, and mv_profiles)
table_update_records won’t update itself, let’s write a trigger function to do the updating for us. This function will trigger on any tables related to specific materialized views after tables perform some INSERT, UPDATE or DELETE.
I wrote two versions of this function. Both versions take a single argument (the materialized views table name), PSQL Trigger Functions can’t have explicit arguments so we’ll pass the argument when we set the trigger and access it on the
TG_ARGV arguments array at “runtime”. Both also run after any INSERT, UPDATE, DELETE query and update the appropriate
table_update_records.updated value to
Version A just assumes that if an UPDATE/INSERT/DELETE was made to a table then something was actually changed (no need to compare
NEW row values for each row:
Version B will run after any
DELETE query, but it will check each row that was touched by the query and test that something is actually different.
If you want to make sure that a change was actually made, and that an update didn’t occur which simply left the table in the same state it was in before, then you can replace the above function with the following alternative version.
version B doesn’t assume that every UPDATE/INSERT/DELETE that is ran actually changes something, it checks each row involved for changes (I didn’t use this version because the likelihood of it being useful in my apps case is almost zero, if you feel that false positives “ie: refreshing materialized views for no reason” could be a problem then this version is for you):
Basically, we are going to write a Postgresql function that will check the
table_update_records that audits most-recent updates and for some materialized view
- Query the
table_update_recordstable and see if the last updated table with
rel = mat_viewwas a table, not the actual materialized view
- If so… has it has been longer than the debounce time (ie: 1 minute, thirty-minutes, whatever) from the time of the last update and current time now?
- If so… Refresh the materialized view
mat_view, and then update
mat_view. So essentially after a refresh, we record an updated time of a change in the Materialized View row on
The way we know if the materialized view is out of sync is if the time of its last update is earlier than the update of a table with
rel equal to the materialized view. So this means that we’ll need a row in the
table_update_records row for the materialized view. I’ve made sure this will be added if not present with an upsert in the below function, you may enter it manually if you need to replace the upsert in the function with a normal insert.
The following function is not a trigger, it’s just a function, you will need to set up some mechanism to run it on a schedule. You may want to use a cron job, or if running on AWS a Lambda function.
SECURITY DEFINERin the declaration so that whichever user you use to
execute the function will have correct permissions to refresh tables. In
other words, you want a role/user on the cron job with low low priviledges,
but you need those priviledges to be able to run the commands inside the
function (duh’), so
SECURITY DEFINERruns the function as the user who
- Make sure to watch the order of the
age()function, think of it like subtraction, the left value must be larger than the right or you’ll end up with a negative. Since the
current_timestampwill always be greater than a moment from the past,
age(current_timestamp, some_old_timestamp)should always be a positive time value. You can compare it directly to a string representation of some time unit, like “> ‘10 minutes’” or “< 1 hour”.
- Materialized Views cannot
CONCURRENTLYparameter if the materialized view hasn’t been populated yet or if it doesn’t have a unique index. The latter being the more likely point of failure. So if you didn’t include a unique index on your materialized view, then you can add one now. It can be a combination of columns, but it cannot include a
WHEREclause. You might even want to alter the view, add a new column that stands as a unique index if the alternative is making a 5 column unique index.
So something like:1234567-- This Unique Index would work:create unique index uniq_indx_mv_profiles_code_state ON mv_profiles (code, state);-- This Unique Index would work:create unique index uniq_indx_mv_profiles_some_col ON mv_profiles (some_uniq_col);-- This Unique Index would not work (because of WHERE constraint):create unique index uniq_indx_mv_profiles_guid ON mv_profiles (guid) WHERE id > 0;
Now all that is left is to schedule the following line of SQL to run every 10 minutes, or 1 hour, or once a day. The two arguments to the function should be the ‘debounce_time’ and the ‘rel’ value (materialized view table name).
Remember, the debounce time isn’t the same as the cron job time. You might have this function run every 1 minutes with a debounce time of 15 minutes, which results in a 15 - 16-minute window between when the last change related to the data in our materialized view was made and the view refreshing.
You can set up multiple cron jobs or just make multiple SQL queries that change the arguments passed into the
fn_cron_smart_mv_sync so it can track multiple tables.
Expect a call like
select fn_cron_smart_mv_sync('10 minutes', 'mv_profiles'); will return results such as: