Introduction
Modern data warehouses often struggle with performance bottlenecks caused by joins between large fact tables and dimension tables that must be broadcast across partitions. Even a single report execution can consume minutes, or even hours, of processing time when datasets reach hundreds of millions of rows.
In one of our real-world warehouse systems at Reckitt Benckiser UK, we faced precisely this challenge: a 508-million-row fact table joined with a 111,000-row dimension table, resulting in report runtimes exceeding 19 minutes.
Through systematic analysis and optimization using Replicated Materialized Query Tables (MQTs), we achieved a dramatic improvement: runtime dropped from 1168 seconds to just 49 seconds, and the estimated query cost fell from 17.2 million timerons to 175,576 timerons.
This blog post walks you through that transformation step-by-step.
Understanding the Problem
The query in question joined a large fact table with a relatively small dimension table:
SELECT
    F.CUST_KEY,
    F.PROD_SKU_KEY,
    F.BUSS_UNIT_KEY,
    F.UNIT_YTD_VOLUME,
    D.CUST_NAME,
    D.CUST_STRATEGIC_DIR_CODE,
    D.TRD_TY_CODE
FROM
    DATAMARTS.FACT_CUST_PROFTBLTY F,
    DATAMARTS.DIM_CUSTOMER D
WHERE
    F.CUST_KEY = D.CUST_KEY;
The execution time was 1168 seconds, with an estimated 17.2 million timerons.
Why? Because Db2’s Broadcast Table Queue (BTQ) mechanism replicated the small dimension table at runtime to every database partition, adding heavy inter-node communication overhead.
In a partitioned data warehouse environment, each node processed part of the fact table, while the dimension table was broadcast across the cluster for join resolution. This broadcast was the major bottleneck.
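Before tuning, it helps to size both tables from the catalog. The CARD column in SYSCAT.TABLES reflects the row count as of the last RUNSTATS, so this is a quick sanity check assuming statistics are current:

SELECT TABNAME, CARD
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'DATAMARTS'
  AND TABNAME IN ('FACT_CUST_PROFTBLTY', 'DIM_CUSTOMER');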
Step 1: Analyze the Explain Plan
The first step for any DBA should always be understanding what Db2’s optimizer is doing.
# Capture the access plan without executing the query
db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 -tvf EXTERNALSQL.sql > EXTERNALSQL.log
db2 "SET CURRENT EXPLAIN MODE NO"
# Format the most recent explain capture into a readable plan
db2exfmt -d DBNAME -g TIC -w -1 -s % -n % -# 0 -o EXTERNALSQL.out
The output confirmed that the optimizer used a Broadcast Table Queue, causing significant inter-node data movement during join execution.
At this point, the plan cost stood at:
- 17.2 million timerons
- 1168 seconds runtime
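In the formatted plan graph, the broadcast shows up as a BTQ operator on the dimension-table leg of the join. Schematically, it looks something like the sketch below; this is a simplified illustration rather than verbatim db2exfmt output, and the join method (shown here as HSJOIN), operator numbers, and layout vary by plan and Db2 version:

            HSJOIN
            (   2)
           /      \
      TBSCAN      BTQ
      (   3)     (   4)
        |          |
  FACT_CUST_    TBSCAN
  PROFTBLTY     (   5)
                   |
             DIM_CUSTOMER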
Step 2: Introduce a Replicated MQT
To eliminate broadcasting, we converted the dimension table to a replicated MQT: a local copy that resides on every database partition. This ensures each node has its own version of the small dimension table, eliminating inter-node traffic altogether.
Implementation Steps
- Create dedicated data and index tablespaces in the same database partition group as the fact table (PDPG_DATAMART below is a placeholder for that group's name in your environment):

-- PDPG_DATAMART: the multi-partition database partition group that holds the fact table
CREATE TABLESPACE TBS_REP_DATA IN DATABASE PARTITION GROUP PDPG_DATAMART;
CREATE TABLESPACE TBS_REP_INDEX IN DATABASE PARTITION GROUP PDPG_DATAMART;
- Create a replicated MQT version of the dimension table:

CREATE TABLE DATAMARTS.R_DIM_CUSTOMER AS
    (SELECT * FROM DATAMARTS.DIM_CUSTOMER)
    DATA INITIALLY DEFERRED
    REFRESH IMMEDIATE
    IN TBS_REP_DATA
    INDEX IN TBS_REP_INDEX
    REPLICATED;
- Populate it with an initial refresh:

REFRESH TABLE DATAMARTS.R_DIM_CUSTOMER;
- Run REORG and RUNSTATS on the new MQT so the optimizer has accurate statistics; sample commands follow below.
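A minimal sketch of that last step (the RUNSTATS options shown are one common choice, not the only valid one):

# Reorganize the new MQT's physical layout
db2 "REORG TABLE DATAMARTS.R_DIM_CUSTOMER"
# Collect distribution statistics and detailed index statistics for the optimizer
db2 "RUNSTATS ON TABLE DATAMARTS.R_DIM_CUSTOMER WITH DISTRIBUTION AND DETAILED INDEXES ALL"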
Step 3: Validate the Results
After implementing the replicated MQT, the optimizer switched from a Broadcast Table Queue (BTQ) to a Local Table Queue (LTQ)—completely removing cross-partition network movement.
Running the same query now yielded:
- 175,576 timerons (down from 17,187,700)
- 49 seconds runtime (down from 1168 seconds)
This represents a 23.8× performance improvement, achieved purely through data locality optimization.
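Re-running the explain procedure from Step 1 makes the switch easy to verify; a quick search of the newly formatted output should now show LTQ on the dimension leg where BTQ used to appear (same explain setup as before, with a new output file name):

db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 -tvf EXTERNALSQL.sql > EXTERNALSQL_AFTER.log
db2 "SET CURRENT EXPLAIN MODE NO"
db2exfmt -d DBNAME -g TIC -w -1 -s % -n % -# 0 -o EXTERNALSQL_AFTER.out
grep -E "BTQ|LTQ" EXTERNALSQL_AFTER.out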
Visualizing the Difference
Before (Broadcast Table Queue)
- Dimension table broadcast to all nodes.
- Each node waited for broadcast completion.
- Heavy inter-node network overhead.
After (Local Table Queue with Replicated MQT)
- Each node accesses a local copy of the dimension table.
- No data broadcast required.
- Parallel execution without inter-node blocking.
Result: Massive reduction in runtime and resource consumption.
Why Replicated MQTs Work
Replicated MQTs are particularly effective for star-schema and snowflake warehouse designs where:
- Fact tables are partitioned across nodes.
- Dimension tables are relatively small.
- Frequent joins occur between facts and dimensions.
By replicating dimensions locally, queries leverage intra-node joins instead of expensive distributed joins.
This strategy not only reduces runtime but also improves concurrency and query scalability under multi-user workloads.
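You can confirm how each table is distributed directly from the catalog: in SYSCAT.TABLES, PARTITION_MODE is 'H' for hash-distributed tables and 'R' for replicated ones, so the fact table and the new MQT should show 'H' and 'R' respectively:

SELECT TABNAME, PARTITION_MODE
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'DATAMARTS'
  AND TABNAME IN ('FACT_CUST_PROFTBLTY', 'R_DIM_CUSTOMER');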
Best Practices for Using Replicated MQTs
| Area | Recommendation |
| --- | --- |
| When to Use | Small, frequently joined dimension tables (up to a few hundred MB). |
| Refresh Strategy | Schedule nightly or event-based refreshes using REFRESH TABLE. |
| Statistics Maintenance | Run RUNSTATS on both the MQT and its base table after each refresh. |
| Storage Placement | Keep replicated tablespaces in the same partition group as the fact table. |
| Indexes | Create the same indexes as on the base tables to preserve access efficiency. |
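As a concrete example of the refresh strategy above, a nightly job can chain the refresh with statistics collection. A minimal shell sketch, assuming the Db2 instance profile is already sourced in the job's environment and DBNAME stands in for the actual database:

# refresh_r_dim_customer.sh: nightly maintenance for the replicated MQT
db2 CONNECT TO DBNAME
db2 "REFRESH TABLE DATAMARTS.R_DIM_CUSTOMER"
db2 "RUNSTATS ON TABLE DATAMARTS.R_DIM_CUSTOMER WITH DISTRIBUTION AND DETAILED INDEXES ALL"
db2 CONNECT RESET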
Outcome
This optimization not only transformed a slow-performing report into a fast one but also enhanced the overall warehouse efficiency.
The execution time dropped from 19 minutes to under 1 minute, and report queues on the BI platform cleared dramatically faster, improving analyst productivity and infrastructure utilization.
Conclusion
For data warehouse architects and DBAs working on Db2 DPF (Database Partitioning Feature) environments, Replicated MQTs offer a simple yet powerful way to improve performance in join-heavy queries.
When combined with proper statistics, indexing, and refresh scheduling, they can deliver 10×–25× performance gains with minimal code changes.
In short, if you’re struggling with slow-running reports due to distributed joins—replicate your dimensions, and watch your timerons vanish.
About the Author
Mohankumar Saraswatipura
Database Architect | IBM Champion | Performance Optimization Specialist
Mohan has led major Db2 optimization initiatives across manufacturing, financial, and cloud environments. His work focuses on performance tuning, replication, and high availability solutions.
He has authored multiple technical articles for IBM Data Magazine and developerWorks, including “How to Improve Performance in Warehouse Systems Using Replicated MQTs.”