Technical note

The Join Mistakes That Make Dashboards Lie

  • Analytics Engineering
  • SQL
  • Dashboard Reconciliation
  • Source-to-Report Lineage

Some dashboard errors look obvious.

A chart fails to load. A refresh breaks. A field disappears. A number is blank.

The more dangerous errors look reasonable.

Revenue is higher than expected, but not impossible. Customer count has increased, but nobody knows whether it is growth or duplication. Conversion rate has changed, but the chart still renders and the BI tool shows no warning.

One common cause is a bad SQL join.

Not bad because the syntax is invalid. Bad because the join changes the grain of the data and silently duplicates the business event being measured.

This is a logic leak problem. The technical query still runs, but the business meaning has escaped.

The symptom: the dashboard spike nobody can reconcile

Imagine the Brightside Bikes weekly trading dashboard suddenly shows a revenue spike.

The data team checks the pipeline. The jobs ran. The dashboard refreshed. The source table has new orders. Nothing obviously failed.

But finance cannot reconcile the number to Stripe settlements or the board revenue bridge.

The issue may be fanout.

Fanout happens when a query joins one business event to multiple matching rows, then aggregates after the duplication has happened.

For example, Brightside’s Shopify order table may have one row per order. The order-line table may have several rows per order because one basket can contain a bike, a helmet, a lock, and a service-plan attachment. If the query joins orders to order lines and then sums order revenue, the order revenue can be counted once per product line.

The SQL runs.

The dashboard lies.

Grain is the first trust question

Before asking whether a query is “right”, ask what one row represents.

That is the grain.

In reporting trust terms, grain is one of the most important trust signals because it tells users what the number is counting.

Brightside examples:

  • One row per Shopify order
  • One row per Shopify order line
  • One row per Stripe charge or refund
  • One row per customer per week
  • One row per SKU per stock snapshot
  • One row per weekly board-report adjustment

Many reporting disputes come from joining tables at different grains without controlling the result.

Finance may be measuring invoice-date revenue. Growth may be measuring Shopify order-time revenue. Operations may be measuring shipped units. The dashboard may be joining all of them and grouping by week as if they naturally mean the same thing.

They do not.

That is how a semantic gap becomes a SQL problem.

A simple fanout example

This query looks plausible:

select
  date_trunc('week', o.order_date) as trading_week,
  sum(o.order_revenue) as revenue
from stg_shopify__orders o
left join stg_shopify__order_lines l
  on o.order_id = l.order_id
group by 1;

If every order has one line item, the result may look fine.

If some Brightside orders include a bike, accessories, and a maintenance subscription, order_revenue is duplicated before the sum().

The safer pattern is to aggregate or deduplicate the order-line data before joining it, depending on the business question.

with order_line_summary as (
  select
    order_id,
    count(*) as line_count,
    sum(line_revenue) as product_revenue
  from stg_shopify__order_lines
  group by 1
)

select
  date_trunc('week', o.order_date) as trading_week,
  sum(o.order_revenue) as revenue,
  count(distinct o.order_id) as orders,
  sum(ols.product_revenue) as product_revenue
from stg_shopify__orders o
left join order_line_summary ols
  on o.order_id = ols.order_id
group by 1;

This does not make the query automatically correct.

It makes the grain decision visible.

The left join filter trap

Another common issue is a left join that accidentally becomes an inner join.

This query appears to keep all Brightside orders:

select
  o.order_id,
  o.order_revenue,
  r.refunded_at
from stg_shopify__orders o
left join stg_shopify__refunds r
  on o.order_id = r.order_id
where r.refunded_at >= date '2026-01-01';

But the where clause removes rows where no refund exists. That can turn the result into “orders with refunds after this date”, not “all orders with refund information where available”.

If the business question is about all orders, the filter belongs in the join condition or in a pre-filtered refund model:

select
  o.order_id,
  o.order_revenue,
  r.refunded_at
from stg_shopify__orders o
left join stg_shopify__refunds r
  on o.order_id = r.order_id
  and r.refunded_at >= date '2026-01-01';

This is not just a SQL style issue.

It changes which business events are included in the report.

Reconciliation queries should become controls

Many teams use reconciliation queries only during debugging.

That is a missed opportunity.

If a join can silently duplicate revenue, the check should become part of the reporting path. It should be run repeatedly, not only after an executive complains.

Useful checks include:

  • Does the row count increase after the join?
  • Does the sum of revenue change after adding descriptive fields?
  • Are join keys unique where the model expects uniqueness?
  • Are there source records that fail to match a required dimension?
  • Are there duplicate keys in a dimension table that should be one row per entity?

For example:

with base as (
  select
    count(*) as row_count,
    sum(order_revenue) as revenue
  from stg_shopify__orders
),

joined as (
  select
    count(*) as row_count,
    sum(o.order_revenue) as revenue
  from stg_shopify__orders o
  left join stg_shopify__order_lines l
    on o.order_id = l.order_id
)

select
  base.row_count as base_rows,
  joined.row_count as joined_rows,
  base.revenue as base_revenue,
  joined.revenue as joined_revenue
from base
cross join joined;

If revenue changes after joining a table that should only add product-line context, the model needs inspection.

What to inspect when a dashboard number looks wrong

When a dashboard cannot be reconciled, do not start by blaming the chart.

Start with the path.

Ask:

  • What is the grain of the source table?
  • What is the grain of the final model?
  • Which joins happen between source and report?
  • Are the join keys unique on the expected side?
  • Are filters applied before or after joins?
  • Are aggregations happening before or after fanout risk?
  • Is the dashboard recreating joins that should live upstream?
  • Does the metric have a written definition and a reporting contract?

These questions connect SQL mechanics to source-to-report lineage.

The aim is not to turn every executive into a SQL reviewer. The aim is to make the risk visible enough that the data team can protect important metrics.

What good looks like

For high-value reporting models, good join practice looks like this:

  • The model grain is written down.
  • Join assumptions are tested.
  • Dimensions expected to be unique are checked for uniqueness.
  • Fact tables are aggregated before joining when needed.
  • Dashboard calculations do not recreate fragile warehouse logic.
  • Reconciliation checks compare source totals to final model totals.
  • Known caveats are visible to users.

This is how analytics engineering lowers the Invisible Data Tax.

The business should not need to rediscover join problems during every reporting dispute. The system should catch the common failure modes before they reach the meeting.

That is the practical difference between SQL that runs and reporting logic that can be trusted.

Technical note

Use this as supporting context

This article is part of a low-profile technical series behind the Reporting Trust framework. For the executive-level problem, start with the business-facing articles on mismatched numbers and source-to-report lineage.

Start with the business problem