Technical note

Data Modelling for Reporting Trust

By Gavin Somers · 8 min read · Published 18 June 2026 · Updated 25 June 2026

Data modelling is often treated like a technical design exercise.

Tables. Keys. Schemas. Relationships. Naming conventions. Materialisations.

Those details matter.

But for Reporting Trust, a model is something more practical:

It is a set of business decisions made explicit in data.

When a company models revenue, margin, active customers, pipeline, fulfilment, or retention, it is deciding what counts, what does not count, when an event belongs to a period, which source is authoritative, and which level of detail is safe for the decision.

If those decisions are not made deliberately, they get made accidentally in SQL, spreadsheets, dashboards, and meeting explanations.

That is where the Invisible Data Tax begins.

It is also where the semantic layer starts to matter. A semantic layer can expose controlled metric meaning, but the underlying models still need clear grain, source logic, caveats, and ownership.

The symptom: the warehouse has data, but not answers

Brightside Bikes may have all the raw data it needs:

Shopify orders.
Stripe charges and refunds.
Product and SKU data.
Inventory snapshots.
Ad spend exports.
Customer records.
Finance adjustments.

The data is loaded.

The warehouse is working.

The dashboards render.

But leadership still asks why revenue, margin, and customer counts do not match across reports.

That usually means the company does not have a data availability problem.

It has a modelling problem.

Operational shape is not reporting shape

Source systems store data in the shape needed to run the business process.

Analytical models store data in the shape needed to understand the business process.

Those are different shapes.

A Shopify order record exists so the store can process a transaction. A reporting order fact exists so the business can count, group, reconcile, and explain orders consistently.

The source record may include fields that are perfect for operations but awkward for reporting:

Current fulfilment state.
Payment status.
Discount applications.
Shipping lines.
Refund objects.
Customer references.
Test flags.
Timestamps in different zones.

A reporting model needs to turn that operational detail into a controlled analytical event.

For example:

select
  order_id,
  customer_id,
  date_trunc('day', order_created_at) as order_date,
  date_trunc('week', order_created_at) as order_week,
  net_revenue_amount,
  refund_amount,
  discount_amount,
  tax_amount,
  shipping_amount,
  is_test_order,
  is_cancelled
from stg_shopify_orders
where is_test_order = false;

This is still not the final business model.

But it begins to separate operational mess from reporting intent.

Three levels of modelling decisions

A useful model usually contains three kinds of decisions.

First, conceptual decisions.

These define the business idea. What is an order? What is a customer? What is active? What is revenue? What is margin?

Second, logical decisions.

These define how the business idea relates to data. Which source systems contribute? What is the grain? What keys connect the entities? What timing rule applies?

Third, physical decisions.

These define how the model is implemented. Is it a table, view, incremental model, snapshot, aggregate, or semantic model? How is it partitioned? How often does it refresh?

Reporting disputes often happen because teams jump straight to the physical layer.

They ask whether a table exists before agreeing what the table means.

Model around decisions, not just sources

If Brightside Bikes models only around sources, it may produce tables like:

shopify_orders
stripe_charges
stripe_refunds
google_ads_campaigns
inventory_file

Those are useful staging inputs.

They are not enough for executive reporting.

Leadership asks questions like:

What was net revenue last week?
Which product categories drove margin?
Did paid media acquire profitable customers?
Which stock constraints limited sales?
How much of the Finance variance is timing versus logic?

Those questions need business models:

fct_orders
fct_refunds
dim_customer
dim_sku
mart_weekly_revenue_bridge
mart_weekly_operating_report

The model names should reflect the reporting job, not only the source system.

Grain is the first trust decision

Every model needs a grain.

The grain answers: one row represents what?

For Brightside Bikes:

One row per Shopify order.
One row per order line.
One row per customer per week.
One row per SKU per warehouse snapshot.
One row per metric per reporting week.

If the grain is not explicit, every join and aggregation becomes risky.

A model can have correct source data and still produce wrong reporting if users aggregate it at the wrong level.

That is why grain should be documented near the model, checked in tests, and visible in review.

A model turns a definition into a controlled asset

Suppose Brightside Bikes defines Weekly Net Revenue as completed Shopify order revenue, net of discounts and refunds, excluding test orders, grouped by order week.

That definition should not live only in a dashboard field.

It should become a controlled model:

with orders as (
  select *
  from fct_orders
  where is_test_order = false
    and is_cancelled = false
),

weekly_revenue as (
  select
    order_week as reporting_week,
    sum(net_revenue_amount) as shopify_net_revenue,
    sum(refund_amount) as refund_amount,
    count(distinct order_id) as completed_orders
  from orders
  group by 1
)

select *
from weekly_revenue;

This model is not just a query.

It is a business rule in executable form.

That means it needs ownership, tests, lineage, and review.

If the metric will be reused through dashboards, self-serve analytics, or AI tools, it also needs to pass the Semantic Layer Gate: the point where transformed data is checked for trusted meaning before interpretation.

Modeling choices create commercial consequences

Modeling can feel abstract until the number reaches a meeting.

Then the choices become commercial.

If returns are assigned to refund date rather than order date, the weekly trend changes.

If customers are counted at email level rather than account level, retention changes.

If product cost is joined at the wrong SKU grain, margin changes.

If ad spend is allocated by campaign date but revenue is grouped by order date, CAC changes.

These are not minor technical preferences.

They shape the business story.

Reporting Trust means those choices are visible before they become arguments.

What to inspect first

For any important model, inspect:

What business question does this model support?
What is the grain?
Which source systems feed it?
Which definitions are applied?
Which records are excluded?
Which timing rule is used?
Which downstream dashboards depend on it?
What tests prove the model is safe?
Who owns changes to the model?

If the team cannot answer those questions quickly, the model may be technically useful but commercially fragile.

What good looks like

Good data modelling does not require perfect architecture.

It requires explicit choices.

For Brightside Bikes, a good reporting model makes it clear that Shopify Net Revenue, Finance Revenue, and Campaign Revenue are different views of the business. It gives each one a defined grain, source path, caveat, and owner.

The business can then disagree productively.

Instead of asking, “Which number is right?”, the team can ask, “Which definition is appropriate for this decision?”

That is the purpose of modelling for Reporting Trust.

Not to create elegant tables for their own sake.

To make business logic inspectable, repeatable, and safe enough to use.

For the broader AI and semantic-layer context, read why AI needs a semantic layer and Reporting Trust vs Semantic Layer Trust.

Data Modelling for Reporting Trust

The symptom: the warehouse has data, but not answers

Operational shape is not reporting shape

Three levels of modelling decisions

Model around decisions, not just sources

Grain is the first trust decision

A model turns a definition into a controlled asset

Modeling choices create commercial consequences

What to inspect first

What good looks like

Useful terms for this article

Data Value Chain

Semantic Layer

Source-to-Report Lineage

Keep reading

How to Model JSON Event Data With SQL

How to Use AI in Analytics Engineering Without Damaging the Repo

Use this as supporting context