Technical note
Data Modeling for Reporting Trust
Data modeling is often treated like a technical design exercise.
Tables. Keys. Schemas. Relationships. Naming conventions. Materializations.
Those details matter.
But for Reporting Trust, a model is something more practical:
It is a set of business decisions made explicit in data.
When a company models revenue, margin, active customers, pipeline, fulfilment, or retention, it is deciding what counts, what does not count, when an event belongs to a period, which source is authoritative, and which level of detail is safe for the decision.
If those decisions are not made deliberately, they get made accidentally in SQL, spreadsheets, dashboards, and meeting explanations.
That is where the Invisible Data Tax begins.
The symptom: the warehouse has data, but not answers
Brightside Bikes may have all the raw data it needs:
- Shopify orders.
- Stripe charges and refunds.
- Product and SKU data.
- Inventory snapshots.
- Ad spend exports.
- Customer records.
- Finance adjustments.
The data is loaded.
The warehouse is working.
The dashboards render.
But leadership still asks why revenue, margin, and customer counts do not match across reports.
That usually means the company does not have a data availability problem.
It has a modeling problem.
Operational shape is not reporting shape
Source systems store data in the shape needed to run the business process.
Analytical models store data in the shape needed to understand the business process.
Those are different shapes.
A Shopify order record exists so the store can process a transaction. A reporting order fact exists so the business can count, group, reconcile, and explain orders consistently.
The source record may include fields that are perfect for operations but awkward for reporting:
- Current fulfilment state.
- Payment status.
- Discount applications.
- Shipping lines.
- Refund objects.
- Customer references.
- Test flags.
- Timestamps in different zones.
A reporting model needs to turn that operational detail into a controlled analytical event.
For example:
select
order_id,
customer_id,
date_trunc('day', order_created_at) as order_date,
date_trunc('week', order_created_at) as order_week,
net_revenue_amount,
refund_amount,
discount_amount,
tax_amount,
shipping_amount,
is_test_order,
is_cancelled
from stg_shopify_orders
where is_test_order = false;
This is still not the final business model.
But it begins to separate operational mess from reporting intent.
Three levels of modeling decisions
A useful model usually contains three kinds of decisions.
First, conceptual decisions.
These define the business idea. What is an order? What is a customer? What is active? What is revenue? What is margin?
Second, logical decisions.
These define how the business idea relates to data. Which source systems contribute? What is the grain? What keys connect the entities? What timing rule applies?
Third, physical decisions.
These define how the model is implemented. Is it a table, view, incremental model, snapshot, aggregate, or semantic model? How is it partitioned? How often does it refresh?
Reporting disputes often happen because teams jump straight to the physical layer.
They ask whether a table exists before agreeing what the table means.
Model around decisions, not just sources
If Brightside Bikes models only around sources, it may produce tables like:
shopify_ordersstripe_chargesstripe_refundsgoogle_ads_campaignsinventory_file
Those are useful staging inputs.
They are not enough for executive reporting.
Leadership asks questions like:
- What was net revenue last week?
- Which product categories drove margin?
- Did paid media acquire profitable customers?
- Which stock constraints limited sales?
- How much of the Finance variance is timing versus logic?
Those questions need business models:
fct_ordersfct_refundsdim_customerdim_skumart_weekly_revenue_bridgemart_weekly_operating_report
The model names should reflect the reporting job, not only the source system.
Grain is the first trust decision
Every model needs a grain.
The grain answers: one row represents what?
For Brightside Bikes:
- One row per Shopify order.
- One row per order line.
- One row per customer per week.
- One row per SKU per warehouse snapshot.
- One row per metric per reporting week.
If the grain is not explicit, every join and aggregation becomes risky.
A model can have correct source data and still produce wrong reporting if users aggregate it at the wrong level.
That is why grain should be documented near the model, checked in tests, and visible in review.
A model turns a definition into a controlled asset
Suppose Brightside Bikes defines Weekly Net Revenue as completed Shopify order revenue, net of discounts and refunds, excluding test orders, grouped by order week.
That definition should not live only in a dashboard field.
It should become a controlled model:
with orders as (
select *
from fct_orders
where is_test_order = false
and is_cancelled = false
),
weekly_revenue as (
select
order_week as reporting_week,
sum(net_revenue_amount) as shopify_net_revenue,
sum(refund_amount) as refund_amount,
count(distinct order_id) as completed_orders
from orders
group by 1
)
select *
from weekly_revenue;
This model is not just a query.
It is a business rule in executable form.
That means it needs ownership, tests, lineage, and review.
Modeling choices create commercial consequences
Modeling can feel abstract until the number reaches a meeting.
Then the choices become commercial.
If returns are assigned to refund date rather than order date, the weekly trend changes.
If customers are counted at email level rather than account level, retention changes.
If product cost is joined at the wrong SKU grain, margin changes.
If ad spend is allocated by campaign date but revenue is grouped by order date, CAC changes.
These are not minor technical preferences.
They shape the business story.
Reporting Trust means those choices are visible before they become arguments.
What to inspect first
For any important model, inspect:
- What business question does this model support?
- What is the grain?
- Which source systems feed it?
- Which definitions are applied?
- Which records are excluded?
- Which timing rule is used?
- Which downstream dashboards depend on it?
- What tests prove the model is safe?
- Who owns changes to the model?
If the team cannot answer those questions quickly, the model may be technically useful but commercially fragile.
What good looks like
Good data modeling does not require perfect architecture.
It requires explicit choices.
For Brightside Bikes, a good reporting model makes it clear that Shopify Net Revenue, Finance Revenue, and Campaign Revenue are different views of the business. It gives each one a defined grain, source path, caveat, and owner.
The business can then disagree productively.
Instead of asking, “Which number is right?”, the team can ask, “Which definition is appropriate for this decision?”
That is the purpose of modeling for Reporting Trust.
Not to create elegant tables for their own sake.
To make business logic inspectable, repeatable, and safe enough to use.