Technical note

Star Schema, One Big Table, Data Vault: How to Choose Without Dogma

  • Analytics Engineering
  • Data Modeling
  • Star Schema
  • Reporting Trust

Data modeling debates can become strangely religious.

Star schema. One big table. Data Vault. Normalised models. Wide marts. Semantic layers.

Each pattern has serious practitioners and real use cases.

But the useful question is not, “Which pattern is best?”

The useful question is, “Which pattern makes this reporting path more trustworthy for this business decision?”

For Reporting Trust, modeling patterns are not ideology.

They are tradeoffs.

The symptom: the architecture debate replaces the reporting problem

At Brightside Bikes, the leadership team does not care whether the weekly operating report is powered by a star schema, one big table, or a vault-inspired integration layer.

They care whether the numbers are trusted.

They want to know:

  • Why Shopify Net Revenue differs from Finance Revenue.
  • Which product categories are driving margin.
  • Whether paid media is acquiring profitable customers.
  • Whether inventory constraints are suppressing sales.
  • Whether the board pack can be reconciled to operational reporting.

If the modeling debate does not improve those answers, it is a distraction.

The architecture should serve the reporting contract.

Star schema: strong for repeated business questions

A star schema separates facts from dimensions.

The fact table records the event. Dimension tables describe the event.

For Brightside Bikes, a simple star might include:

  • fct_order_lines
  • dim_customer
  • dim_sku
  • dim_calendar
  • dim_channel

This pattern works well when the business repeatedly asks similar questions across stable dimensions.

For example:

select
  c.reporting_week,
  s.product_category,
  ch.channel_group,
  sum(ol.net_revenue_amount) as net_revenue,
  sum(ol.quantity) as units_sold
from fct_order_lines ol
left join dim_calendar c
  on ol.order_date = c.calendar_date
left join dim_sku s
  on ol.sku_id = s.sku_id
left join dim_channel ch
  on ol.channel_id = ch.channel_id
group by 1, 2, 3;

The benefit is clarity.

Users can understand the event and the context separately. Tests can protect keys and grain. Dimensions can be reused across reports.

The risk is that star schemas require discipline.

If dimensions are poorly maintained, keys are duplicated, or the fact grain is unclear, the star does not protect trust by itself.

One big table: strong for usability, risky for ambiguity

One big table means creating a wide model that contains many of the fields a dashboard or analyst needs in a single place.

For Brightside Bikes, that might be:

  • One row per order line.
  • Revenue, cost, margin, customer segment, product category, channel, and week all included.

This can be useful.

It reduces join mistakes for downstream users. It can make BI dashboards faster. It gives analysts one obvious place to start.

Example:

select
  order_line_id,
  order_id,
  customer_id,
  order_week,
  product_category,
  channel_group,
  customer_lifecycle_stage,
  net_revenue_amount,
  standard_unit_cost,
  contribution_margin_amount
from mart_order_line_commercials;

The benefit is accessibility.

The risk is hidden complexity.

If the wide table mixes grains, includes stale dimension attributes, or silently bakes in multiple definitions, users may trust it because it is convenient rather than because it is well governed.

One big table can be a good consumption layer.

It should not become a dumping ground for unresolved modeling decisions.

Data Vault: strong for auditability and change, heavy for direct reporting

Data Vault patterns separate business keys, relationships, and descriptive history.

The approach can be useful when source systems change often, auditability is critical, or the business needs to preserve historical relationships carefully.

For Brightside Bikes, a vault-inspired integration layer might track:

  • Customer business keys across Shopify, HubSpot, and support systems.
  • Product keys across Shopify and inventory files.
  • Order relationships across Shopify, Stripe, and fulfilment.
  • Historical changes to customer or SKU attributes.

The benefit is traceability.

The risk is complexity.

Data Vault structures are usually not friendly for direct executive reporting. They often need downstream business marts, star schemas, or wide tables before users can consume them confidently.

That does not make the pattern wrong.

It means the pattern solves a different problem.

The decision is about trust risk

The right modeling choice depends on what can go wrong.

PatternUseful whenWatch out for
Star schemaRepeated reporting across stable facts and dimensionsPoorly defined grain or duplicate dimension keys
One big tableDashboard usability and fewer downstream joins matterHidden business rules and mixed grains
Data VaultAuditability, source change, and historical traceability matterToo much complexity for direct reporting users

The best architecture may use more than one pattern.

Brightside Bikes might use staging models for raw source cleanup, a vault-like integration layer for customer identity history, star schemas for core order reporting, and wide marts for specific dashboards.

That is not inconsistency.

It is architecture serving different jobs.

Model layers can have different audiences

One mistake is expecting one model shape to satisfy every user.

Engineers may need traceability.

Analysts may need flexible facts and dimensions.

Executives may need stable marts.

Finance may need reconciliation and audit trails.

The same source data can support multiple layers:

  • Raw and staging models preserve source meaning.
  • Integration models resolve keys and relationships.
  • Business models define facts, dimensions, and approved logic.
  • Marts present decision-ready views.

The important thing is that the path between layers is clear.

Without lineage, multiple model shapes can become another source of confusion.

Brightside Bikes: choosing by reporting question

For weekly trading performance, Brightside Bikes may use a star schema.

The business needs to slice order lines by product, channel, week, and customer segment. Facts and dimensions fit that job well.

For the executive operating dashboard, the team may publish a wide mart.

The dashboard should not have to join six models every time someone opens it. It needs approved metrics at the right grain.

For customer identity resolution, the team may need a more historical integration pattern.

Shopify, HubSpot, and support records may identify the same customer differently. A simple dashboard table may hide those changes.

The correct answer is not one pattern everywhere.

The correct answer is controlled translation between patterns.

SQL should reveal the modeling intent

Good model SQL should make the intended pattern obvious.

For a star-style mart:

select
  ol.order_line_id,
  ol.order_id,
  ol.customer_id,
  ol.sku_id,
  ol.channel_id,
  ol.order_date,
  ol.net_revenue_amount,
  ol.quantity
from fct_order_lines ol;

For a dashboard-ready wide mart:

select
  ol.order_week,
  s.product_category,
  ch.channel_group,
  count(distinct ol.order_id) as orders,
  sum(ol.net_revenue_amount) as net_revenue,
  sum(ol.contribution_margin_amount) as contribution_margin
from fct_order_lines ol
left join dim_sku s
  on ol.sku_id = s.sku_id
left join dim_channel ch
  on ol.channel_id = ch.channel_id
group by 1, 2, 3;

Both can be valid.

They serve different users.

What to inspect first

Before choosing a modeling pattern, inspect:

  • Who will use the model?
  • What decision does it support?
  • How stable are the source systems?
  • How often do definitions change?
  • Does the business need audit history?
  • Is the main risk usability, traceability, performance, or correctness?
  • Where should joins happen?
  • What model grain does the dashboard need?
  • How will tests protect keys and relationships?

The answer should come from the reporting risk, not personal preference.

What good looks like

Good architecture does not mean one pattern wins every argument.

It means each layer has a clear job.

For Brightside Bikes:

  • Source-aligned models preserve operational facts.
  • Integration models resolve identities and relationships.
  • Star schemas make core business processes reusable.
  • Wide marts make priority dashboards fast and clear.
  • Reconciliation models explain differences between legitimate definitions.

That is how modeling patterns reduce the Invisible Data Tax.

They stop teams from rebuilding meaning in every dashboard and meeting.

The goal is not a perfect schema.

The goal is a reporting path the business can inspect, trust, and maintain.

Technical note

Use this as supporting context

This article is part of a low-profile technical series behind the Reporting Trust framework. For the executive-level problem, start with the business-facing articles on mismatched numbers and source-to-report lineage.

Start with the business problem