Your Warehouse Isn't Your CDP: The 50x Compute Bill Nobody Saw


3 May 2026 · 7 min read · James O'Brien

Think of your data warehouse like the main reservoir behind a city: enormous, well-governed, perfect for keeping the supply honest. Your CDP is the pressurised mains under the streets. They're related, they're connected, but if you try to turn the reservoir itself into the plumbing in every kitchen, the pipes burst and the water bill becomes someone else's problem. That's the conversation a lot of platform leads are having right now, and the numbers are starting to land.

The headline figure: pushing audience refreshes from daily to hourly can drive warehouse compute up 25x, and going near real-time pushes it 50x or more. Those costs don't appear on the CDP invoice. They show up on the warehouse bill, which is a different team, a different budget, and usually a different argument.

The Numbers

The 25x and 50x figures are the part that should make any CTO sit up straight. As Oracle Blogs laid out on April 30, 2026, those multipliers describe what happens when an organisation tries to turn a zero-copy or composable CDP into the operational layer for real-time marketing. Moving from daily to hourly refreshes is roughly 25x; pushing on to near real-time is 50x or more. And critically, that compute hits the warehouse line item, not the CDP vendor's invoice.

Anyone who has explained a surprise Snowflake or BigQuery bill to a CFO knows how this conversation goes. The marketing team wanted faster triggers. The data team built more frequent refreshes. The warehouse spun up more virtual warehouses, more concurrency, more autosuspend cycles that never got to suspend. Six months later, finance is asking why analytics costs doubled when "we didn't add any new dashboards".

The piece, written by Jake Spencer, Senior Outbound Product Manager at Oracle Fusion Marketing, frames the past 18 months in MarTech and CX as a period of significant change, with AI agent strategies, cost optimisation drives, and old questions about data ownership all colliding. Two technologies keep ending up in the middle of those conversations: the data lake or warehouse, and the CDP. Oracle itself, the article notes, integrates Oracle Unity Data Platform with the Oracle Autonomous AI Lakehouse to unify, manage, and activate enterprise-wide first-party data, so the vendor isn't claiming the warehouse is irrelevant. Quite the opposite.

What the numbers really tell you is that the unit economics of "just query the warehouse" are non-linear. Latency requirements compress and compute costs expand on something close to an exponential curve. If you've ever read the Snowflake docs closely on warehouse sizing and concurrency scaling, none of this is surprising in theory. In practice, most teams price the architecture against a daily refresh assumption, then quietly let product managers ask for hourly, then real-time, without ever rerunning the numbers.
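Rerunning the numbers doesn't have to be sophisticated. Here's a deliberately crude sketch: take the article's multipliers at face value and see what they do to a hypothetical baseline spend. The baseline figure below is invented; only the 25x and 50x come from the piece.

```python
# Quick sanity check on what the article's multipliers mean in absolute terms.
# The monthly baseline is a made-up assumption; the 25x / 50x multipliers are the cited figures.

BASELINE_MONTHLY_SPEND = 8_000  # hypothetical warehouse spend at daily refresh, in dollars

SCENARIOS = {
    "daily refresh (baseline)": 1,
    "hourly refresh": 25,
    "near real-time refresh": 50,
}

for label, multiplier in SCENARIOS.items():
    monthly = BASELINE_MONTHLY_SPEND * multiplier
    print(f"{label:<28} ~${monthly:>10,.0f}/month   ~${monthly * 12:>12,.0f}/year")
```

Even with a modest baseline, the hourly and near real-time rows land in territory that a marketing team would never knowingly sign off on, which is exactly why the assumption needs rerunning every time the latency requirement tightens.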

What's Actually New

The composable CDP debate has been doing the rounds for two or three years now. What's genuinely different this cycle is the rise of embedded AI agents in marketing and sales workflows, and the latency profile they demand. The article is explicit: these agents need sub-second access to precomputed profiles. That's not a "tune your warehouse" problem. That's a "you need a different system" problem.

Here's the guts of it. Analytical queries and operational lookups are different shapes of work. A warehouse, even a fast one, is optimised for scanning lots of rows to answer a question. An activation layer needs to grab one profile, fully resolved, in single-digit milliseconds, while ten thousand other agents are doing the same thing concurrently. You can build that on top of a warehouse. You'll just pay for it twice: once in compute, once in engineering hours keeping it from falling over.
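To make the "different shapes of work" point concrete, here's a purely illustrative sketch with in-memory stand-ins for the warehouse table and the profile store: the same customer answered two ways. The names (`events`, `profiles`) are hypothetical.

```python
# Two shapes of work against the same customer data (illustrative, in-memory stand-ins).

# Analytical shape: scan many rows to answer a question.
events = [
    {"customer_id": "c-1", "event": "page_view", "value": 1},
    {"customer_id": "c-2", "event": "purchase", "value": 120},
    {"customer_id": "c-1", "event": "purchase", "value": 45},
    # ...millions more rows in a real warehouse
]

def analytical_scan(customer_id: str) -> dict:
    """Warehouse-style: touch every row, aggregate on the fly. Fine for analysis,
    expensive when thousands of agents each trigger their own scan."""
    spend = sum(e["value"] for e in events
                if e["customer_id"] == customer_id and e["event"] == "purchase")
    views = sum(1 for e in events
                if e["customer_id"] == customer_id and e["event"] == "page_view")
    return {"customer_id": customer_id, "lifetime_spend": spend, "page_views": views}

# Operational shape: one keyed read against a profile resolved ahead of time.
profiles = {
    "c-1": {"lifetime_spend": 45, "page_views": 1, "segment": "new_buyer"},
    "c-2": {"lifetime_spend": 120, "page_views": 0, "segment": "repeat_buyer"},
}

def profile_lookup(customer_id: str) -> dict:
    """Activation-style: O(1) fetch of a precomputed, fully resolved profile."""
    return profiles[customer_id]

print(analytical_scan("c-1"))
print(profile_lookup("c-1"))
```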

The B2B angle is also sharper than it used to be. The article points out the obvious-but-frequently-ignored truth that in B2B, the "customer" is a group, not an individual. That means account hierarchies, buying group mapping, multi-contact engagement tracking, and sales and marketing alignment, all of which want persistent data structures, not ad hoc joins against raw tables. If you've ever tried to model a buying group in pure SQL on top of a Bronze layer, you know that's exactly where it falls over.

The other genuinely new bit is the explicit acknowledgement that there are four integration methods, not one. Zero-copy or federated query. Batch ingestion. Real-time streaming. Precomputed profile persistence. The argument isn't "pick the right one". It's "you'll need all four, matched to specific use cases". Batch ingestion is fine when freshness is measured in hours. Real-time streaming is essential for sub-second use cases. Treating any single pattern as a complete CDP strategy is where teams optimise for data location instead of business outcomes.
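If you want to make "matched to specific use cases" something a team can actually argue about, a rough chooser like the one below helps. The four method names come from the article; the inputs and thresholds are my own illustrative assumptions.

```python
# Rough per-scenario chooser for the four integration methods the article lists.
# Thresholds and inputs are illustrative assumptions, not rules from the piece.

def choose_integration(freshness_s: float, keyed_profile_read: bool, governance_sensitive: bool) -> str:
    if keyed_profile_read and freshness_s < 60:
        return "precomputed profile persistence"   # sub-second reads of resolved profiles
    if freshness_s < 60:
        return "real-time streaming"               # sub-second behavioural event triggers
    if governance_sensitive:
        return "zero-copy / federated query"       # data stays under warehouse controls
    return "batch ingestion"                       # freshness measured in hours

print(choose_integration(1, keyed_profile_read=True, governance_sensitive=False))          # AI agent profile read
print(choose_integration(1, keyed_profile_read=False, governance_sensitive=False))         # behavioural trigger
print(choose_integration(86_400, keyed_profile_read=False, governance_sensitive=True))     # reference attributes
print(choose_integration(4 * 3_600, keyed_profile_read=False, governance_sensitive=False)) # audience refresh
```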

What's Priced In for Data Teams

Most senior data engineers already know zero-copy is great for governance and compliance. That part is priced in. Keeping data inside the warehouse means existing controls (access, encryption, residency) apply automatically, and you stop arguing about which system holds the canonical customer record. Reduced duplication, lower storage costs, and the ability to activate curated data without replicating it: all real, all welcome.

What's not priced in, in my experience, is the gap between "we can query the warehouse from the activation layer" and "we should query the warehouse from the activation layer, every time, for every decision, in production". The boring bit nobody wants to model is concurrency. One pricing-page trigger is cheap. Ten thousand concurrent triggers across a Black Friday campaign is a different beast entirely, and the warehouse autoscaling behaviour under that load is where budgets go to die.
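A crude way to model that boring bit is to apply Little's law to a peak trigger rate and see how many clusters end up running. Every number below is an illustrative assumption; the shape of the result is the point.

```python
# Back-of-envelope: how peak-event concurrency translates into extra clusters.
# All inputs are illustrative assumptions.

import math

peak_triggers_per_minute = 10_000            # Black Friday-style spike (assumption)
avg_query_seconds = 2.5                      # time each lookup holds a slot (assumption)
concurrent_queries = (peak_triggers_per_minute / 60) * avg_query_seconds  # Little's law

slots_per_cluster = 8                        # assumed concurrent-query capacity per cluster
clusters_needed = math.ceil(concurrent_queries / slots_per_cluster)

print(f"~{concurrent_queries:.0f} queries in flight at peak -> "
      f"roughly {clusters_needed} clusters scaled out, billed for as long as the spike lasts")
```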

Identity resolution at scale is the other under-appreciated cost. Stitching identities across devices and channels on demand via warehouse queries is compute-intensive in a way that compounds with every additional source. Pre-resolved identity graphs sitting in a purpose-built store are dramatically cheaper per lookup, and the article makes the point cleanly. Teams using dbt to materialise resolution logic into wide profile tables on a schedule are already halfway to the right answer; they just need somewhere fast to serve those tables from at activation time.
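For the "somewhere fast" part, the sketch below copies a dbt-materialised wide profile table into a key-value store keyed by customer ID, so activation reads become a single fetch. Redis (via the `redis` Python client), the table name, and the columns are all illustrative assumptions, not something the article prescribes.

```python
# Minimal sketch: push a dbt-materialised wide profile table into a key-value
# store so activation reads are a single keyed fetch, not a warehouse query.
# Redis and all table/column names here are illustrative assumptions.

import redis

# In practice these rows would be exported from the warehouse table your dbt
# model materialises (hypothetical name: dim_customer_profile).
profile_rows = [
    {"customer_id": "c-1", "lifetime_spend": 45.0, "segment": "new_buyer", "buying_group": "acme-emea"},
    {"customer_id": "c-2", "lifetime_spend": 120.0, "segment": "repeat_buyer", "buying_group": "acme-emea"},
]

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Write each resolved profile as a hash keyed by customer ID, batched in one round trip.
pipe = r.pipeline()
for row in profile_rows:
    pipe.hset(f"profile:{row['customer_id']}", mapping=row)
pipe.execute()

# Activation-time read: one keyed lookup instead of a warehouse scan.
print(r.hgetall("profile:c-1"))
```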

Contrarian View

Now the other side of the bridge. There's a real argument that for a meaningful chunk of organisations, the zero-copy-only approach is fine, and the 50x compute warning is a vendor-flavoured scare story. If your activation scenarios are mostly hourly email sends, weekly lookalike audience builds, and quarterly campaign analysis, you do not need sub-second precomputed profiles. You need a clean warehouse, well-modelled marts, and a thin activation layer that pulls segments out on a schedule.

For those teams, adding a separate CDP with its own profile store is duplication for the sake of an architecture diagram. The warehouse compute bill at hourly refresh, on a properly sized cluster with reasonable caching, is not in practice 25x the daily cost for every workload; 25x is the worst-case interpretation. A team that knows what it's doing with materialisation, incremental models, and result caching can flatten that curve significantly.

The honest read is that the answer depends almost entirely on your activation scenarios. Which is exactly why the article's recommendation to define five to ten concrete activation scenarios before evaluating CDP architecture is the most useful sentence in it. Real-time triggers, buying group engagement, predictive lead scoring, campaign orchestration, sales alerts: write them down, attach a latency requirement and a volume estimate to each, and the architecture mostly designs itself.
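Here's what "write them down" can look like as a first artefact, with invented scenarios, latencies, and volumes; the rough pattern mapping at the end is my own shorthand, not the article's.

```python
# A starting-point inventory of activation scenarios with explicit latency and
# volume targets. All entries are invented examples, not recommendations.

from dataclasses import dataclass

@dataclass
class ActivationScenario:
    name: str
    latency_target_ms: int       # how fresh/fast the decision must be
    peak_volume_per_min: int     # expected peak request or event volume

scenarios = [
    ActivationScenario("Cart-abandonment trigger", 500, 12_000),
    ActivationScenario("AI agent profile lookup", 200, 30_000),
    ActivationScenario("Buying-group engagement alert to sales", 60_000, 500),
    ActivationScenario("Predictive lead scoring refresh", 3_600_000, 50),
    ActivationScenario("Weekly lookalike audience build", 86_400_000, 1),
]

for s in scenarios:
    pattern = ("streaming + precomputed profiles" if s.latency_target_ms < 1_000
               else "precomputed profiles" if s.latency_target_ms < 300_000
               else "batch or zero-copy from the warehouse")
    print(f"{s.name:<42} {s.latency_target_ms:>10} ms  {s.peak_volume_per_min:>8}/min  -> {pattern}")
```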

Key Takeaways

  • Daily to hourly audience refreshes can push warehouse compute 25x; near real-time can push it 50x or more, and the bill lands on the warehouse line item, not the CDP invoice.
  • Zero-copy architectures genuinely win on governance, compliance, reduced duplication, and activating existing investments, but break down for sub-second AI agent workloads and identity resolution at scale.
  • There are four integration methods worth knowing (zero-copy or federated query, batch ingestion, real-time streaming, precomputed profile persistence), and mature architectures use all of them, matched to use cases.
  • B2B activation needs persistent structures for account hierarchies, buying groups, and multi-contact engagement, not ad hoc queries against raw tables.
  • Define five to ten activation scenarios with explicit latency and volume requirements before picking architecture; the right pattern falls out of the requirements, not the other way round.

Back to the reservoir. Nobody sensible argues you don't need the reservoir. The argument is whether you also need the pressurised mains, and the answer for most enterprise customer data strategies in 2026 is yes, you do. The warehouse and the CDP are not competing for the same job. They're two parts of the same water system, and the engineering decision is which pipe carries which load. Get that wrong and the bill arrives in a place nobody was watching.

Frequently Asked Questions

Q: Why does moving from daily to hourly audience refreshes increase warehouse compute costs by 25x?

Each refresh spins up warehouse capacity for audience builds, segment recalculation, and profile lookups. Increasing frequency multiplies the number of compute cycles, and concurrency requirements compound the effect. The relationship between latency targets and compute cost is non-linear, which is why pushing further toward real-time can drive costs up 50x or more.

Q: When does a zero-copy or composable CDP architecture actually make sense?

Zero-copy works well for governance-sensitive reference data, enrichment attributes that change infrequently, analytical workloads, and cases where the warehouse already serves as the operational system of record. It struggles with sub-second AI agent lookups, real-time triggers, identity resolution at scale, and B2B buying group modelling, where precomputed and persistent profile stores perform better.

Q: How should engineering teams decide between batch ingestion, streaming, and zero-copy for CDP integration?

Match the method to the use case. Batch ingestion suits large historical loads where freshness is measured in hours. Real-time streaming is essential for sub-second behavioural use cases. Zero-copy fits governance and reference scenarios. The recommended starting point is to define five to ten concrete activation scenarios with latency and volume requirements, then choose patterns per scenario rather than committing to one method across the whole stack.
