RiverCore
Databricks Bets $0 on ETL: LTAP Collapses OLAP and OLTP

19 Jun 2026 · 7 min read · Sarah Chen

Databricks walked into its San Francisco Data + AI Summit, an event it now sizes at more than 30,000 in-person attendees, and announced three things that, taken together, are an attempt to delete an entire category of data infrastructure spend: ETL pipelines, specialized real-time serving stacks, and standalone customer data platforms. The headline number is 16x. By Databricks' own framing, that is the upper-bound speedup customers report from the new Lakehouse//RT engine versus their existing specialized real-time serving stacks. The headline architectural claim is LTAP, which collapses transactional and analytical processing into one governed copy of data.

What Happened

Three launches landed in the same keynote. First, LTAP (Lake Transactional/Analytical Processing), a new architecture that puts Lakebase, described as serverless Postgres on open object storage, under the same Unity Catalog governance and storage layer as the Lakehouse. As Blocks & Files reported, LTAP stores data directly in Unity Catalog using open formats, so it works with any Postgres-speaking application on the write side and any Iceberg or Delta reader on the analytical side. LTAP is "coming soon" as part of Lakebase.

Second, Lakehouse//RT, the real-time tier, powered by a new compute engine called Reyden. Databricks claims millisecond query latency on governed Delta Lake and Apache Iceberg tables, with support for tens of thousands of concurrent users and agents. Customers report response times as low as 10ms on smaller datasets, sub-100ms on larger ones, and sub-100ms at 12,000 queries per second on standard analytical benchmarks. Lakehouse//RT is in Beta.

Third, CustomerLake, an agentic Customer Data Platform built natively on the Lakehouse, currently in Private Preview with HP, Circle K, AB InBev, and Getnet by Santander as named customers. It is Databricks' second push into a defined enterprise software market after Lakewatch in security, and it will be priced on a consumption model rather than a traditional software license.

CEO Ali Ghodsi framed the whole bundle in agent terms: "Organizations effectively doubled their workforce, just not with humans. Agents write code, make calls, and run loops at a pace human teams never could." His argument is that infrastructure built for human-paced analytics is now the bottleneck. Lakebase, by way of context, launched only last year and already reports thousands of customers including Block, Ensemble, Superhuman, and Zillow, plus 12 million database launches per day.

Technical Anatomy

The interesting engineering claim in LTAP is not "we did HTAP." Plenty of vendors have tried single-engine OLTP+OLAP, with mixed results on isolation and concurrency. Databricks is doing something narrower and arguably more honest: keep two execution engines (Postgres for transactions, the Lakehouse stack for analytics), but force them to read and write a single physical copy of data in Unity Catalog using open table formats. No CDC pipeline, no ETL job, no second-copy lag.

If that holds up under load, it changes the cost model in two places. One, the storage bill stops doubling for every dataset that needs both transactional and analytical access. Two, the freshness SLA on analytics goes from "minutes behind production" to "is production." Compare that against the prevailing pattern: Postgres or MySQL feeding Debezium or Fivetran into Delta or Snowflake, with dbt transformations downstream. That stack works, but it's three vendors and a perpetual reconciliation problem.

Reyden, the engine behind Lakehouse//RT, is the bigger technical question mark. The pitch is millisecond latency on open formats at 12,000 QPS, directly against Delta and Iceberg tables, with no proprietary format and no separate serving layer. That puts it in conversation with ClickHouse, Pinot, and Druid, which have spent years optimizing for exactly that latency profile. The source does not disclose the benchmark dataset size, the row width, the predicate selectivity, or the cache state, which matters because "sub-100ms at 12,000 QPS" can mean anything from "trivial point lookups on a warm cache" to "complex aggregations on cold storage." Until Databricks publishes a reproducible TPC-H or similar run with these numbers, the 16x customer figure is directional, not a benchmark.
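One sanity check that needs no undisclosed data is Little's law, which says the average number of requests in flight equals the arrival rate times the average time each request spends in the system. The sketch below applies it to the quoted figures. It assumes, purely for illustration, that "sub-100ms" and "10ms" can be read as average latencies; the announcement does not say which percentile they refer to, and the function name is made up for this example.

```python
def in_flight_queries(qps, latency_ms):
    """Little's law: average in-flight requests = arrival rate x average latency."""
    return qps * latency_ms / 1000.0


if __name__ == "__main__":
    # Figures quoted in the article; treating them as averages is an assumption.
    for latency_ms in (10, 100):
        n = in_flight_queries(12_000, latency_ms)
        print(f"12,000 QPS at {latency_ms} ms average -> about {n:.0f} queries in flight")
```

Under that reading, 12,000 QPS at a 100 ms average corresponds to roughly 1,200 queries in flight at any moment. That gives a concrete concurrency level to aim for when sizing a test, but it says nothing about query complexity or cache state, which is the gap the paragraph above describes.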

The Lakebase additions are the underrated part: cross-cloud, cross-region disaster recovery, Git-style branching and snapshots, and autonomous database operations where AI agents monitor health, propose indexes, and assist with recovery. Git-style branching against production data is the feature engineering teams will actually use on day one.

Who Gets Burned

The most exposed category is specialized real-time analytics vendors. If Lakehouse//RT delivers anything close to the 16x figure against existing serving stacks, the procurement question for a Databricks shop becomes uncomfortable: why pay for a second engine, a second permissions model, and a CDC pipeline to feed it? ClickHouse, Pinot, Rockset-style serving layers, and the various "real-time OLAP" startups all have to articulate what they do that Reyden does not. We do not know yet how Reyden behaves on high-cardinality joins or on workloads with heavy mutation, and the bound there matters: if it degrades on the join-heavy or mutation-heavy workloads that sit behind many product analytics dashboards, the 16x claim collapses to a narrow segment.

The next category is standalone CDP vendors. CustomerLake is a direct shot at Segment, mParticle, Treasure Data, and the activation layer of Adobe and Salesforce. The "infinity campaigns" framing, continuous agentic loops that personalize 1:1 a billion times a day, is marketing language, but the architectural argument under it is solid: if customer data already lives in the Lakehouse, bolting on a CDP that copies it back out is silly. Enterprises like AB InBev and HP being named as customers signals Databricks is going after CDP budgets at the Global 2000, not the mid-market.

iGaming and fintech teams should watch this closely. Both verticals run hot OLTP (bets, transactions, KYC events) feeding hungry analytics (fraud models, risk dashboards, personalization). The current pattern is Postgres or Aurora plus a streaming pipe to a warehouse. LTAP, if it delivers on the isolation promise, removes the streaming pipe. The unknown is concurrency under regulated-workload audit constraints: the source does not say how Unity Catalog audit logs scale at sustained transactional write rates, and that's the question a fintech CISO will ask first.

Playbook for Data Teams

This week, three concrete moves. First, audit how many copies of your hottest customer or transaction tables exist across your stack. Count the OLTP primary, every replica, every CDC sink, every warehouse copy, and every reverse-ETL endpoint. That number is your LTAP business case. If it is four or higher, the storage and reconciliation savings alone justify a pilot.
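The copy audit can be kept as a small script rather than a spreadsheet. The sketch below is a minimal illustration: the inventory entries, location labels, and function names are hypothetical placeholders to be replaced with what your own catalog shows, and the threshold of four is the rule of thumb stated above.

```python
from collections import Counter

# Hypothetical inventory: (logical table, where a physical copy lives).
# Replace with entries from your own stack; none of these names come from Databricks.
COPIES = [
    ("transactions", "oltp_primary"),
    ("transactions", "read_replica"),
    ("transactions", "cdc_sink"),
    ("transactions", "warehouse"),
    ("transactions", "reverse_etl"),
    ("customers", "oltp_primary"),
    ("customers", "warehouse"),
]

PILOT_THRESHOLD = 4  # rule of thumb from the text: four or more copies


def copy_counts(copies):
    """Count distinct physical locations per logical table."""
    return Counter(table for table, _location in set(copies))


def pilot_candidates(copies, threshold=PILOT_THRESHOLD):
    """Return tables whose copy count meets the threshold, sorted by name."""
    counts = copy_counts(copies)
    return sorted(table for table, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    for table, n in sorted(copy_counts(COPIES).items()):
        flag = "pilot candidate" if n >= PILOT_THRESHOLD else "below threshold"
        print(f"{table}: {n} copies ({flag})")
```

Deduplicating with `set(copies)` means a location listed twice for the same table is counted once, so the number reflects distinct physical copies rather than inventory rows.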

Second, do not migrate production to Lakehouse//RT Beta or LTAP "coming soon." Build a parallel measurement harness instead. Pick one real-time dashboard or one customer-facing feature flag service, mirror the workload, and measure p50, p95, and p99 latency on Reyden against your current serving layer with your data, not Databricks' benchmark data. The 16x is a customer-reported ceiling. Your number will be different and is the only one that matters.
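A measurement harness for this does not need to be elaborate. The sketch below is one possible shape, using only the Python standard library: it times a callable repeatedly and reports p50, p95, and p99. The two query functions are stand-ins that sleep, included only so the script runs; in practice each would issue the same mirrored query against your current serving layer and against Lakehouse//RT through whatever client you already use. No Databricks API is assumed here.

```python
import statistics
import time


def measure_latencies(run_query, n_requests=200):
    """Call run_query n_requests times; return per-call latency in milliseconds."""
    latencies_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def percentile_summary(latencies_ms):
    """Return p50, p95 and p99 (needs at least two samples)."""
    # quantiles(..., n=100) yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def compare(current, candidate, n_requests=200):
    """Measure both callables and return their percentile summaries."""
    return {
        "current": percentile_summary(measure_latencies(current, n_requests)),
        "candidate": percentile_summary(measure_latencies(candidate, n_requests)),
    }


if __name__ == "__main__":
    # Stand-ins only: replace with real calls that run the same mirrored query.
    def query_current():
        time.sleep(0.002)

    def query_candidate():
        time.sleep(0.001)

    for name, summary in compare(query_current, query_candidate, n_requests=50).items():
        print(name, {k: round(v, 2) for k, v in summary.items()})
```

This loop is single-client and sequential, so it measures latency for one caller and does not reproduce load at thousands of queries per second. For a concurrency test, the same percentile function can be fed samples collected by a proper load generator.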

Third, if you operate a CDP today, ask your account team what the migration path looks like before CustomerLake exits Private Preview. Pricing is consumption-based rather than per-seat licensed, which makes TCO modeling non-trivial. Get a usage estimate based on your actual event volume before the GA pricing sheet locks in.
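The arithmetic behind a first-pass TCO comparison is simple even if the inputs are not. The sketch below compares a consumption bill against a per-seat bill and finds the event volume at which they cross. Every rate and volume in it is a made-up placeholder: CustomerLake pricing has not been published, and billing per million events is an assumption chosen only to make the example concrete.

```python
def consumption_monthly_cost(events_per_month, price_per_million_events):
    """Monthly cost under a consumption model billed per million events."""
    return events_per_month / 1_000_000 * price_per_million_events


def seat_monthly_cost(seats, price_per_seat):
    """Monthly cost under a per-seat license."""
    return seats * price_per_seat


def breakeven_events(seats, price_per_seat, price_per_million_events):
    """Event volume per month at which both models cost the same."""
    return seat_monthly_cost(seats, price_per_seat) / price_per_million_events * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs, not real prices from any vendor.
    events = 500_000_000
    rate = 20.0          # hypothetical price per million events
    seats, seat_price = 50, 150.0

    print("consumption:", consumption_monthly_cost(events, rate))
    print("per-seat:   ", seat_monthly_cost(seats, seat_price))
    print("break-even events/month:", breakeven_events(seats, seat_price, rate))
```

The break-even volume is the figure worth bringing to the account team: if your actual event volume sits well above it, the consumption model costs more than your current license at the rates you assumed, and the reverse if it sits below.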

If LTAP and Reyden hold up at the claimed performance, we should see at least one major specialized real-time analytics vendor announce a Databricks-native integration or a competitive pricing reset within the next two quarters. If neither happens by year-end 2026, that is the signal the technical claims are narrower than the keynote suggested.

Key Takeaways

  • LTAP's real innovation is one physical copy of data under Unity Catalog, with Postgres and Lakehouse engines reading the same storage. No CDC, no ETL, no second copy.
  • Lakehouse//RT claims sub-100ms at 12,000 QPS and up to 16x speedup versus existing real-time serving stacks, but the benchmark dataset, query mix, and cache state are not disclosed.
  • Lakebase launched only last year and already reports thousands of customers, including Block, Ensemble, Superhuman, and Zillow, plus 12 million database launches per day.
  • CustomerLake (Private Preview, with HP, Circle K, AB InBev, and Getnet by Santander as named customers) is Databricks' second push into a defined enterprise software market after Lakewatch in security, priced on consumption rather than seats.
  • Testable prediction: if Reyden's claims hold, expect a Databricks-native integration or pricing reset from at least one specialized real-time analytics vendor within two quarters.

Frequently Asked Questions

Q: What is LTAP and how is it different from HTAP?

LTAP (Lake Transactional/Analytical Processing) is Databricks' architecture that runs Postgres-based transactional workloads (via Lakebase) and Lakehouse analytical workloads against a single physical copy of data stored in Unity Catalog using open formats like Delta and Iceberg. Traditional HTAP systems use a single engine for both workloads. LTAP keeps two engines but unifies the storage and governance layer, which avoids the isolation and performance trade-offs single-engine HTAP typically hits.

Q: How fast is Lakehouse//RT compared to ClickHouse or Pinot?

Databricks reports sub-100ms latency at 12,000 queries per second on standard analytical benchmarks, with customers seeing up to 16x better performance than their existing specialized real-time serving stacks. The source does not disclose dataset size or query mix, so direct comparison to ClickHouse or Pinot requires running your own workload against Lakehouse//RT Beta. The relevant comparison is your p95 on your data, not the keynote number.

Q: Should I migrate off my current CDP to CustomerLake?

Not yet. CustomerLake is in Private Preview with a small set of named customers (HP, Circle K, AB InBev, Getnet by Santander) and uses a consumption-based pricing model rather than a traditional license. Wait for general availability and a published pricing sheet, then model TCO against your current Segment, mParticle, or Adobe spend using your actual event volume before committing.

Sarah Chen
RiverCore Analyst · Dublin, Ireland