RiverCore
Dell and Nvidia Bet On-Prem Inference Beats Rented AI Factories
Tags: on-prem inference · agentic AI · edge infrastructure · on-premises AI inference cost savings · Dell Nvidia enterprise AI strategy


18 Apr 2026 · 6 min read · Sarah Chen

Two executives, one pitch: the agentic wave is the second ChatGPT moment, and the economics of "always-on inference at scale" are pushing enterprise AI workloads back toward owned infrastructure. That is the core claim Dell's Varun Chhabra and Nvidia's Anne Hecht made on theCUBE this week, and it is a direct challenge to the assumption, dominant for roughly three years, that frontier AI would live permanently in hyperscaler clouds.

The interview itself is light on hard numbers, which is worth flagging upfront. What it gives us instead is a clear read on how two of the largest infrastructure vendors are positioning against the public-cloud-by-default narrative that shaped 2023 through 2025.

The Numbers

The quantitative content of this segment is thin, and that matters. Chhabra and Hecht spoke to theCUBE's John Furrier as part of the AI Factories interview series broadcast from SiliconANGLE's studio, and as SiliconANGLE reported, the framing centers on "agentic" as the dominant enterprise concern, with Chhabra citing "OpenClaw" and Nvidia's "NemoClaw" announcements as the trigger points. Neither executive disclosed deployment counts, revenue attached to the Dell Automation Platform, attach rates on Nvidia-powered Dell SKUs, or token throughput benchmarks on the confidential computing stack that now hosts Google's Gemini model on-premises.

That absence is itself the signal. When vendors pitch a category shift without shipping benchmarks, the honest read is that the thesis is still qualitative. Compare this to the DeepSeek moment Hecht referenced from the prior year, which arrived with published reasoning benchmarks that forced a retooling of cost assumptions across every inference provider. The agentic pitch, by contrast, is being sold on developer sentiment ("everybody's asking us about how to adopt agentic faster than ever") rather than throughput-per-dollar figures.

The one concrete architectural claim with teeth: Gemini now runs on-prem on a Dell server via confidential computing. That is a meaningful departure from the cloud-tenancy default Google has maintained for its frontier models. We do not know from the source which Gemini tier, what the attestation model looks like, or what the performance delta is versus the managed Gemini API. Those gaps matter because the entire "own your AI factory" economic argument collapses if on-prem inference runs materially slower or more expensive per token than the hyperscaler equivalent.

If this positioning is real, we should see Dell disclose at least one named enterprise deployment with token throughput figures within two quarters. If nothing ships by Q4 2026, treat the agentic pitch as marketing overlay on existing ISG hardware cycles.

What's Actually New

Strip away the "ChatGPT moment for agentic" language and three things are genuinely different from the 2024 enterprise AI conversation.

First, the workload profile. Hecht's description of agents that run overnight, generate reports, take actions, and "burn through a bunch of tokens" is not the same shape as the request-response chatbot workload that defined the last two years. Agentic workloads look more like batch jobs with unpredictable fan-out. A single user kickoff can spawn dozens of model calls across multiple agents, and if those agents spawn sub-agents (the "agents that create other agents" pattern Hecht cited), token consumption becomes combinatorially hard to forecast. That breaks the per-seat pricing assumptions that most enterprise AI budgets were built on in 2024 and 2025.
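To make the forecasting problem concrete, here is a minimal sketch of expected model calls under an agent-spawning-agent topology. The branching factors and depths are illustrative assumptions, not figures from the interview:

```python
# Hypothetical sketch: expected model calls when agents can spawn sub-agents.
# All numbers are illustrative assumptions, not figures from the source.

def expected_calls(branching: float, depth: int) -> float:
    """Expected total model calls for one user kickoff, if each agent
    spawns `branching` sub-agents on average, down to `depth` levels.
    This is the sum of a geometric series: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))

# A chatbot is branching=0, depth=0: exactly one call per request.
print(expected_calls(0.0, 0))        # 1.0

# An agent fanning out to 3 sub-agents, two levels deep: 1 + 3 + 9 = 13 calls.
print(expected_calls(3.0, 2))        # 13.0

# Bump branching from 3 to 4 at depth 3 and the burn more than doubles.
print(expected_calls(3.0, 3), expected_calls(4.0, 3))   # 40.0 85.0
```

The point of the exercise is the sensitivity: small changes in average branching swamp any per-seat budgeting assumption, which is exactly why the 2024-era pricing models break.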

Second, the confidential computing story. A frontier model running on customer-owned silicon with attestation is an architectural shift, not a marketing one. It changes the regulatory calculus for finance, healthcare, and any workload touching PII or trading data. For the iGaming and fintech verticals specifically, confidential computing on-prem is the difference between "we can evaluate this model" and "legal has blocked deployment pending data residency review." The details of the stack matter, and the source does not disclose the TEE implementation, whether it is CPU-based (Intel TDX, AMD SEV-SNP) or GPU-based (Nvidia's H100/Blackwell confidential compute modes), or what the performance overhead is. Historically, confidential computing has added 5 to 15 percent overhead on compute-heavy workloads. If that holds here, the TCO math still favors on-prem for high-utilization inference.
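A quick sketch of how that overhead flows into per-token cost. The 5 to 15 percent range is the historical figure cited above; the rack cost, throughput, and utilization numbers are made-up assumptions for illustration:

```python
# Hedged sketch: how a TEE slowdown shifts amortized on-prem cost per token.
# Capex, throughput, and utilization figures below are assumptions, not
# numbers from the source.

def on_prem_cost_per_mtok(capex_per_month: float,
                          tokens_per_sec: float,
                          utilization: float,
                          tee_overhead: float = 0.10) -> float:
    """Amortized on-prem cost per million tokens, with a TEE slowdown."""
    effective_tps = tokens_per_sec * (1 - tee_overhead)
    tokens_per_month = effective_tps * utilization * 30 * 24 * 3600
    return capex_per_month / (tokens_per_month / 1e6)

# Assumed: $15k/month amortized rack cost, 5,000 tok/s peak throughput.
for util in (0.2, 0.5, 0.9):
    cost = on_prem_cost_per_mtok(15_000, 5_000, util)
    print(f"{util:.0%} utilization -> ${cost:.2f}/Mtok")
```

Note that a 10 percent TEE overhead moves cost per token by roughly 11 percent, while moving utilization from 20 to 90 percent moves it by more than 4x. Utilization, not the TEE tax, dominates the math.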

Third, the blueprint packaging. Chhabra's reference to the Dell Automation Platform plus Nvidia blueprints is a tacit acknowledgment that "buy the boxes, figure out the software" failed as a go-to-market for enterprise AI. The admission itself is new. Whether the blueprints are more than reference architectures is the question the source doesn't answer.

What's Priced In for AI Development

Most of this is already expected by anyone paying attention to token economics. The shift toward distributed inference across on-prem, edge, and workstation was visible by late 2025 as soon as the first wave of enterprise buyers saw their OpenAI and Anthropic bills after deploying coding assistants at scale. The "rent versus own your AI factory" framing has been the Nvidia and Dell talking point for at least three quarters.

What isn't priced in: the speed at which agentic systems will break consumption-based pricing models. If Hecht's description of overnight autonomous agents becomes the default interaction pattern, the gap between predictable SaaS budgets and actual token burn will force either aggressive caps (which break the product experience) or a capex flip (which favors Dell and Nvidia). The interesting question for platform leads is which one happens first, and whether model vendors like Anthropic adjust their pricing ladders fast enough to keep the managed path competitive. The Anthropic docs already hint at tiered approaches for tool-use and computer-use patterns, but pricing structure hasn't caught up to agent-of-agents topologies.

Also not priced in: governance. Chhabra flagged the tension between productivity and oversight, and this is where most 2026 enterprise deployments will stall. Defining what authority an agent has to take actions, and auditing what it actually did, remains a problem solved only on paper. The MCP spec helps on the integration side but doesn't answer the authorization question.

Contrarian View

The consensus read on this interview will be: agentic is the new workload, on-prem is back, Dell and Nvidia are well positioned. I'd argue the contrarian take is harder to dismiss.

Enterprise infrastructure pitches have a recurring pattern. Every two years, a vendor partnership announces that the workload shape has fundamentally changed and that owning the stack is the answer. Hadoop. Private cloud. Edge computing. Each cycle had a legitimate thesis, and each cycle ended with hyperscalers absorbing most of the workload anyway because operational burden beat theoretical TCO for all but the top quintile of enterprises by scale.

The agentic pitch has the same shape. Yes, token economics look painful at current managed-service prices. But the managed vendors have every incentive to cut prices faster than Dell can ship racks, and they have already done so twice in the last eighteen months. Unless confidential computing delivers a regulatory moat that hyperscalers genuinely cannot match (and Google running Gemini on Dell hardware suggests the hyperscalers have noticed), the on-prem AI factory story is more likely to serve the top 200 enterprises than to become the default.

Key Takeaways

  • The agentic-as-ChatGPT-moment framing from Dell and Nvidia is a qualitative pitch with no published throughput or deployment numbers in the source. Demand benchmarks before rewriting your infrastructure roadmap.
  • Gemini running on-prem on Dell via confidential computing is the single most concrete technical claim and the one worth tracking. The TEE implementation and performance overhead are undisclosed and determine whether the economics work.
  • Agentic workloads break per-seat and per-request pricing assumptions. Platform leads in fintech and iGaming should model token burn under agent-spawning-agent topologies before signing multi-year managed contracts.
  • Dell Automation Platform plus Nvidia blueprints are an admission that hardware-only go-to-market failed for enterprise AI. Whether the blueprints are operationally useful or marketing artifacts is the open question.
  • Unknown to watch: if Dell cannot disclose a named enterprise deployment with real token throughput figures by Q4 2026, the agentic infrastructure pitch should be discounted to a standard ISG hardware cycle story.

Frequently Asked Questions

Q: What did Dell and Nvidia actually announce in this interview?

No product launches were announced. Varun Chhabra and Anne Hecht described how agentic AI workloads are changing enterprise infrastructure decisions, referenced OpenClaw and Nvidia's NemoClaw, and highlighted that Google's Gemini model can now run on-premises on a Dell server via confidential computing. The segment was positioning, not a product reveal.

Q: Why does confidential computing matter for running Gemini on-prem?

Confidential computing uses hardware-level trusted execution environments so frontier model weights and customer data stay encrypted even during inference. That lets regulated industries run models like Gemini on owned hardware without exposing either the model IP or the input data, which is the legal blocker that has kept many fintech and healthcare workloads out of managed AI services.

Q: Does "owning your AI factory" actually cost less than renting from a hyperscaler?

It depends entirely on utilization. For high-volume always-on inference workloads like agentic systems running overnight tasks, capex on dedicated infrastructure can stabilize costs versus consumption pricing. For bursty or low-utilization workloads, managed services almost always win on TCO. The source does not publish comparison figures, so enterprises need to model their own token volumes before committing.
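One way to frame "it depends on utilization" is as a break-even calculation. Every number below is an assumption chosen for the sketch, since the source publishes no comparison figures:

```python
# Illustrative break-even: at what utilization does owned hardware undercut
# a managed API price? All inputs are assumptions, not figures from the source.

def breakeven_utilization(capex_per_month: float,
                          tokens_per_sec: float,
                          managed_price_per_mtok: float) -> float:
    """Utilization above which on-prem cost/Mtok drops below the managed price."""
    max_mtok_per_month = tokens_per_sec * 30 * 24 * 3600 / 1e6
    return capex_per_month / (managed_price_per_mtok * max_mtok_per_month)

# Assumed: $15k/month amortized rack, 5,000 tok/s, $3.00/Mtok managed price.
u = breakeven_utilization(15_000, 5_000, 3.00)
print(f"break-even at {u:.0%} utilization")
```

Under these assumptions the crossover sits just under 40 percent utilization, which is why always-on overnight agents favor the capex story while bursty workloads do not. Halve the managed price, as vendors have repeatedly done, and the break-even utilization doubles.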

Sarah Chen
RiverCore Analyst · Dublin, Ireland