Video

Why Enterprises Are Moving AI On-Prem | Built to Scale: Secure AI Factory Ep. 1 (The Stack)

May 20, 2026

Cloud AI bills are getting out of hand, and most enterprises can’t even tell you where the money went. A single bad pull request merged into production can burn through an entire month’s token budget overnight. That’s not a hypothetical. It’s happening.

In this episode, Rajeev Khanolkar and Matt break down Cisco’s Secure AI Factory: what it is, why on-prem AI is quickly becoming a strategic priority, and how the full stack (UCS + NVIDIA GPUs, HyperFabric AI, AI Defense, and smart networking) gives enterprises the control, security, and cost predictability that cloud alone can’t deliver.

What we get into:

  • Why cloud AI costs are unsustainable (IBM’s 5-year TCO data puts on-prem at roughly 30% of cloud cost)
  • Token governance and how unmanaged spend quietly destroys ROI
  • GPU utilization reality check: enterprise GPUs average just 2% utilization, and how to actually fix that
  • How Cisco’s Secure AI Factory wraps security guardrails around the entire AI workload stack
  • Agentic AI and identity: how do you authenticate an agent with no face in a Kubernetes environment?
  • Cisco x NVIDIA GTC expansion: Nexus 9100 switches, Hypershield, Cilium network mesh, and eBPF-level security
  • How Cisco account managers (AMs) can help enterprise customers move from cloud pilots to on-prem production

Gruve is a Cisco partner specializing in AI infrastructure deployment through its CAISS practice, helping enterprises move from pilot to production using Cisco Validated Designs and AI PODs.
