Video

Why Enterprises Are Moving AI On-Prem | Built to Scale: Secure AI Factory Ep. 1 (The Stack)

Cloud AI bills are getting out of hand, and most enterprises can’t even tell you where the money went. A single bad pull request merged into production can burn through an entire month’s token budget overnight. That’s not a hypothetical. It’s happening.

In this episode, Rajeev Khanolkar and Matt break down Cisco’s Secure AI Factory: what it is, why on-prem AI is quickly becoming a strategic priority, and how the full stack (UCS + NVIDIA GPUs, HyperFabric AI, AI Defense, and smart networking) gives enterprises the control, security, and cost predictability that cloud alone can’t deliver.

Rajeev Khanolkar (00:07.118)

I think cloud is proving to be very expensive, because here is what people don’t understand, or don’t get a feel for, until they get a big bill. Suddenly you end up with a $300,000 or $400,000 bill, and you don’t even know what you spent that money on. That is what happens if you don’t manage tokens properly, and managing them is what AI Factory does: you know what you have spent. That’s the reason you need an on-prem solution.

Matt Locknane (00:38.03)

The token issue is a huge one. We’ve heard plenty of stories come out where a development team has pushed through a single PR. It got merged into production, and all of a sudden they burned their entire token allocation for the month overnight. They didn’t even realize that that was going to be a problem. Then we’ve seen some studies. IBM did a study last year, I think, that showed that on-prem AI costs are about 30% of what cloud costs are over the course of five years. And that’s including the CapEx for buying the hardware and putting it on-prem. So you’re exactly right there. Customers are getting these giant bills from the cloud, and they’re finding out that the ROI they’re getting from these AI use cases is seriously tempered by the cost of that cloud environment. So governance is going to be key.
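The overnight burn described above is the kind of failure a hard spending cap is meant to catch. What follows is a minimal sketch of such a cap, not anything taken from Secure AI Factory itself: the `TokenBudget` class, the 80% alert threshold, and the team name are all hypothetical, chosen only to illustrate the idea of refusing requests once a monthly allocation is used up.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push usage past the monthly cap."""


@dataclass
class TokenBudget:
    monthly_limit: int               # hypothetical cap, in tokens
    alert_fraction: float = 0.8      # assumed threshold: warn at 80% of the cap
    used: int = 0
    alerts: list = field(default_factory=list)

    def charge(self, team, tokens):
        """Record usage and return the remaining budget.

        A request that would exceed the cap is rejected before it is counted.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.monthly_limit:
            raise BudgetExceeded(f"{team}: {tokens} tokens exceeds remaining budget")
        self.used += tokens
        if self.used >= self.alert_fraction * self.monthly_limit:
            self.alerts.append(f"{team}: {self.used}/{self.monthly_limit} tokens used")
        return self.monthly_limit - self.used


budget = TokenBudget(monthly_limit=1_000_000)
remaining = budget.charge("search-team", 850_000)   # allowed, crosses the alert threshold
try:
    budget.charge("search-team", 200_000)           # would overshoot, so it is rejected
    rejected = False
except BudgetExceeded:
    rejected = True
```

With a check like this in the request path, a runaway change fails loudly at the cap instead of showing up on the bill a month later.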

Rajeev Khanolkar (01:35.608)

Secure AI Factory is very important because there are a lot of LLMs. There are some large ones, some small ones. You don’t know whether those LLMs are poisoned. Are they giving you the right information? So you need red teaming of LLMs. And I think this is where AI Defense, which is part of Secure AI Factory, brings huge value.

Matt Locknane (01:58.156)

And you talked about the next thing, which is the LLM capabilities, right? And so when you look at the security footprint inside of an AI workload, that is a massive black hole right now, because from a development standpoint, you’re not worried about security. You’re worried about whether this AI use case works, right? So with Secure AI Factory and on-prem hardware, you control the guardrails, you control all the security all the way up through the stack, and you can have your developers go out and build a product that works. You don’t have to worry about the security because it’s encompassing the entire AI workload.

Rajeev Khanolkar (02:39.118)

Agentic AI applications may outnumber people. At least looking at the market, what pundits are predicting is a path to these agents, and security is going to be the main concern. Humans have an identity when you log into a company, right? You have a passkey, maybe face recognition. What should agents have as an identity, you know? Should these agents have an individual identity to verify themselves? Because they may take actions on your behalf. So should that be a separate ID? And how do you authenticate it? I mean, agents don’t have a face, right? How does an agent work?

Matt Locknane (03:28.002)

Well, especially in a Kubernetes environment where pods are spun up and spun down. You can’t do traditional security based on IP addresses, because those are allocated dynamically as the system scales out and scales down.
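One common answer to the dynamic-IP problem is to write policy against stable workload labels instead of addresses. The sketch below illustrates that idea only. It is not the Kubernetes, Cilium, or Hypershield API, and the `Pod` type, the rule format, and every name and address in it are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    ip: str          # changes whenever the pod is rescheduled
    labels: tuple    # (key, value) pairs that stay stable across restarts


def matches(pod, selector):
    """True when the pod carries every label the selector asks for."""
    pod_labels = dict(pod.labels)
    return all(pod_labels.get(k) == v for k, v in selector.items())


def allowed(src, dst, rules):
    """Default-deny: traffic passes only if some rule matches both ends."""
    return any(matches(src, r["from"]) and matches(dst, r["to"]) for r in rules)


# Hypothetical rule: agents may call the inference service, nothing else may.
rules = [{"from": {"app": "agent"}, "to": {"app": "inference"}}]

agent_v1 = Pod("agent-7f9c", "10.0.1.12", (("app", "agent"),))
agent_v2 = Pod("agent-b21d", "10.0.4.87", (("app", "agent"),))   # same workload, new IP
model = Pod("llm-0", "10.0.2.5", (("app", "inference"),))
other = Pod("batch-1", "10.0.1.12", (("app", "batch"),))         # reuses the old IP
```

Here `allowed(agent_v1, model, rules)` and `allowed(agent_v2, model, rules)` both hold even though the agent's address changed, while `other` is denied despite inheriting the old address. An IP-based rule would get both of those cases wrong.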

Matt Locknane (03:46.062)

And there’s some of the expansion of the NVIDIA and Cisco relationship that was announced at GTC, for example, the inclusion of the Nexus 9100 switches, which put the Spectrum-X ASICs in the Nexus platform switches. That’s exciting. You’ve also got things like the Hypershield integration with the Secure AI Factory stack, which gives you very granular security with eBPF programming, so you can control your security posture basically anywhere that workload lives, whether it be cloud or on-prem. And even through the Cilium network mesh, you can extend your fabric from your on-prem environment to your cloud to dynamically scale as needed.

Rajeev Khanolkar (04:34.124)

Yeah, I think I’m seeing different things. Some of the people are just waiting and saying, okay, let’s play down in the cloud. So sellers are waiting and watching, whereas others are saying, okay, let’s take a look at Secure AI Factory and start some pilots. And so what I’m seeing is very interesting, the whole model, right?

People are more open to looking at something like AI Defense because everybody has an AI project. So AI Defense is absolutely necessary to do the red teaming. So they like that.

Rajeev Khanolkar (06:08.544)

Yeah, I mean, talking to several people over the last few weeks at, for example, the RSA Conference and other events, the two leading ones that people have been using seem to be the RTX 6000 and the H100. Here is another interesting stat: I was talking to somebody about GPU utilization, and it was shocking. The question was, what percentage of GPU utilization do you have? And some people say, hey, is it 50%? Is it 20%? And it was found to be, on average, 2%, which is kind of shocking.

Matt Locknane (06:59.788)

Yeah, look, this is the same exact problem that we had in compute virtualization 20 years ago, right? This is what the VMwares of the world were trying to solve at that time: you had these big expensive boxes that were sitting in your data center, and they were only 2% utilized, right? You had plenty of memory, you had plenty of CPU, you had plenty of disk space, but it was used for a single workload. And GPUs are now becoming, with the supply chain bottlenecks, even more rare than those compute resources were back then.

And so customers are looking for ways to optimize their GPU utilization. When you bring it on-prem, you have full control over that. You can use MIG, you can use time slicing to carve up those GPUs and distribute that workload better so that you get more utilization out of the hardware that you purchased.
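To make the utilization point concrete, here is a rough sketch comparing one GPU per workload against packing workloads into MIG-style slices. It assumes seven slices per GPU, which is the upper limit MIG offers on H100-class parts, and it uses a simple first-fit-decreasing heuristic. Real MIG profiles come in fixed sizes with placement rules that this toy model ignores, and the workload sizes are invented for the example.

```python
SLICES_PER_GPU = 7   # assumed: up to seven MIG compute slices per GPU


def gpus_needed_dedicated(workloads):
    """One whole GPU per workload, regardless of how little it uses."""
    return len(workloads)


def gpus_needed_shared(workloads, slices_per_gpu=SLICES_PER_GPU):
    """Pack workloads onto GPUs with a first-fit-decreasing heuristic.

    Each workload is given as the number of slices it needs.
    """
    free = []   # remaining free slices on each GPU opened so far
    for need in sorted(workloads, reverse=True):
        if need > slices_per_gpu:
            raise ValueError("workload does not fit on a single GPU")
        for i, room in enumerate(free):
            if room >= need:
                free[i] -= need
                break
        else:
            free.append(slices_per_gpu - need)   # open a new GPU
    return len(free)


def slice_utilization(workloads, gpus, slices_per_gpu=SLICES_PER_GPU):
    """Fraction of purchased slices that are actually assigned to work."""
    return sum(workloads) / (gpus * slices_per_gpu)


workloads = [1, 1, 2, 3, 4, 2, 1]                 # hypothetical slice demands
dedicated = gpus_needed_dedicated(workloads)      # 7 GPUs
shared = gpus_needed_shared(workloads)            # 2 GPUs
```

In this made-up case the same seven workloads fit on two GPUs instead of seven, and slice utilization rises from 14 of 49 slices to 14 of 14. Time slicing, the other technique mentioned, shares a GPU over time rather than by partition and is not modeled here.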

Matt Locknane (07:51.19)

Let me ask you one final question. So, Rajeev, how do Cisco sellers, the AMs and the SEs, how do they help accelerate the adoption of Secure AI Factory by customers? What are the biggest areas of opportunity that you see there?

Rajeev Khanolkar (08:12.398)

I think I’m seeing different things. Some of the people are just waiting and saying, okay, let’s play down in the cloud. So sellers are waiting and watching, whereas others are saying, okay, let’s take a look at Secure AI Factory and start some pilots. So what I’m seeing is very interesting, the whole model, right? People are more open to looking at something like AI Defense because everybody has an AI project. So AI Defense is absolutely necessary to do the red teaming. So they like that. I think that’s a very good sign. And, you know, as an entry point, I see that as common across almost all Cisco reps, in what they’re talking about. But I think in terms of going with Secure AI Factory or not, that is really dependent on the customers.

Matt Locknane (09:14.03)

Yeah. You know, this hybrid model where we’re doing development in cloud is never going to go away. It’s easy to spin up. It’s easy to scale. You can basically just, you know, expand at will, which is really nice for the development team. However, when it comes to, you know, actual data security and privacy concerns, then we need to start thinking about bringing it on-prem. And of course, we get the cost savings that come along with that.

So I think that’s a discussion that, as more and more companies are building out these AI initiatives, the sellers are going in and having with them: what do your costs look like at the end of the day? What does your security posture look like in your cloud environment? We’ve seen plenty of security issues with cloud environments where the cloud environment doesn’t meet the specific regulatory needs of the industry the customer is in. That could be the same thing in AI, right?

We bring that on-prem, we wrap the Secure AI Factory guardrails around it, and we have a complete package, so the customer can sleep well at night.
