
Inference pushes AI out of the data center | TechRadar




In the early 2000s, the architects of the internet faced a familiar-sounding modern problem: How do you build a system that handles massive, unpredictable demand without it breaking when any single part of it fails?

Their answer was peer-to-peer networking. Rather than routing everything through central servers, P2P systems distributed load across thousands of individual nodes: no single point of failure, intelligence closer to the user, and resilience baked into the architecture rather than bolted on top.

It worked. For distributed workloads, P2P networks proved faster, more resilient, and more scalable than any centralized infrastructure could match.


Then, as the cloud computing era took hold, the hyperscale model became the dominant infrastructure logic of the last fifteen years. Its premise — aggregate everything into the largest possible data centers, optimize for unit cost, centralize without limit — made sense for many workloads.

But AI inference, the phase of AI that is now exploding in enterprise environments, operates on exactly the same principles that made P2P compelling in the first place.

Understanding why requires separating two phases of AI that are often conflated. Training a large model is a one-time, compute-intensive process. It runs well on centralized, aggregated infrastructure, and the hyperscale logic holds there. Inference is different.

Inference is every time the model is actually used: a fraud detection system flagging a transaction, a predictive maintenance system identifying a fault on the factory floor, a logistics platform recalculating routes in real time. These decisions happen continuously, in milliseconds, at the point where operations actually run.

Routing inference workloads to a distant hyperscale facility introduces latency that is simply incompatible with many of these use cases. A surgical assistance system cannot wait for a round trip to a data center in another region. Neither can an industrial safety system, an autonomous inspection drone, or a real-time customer service agent running on retail floor infrastructure.
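To make the latency argument concrete, here is a minimal back-of-the-envelope sketch. The distances, processing times, and the 10 ms decision budget are illustrative assumptions, not figures from the article; the propagation speed is the usual fiber-optic approximation of roughly two-thirds the speed of light.

```python
# Illustrative sketch: why a round trip to a distant data center can blow
# a real-time inference budget. All numbers below are assumptions.

SPEED_IN_FIBER_KM_PER_MS = 200  # ~2/3 the speed of light in optical fiber

def round_trip_ms(distance_km: float, processing_ms: float) -> float:
    """Best-case network round trip plus server-side inference time."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS + processing_ms

# A hypothetical safety system with a 10 ms decision budget:
budget_ms = 10
remote = round_trip_ms(distance_km=1500, processing_ms=5)  # distant region
local = round_trip_ms(distance_km=5, processing_ms=5)      # on-site edge node

print(f"remote: {remote:.1f} ms, local: {local:.2f} ms, budget: {budget_ms} ms")
```

Even in this best case, which ignores routing hops, queuing, and retries, the remote path alone exceeds the budget, while the on-site path leaves headroom.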

McKinsey projects that global data center demand will more than triple by 2030, driven overwhelmingly by inference rather than training, and the infrastructure serving that demand needs to be built around what inference actually requires, which is compute close to where the decision happens.

The P2P answer was to stop treating distribution as a problem and start treating it as the architecture. BitTorrent did not solve file transfer by building faster central servers; it distributed the problem across thousands of nodes, each one close to a user, each one handling local demand locally.


When individual nodes dropped off, the system degraded at the margin. No central server going down took the whole network with it. The architecture assumed failure and built around it, outperforming centralized alternatives on speed, resilience, and scale simultaneously.

Edge computing applies the same logic to AI infrastructure. Smaller, modular compute facilities positioned close to where data is generated and consumed distribute the inference workload the way P2P distributed file transfer. Each site handles local decisions locally. The network as a whole becomes more resilient because no single facility carries the entire load.
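The routing logic described above can be sketched in a few lines: send each inference request to the nearest healthy site, and let the loss of any single site degrade the system only at the margin. The site names and latencies here are hypothetical, not drawn from any real deployment.

```python
# Minimal sketch of P2P-style inference routing: pick the healthy site
# with the lowest latency to the caller, so one site failing only shifts
# its local load to the next-nearest site.

def route_request(sites: dict, latency_ms: dict) -> str:
    """Return the name of the healthy site closest to the caller."""
    healthy = [name for name, s in sites.items() if s["healthy"]]
    if not healthy:
        raise RuntimeError("no inference capacity available")
    return min(healthy, key=lambda name: latency_ms[name])

sites = {
    "factory-floor": {"healthy": True},
    "regional-edge": {"healthy": True},
    "central-cloud": {"healthy": True},
}
latency_ms = {"factory-floor": 1.0, "regional-edge": 8.0, "central-cloud": 45.0}

print(route_request(sites, latency_ms))  # nearest site wins
sites["factory-floor"]["healthy"] = False
print(route_request(sites, latency_ms))  # graceful failover to the next site
```

The point of the sketch is the failure mode: when the factory-floor node drops out, requests fall back to the regional edge rather than the whole system going down, which is the P2P property the article describes.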

Running that inference centrally also carries a cost that compounds with scale: Every time data moves out of a hyperscale cloud provider's network, organizations pay egress fees.

For AI workloads that require continuous data transfer between a central facility and distributed operational environments, those charges accumulate in ways that are easy to underestimate at the planning stage. Processing data locally at the edge — close to where it is generated — reduces the volume crossing the network in the first place.
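A rough sketch of how those charges compound: the per-gigabyte rate, device counts, and data volumes below are illustrative assumptions, not any provider's actual pricing.

```python
# Back-of-the-envelope egress cost comparison. All rates and volumes are
# hypothetical; the point is how the totals scale, not the exact figures.

def monthly_egress_cost(gb_per_device_per_day: float, devices: int,
                        rate_per_gb: float, days: int = 30) -> float:
    """Monthly egress fees for continuously shipping data off-network."""
    return gb_per_device_per_day * devices * rate_per_gb * days

rate = 0.09  # assumed $/GB egress rate
centralized = monthly_egress_cost(2.0, devices=5_000, rate_per_gb=rate)
# Edge pre-processing that ships only summaries, say 2% of the raw volume:
edge = monthly_egress_cost(2.0 * 0.02, devices=5_000, rate_per_gb=rate)

print(f"centralized: ${centralized:,.0f}/mo, edge: ${edge:,.0f}/mo")
```

Under these assumptions the centralized design pays egress on every raw byte, while local processing shrinks the bill by the same factor it shrinks the data crossing the network.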

A hardware shift is also changing the feasibility calculation at the device level. Neural processing units (NPUs) designed specifically for AI inference tasks are now embedded in smartphones, laptops, and industrial edge devices.

The compute required to run capable inference workloads has been falling steadily, and hardware that would have required a server rack a few years ago now fits in a handheld device.

As inference-capable hardware becomes cheaper and more physically compact, the assumption that every workload needs to route back to a centralized facility becomes harder to sustain.

As data sovereignty regulation tightens across the EU, Southeast Asia, Latin America, and beyond, centralizing inference in a small number of facilities creates legal exposure.

For organizations operating across multiple jurisdictions, edge infrastructure resolves this by design: data is processed locally, within the relevant jurisdiction, without requiring complex legal and technical workarounds after the fact.

Finally, another important element is that power availability — not price — is becoming the binding constraint on data center capacity. In Northern Virginia, the world's densest cloud hub, utilities have projected connection timelines for large projects stretching up to seven years due to grid congestion.

Ireland's data centers now consume more than 20% of national electricity. These problems are the predictable result of concentrating enormous compute into a small number of locations, but the megawatt problem is more tractable when it does not need solving in one place.

Edge deployments, by distributing workloads across many smaller sites, spread the energy demand in a way that aligns better with available grid capacity.

None of this means hyperscale infrastructure is going away. Training workloads, large-scale data processing, and many enterprise applications will continue to run efficiently in centralized cloud environments.

The case for edge is not a case against cloud, but rather for matching infrastructure architecture to what workloads actually need.

The engineers who built P2P networks understood that distributing intelligence across the network made it stronger, not weaker.

As inference pushes AI out of the data center and into the places where businesses actually operate, that lesson is becoming increasingly relevant again.

This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.

The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/pro/perspectives-how-to-submit



TechRadar is part of Future US Inc, an international media group and leading digital publisher. Visit our corporate site.

© Future US, Inc. Full 7th Floor, 130 West 42nd Street, New York, NY 10036.

