
At the Source: Reducing Energy Consumption in AI Data Centers

In the ever-evolving world of artificial intelligence (AI), one big concern is on everyone’s mind: how much energy AI data centers devour as applications built on large language models (ChatGPT), mixture-of-experts architectures (DALL-E), constitutional AI (Claude), retrieval-augmented generation (RAG), and video generation systems (RunwayAI) move from training to production.

With the skyrocketing demand for AI capabilities, we urgently need AI technology infrastructure that can handle power-hungry models efficiently. The real trick isn’t just tweaking GPUs here and there—it’s about completely revamping how data centers work to save as much energy as possible.

The Energy Dilemma: Rising Demand and Potential Shortages

Recent warnings from industry leaders such as Rene Haas, CEO of Arm, highlight the stark reality of AI’s energy demands. Without greater efficiency, “by the end of the decade, AI data centers could consume as much as 20% to 25% of U.S. power requirements. Today that’s probably 4% or less,” said Haas in a Wall Street Journal article on April 9, 2024. “That’s hardly very sustainable, to be honest with you.”

Barron’s also shares cautionary words from Elon Musk, the multi-company CEO of SpaceX, Neuralink, Tesla, and X, about the rising appetite for power-hungry AI chips leading to potential electricity shortages and the need for new, large-scale energy infrastructure. AES, a Virginia-based utility company, predicts that data centers could consume up to 7.5% of the total U.S. electricity supply by 2030. These forecasts all assume the answer is a massive rethink of our electrical infrastructure, with significant financial implications.

NeuReality has another view on how to solve the problem, one that focuses on fixing energy efficiency at the source: the AI data center infrastructure that consumes all that power. Here are four approaches CEO Moshe Tanach outlined for innovators and engineers at the recent AIAI Summit in San Jose:

• Optimizing end-to-end system efficiency for AI Inference at scale: specialized AI chips, hardware, and software

• Improving cooling and power delivery to that infrastructure

• Innovating on data science and algorithms

• Embracing disaggregation, virtualization, and composability

It’s a multi-angled challenge, to be sure. But building out more and more power supply alone is likely not the best solution for the planet or your profitability. While hyperscalers such as Amazon Web Services, Google Cloud, Microsoft, Oracle, IBM, and Meta may own and operate their own power plants in the U.S., the main source remains grid electricity, with good progress in geothermal, hydrogen, wind, solar, and nuclear energy. Most government agencies and businesses, however, rely on local utility companies and host on-premises data centers at major campuses across the U.S. Many are shifting some compute to cloud services, but that remains the minority.

On top of that, half of U.S. hydroelectric capacity is concentrated in three states: Washington, California, and Oregon, which happen to be where many hyperscalers (Amazon, Meta, Google, Microsoft) are headquartered. Enterprise customers want control over their data, so they remain at the mercy of local energy providers, most of which deliver grid electricity. Those operating costs add up to millions of dollars per year.
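
To put “millions per year” in perspective, here is a back-of-the-envelope sketch; the fleet size, per-card power draw, PUE, and electricity rate are illustrative assumptions, not figures from NeuReality or any utility.

```python
# Back-of-the-envelope annual electricity cost for an on-premises AI fleet.
# Every input below is an illustrative assumption, not a vendor or utility figure.

num_accelerators = 1_000      # assumed fleet size
watts_per_accelerator = 700   # assumed draw per card under load (W)
pue = 1.5                     # assumed power usage effectiveness (cooling/overhead)
price_per_kwh = 0.10          # assumed industrial electricity rate (USD/kWh)

hours_per_year = 24 * 365
kwh_per_year = num_accelerators * watts_per_accelerator / 1_000 * hours_per_year * pue
annual_cost = kwh_per_year * price_per_kwh

print(f"Energy: {kwh_per_year:,.0f} kWh/year")   # ~9.2 million kWh/year
print(f"Cost:   ${annual_cost:,.0f}/year")       # ~$0.9 million/year, electricity alone
```

Scale that to the tens of thousands of accelerators a large campus can host, and the bill quickly reaches tens of millions of dollars per year.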

Challenging the Status Quo: The Inefficiency of Current Data Centers

With Generative AI consuming even more power, and AI Inference predicted to become 8x more expensive than Training, the need for better-designed AI data centers is urgent. NeuReality customers and partners say they cannot wait two, five, or ten years. AI is a strategic and competitive advantage, and they are urgently looking for solutions to fortify their AI technology infrastructure.

Here’s one example: the low utilization rate of very expensive GPUs, some costing up to $100,000 apiece. Many AI data centers run these and other AI accelerator chips (GPUs, TPUs, LPUs, FPGAs, ASICs) at a dismal 30% utilization rate, resulting in a massive waste of both silicon and power. Inefficient AI data centers not only strain our energy resources but also drive up operational costs.
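
A minimal sketch of what a 30% utilization rate implies; the hardware price and idle-power figures are assumptions chosen for illustration, not measurements.

```python
# What a 30% utilization rate implies for wasted silicon and power.
# Price and power figures are illustrative assumptions.

gpu_price = 30_000          # assumed cost per accelerator (USD); high-end parts cost far more
utilization = 0.30          # the utilization rate cited above
idle_power_fraction = 0.4   # assumed share of peak power an idle accelerator still draws
peak_watts = 700            # assumed peak draw (W)

# Capital: at 30% utilization, roughly 70% of the purchase price buys idle silicon.
idle_capital = gpu_price * (1 - utilization)

# Energy: idle hours still draw power without producing a single inference.
idle_kwh_per_day = peak_watts * idle_power_fraction / 1_000 * 24 * (1 - utilization)

print(f"Idle capital per accelerator: ${idle_capital:,.0f}")
print(f"Idle energy per accelerator:  {idle_kwh_per_day:.1f} kWh/day")
```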

Prohibitive costs are a significant market barrier in many lower-margin industries, from retail and restaurants to banking and healthcare. So a low 25% AI adoption rate in the U.S. is hardly shocking. Globally, adoption is somewhat better at 35%, but cost and complexity continue to get in the way.

The Path to Efficiency: NeuReality’s Approach

Amidst these challenges, NeuReality offers an alternative. We’re all about flipping the script. We’ve dug deep into what makes AI Inference tick, why its data center requirements are so different from Training’s, and how we can reshape the game to save energy and costs.

Inference was a blind spot in our industry when we started in 2019. That’s exactly why NeuReality’s experienced system engineers came together with the foresight to re-engineer AI Inference at scale. The team meticulously unpacked what happens in every stage of AI Inference, from software to hardware to networking to storage, then created an entirely new AI system architecture and NAPU (network addressable processing unit) designed exclusively for AI Inference.

Contrary to conventional thinking, data centers for Inference must be designed differently from those used for AI Training.

For example, AI Inference handles high-variety, high-volume business AI applications that each require less power, whereas Training typically handles one big model at a time. And have you noticed that our AI industry is obsessed with tech leadership and raw performance? That might serve Training well, but it is overkill for AI Inference and for businesses where data centers are an operating expense rather than the core business, such as financial services, healthcare, and automotive.

AI Training vs. AI Inference:

• Purpose: Training is R&D of AI capabilities; Inference is production.

• Budget: Training comes out of R&D spending; Inference is a cost of sale that directly impacts profit margins.

• Computation: Training runs on supercomputers, with gigantic datasets split between thousands of GPUs; Inference serves single or batched queries on a single node with a single GPU (or a few).

• Accuracy: Training needs high precision (FP64/FP32) for accuracy and convergence; Inference can trade off accuracy for cost (FP8, INT8/4/1; see the sketch below).

• Memory: Training demands high memory bandwidth (HBM); Inference can use cost-effective memory (LPDDR/GDDR).

• Networking: Training requires both front-end and back-end networks; Inference needs only a front-end network with QoS.

• Storage: Training requires fast, high-bandwidth storage for data retrieval; Inference needs storage only for RAG and KV caching.

• Driving metrics: Training is driven by accuracy and training speed; Inference is driven by cost and accuracy.
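
The Accuracy row is the tradeoff most directly visible in code. Here is a minimal numpy sketch of symmetric per-tensor INT8 quantization, showing how Inference can accept a small precision loss in exchange for a 4x smaller memory footprint than FP32; the tensor shape and quantization scheme are illustrative, not NeuReality’s implementation.

```python
# Minimal illustration of the training-vs-inference precision tradeoff:
# symmetric per-tensor INT8 quantization of FP32 weights.
# Illustrative only; production schemes (per-channel scales, calibration,
# FP8 formats) are more sophisticated.
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.05, size=(1024, 1024)).astype(np.float32)

# Quantize: map the FP32 value range onto the usable levels of int8.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# Dequantize and measure the accuracy cost of the 4x memory saving.
weights_restored = weights_int8.astype(np.float32) * scale
mean_abs_error = float(np.abs(weights_fp32 - weights_restored).mean())

print(f"Memory: {weights_fp32.nbytes / 1e6:.1f} MB -> {weights_int8.nbytes / 1e6:.1f} MB")
print(f"Mean absolute error introduced: {mean_abs_error:.6f}")
```

Training generally cannot make this trade, because gradient computation and convergence are sensitive to precision; production Inference often can, which is one reason Inference hardware can be cheaper and less power-hungry.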

Contrary to common misconceptions, building energy- and cost-efficient AI data centers is entirely possible. NeuReality’s compelling NR1 AI Inference Solution offers a triple win: higher performance, more affordable production-level AI, and a smaller carbon footprint, halving the energy and real estate costs companies suffer now.

Embracing Energy Efficiency as a Solution

The call to action is clear: it’s time to stop chasing shadows and address the crux of the issue—energy-guzzling data centers. Instead of pouring billions into acquiring more and more underutilized AI chips or reinventing entire energy infrastructures, focus on optimizing existing data center systems for maximum efficiency.

It’s not about extravagant spending; it’s about embracing high-efficiency, sustainable practices. The technology to do this is available now by pairing the NR1 with today’s AI accelerator chips, whether GPU, TPU, LPU, ASIC, or FPGA. NeuReality’s NR1 runs on any deep learning accelerator.

Join the Movement for Sustainable AI Infrastructure

While industry leaders discuss the future of energy-hungry AI, we want to rally everyone around AI technology infrastructure efficiency and sustainability. It’s not just good for your pocketbook. It’s good for our planet too.

Let’s not wait for a crisis. Let’s proactively shift our focus to AI Inference to accelerate mass business AI adoption and unleash the next great human achievements in public safety, medical diagnoses and cures, and more convenient personal banking.

Key Takeaways:

  • Energy Efficiency in AI Data Centers: A critical need as AI operations drive energy consumption ever higher.
  • NeuReality’s Innovative Approach: Faster, higher-performance AI with reduced energy and real estate costs.
  • Call to Action: Shift focus from raw AI capability to AI efficiency when deploying sustainable AI infrastructure. Advocate for more cost- and energy-efficient AI solutions by building your AI hardware and software applications on top of the NR1’s end-to-end efficient system architecture.

Contact Us