Wherever Generative AI is deployed it will change the IT ecosystem within the data centre. From processing to memory, networking to storage, and systems architecture to systems management, no layer of the IT stack will remain unaffected.
For those on the engineering side of data centre operations, tasked with providing the power and cooling to keep AI servers running both within existing data centres and in dedicated new facilities, the impact will play out over the next 12 to 18 months.
Starting with the most fundamental IT change – and the one that has been enjoying the most publicity – AI is closely associated with the use of GPUs (Graphics Processing Units). The GPU maker Nvidia has been the greatest beneficiary – according to Reuters “analysts estimate that Nvidia has captured roughly 80% of the AI chip market. Nvidia does not break out its AI revenue, but a significant portion is captured in the company’s data center segment. So far this year (2023), Nvidia has reported data center revenue of $29.12 billion.”
But even within the GPU universe, it will not be a case of one size or one architecture fitting every AI deployment in every data centre. GPU accelerators built for HPC and AI are common, as are Field Programmable Gate Arrays (FPGAs), adaptive Systems on Chips (SoCs) or ‘smart NICs’, and GPUs built around highly dense CUDA (Compute Unified Device Architecture) cores.
An analysis from the Center for Security and Emerging Technology, entitled “AI chips, what they are and why they matter” says: “Different types of AI chips are useful for different tasks. GPUs are most often used for initially training and refining AI algorithms. FPGAs are mostly used to apply trained AI algorithms to “inference.” ASICs can be designed for either training or inference.”
As with all things AI, the chip level is an area of rapid development, with growing competition between traditional chip makers, cloud operators and new market entrants racing to produce chips for their own use, for the mass market, or both.
As an example of disruption in the chip market, AWS announced ‘Trainium2’, a next-generation chip designed for training AI systems, in late 2023. The company proclaimed the new chip to be four times faster than its predecessor while using half the energy. Elsewhere, firms such as ARM are working with cloud providers to produce chips for AI, while AMD has invested billions of dollars in AI chip R&D. Intel, the world’s largest chip maker, is not standing still: its product roadmap announced in December 2023 was almost entirely focused on AI processors, from PCs to servers.
Why more GPU Servers?
The reason for the chip boom is the sheer number of power-hungry GPUs or TPUs (Tensor Processing Units, developed by Google specifically for AI workloads) needed for Generative AI workloads.
A single AI model can run across hundreds of thousands of processing cores in tens of thousands of servers, mounted in racks drawing 60–100kW per rack. As AI use scales and expands, this kind of rack power density will become common. The power and cooling implications for data centres are clear.
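The arithmetic behind those rack figures is worth making explicit. The sketch below uses assumed, illustrative numbers (a hypothetical eight-GPU server; none of the figures come from a specific vendor) to show how the quoted 60–100kW range falls out of simple multiplication:

```python
# Back-of-envelope AI rack power arithmetic.
# All figures are assumptions for illustration, not vendor data.
GPUS_PER_SERVER = 8       # dense AI training server (assumed)
WATTS_PER_GPU = 700       # high-end accelerator board power (assumed)
SERVER_OVERHEAD_W = 2000  # CPUs, memory, NICs, fans (assumed)
SERVERS_PER_RACK = 10     # assumed rack density

server_w = GPUS_PER_SERVER * WATTS_PER_GPU + SERVER_OVERHEAD_W
rack_kw = SERVERS_PER_RACK * server_w / 1000

print(f"Per-server draw: {server_w} W")     # 7600 W
print(f"Per-rack draw:   {rack_kw:.0f} kW") # 76 kW
```

With these assumptions a single rack lands at 76kW, inside the 60–100kW range quoted above; a traditional enterprise rack, by contrast, is often provisioned for well under 10kW.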
There are several factors that set GPU servers apart from other types of server. According to Run:ai these include: “Parallelism: GPUs consist of thousands of small cores optimized for simultaneous execution of multiple tasks. This enables them to process large volumes of data more efficiently than CPUs with fewer but larger cores.”
“Floating-point Performance: The high-performance floating-point arithmetic capabilities in GPUs make them well-suited for scientific simulations and numerical computations commonly found in AI workloads.”
And “Data transfer speeds: Modern GPUs come equipped with high-speed memory interfaces like GDDR6 or HBM2 which allow faster data transfer between the processor and memory compared to conventional DDR4 RAM used by most CPU-based systems.”
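The parallelism point above can be sketched in a few lines. The toy example below splits one large job into independent slices and hands them to a pool of workers – CPU threads standing in for what a GPU does in hardware across thousands of cores (the chunk count and workload are arbitrary assumptions for illustration):

```python
# Toy illustration of data parallelism: one big job is split into
# independent slices, each handled by a separate worker. A GPU makes
# the same decomposition pay off across thousands of hardware cores.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Work on one slice of the overall task.
    return sum(x * x for x in chunk)

data = list(range(100_000))
chunks = [data[i::4] for i in range(4)]  # four interleaved slices

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

# The parallel result matches the serial one.
print(total == sum(x * x for x in data))  # True
```

The key property is that each slice can be computed without reference to the others, which is exactly the shape of work – matrix and tensor arithmetic – that dominates AI training and inference.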
Parallel processing – AI computing and AI supercomputing
AI supercomputing, like traditional supercomputing before it, runs on parallel processing – in AI’s case, to train and run neural networks. Parallel processing means using more than one microprocessor to handle separate parts of an overall task. It was first used in traditional supercomputers – machines where vast arrays of CPU servers with hundreds of thousands of processors are set up as a single machine within a standalone data centre.
Because GPUs were invented to handle graphics rendering, their highly parallel chip architecture makes them well suited to breaking down complex tasks and working on the parts simultaneously – and it is the nature of all AI workloads that large tasks need to be broken down.
The announcements about AI supercomputers from the cloud providers and other AI companies are revealing: Google says it will build for customers A3 AI supercomputers of 26,000 GPUs delivering 26 exaFLOPS of AI throughput; AWS says its GPU ‘UltraClusters’ will deliver 20 exaFLOPS; and Inflection AI says its AI cluster will consist of 22,000 NVIDIA H100 Tensor Core GPUs.
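Dividing those headline figures through gives a feel for per-device throughput. This is illustrative arithmetic on the quoted numbers only – real sustained throughput depends heavily on numerical precision and workload:

```python
# Implied per-GPU throughput from the 26,000-GPU / 26 exaFLOPS
# cluster figures quoted above (illustrative arithmetic only).
cluster_flops = 26e18   # 26 exaFLOPS, as quoted
gpu_count = 26_000

per_gpu_tflops = cluster_flops / gpu_count / 1e12
print(f"~{per_gpu_tflops:.0f} TFLOPS per GPU")  # ~1000 TFLOPS
```

That works out to roughly a petaFLOP per GPU – a figure only reachable at the low numerical precisions used for AI work, which is why these are billed as exaFLOPS “of AI throughput” rather than general-purpose performance.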
With such emphasis on GPU supercomputers, you could be forgiven for thinking that all AI will run only on GPUs in the cloud. In fact, AI will reside not just in the cloud but across all types of existing data centres and on different types of server hardware.
Intel points out: “Bringing GPUs into your data center environment is not without challenges. These high-performance tools demand more energy and space. They also create dramatically higher heat levels as they operate. These factors impact your data center infrastructure and can raise power costs or create reliability problems.”
Of course, Intel wants to protect its dominance in the server CPU market. But the broader point stands: data centre operators must prepare for an even greater mix of IT equipment living under one roof.
Data centre designers, even those asked to accommodate several thousand GPU machines at relatively small scale within an existing facility, should be prepared to supply more power and remove more heat.