AI's Future May Depend on Distributed Supercomputing

Editor: Veronika Nazarova

As artificial intelligence (AI) models continue to expand rapidly, industry experts suggest that the future of AI may rely on a new type of supercomputer that connects multiple datacenters across vast distances.

With the increasing demand for computational power, analysts predict that traditional datacenters may no longer suffice. "Distribution is inevitable," stated Sameh Boujelbene, an analyst at Dell'Oro.

Companies like Nvidia are exploring ways to integrate remote datacenters into a cohesive virtual supercomputer. This approach could address power limitations and enhance AI training efficiency.

Current technologies, such as Nvidia's InfiniBand and dense wavelength division multiplexing, allow for data transfer across distances of up to 40 kilometers. However, research is underway to extend these capabilities significantly, potentially facilitating connections over thousands of kilometers.
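For a sense of scale, here is a back-of-envelope sketch, assuming light travels at roughly two-thirds of its vacuum speed in standard solid-core fiber; the distances are taken from the article, everything else is an assumption for illustration:

```python
# Back-of-envelope propagation delay over optical fiber.
# Assumes light travels at roughly two-thirds of its vacuum speed in
# standard solid-core fiber; real links add switching and protocol overhead.

C_VACUUM_KM_PER_S = 299_792   # speed of light in vacuum, km/s
FIBER_FACTOR = 0.67           # typical slowdown from the glass core's refractive index

def one_way_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds over solid-core fiber."""
    return distance_km / (C_VACUUM_KM_PER_S * FIBER_FACTOR) * 1000

for km in (40, 400, 1000):
    print(f"{km:>5} km  ->  {one_way_delay_ms(km):5.2f} ms one way")
# 40 km is about 0.2 ms; 1,000 km is about 5 ms -- orders of magnitude more
# than the microsecond-scale hops inside a single datacenter.
```

Propagation delay alone thus grows from fractions of a millisecond at 40 kilometers to several milliseconds at 1,000 kilometers, before any switching or protocol overhead is counted.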

Despite these advances, latency and bandwidth remain significant hurdles. AI workloads demand high bandwidth and low latency, and up to 30 percent of training time is often spent waiting for data transfers. New technologies, including hollow-core fiber, aim to reduce latency by carrying light through air rather than solid glass and by minimizing the need for repeaters.
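A waiting share of that magnitude is easy to reach with simple arithmetic. In the hedged sketch below, the model size, gradient precision, inter-site bandwidth, and compute time per step are all illustrative assumptions, not figures from the article:

```python
# Rough estimate of the share of a training step spent on gradient exchange.
# Every figure below is an illustrative assumption, not a measurement.

PARAMS = 70e9             # assumed model size in parameters
BYTES_PER_PARAM = 2       # fp16/bf16 gradients
LINK_GBPS = 400           # assumed usable inter-site bandwidth, gigabits/s
COMPUTE_STEP_S = 10.0     # assumed pure-compute time per training step

grad_bytes = PARAMS * BYTES_PER_PARAM
transfer_s = grad_bytes * 8 / (LINK_GBPS * 1e9)       # push gradients over the link
wait_fraction = transfer_s / (transfer_s + COMPUTE_STEP_S)

print(f"gradient payload : {grad_bytes / 1e9:.0f} GB")
print(f"transfer time    : {transfer_s:.1f} s")
print(f"waiting fraction : {wait_fraction:.0%}")
# With these assumptions roughly a fifth of every step goes to communication,
# the same ballpark as the ~30% figure cited above.
```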

Experts emphasize that software optimization can mitigate some of these challenges by handling data more efficiently across distributed networks. Even so, a uniform compute architecture across datacenters is crucial to avoid performance bottlenecks.
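One widely used software optimization is to overlap gradient synchronization with other work so the transfer hides behind computation. The sketch below is a simplified PyTorch-style illustration, not the specific optimizations the experts refer to; it assumes a process group has already been initialized, and `prefetch_next_batch` is a hypothetical helper standing in for whatever independent work is available:

```python
# Minimal sketch of overlapping gradient synchronization with other work.
# Assumes torch.distributed has been initialized (e.g. via init_process_group);
# production frameworks such as DDP bucket and overlap gradients automatically.

import torch
import torch.distributed as dist

def train_step(model, optimizer, loss_fn, batch, prefetch_next_batch):
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()

    # Launch every gradient transfer asynchronously so the (potentially
    # long-haul) all-reduces proceed in the background instead of blocking.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    handles = [dist.all_reduce(g, async_op=True) for g in grads]

    # Work that does not depend on the synchronized gradients can run while
    # the transfers are in flight, hiding part of the network latency.
    next_batch = prefetch_next_batch()

    for h in handles:
        h.wait()                      # block only once, when gradients are needed
    for g in grads:
        g /= dist.get_world_size()    # all_reduce sums; divide to average

    optimizer.step()
    optimizer.zero_grad()
    return loss.item(), next_batch
```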

As AI models grow increasingly complex, the industry may need to embrace multi-datacenter training to keep pace. Power constraints already cap how many GPUs a single datacenter can house, so distributing training workloads across sites may soon become essential.
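To see why power is the binding constraint, a rough budget helps. The per-GPU draw, PUE, and site capacity below are assumptions chosen purely for illustration:

```python
# Rough power budget for a single-site GPU cluster.
# All figures are illustrative assumptions, not vendor specifications.

GPU_POWER_KW = 1.0        # assumed draw per accelerator, including host overhead
PUE = 1.3                 # assumed power usage effectiveness of the facility
SITE_CAPACITY_MW = 150    # assumed power available at one site

gpus_per_site = SITE_CAPACITY_MW * 1_000 / (GPU_POWER_KW * PUE)
print(f"~{gpus_per_site:,.0f} GPUs fit in a {SITE_CAPACITY_MW} MW site")
# Roughly 115,000 GPUs under these assumptions -- any cluster much larger
# than that has to span several sites, which is the multi-datacenter case.
```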
