Amazon Is Building a Mega AI Supercomputer With Anthropic
Amazon Web Services (AWS) is building an Ultracluster, a massive AI supercomputer made up of hundreds of thousands of its homegrown Trainium chips. This cluster, called Project Rainier, will be used by the AI startup Anthropic and will be one of the largest in the world for training AI models when ready in 2025.
The Ultracluster will use Amazon’s latest AI training chip, Trainium 2. Each server in the cluster contains 16 Trainium chips, and Amazon’s NeuronLink interconnect links four of these servers together for faster communication, forming a 64-chip group that reaches up to 83.2 petaflops of compute.
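A rough breakdown of that peak figure helps put it in perspective. The sketch below assumes the 83.2 petaflops applies to four NeuronLink-connected servers of 16 Trainium 2 chips each (64 chips) and splits it evenly per chip and per server. It is a back-of-the-envelope calculation of peak, not sustained, throughput, and the constant and function names are illustrative rather than anything AWS publishes.

```python
# Back-of-the-envelope split of peak compute. Assumption: the 83.2-petaflop
# figure covers 4 NeuronLink-connected servers with 16 Trainium 2 chips each.
CHIPS_PER_SERVER = 16
SERVERS_PER_GROUP = 4
GROUP_PEAK_PFLOPS = 83.2


def peak_breakdown(group_pflops: float, servers: int, chips_per_server: int) -> dict:
    """Divide a group's peak petaflops evenly across its chips and servers."""
    total_chips = servers * chips_per_server
    per_chip = group_pflops / total_chips
    return {
        "total_chips": total_chips,
        "pflops_per_chip": round(per_chip, 2),
        "pflops_per_server": round(per_chip * chips_per_server, 2),
    }


print(peak_breakdown(GROUP_PEAK_PFLOPS, SERVERS_PER_GROUP, CHIPS_PER_SERVER))
# → {'total_chips': 64, 'pflops_per_chip': 1.3, 'pflops_per_server': 20.8}
```

Under these assumptions each chip contributes about 1.3 petaflops of peak compute and each 16-chip server about 20.8.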
AWS has been building its own hardware for customers since 2018 and aims to run the same playbook that made its Graviton processors a success: proving to customers that its chips are a lower-cost but no less capable option than the market leader, Nvidia. The company has invested $8 billion in Anthropic and has quietly rolled out a range of tools through an AWS platform called Bedrock to help companies harness and wrangle generative AI.
The Ultracluster will be five times larger than the cluster used to train Anthropic’s current most powerful model.
Amazon also announced new tools to help customers build generative AI programs, including:
- Model Distillation: A tool that produces smaller, faster models with similar capabilities.
- Bedrock Agents: A system for managing hundreds of different AI agents.
- Automated Reasoning: A tool that uses logical reasoning to check whether chatbot outputs are accurate.
AWS also showcased its next-generation training chip, Trainium 3, which it says will offer four times the performance of its current chip. The company also announced a new server called UltraServer, made up of 64 interconnected Trainium chips.