At last year’s AWS re:Invent conference, Amazon’s cloud computing unit launched SageMaker HyperPod, a platform for building foundation models. It’s no surprise, then, that at this year’s re:Invent, the company is announcing a number of updates to the platform, with a focus on making model training and fine-tuning on HyperPod more efficient and cost-effective for enterprises.
HyperPod is now in use by companies like Salesforce, Thomson Reuters, and BMW, as well as AI startups like Luma, Perplexity, Stability AI, and Hugging Face. It’s the needs of these customers that AWS is now addressing with today’s updates, Ankur Mehrotra, the GM in charge of HyperPod at AWS, told me.
One of the challenges these companies face is that there often simply isn’t enough capacity for running their LLM training workloads.
“Oftentimes, because of high demand, capacity can be expensive as well as it can be hard to find capacity when you need it, how much you need, and exactly where you need it,” Mehrotra said. “Then, what may happen is you may find capacity in specific blocks, which may be split across time and also location. Customers may need to start at one place and then move their workload to another place and all that — and then also set up and reset their infrastructure to do that again and again.”
To make this easier, AWS is launching what it calls ‘flexible training plans.’ With these, HyperPod users can set a timeline and budget. Say they want to complete the training of a model within the next two months and expect to need 30 full days of training with a specific GPU type to achieve that. SageMaker HyperPod can then go out, find the best combination of capacity blocks, and create a plan to make that happen. SageMaker handles the infrastructure provisioning and runs the jobs (and pauses them when the capacity is not available).
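To make the idea concrete, here is a minimal sketch of how a planner could stitch discrete capacity blocks together to cover 30 days of training inside a two-month window. This is purely illustrative: the block data, the greedy cheapest-first strategy, and all names below are assumptions, not AWS’s actual planner or API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CapacityBlock:
    # Hypothetical offer of reserved GPU capacity: contiguous days in one region at one price.
    start: date
    days: int
    region: str
    price_per_day: float

def plan_training(blocks, days_needed, window_start, window_end):
    """Greedily pick the cheapest blocks that start inside the window until the
    required number of training days is covered. Illustrative only -- a real
    planner would also weigh location, instance type, and the cost of moving
    a workload between blocks."""
    usable = [b for b in blocks if window_start <= b.start <= window_end]
    usable.sort(key=lambda b: b.price_per_day)
    plan, covered = [], 0
    for block in usable:
        if covered >= days_needed:
            break
        plan.append(block)
        covered += block.days
    return plan if covered >= days_needed else None  # None: not enough capacity found

# Example: 30 training days needed within a two-month window.
offers = [
    CapacityBlock(date(2025, 1, 6), 12, "us-east-1", 310.0),
    CapacityBlock(date(2025, 1, 20), 10, "us-west-2", 285.0),
    CapacityBlock(date(2025, 2, 10), 14, "us-east-2", 300.0),
]
print(plan_training(offers, 30, date(2025, 1, 1), date(2025, 2, 28)))
```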
Ideally, Mehrotra noted, this can help these businesses avoid the overspending that comes with overprovisioning servers for their training jobs.
Many times, though, these businesses aren’t training models from scratch. Instead, they fine-tune open-weight models and architectures like Meta’s Llama using their own data. For them, the SageMaker team is launching HyperPod Recipes: benchmarked and optimized recipes for common architectures like Llama and Mistral that encapsulate the best practices for working with these models.
Mehrotra stressed that these recipes also figure out the right checkpoint frequency for a given workload to ensure that the progress of the training job is saved regularly.
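As a rough illustration of what such a recipe bundles together, the sketch below shows a fine-tuning configuration plus the checkpoint-frequency trade-off Mehrotra described: checkpoint often enough that a crash only loses a bounded amount of work. The field names and the helper are hypothetical, not the actual HyperPod Recipes schema.

```python
# Hypothetical sketch of what a fine-tuning "recipe" might capture; the keys
# are illustrative, not AWS's actual recipe format.
recipe = {
    "base_model": "llama-3-8b",            # open-weight architecture being fine-tuned
    "precision": "bf16",
    "parallelism": {"tensor": 2, "data": 8},
    "max_steps": 20_000,
}

def checkpoint_every(step_seconds: float, max_lost_minutes: float) -> int:
    """Pick a checkpoint interval (in steps) so that a failure loses at most
    `max_lost_minutes` of progress -- the kind of trade-off the recipes are
    said to tune automatically for a given workload."""
    return max(1, int((max_lost_minutes * 60) / step_seconds))

# e.g. 2-second steps, tolerate losing at most 15 minutes of training:
recipe["checkpoint_interval_steps"] = checkpoint_every(2.0, 15)
print(recipe["checkpoint_interval_steps"])  # -> 450
```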
As the number of teams working with generative AI in a company grows, different teams will likely provision their own capacity, which in turn means that some of those GPUs will sit idle and eat into the company’s overall AI budget. To combat this, AWS is now allowing enterprises to essentially pool those resources and create a central command center for allocating GPU capacity based on a project’s priority. The system can then allocate resources automatically as needed (or as determined by the internal pecking order, which may not always be the same thing).
Another capability this enables is for companies to use most of their allocation to run inference during the day to serve their customers, and then shift more of those resources to training at night, when there is less demand for inference.
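Here is a toy sketch of that pooling idea: a shared pool hands out GPUs in project-priority order, and the same policy favors inference during business hours and training overnight. The policy, names, and numbers are assumptions made for illustration, not AWS’s task-governance implementation.

```python
from dataclasses import dataclass

@dataclass
class Request:
    project: str
    kind: str        # "inference" or "training"
    gpus: int
    priority: int    # lower number = more important

def allocate(pool_size: int, requests: list[Request], hour: int) -> dict[str, int]:
    """Hand out GPUs from a shared pool in priority order. During business hours
    (9-18), inference requests break priority ties ahead of training so
    customer-facing traffic is served first; overnight, training is favored.
    Illustrative only."""
    daytime = 9 <= hour < 18
    favored = "inference" if daytime else "training"
    def rank(r: Request):
        return (r.priority, 0 if r.kind == favored else 1)
    granted, remaining = {}, pool_size
    for r in sorted(requests, key=rank):
        give = min(r.gpus, remaining)
        granted[r.project] = give
        remaining -= give
    return granted

reqs = [
    Request("chat-assistant", "inference", 48, priority=1),
    Request("search-ranking", "training", 64, priority=1),
    Request("experiments", "training", 32, priority=3),
]
print(allocate(96, reqs, hour=14))  # daytime: inference gets the larger share
print(allocate(96, reqs, hour=2))   # overnight: training gets the larger share
```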
As it turns out, AWS first built this capability for Amazon itself, and the company saw the utilization of its cluster climb to over 90% as a result.
“Organizations really want to innovate, and they have so many ideas. Generative AI is such a new technology. There are so many new ideas. And so they do run into these resource and budget constraints. So it’s about doing the work more efficiently and we can really help customers reduce costs — and this overall helps reduce costs by, we’ve looked at it, up to 40% for organizations,” Mehrotra said.