
AWS emphasizes customer choice in LLM strategy – now to address the pain of decision-making

Amazon Web Services (AWS) wants to give customers more choice in large language models (LLMs).

Amazon Web Services’ (AWS) large language model (LLM) strategy is all about offering customers the most functionality and flexibility possible – and that means more choice.

Choice matters when it comes to LLMs. It’s why AWS competitors are also pushing to offer their customers a bevy of models in their platforms – look at Google’s Vertex AI Model Garden or Microsoft’s Azure AI Model Catalog as examples. It’s necessary because different models suit different use cases. Even when many different models could achieve reasonable accuracy, customers still face cost and latency considerations. So having a choice means customers can find the right model at the right price.

Unfortunately, with more choice comes more complexity. It’s a pain point that AWS will need to address for its customers as it pushes ahead with expanding its generative AI portfolio.

On stage during the opening keynote of AWS re:Invent 2024 – the biggest annual conference the cloud services vendor hosts – Amazon CEO Andy Jassy announced the Amazon Nova family of LLMs. The models are clearly positioned to compete with other best-in-class multimodal models on the market. An executive later confirmed that Nova is intended as a replacement for Amazon’s Titan models.

Yet Amazon also continues to deepen its partnership with Anthropic, the maker of LLM benchmark leaderboard-topping Claude. Amazon finished a $4 billion investment in the AI firm back in the spring, and announced another $4 billion investment Nov. 22.

So how does Jassy square the strategy of partnering with a leading AI firm while also launching new Amazon-branded LLMs?

"We will give you the broadest and best functionality you can find anywhere. That's going to mean choice," says Jassy.

Announcements made at re:Invent show how AWS plans to deliver on that choice: options to train your own models, to connect off-the-shelf foundation models to your apps and data, and to roll out a managed AI assistant to workers. Amazon also understands that choice is hard because it means customers need to make decisions.

Responding to a question posed by Info-Tech, AWS CEO Matt Garman acknowledged it’s an area that AWS continues to improve upon. "Choice is super important, and we'll keep getting better about helping customers choose the right thing," he says.

In a way, you could look at AWS as the Coca-Cola of the LLM ecosystem, as it’s combining vertical integration with a broad product portfolio. It has the bottling technology (infrastructure) and the distribution system (Amazon Bedrock) to get the product to market. Like Coca-Cola, which offers different flavors through different brands of soda, AWS offers different types of LLMs from different brands. At the end of the day, no matter what customers choose, their decisions converge under the same banner: a customer who buys Diet Coke, Sprite, or Barq’s root beer is still paying Coca-Cola, and a customer who selects Claude, Mistral, or Llama as the LLM in Amazon Bedrock is still paying AWS.

So let's crack open a soda and look at how AWS is pursuing its strategy of LLM choice facilitation across the three layers of its AI stack that it defined at last year’s re:Invent:

  • Layer 1: infrastructure for model training and inference
  • Layer 2: building AI into products and processes
  • Layer 3: applications using AI

Layer 1 – Infrastructure: Choose your chip and use a recipe

The choice: Train your AI model on AWS silicon or NVIDIA GPUs.

AWS announced the general availability of Trainium 2 instances, emphasizing their performance and cost savings for LLM training compared to the first generation.

The AWS silicon is optimized for training LLMs and delivers 30–40% better price-performance than existing GPU-based instances, AWS claims. At the same time, AWS also offers NVIDIA GPUs as part of its portfolio, catering to customers who want the industry standard for training.
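For teams using the SageMaker Python SDK, that choice largely surfaces as the instance type on a training job. The sketch below is illustrative only: the script name, role ARN, and instance types are placeholders, and in practice a Trainium job also needs Neuron-compatible containers and libraries rather than a one-parameter swap.

```python
# Sketch: the Trainium-vs-GPU choice expressed as a SageMaker training instance type.
# The entry point, role ARN, and instance types are placeholders; Trainium jobs
# additionally require a Neuron-compatible container image in practice.
from sagemaker.pytorch import PyTorch

def make_estimator(use_trainium: bool) -> PyTorch:
    instance_type = "ml.trn1.32xlarge" if use_trainium else "ml.p5.48xlarge"
    return PyTorch(
        entry_point="train.py",                                # your training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role ARN
        instance_count=4,
        instance_type=instance_type,                           # AWS silicon vs. NVIDIA GPU
        framework_version="2.3",                               # adjust to a supported version
        py_version="py311",
    )

estimator = make_estimator(use_trainium=True)
# estimator.fit({"train": "s3://my-bucket/training-data/"})    # hypothetical S3 path
```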

Trainium 3, slated for release before the end of next year, is expected to build on this foundation with enhancements in power efficiency and performance scalability. AWS is forecasting a 20% reduction per training task.

So customers looking to train AI models can choose between AWS’s in-house silicon and NVIDIA GPUs. But how do they know which clusters to use and when?

The support: Optimize training with recipes that automate resource allocation and configuration in SageMaker HyperPod.

To help answer that question, AWS unveiled Amazon SageMaker HyperPod Recipes – a set of pre-configured templates that simplify distributed training of LLMs. These recipes automate resource allocation and configuration, eliminating the need for manual optimization, which can be time-consuming and error-prone. Customers using HyperPod Recipes can spend less time training their models overall.
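As a rough illustration of what a "recipe" means here, the sketch below assumes the recipe support exposed through the SageMaker Python SDK estimator (a training_recipe argument); the recipe name and overrides shown are hypothetical placeholders rather than real catalog entries, so consult the published HyperPod recipes for actual values.

```python
# Sketch: launching a recipe-based training job. The training_recipe and
# recipe_overrides arguments are assumptions based on the announced feature,
# and the recipe name and override keys below are hypothetical placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role ARN
    instance_count=16,
    instance_type="ml.trn1.32xlarge",
    training_recipe="training/llama/llama3-8b-pretrain",   # hypothetical recipe name
    recipe_overrides={                                      # hypothetical overrides
        "run": {"name": "llama3-8b-demo"},
        "trainer": {"max_steps": 1000},
    },
)
# estimator.fit({"train": "s3://my-bucket/pretraining-data/"})
```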

AI teams will be able to collaborate on model training in Amazon SageMaker Unified Studio, an integrated development environment (IDE) for data and AI work. It brings together SageMaker tools with the Amazon Bedrock IDE, Amazon Q Developer, and governance tooling.

One final note here is Project Rainier, a next-generation AI supercomputer being developed in collaboration with Anthropic. Expected to be one of the largest training clusters in the world, Project Rainier will provide the platform for producing future cutting-edge foundation models for AWS and Anthropic.

Layer 2 – Amazon Bedrock: Choose a model from the marketplace or bring your own

The choice: Use any LLM to solve your business problem by selecting it from Bedrock Marketplace or by uploading it to the hosted environment.

Amazon Bedrock, first launched in April 2023, serves as a managed service that facilitates access to various foundation models from leading AI companies, including Anthropic, Cohere, Meta, Stability AI, and Amazon's own models. This platform enables customers to select the models that best fit their specific use cases, then call them from their LLM-enabled applications through a single API.
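In concrete terms, "a single API" looks like the sketch below, which uses the Bedrock Converse API through boto3: switching from an Anthropic model to an Amazon Nova model is just a different model ID string. The model IDs shown are examples and depend on region and account access.

```python
# Sketch: the same Converse API call works across different foundation models
# in Amazon Bedrock; only the model ID changes. Model IDs and region are
# examples and depend on which models your account has access to.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Summarize our Q3 churn numbers in two sentences."
print(ask("anthropic.claude-3-5-sonnet-20240620-v1:0", prompt))  # example Anthropic model ID
print(ask("amazon.nova-pro-v1:0", prompt))                       # example Nova model ID
```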

Updates to models available in Bedrock include:

  • Amazon Nova Models: Amazon's Nova foundation models include:
    • Nova Micro: A text-only model designed for low latency and low cost.
    • Nova Lite: A low-cost multi-modal model optimized for speed on text and image inputs.
    • Nova Pro: The best combination of speed, accuracy, and cost for a wide range of tasks.
    • Nova Premier: Amazon announced this model at re:Invent on Dec. 5, but details are forthcoming.
    • Nova Canvas: An image generation model meant to contend with the best-in-class.
    • Nova Reel: A video generation model that starts with videos up to six seconds long and will eventually generate up to two minutes.
  • Marketplace Expansion: The Bedrock Marketplace now supports over 100 models, including Anthropic’s Claude, Stability AI’s suite of models, and Cohere’s foundation models, providing an extensive range of options to suit diverse business needs.

The support: Optimize the model you select for your use case and understand responsible AI implications.

AWS announced latency-optimized inference, a feature designed to deliver faster response times for applications requiring real-time interaction, improving overall efficiency for end users. Among the models available for use with this feature are Anthropic’s Claude and Meta's Llama models. Allowing customers to "quantize" models, or make them as efficient as possible while still performing well for their intended use, also expands the options available to them, Garman noted.
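How that option reaches developers can be sketched roughly as follows, assuming it is requested per call through a performanceConfig setting on the Converse API; the exact parameter name and the models that support it should be verified against Bedrock's current documentation.

```python
# Sketch: requesting latency-optimized inference for a single Converse call.
# The performanceConfig parameter and the "optimized" value are assumptions based
# on the announced feature; verify the exact field against Bedrock documentation.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-2")

response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",   # example model ID
    messages=[{"role": "user", "content": [{"text": "Classify this support ticket."}]}],
    performanceConfig={"latency": "optimized"},               # assumed opt-in flag
)
print(response["output"]["message"]["content"][0]["text"])
```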

Also announced are AI Service Cards. They provide detailed documentation about each model's capabilities, limitations, and recommended use cases. This helps customers navigate the complexities of model selection and choose models that align with their compliance, ethical, and operational requirements.

Layer 3 – Amazon Q: AI automation for business users

The choice: Automate tasks using natural language across different applications and platforms.

Amazon Q Business, which was launched one year ago at re:Invent 2023, is Amazon's answer to Copilot. It is an AI-powered chatbot tailored for enterprise environments, capable of assisting with tasks such as troubleshooting cloud applications, summarizing documents, and responding to business inquiries through natural language prompts. There is also Amazon Q Developer, which focuses on coding tasks.

As announced at re:Invent 2024, Q can now execute over 50 business actions across a wide range of enterprise applications and platforms. Q can automate routine tasks such as generating reports or sending notifications. While it offers deep integration with AWS services, the selection of models and functionalities within Amazon Q is more curated, with the LLMs powering the assistant obscured from the user.
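For teams wiring Q into their own workflows, the interaction is a straightforward request-and-response call. The sketch below assumes the Amazon Q Business ChatSync API via boto3; the application ID is a placeholder, and identity configuration (for example, IAM Identity Center) is omitted for brevity.

```python
# Sketch: sending a natural-language request to an Amazon Q Business application.
# The application ID is a placeholder and user identity setup is omitted; which
# LLM answers the request is not exposed to the caller.
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.chat_sync(
    applicationId="a1b2c3d4-example-app-id",   # placeholder application ID
    userMessage="Summarize last week's incident reports and draft a status update.",
)
print(response["systemMessage"])               # Q's generated answer
```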

The support: Abstracting away LLM operations to focus on outcomes.

While an LLM is behind Amazon Q’s capabilities, we don’t know which one (or which ones) it is. Instead, AWS makes that choice on the user’s behalf, interpreting their prompts to Q and passing the instructions on to the underlying model.

When asked if it would make sense to give users choice about what LLM was working behind the scenes of Q's front-end, Garman said he wasn't philosophically opposed to it, but wondered if there would be much value in doing so.

Too bad LLMs are more complicated than cola

All of these announcements at re:Invent layer on top of decision-making aids that AWS already provides in its platform. For example, Model Evaluation features in Bedrock allow developers to A/B test different models, compare performance, and track a range of metrics to determine the right model for their use case – at the right cost. After this year's show, it's clear that users now have even more choices available to them in their journey to reap value from generative AI.
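As a simplified, do-it-yourself version of that idea – not the managed Model Evaluation feature itself – the sketch below runs the same prompts against two candidate models and records latency alongside each response; the model IDs are examples.

```python
# Sketch: a manual A/B comparison of two Bedrock models on the same prompts,
# recording response text and latency. This illustrates the idea behind model
# evaluation; the managed Bedrock Model Evaluation feature offers richer metrics.
import time
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
candidates = ["amazon.nova-lite-v1:0", "anthropic.claude-3-haiku-20240307-v1:0"]  # example IDs
prompts = [
    "Summarize this refund policy in one sentence: ...",
    "Draft a polite reply declining a meeting request.",
]

for model_id in candidates:
    for prompt in prompts:
        start = time.time()
        response = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        latency = time.time() - start
        text = response["output"]["message"]["content"][0]["text"]
        print(f"{model_id} | {latency:.2f}s | {text[:80]}")
```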

Coca-Cola's strategy of providing customers with choice works in part because its product is foolproof. No matter what choice you make, you pay the same price, open the bottle, and drink the contents. Such simplicity is only a dream when working with LLMs. Customers could endlessly iterate and tweak a multi-step pipeline built around an LLM before even knowing exactly what they want it to do, or how much they're willing to pay for it.

Choice is essential to ensuring organizations can succeed with LLMs. But too much choice can be paralyzing and stymie progress. Will AWS continue to strike the right balance with its customer offerings?