Amazon Bedrock Takes the Pain out of AI Agent Orchestration – Mostly
So you want to build your own version of ChatGPT.
In the 18 months since OpenAI’s text-based interface with a power large language model (LLM) hit the web, every organization has been clamoring for their very own version of the supercharged chatbot. They want something with all of ChatGPT’s flexibility and reasoning capabilities but also with contextual knowledge about the organization: its people, its processes, its data. And hopefully we can cut out some of the hallucinations along the way, but let’s get moving.
Only, it turns out that orchestrating an LLM with enterprise data is a difficult task. Not only do you have the technical challenge of wrangling a trillion-parameter (rough estimate) model that’s hosted by a third party, but you also have to ensure data confidentiality. It’d also be nice if the new chatbot wasn’t rude to customers or employees. How are IT departments to solve all these problems and mitigate these risks to get their business colleagues the AI chatbot they deserve?
Enter Amazon Web Services. The cloud giant launched Amazon Bedrock last fall at its headline conference, re:Invent. It’s a fully managed service that allows you to marry your own data with an LLM of your choosing to create an agent to scratch that ChatGPT itch. Amazon has been updating it with access to new LLMs, evaluation features, and customization capabilities since then, and the combination of it all offers a lot of value to organizations that want to get started on working with LLM capabilities in their applications and move quickly. Still, unlocking value from the service will require users to make some decisions and solve other challenges along the way.
Build your AI foundation on Bedrock
Bedrock is an LLM orchestration platform offering users a variety of options for molding foundation models, including retrieval augmented generation (RAG), fine-tuning, and additional pretraining. It helps users retrieve their data, vectorize it into a format suitable for an LLM to ingest, and then generate responses from a foundation model with the added benefit of the new data. With this method, any number of task or subject-specific chatbots could be programmed and launched on the serverless infrastructure backbone that AWS provides.
Creating your first agent takes only a few minutes and requires very little specialized AWS knowledge. A few clicks in a graphical user interface are all you need to establish a connection to a model, create the necessary identities and permissions, upload and vectorize your data, and create a workflow for your agent. Data can be retrieved from elsewhere on your Amazon tenancy in S3 and must be formatted as a .txt, .md, .html, .doc, .docx, .csv, .xls, or .xlsx file.
Next, you select your foundation model to ingest the data. There are a lot of choices here for text generation, including Amazon’s own Titan models. Amazon says it has optimized for cost with its models, not to outcompete other options available from a quality and performance perspective. Third-party models are available from Anthropic, AI21, Cohere, Meta, Mistral, and Stability AI. Some of the highest performing models include Mistral 7B, Cohere’s Command R+, and Anthropic’s Claude 3. There are many models available for text generation and two or three available for image generation. Amazon also allows users to upload their own customized foundation models to be hosted in the cloud.
Bedrock also helps you select the right model for your data with some evaluation features. Users can request either an automatic or a human evaluation assessment. Users select the task they are trying to complete with the agent and what metric they want to evaluate. The metrics available for selection are different perspectives on output quality, including accuracy, robustness, or toxicity. In the Playgrounds, users can also see examples of different models providing outputs to the same question to help make their choice.
Once that’s done, it’s time to create any guardrails required around your model. Amazon Guardrails is the place to program these, allowing you to keep your chatbot agent focused on the task it’s supposed to perform. If you’ve created a chatbot to help a customer select what winter tires to buy for their car, you don’t want it giving advice on buying cryptocurrency. Users simply add a list of denied topics by using a name and a written description of what’s in the no-go zone. Users can also create filters with configurable thresholds for harmful content.
Once the agent is fully baked, users can access it through Bedrock’s API. An advantage to the modular approach to the interface is that users can swap out LLMs or input new data for their agent and maintain its function across applications without reprogramming API references.
Between a Bedrock and hard place
Overall, it’s a lot of orchestration executed with a simple user interface that guides users through the process of creating an AI agent from end to end. The value proposition is clear, and any organization with existing AWS footprint will want to consider this option for launching some use cases quickly. Still, it’s not going to solve all your LLM headaches.
Organizations will still need to prioritize what problems they want to solve with LLMs. Understanding where an AI agent is going to provide value isn’t always obvious. There will be many different ideas within the organization for potential use cases, and organizations will need to prioritize those that are likely to produce the most value. There’s also the consideration that building an agent isn’t the right path and buying a ready-made solution is better.
When building the LLM, the evaluation services offered by Amazon are focused on quality of output. What’s lacking are other important perspectives that organizations need to consider, such as cost and performance. For example, Amazon’s Titan model may be only 2% less accurate than a leading LLM for my task, but it’s not clear how much the user will save if they choose to live with that loss of accuracy. Also, we don’t know where latency of response is lower than another model. Some of these metrics can be determined using Amazon Bedrock Playgrounds. Other times, users will need to go to outside tools and benchmark indexes to find the answers for their specific use cases.
Further, it’s unclear what data was used to train these models. Amazon promises its users indemnification from any copyright lawsuits resulting from the content output of their own Titan models but not other third-party models. Beyond litigation concerns, organizations may not be comfortable associating their brand with model publishers that content creators are accusing of copyright infringement. But there’s no way to sort out how these foundation models were trained or what data went into them within Bedrock.
Amazon Guardrails is good at preventing a chatbot from going rogue, which will slot into an organization’s responsible AI approach. But it doesn’t acknowledge the decisions that come before the interaction with a completed agent. There will be compliance and ethical considerations to make around data confidentiality and what can be exposed to an LLM that’s hosted on Amazon’s cloud. In most cases organizations already comfortable with using their own AWS tenancy to host data will feel in the clear. But they now need to consider the possibility that sensitive data fed into the RAG process will resurface in AI agent output at an unwanted time, representing a data breach. Yes, the possibility is low, but it’s been shown to be possible, and we know tactics to hack these models have been discovered. Some organizations will not accept that risk.
Bedrock doesn’t ask users to dig deep to unearth value
At the end of the day, Amazon Bedrock unlocks the value of context-aware LLMs for a wide swath of enterprise use cases and provides the tools that most organizations need to get started quickly and securely. The platform will plug perfectly into any prototyping process easily, as spinning up an AI agent with a new data set can be accomplished in mere minutes. Organizations that are concerned about the risks described here can put in the extra work to mitigate them and still be confident they’re getting a lot of value out of Bedrock.
From a competition perspective, Bedrock is best compared to managed generative AI services from other cloud giants. Microsoft Azure offers an AI model catalog for building with models. Google Cloud offers Vertex AI Model Garden.
The question of where you build is likely going to be answered by another question – where is your data?