Trying Out Amazon Bedrock Intelligent Prompt Routing!

2024.12.05

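Introduction

Nice to meet you!
I'm Kawabata from Cloud Builders.

An interesting announcement came out of re:Invent...

It's called "Amazon Bedrock Intelligent Prompt Routing"!!!

It reads the prompt and routes it to the appropriate model?!

My guess: prompts that are easy to answer go to Haiku, and things like complex code generation go to Sonnet!
Let's try it out!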

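What is Amazon Bedrock Intelligent Prompt Routing?

Amazon Bedrock Intelligent Prompt Routing is a feature that combines multiple foundation models (FMs) from the same model family to optimize for quality and cost.

Key features:

Automatically routes each prompt to an appropriate model based on its complexity
Can reduce costs by up to 30%
Currently supports Anthropic's Claude family and Meta's Llama family

Caveats

Prompt routers are currently in preview and available only in the us-east-1 (N. Virginia) and us-west-2 (Oregon) Regions
Currently only English prompts are supported

At first I missed the English-only restriction and sent my requests in Japanese, and every one of them went to Sonnet.
Watch out for that, lol.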
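To avoid repeating that mistake, a small pre-flight check can flag prompts that contain Japanese text before they are sent to the router. This is only a rough heuristic sketch: the function names and the warning text are made up for illustration, and looking at Unicode character names is just one simple way to spot kana and kanji.

```python
import unicodedata

# Unicode name prefixes treated as "Japanese text" in this rough heuristic.
JAPANESE_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA")


def contains_japanese(prompt: str) -> bool:
    """Return True if any character's Unicode name marks it as kana or kanji."""
    return any(
        unicodedata.name(ch, "").startswith(JAPANESE_PREFIXES)
        for ch in prompt
    )


def preflight(prompt: str) -> str:
    """Hypothetical check to run before sending a prompt to the router."""
    if contains_japanese(prompt):
        return "warning: non-English prompt, routing may not be optimized"
    return "ok"


print(preflight("What is 1+1?"))
print(preflight("Bedrockについて教えて"))
```

Curly quotes and other punctuation do not trip the check, so an English question that merely mentions Japanese still passes.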

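How prompt routing works

Choose the model family to use
For each request, the router's internal model predicts the performance of each model in the family
Amazon Bedrock selects the model that offers the best combination of response quality and cost
Amazon Bedrock sends the request to the selected model
Amazon Bedrock returns the response from the selected model along with information about which model was selected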

Understanding intelligent prompt routing in Amazon Bedrock - Amazon Bedrock

https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-routing.html

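Judging from the router's configuration, it appears to use cross-region inference profiles to route requests between Claude 3.5 Sonnet and Claude 3 Haiku.

The routing decision is reportedly based on the response quality that the router's internal model predicts for each model.
If no model's predicted response quality meets the criteria, the fallback model (Claude 3.5 Sonnet for the Anthropic router) is apparently invoked.

Isn't that amazing...

Prompts for which Sonnet would be overkill get handled by Haiku, which is where the cost savings come from.
Really nice.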
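The flow described in this section can be pictured with a toy sketch. To be clear, this is not Bedrock's actual algorithm, which is not public: the quality scores, costs, threshold, and the "cheapest model that is good enough" rule are all assumptions made up to illustrate the idea of predicting quality per model and falling back when nothing qualifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_request: float   # made-up relative cost
    predicted_quality: float  # made-up score in [0, 1] from the "internal model"


def choose_model(candidates, fallback, min_quality):
    """Pick the cheapest candidate whose predicted quality meets the bar.

    If no candidate qualifies, return the fallback model's name.
    """
    good_enough = [c for c in candidates if c.predicted_quality >= min_quality]
    if not good_enough:
        return fallback
    return min(good_enough, key=lambda c: c.cost_per_request).name


# A simple prompt: both models are predicted to do well, so the cheaper one wins.
simple = [Candidate("haiku", 1.0, 0.90), Candidate("sonnet", 12.0, 0.95)]
print(choose_model(simple, fallback="sonnet", min_quality=0.8))

# A complex prompt: only the larger model clears the bar.
hard = [Candidate("haiku", 1.0, 0.50), Candidate("sonnet", 12.0, 0.90)]
print(choose_model(hard, fallback="sonnet", min_quality=0.8))
```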

Pricing

There is no additional charge for using a prompt router
You pay only for the models that are actually invoked
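Since the bill is just the sum of what each invoked model costs, the effect of routing is easy to estimate by hand. The sketch below is a back-of-the-envelope calculation: the per-1,000-token prices are illustrative assumptions (check the Bedrock pricing page for current values), and the token counts and the 3-in-10 Haiku share are arbitrary numbers, unrelated to AWS's "up to 30%" figure.

```python
# Illustrative USD prices per 1,000 tokens (assumptions, not authoritative).
PRICES = {
    "haiku": {"input": 0.00025, "output": 0.00125},
    "sonnet": {"input": 0.003, "output": 0.015},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost of one request: tokens times the per-1,000-token price."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def total_cost(requests):
    """Sum the cost of (model, input_tokens, output_tokens) tuples."""
    return sum(request_cost(m, i, o) for m, i, o in requests)


# Ten requests of 1,000 input and 1,000 output tokens each.
all_sonnet = total_cost([("sonnet", 1000, 1000)] * 10)
routed = total_cost([("sonnet", 1000, 1000)] * 7 + [("haiku", 1000, 1000)] * 3)

print(f"all Sonnet: ${all_sonnet:.4f}")
print(f"routed:     ${routed:.4f}")
print(f"saving:     {1 - routed / all_sonnet:.1%}")
```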

Let's try it

I've also published this demo on GitHub.

GitHub - kawabata-mcl/bedrock-prompt-router-demo

https://github.com/kawabata-mcl/bedrock-prompt-router-demo.git

I based it on the official AWS blog post.

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and p...

https://aws.amazon.com/jp/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/

Prerequisites

Install the AWS SDK

$ pip install boto3

Configure AWS credentials

$ aws configure

Grant the IAM permission required to run the code

bedrock:InvokeModel
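As a rough illustration, the permission above could be written as an identity-based policy document like the one this sketch prints. The `Sid` and the wildcard `Resource` are placeholders made up for the example; in a real account you would probably want to narrow `Resource` to the ARNs you actually use.

```python
import json

# Minimal policy sketch granting only the action listed above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPromptRouterDemo",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*",  # placeholder; tighten for real use
        }
    ],
}

print(json.dumps(policy, indent=2))
```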

Running the code

The following Python code calls the prompt router:

Note: replace the prompt router ARN and the question with your own.

import json
import boto3

# Initialize the Bedrock runtime client
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1"
)

# Specify the ARN of the prompt router
ROUTER_ARN = "<YOUR_PROMPT_ROUTER_ARN>"

# Build the message
messages = [{
    "role": "user",
    "content": [
        {"text": "<YOUR_QUESTION>"}
    ]
}]

# Invoke a model through the prompt router
response = bedrock_runtime.converse(
    modelId=ROUTER_ARN,
    messages=messages
)

# Print the response text
print("Response:")
print(response["output"]["message"]["content"][0]["text"])

# Show which model the router actually invoked
print("\nUsed Model:")
print(json.dumps(response["trace"], indent=2))

Run the code with the following command:

$ python3 prompt-router-demo.py
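The script above dumps the whole trace. If you only care about which model was picked, a small helper can pull the model ID out of the response. This is a minimal sketch: `invoked_model_id` is a name invented here, and it assumes the trace has the `promptRouter.invokedModelId` shape shown in the outputs in this post.

```python
from typing import Optional


def invoked_model_id(response: dict) -> Optional[str]:
    """Return the ID of the model the router picked, or None if there is no trace."""
    arn = response.get("trace", {}).get("promptRouter", {}).get("invokedModelId")
    if arn is None:
        return None
    # The ARN ends with ".../inference-profile/<profile-id>"; keep the last segment.
    return arn.rsplit("/", 1)[-1]


# A response fragment shaped like the Converse output, with a placeholder account ID.
sample = {
    "trace": {
        "promptRouter": {
            "invokedModelId": (
                "arn:aws:bedrock:us-east-1:123412341234:inference-profile/"
                "us.anthropic.claude-3-haiku-20240307-v1:0"
            )
        }
    }
}
print(invoked_model_id(sample))
```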

Asking it various questions

I sent the following questions and checked which model was used for each:

Hello
Tell me about Bedrock
How do you say “hello” in Japanese?
Translate Bedrock Intelligent Prompt Routing is amazing to Japanese
What is 1+1?
Show me the Python code to create a DynamoDB table
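Rather than editing the script for every question, the whole list can be sent in a loop and grouped by the model that answered. The sketch below is one way to do that; `group_prompts_by_model` is a hypothetical helper, and it takes the client as an argument so it can be tried with a stand-in object before spending real tokens.

```python
from collections import defaultdict

PROMPTS = [
    "Hello",
    "Tell me about Bedrock",
    "What is 1+1?",
    "Show me the Python code to create a DynamoDB table",
]


def group_prompts_by_model(client, router_arn, prompts):
    """Send each prompt through the router and group prompts by invoked model ID."""
    grouped = defaultdict(list)
    for prompt in prompts:
        response = client.converse(
            modelId=router_arn,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        arn = response["trace"]["promptRouter"]["invokedModelId"]
        # Keep only the profile ID at the end of the ARN as the grouping key.
        grouped[arn.rsplit("/", 1)[-1]].append(prompt)
    return dict(grouped)
```

With real credentials, `client` would be `boto3.client("bedrock-runtime", region_name="us-east-1")` and `router_arn` the ARN of your prompt router, exactly as in the script above.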

Sample responses

Here is what the responses looked like:

Hello

Response:
Hello! How can I assist you today?

Used Model:
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:414589897431:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
  }
}

Show me the Python code to create a DynamoDB table

Response:
Certainly! Here's a Python script that creates a DynamoDB table using the AWS SDK for Python (Boto3):

import boto3

def create_dynamodb_table(table_name):
    # Create a boto3 client for DynamoDB
    dynamodb = boto3.client('dynamodb')

    try:
        # Create the DynamoDB table
        response = dynamodb.create_table(
            TableName=table_name,
            KeySchema=[
                {
                    'AttributeName': 'id',
                    'KeyType': 'HASH'  # Partition key
                },
                {
                    'AttributeName': 'timestamp',
                    'KeyType': 'RANGE'  # Sort key
                }
            ],
            AttributeDefinitions=[
                {
                    'AttributeName': 'id',
                    'AttributeType': 'S'  # String
                },
                {
                    'AttributeName': 'timestamp',
                    'AttributeType': 'N'  # Number
                }
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': 5,
                'WriteCapacityUnits': 5
            }
        )

        print(f"Table {table_name} created successfully.")
        return response
    except dynamodb.exceptions.ResourceInUseException:
        print(f"Table {table_name} already exists.")
    except Exception as e:
        print(f"Error creating table: {str(e)}")

# Example usage
table_name = 'MyDynamoDBTable'
create_dynamodb_table(table_name)

This script does the following:

1. We import the boto3 library, which is the AWS SDK for Python.
2. We define a function create_dynamodb_table that takes a table_name as an argument.
3. Inside the function, we create a boto3 client for DynamoDB.
4. We use the create_table method of the DynamoDB client to create the table. We specify:
 - The table name
 - The key schema (in this example, we're using a composite key with 'id' as the partition key and 'timestamp' as the sort key)
 - The attribute definitions for the key attributes
 - The provisioned throughput for read and write capacity units

5. If the table is created successfully, we print a success message and return the response.
6. We include error handling for cases where the table already exists or if there's another error during creation.
7. At the end, we provide an example usage of the function.

To use this script:
1. Make sure you have the AWS CLI configured with your credentials or have the appropriate IAM role attached to your EC2 instance if you're running this on AWS.
2. Install boto3 if you haven't already: pip install boto3
3. Run the script. It will create a DynamoDB table with the name 'MyDynamoDBTable' (or whatever name you specify).

Remember to adjust the table schema, attribute definitions, and provisioned throughput according to your specific requirements.

Used Model:
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:414589897431:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
  }
}

Models selected by the router

The router selected the following models for the questions I sent:

anthropic.claude-3-5-sonnet-20240620-v1:0
– Tell me about Bedrock
– How do you say “hello” in Japanese?
– Translate Bedrock Intelligent Prompt Routing is amazing to Japanese
– Show me the Python code to create a DynamoDB table
anthropic.claude-3-haiku-20240307-v1:0
– Hello
– What is 1+1?

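Simple responses go to Haiku.
Anything that takes some research, plus translation and code generation, goes to Sonnet.
That's the general picture.

Pretty much what I predicted at the start!

It looks like it can cut costs while maintaining performance. What a great service!!!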

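Checking a prompt router's models with the CLI

Here is how to check from the CLI which models a prompt router can use.
Note: if your profile's default Region is not us-east-1, don't forget --region us-east-1!

List the available prompt routers: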

$ aws bedrock list-prompt-routers

Example output:

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}

Show the details of a specific prompt router:

$ aws bedrock get-prompt-router --prompt-router-arn arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1

Example output:

{
    "promptRouterName": "Meta Prompt Router",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
        },
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
        }
    ],
    "fallbackModel": {
        "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
    },
    "status": "AVAILABLE"
}
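The same information is available from Python. The sketch below boils a router description down to its name, models, and fallback. `summarize_router` and `list_router_summaries` are names invented for this example; the code assumes the response shapes shown in the CLI outputs above and ignores pagination for brevity.

```python
def summarize_router(router: dict) -> dict:
    """Reduce a prompt router description to its name, models, and fallback."""

    def short(arn: str) -> str:
        # Keep only the profile ID at the end of the ARN.
        return arn.rsplit("/", 1)[-1]

    return {
        "name": router["promptRouterName"],
        "models": [short(m["modelArn"]) for m in router["models"]],
        "fallback": short(router["fallbackModel"]["modelArn"]),
    }


def list_router_summaries(bedrock_client) -> list:
    """Call ListPromptRouters on the given client and summarize each entry."""
    response = bedrock_client.list_prompt_routers()
    return [summarize_router(r) for r in response["promptRouterSummaries"]]
```

Here `bedrock_client` would be `boto3.client("bedrock", region_name="us-east-1")` (the control-plane client, not `bedrock-runtime`). The dict returned by `bedrock_client.get_prompt_router(promptRouterArn=...)` has the same fields, so it can be passed straight to `summarize_router`.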

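Summary

Amazon Bedrock Intelligent Prompt Routing is a very useful service with the following characteristics:

Automatically routes each prompt to an appropriate model based on its complexity
Can reduce costs by up to 30%, with no additional charge for the router itself
Optimizes by sending simple questions to Haiku and complex work to Sonnet
Currently supports Anthropic's Claude family and Meta's Llama family
English prompts only (for now)
Available in the us-east-1 and us-west-2 Regions

In my own testing, I confirmed that simple greetings and arithmetic were routed to Haiku, while more complex tasks such as code generation and translation were routed to Sonnet.
It's a groundbreaking service that balances cost optimization with performance.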

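Closing thoughts

Once Japanese prompts are supported, this will be a must-use.
Unless you have a requirement that absolutely demands Sonnet, I'd definitely want to use it to cut costs!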