This post is a translation of "Effectively use prompt caching on Amazon Bedrock," published on the AWS Machine Learning Blog on April 7, 2025. The translation was prepared by an AWS Solutions Architect.

Prompt caching is now generally available on Amazon Bedrock. It is supported for Anthropic's Claude 3.5 Haiku and Claude 3.7 Sonnet, as well as the Nova Micro, Nova Lite, and Nova Pro models. By caching prompts that are used frequently across multiple API calls, you can reduce response latency by up to 85% and costs by up to 90%.

With prompt caching, you can mark a specific contiguous portion of your prompt (called a prompt prefix) to be cached. When you send a request containing that prompt prefix, the model processes the input and caches the internal state associated with the prefix. On subsequent requests that contain the same prompt prefix, the model reads from the cache and skips the compute steps needed to process those input tokens. This shortens the time to first token (TTFT) and uses the hardware more efficiently, so the service can be offered to you at a lower price.

This post gives a comprehensive overview of prompt caching on Amazon Bedrock and explains how to use it effectively to improve latency and reduce costs.
How prompt caching works

Large language model (LLM) processing consists of two main stages: input token processing and output token generation. Prompt caching on Amazon Bedrock optimizes the input token processing stage.

You start by marking cache checkpoints at the relevant parts of your prompt. The entire portion of the prompt before a checkpoint becomes the cached prompt prefix. When you send a request containing the same prompt prefix designated by a cache checkpoint, the LLM checks whether that prefix is already stored in the cache. If a matching prefix is found, the LLM reads it from the cache and resumes input processing from the end of the last cached prefix. This saves the time and cost that would otherwise be spent recomputing the prompt prefix.

Note that prompt caching support varies by model. For details on the supported models, the minimum number of tokens per cache checkpoint, and the maximum number of cache checkpoints per request, refer to the related documentation.

Cache hits occur only when the prefix matches exactly. To get the most out of prompt caching, place static content such as instructions and examples at the beginning of the prompt, and put dynamic content such as user-specific information at the end. The same principle applies to images and tools, which must remain identical across requests for caching to take effect.

The following diagram illustrates how cache hits work. A, B, C, and D represent different portions of a prompt, with A, B, and C designated as the prompt prefix. A cache hit occurs when a subsequent request contains the same A, B, C prompt prefix.
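To make the checkpoint mechanics concrete, the following is a minimal sketch of marking a checkpoint with the Anthropic Messages API on Amazon Bedrock, using the same cache_control field that appears in the full example later in this post. The instruction text here is a placeholder, and a checkpoint is only created when the marked prefix meets the model's minimum token count.

import json
import boto3

# Assumes AWS credentials and Region are already configured
bedrock_runtime = boto3.client("bedrock-runtime")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {
            "role": "user",
            "content": [
                # Static content: everything up to and including this block
                # becomes the cached prompt prefix (if it meets the model's
                # minimum token count for a checkpoint).
                {
                    "type": "text",
                    "text": "Long, reusable instructions and reference material go here...",
                    "cache_control": {"type": "ephemeral"},
                },
                # Dynamic content: changes on every request, so it is not cached.
                {"type": "text", "text": "What is prompt caching?"},
            ],
        }
    ],
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(body),
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    accept="application/json",
    contentType="application/json",
)
print(json.loads(response["body"].read())["usage"])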
When to use prompt caching

Prompt caching on Amazon Bedrock is suited to workloads that repeatedly reuse long-context prompts across multiple API calls. It can reduce response latency by up to 85% and inference costs by up to 90%, which makes it a particularly good fit for applications with long input contexts that are used over and over. To determine whether prompt caching will benefit your use case, estimate the number of tokens you want to cache, how frequently they will be reused, and the time between requests.

The following use cases are well suited to prompt caching:

Chat with a document – Caching the document as input context on the first request makes each subsequent user query more efficient to process. This can enable a simpler architecture without more complex solutions such as a vector database.

Coding assistants – Reusing long code files in the prompt enables near real-time inline code suggestions and greatly reduces the time spent reprocessing code files.

Agentic workflows – Longer system prompts can be used to refine agent behavior without degrading the end-user experience. Caching the system prompt and complex tool definitions shortens the processing time for each step of the agent flow.

Few-shot learning – Prompt caching helps when you include many high-quality examples and complex instructions, for example for customer service or technical troubleshooting.

How to use prompt caching

When using prompt caching, it is important to divide your prompt into two groups of components: the static portions that are reused repeatedly, and the dynamic portions. Your prompt template should follow the structure shown in the following figure.

You can create multiple cache checkpoints within a single request, subject to model-specific limits. The prompt must follow the same structure of static portion, cache checkpoint, then dynamic portion, as shown in the following figure.
Use case example

The "chat with your document" use case, where the document is included in the prompt, is an excellent fit for prompt caching. In this example, the static portion of the prompt consists of instructions about the response format and the document body. The dynamic portion is the user's query, which changes with every request.

In this scenario, we designate the static portions of the prompt as the prompt prefix and enable prompt caching. The following code snippet shows how to implement this approach with the Invoke Model API. As shown in the following figure, we create two cache checkpoints in the request: one for the instructions and one for the document body.

We use the following prompt:

import json

import boto3
import requests

# Amazon Bedrock runtime client; assumes credentials and Region are configured
bedrock_runtime = boto3.client("bedrock-runtime")

def chat_with_document(document, user_query):
    instructions = (
        "I will provide you with a document, followed by a question about its content. "
        "Your task is to analyze the document, extract relevant information, and provide "
        "a comprehensive answer to the question. Please follow these detailed instructions:"
        "\n\n1. Identifying Relevant Quotes:"
        "\n - Carefully read through the entire document."
        "\n - Identify sections of the text that are directly relevant to answering the question."
        "\n - Select quotes that provide key information, context, or support for the answer."
        "\n - Quotes should be concise and to the point, typically no more than 2-3 sentences each."
        "\n - Choose a diverse range of quotes if multiple aspects of the question need to be addressed."
        "\n - Aim to select between 2 to 5 quotes, depending on the complexity of the question."
        "\n\n2. Presenting the Quotes:"
        "\n - List the selected quotes under the heading 'Relevant quotes:'"
        "\n - Number each quote sequentially, starting from [1]."
        "\n - Present each quote exactly as it appears in the original text, enclosed in quotation marks."
        "\n - If no relevant quotes can be found, write 'No relevant quotes' instead."
        "\n - Example format:"
        "\n Relevant quotes:"
        "\n [1] \"This is the first relevant quote from the document.\""
        "\n [2] \"This is the second relevant quote from the document.\""
        "\n\n3. Formulating the Answer:"
        "\n - Begin your answer with the heading 'Answer:' on a new line after the quotes."
        "\n - Provide a clear, concise, and accurate answer to the question based on the information in the document."
        "\n - Ensure your answer is comprehensive and addresses all aspects of the question."
        "\n - Use information from the quotes to support your answer, but do not repeat them verbatim."
        "\n - Maintain a logical flow and structure in your response."
        "\n - Use clear and simple language, avoiding jargon unless it's necessary and explained."
        "\n\n4. Referencing Quotes in the Answer:"
        "\n - Do not explicitly mention or introduce quotes in your answer (e.g., avoid phrases like 'According to quote [1]')."
        "\n - Instead, add the bracketed number of the relevant quote at the end of each sentence or point that uses information from that quote."
        "\n - If a sentence or point is supported by multiple quotes, include all relevant quote numbers."
        "\n - Example: 'The company's revenue grew by 15% last year. [1] This growth was primarily driven by increased sales in the Asian market. [2][3]'"
        "\n\n5. Handling Uncertainty or Lack of Information:"
        "\n - If the document does not contain enough information to fully answer the question, clearly state this in your answer."
        "\n - Provide any partial information that is available, and explain what additional information would be needed to give a complete answer."
        "\n - If there are multiple possible interpretations of the question or the document's content, explain this and provide answers for each interpretation if possible."
        "\n\n6. Maintaining Objectivity:"
        "\n - Stick to the facts presented in the document. Do not include personal opinions or external information not found in the text."
        "\n - If the document presents biased or controversial information, note this objectively in your answer without endorsing or refuting the claims."
        "\n\n7. Formatting and Style:"
        "\n - Use clear paragraph breaks to separate different points or aspects of your answer."
"\n - Employ bullet points or numbered lists if it helps to organize information more clearly." "\n - Ensure proper grammar, punctuation, and spelling throughout your response." "\n - Maintain a professional and neutral tone throughout your answer." "\n\n8. Length and Depth:" "\n - Provide an answer that is sufficiently detailed to address the question comprehensively." "\n - However, avoid unnecessary verbosity. Aim for clarity and conciseness." "\n - The length of your answer should be proportional to the complexity of the question and the amount of relevant information in the document." "\n\n9. Dealing with Complex or Multi-part Questions:" "\n - For questions with multiple parts, address each part separately and clearly." "\n - Use subheadings or numbered points to break down your answer if necessary." "\n - Ensure that you've addressed all aspects of the question in your response." "\n\n10. Concluding the Answer:" "\n - If appropriate, provide a brief conclusion that summarizes the key points of your answer." "\n - If the question asks for recommendations or future implications, include these based strictly on the information provided in the document." "\n\nRemember, your goal is to provide a clear, accurate, and well-supported answer based solely on the content of the given document. " "Adhere to these instructions carefully to ensure a high-quality response that effectively addresses the user's query." ) document_content = f"Here is the document: <document> {document} </document>" messages_API_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 4096, "messages": [ { "role": "user", "content": [ { "type": "text", "text": instructions, "cache_control": { "type": "ephemeral" } }, { "type": "text", "text": document_content, "cache_control": { "type": "ephemeral" } }, { "type": "text", "text": user_query }, ] } ] } response = bedrock_runtime.invoke_model( body=json.dumps(messages_API_body), modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0", accept="application/json", contentType="application/json" ) response_body = json.loads(response.get("body").read()) print(json.dumps(response_body, indent=2)) response = requests.get("https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/") blog = response.text chat_with_document(blog, "What is the blog writing about?") äžèšã®ã³ãŒãã¹ããããã«å¯Ÿããã¬ã¹ãã³ã¹ã«ã¯ããã£ãã·ã¥ã®èªã¿åããšæžã蟌ã¿ã«é¢ããã¡ããªã¯ã¹ã瀺ã usage ã»ã¯ã·ã§ã³ããããŸãã以äžã¯ãæåã®ã¢ãã«åŒã³åºãããã®ã¬ã¹ãã³ã¹ã®äŸã§ãïŒ { "id": "msg_bdrk_01BwzJX6DBVVjUDeRqo3Z6GL", "type": "message", "role": "assistant", "model": "claude-3-7-sonnet-20250219â, "content": [ { "type": "text", "text": "Relevant quotes:\n[1] \"Today, Amazon Bedrock has introduced in preview two capabilities that help reduce costs and latency for generative AI applications\"\n\n[2] \"Amazon Bedrock Intelligent Prompt Routing \u2013 When invoking a model, you can now use a combination of foundation models (FMs) from the same model family to help optimize for quality and cost... Intelligent Prompt Routing can reduce costs by up to 30 percent without compromising on accuracy.\"\n\n[3] \"Amazon Bedrock now supports prompt caching \u2013 You can now cache frequently used context in prompts across multiple model invocations... 
Prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.\"\n\nAnswer:\nThe article announces two new preview features for Amazon Bedrock that aim to improve cost efficiency and reduce latency in generative AI applications [1]:\n\n1. Intelligent Prompt Routing: This feature automatically routes requests between different models within the same model family based on the complexity of the prompt, choosing more cost-effective models for simpler queries while maintaining quality. This can reduce costs by up to 30% [2].\n\n2. Prompt Caching: This capability allows frequent reuse of cached context across multiple model invocations, which is particularly useful for applications that repeatedly use the same context (like document Q&A systems). This feature can reduce costs by up to 90% and improve latency by up to 85% [3].\n\nThese features are designed to help developers build more efficient and cost-effective generative AI applications while maintaining performance and quality standards." } ], "stop_reason": "end_turn", "stop_sequence": null, "usage": { "input_tokens": 9, "cache_creation_input_tokens": 37209, "cache_read_input_tokens": 0, "output_tokens": 357 } } cache_creation_input_tokens ã®å€ã 37,209 ã§ããããšããããã£ãã·ã¥ãã§ãã¯ãã€ã³ããæ£åžžã«äœæããã 37,209 ããŒã¯ã³ããã£ãã·ã¥ãããããšãããããŸãããã®ç¶æ
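If you want to inspect these numbers programmatically, the usage section can be read straight from the parsed response body. The helper below is a small illustrative addition (print_cache_usage is not part of the example above); it simply reads the keys shown in the responses in this post.

def print_cache_usage(response_body):
    # "usage" is returned by InvokeModel alongside the generated content
    usage = response_body["usage"]
    print("uncached input tokens:", usage.get("input_tokens", 0))
    print("cache write tokens:   ", usage.get("cache_creation_input_tokens", 0))
    print("cache read tokens:    ", usage.get("cache_read_input_tokens", 0))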
For the next request, we can ask a different question:

chat_with_document(blog, "what are the use cases?")

The dynamic portion of the prompt has changed, but the static portion and the prompt prefix remain the same. We can therefore expect subsequent model invocations to benefit from the cache. See the following response:

{
  "id": "msg_bdrk_01HKoDMs4Bmm9mhzCdKoQ8bQ",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {
      "type": "text",
      "text": "Relevant quotes:\n[1] \"This is particularly useful for applications such as customer service assistants, where uncomplicated queries can be handled by smaller, faster, and more cost-effective models, and complex queries are routed to more capable models.\"\n\n[2] \"This is especially valuable for applications that repeatedly use the same context, such as document Q&A systems where users ask multiple questions about the same document or coding assistants that need to maintain context about code files.\"\n\n[3] \"During the preview, you can use the default prompt routers for Anthropic's Claude and Meta Llama model families.\"\n\nAnswer:\nThe document describes two main features with different use cases:\n\n1. Intelligent Prompt Routing:\n- Customer service applications where query complexity varies\n- Applications needing to balance between cost and performance\n- Systems that can benefit from using different models from the same family (Claude or Llama) based on query complexity [1][3]\n\n2. Prompt Caching:\n- Document Q&A systems where users ask multiple questions about the same document\n- Coding assistants that need to maintain context about code files\n- Applications that frequently reuse the same context in prompts [2]\n\nBoth features are designed to optimize costs and reduce latency while maintaining response quality. Prompt routing can reduce costs by up to 30% without compromising accuracy, while prompt caching can reduce costs by up to 90% and latency by up to 85% for supported models."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 10,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 37209,
    "output_tokens": 324
  }
}

The 37,209 tokens corresponding to the document and the instructions are read from the cache, and the user query accounts for 10 input tokens. This state is illustrated in the following figure.
Next, let's change the document to a different blog post while keeping the instructions the same. Because the request is structured with the instructions placed before the document body, we can expect a cache hit for the instruction portion of the prompt prefix. See the following code:

response = requests.get("https://aws.amazon.com/blogs/machine-learning/enhance-conversational-ai-with-advanced-routing-techniques-with-amazon-bedrock/")
blog = response.text
chat_with_document(blog, "What is the blog writing about?")

{
  "id": "msg_bdrk_011S8zqMXzoGHABHnXX9qSjq",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {
      "type": "text",
      "text": "Let me analyze this document and provide a comprehensive answer about its main topic and purpose.\n\nRelevant quotes:\n[1] \"When you're designing a security strategy for your organization, firewalls provide the first line of defense against threats. Amazon Web Services (AWS) offers AWS Network Firewall, a stateful, managed network firewall that includes intrusion detection and prevention (IDP) for your Amazon Virtual Private Cloud (VPC).\"\n\n[2] \"This blog post walks you through logging configuration best practices, discusses three common architectural patterns for Network Firewall logging, and provides guidelines for optimizing the cost of your logging solution.\"\n\n[3] \"Determining the optimal logging approach for your organization should be approached on a case-by-case basis. It involves striking a balance between your security and compliance requirements and the costs associated with implementing solutions to meet those requirements.\"\n\nAnswer:\nThis document is a technical blog post that focuses on cost considerations and logging options for AWS Network Firewall. The article aims to help organizations make informed decisions about implementing and managing their firewall logging solutions on AWS. Specifically, it:\n\n1. Explains different logging configuration practices for AWS Network Firewall [1]\n2. Discusses three main architectural patterns for handling firewall logs:\n - Amazon S3-based solution\n - Amazon CloudWatch-based solution\n - Amazon Kinesis Data Firehose with OpenSearch solution\n3. Provides detailed cost analysis and comparisons of different logging approaches [3]\n4. Offers guidance on balancing security requirements with cost considerations\n\nThe primary purpose is to help AWS users understand and optimize their firewall logging strategies while managing associated costs effectively. The article serves as a practical guide for organizations looking to implement or improve their network security logging while maintaining cost efficiency [2]."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 9,
    "cache_creation_input_tokens": 37888,
    "cache_read_input_tokens": 1038,
    "output_tokens": 385
  }
}

Looking at the response, 1,038 tokens for the instruction portion are read from the cache, and 37,888 tokens for the new document body are written to the cache. This state is illustrated in the following figure.
Cost savings

When a cache hit occurs, Amazon Bedrock passes the compute savings on to you as a per-token discount on the cached context. To calculate your potential cost savings, first understand your prompt caching usage pattern using the cache write and cache read metrics in the Amazon Bedrock responses. You can then calculate the potential savings using the price per 1,000 input tokens (cache write) and the price per 1,000 input tokens (cache read). For detailed pricing, see Amazon Bedrock pricing.
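As a rough back-of-the-envelope sketch, the usage metrics from the document-chat example above can be turned into an estimated saving. The prices below are illustrative placeholders, not actual Amazon Bedrock rates; substitute the values from the pricing page for your model and Region.

# Hypothetical per-1,000-token prices (USD); replace with values from the
# Amazon Bedrock pricing page for your model and Region.
PRICE_INPUT = 0.003          # standard input tokens, per 1K
PRICE_CACHE_WRITE = 0.00375  # cache write, per 1K
PRICE_CACHE_READ = 0.0003    # cache read, per 1K

def estimated_input_cost(usage):
    """Estimate the input-side cost of one call from its usage section."""
    return (usage["input_tokens"] / 1000 * PRICE_INPUT
            + usage["cache_creation_input_tokens"] / 1000 * PRICE_CACHE_WRITE
            + usage["cache_read_input_tokens"] / 1000 * PRICE_CACHE_READ)

# Usage sections from the first two calls in the example above
first_call = {"input_tokens": 9, "cache_creation_input_tokens": 37209, "cache_read_input_tokens": 0}
second_call = {"input_tokens": 10, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 37209}

with_caching = estimated_input_cost(first_call) + estimated_input_cost(second_call)
without_caching = (9 + 37209) / 1000 * PRICE_INPUT + (10 + 37209) / 1000 * PRICE_INPUT
print(f"with caching:    ${with_caching:.4f}")
print(f"without caching: ${without_caching:.4f}")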
Latency benchmarks

Prompt caching is optimized to improve TTFT performance for prompts that are reused repeatedly. It is a strong fit for conversational applications with multiple back-and-forth turns, such as a chat playground, and it also helps in use cases that need to repeatedly reference a large document.

However, for workloads in which a lengthy system prompt of as much as 2,000 tokens is followed by long text that changes frequently between requests, prompt caching may not deliver much improvement; in such situations the benefit of caching is limited.

We have published notebooks showing how to use and benchmark prompt caching in a GitHub repository. Benchmark results vary by use case, depending on factors such as the number of input tokens, cached tokens, and output tokens.
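One way to measure TTFT yourself is to time how long the streaming API takes to deliver its first event. The following is a minimal sketch, assuming a request body shaped like the messages_API_body structure used in chat_with_document above; the arrival of the first streamed event is only an approximation of the first generated token.

import json
import time
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def time_to_first_token(body):
    """Return seconds elapsed until the first streamed event arrives."""
    start = time.time()
    response = bedrock_runtime.invoke_model_with_response_stream(
        body=json.dumps(body),
        modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        accept="application/json",
        contentType="application/json",
    )
    for _ in response["body"]:
        # Treat the first streamed event as an approximation of TTFT.
        return time.time() - start

# Example: time_to_first_token(messages_API_body) with a body built as above.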
Amazon Bedrock cross-Region inference

Prompt caching can be used together with cross-Region inference (CRIS). Cross-Region inference automatically selects the geographically optimal AWS Region to process an inference request, maximizing available resources and model availability. During periods of high demand, these optimizations may lead to an increase in cache writes.

Metrics and observability

Observability for prompt caching is essential for reducing costs and improving latency in applications built on Amazon Bedrock. By monitoring key performance metrics, developers can achieve substantial efficiency gains, such as reducing TTFT by up to 85% and cutting costs by up to 90% for lengthy prompts. These metrics are critical for accurately evaluating cache performance and making strategic decisions about cache management.

Monitoring in Amazon Bedrock

Amazon Bedrock exposes cache performance data in the usage section of its API responses. This lets developers track key metrics such as cache hit rates, token consumption (both reads and writes), and latency improvements. By using this information, teams can manage their caching strategy effectively, make applications more responsive, and reduce operational costs.
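To track these numbers inside your own application, one option is to accumulate the usage sections returned by each call and derive a running ratio of cached to uncached input tokens. This is a minimal sketch; the class name and structure are illustrative and not part of Amazon Bedrock.

class CacheUsageTracker:
    """Accumulates the usage sections returned by Amazon Bedrock calls."""

    def __init__(self):
        self.cache_read = 0
        self.cache_write = 0
        self.uncached_input = 0

    def record(self, usage):
        # usage is the "usage" dict from an InvokeModel response body
        self.cache_read += usage.get("cache_read_input_tokens", 0)
        self.cache_write += usage.get("cache_creation_input_tokens", 0)
        self.uncached_input += usage.get("input_tokens", 0)

    def summary(self):
        total = self.cache_read + self.cache_write + self.uncached_input
        share = self.cache_read / total if total else 0.0
        return {
            "cache_read_tokens": self.cache_read,
            "cache_write_tokens": self.cache_write,
            "uncached_input_tokens": self.uncached_input,
            "cache_read_share": round(share, 3),
        }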
Monitoring in Amazon CloudWatch

Amazon CloudWatch provides a robust platform for monitoring the health and performance of AWS services, and it now includes new automatic dashboards tailored to Amazon Bedrock models. These dashboards offer quick access to key metrics and deeper insight into model performance.

To create a custom observability dashboard, complete the following steps:

1. Create a new dashboard in the CloudWatch console. For detailed instructions, see Improve visibility into Amazon Bedrock usage and performance with Amazon CloudWatch.
2. Choose CloudWatch under the data source types, and choose Pie as the initial widget type (this can be adjusted later).
3. Update the time range for the metrics (1 hour, 3 hours, 1 day, and so on) to match your monitoring needs.
4. Choose Bedrock under AWS namespaces.
5. Enter "cache" in the search box to filter the cache-related metrics.
6. For the model, locate anthropic.claude-3-7-sonnet-20250219-v1:0 and select both CacheWriteInputTokenCount and CacheReadInputTokenCount.
7. Choose Create widget, and then choose Save to save the dashboard.

The following is a sample JSON configuration for creating this widget:

{
    "view": "pie",
    "metrics": [
        [ "AWS/Bedrock", "CacheReadInputTokenCount" ],
        [ ".", "CacheWriteInputTokenCount" ]
    ],
    "region": "us-west-2",
    "setPeriodToTimeRange": true
}

Understanding cache hit rates

To analyze cache hit rates, look at both the CacheReadInputTokens and CacheWriteInputTokens metrics. Aggregating these metrics over a period of time gives you insight into how efficient your caching strategy is. Using the model-specific price per 1,000 input tokens (cache write) and price per 1,000 input tokens (cache read) listed on the Amazon Bedrock pricing page, you can estimate the potential cost savings for your specific use case.
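If you prefer to pull these metrics programmatically rather than through the console, a sketch along the following lines can sum reads and writes over a time window and report what share of cached-prefix tokens were served from the cache. It assumes the AWS/Bedrock metric names used in the widget above and a ModelId dimension; adjust the Region, model ID, and dimension to match your setup.

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"

def total_tokens(metric_name, hours=24):
    """Sum an AWS/Bedrock token-count metric over the last `hours` hours."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName=metric_name,
        Dimensions=[{"Name": "ModelId", "Value": MODEL_ID}],  # assumed dimension
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in resp["Datapoints"])

reads = total_tokens("CacheReadInputTokenCount")
writes = total_tokens("CacheWriteInputTokenCount")
if reads + writes:
    print(f"share of cached-prefix tokens served from cache: {reads / (reads + writes):.1%}")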
Conclusion

In this post, we covered prompt caching on Amazon Bedrock: how it works, when to use it, and how to use it effectively. It is important to evaluate carefully whether your use case will benefit from this feature. Doing so requires structuring your prompts thoughtfully, understanding the difference between static and dynamic content, and choosing a caching strategy suited to your specific needs. By monitoring cache performance with CloudWatch metrics and following the implementation patterns described in this post, you can build more efficient and cost-effective AI applications while maintaining high performance.

For more details on working with prompt caching on Amazon Bedrock, see Prompt caching for faster model inference.
About the authors

Sharon Li is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, she develops and deploys innovative generative AI solutions on the AWS Cloud platform.

Shreyas Subramanian is a Principal Data Scientist who helps customers solve business challenges with AWS services, using generative AI and deep learning. With a background in large-scale optimization and machine learning, he applies machine learning and reinforcement learning to accelerate optimization tasks.

Satveer Khurpa is a Senior WW Specialist Solutions Architect at Amazon Web Services, specializing in Amazon Bedrock security. Drawing on his expertise in cloud-based architectures, he develops innovative generative AI solutions for clients across diverse industries. His deep understanding of generative AI technologies and security principles allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value while maintaining a robust security posture.

Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.

Sean Eichenberger is a Senior Product Manager at AWS.