On October 28, we introduced Amazon Nova Multimodal Embeddings, a state-of-the-art multimodal embedding model for agentic retrieval-augmented generation (RAG) and semantic search applications. The model is available in Amazon Bedrock. It is the first unified embedding model that supports text, documents, images, video, and audio through a single model, enabling crossmodal retrieval with leading accuracy.

Embedding models convert text, image, and audio inputs into numerical representations called embeddings. These embeddings capture the meaning of the input in a way that AI systems can compare, search, and analyze, powering use cases such as semantic search and RAG.

Organizations are increasingly looking for solutions that unlock insights from the growing volume of unstructured data spread across text, image, document, video, and audio content. For example, an organization might have product images, brochures that contain infographics and text, and user-uploaded video clips. Embedding models can unlock value from unstructured data, but traditional models are typically specialized to handle one content type. This limitation forces customers either to build complex crossmodal embedding solutions or to restrict themselves to use cases focused on a single content type. The problem also applies to mixed-modality content types, such as documents with interleaved text and images, or video with visual, audio, and textual elements, where existing models struggle to capture crossmodal relationships effectively.

Nova Multimodal Embeddings supports a unified semantic space for text, documents, images, video, and audio, for use cases such as crossmodal search across mixed-modality content, searching with a reference image, and retrieving visual documents.

Evaluating Amazon Nova Multimodal Embeddings performance

We evaluated the model on a broad range of benchmarks, and it delivers leading accuracy out of the box, as shown in the following table.

Nova Multimodal Embeddings supports a context length of up to 8K tokens and text in up to 200 languages, and it accepts inputs through synchronous and asynchronous APIs. It also supports segmentation (also known as "chunking"), which partitions long-form text, video, or audio content into manageable segments and generates an embedding for each portion. Lastly, the model offers four output embedding dimensions. These dimensions are trained using Matryoshka Representation Learning (MRL), which enables low-latency end-to-end retrieval with minimal accuracy changes.

Let's see how the new model can be used in practice.

Using Amazon Nova Multimodal Embeddings

Getting started with Nova Multimodal Embeddings follows the same pattern as other models in Amazon Bedrock. The model accepts text, documents, images, video, or audio as input and returns numerical embeddings that you can use for semantic search, similarity comparison, or RAG.

Here's a practical example using the AWS SDK for Python (Boto3) that shows how to create embeddings from different content types and store them for later retrieval. For simplicity, I use Amazon S3 Vectors, a cost-optimized storage with native support for storing and querying vectors at any scale, to store and search the embeddings.

Let's start with the fundamentals: converting text into embeddings. This example shows how to transform a simple text description into a numerical representation that captures its semantic meaning. These embeddings can later be compared with embeddings from documents, images, videos, or audio to find related content.

To make the code easy to follow, I show one section of the script at a time. The full script is included at the end of this walkthrough.

    import json
    import base64
    import time
    import boto3

    MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
    EMBEDDING_DIMENSION = 3072

    # Initialize the Amazon Bedrock Runtime client
    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    print(f"Generating text embedding with {MODEL_ID} ...")

    # Text to embed
    text = "Amazon Nova is a multimodal foundation model"

    # Create the embedding
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": text},
        },
    }

    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId=MODEL_ID,
        contentType="application/json",
    )

    # Extract the embedding
    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    print(f"Generated embedding with {len(embedding)} dimensions")

Next, we process visual content in the same embedding space, using a photo.jpg file in the same folder as the script. This demonstrates the power of multimodality: Nova Multimodal Embeddings can capture both textual and visual context in a single embedding, which provides a deeper understanding of the document.

Nova Multimodal Embeddings can generate embeddings that are optimized for how they are used. When indexing for a search or retrieval use case, embeddingPurpose can be set to GENERIC_INDEX. For the query step, embeddingPurpose can be set according to the type of item to be retrieved. For example, when retrieving documents, embeddingPurpose can be set to DOCUMENT_RETRIEVAL.

    # Read and encode the image
    print(f"Generating image embedding with {MODEL_ID} ...")

    with open("photo.jpg", "rb") as f:
        image_bytes = base64.b64encode(f.read()).decode("utf-8")

    # Create the embedding
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "image": {
                "format": "jpeg",
                "source": {"bytes": image_bytes}
            },
        },
    }

    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId=MODEL_ID,
        contentType="application/json",
    )

    # Extract the embedding
    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    print(f"Generated embedding with {len(embedding)} dimensions")

To process video content, I use the asynchronous API. This is a requirement for videos that are larger than 25 MB when encoded as Base64. First, I upload a local video to an S3 bucket in the same AWS Region.

    aws s3 cp presentation.mp4 s3://my-video-bucket/videos/

This example shows how to extract embeddings from both the visual and the audio components of a video file. The segmentation feature breaks longer videos into manageable chunks, which makes it practical to search through hours of content efficiently.

    # Initialize the Amazon S3 client
    s3 = boto3.client("s3", region_name="us-east-1")

    print(f"Generating video embedding with {MODEL_ID} ...")

    # Amazon S3 URIs
    S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
    S3_EMBEDDING_DESTINATION_URI = "s3://my-embedding-destination-bucket/embeddings-output/"

    # Create an async embedding job for video with audio
    model_input = {
        "taskType": "SEGMENTED_EMBEDDING",
        "segmentedEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "video": {
                "format": "mp4",
                "embeddingMode": "AUDIO_VIDEO_COMBINED",
                "source": {
                    "s3Location": {"uri": S3_VIDEO_URI}
                },
                "segmentationConfig": {
                    "durationSeconds": 15  # Segment into 15-second chunks
                },
            },
        },
    }

    response = bedrock_runtime.start_async_invoke(
        modelId=MODEL_ID,
        modelInput=model_input,
        outputDataConfig={
            "s3OutputDataConfig": {
                "s3Uri": S3_EMBEDDING_DESTINATION_URI
            }
        },
    )

    invocation_arn = response["invocationArn"]
    print(f"Async job started: {invocation_arn}")

    # Poll until the job completes
    print("\nPolling for job completion...")
    while True:
        job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
        status = job["status"]
        print(f"Status: {status}")

        if status != "InProgress":
            break
        time.sleep(15)

    # Check whether the job completed successfully
    if status == "Completed":
        output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
        print(f"\nSuccess! Embeddings at: {output_s3_uri}")

        # Parse the S3 URI to get the bucket and prefix
        s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove the "s3://" prefix
        bucket = s3_uri_parts[0]
        prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

        # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
        # The output_s3_uri already includes the job ID, so just append the filename
        embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

        print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

        # Read and parse the JSONL file
        response = s3.get_object(Bucket=bucket, Key=embeddings_key)
        content = response['Body'].read().decode('utf-8')

        embeddings = []
        for line in content.strip().split('\n'):
            if line:
                embeddings.append(json.loads(line))

        print(f"\nFound {len(embeddings)} video segments:")
        for i, segment in enumerate(embeddings):
            print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
            print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
    else:
        print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")

With the embeddings generated, we need a place to store and search them efficiently. This example shows how to set up a vector store using Amazon S3 Vectors, which provides the infrastructure needed for similarity search at scale. Think of it as creating a searchable index where semantically similar content naturally clusters together. When adding an embedding to the index, I use the metadata to specify the original format and the content being indexed.

    # Initialize the Amazon S3 Vectors client
    s3vectors = boto3.client("s3vectors", region_name="us-east-1")

    # Configuration
    VECTOR_BUCKET = "my-vector-store"
    INDEX_NAME = "embeddings"

    # Create the vector bucket and index (if they don't exist)
    try:
        s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
        print(f"Vector bucket {VECTOR_BUCKET} already exists")
    except s3vectors.exceptions.NotFoundException:
        s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
        print(f"Created vector bucket: {VECTOR_BUCKET}")

    try:
        s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
        print(f"Vector index {INDEX_NAME} already exists")
    except s3vectors.exceptions.NotFoundException:
        s3vectors.create_index(
            vectorBucketName=VECTOR_BUCKET,
            indexName=INDEX_NAME,
            dimension=EMBEDDING_DIMENSION,
            dataType="float32",
            distanceMetric="cosine"
        )
        print(f"Created index: {INDEX_NAME}")

    texts = [
        "Machine learning on AWS",
        "Amazon Bedrock provides foundation models",
        "S3 Vectors enables semantic search"
    ]

    print(f"\nGenerating embeddings for {len(texts)} texts...")

    # Generate an embedding with Amazon Nova for each text
    vectors = []
    for text in texts:
        response = bedrock_runtime.invoke_model(
            body=json.dumps({
                "taskType": "SINGLE_EMBEDDING",
                "singleEmbeddingParams": {
                    "embeddingPurpose": "GENERIC_INDEX",  # same purpose as in the full script
                    "embeddingDimension": EMBEDDING_DIMENSION,
                    "text": {"truncationMode": "END", "value": text}
                }
            }),
            modelId=MODEL_ID,
            accept="application/json",
            contentType="application/json"
        )

        response_body = json.loads(response["body"].read())
        embedding = response_body["embeddings"][0]["embedding"]

        vectors.append({
            "key": f"text:{text[:50]}",  # Unique identifier
            "data": {"float32": embedding},
            "metadata": {"type": "text", "content": text}
        })
        print(f"  ✓ Generated embedding for: {text}")

    # Add all vectors to the store in a single call
    s3vectors.put_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        vectors=vectors
    )

    print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")
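The example above writes three vectors in a single put_vectors call. With a larger collection, you may want to split the list into smaller requests. The following is a minimal sketch of that idea, not part of the original walkthrough: the chunked and put_in_batches helpers and the BATCH_SIZE value are hypothetical names and numbers chosen for illustration, so check the S3 Vectors quotas for the actual per-request limit. The only call it assumes is put_vectors with the same three arguments used above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical batch size for illustration only; the real per-request
# limit should be taken from the S3 Vectors quotas.
BATCH_SIZE = 100


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def put_in_batches(client, bucket: str, index: str,
                   vectors: Sequence[dict], size: int = BATCH_SIZE) -> int:
    """Call client.put_vectors once per batch and return the number of calls."""
    calls = 0
    for batch in chunked(vectors, size):
        client.put_vectors(
            vectorBucketName=bucket,
            indexName=index,
            vectors=batch,
        )
        calls += 1
    return calls
```

In the walkthrough, this would be called as put_in_batches(s3vectors, VECTOR_BUCKET, INDEX_NAME, vectors) in place of the single put_vectors call.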
This final example shows how to search across different content types with a single query, finding the most similar content regardless of whether it originated from text, images, videos, or audio. The distance scores help you understand how closely the results relate to your original query.

    # Text to query
    query_text = "foundation models"

    print(f"\nGenerating embeddings for query '{query_text}' ...")

    # Generate the embedding
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_RETRIEVAL",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": query_text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    query_embedding = response_body["embeddings"][0]["embedding"]

    print("Searching for similar embeddings...\n")

    # Query the top 5 most similar vectors
    response = s3vectors.query_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        queryVector={"float32": query_embedding},
        topK=5,
        returnDistance=True,
        returnMetadata=True
    )

    # Display the results
    print(f"Found {len(response['vectors'])} results:\n")
    for i, result in enumerate(response["vectors"], 1):
        print(f"{i}. {result['key']}")
        print(f"   Distance: {result['distance']:.4f}")
        if result.get("metadata"):
            print(f"   Metadata: {result['metadata']}")
        print()

Crossmodal search is one of the key advantages of multimodal embeddings. With crossmodal search, you can query with text and find relevant images. You can also search for videos using text descriptions, find audio clips that match certain topics, or discover documents based on their visual and textual content.

For your reference, here is the full script with all the previous examples merged together:

    import json
    import base64
    import time
    import boto3

    MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
    EMBEDDING_DIMENSION = 3072

    # Initialize the Amazon Bedrock Runtime client
    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    print(f"Generating text embedding with {MODEL_ID} ...")

    # Text to embed
    text = "Amazon Nova is a multimodal foundation model"

    # Create the embedding
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": text},
        },
    }

    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId=MODEL_ID,
        contentType="application/json",
    )

    # Extract the embedding
    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    print(f"Generated embedding with {len(embedding)} dimensions")

    # Read and encode the image
    print(f"Generating image embedding with {MODEL_ID} ...")

    with open("photo.jpg", "rb") as f:
        image_bytes = base64.b64encode(f.read()).decode("utf-8")

    # Create the embedding
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "image": {
                "format": "jpeg",
                "source": {"bytes": image_bytes}
            },
        },
    }

    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId=MODEL_ID,
        contentType="application/json",
    )

    # Extract the embedding
    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    print(f"Generated embedding with {len(embedding)} dimensions")

    # Initialize the Amazon S3 client
    s3 = boto3.client("s3", region_name="us-east-1")

    print(f"Generating video embedding with {MODEL_ID} ...")

    # Amazon S3 URIs
    S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"

    # Amazon S3 output bucket and location
    S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"

    # Create an async embedding job for video with audio
    model_input = {
        "taskType": "SEGMENTED_EMBEDDING",
        "segmentedEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "video": {
                "format": "mp4",
                "embeddingMode": "AUDIO_VIDEO_COMBINED",
                "source": {
                    "s3Location": {"uri": S3_VIDEO_URI}
                },
                "segmentationConfig": {
                    "durationSeconds": 15  # Segment into 15-second chunks
                },
            },
        },
    }

    response = bedrock_runtime.start_async_invoke(
        modelId=MODEL_ID,
        modelInput=model_input,
        outputDataConfig={
            "s3OutputDataConfig": {
                "s3Uri": S3_EMBEDDING_DESTINATION_URI
            }
        },
    )

    invocation_arn = response["invocationArn"]
    print(f"Async job started: {invocation_arn}")

    # Poll until the job completes
    print("\nPolling for job completion...")
    while True:
        job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
        status = job["status"]
        print(f"Status: {status}")

        if status != "InProgress":
            break
        time.sleep(15)

    # Check whether the job completed successfully
    if status == "Completed":
        output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
        print(f"\nSuccess! Embeddings at: {output_s3_uri}")

        # Parse the S3 URI to get the bucket and prefix
        s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove the "s3://" prefix
        bucket = s3_uri_parts[0]
        prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

        # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
        # The output_s3_uri already includes the job ID, so just append the filename
        embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

        print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

        # Read and parse the JSONL file
        response = s3.get_object(Bucket=bucket, Key=embeddings_key)
        content = response['Body'].read().decode('utf-8')

        embeddings = []
        for line in content.strip().split('\n'):
            if line:
                embeddings.append(json.loads(line))

        print(f"\nFound {len(embeddings)} video segments:")
        for i, segment in enumerate(embeddings):
            print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
            print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
    else:
        print(f"\nJob failed: {job.get('failureMessage', 'Unknown error')}")

    # Initialize the Amazon S3 Vectors client
    s3vectors = boto3.client("s3vectors", region_name="us-east-1")

    # Configuration
    VECTOR_BUCKET = "my-vector-store"
    INDEX_NAME = "embeddings"

    # Create the vector bucket and index (if they don't exist)
    try:
        s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
        print(f"Vector bucket {VECTOR_BUCKET} already exists")
    except s3vectors.exceptions.NotFoundException:
        s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
        print(f"Created vector bucket: {VECTOR_BUCKET}")

    try:
        s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
        print(f"Vector index {INDEX_NAME} already exists")
    except s3vectors.exceptions.NotFoundException:
        s3vectors.create_index(
            vectorBucketName=VECTOR_BUCKET,
            indexName=INDEX_NAME,
            dimension=EMBEDDING_DIMENSION,
            dataType="float32",
            distanceMetric="cosine"
        )
        print(f"Created index: {INDEX_NAME}")

    texts = [
        "Machine learning on AWS",
        "Amazon Bedrock provides foundation models",
        "S3 Vectors enables semantic search"
    ]

    print(f"\nGenerating embeddings for {len(texts)} texts...")

    # Generate an embedding with Amazon Nova for each text
    vectors = []
    for text in texts:
        response = bedrock_runtime.invoke_model(
            body=json.dumps({
                "taskType": "SINGLE_EMBEDDING",
                "singleEmbeddingParams": {
                    "embeddingPurpose": "GENERIC_INDEX",
                    "embeddingDimension": EMBEDDING_DIMENSION,
                    "text": {"truncationMode": "END", "value": text}
                }
            }),
            modelId=MODEL_ID,
            accept="application/json",
            contentType="application/json"
        )

        response_body = json.loads(response["body"].read())
        embedding = response_body["embeddings"][0]["embedding"]

        vectors.append({
            "key": f"text:{text[:50]}",  # Unique identifier
            "data": {"float32": embedding},
            "metadata": {"type": "text", "content": text}
        })
        print(f"  ✓ Generated embedding for: {text}")

    # Add all vectors to the store in a single call
    s3vectors.put_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        vectors=vectors
    )

    print(f"\nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")

    # Text to query
    query_text = "foundation models"

    print(f"\nGenerating embeddings for query '{query_text}' ...")

    # Generate the embedding
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_RETRIEVAL",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": query_text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    query_embedding = response_body["embeddings"][0]["embedding"]

    print("Searching for similar embeddings...\n")

    # Query the top 5 most similar vectors
    response = s3vectors.query_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        queryVector={"float32": query_embedding},
        topK=5,
        returnDistance=True,
        returnMetadata=True
    )

    # Display the results
    print(f"Found {len(response['vectors'])} results:\n")
    for i, result in enumerate(response["vectors"], 1):
        print(f"{i}. {result['key']}")
        print(f"   Distance: {result['distance']:.4f}")
        if result.get("metadata"):
            print(f"   Metadata: {result['metadata']}")
        print()

For production applications, embeddings can be stored in any vector database. Amazon OpenSearch Service offers native integration with Nova Multimodal Embeddings at launch, which makes it straightforward to build scalable search applications. As shown in the previous examples, Amazon S3 Vectors provides a simple way to store and query embeddings alongside your application data.

Things to know

Nova Multimodal Embeddings offers four output dimension options: 3,072, 1,024, 384, and 256. Larger dimensions provide more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. This flexibility helps you optimize for your specific application and cost requirements.

The model handles substantial context lengths. For text inputs, it can process up to 8,192 tokens at once. Video and audio inputs support segments of up to 30 seconds, and the model can segment longer files. This segmentation capability is particularly useful when working with large media files: the model splits them into manageable pieces and creates an embedding for each segment.

The model includes responsible AI features built into Amazon Bedrock. Content submitted for embedding goes through the Amazon Bedrock content safety filters, and the model includes fairness measures to reduce bias.

As described in the code examples, the model can be invoked through both synchronous and asynchronous APIs. The synchronous API works well for real-time applications that need immediate responses, such as processing user queries in a search interface. The asynchronous API handles latency-insensitive workloads more efficiently, which makes it suitable for processing large content such as videos.

Availability and pricing

Amazon Nova Multimodal Embeddings is available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. For detailed pricing information, visit the Amazon Bedrock pricing page.

To learn more, see the Amazon Nova User Guide for comprehensive documentation and the Amazon Nova model cookbook on GitHub for practical code examples.

If you use an AI-powered assistant for software development, such as Amazon Q Developer or Kiro, you can set up the AWS API MCP Server to help the assistant interact with AWS services and resources, and the AWS Knowledge MCP Server to provide up-to-date documentation, code samples, and knowledge about the regional availability of AWS APIs and CloudFormation resources.

Start building multimodal AI-powered applications with Nova Multimodal Embeddings today, and share your feedback through AWS re:Post for Amazon Bedrock or your usual AWS Support contacts.

- Danilo

The original post is here.
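As a footnote to the dimension options listed under "Things to know", the storage side of the trade-off can be estimated with simple arithmetic. The sketch below assumes each embedding value is stored as a 4-byte float32 (the dataType used when creating the index above) and counts only the raw vector payload. It ignores keys, metadata, and any index overhead, so it is not a pricing or quota calculation. The vector_bytes and raw_storage_mib helpers and the corpus size of one million vectors are hypothetical, chosen only for illustration.

```python
# The four output dimensions listed in the post.
DIMENSIONS = (3072, 1024, 384, 256)

# A float32 value occupies 4 bytes.
BYTES_PER_FLOAT32 = 4


def vector_bytes(dimension: int) -> int:
    """Raw payload size of one float32 embedding with `dimension` values."""
    return dimension * BYTES_PER_FLOAT32


def raw_storage_mib(num_vectors: int, dimension: int) -> float:
    """Raw payload size in MiB for `num_vectors` embeddings (no overhead)."""
    return num_vectors * vector_bytes(dimension) / (1024 ** 2)


if __name__ == "__main__":
    num_vectors = 1_000_000  # hypothetical corpus size
    for dimension in DIMENSIONS:
        per_vector = vector_bytes(dimension)
        total = raw_storage_mib(num_vectors, dimension)
        print(f"{dimension:>5} dims: {per_vector:>6} bytes/vector, "
              f"{total:,.1f} MiB for {num_vectors:,} vectors")
```

Under these assumptions, a 3,072-dimension embedding takes 12 times the raw space of a 256-dimension one, which is the cost side of the accuracy trade-off described above.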