Hello, I'm Maru, a backend engineer.

With the rapid evolution of AI technology, demand is growing quickly for search based on "semantic similarity", which conventional keyword search cannot handle. This article explains how to build a vector search system simply by adding the pgvector extension to PostgreSQL, an open-source RDBMS.

Vector search is a technique that converts text into numeric vectors and searches by abstract meaning. It can capture what the user is really looking for with high accuracy, something keyword-dependent search tends to miss.

In this article, we explain how to bring vector search into PostgreSQL while building the environment hands-on.

What is vector search?

How vector search works, in three minutes

Conventional keyword search returns results literally according to whether a keyword matches. However, language is expressed in many ways, and users do not always use keywords that exactly match their intent. This is where vector search comes in.

Vector search converts data such as text or images into a vector, which is a collection of numbers. The vector captures the semantic features of the original data, so data with similar meaning ends up close together in the vector space.

The mechanism in concrete terms:

- Embedding: Text (a question or a passage) is converted into a numeric vector using a pre-trained AI model (for example, OpenAI's text-embedding-ada-002). This process is called embedding.
- Vector database: The embedded vectors are stored in a database (in this example, PostgreSQL + pgvector).
- Similarity calculation: The search query is converted into a vector in the same way, and its similarity to the vectors in the database is calculated. The data whose vectors have the highest similarity is returned as the search result.

Example: If you run the question "What movies would you recommend for someone who likes cats?" through vector search, semantically related movies such as "a movie in which a cat appears" or "a movie whose main character owns a cat" may be returned even when the stored text does not contain the exact keyword.

In short, vector search is a technique that enables flexible, meaning-based search without being tied to keywords.
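The similarity calculation step can be sketched in a few lines of Python. This is only a minimal illustration: the three-dimensional vectors and their titles are invented for this example (real embedding models output hundreds or thousands of dimensions), and cosine similarity is just one of several possible similarity measures.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", invented for illustration only.
toy_vectors = {
    "a movie starring a cat": [0.9, 0.1, 0.0],
    "a film about a kitten": [0.8, 0.2, 0.1],
    "a database tuning guide": [0.0, 0.2, 0.9],
}
# Pretend this is the embedding of "movies for cat lovers".
query = [0.85, 0.15, 0.05]

# Rank the stored items from most similar to least similar.
ranked = sorted(
    toy_vectors,
    key=lambda title: cosine_similarity(query, toy_vectors[title]),
    reverse=True,
)
for title in ranked:
    print(f"{cosine_similarity(query, toy_vectors[title]):.3f}  {title}")
```

The two cat-related entries score close to 1.0, while the unrelated database entry scores far lower, which is the behaviour the list above describes.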
Five key benefits of adopting PostgreSQL

Adopting PostgreSQL for a vector search system has a wide range of benefits. The five main ones are:

- Extensibility: The pgvector extension adds vector search to PostgreSQL. There is no need to make major changes to an existing database environment.
- Cost efficiency: Because it is open source, no expensive license fees are required. You can operate it with only the hardware resources you need.
- Reliability: PostgreSQL is a robust RDBMS with a long track record, known for high reliability and stability.
- Standard SQL support: Vector search can be combined with existing SQL queries to express complex search logic.
- Community support: An active worldwide community provides a wealth of information and support.

Performance comparison with conventional search

| Search method | Search accuracy | Search speed (depends on data volume) | Flexibility | Notes |
|---|---|---|---|---|
| Keyword search | Depends on keyword matches. Weak with ambiguous wording and synonyms. | Fast. | Low. Requires an exact keyword match. | Well suited to simple searches. |
| Vector search | Based on semantic similarity, so it does not depend on keywords. Achieves high accuracy. | Depends on data volume. Can be sped up with index structures. | High. Enables flexible search that reflects the user's intent. | For large data volumes, appropriate index design is important. |
| Full-text search | More advanced than keyword search, but semantic understanding is limited. | Depends on data volume. | Medium. | May require language-dependent processing, such as morphological analysis for Japanese. |

Note: Performance is heavily influenced by data volume, hardware, index design, and other factors. The table above shows general tendencies only; actual performance depends on your environment.
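Comparison with dedicated vector databases

Dedicated vector databases (e.g. Chroma, FAISS, Pinecone) and pgvector, a PostgreSQL extension, each have different strengths. (Strictly speaking, FAISS is a similarity search library rather than a database server, but it is grouped with the dedicated tools here.) This section explains how pgvector differs from dedicated vector databases and the advantages and disadvantages of each.

Features of pgvector

pgvector is an extension that adds vector search to PostgreSQL. Its main features are:

- Integration with relational data: Vector data and conventional relational data can be managed in the same database.
- Low-cost adoption: It can be used simply by adding it as an extension to an existing PostgreSQL environment.
- ACID compliance: Transaction management and security features can be used as they are.

Features of dedicated vector databases

Dedicated vector databases (e.g. Chroma, FAISS, Pinecone) are designed specifically for vector search. Their main features are:

- Fast search: Index designs optimized for high-dimensional vectors (e.g. HNSW, IVF). Note that pgvector also offers HNSW and IVFFlat indexes.
- Scalability: Horizontal scaling through distributed systems is easy.
- Support for diverse data formats: Unstructured data such as images, audio, and video can be processed efficiently.

Comparison table

| Item | pgvector | Dedicated vector database (e.g. Chroma, FAISS) |
|---|---|---|
| Adoption cost | Low (usable in an existing PostgreSQL environment) | High (new infrastructure required) |
| Speed (large-scale data) | Medium (depends on PostgreSQL) | Fast (optimized by dedicated design) |
| Scalability | Vertical scaling (requires hardware upgrades) | Horizontal scaling (supports distributed systems) |
| Integration | High (combined with relational data through SQL queries) | Low (separate APIs or middleware required) |
| Use cases | Small to medium-scale data, integration with an existing RDBMS | Large-scale data, AI/ML workflows that demand fast search |
| Learning cost | Low (familiar to PostgreSQL users) | Medium to high (new tools and APIs to learn) |

Advantages and disadvantages of pgvector

Advantages

- Simple installation: Usable just by installing it as an extension in a PostgreSQL environment.
- Low-cost operation: Existing infrastructure can be reused, so no new servers are needed.
- SQL integration: Hybrid search is possible by combining it with conventional SQL queries.

Disadvantages

- Performance limits: On large datasets or very high-dimensional vectors, it tends to fall behind dedicated vector databases.
- Limited horizontal scaling: PostgreSQL itself is not optimized for distributed operation, so it is less suited to very heavy traffic.
- Functional constraints: Support for unstructured data such as images and audio is limited.

Advantages and disadvantages of dedicated vector databases

Advantages

- Fast similarity search: Uses advanced index algorithms such as HNSW and IVF.
- Large-scale data support: Horizontal scaling on distributed systems makes it possible to handle billions of vectors or more.
- Flexibility: Supports diverse data formats such as images, audio, and video.

Disadvantages

- High adoption and operation cost: New infrastructure and operational management are required.
- High learning cost: New tools and APIs must be learned.
- Low integration: Connecting to a conventional RDBMS requires additional development.

Selection criteria

When to choose pgvector

- You want to make use of an existing PostgreSQL environment
- You are running a small or medium-sized project and prioritize cost efficiency
- Integration with relational data is important

When to choose a dedicated vector database

- You need large-scale vector search over tens of millions to hundreds of millions of items or more
- You need to process unstructured data (e.g. images, audio)
- Speed and scalability are the top priority

pgvector is an easy, cost-efficient option for PostgreSQL users and a good fit for small and medium-sized projects. Dedicated vector databases, on the other hand, deliver their best performance in large-scale, complex AI/ML workflows. It is important to choose according to your use case and requirements, making the most of each option's characteristics.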
Environment setup

In this chapter we walk through the required steps hands-on: setting up the PostgreSQL container, building a front end with Streamlit, and configuring the Docker environment that connects them. By following these steps you can build a simple vector search system with pgvector.

Environment configuration

We build a very simple configuration with docker compose: one PostgreSQL container with pgvector and one Streamlit container.

The directory structure is as follows.

```
.
├── .streamlit
│   └── secrets.toml      # DB connection settings
├── docker-compose.yml
├── postgres
│   ├── Dockerfile        # installs the PostgreSQL extension
│   └── initdb
│       └── init.sql      # table definitions
└── streamlit
    ├── Dockerfile        # Python environment
    ├── app.py            # main app
    ├── embedding.py      # embedding generation logic
    ├── requirements.txt
    └── seed.py           # test data generation
```

Setting up the PostgreSQL container

Create the SQL file used for initial setup (postgres/initdb/init.sql) for the container that runs the PostgreSQL database.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a sample table (customize as needed)
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(1024)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
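The index above is built with `vector_cosine_ops`, the operator class that pairs with pgvector's cosine distance operator `<=>`, whereas `<->` is Euclidean (L2) distance. The following sketch mirrors the two definitions in plain Python to show how they relate; it is an illustration of the math under the assumption of unit-length vectors, not pgvector's actual implementation, and the helper names are made up for this example.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors: l2_distance ** 2 == 2 * cosine_distance.
print(l2_distance(a, b) ** 2, 2 * cosine_distance(a, b))
```

For normalized embeddings the two distances therefore produce the same ordering, but only `1 - cosine_distance` is a cosine similarity score, and a query needs to use `<=>` for an index built with `vector_cosine_ops` to be applicable.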
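Setting up the Streamlit container

Create the file containing the main Python code (streamlit/app.py).

```python
import streamlit as st
from sqlalchemy.sql import text
from streamlit.logger import get_logger

from embedding import Embedding

logger = get_logger(__name__)
embedding = Embedding()

# Page settings
st.set_page_config(
    page_title="Vector Database Demo",
    page_icon="🔍",
    layout="wide"
)


def get_embedding(input_text):
    # Embed a single text and return its vector (a list of floats)
    return embedding.get_embedding([input_text])[0]


# Database connection (settings are read from .streamlit/secrets.toml)
def get_connection():
    return st.connection('postgresql', type='sql')


# Main application
def main():
    st.title("📊 Vector Database Demo")

    # Choose the operation in the sidebar
    operation = st.sidebar.selectbox(
        "Select operation",
        ["Show data", "Add data", "Vector search"]
    )

    if operation == "Show data":
        show_data()
    elif operation == "Add data":
        add_data()
    elif operation == "Vector search":
        vector_search()


def show_data():
    st.header("📋 Registered data")
    conn = get_connection()

    # Fetch data
    query = "SELECT id, title, content FROM documents LIMIT 100"
    df = conn.query(query, ttl=0)

    if not df.empty:
        st.dataframe(df)
    else:
        st.info("No data has been registered")


def add_data():
    st.header("➕ Add data")

    # Input form
    with st.form("data_form"):
        title = st.text_input("Enter a title")
        content = st.text_area("Enter text")
        submitted = st.form_submit_button("Register")

    if submitted and content:
        conn = get_connection()

        # Embed title and content with the embedding model
        # (1024 dimensions for multilingual-e5-large)
        vector = get_embedding(title + " " + content)

        # Register the data
        query = text("""
            INSERT INTO documents (title, content, embedding)
            VALUES (:title, :content, :embedding)
        """)
        params = {"title": title, "content": content, "embedding": vector}

        try:
            with conn.session as session:
                session.execute(query, params)
                session.commit()
            st.success("Data registered")
        except Exception as e:
            st.error(f"An error occurred: {str(e)}")


def vector_search():
    st.header("🔍 Vector search")

    search_text = st.text_input("Enter search text")
    k = st.slider("Number of results", min_value=1, max_value=10, value=5)

    if st.button("Search") and search_text:
        # Embed the search text with the same model as the stored documents
        query_vector = get_embedding(search_text)

        conn = get_connection()

        # Search by cosine similarity.
        # <=> is pgvector's cosine distance, so 1 - distance is the similarity.
        query = """
            SELECT title, content,
                   1 - (embedding <=> :query_vector) AS similarity
            FROM documents
            ORDER BY embedding <=> :query_vector
            LIMIT :k
        """
        params = {"query_vector": str(query_vector), "k": k}

        try:
            df = conn.query(query, params=params, ttl=0)

            if not df.empty:
                # Show the results
                for _, row in df.iterrows():
                    with st.expander(f"{row['title']} similarity: {row['similarity']:.4f}"):
                        st.write(row['content'])
            else:
                st.info("No search results found")
        except Exception as e:
            st.error(f"An error occurred during search: {str(e)}")


# Run the application
if __name__ == "__main__":
    main()
```

Next, create the file containing the Python code that embeds text (streamlit/embedding.py).

In this example, multilingual-e5-large is configured as the default embedding model. Changing the model can change the tendencies of the search results. If you switch to a model with a different output dimension, the VECTOR(1024) column definition must be changed to match.

Model card: intfloat/multilingual-e5-large · Hugging Face

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


class Embedding:
    def __init__(self, model_name: str = 'intfloat/multilingual-e5-large'):
        self.model_name = model_name
        self.load_model()

    def load_model(self):
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModel.from_pretrained(self.model_name)

    def average_pool(
        self, last_hidden_states: Tensor, attention_mask: Tensor
    ) -> Tensor:
        # Zero out padding positions, then average over the real tokens only
        last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
        return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

    def get_embedding(self, input_texts: list[str]) -> list[list[float]]:
        batch_dict = self.tokenizer(input_texts, max_length=512, padding=True,
                                    truncation=True, return_tensors='pt')
        # Inference only, so gradient tracking is not needed
        with torch.no_grad():
            outputs = self.model(**batch_dict)
        embeddings = self.average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
        # L2-normalize so that every embedding has unit length
        return F.normalize(embeddings, p=2, dim=1).tolist()
```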
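To make the pooling step easier to follow, here is a dependency-free sketch of what `average_pool` followed by L2 normalization does for a single input. The token vectors and the mask are made-up values for illustration; the real code operates on batched PyTorch tensors with 1024 dimensions.

```python
import math


def average_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask value is 1; padding (0) is ignored."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale the vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Hypothetical token vectors for one 4-token input; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]

pooled = average_pool(tokens, mask)  # the padding vector does not contribute
unit = l2_normalize(pooled)          # same direction, length 1
print(pooled, unit)
```

Because every stored vector and every query vector ends up with length 1, comparing them comes down to comparing directions, which is what a cosine-based search needs.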
Create the file containing the Python code that loads the initial data (streamlit/seed.py).

It simply lists titles and contents and saves them to the database together with their embeddings.

```python
import psycopg2

from embedding import Embedding

test_data = [
    {
        "title": "Docker Container Best Practices, 2025 Edition",
        "content": "Explains best practices for running Docker containers efficiently in production. Comprehensively covers practical topics such as image size optimization, security measures, network configuration, and volume management. Also explains in detail how to use multi-stage builds and how to manage environment variables properly."
    },
    {
        "title": "Optimization Techniques for Deep Learning Models with PyTorch",
        "content": "Explains performance optimization of deep learning models using PyTorch. Introduces practical optimization techniques such as batch size tuning, learning rate scheduling, data loader optimization, and efficient use of GPU memory."
    },
    {
        "title": "Design Patterns for Microservice Architecture",
        "content": "Explains the main design patterns for adopting a microservice architecture. Describes in detail the key points at implementation time, such as inter-service communication, ensuring data consistency, fault handling, and monitoring strategy."
    },
    {
        "title": "A Practical Guide to Kubernetes Operations",
        "content": "Explains how to operate and manage Kubernetes clusters efficiently. Systematically covers the knowledge needed in real operations, such as resource management, autoscaling, monitoring, and security measures."
    },
    {
        "title": "Efficient Database Index Design",
        "content": "Explains best practices for index design in relational databases. Introduces practical approaches such as query performance optimization, criteria for choosing indexes, and maintenance strategy."
    },
    {
        "title": "Building Modern APIs with GraphQL",
        "content": "Explains the design and implementation of APIs using GraphQL. Introduces practical development techniques such as schema design, resolver implementation, solving the N+1 problem, and caching strategy."
    },
    {
        "title": "Automation Strategies for CI/CD Pipelines",
        "content": "Explains how to build continuous integration/delivery pipelines efficiently. Introduces practical automation techniques such as test automation, deployment strategy, quality control, and monitoring."
    },
    {
        "title": "Secure Web Application Development",
        "content": "Comprehensively explains security measures for web applications. Describes important security considerations such as XSS protection, CSRF protection, implementing authentication and authorization, and secure session management."
    },
    {
        "title": "Implementing an Efficient Caching Strategy",
        "content": "Explains the design and implementation of caching strategies in web applications. Introduces a multi-layered caching strategy including CDN, browser cache, application cache, and database cache."
    },
    {
        "title": "Best Practices for Large-Scale Data Processing",
        "content": "Explains the design and implementation of large-scale data processing systems. Introduces practical approaches such as batch processing, stream processing, data pipelines, and ensuring scalability."
    },
    {
        "title": "Front-End Development with React and TypeScript",
        "content": "Explains the latest front-end development techniques combining React and TypeScript. Introduces practical development techniques such as type-safe development, component design, state management, and performance optimization."
    },
    {
        "title": "Building Scalable Infrastructure on AWS",
        "content": "Explains how to build scalable infrastructure using AWS. Describes design points for cloud infrastructure such as autoscaling, load balancing, fault handling, and cost optimization."
    },
    {
        "title": "Efficient Log Management and Monitoring",
        "content": "Explains practical approaches to log management and monitoring in distributed systems. Introduces effective operational monitoring methods such as log collection, analysis, visualization, and alert configuration."
    },
    {
        "title": "Implementing a Micro-Frontend Architecture",
        "content": "Explains the design and implementation of a micro-frontend architecture. Introduces a new approach to front-end development, including module splitting, integration strategy, routing, and state management."
    },
    {
        "title": "Design Patterns for NoSQL Databases",
        "content": "Explains effective design patterns when using NoSQL databases. Introduces practical usage such as data modeling, query optimization, and scaling strategy."
    },
    {
        "title": "Deploying Machine Learning Models to Production",
        "content": "Explains practical approaches to deploying machine learning models to production. Describes important operational points such as model version management, scaling, monitoring, and retraining strategy."
    },
    {
        "title": "Infrastructure as Code with Terraform",
        "content": "Explains how to codify infrastructure using Terraform. Introduces practical ways to apply IaC, such as resource management, module design, state management, and team development."
    },
    {
        "title": "Efficient API Versioning Strategies",
        "content": "Explains practical methods for versioning web APIs. Describes key points for long-term API operation, such as version management techniques, ensuring backward compatibility, and migration strategy."
    },
    {
        "title": "Secure Communication Between Microservices",
        "content": "Explains secure communication methods in a microservice environment. Describes how to secure inter-service communication, including authentication, authorization, encryption, and certificate management."
    },
    {
        "title": "Efficient Database Migration",
        "content": "Explains migration strategies for large-scale databases. Introduces how to carry out safe migrations, including minimizing downtime, ensuring data integrity, and rollback planning."
    }
]


def insert_test_data():
    conn = psycopg2.connect(
        dbname="vectordb",
        user="postgres",
        password="postgres",
        host="postgres",
        port="5432"
    )
    cur = conn.cursor()
    embedding = Embedding()

    for data in test_data:
        # Embed title and content together (1024 dimensions with the default model)
        emb = embedding.get_embedding([data["title"] + " " + data["content"]])[0]
        cur.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
            (data["title"], data["content"], emb)
        )

    conn.commit()
    cur.close()
    conn.close()


if __name__ == "__main__":
    insert_test_data()
```

Create the file that defines the database connection (.streamlit/secrets.toml).

```toml
[connections.postgresql]
dialect = "postgresql"
host = "postgres"
port = 5432
database = "vectordb"
username = "postgres"
password = "postgres"
```

Create the file that lists the dependencies (streamlit/requirements.txt).

```
SQLAlchemy==2.0.35
streamlit==1.32.0
pandas==2.2.0
numpy==1.26.0
psycopg2-binary==2.9.9
torch==2.6.0
transformers==4.48.2
```

Setting up the Docker environment

First, create the Dockerfile for postgres (postgres/Dockerfile). It adds the pgvector installation steps on top of the postgres base image.

```dockerfile
FROM postgres:16.3

# Install pgvector
RUN apt-get update && \
    apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-16

RUN git clone https://github.com/pgvector/pgvector.git /tmp/pgvector && \
    cd /tmp/pgvector && \
    make && \
    make install && \
    rm -rf /tmp/pgvector
```

Next, create the Dockerfile for Streamlit (streamlit/Dockerfile).

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

CMD ["streamlit", "run", "./streamlit/app.py", "--server.port=8501"]
```

Finally, create docker-compose.yml, which manages these containers together.

```yaml
version: "3.9"

services:
  postgres:
    build: ./postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: vectordb
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/initdb:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"
    networks:
      - app-network

  streamlit:
    build: ./streamlit
    volumes:
      - .:/app
    environment:
      - STREAMLIT_SERVER_PORT=8501
    ports:
      - "8501:8501"
    depends_on:
      - postgres
    networks:
      - app-network

volumes:
  postgres_data:

networks:
  app-network:
    driver: bridge
```

Verifying operation

Startup check

Build the containers with the following command.

```
docker compose build
```

Start the containers with the following command.

```
docker compose up -d
```

Open the following address in a browser.

http://localhost:8501/

At first the page stays in a loading state while the model is being loaded. Once the app screen appears, the server has started successfully.
Loading the initial data

To load the initial data into the database, run a command inside a container.

First, check the state of the containers.

```
$ docker compose ps
NAME                   IMAGE                COMMAND                  SERVICE     CREATED       STATUS       PORTS
pgvector-postgres-1    pgvector-postgres    "docker-entrypoint.s…"   postgres    2 hours ago   Up 2 hours   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp
pgvector-streamlit-1   pgvector-streamlit   "streamlit run ./str…"   streamlit   2 hours ago   Up 2 hours   0.0.0.0:8501->8501/tcp, :::8501->8501/tcp
```

Of the containers listed above, run the command in the streamlit container as follows.

```
$ docker exec -it pgvector-streamlit-1 python streamlit/seed.py
```

After the command finishes, open the following address in the browser again.

http://localhost:8501/

The loaded data is now displayed as a list.

Checking vector search

On the screen you just opened, choose "Vector search" (ベクトル検索) from the "Select operation" (操作を選択) menu in the sidebar. The vector search screen is displayed.

As a test, enter "MySQL" as the search text and run the search.

The results are displayed. Of course, none of the document bodies contain the word "MySQL". We can confirm that the documents are ordered from highest similarity downward, based on the direction and meaning of their content.
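Conceptually, the search query in app.py orders every row by its distance to the query vector and keeps the first k rows. The sketch below reproduces that idea as an exact brute-force search in plain Python; the two-dimensional embeddings and titles are invented for illustration. With an HNSW index, PostgreSQL can instead return approximate nearest neighbors without scanning every row.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (smaller means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int):
    """Return the k nearest rows as (distance, title) pairs, nearest first."""
    scored = ((cosine_distance(query, emb), title) for title, emb in rows)
    return heapq.nsmallest(k, scored)


# Hypothetical rows: (title, embedding). Values are made up for this example.
rows = [
    ("Database index design", [0.9, 0.1]),
    ("Frontend development", [0.1, 0.9]),
    ("Database migration", [0.8, 0.3]),
]

results = top_k([1.0, 0.0], rows, k=2)
for distance, title in results:
    print(f"similarity={1 - distance:.4f}  {title}")
```

The query never has to share a word with the titles; only the position of the vectors decides the ranking.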
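Conclusion

Building the PostgreSQL vector search system in this article is an important first step toward making use of data in the AI era. Using concrete examples, we explained how to realize next-generation search technology with built-in semantic understanding, going beyond the traditional relational database framework, on existing infrastructure.

If you are considering full-scale adoption, we recommend starting with a careful read of the official pgvector documentation and the article about the E5 model. In real production environments, an index rebuild strategy and memory optimization are the keys to success.

Finally, we hope this article helps you move your AI/ML projects forward.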
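As a starting point for thinking about memory, the raw size of the stored vectors can be estimated with simple arithmetic. The sketch below assumes the per-vector size stated in the pgvector README (4 bytes per dimension plus 8 bytes); the function name is hypothetical, and the estimate leaves out row overhead, the other columns, and the HNSW index itself, so treat it as a lower bound.

```python
def estimate_vector_bytes(dimensions: int, rows: int) -> int:
    """Rough raw storage for `rows` vectors of `dimensions` float4 values."""
    bytes_per_vector = 4 * dimensions + 8  # assumption taken from the pgvector README
    return bytes_per_vector * rows


# 1024 dimensions matches the VECTOR(1024) column used in this article.
for rows in (20, 1_000_000, 100_000_000):
    size = estimate_vector_bytes(1024, rows)
    print(f"{rows:>11,} rows -> {size / 1024**3:.2f} GiB")
```

At 1024 dimensions each vector takes about 4 KB, so a million rows already amounts to several gigabytes of vector data before any index is counted.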
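The post PostgreSQLで実践!ベクトル検索入門|AI時代のRDBMS活用術 (Hands-On Vector Search with PostgreSQL: Putting the RDBMS to Work in the AI Era) first appeared on Sqripts.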