
Introduction: The Ideal and the Reality of Data Use, and the Evolving Lineage of Governance

"Data-driven management" and "digital transformation (DX)" have become top priorities for enterprises, yet many organizations still hit a wall when they try to turn data into business value. Data scientists build sophisticated AI models and executives aim for data-driven decisions, but the management and control of the underlying data have not kept pace, and that gap has become a serious project bottleneck.

"In executive meetings, the KPI figures submitted by sales and marketing do not match, and the discussion breaks down."
"Our company-wide customer data integration project is about to stall because every department has built its own data silo."
"IT has to be cautious about handing out data in order to comply with security requirements and the Personal Information Protection Act, and ends up slowing the business down."

Does your organization recognize these dilemmas? The root cause is not a missing feature in an individual BI or ETL tool. It is the absence of unified, organization-wide data governance: the comprehensive rules and technical foundation for managing data as a critical corporate asset and controlling it safely.

Designing real data governance requires stepping back to see how enterprise data management architecture has evolved over the past few decades, and pinning down what today's requirements actually are.
Approaches to data management have developed through three broad generations as technology and business requirements changed.

| Generation | Main architecture | Leading department | Core concepts | Benefits | Challenges |
|---|---|---|---|---|---|
| 1st generation (up to the early 2000s) | MDM, enterprise DWH | IT | Accuracy, centralized control | High-quality data, guaranteed consistency | Long delivery delays, lack of scalability, difficulty adapting to business change |
| 2nd generation (2010s onward) | Cloud data lakes, modern BI | Business units, analytics teams | Agility, self-service, data democratization | Fast analysis, flexible response led by the front line | Data silos, proliferation of rogue data marts, increased security risk, declining trust |
| 3rd generation (2020s onward) | Unified data platforms (lakehouse, data cloud) | IT and business units in collaboration | Guardrails, context, unified governance | Trust and speed together, readiness for AI | Technical complexity of implementation, need for organizational culture change |

In the centralized first generation, IT acted as a strict gatekeeper and succeeded in maintaining robust master data (MDM), but data supply lagged fatally behind the speed of business decisions. The self-service second generation emerged as a reaction. It dramatically improved agility on the front line, but the price of that freedom was a lack of governance. Inconsistent KPI definitions across departments and a growing risk of data leaks surfaced, and organizational trust in data eroded as a result.

The third-generation model that enterprise architecture should now aim for is an attempt to technically combine the control of the first generation with the freedom of the second. The core of this unified data governance lies in guardrails and context. IT is evolving from a checkpoint that blocks users into the provider of a paved highway (a foundation with guardrails) on which users can work with data safely without getting lost.

This article examines the architecture of Databricks Unity Catalog and Snowflake Horizon Catalog as technical enablers that build third-generation governance into the data platform itself and make real-time control possible without sacrificing performance. It then introduces how dotData's products, running on top of these governance foundations, deliver advanced AI analytics led by business units while keeping security intact.

The Two Faces of Data Governance: Defensive Brakes and Offensive Accelerator

Before going into the technical details, let us redefine the essential role of modern data governance. It has to function as an operating system for the business that satisfies two seemingly opposed aspects at once: the defensive brake and the offensive accelerator.

Defensive governance (the brake) is the mechanism that protects the company from serious business risks such as data leaks, misuse, and legal violations. A single leak can do critical damage to the business today, so compliance with tightening regulations such as the Personal Information Protection Act and GDPR is a requirement for business continuity.

Offensive governance (the accelerator) means systematically guaranteeing the reliability, accuracy, and freshness of data so that anyone can use it with confidence. Fast access to trustworthy data is what drives the discovery of new business insights and accelerates data-driven decision-making across the company.

When data was scattered across multiple systems, governance was also split tool by tool and inevitably siloed. With the lakehouse advocated by Databricks and the AI Data Cloud offered by Snowflake, all data and AI workloads can now be consolidated on a single platform. Metadata and the data itself are managed inside the same security boundary, access policies apply without a time lag, and truly unified governance has become technically feasible.

Six Technical Requirements for Next-Generation Governance on Snowflake and Databricks

The rest of this article organizes the requirements for data use in the AI era into six pillars and explains, with concrete code snippets and operation examples, how Databricks Unity Catalog and Snowflake Horizon Catalog each address them. Understanding the differences between the two approaches matters a great deal when designing the architecture that best fits your organization.

Pillar 1: Active metadata management and automated cataloging with Databricks IQ and Snowflake Cortex AI

The first step in using data is knowing quickly and accurately what data the company has and where it lives. Traditional static data catalogs maintained by hand go stale quickly. Both platforms address this with AI-powered active metadata.

Automated documentation with Databricks Unity Catalog and Databricks IQ

Databricks can centrally manage structured and unstructured data as well as AI assets such as machine learning models and dashboards.

The notable feature is the active metadata capability provided by the Databricks IQ AI engine. It analyzes the contents of the data and actual query activity, then generates and suggests descriptions for tables and columns. This greatly reduces the documentation effort that has burdened data engineers and keeps metadata up to date.

Snowflake Universal Search and automated data classification

Snowflake Horizon Catalog provides Universal Search, an enterprise search engine with a built-in large language model (LLM). It searches not only objects inside databases but also data products on the Marketplace.

When a user types natural-language text such as "sales deals likely to close" or "postal code" into the search bar in Snowsight (the web UI), the AI analyzes context from object names, comments, and past query history and suggests the most relevant tables. Notably, only objects that the currently active role has privileges on appear in the results. Sensitive data the user has no access to stays completely hidden, so data discovery and strong security coexist.

Snowflake also automates data classification so that the location of personally identifiable information (PII), a foundation of governance, is tracked automatically.

    -- 1. Schedule a classification job for every table in the schema and enable automatic tagging
    CALL SYSTEM$CLASSIFY_SCHEMA('hr.tables', {'auto_tag': true});

    -- 2. Review the latest classification results across the account for monitoring
    SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_CLASSIFICATION_LATEST;

This dramatically reduces the operational load on data stewards and enables a continuous inventory of data.

Pillar 2: Implementing dynamic security controls: defining ABAC as code

"Who may see which data, and how much of it?" With traditional role-based access control (RBAC), the number of roles explodes as the organization and its data grow, and management breaks down all too often. In response, attribute-based access control (ABAC), which evaluates user and data attributes dynamically, together with a "policy as code" approach that manages those rules as code, is becoming the standard.

Row filters and dynamic column masking in Databricks

Unity Catalog implements ABAC by defining row filters and column masks with SQL UDFs (user-defined functions).

The following example shows the plaintext social security number (SSN) only to members of the HR department (HumanResourceDept) and returns a masked string to everyone else.

    -- Create a SQL UDF used for masking
    CREATE FUNCTION ssn_mask(ssn STRING)
      RETURN CASE WHEN is_account_group_member('HumanResourceDept') THEN ssn ELSE '***-**-****' END;

    -- Apply the column mask when the table is created
    CREATE TABLE users (
      name STRING,
      ssn STRING MASK ssn_mask
    );

Because these functions are evaluated dynamically at query time, there is no need to physically split the data into multiple views, which dramatically reduces operational complexity.

Tag-based masking and entitlement mapping in Snowflake

Snowflake takes a scalable approach: masking policies are attached to tags rather than to individual columns. Even in a large environment, a handful of policies can cover company-wide security requirements.

For row access policies, the recommended design is to reference an entitlement mapping table instead of hard-coding the logic. When the organization changes, you update the mapping data and leave the policy itself untouched.

    -- 1. Create the entitlement mapping table in the security schema and load it
    CREATE TABLE security.sales_entitlements (role_entitled string, region string);
    INSERT INTO security.sales_entitlements VALUES ('SALES_EU', 'eu'), ('SALES_US', 'us');

    -- 2. Create a dynamic row access policy that references the mapping table
    CREATE OR REPLACE ROW ACCESS POLICY security.regional_access
      AS (region_val varchar) RETURNS BOOLEAN ->
      CASE
        WHEN IS_ROLE_IN_SESSION('GLOBAL_MANAGER') THEN TRUE
        WHEN EXISTS (
          SELECT 1 FROM security.sales_entitlements
          WHERE IS_ROLE_IN_SESSION(role_entitled) AND region = region_val
        ) THEN TRUE
        ELSE FALSE
      END;

    -- 3. Bind the policy to the protected table
    ALTER TABLE sales.raw_data ADD ROW ACCESS POLICY security.regional_access ON (region);
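
To make the mapping-table design easier to reason about, the policy's decision logic can be mimicked outside the database. The following is only a sketch in Python: `ENTITLEMENTS`, `row_visible`, `filter_rows`, and `ROWS` are hypothetical names for illustration, and a set of role names stands in for what `IS_ROLE_IN_SESSION` checks in Snowflake.

```python
# Hypothetical in-memory copy of security.sales_entitlements: (role, region) pairs.
ENTITLEMENTS = [("SALES_EU", "eu"), ("SALES_US", "us")]


def row_visible(session_roles, region_val, entitlements=ENTITLEMENTS):
    """Mirror the CASE expression of the row access policy."""
    if "GLOBAL_MANAGER" in session_roles:
        return True
    # A row is visible when any role in the session is entitled to its region.
    return any(
        role in session_roles and region == region_val
        for role, region in entitlements
    )


def filter_rows(rows, session_roles, entitlements=ENTITLEMENTS):
    """Return only the rows the session is allowed to see."""
    return [r for r in rows if row_visible(session_roles, r["region"], entitlements)]


# Illustrative sample data, not taken from the article.
ROWS = [
    {"id": 1, "region": "eu"},
    {"id": 2, "region": "us"},
    {"id": 3, "region": "apac"},
]
```

Passing a different entitlement list changes what a role can see without touching `row_visible`, which is the point of keeping the mapping in a table rather than in the policy body.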

Pillar 3: Real-time lineage management and visualization for trustworthy data and automated quality audits

"Is the revenue figure on this dashboard really correct?" "If I change this table definition, which AI models are affected?" Data lineage, which traces where data comes from and what it affects, and observability, which monitors its quality, are the lifeline for earning the trust of executives and business units.

Automated data lineage in Databricks

Unity Catalog monitors all processing run on Databricks, regardless of language (SQL, Python, and so on), and tracks data flow automatically and in real time at the column level as well as the table level. No agent installation or code changes are required.

In Catalog Explorer, open the "Lineage" tab and click "See Lineage Graph" to view data dependencies as a full-screen visual graph. Clicking a column highlights where its data comes from and which dashboards it feeds, which supports safe change management and fast root-cause analysis.

Snowflake Data Quality Monitoring (DMF)

Snowflake provides Data Metric Functions (DMFs) for continuous, automated data-quality auditing. Quality checks based on your own business requirements (for example, the proportion of email addresses in a specific format) can be defined as custom DMFs and run on a schedule.

    -- Bind a custom DMF that counts malformed email addresses and schedule a daily audit
    ALTER TABLE hr.tables.customers
      ADD DATA METRIC FUNCTION governance.dmfs.invalid_email_count ON (email);
    ALTER TABLE hr.tables.customers
      SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';

Results appear in the Snowsight UI as a time-series line chart, so data management staff can spot anomalies and degradation at a glance.

Like Databricks, Snowflake also has a feature that visualizes data lineage automatically.
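
The article does not show the body of `governance.dmfs.invalid_email_count`, so the following Python sketch only illustrates the kind of rule such a metric might encode. The regular expression is an assumption chosen for brevity, not the rule used in the DMF, and NULL values are skipped here by choice.

```python
import re

# Assumed, deliberately simple pattern: something@something.something with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_email_count(emails):
    """Count non-null values that do not look like an email address."""
    return sum(1 for e in emails if e is not None and not EMAIL_RE.match(e))
```

Tracking this count per scheduled run gives the same kind of time series that Snowsight charts for a DMF.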

Pillar 4: Centralizing business metric logic: Databricks Unity Catalog Metrics and Snowflake Semantic Views

"KPI definitions differ between departments and the numbers don't match." This is a deep-rooted problem in many companies. The semantic layer sits between raw data and BI tools or AI and centrally manages business meaning (semantics).

Databricks Unity Catalog Metrics

With Unity Catalog Metrics, the calculation logic for business metrics is stored and managed centrally inside Unity Catalog. BI tools, notebooks, and AI agents all reference consistent figures based on the same definition across the organization.

Instead of rewriting complex aggregation logic in every SQL query, you call the metric simply and safely with the MEASURE() function.

    SELECT
      `Order Month`,
      `Order Status`,
      MEASURE(`Order Count`),
      MEASURE(`Total Revenue`)
    FROM orders_metric_view
    GROUP BY ALL;

Snowflake Semantic Views and Cortex Analyst

Snowflake likewise provides Semantic Views, which define business logic in YAML. A notable point is that verified queries can be embedded in the model. The definition acts as a strong prompt for Cortex Analyst, the generative AI feature that converts natural language into SQL. With RBAC fully applied, the generative AI answers from accurate business data rather than hallucinating (producing plausible falsehoods).
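
Pillar 5: A zero-copy architecture that eliminates data duplication: Delta Sharing and Secure Data Sharing

Exporting data as CSV to work with external tools or partner companies creates a serious security risk: the data loses freshness the moment it is exported and falls outside governance control. The zero-copy architecture solves this by giving access to live data through shared pointers only, without physically moving the data.

Databricks Delta Sharing and Snowflake Secure Data Sharing

Databricks provides Delta Sharing, an open-source protocol, and Snowflake provides Secure Data Sharing. In both cases the provider creates a share through an intuitive UI or simple SQL and grants access to the consumer, who gets secure, immediate access to the latest data without any copy being made.

Note, however, that a zero-copy architecture inherently involves network communication whenever the data is used, so data transfer time can affect the responsiveness of data applications. To minimize the impact, many implementations include query pushdown, which runs data processing (such as SQL queries) on the data source side to reduce the volume of data transferred.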

Pillar 6: Adopting open standards (Iceberg, Delta, Unity Catalog, Polaris) to avoid vendor lock-in

If data is locked into a vendor's proprietary format, future architecture changes incur large migration costs.

In addition to Delta Lake and Delta Sharing, Databricks has announced the open-sourcing of Unity Catalog itself, its governance layer. Snowflake natively supports the open Apache Iceberg format and provides automatic metadata synchronization to the open catalog Polaris. Companies can therefore maintain a flexible, extensible data ecosystem over the long term without being tied to one vendor.

Governance Beyond the Boundary: Designing Secure Integration with External Applications

When connecting your data platform to external applications such as SaaS products, BI tools, and advanced AI platforms, the traditional shared system account with a shared password carries major risks: accounts left unmanaged when staff move on, and vulnerability to brute-force attacks.

A service principal in Databricks and a service user in Snowflake are special non-human identities designed for automation tools and applications.

These accounts rule out password authentication and enforce secure methods such as OAuth 2.0 M2M tokens and RSA key-pair authentication. Most importantly, these external integration accounts are also fully subject to the governance of Unity Catalog and Horizon Catalog (ABAC, row filters, audit logs).

The dotData Approach in Practice: Unifying AI and Governance through Data Gravity

How can the governance foundations of the cloud data platforms described so far be tied to business value (ROI) from advanced AI? One answer that reduces IT's operational load while helping business units become self-sufficient is the product approach of dotData, a vendor of enterprise AI automation.

Traditional AI analytics required extracting and exporting large volumes of data from the data warehouse to an external environment for model training and feature engineering. As noted earlier, security risk rises and governance breaks down the moment data leaves the platform.

To solve this structural problem at its root, dotData adopted the Data Gravity architecture principle: rather than moving data, bring computation and AI to where the data lives. Its two main products, dotData Insight and dotData Feature Factory, both integrate natively with Databricks and Snowflake. Details of each integration are on the dotData on Databricks and dotData on Snowflake pages.

Integrating dotData Insight with the data analytics platform, with business units in the lead

dotData Insight is a platform that lets business units without data scientists discover advanced business insights and plan actions on their own through an intuitive UI. With a recent update, it can analyze data and extract features from Databricks and Snowflake without copying the data, and it fully inherits each platform's security.

Native integration with Databricks

Data stays on Delta Lake and benefits fully from Unity Catalog's advanced data access controls (ABAC and others). dotData's AI-driven exploration of complex features does not depend on external compute resources; it runs directly through Databricks Lakeflow Jobs. Secure analytics environments tailored to each department can therefore be launched immediately.

Native integration with Snowflake

dotData's proprietary automated feature engineering engine, the heart of the product, runs directly on Snowpark Container Services (SPCS) inside Snowflake. Because the dotData Insight web services and containers operate under the strict security management of the Snowflake environment, the row access policies and masking rules defined in Horizon Catalog are fully enforced on, and inherited by, the AI engine.

dotData Feature Factory: accelerating production-quality AI

dotData Feature Factory, which automates feature engineering and turns it into reusable assets for data scientists and machine learning engineers, also offers flexible deployment options on both platforms.

Using Lakeflow Jobs and the catalog in a Databricks environment

In a Databricks environment, the compute-intensive feature engineering process is distributed through Lakeflow Jobs, Databricks' native workflow engine. Customers can use dotData's automated feature engineering, which the company describes as world-leading, without compromising the robust access control of Unity Catalog.

SPCS execution option in a Snowflake environment

Similarly, dotData Feature Factory includes an option to run on Snowflake's Snowpark Container Services (SPCS). Exploration and generation over a large feature space then complete entirely within a Snowflake compute pool, without the data stored in Snowflake ever leaving it.

Notably, the valuable features discovered in these integrated environments are automatically generated as production-quality, scalable feature pipelines. Data preparation steps that used to depend on individuals and were thrown away are accumulated in the catalog as reusable corporate assets. This improves the efficiency and quality of the whole AI development process and smooths the crossing of the "valley of death" between proof of concept (PoC) and production.

Conclusion

As this article has described, next-generation data governance, represented by Databricks Unity Catalog and Snowflake Horizon Catalog, is no longer just a set of restrictive rules for compliance. It has become an operating system for the enterprise that draws out the power of AI safely and accelerates business decisions across the company.

Manage policies as code, use AI to generate metadata actively, and share data safely with zero copy. By natively integrating dotData's automated feature engineering platform on top of this robust foundation, companies can reconcile the security and control that IT requires with the agility and insight that business units require, at a level not previously possible.

For executives, IT, and those leading data management, competitive advantage will depend on how quickly and safely front-line business units can extract business value from data on their own. It is time to put an end to data silos and person-dependent analysis and to experience the real value of unified data governance that balances offense and defense.

Find new business opportunities with dotData

Regardless of your organization's AI maturity, dotData's products automate the whole process from data preparation through feature engineering to building machine learning models, and support the democratization of AI and data use in the enterprise.

See for yourself, in your own environment, what dotData Feature Factory's automatic generation of production-quality feature pipelines and dotData Insight's business-led automated insight discovery and AI drill-down analysis can do, running seamlessly on the unified governance foundation of Databricks or Snowflake.

For consultations on business problems and use cases, or to request a demo of the latest products, contact us at the address below or through the inquiry form. dotData aims to deliver business value through automation to executives, business units, analytics teams, and IT alike.

Product and service inquiries, demo requests: contact-j@dotdata.com
Web inquiry form: https://jp.dotdata.com/contact-us/

dotData is committed to supporting your data-driven organizational transformation and business growth.

The post "Next-Generation Data Governance That Balances Offense and Defense: Databricks and Snowflake as the Unified Data Foundation for the AI Era" appeared first on dotData.

This article is a translation of "Top 10 best practices for Amazon EMR Serverless", published on January 26, 2026.

Amazon EMR Serverless is a deployment option for Amazon EMR that runs open-source big data analytics frameworks such as Apache Spark and Apache Hive without configuring, managing, or scaling clusters and servers. It integrates with Amazon Web Services (AWS) services across data storage, streaming, orchestration, monitoring, and governance to deliver a serverless analytics solution.

This article presents ten best practices for optimizing the performance, cost, and scalability of EMR Serverless workloads. Whether you are just getting started with EMR Serverless or improving existing production workloads, it should help you build efficient, cost-effective data processing pipelines.
The following figure shows the end-to-end architecture of EMR Serverless and how it integrates into an analytics pipeline.

1. Define applications once and reuse them

An EMR Serverless application is the equivalent of a cluster template. It is instantiated when a job is submitted and can process multiple jobs without being recreated. Reusing applications reduces startup latency and simplifies operations.

Typical workflow of a transient EMR on EC2 cluster:

Typical workflow of EMR Serverless:

Applications have a self-managed lifecycle and provision resources when needed without manual intervention. Capacity is provisioned automatically when a job is submitted. For applications without pre-initialized capacity, resources are released as soon as the job completes. With pre-initialized capacity configured, workers stop once the configured idle timeout (15 minutes by default) is exceeded. You can adjust this timeout at the application level through AutoStopConfig in the CreateApplication or UpdateApplication API. For example, if a job runs every 30 minutes, extending the idle timeout removes the startup delay between runs.

On-demand capacity suits most workloads. It scales resources automatically based on job requirements and incurs no charge while idle. It is a cost-effective approach for common use cases such as ETL workloads, batch jobs, and scenarios that need maximum job resiliency.

For workloads that strictly require immediate startup, you can optionally configure pre-initialized capacity. It creates a warm pool of drivers and executors that can start running jobs within seconds. The performance gain comes at a higher cost: pre-initialized workers are billed continuously, even while idle, until the application reaches the stopped state. Pre-initialized capacity also restricts jobs to a single Availability Zone, which reduces resiliency.

Consider pre-initialized capacity for:
- Time-critical jobs with sub-second SLA requirements that cannot tolerate startup latency
- Interactive analytics where the user experience depends on immediate response
- High-frequency production pipelines that run every few minutes

In most other cases, on-demand capacity offers a better balance of cost, performance, and resiliency.

Beyond resource optimization, consider how applications are organized across workloads. For production workloads, use separate applications per business domain or data sensitivity level. Isolating applications improves governance and prevents resource contention between critical and non-critical jobs.
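
The 30-minute example above amounts to choosing an idle timeout slightly longer than the gap between runs. A minimal Python sketch follows, assuming the auto-stop setting is passed as a dictionary with `enabled` and `idleTimeoutMinutes` keys (check these names against the current CreateApplication reference before use). The helper name and the 5-minute margin are illustrative choices.

```python
def auto_stop_config(job_interval_minutes, margin_minutes=5):
    """Build an auto-stop setting whose idle timeout outlasts the gap between runs."""
    if job_interval_minutes <= 0:
        raise ValueError("job_interval_minutes must be positive")
    return {
        "enabled": True,
        # Keep workers warm slightly longer than the scheduling interval.
        "idleTimeoutMinutes": job_interval_minutes + margin_minutes,
    }
```

Keep in mind that a longer timeout also means paying for idle pre-initialized workers for longer, so the margin is a cost trade-off rather than a free improvement.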

2. Improve price-performance with AWS Graviton processors

Choosing the right processor architecture significantly affects both performance and cost. Graviton Arm-based processors deliver better price-performance than x86_64. EMR Serverless automatically moves to the latest instance generations as they become available, so you benefit from hardware improvements without extra configuration.

To use Graviton with EMR Serverless, specify ARM64 in the architecture parameter when creating the application with CreateApplication, or use the UpdateApplication API for an existing application:

    aws emr-serverless create-application \
      --name my-spark-app \
      --type SPARK \
      --architecture ARM64 \
      --release-label emr-7.12.0

Considerations when using Graviton:
- Resource availability: for large workloads, consider discussing capacity planning for Graviton workers with your AWS account team.
- Compatibility: many commonly used standard libraries are compatible with the Graviton (arm64) architecture, but you need to validate the third-party packages and libraries you use.
- Migration planning: take a strategic approach to adopting Graviton. Build new applications on ARM64 by default and migrate existing workloads with a phased plan that minimizes disruption. A phased migration optimizes cost and performance without compromising reliability.
- Benchmarking: exact price-performance depends on the workload, so run your own benchmarks to obtain workload-specific results. For details, see "Achieve up to 27% better price-performance for Spark workloads with AWS Graviton2 on Amazon EMR Serverless".

3. Use the defaults and right-size workers when needed

Workers execute the tasks of your workload. The EMR Serverless defaults are optimized for most use cases, but you may need to right-size workers to improve processing time or cost efficiency. When submitting EMR Serverless jobs, define worker settings such as memory size (GB) and core count in Spark properties.

The default worker size is 4 vCPU, 16 GB memory, and 20 GB disk. This is a balanced configuration for most jobs and can be adjusted to performance requirements. Even if you configure a specific size for pre-initialized workers, always set the Spark properties at job submission so that the specified worker size, not the default properties, is used when the job scales beyond pre-initialized capacity. When right-sizing Spark workloads, the job's vCPU:memory ratio matters: it determines how much memory is allocated per virtual CPU core of an executor. Spark executors need both CPU and memory to process data effectively, and the best ratio depends on the workload.

Start from the guidance below and adjust based on your workload's requirements.

Executor settings

Recommended executor settings for common workload patterns:

| Workload type | Ratio | CPU | Memory | Settings |
|---|---|---|---|---|
| Compute-intensive | 1:2 | 16 vCPU | 32 GB | spark.emr-serverless.executor.cores=16, spark.emr-serverless.executor.memory=32G |
| General purpose | 1:4 | 16 vCPU | 64 GB | spark.emr-serverless.executor.cores=16, spark.emr-serverless.executor.memory=64G |
| Memory-intensive | 1:8 | 16 vCPU | 108 GB | spark.emr-serverless.executor.cores=16, spark.emr-serverless.executor.memory=108G |

Driver settings

Recommended driver settings for common workload patterns:

| Workload type | Ratio | CPU | Memory | Settings |
|---|---|---|---|---|
| General purpose | 1:4 | 4 vCPU | 16 GB | spark.emr-serverless.driver.cores=4, spark.emr-serverless.driver.memory=16G |
| Apache Iceberg workloads | 1:8 (larger driver for metadata lookups) | 8 vCPU | 60 GB | spark.emr-serverless.driver.cores=8, spark.emr-serverless.driver.memory=60G |

To monitor and tune the configuration further, watch resource consumption with Amazon CloudWatch job worker-level metrics and identify bottlenecks. Track CPU utilization, memory usage, and disk utilization, and adjust settings using the table below.

| # | Observed metrics | Workload type | Recommended action |
|---|---|---|---|
| 1 | High memory (>90%), low CPU (<50%) | Memory-bound | Increase the vCPU:memory ratio |
| 2 | High CPU (>85%), low memory (<60%) | CPU-bound | Increase the vCPU count while keeping a 1:4 ratio (for example, 32 GB memory with 8 vCPU) |
| 3 | High storage I/O, normal CPU and memory, long shuffle operations | Shuffle-intensive | Enable serverless storage or shuffle-optimized disks |
| 4 | Low utilization on all metrics | Over-provisioned | Reduce worker size or count |
| 5 | Consistently high utilization (>90%) | Under-provisioned | Scale up worker specifications |
| 6 | Frequent GC pauses** | Memory pressure | Increase memory overhead (10-15%) |

**Frequent garbage collection (GC) pauses can be seen on the Executors tab of the Spark UI. The GC time column should generally stay under 10% of task time. Driver logs may also contain frequent "GC (Allocation Failure)" messages.
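
The two tables above can be read as a small lookup plus a decision rule. The Python sketch below encodes them under stated assumptions: the function and dictionary names are hypothetical, the "low utilization" cut-off of 50% for row 4 is a guess because the table gives no number, and rows 3 and 6 are left out because they rely on I/O and GC signals rather than CPU and memory percentages.

```python
# Executor profiles copied from the executor settings table: (cores, memory).
EXECUTOR_PROFILES = {
    "compute": (16, "32G"),
    "general": (16, "64G"),
    "memory": (16, "108G"),
}


def executor_conf(workload_type):
    """Return the Spark properties for one of the three executor profiles."""
    cores, memory = EXECUTOR_PROFILES[workload_type]
    return {
        "spark.emr-serverless.executor.cores": str(cores),
        "spark.emr-serverless.executor.memory": memory,
    }


def recommend_action(cpu_pct, mem_pct):
    """Map average CPU and memory utilization to a row of the tuning table."""
    if cpu_pct > 90 and mem_pct > 90:
        return "scale up worker specifications"      # row 5: under-provisioned
    if mem_pct > 90 and cpu_pct < 50:
        return "increase the vCPU:memory ratio"      # row 1: memory-bound
    if cpu_pct > 85 and mem_pct < 60:
        return "increase vCPUs, keep a 1:4 ratio"    # row 2: CPU-bound
    if cpu_pct < 50 and mem_pct < 50:                # assumed threshold for row 4
        return "reduce worker size or count"
    return "no change"
```

In practice the inputs would come from the CloudWatch worker-level metrics averaged over a representative run, not from a single data point.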

4. Control scaling boundaries with T-shirt sizing

By default, EMR Serverless uses dynamic resource allocation (DRA) and scales resources automatically based on workload demand. It scales workers up or down according to the workload and parallelism each stage of the job requires, and it continuously evaluates job metrics to optimize cost and speed, so you do not need to estimate the number of workers.

For cost optimization and predictable performance, you can cap scaling with either of these approaches:
- Set the spark.dynamicAllocation.maxExecutors parameter at the job level
- Set the application-level maximum capacity

Rather than fine-tuning spark.dynamicAllocation.maxExecutors for each job, you can treat the setting as a T-shirt size that represents a workload profile:

| Workload size | Use cases | spark.dynamicAllocation.maxExecutors |
|---|---|---|
| Small | Exploratory queries, development | 50 |
| Medium | Regular ETL jobs, reports | 200 |
| Large | Complex transformations, large-scale processing | 500 |

T-shirt sizing simplifies capacity planning and balances performance and cost efficiency by workload category rather than by individual job.

In EMR Serverless release 6.10 and later, the default value of spark.dynamicAllocation.maxExecutors is unlimited; in earlier releases it is 100.

For predictable workloads, however, you may want a static number of executors. In that case, disable DRA and specify the executor count manually:

    spark.dynamicAllocation.enabled=false
    spark.executor.instances=10
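
One way to apply the sizes consistently is to generate the `--conf` flags from a single table instead of typing numbers per job. The helper below is a sketch; `TSHIRT_MAX_EXECUTORS` and `scaling_parameters` are hypothetical names, and the values are the ones from the table above.

```python
from typing import Optional

# T-shirt sizes from the table above.
TSHIRT_MAX_EXECUTORS = {"small": 50, "medium": 200, "large": 500}


def scaling_parameters(size: str, static_executors: Optional[int] = None) -> str:
    """Return spark-submit --conf flags for a T-shirt size or a fixed executor count."""
    if static_executors is not None:
        # Predictable workload: turn DRA off and pin the executor count.
        confs = {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": str(static_executors),
        }
    else:
        confs = {
            "spark.dynamicAllocation.maxExecutors": str(TSHIRT_MAX_EXECUTORS[size.lower()]),
        }
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())
```

The resulting string can be appended to the Spark submit parameters of a job run; an unknown size raises `KeyError`, which keeps typos from silently falling back to unlimited scaling.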

5. Provision the right storage for EMR Serverless jobs

Understanding the storage options and sizing them properly prevents job failures and optimizes run time. EMR Serverless offers several storage options for intermediate data during job execution. Which one to choose depends on the EMR release and the use case. The available options are:

| Storage type | EMR release | Disk size range | Use cases | Benefits |
|---|---|---|---|---|
| Serverless storage (recommended) | 7.12+ | N/A (auto scaling) | Most Spark workloads, especially data-intensive ones | No storage cost; auto scaling; fewer disk failures; up to 20% cost reduction |
| Standard disks | 7.11 and earlier | 20-200 GB per worker | Small to medium workloads processing datasets under 10 TB | Simple configuration; the 20 GB default covers most workloads; up to 200 GB for best throughput |
| Shuffle-optimized disks | 7.1.0+ | 20-2,000 GB per worker | Large ETL workloads processing multiple TB | High IOPS and throughput; up to 2 TB capacity per worker |

Matching storage configuration to workload characteristics lets EMR Serverless jobs run efficiently and reliably at scale.
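
6. Built-in resiliency with Multi-AZ by default

EMR Serverless applications are Multi-AZ from the start when pre-initialized capacity is not enabled. Failover is built in, so AZ failures are handled without manual intervention. A single job runs within a single Availability Zone, which avoids cross-AZ data transfer costs. Subsequent jobs are distributed across multiple AZs. When EMR Serverless detects an AZ failure, it submits new jobs to healthy AZs so that workloads keep running.

To get the most from the Multi-AZ capability, make sure you:
- Configure network connectivity to your VPC with subnets that span multiple Availability Zones
- Avoid pre-initialized capacity, which restricts the application to a single AZ
- Confirm that each subnet has enough IP addresses to support worker scaling

In addition to Multi-AZ, Amazon EMR 7.1 and later can enable job resiliency, which automatically retries jobs when errors occur. If multiple Availability Zones are configured, retries can also run in a different AZ. The feature can be enabled for both batch and streaming jobs, though retry behavior differs between them.

Configure job resiliency by specifying a retry policy that defines the maximum number of attempts. Batch jobs default to no automatic retry (maxAttempts=1). For streaming jobs, EMR Serverless retries indefinitely, with built-in thrash prevention that stops retrying after 5 failures within 1 hour. This threshold can be set between 1 and 10. For details, see "Job resiliency".

When you need to cancel a job, you can specify a grace period so that the job shuts down cleanly instead of terminating immediately, which is the default. You can also include a custom shutdown hook if custom cleanup actions are needed.

Combining Multi-AZ support, automatic job retries, and graceful shutdown periods lets you build EMR Serverless workloads that withstand interruptions and maintain data consistency without manual intervention.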

7. Extend security and connectivity with VPC integration

By default, EMR Serverless can access AWS services such as Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon CloudWatch Logs, AWS Key Management Service (AWS KMS), AWS Security Token Service (AWS STS), Amazon DynamoDB, and AWS Secrets Manager. To connect to data stores inside a VPC, such as Amazon Redshift or Amazon Relational Database Service (Amazon RDS), you must configure VPC access for the EMR Serverless application.

When configuring VPC access, keep the following in mind for performance and cost efficiency:
- Plan for enough IP addresses: each worker uses one IP address in the subnet, including workers launched when the job scales out. If addresses run out, the job cannot scale and may fail. Make sure you follow the subnet planning best practices.
- Configure a gateway endpoint for Amazon S3 for applications in private subnets: running EMR Serverless in a private subnet without a VPC endpoint routes Amazon S3 traffic through the NAT gateway and incurs additional data transfer charges. A VPC endpoint for S3 keeps traffic inside the VPC, reducing cost and improving the performance of Amazon S3 operations.
- Manage AWS Config costs for network interfaces: EMR Serverless generates an elastic network interface record in AWS Config for each worker, and costs can accumulate as workloads scale. If you do not need AWS Config tracking for EMR Serverless network interfaces, consider filtering them out with resource-based exclusions or a tagging strategy while keeping AWS Config coverage for other resources.

For details, see "Configuring VPC access for EMR Serverless applications".
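
The IP-planning point lends itself to a quick calculation. The sketch below uses Python's standard `ipaddress` module and relies on the fact that AWS reserves five addresses in every subnet. It assumes one address per worker (driver plus executors), that a job's workers all land in one subnet, and that nothing else consumes addresses there; the function names and CIDR blocks are illustrative.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves five addresses in each subnet


def usable_ips(cidr):
    """Number of addresses in the subnet that workers could actually take."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def subnets_with_headroom(cidrs, workers_needed):
    """Subnets large enough to hold every worker of a single job."""
    return [cidr for cidr in cidrs if usable_ips(cidr) >= workers_needed]
```

For a Medium job capped at 200 executors plus one driver, `subnets_with_headroom(["10.0.1.0/24", "10.0.2.0/26"], 201)` keeps only the /24, which suggests the /26 would throttle scaling if the job were placed there.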
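
8. Simplify job submission and dependency management

EMR Serverless supports flexible job submission through the StartJobRun API, which accepts the full spark-submit syntax. To configure the runtime environment, use the spark.emr-serverless.driverEnv and spark.executorEnv prefixes to set environment variables for the driver and executor processes. This is especially useful for passing sensitive or runtime-specific configuration.

For Python applications, package dependencies in a virtual environment: create a venv, package it as a tar.gz archive, upload it to Amazon S3, reference it with spark.archives, and set the appropriate PYSPARK_PYTHON environment variable. The Python dependencies are then available across driver and executor workers.

For better control under high load, enable job concurrency and queuing (available in EMR 7.0.0+) to limit the number of jobs that run at the same time. Jobs submitted beyond the concurrency limit are queued until resources become available.

Job concurrency and queue settings are configured through the SchedulerConfiguration property of the CreateApplication or UpdateApplication API.

    --scheduler-configuration '{"maxConcurrentRuns": 5, "queueTimeoutMinutes": 30}'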
¥ããããŸãã ãžã§ãã®åæå®è¡ãšãã¥ãŒèšå®ã¯ã CreateApplication ãŸã㯠UpdateApplication API ã® SchedulerConfiguration ããããã£ã§èšå®ã§ããŸãã --scheduler-configuration '{"maxConcurrentRuns": 5, "queueTimeoutMinutes": 30}' 9. EMR Serverless ã®èšå®ã§å¶éãé©çš EMR Serverless ã¯ã¯ãŒã¯ããŒãã®éèŠã«åºã¥ããŠãªãœãŒã¹ãèªåã¹ã±ãŒãªã³ã°ããã»ãšãã©ã®ãŠãŒã¹ã±ãŒã¹ã§ Spark èšå®ã®ãã¥ãŒãã³ã°ãªãã§æ©èœããæé©åãããããã©ã«ããçšæãããŠããŸããã³ã¹ã管çã®ããã«ãäºç®ãšããã©ãŒãã³ã¹èŠä»¶ã«åã£ããªãœãŒã¹å¶éãèšå®ã§ããŸããé«åºŠãªãŠãŒã¹ã±ãŒã¹ã§ã¯ããªãœãŒã¹æ¶è²»ã现ãã調æŽããã¯ã©ã¹ã¿ãŒããŒã¹ã®ãããã€ã¡ã³ããšåçã®å¹çãå®çŸããèšå®ãªãã·ã§ã³ãæäŸããŠããŸããå¶éãé©åã«èšå®ããããšã§ãããã©ãŒãã³ã¹ãšã³ã¹ãå¹çã®ãã©ã³ã¹ãåããŸãã å¶éã¿ã€ã ç®ç èšå®æ¹æ³ ãžã§ãã¬ãã« åã
ã®ãžã§ãã®ãªãœãŒã¹ãå¶åŸ¡ spark.dynamicAllocation.maxExecutors ãŸã㯠spark.executor.instances ã¢ããªã±ãŒã·ã§ã³ã¬ãã« ã¢ããªã±ãŒã·ã§ã³ãŸãã¯ããžãã¹ãã¡ã€ã³ããšã®ãªãœãŒã¹ãå¶é ã¢ããªã±ãŒã·ã§ã³äœææãŸãã¯æŽæ°æã«æå€§ãã£ãã·ãã£ãèšå® ã¢ã«ãŠã³ãã¬ãã« å
šã¢ããªã±ãŒã·ã§ã³ã«ãããç°åžžãªãªãœãŒã¹ã¹ãã€ã¯ã鲿¢ èªå調æŽå¯èœãªãµãŒãã¹ã¯ã©ãŒã¿ã Max concurrent vCPUs per account ãã Service Quotas ã³ã³ãœãŒã« ããåŒãäžãããªã¯ãšã¹ã ããã 3 ã€ã®ã¬ã€ã€ãŒã®å¶éã飿ºããŠãç°ãªãã¹ã³ãŒãã§æè»ã«ãªãœãŒã¹ã管çã§ããŸããã»ãšãã©ã®ãŠãŒã¹ã±ãŒã¹ã§ã¯ãT ã·ã£ããµã€ãžã³ã°ã¢ãããŒãã«ãããžã§ãã¬ãã«ã®å¶éèšå®ã§ååã§ãããã¢ããªã±ãŒã·ã§ã³ã¬ãã«ãšã¢ã«ãŠã³ãã¬ãã«ã®å¶éã¯ã³ã¹ã管çã®è¿œå çãªã¬ãŒãã¬ãŒã«ã«ãªããŸãã 10. CloudWatchãPrometheusãGrafana ã§ã¢ãã¿ãªã³ã° EMR Serverless ã¯ãŒã¯ããŒãã®ã¢ãã¿ãªã³ã°ã«ããããããã°ãã³ã¹ãæé©åãããã©ãŒãã³ã¹è¿œè·¡ã容æã«ãªããŸããEMR Serverless ã¯é£æºãã 3 ã€ã®ã¢ãã¿ãªã³ã°éå±€ããããŸã: Amazon CloudWatchã Amazon Managed Service for Prometheus ã Amazon Managed Grafana ã§ãã Amazon CloudWatch â CloudWatch çµ±å ã¯ããã©ã«ãã§æå¹ã«ãªã£ãŠãããAWS/EMRServerless åå空éã«ã¡ããªã¯ã¹ãçºè¡ããŸããEMR Serverless ã¯ã¢ããªã±ãŒã·ã§ã³ã¬ãã«ããžã§ããã¯ãŒã«ãŒã¿ã€ãããã£ãã·ãã£å²ãåœãŠã¿ã€ãã¬ãã«ã§æ¯å CloudWatch ã«ã¡ããªã¯ã¹ãéä¿¡ããŸããCloudWatch ã䜿çšããŠãã¯ãŒã¯ããŒãã®å¯èŠ³æž¬æ§ãé«ãã ããã·ã¥ããŒã ãèšå®ãããããžã§ãã®å€±æãã¹ã±ãŒãªã³ã°ã®ç°åžžãSLA éåã«å¯Ÿãã ã¢ã©ãŒã ãèšå®ã§ããŸããCloudWatch ãš EMR Serverless ã䜿çšããããšã§ããŠãŒã¶ãŒã«åœ±é¿ãåºãåã«åé¡ãæ€ç¥ã§ããŸãã Amazon Managed Service for Prometheus â EMR Serverless ãªãªãŒã¹ 7.1+ ã§ã¯ãPrometheus ãæå¹ã«ããŠè©³çŽ°ãª Spark ãšã³ãžã³ã¡ããªã¯ã¹ ã Amazon Managed Service for Prometheus ã«ããã·ã¥ã§ããã¡ã¢ãªäœ¿çšéãã·ã£ããã«ããªã¥ãŒã ãGC å§åãªã©ãšã°ãŒãã¥ãŒã¿ãŒã¬ãã«ã§å¯èŠåã§ããŸããã¡ã¢ãªå¶çŽã®ãããšã°ãŒãã¥ãŒã¿ãŒã®ç¹å®ãã·ã£ããã«ãå€ãã¹ããŒãžã®æ€åºãããŒã¿ã¹ãã¥ãŒã®çºèŠã«æŽ»çšã§ããŸãã Amazon Managed Grafana â Grafana 㯠CloudWatch ãš Prometheus ã®äž¡æ¹ã®ããŒã¿ãœãŒã¹ã«æ¥ç¶ããå¯èŠ³æž¬æ§ãšçžé¢åæãçµ±åããåäžç»é¢ãšããŠå©çšã§ããŸãã3 ã€ã®éå±€ãçµã¿åãããããšã§ãã€ã³ãã©ã¹ãã©ã¯ãã£ã®åé¡ãšã¢ããªã±ãŒã·ã§ã³ã¬ãã«ã®ããã©ãŒãã³ã¹åé¡ãé¢é£ä»ããããŸãã 远跡ãã¹ãäž»èŠã¡ããªã¯ã¹: ãžã§ãã®å®äºæéãšæåç ã¯ãŒã«ãŒã®äœ¿çšçãšã¹ã±ãŒãªã³ã°ã€ãã³ã ã·ã£ããã«ã®èªã¿åã/æžã蟌ã¿ããªã¥ãŒã ã¡ã¢ãªäœ¿çšãã¿ãŒã³ 詳现ã¯ã Monitor Amazon EMR Serverless workers in near real time using Amazon CloudWatch ããåç
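Returning to the job submission settings in sections 8 and 9, the following is a minimal sketch of how the pieces might fit into one StartJobRun request: the packaged venv, the environment-variable prefixes, and a job-level executor cap. The helper name build_job_run_request, the S3 paths, the role ARN, and the application ID are hypothetical; the ./environment/bin/python path assumes the archive is unpacked under the alias "environment". The sketch only builds the request dictionary and does not call AWS.

```python
def build_job_run_request(application_id, role_arn, entry_point,
                          venv_archive, max_executors, env=None):
    """Build keyword arguments for an EMR Serverless StartJobRun call.

    Hypothetical helper: values must not contain spaces, because the
    settings are joined into a single spark-submit parameter string.
    """
    python_path = "./environment/bin/python"
    conf = {
        # Ship the packaged venv and unpack it under the alias "environment".
        "spark.archives": f"{venv_archive}#environment",
        # Point the driver and the executors at the interpreter inside the venv.
        "spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON": python_path,
        "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
        # Job-level limit from section 9.
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    # Extra environment variables go to both driver and executor processes.
    for name, value in (env or {}).items():
        conf[f"spark.emr-serverless.driverEnv.{name}"] = value
        conf[f"spark.executorEnv.{name}"] = value

    params = " ".join(f"--conf {key}={value}" for key, value in conf.items())
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": params,
            }
        },
    }


# Hypothetical values for illustration only.
request = build_job_run_request(
    application_id="00example123",
    role_arn="arn:aws:iam::111122223333:role/example-job-role",
    entry_point="s3://example-bucket/jobs/etl.py",
    venv_archive="s3://example-bucket/envs/pyspark_venv.tar.gz",
    max_executors=20,
    env={"STAGE": "dev"},
)
print(request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
```

With boto3 installed and credentials configured, the dictionary could be passed as boto3.client("emr-serverless").start_job_run(**request). Keeping the request construction separate from the call makes the parameters easy to review before anything is submitted.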
Conclusion

This post introduced 10 best practices for Amazon EMR Serverless that help you optimize performance, manage cost, and operate reliably at scale. By focusing on application design, workload right-sizing, and architecture choices, you can build efficient and resilient data processing pipelines.

For details, see the Getting started with EMR Serverless guide.

About the authors

Karthik Prabhakar
Karthik is a data processing engine architect for Amazon EMR at Amazon Web Services (AWS). He specializes in distributed systems architecture and query optimization, and works with customers to solve complex performance challenges in large-scale data processing workloads. He focuses on engine internals, cost optimization strategies, and architecture patterns for running petabyte-scale analytics efficiently.

Neil Mukerje
Neil is a Principal Product Manager at Amazon Web Services.

Amber Runnels
Amber is a Senior Analytics Specialist Solutions Architect at Amazon Web Services (AWS), specializing in big data and distributed systems. She helps customers optimize workloads on AWS data services and achieve scalable, high-performance, cost-efficient architectures.

Parul Saxena
Parul is a Senior Big Data Specialist Solutions Architect at Amazon Web Services (AWS). She helps customers and partners build highly optimized, scalable, and secure solutions. She specializes in Amazon EMR, Amazon Athena, and AWS Lake Formation, and provides architecture guidance for complex big data workloads while helping organizations modernize their architectures and migrate analytics workloads to AWS.

This article was translated by Kiro and reviewed by Woosuk Choi, Solutions Architect.
We will exhibit at Japan DX Week for three days, from Wednesday, April 8 to Friday, April 10, 2026.

As the use of generative AI advances, OSS license and copyright risks are becoming harder and harder to see.

At the SIOS Technology booth, we will present SCANOSS, which makes OSS visible through code analysis. It identifies OSS usage at the source code level and supports early detection and appropriate management of risk.

The SIOS OSS Yorozu Consultation Desk supports the introduction and operation of SBOM management tools that work well with SCANOSS.

We will also present initiatives that turn AI into a ready-to-use asset for teams on the ground, including an Excel AI agent, RAG construction, and AI-driven development.

The post "4/8 (Wed) to 4/10 (Fri): Exhibiting at Japan DX Week" first appeared on SIOS Tech Lab.






















