âããŒã¿åæåºç€ã°ã«ãŒãã§ããŒã¿ãšã³ãžãã¢ãããŠããå¹³å·ã§ãã DataVaultã«é¢ããèšäºã®ç¬¬2åç®ãšãªããŸãã(第1åã®èšäºã¯ ãã¡ã ã§ã) 第2åã®èšäºã¯ãDataVaultã¢ããªã³ã°ã®äžå¿ãšãªãHub/Link/Satelliteãdbtã®ããã±ãŒãžãå©çšããŠäœã£ãŠãããšããå
容ã§ãã 2,3åç®ã®å
容ãåœåãšå°ãå€ãã£ãŠããŸãã®ã§ãåæ²ããããŸãã 第1å: DataVaultã£ãŠãªã«ïŒã©ããªç¹åŸŽãããã®ïŒ 第2å: automate_dvã䜿ã£ãŠDataVaultã¢ããªã³ã°ã®äžå¿ãšãªãããŒãã«ãäœã£ãŠã¿ãŠããã£ãããš â ä»åã¯ãã 第3å: BusinessVaultãçºå±çãªSatelliteããŒãã«ãããŒãNullã ã£ãå Žåã®å¯ŸåŠæ¹æ³ã«ã€ããŠãªã© ååã®ãããã ã¯ããã« automate_dvã«ã€ã㊠automate_dvãšã¯ïŒ 䟿å©ãªç¹ã¯äœïŒ 泚æç¹ automate_dvã®äœ¿ãæ¹ ã€ã³ã¹ããŒã«æ¹æ³ Hub/Link/Satelliteã®å®è£
ããã·ã¥ããŒã®çæ Hubã®çæ Linkã®çæ Satelliteã®çæ automate_dvã䜿ãéã®å°æ automate_dvã䜿çšããŠããŠããã£ããã€ã³ã å±¥æŽåãããŠããããŒã¿ãœãŒã¹ãåã蟌ãå Žå åã蟌ã¿ããã¢ãã«ã®æ§é 察象ã®ã¢ãã«ã®ããŒã¿ã®å€å automate_dvã䜿ã£ãŠSatelliteããŒãã«ãäœã£ãéã®çµæ RawVaultå±€ãŸã§ã®å®è£
ã«automate_dvã䜿ã£ãææ³ãªã© ãŸãšããšæ¬¡åäºå åèè³æ ååã®ãããã ååã®èšäºã§ã¯ãããŒã¿ãŠã§ã¢ããŠã¹èšèšã«ããã1ã€ã®ã¢ãããŒãã§ããDataVaultã®ç¹åŸŽãã¡ãªããã«ã€ããŠèª¬æããŸãããDataVaultã¯ãæè»æ§ãæ¡åŒµæ§ã«åªãã倧éã®ããŒã¿ã广çã«ç®¡çããããšãã§ããŸãã ãŸããDataVaultãå®è£
ããããã«ã¯ãHub/Link/Satelliteãšããæ§é ãæã£ãããŒãã«ãçæããå¿
èŠããããŸãããããã®ããŒãã«ã®çæã«ã¯æéããããããšããããŸãããautomate_dvãšããããã±ãŒãžã䜿ãããšã§ããŒãã«çæãç°¡æã«ããããšãã§ããŸãã ä»åã®èšäºã§ã¯ããã®ããŒãã«ã®çæããµããŒãããautomate_dvã®ç޹ä»ãšå®éã®ããŒãã«çæãŸã§ã®æé ã解説ããŸãã ã¯ãã㫠第2匟ã®èšäºã§ã¯ãããŒã¿ãŠã§ã¢ããŠã¹ã®ããŒãã«æ§ç¯æ¹æ³ã«ã€ããŠãautomate_dv 1 ãå©çšããææ³ã«ã€ããŠç޹ä»ããŸãã ãŸãã¯ãååã®å
å®¹ãæ¯ãè¿ãã€ã€ãããŒã¿ãŠã§ã¢ããŠã¹ã®èšèšã«é¢ããçšèªãç°¡åã«èª¬æããŸãã çšèª æå³ ELT ããŒã¿ãŠã§ã¢ããŠã¹ã«ãããããŒã¿ã®åãèŸŒã¿æ¹æ³ã®1ã€ã§ãããŒã¿ãæœåºããŠãã倿ããæåŸã«ããŒãããææ³ã§ã dbt ããŒã¿ãŠã§ã¢ããŠã¹ã®æ§ç¯ã管çãããããã®ãªãŒãã³ãœãŒã¹ã®ããŒã«ã§ããSQL(äžéšJinja)ã§ããŒã¿ãã€ãã©ã€ã³ãèšè¿°ã§ããŸãã DataVault ããŒã¿ãŠã§ã¢ããŠã¹ã®èšèšãã¿ãŒã³ã®1ã€ã§ãæ¡åŒµæ§ãæè»æ§ã®é«ããªã©ãç¹åŸŽã§ãã Hub DataVaultã§æ§ç¯ããäžã§ã®äžå¿ãšãªãããŒãã«ã§ãããžãã¹ããŒãšãã®ããã·ã¥ããŒã§æ§æãããŸãã Link DataVaultã§ã®Hubå士ã®é¢ä¿æ§ã衚ãããŒãã«ã§ãé¢é£ããHubã®ããã·ã¥ããŒãšé¢é£ãã2ã€ã®Hubã®ããžãã¹ããŒãconcatããŠããã·ã¥åããããŒã§æ§æãããŸãã Satellite HubãLinkã®ããŒã«å¯ŸããŠä»éãã屿§æ
å ±ãä¿æããããŒãã«ã§ãã屿§æ
å ±ããŸãšããŠããã·ã¥åããHashdiffã«ã©ã ã«ãã£ãŠãå€åãæ€ç¥ããããšãã§ããŸãã æ¬¡ã®ã»ã¯ã·ã§ã³ã§ã¯ãautomate_dvã䜿çšããŠDataVaultã®ããŒãã«æ§ç¯ãè¡ãæ¹æ³ã«ã€ããŠãæé ã解説ããŸãã automate_dvã«ã€ã㊠automate_dvãšã¯ïŒ dbtã®ããã±ãŒãžã®1ã€ã§ãDataVault2.0ã¢ãã«ã«åºã¥ããããŒã¿ãŠã§ã¢ããŠã¹ã®æ§ç¯ããµããŒãããŠãããŸãããã®ããã±ãŒãžã®ãã¯ããå©çšããããšã§ãDataVaultã¢ããªã³ã°ã«é¢ããããŒãã«ã®SQLå®è£
ãç°¡åã«è¡ãããšãã§ããŸãã 䟿å©ãªç¹ã¯äœïŒ automate_dvãå©çšããããšã§ãHub/Link/SatelliteããŒãã«ãããã·ã¥åããããŒãå«ãããŒãã«ã®äœæã容æã«ããããšãã§ããŸãã ããã«ãããSQLã®èšè¿°éã倧å¹
ã«åæžããããŒãã«ã®å®è£
ã«ãããæéãççž®ããããšãã§ããŸãã æ³šæç¹ automate_dvã¯äŸ¿å©ãªããŒã«ã§ãããå©çšããŠãããã©ãããã©ãŒã ã«ãã£ãŠã¯äœ¿çšã§ããªããã¯ããããããæ³šæãå¿
èŠã§ãã äŸãã°ãäžéšã®ãã¯ãã¯RedshiftãPostgreSQLã§ã¯å©çšã§ããŸãããäžæ¹ã§ãSnowflake/BigQuery/MS SQL Serverã§ã¯ãçŸåšäžè¬æäŸãããŠãããã¯ãã䜿çšããããšãã§ããŸãã ãã©ãããã©ãŒã ã«ãã£ãŠå©çšã§ããæ©èœã«éããããããã䜿çšããåã«ããã¥ã¡ã³ããåç
§ããããšãããããããŸãã automate_dvã®äœ¿ãæ¹ ã€ã³ã¹ããŒã«æ¹æ³ automate_dvã䜿ãã«ã¯dbtãå®è¡ããç°å¢ã«ã€ã³ã¹ããŒã«ããå¿
èŠããããŸããdbtã®ãããžã§ã¯ãçŽäžã«ããã package.yml ã«ä»¥äžã®ããã«automate_dvããã±ãŒãžã®äŸåé¢ä¿ã远å ãã dpt deps ãå®è¡ããŠãã ããã packages : - package : Datavault-UK/automate_dv version : 0.9.5 Hub/Link/Satelliteã®å®è£
dbt run or dbt build ãå®è¡ããããšã§äœæããSQLããåçš®ãã¥ãŒãšããŒãã«ãäœæããŠãããŸãã Hub/Link/Satelliteãäœæããããã®åSQLã®æžãæ¹ã以äžã§èª¬æããŠãããŸãã ããã·ã¥ããŒã®çæ Hub/Link/Satelliteãå®è£
ããã«ã¯ããããããã·ã¥åãããããŒãšhashdiffãå¿
èŠã«ãªããŸãã ãã®ãããå®è£
ãã3ããŒãã«ãåç
§ãããã¥ãŒãå¿
èŠã«ãªãããããŸãã¯ããã·ã¥ããŒãå«ããã¥ãŒãautomate_dvã䜿ã£ãŠå®è£
ããŠãããŸãã å®è£
äŸãšããŠã orders , customers , products , customers_orders , orders_products ãšãã5ã€ã®ãœãŒã¹ããŒãã«ããããã·ã¥ããŒãå«ããã¥ãŒãäœæããŠãããŸãã hashed_ordersã®SQLã¯äžèšã®ããã«ãªããŸãã(hashed_orders以å€ã®SQLã«ã€ããŠã¯æãç³ãã§ãããŸãã®ã§è©³çްãç¥ãããæ¹ã¯å±éããŠããã ããã°ãšæããŸãã) {%- set yaml_metadata -%} -- â source_model: -- â¡ orders derived_columns: RECORD_SOURCE: " !ORDER_MANAGEMENT " hashed_columns: ORDER_HASH_DIFF: is_hashdiff: true columns: - ORDER_NAME - ORDER_AMOUNT - CREATED_AT ORDER_HK: - ORDER_ID {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.stage( include_source_columns= true , source_model=metadata_dict[ " source_model " ], derived_columns=metadata_dict[ " derived_columns " ], hashed_columns=metadata_dict[ " hashed_columns " ], null_columns=none, ranked_columns=none, ) }} -- ⢠â : jinjaã®èšè¿°æ¹æ³ã§yaml圢åŒã§åçš®ãã©ã¡ãŒã¿ãèšå®ãã â¡ : ããã·ã¥ããŒã®èšå®ãåç
§ããã¢ãã«ã®èšå®ãªã©ããã ⢠: automate_dvã®stageãã¯ãã䜿ãåçš®ãã©ã¡ãŒã¿ãåŒæ°ã«èšå®ãã ãã®ä»ã®ã¢ãã«ã®ã¯ãšãª hashed_products {%- set yaml_metadata -%} source_model: products derived_columns: RECORD_SOURCE: "!ORDER_MANAGEMENT" hashed_columns: HASH_DIFF: is_hashdiff: true columns: - PRODUCT_NAME - PRICE - CREATED_AT PRODUCT_HK: - PRODUCT_ID {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.stage( include_source_columns=true, source_model=metadata_dict["source_model"], derived_columns=metadata_dict["derived_columns"], hashed_columns=metadata_dict["hashed_columns"], null_columns=none, ranked_columns=none, ) }} hashed_customers {%- set yaml_metadata -%} source_model: customers derived_columns: RECORD_SOURCE: "!ORDER_MANAGEMENT" hashed_columns: HASH_DIFF: is_hashdiff: true columns: - EMAIL - PREFECTURE - CREATED_AT CUSTOMER_HK: - CUSTOMER_ID {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.stage( include_source_columns=true, source_model=metadata_dict["source_model"], derived_columns=metadata_dict["derived_columns"], hashed_columns=metadata_dict["hashed_columns"], null_columns=none, ranked_columns=none, ) }} hashed_customers_orders {%- set yaml_metadata -%} source_model: customers_orders derived_columns: RECORD_SOURCE: "!ORDER_MANAGEMENT" hashed_columns: HASH_DIFF: is_hashdiff: true columns: - CREATED_AT CUSTOMER_HK: - CUSTOMER_ID ORDER_HK: - ORDER_ID CUSTOMER_ORDER_HK: - CUSTOMER_ID - ORDER_ID {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.stage( include_source_columns=true, source_model=metadata_dict["source_model"], derived_columns=metadata_dict["derived_columns"], hashed_columns=metadata_dict["hashed_columns"], null_columns=none, ranked_columns=none, ) }} hashed_orders_products {%- set yaml_metadata -%} source_model: orders_products derived_columns: RECORD_SOURCE: "!ORDER_MANAGEMENT" hashed_columns: HASH_DIFF: is_hashdiff: true columns: - CREATED_AT ORDER_HK: - ORDER_ID PRODUCT_HK: - PRODUCT_ID ORDER_PRODUCT_HK: - ORDER_ID - PRODUCT_ID {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.stage( include_source_columns=true, source_model=metadata_dict["source_model"], derived_columns=metadata_dict["derived_columns"], hashed_columns=metadata_dict["hashed_columns"], null_columns=none, ranked_columns=none, ) }} ç¶ããŠãä»äœæããããã·ã¥ããŒå«ãã¢ãã«ãããHub/Link/SatelliteãäœæããŠãããŸãã Hubã®çæ ordersã®HubããŒãã«ã®äœæã¯ä»¥äžã®SQLã®ããã«ãªããŸãã(customersãšproductsã®HubããŒãã«ã¯æãããã¿å
ã«SQLãèšèŒããŠããŸã) {{ config(materialized= " incremental " ) }} {%- set yaml_metadata -%} source_model: hashed_orders src_pk: ORDER_HK src_nk: ORDER_ID src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.hub( src_pk=metadata_dict[ " src_pk " ], src_nk=metadata_dict[ " src_nk " ], src_ldts=metadata_dict[ " src_ldts " ], src_source=metadata_dict[ " src_source " ], source_model=metadata_dict[ " source_model " ], ) }} ãã®ä»ã®ã¢ãã«ã®ã¯ãšãª hub_products {{ config(materialized="incremental") }} {%- set yaml_metadata -%} source_model: hashed_products src_pk: PRODUCT_HK src_nk: PRODUCT_ID src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.hub( src_pk=metadata_dict["src_pk"], src_nk=metadata_dict["src_nk"], src_ldts=metadata_dict["src_ldts"], src_source=metadata_dict["src_source"], source_model=metadata_dict["source_model"], ) }} hub_customers {{ config(materialized="incremental") }} {%- set yaml_metadata -%} source_model: hashed_customers src_pk: CUSTOMER_HK src_nk: CUSTOMER_ID src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.hub( src_pk=metadata_dict["src_pk"], src_nk=metadata_dict["src_nk"], src_ldts=metadata_dict["src_ldts"], src_source=metadata_dict["src_source"], source_model=metadata_dict["source_model"], ) }} Linkã®çæ orderãšproductã®é¢ä¿æ§ãèšè¿°ããLinkããŒãã«ã®äœæã¯ä»¥äžã®SQLã®ããã«ãªããŸãã(link_customers_ordersã¯æãããã¿å
ã«SQLãèšèŒããŠããŸãã) {{ config(materialized= " incremental " ) }} {%- set yaml_metadata -%} source_model: hashed_orders_products src_pk: ORDER_PRODUCT_HK src_fk: - ORDER_HK - PRODUCT_HK src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv. link ( src_pk=metadata_dict[ " src_pk " ], src_fk=metadata_dict[ " src_fk " ], src_ldts=metadata_dict[ " src_ldts " ], src_source=metadata_dict[ " src_source " ], source_model=metadata_dict[ " source_model " ], ) }} ãã®ä»ã®ã¢ãã«ã®ã¯ãšãª link_customers_orders {{ config(materialized="incremental") }} {%- set yaml_metadata -%} source_model: hashed_customers_orders src_pk: CUSTOMER_ORDER_HK src_fk: - CUSTOMER_HK - ORDER_HK src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.link( src_pk=metadata_dict["src_pk"], src_fk=metadata_dict["src_fk"], src_ldts=metadata_dict["src_ldts"], src_source=metadata_dict["src_source"], source_model=metadata_dict["source_model"], ) }} Satelliteã®çæ orderã®å±æ§æ
å ±ãèšè¿°ããSatelliteããŒãã«ã®äœæã¯ä»¥äžã®SQLã®ããã«ãªããŸãã(sat_customersãšsat_productsã¯æãããã¿å
ã«SQLãèšèŒããŠããŸãã) {{ config(materialized= ' incremental ' ) }} {%- set yaml_metadata -%} source_model: hashed_orders src_pk: ORDER_HK src_hashdiff: HASH_DIFF src_payload: - ORDER_NAME - ORDER_AMOUNT - CREATED_AT src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.sat( src_pk=metadata_dict[ " src_pk " ], src_hashdiff=metadata_dict[ " src_hashdiff " ], src_payload=metadata_dict[ " src_payload " ], src_ldts=metadata_dict[ " src_ldts " ], src_source=metadata_dict[ " src_source " ], source_model=metadata_dict[ " source_model " ], ) }} ãã®ä»ã®ã¢ãã«ã®ã¯ãšãª sat_customers {{ config(materialized='incremental') }} {%- set yaml_metadata -%} source_model: hashed_customers src_pk: CUSTOMER_HK src_hashdiff: HASH_DIFF src_payload: - EMAIL - PREFECTURE - CREATED_AT src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.sat( src_pk=metadata_dict["src_pk"], src_hashdiff=metadata_dict["src_hashdiff"], src_payload=metadata_dict["src_payload"], src_ldts=metadata_dict["src_ldts"], src_source=metadata_dict["src_source"], source_model=metadata_dict["source_model"], ) }} sat_products {{ config(materialized='incremental') }} {%- set yaml_metadata -%} source_model: hashed_products src_pk: PRODUCT_HK src_hashdiff: HASH_DIFFF src_payload: - PRODUCT_NAME - PRICE - CREATED_AT src_ldts: LOADED_AT src_source: RECORD_SOURCE {%- endset -%} {% set metadata_dict = fromyaml(yaml_metadata) %} {{ automate_dv.sat( src_pk=metadata_dict["src_pk"], src_hashdiff=metadata_dict["src_hashdiff"], src_payload=metadata_dict["src_payload"], src_ldts=metadata_dict["src_ldts"], src_source=metadata_dict["src_source"], source_model=metadata_dict["source_model"], ) }} automate_dvã䜿ãéã®å°æ automate_dvã®ããã¥ã¡ã³ãã§ã¯ãstageãã¯ããå®è¡ããéã®source_modelã«ã¯ææ°åã ããå«ãã ã¢ãã«ã察象ã«ããã®ããã¹ããã©ã¯ãã£ã¹ãšãããŠããŸãã ã§ãããæ¢ããåŸãåç
§ããã¢ãã«ã«è€æ°æ¥ä»ãå«ãŸããŠããããšããããããããŸããããã®å Žåãææ°åã ããååšããå±€ãäœããšããããšãã§ããŸãããstageãã¯ãã䜿ãã€ã€ãææ°åã ãã®ãã¥ãŒãäœæããããšãå¯èœã§ãã å
·äœçãªæé ãšããŠã¯ã automate_dv.stage éšåãwithå¥ã«å
¥ããŠäžæããŒãã«ãšããŠå®çŸ©ããæ¥ä»ãçµãæ¹æ³ã§ãã以äžã®ããã«èšè¿°ããããšã§å®çŸã§ããŸã -- ãã©ã¡ãŒã¿ã®èšå®ã¯çç¥ with stage as ( {{ automate_dv.stage( include_source_columns= true , source_model=metadata_dict[ " source_model " ], derived_columns=metadata_dict[ " derived_columns " ], hashed_columns=metadata_dict[ " hashed_columns " ], null_columns=none, ranked_columns=none, ) }} ) select * from stage where loaded_at = ' YYYY-MM-DD ' automate_dvã䜿çšããŠããŠããã£ããã€ã³ã automate_dvã¯ãããŒã¿ãŠã§ã¢ããŠã¹ã®æ§ç¯ã«ãããŠé«ãããã©ãŒãã³ã¹ãçºæ®ããããŒã«ã§ãããæ¥æ¬èªã®æ
å ±ãå°ãªãããã£ãŠããŸã£ãæã«åå ãçè§£ããã®ã«æéãããããšæããŸããããã§ã¯ãå®éã«DataVaultå±€ã®SatelliteããŒãã«ãæ§ç¯ããéã«ããã£ããã€ã³ãã«ã€ããŠèª¬æããŠãããŸãã å±¥æŽåãããŠããããŒã¿ãœãŒã¹ãåã蟌ãå Žå SatelliteããŒãã«ã«åã蟌ã察象ãã忥ããã®å¢åã®ã¿ã§ããã°ãautomate_dvã®Satelliteãã¯ãã䜿çšããŠãåé¡ãããŸãããããããSatelliteãåç
§ããã¢ãã«ãéå»ããçŸåšãŸã§ã®å
šæéã®ã¬ã³ãŒããæã£ãŠããå Žåãäºæããªãçµæã«ãªãå¯èœæ§ããããŸããå
·äœçãªãµã³ãã«ãšåäœãèŠãŠãããŸãããã åã蟌ã¿ããã¢ãã«ã®æ§é ã«ã©ã å history_id customer_id amount created_at loaded_at 察象ã®ã¢ãã«ã®ããŒã¿ã®å€å 2023-04-02 history_id customer_id amount created_at loaded_at 1 1 1000 2023-04-01 12:00:00 2023-04-02 00:00:00 2 2 1500 2023-04-01 13:00:00 2023-04-02 00:00:00 2023-04-03 history_id customer_id amount created_at loaded_at 1 1 1000 2023-04-01 12:00:00 2023-04-03 00:00:00 2 2 1500 2023-04-01 13:00:00 2023-04-03 00:00:00 3 1 1000 2023-04-02 12:00:00 2023-04-03 00:00:00 4 1 3000 2023-04-02 12:30:00 2023-04-03 00:00:00 2023-04-04 history_id customer_id amount created_at loaded_at 1 1 1000 2023-04-01 12:00:00 2023-04-04 00:00:00 2 2 1500 2023-04-01 13:00:00 2023-04-04 00:00:00 3 1 1000 2023-04-02 12:00:00 2023-04-04 00:00:00 4 1 3000 2023-04-02 12:30:00 2023-04-04 00:00:00 5 3 5000 2023-04-03 15:00:00 2023-04-04 00:00:00 6 1 1000 2023-04-03 16:00:00 2023-04-04 00:00:00 automate_dvã䜿ã£ãŠSatelliteããŒãã«ãäœã£ãéã®çµæ SatelliteããŒãã«ã®Selectçµæ äžèšã®SatelliteããŒãã«ã®èµ€ãæ ã§å²ãŸãã1è¡ç®ãš5è¡ç®ã¯åäžã®ã¬ã³ãŒãã«ãªã£ãŠããŸããå
šæéã®å±¥æŽãä¿æãããããªã¢ãã«ã«å¯ŸããŠSatelliteãã¯ãã䜿ããšãã®ãããªçŸè±¡ãèµ·ããŠããŸããŸãã ãã®ãããªç¶æ
ã«ãªã£ãŠããŸãåå ã¯ãautomate_dvã®ãã¯ãã䜿çšããããšã§çæãããSQLãèŠãããšã§ããããŸãã 以äžçæãããSQLãæç² WITH source_data AS ( SELECT a.CUSTOMER_HK, a.HASH_DIFF, a.AMOUNT, a.CREATED_AT, a.LOADED_AT, a.RECORD_SOURCE FROM hs_order_histories AS a WHERE a.CUSTOMER_HK IS NOT NULL ), latest_records AS ( SELECT a.CUSTOMER_HK, a.HASH_DIFF, a.LOADED_AT FROM ( SELECT current_records.CUSTOMER_HK, current_records.HASH_DIFF, current_records.LOADED_AT, RANK () OVER ( PARTITION BY current_records.CUSTOMER_HK ORDER BY current_records.LOADED_AT DESC ) AS rank FROM sat_order_histories AS current_records JOIN ( SELECT DISTINCT source_data.CUSTOMER_HK FROM source_data ) AS source_records ON current_records.CUSTOMER_HK = source_records.CUSTOMER_HK ) AS a WHERE a. rank = 1 ), records_to_insert AS ( SELECT DISTINCT stage.CUSTOMER_HK, stage.HASH_DIFF, stage.AMOUNT, stage.CREATED_AT, stage.LOADED_AT, stage.RECORD_SOURCE FROM source_data AS stage LEFT JOIN latest_records ON latest_records.CUSTOMER_HK = stage.CUSTOMER_HK AND latest_records.HASH_DIFF = stage.HASH_DIFF WHERE latest_records.HASH_DIFF IS NULL ) SELECT * FROM records_to_insert ãã®SQLã§ã¯ããœãŒã¹ã¢ãã«(ããã·ã¥ããŒãå«ãã¢ãã«)ãšSatelliteããŒãã«ããæ¡ä»¶ãä»äžããŠæœåºããããŒãã«ãçµåããŠãæ°ãã«è¿œå ããã¬ã³ãŒããäœæããŠããŸãã withå¥ã®2çªç®ã«å®çŸ©ãããŠããææ°ã¬ã³ãŒã(latest_records)ãèŠããšRankã䜿çšããŠææ°ã®ã¬ã³ãŒããæœåºããŠããŸãã SatelliteããŒãã«ã«è¿œå ãããã¬ã³ãŒãã¯ããœãŒã¹ããŒã¿ãšææ°ã¬ã³ãŒããçµåãããœãŒã¹ããŒã¿ã«ã ãååšããã¬ã³ãŒããæœåºããŠããŸãã (whereå¥ã§ã® latest_records.HASH_DIFF IS NULL ã§ãœãŒã¹ããŒã¿ã«ã ãååšããã¬ã³ãŒããåãããšããŠããŸãã) ä»åäŸãšããŠåãäžãããããªãåãããã·ã¥ããŒã«å¯ŸããŠãç°ãªãHASH_DIFFã远å ããããããªããŒãã«ã®å ŽåãSatelliteãã¯ãã¯äºæããåäœãããªãããšããããŸãã 察å¿çãšããŠã¯ãæåã«è¿°ã¹ãããã«ãSatelliteããŒãã«ãåç
§ããã¢ãã«ã«ã¯ææ°åããå«ãŸãªãããã«ããããšã§ããäžéå±€ã«æãå ããããªãå Žåã¯ãSQLã§SatelliteããŒãã«ã«ååšããªãã¬ã³ãŒãã®ã¿ãæœåºãããããªSQLãæžããŸãã select distinct customer_hk, hash_diff, amount, created_at, loaded_at, record_source from hs_order_history as stg where customer_hk is not null {% if is_incremental() %} and not exists ( select 1 from ( select customer_hk, hash_diff from {{ this }} as cur where stg.customer_hk = cur.customer_hk and stg.hash_diff = cur.hash_diff ) ) {% endif %} RawVaultå±€ãŸã§ã®å®è£
ã«automate_dvã䜿ã£ãææ³ãªã© 1ã€ã®ããžãã¹ããŒã«å¯ŸããŠããã°ãæºãç¶ãããããªããŒã¿ãœãŒã¹ãããå ŽåãSatelliteãã¯ãã䜿çšããéã«çæãããSQLãèŠããšãäœèšãªã¬ã³ãŒããçæãããŠããŸãããšããããŸãããã®ãããããŒã¿ãœãŒã¹ã«ãã£ãŠã¯ãã¯ãã®äœ¿çšã®æç¡ãåç
§ã¢ãã«ã®ä¿®æ£ãªã©ã䜿ãåããæãå ããå¿
èŠããããå°ãé¢åã«æããŸããã ãã¯ãã䜿çšããããšã§ãããŒã®ããã·ã¥åãªã©ãSQLã§èšè¿°ããå Žåã«ã¯å€ãã®ã³ãŒãéãå¿
èŠãªéšåããã¯ãã§çœ®ãæããããã®ã¯ããã䟿å©ã«æããŸããã äºæããªãåäœãçºçããå Žåãã³ã³ãã€ã«ãããSQLãèŠãããšã§åå ãçè§£ããããšã¯ã§ããŸãããautomate_dvã®å®è£
ãèŠããšãã¯ãå
ã§ãã¯ããåŒã³åºããŠããããautomate_dvã§ã®ä»çµã¿ãçè§£ããã®ã«ã¯æéããããå°è±¡ã§ããã ããŒã¿ãœãŒã¹åŽã§ç©çåé€ãããŠããå ŽåãSatelliteã§ã¯ã¬ã³ãŒããåé€ããããã©ããã倿ã§ããªãã®ã§ãStatusTrackingSatelliteãæ¬²ãããªãå Žé¢ããããŸããããçŸåš(2023幎6ææç¹)ã¯automate_dvã«ãã¯ãã¯ãŸã å®è£
ãããŠããªãã®ã§SQLãçŽæ¥æžãå¿
èŠããããŸããã(EffectiveSatelliteã®ãã¯ãã¯ããã®ã§ãããããããã¡ãã§å¯Ÿå¿ã§ãããããããŸãã) ãŸãšããšæ¬¡åäºå ä»åã®èšäºã§ã¯DataVaultã¢ããªã³ã°ã®äžå¿ãšãªãHub/Link/Satelliteãçæããããã®dbtã®ããã±ãŒãžã®1ã€ã§ããautomate_dvã«ã€ããŠç޹ä»ãããŠããã ããŸããã automate_dvã¯SQLã ãã§ãHub/Link/Satelliteã®çæã¯ãšãªãå®è£
ããã®ã«æ¯ã¹ãŠãéåžžã«å°ãªãèšè¿°éã§åçš®ããŒãã«ã®å®è£
ãè¡ãããšãã§ããŸãã åœããåã§ããããã¯ãã«ãã£ãŠçæãããSQLãã©ããªããšãããŠããã®ããçè§£ããã®ãã䜿ãããªãäžã§éèŠã«ãªã£ãŠãããšæããŸãã æ¬¡åã¯ãBusinessVaultãä»åã®èšäºã§ååã ãåºãŠããŠããEffectiveSatelliteãStatusTrackingSatelliteã®ç޹ä»ããããžãã¹ããŒãNullã ã£ãå Žåã®å¯Ÿå¿æ¹æ³ãªã©ã«ã€ããŠç޹ä»ãããŠããã ããã°ãšæããŸãã åèè³æ automate_dvããã¥ã¡ã³ãããŒãž èšäºãæžãå§ããæã¯dbtvaultã§ããããæ°ã¥ãããautomate_dvã«å€ãã£ãŠãŸãã... ↩