Hello, this is Suzugamine from the Innovation Center. In this article, I explain the NVIDIA Inference Xfer Library (NIXL), a high-speed, low-latency abstracted transfer library designed for LLM inference frameworks such as NVIDIA Dynamo and vLLM. NVIDIA Dynamo itself is covered in a separate article on engineers.ntt.com, which you may find useful as background.

I first describe the background and challenges of memory transfer in LLM inference acceleration (the KV Cache), and then give an overview of NIXL, which addresses them. NIXL's architecture lets arbitrary transfer methods be implemented as plugins, so I also show how to implement a Custom Plugin.

- Background and challenges
- NVIDIA Inference Xfer Library (NIXL)
- VRAM to VRAM data transfer with GPUDirect RDMA
- VRAM to FILE data transfer with GPUDirect Storage
- How to implement a Custom Plugin
- Summary

Background and challenges

Many improvements are being made to speed up LLM inference, driven by needs such as lower cost and better usability through low-latency responses. Among them, the KV Cache is an important technique. It keeps the already-computed key and value matrices for past tokens so that they do not have to be recomputed when the next token is generated, which greatly improves inference speed [1].

On the other hand, the KV Cache grows in proportion to the sequence length, so memory consumption increases. Sharing this cache across multiple GPUs or nodes also requires a mechanism that can transfer it with low latency. Moreover, the optimal protocol (NVLink, GPUDirect Storage/RDMA, and so on) depends on the kind of memory or storage used for the transfer, and this makes implementations complex.

To solve these problems, we need a library that abstracts over diverse memory, storage, and communication protocols and enables high-speed, low-latency transfers.

NVIDIA Inference Xfer Library (NIXL)

NVIDIA Inference Xfer Library (NIXL) is a high-speed, low-latency transfer library designed for LLM inference frameworks. It provides a unified API that abstracts heterogeneous memory and storage such as VRAM, DRAM, FILE, Block, and Object Storage. It dynamically selects among multiple backend plugins, such as UCX (Unified Communication X) and GPUDirect Storage, to build an optimal communication path automatically (see the figure below). Because communication methods can be added freely as plugins, you can build your own cache system on top of NIXL.

[Figure: NIXL architecture overview. Source: https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md#overview]

As of May 2025, the following plugins are supported:

- cuda_gds: A backend using GPUDirect Storage (GDS), which transfers data at high speed between GPU memory and storage via DMA (Direct Memory Access). It can be used with NVMe SSDs, NVMe-oF, NFS over RDMA, and distributed file systems (DDN EXAScaler, VAST NFS, WekaFS).
- mooncake: A backend using Mooncake, the KV Cache system used by the LLM serving platform Kimi.
- posix: A backend that performs POSIX-compliant I/O using libaio or liburing.
- ucx: A backend using UCX, an abstraction library for high-bandwidth, low-latency communication. This plugin is configured by default.
- ucx_mo: UCX v1.18 does not support multiple GPUs within a single UCX context. Multi-Object (MO) UCX is an improved implementation that associates a separate UCX worker with each GPU [2].

NIXL is being adopted by a range of LLM inference frameworks. vLLM support was announced on April 11, 2025 ("Shaping NIXL-based PD Disaggregation in vLLM V1"). In SGLang, a PR adding NIXL support has also been merged: "[PD] Add NIXL transfer backend #5477".

The NIXL communication process is as follows:

1. Initialize the agent on each node
2. Register VRAM and DRAM
3. Exchange the metadata needed for the transfer
4. Transfer the data

(Source: https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md#example-procedure)

In the next sections we actually run NIXL.

VRAM to VRAM data transfer with GPUDirect RDMA

Here we run a NIXL example that transfers GPU memory between nodes using GPUDirect RDMA through UCX. A prebuilt NIXL can be installed with pip install nixl, but this time we use our own build so that it is easier to understand and customize. Install CUDA and gdrcopy beforehand.

First, install UCX.

```shell
wget https://github.com/openucx/ucx/releases/download/v1.18.0/ucx-1.18.0.tar.gz
tar xzf ucx-1.18.0.tar.gz
cd ucx-1.18.0
./configure \
  --enable-shared \
  --disable-static \
  --disable-doxygen-doc \
  --enable-optimizations \
  --enable-cma \
  --enable-devel-headers \
  --with-cuda=/usr/local/cuda \
  --with-verbs \
  --with-dm \
  --with-gdrcopy=/usr/local \
  --enable-mt \
  --prefix=/opt/ucx-1.18.0
make -j
sudo make install

# add to ~/.bashrc
export PATH=/opt/ucx-1.18.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/ucx-1.18.0/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=/opt/ucx-1.18.0/lib/pkgconfig:$PKG_CONFIG_PATH
```

Next, install NIXL and run the example blocking_send_recv_example.py. It transfers the memory of torch.ones(10, dtype=torch.float32) from the target to the initiator.

```shell
git clone https://github.com/ai-dynamo/nixl.git
cd nixl
git checkout 503fe5ccb86b5963b828ee5663672fcba66b92d2
python3 -m venv venv
source venv/bin/activate
pip install .
cd examples/python

# Set the RDMA NIC and the transports that UCX should use
export UCX_NET_DEVICES=[RNIC Device]
export UCX_TLS=rc,cuda

# target node (Node A)
./blocking_send_recv_example.py --ip 0.0.0.0 --mode target --use_cuda 1
## MD listener is listening on port 5555...
## Backend UCX was instantiated
## Initialized NIXL agent: target
## target Tensors: [tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0'), tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0')]
## Waiting for transfer
## Test Complete.

# initiator node
./blocking_send_recv_example.py --ip [Node A IP Address] --mode initiator --use_cuda 1
## MD listener is listening on port 8888...
## Backend UCX was instantiated
## Initialized NIXL agent: initiator
## initiator Tensors: [tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0'), tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')]
## Initiator sending to [Node A IP Address]
## Ready for transfer
## initiator Data verification passed - [tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0'), tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0')]
## Test Complete.
```

The output confirms that GPU memory is shared between the nodes. The initiator's tensors start as zeros and hold the target's ones after the transfer, and the initiator reports "Data verification passed".

VRAM to FILE data transfer with GPUDirect Storage

Here we write and run a sample that uses GPUDirect Storage (GDS) to transfer GPU memory directly to storage. nixl_gds_example.py is an example that writes from DRAM to storage via GDS. Using it as a reference, we create the following script, which writes from VRAM to storage via GDS.

nixl_gds_example_vram_to_file.py

```python
import os
import subprocess
import sys

import torch
from nixl._api import nixl_agent, nixl_agent_config

if __name__ == "__main__":
    # Use only the GDS backend
    agent_config = nixl_agent_config(backends=["GDS"])
    nixl_agent1 = nixl_agent("GDSTester", agent_config)

    # VRAM initialized with float32 1.0 (0x3F800000, stored little-endian as 00 00 80 3f)
    tensors = [torch.ones(10, dtype=torch.float32, device="cuda:0")]
    agent1_vram_descs = nixl_agent1.register_memory(tensors)
    agent1_xfer_vram = agent1_vram_descs.trim()

    # Open (or create) the destination file on GDS-capable storage
    file_path = sys.argv[1]
    agent1_fd = os.open(file_path, os.O_RDWR | os.O_CREAT)
    assert agent1_fd >= 0

    # FILE descriptor tuple: (offset, length, fd, metadata)
    agent1_file_list = [(0, tensors[0].numel() * tensors[0].element_size(), agent1_fd, "b")]
    agent1_file_descs = nixl_agent1.register_memory(agent1_file_list, "FILE")
    assert agent1_file_descs is not None
    agent1_xfer_files = agent1_file_descs.trim()

    # WRITE: local VRAM -> FILE within the same agent
    xfer_handle_1 = nixl_agent1.initialize_xfer(
        "WRITE", agent1_xfer_vram, agent1_xfer_files, "GDSTester"
    )
    if not xfer_handle_1:
        print("Creating transfer failed.")
        sys.exit(1)

    state = nixl_agent1.transfer(xfer_handle_1)
    assert state != "ERR"

    # Poll until the transfer completes
    done = False
    while not done:
        state = nixl_agent1.check_xfer_state(xfer_handle_1)
        if state == "ERR":
            print("Transfer got to Error state.")
            sys.exit(1)
        elif state == "DONE":
            done = True
            print("Initiator done")

    nixl_agent1.release_xfer_handle(xfer_handle_1)
    nixl_agent1.deregister_memory(agent1_vram_descs)
    nixl_agent1.deregister_memory(agent1_file_descs)
    os.close(agent1_fd)

    # Check the file contents
    p = subprocess.run(["hexdump", "-Cv", file_path], stdout=subprocess.PIPE)
    print("$ hexdump -Cv", file_path)
    print(p.stdout.decode())
```

A run looks like the following. The file contains the byte sequence 00 00 80 3f (0x0000803f when read byte by byte) repeated ten times for 40 bytes. This is float32 1.0, whose bit pattern 0x3F800000 is stored in little-endian order.

```
> python nixl_gds_example_vram_to_file.py /path/to/gds-support-dir/ones.bin
Backend GDS was instantiated
Initialized NIXL agent: GDSTester
Initiator done
$ hexdump -Cv /path/to/gds-support-dir/ones.bin
00000000  00 00 80 3f 00 00 80 3f  00 00 80 3f 00 00 80 3f  |...?...?...?...?|
00000010  00 00 80 3f 00 00 80 3f  00 00 80 3f 00 00 80 3f  |...?...?...?...?|
00000020  00 00 80 3f 00 00 80 3f                           |...?...?|
00000028
```

How to implement a Custom Plugin

NIXL's architecture lets arbitrary transfer methods be implemented as plugins. Here I show how to implement a sample plugin that performs local transfers between DRAM and FILE. In short, you add a new transfer method by inheriting from nixlBackendEngine and implementing at least its pure virtual functions (e.g. virtual void f() = 0;) [3].

Project structure

Create a new local directory under nixl's plugins directory and put the implementation code there, as follows.

```
nixl
└── src
    └── plugins
        ├── local
        │   ├── local_backend.cpp
        │   ├── local_backend.h
        │   ├── local_plugin.cpp
        │   └── meson.build
        └── meson.build
```

To add the new plugin, append the following line to nixl/src/plugins/meson.build.

```meson
subdir('local')
```

Next, add the code that registers the new plugin. It defines the plugin name, any additional option parameters, and the memory types the plugin supports.

local_plugin.cpp

```cpp
#include "backend/backend_plugin.h"
#include "local_backend.h"

static const char *PLUGIN_NAME = "LOCAL";
static const char *PLUGIN_VERSION = "0.1";

static nixlBackendEngine *create_local_engine(const nixlBackendInitParams *init_params) {
    return new nixlLocalEngine(init_params);
}

static void destroy_local_engine(nixlBackendEngine *engine) {
    delete engine;
}

static const char *get_plugin_name() {
    return PLUGIN_NAME;
}

static const char *get_plugin_version() {
    return PLUGIN_VERSION;
}

// No additional option parameters
static nixl_b_params_t get_backend_options() {
    nixl_b_params_t params;
    return params;
}

// Memory types this plugin supports
static nixl_mem_list_t get_backend_mems() {
    nixl_mem_list_t mems;
    mems.push_back(DRAM_SEG);
    mems.push_back(FILE_SEG);
    return mems;
}

static nixlBackendPlugin plugin = {NIXL_PLUGIN_API_VERSION,
                                   create_local_engine,
                                   destroy_local_engine,
                                   get_plugin_name,
                                   get_plugin_version,
                                   get_backend_options,
                                   get_backend_mems};

#ifdef STATIC_PLUGIN_LOCAL
nixlBackendPlugin *createStaticLocalPlugin() {
    return &plugin;
}
#else
extern "C" NIXL_PLUGIN_EXPORT nixlBackendPlugin *nixl_plugin_init() {
    return &plugin;
}

extern "C" NIXL_PLUGIN_EXPORT void nixl_plugin_fini() {}
#endif
```

Next, we implement the logic that actually performs the transfers. The postXfer function receives the memory addresses and file descriptors, and the transfer is performed with them. For reference, the UCX backend also performs its actual transfer in postXfer [4].

local_backend.h

```cpp
#ifndef __LOCAL_BACKEND_H
#define __LOCAL_BACKEND_H

#include <nixl.h>
#include <nixl_types.h>
#include <unistd.h>

#include "backend/backend_engine.h"

class nixlLocalEngine : public nixlBackendEngine {
private:
public:
    nixlLocalEngine(const nixlBackendInitParams *init_params);
    ~nixlLocalEngine();

    // Whether notifications are supported
    bool supportsNotif() const { return false; }
    // Whether transfers to another process or a remote node are supported
    bool supportsRemote() const { return false; }
    // Whether transfers within the same process or node are supported
    bool supportsLocal() const { return true; }
    // Whether the backend runs its own internal progress thread
    bool supportsProgTh() const { return false; }

    // Memory types this backend supports
    nixl_mem_list_t getSupportedMems() const {
        nixl_mem_list_t mems;
        mems.push_back(DRAM_SEG);
        mems.push_back(FILE_SEG);
        return mems;
    }

    nixl_status_t connect(const std::string &remote_agent) { return NIXL_SUCCESS; }
    nixl_status_t disconnect(const std::string &remote_agent) { return NIXL_SUCCESS; }

    nixl_status_t loadLocalMD(nixlBackendMD *input, nixlBackendMD *&output) {
        output = input;
        return NIXL_SUCCESS;
    }
    nixl_status_t unloadMD(nixlBackendMD *input) { return NIXL_SUCCESS; }

    nixl_status_t registerMem(const nixlBlobDesc &mem,
                              const nixl_mem_t &nixl_mem,
                              nixlBackendMD *&out);
    nixl_status_t deregisterMem(nixlBackendMD *meta);

    nixl_status_t prepXfer(const nixl_xfer_op_t &operation,
                           const nixl_meta_dlist_t &local,
                           const nixl_meta_dlist_t &remote,
                           const std::string &remote_agent,
                           nixlBackendReqH *&handle,
                           const nixl_opt_b_args_t *opt_args = nullptr);
    nixl_status_t postXfer(const nixl_xfer_op_t &operation,
                           const nixl_meta_dlist_t &local,
                           const nixl_meta_dlist_t &remote,
                           const std::string &remote_agent,
                           nixlBackendReqH *&handle,
                           const nixl_opt_b_args_t *opt_args = nullptr);
    nixl_status_t checkXfer(nixlBackendReqH *handle);
    nixl_status_t releaseReqH(nixlBackendReqH *handle);
};

#endif
```

local_backend.cpp

```cpp
#include "local_backend.h"

#include <string.h>
#include <sys/stat.h>
#include <iostream>

nixlLocalEngine::nixlLocalEngine(const nixlBackendInitParams *init_params)
    : nixlBackendEngine(init_params) {}

nixl_status_t nixlLocalEngine::registerMem(const nixlBlobDesc &mem,
                                           const nixl_mem_t &nixl_mem,
                                           nixlBackendMD *&out) {
    // For local DRAM/FILE transfers, this sample does no work at registration time.
    // Other backends do their setup here: for example, RDMA (ib verbs) would call
    // ibv_reg_mr, and GDS would register file handles.
    // To keep metadata such as file handles for use in prepXfer/postXfer, define a
    // class derived from nixlBackendMD and assign an instance to the out argument.
    // Reference: https://github.com/ai-dynamo/nixl/blob/1c979f0999740e4b221d0b9b470efbac793ddcae/src/plugins/cuda_gds/gds_backend.cpp#L95
    if (nixl_mem == FILE_SEG || nixl_mem == DRAM_SEG)
        return NIXL_SUCCESS;
    return NIXL_ERR_NOT_SUPPORTED;
}

nixl_status_t nixlLocalEngine::deregisterMem(nixlBackendMD *meta) {
    return NIXL_SUCCESS;
}

nixl_status_t nixlLocalEngine::prepXfer(const nixl_xfer_op_t &operation,
                                        const nixl_meta_dlist_t &local,
                                        const nixl_meta_dlist_t &remote,
                                        const std::string &remote_agent,
                                        nixlBackendReqH *&handle,
                                        const nixl_opt_b_args_t *opt_args) {
    // Validation: descriptor counts must match and only READ/WRITE are accepted
    if ((local.descCount() != remote.descCount()) ||
        ((operation != NIXL_READ) && (operation != NIXL_WRITE))) {
        return NIXL_ERR_INVALID_PARAM;
    }
    return NIXL_SUCCESS;
}

nixl_status_t nixlLocalEngine::postXfer(const nixl_xfer_op_t &operation,
                                        const nixl_meta_dlist_t &local,
                                        const nixl_meta_dlist_t &remote,
                                        const std::string &remote_agent,
                                        nixlBackendReqH *&handle,
                                        const nixl_opt_b_args_t *opt_args) {
    // Perform DRAM/FILE transfers synchronously.
    // READ copies remote -> local; WRITE copies local -> remote.
    if (local.getType() == DRAM_SEG && remote.getType() == DRAM_SEG) {
        // DRAM to DRAM
        int cnt = local.descCount();
        for (int i = 0; i < cnt; i++) {
            void *dst_addr;
            void *src_addr;
            if (operation == NIXL_READ) {
                dst_addr = (void *)local[i].addr;
                src_addr = (void *)remote[i].addr;
            } else {  // NIXL_WRITE (validated in prepXfer)
                dst_addr = (void *)remote[i].addr;
                src_addr = (void *)local[i].addr;
            }
            memcpy(dst_addr, src_addr, local[i].len);
        }
    } else if (local.getType() == FILE_SEG && remote.getType() == FILE_SEG) {
        // FILE to FILE: copy the whole source file
        int cnt = local.descCount();
        for (int i = 0; i < cnt; i++) {
            int in_fd;
            int out_fd;
            if (operation == NIXL_READ) {
                in_fd = remote[i].devId;
                out_fd = local[i].devId;
            } else {  // NIXL_WRITE
                in_fd = local[i].devId;
                out_fd = remote[i].devId;
            }
            struct stat st;
            if (fstat(in_fd, &st) < 0) {
                return NIXL_ERR_INVALID_PARAM;
            }
            if (copy_file_range(in_fd, 0, out_fd, 0, st.st_size, 0) < 0) {
                return NIXL_ERR_INVALID_PARAM;
            }
        }
    } else if ((local.getType() == FILE_SEG && remote.getType() == DRAM_SEG && operation == NIXL_WRITE) ||
               (local.getType() == DRAM_SEG && remote.getType() == FILE_SEG && operation == NIXL_READ)) {
        // FILE to DRAM
        int cnt = local.descCount();
        for (int i = 0; i < cnt; i++) {
            const bool local_is_file = local.getType() == FILE_SEG;
            int fd = local_is_file ? local[i].devId : remote[i].devId;
            // For FILE descriptors, addr holds the offset within the file
            off_t offset = (off_t)(local_is_file ? local[i].addr : remote[i].addr);
            uintptr_t buf = local_is_file ? remote[i].addr : local[i].addr;
            size_t len = operation == NIXL_WRITE ? local[i].len : remote[i].len;
            if (pread(fd, (void *)buf, len, offset) < 0) {
                return NIXL_ERR_INVALID_PARAM;
            }
        }
    } else if ((local.getType() == DRAM_SEG && remote.getType() == FILE_SEG && operation == NIXL_WRITE) ||
               (local.getType() == FILE_SEG && remote.getType() == DRAM_SEG && operation == NIXL_READ)) {
        // DRAM to FILE
        int cnt = local.descCount();
        for (int i = 0; i < cnt; i++) {
            const bool local_is_file = local.getType() == FILE_SEG;
            int fd = local_is_file ? local[i].devId : remote[i].devId;
            // For FILE descriptors, addr holds the offset within the file
            off_t offset = (off_t)(local_is_file ? local[i].addr : remote[i].addr);
            uintptr_t buf = local_is_file ? remote[i].addr : local[i].addr;
            size_t len = operation == NIXL_WRITE ? local[i].len : remote[i].len;
            if (pwrite(fd, (void *)buf, len, offset) < 0) {
                return NIXL_ERR_UNKNOWN;
            }
        }
    } else {
        return NIXL_ERR_NOT_SUPPORTED;
    }
    return NIXL_SUCCESS;
}

nixl_status_t nixlLocalEngine::checkXfer(nixlBackendReqH *handle) {
    // Asynchronous transfers are not supported, so report completion immediately.
    // For an asynchronous implementation, see:
    // https://github.com/ai-dynamo/nixl/tree/main/src/plugins/posix
    return NIXL_SUCCESS;
}

nixl_status_t nixlLocalEngine::releaseReqH(nixlBackendReqH *handle) {
    return NIXL_SUCCESS;
}

nixlLocalEngine::~nixlLocalEngine() {}
```

Finally, write the meson build file for the new plugin as follows.

nixl/src/plugins/local/meson.build

```meson
if 'LOCAL' in static_plugins
    local_backend_lib = static_library('LOCAL',
        'local_backend.cpp', 'local_backend.h', 'local_plugin.cpp',
        dependencies: [nixl_infra, nixl_common_dep],
        include_directories: [nixl_inc_dirs, utils_inc_dirs],
        install: true,
        cpp_args: ['-fPIC'],
        name_prefix: 'libplugin_',
        install_dir: plugin_install_dir)
else
    local_backend_lib = shared_library('LOCAL',
        'local_backend.cpp', 'local_backend.h', 'local_plugin.cpp',
        dependencies: [nixl_infra, nixl_common_dep],
        include_directories: [nixl_inc_dirs, utils_inc_dirs],
        install: true,
        cpp_args: ['-fPIC'],
        name_prefix: 'libplugin_',
        install_dir: plugin_install_dir)

    if get_option('buildtype') == 'debug'
        run_command('sh', '-c',
                    'echo "LOCAL=' + local_backend_lib.full_path() + '" >> ' + plugin_build_dir + '/pluginlist',
                    check: true
        )
    endif
endif

local_backend_interface = declare_dependency(link_with: local_backend_lib)
```

After adding these files, install NIXL as follows.

```shell
cd nixl
pip install .
```
The following script verifies the new plugin. It configures the agent with the newly added LOCAL plugin as its backend.

nixl_local_example.py

```python
import os
import subprocess
import sys

import torch
from nixl._api import nixl_agent, nixl_agent_config

if __name__ == "__main__":
    # Use only the LOCAL plugin added above
    agent_config = nixl_agent_config(backends=["LOCAL"])
    agent = nixl_agent("LOCAL_TEST", agent_config)

    # t1 = ones (source), t2 and t3 = zeros (destinations), all in DRAM
    t1 = [torch.ones(10, dtype=torch.float32) for _ in range(1)]
    t1_reg_descs = agent.get_reg_descs(t1)
    t1_xfer_descs = agent.get_xfer_descs(t1)
    assert agent.register_memory(t1_reg_descs) is not None

    t2 = [torch.zeros(10, dtype=torch.float32) for _ in range(1)]
    t2_reg_descs = agent.get_reg_descs(t2)
    t2_xfer_descs = agent.get_xfer_descs(t2)
    assert agent.register_memory(t2_reg_descs) is not None

    t3 = [torch.zeros(10, dtype=torch.float32) for _ in range(1)]
    t3_reg_descs = agent.get_reg_descs(t3)
    t3_xfer_descs = agent.get_xfer_descs(t3)
    assert agent.register_memory(t3_reg_descs) is not None

    print("=====Init=====")
    print("t1:", t1)
    print("t2:", t2)
    print("t3:", t3)

    print("=====Write t1 to FILE(/tmp/ones.bin)=====")
    fd = os.open("/tmp/ones.bin", os.O_RDWR | os.O_CREAT)
    # FILE descriptor tuple: (offset, length, fd, metadata)
    file_list = [(0, t1[0].numel() * t1[0].element_size(), fd, "b")]
    file_descs = agent.register_memory(file_list, "FILE")
    assert file_descs is not None
    file_xfer_files = file_descs.trim()

    xfer_handle_1 = agent.initialize_xfer("WRITE", t1_xfer_descs, file_xfer_files, "LOCAL_TEST")
    if not xfer_handle_1:
        print("Write t1 to FILE failed.", file=sys.stderr)
        sys.exit(1)
    # The LOCAL backend is synchronous, so the transfer is DONE immediately
    state = agent.transfer(xfer_handle_1)
    assert state == "DONE"

    # Check the file contents
    p = subprocess.run(["hexdump", "-Cv", "/tmp/ones.bin"], stdout=subprocess.PIPE)
    print("$ hexdump -Cv /tmp/ones.bin")
    print(p.stdout.decode())

    print("=====Read t2 from FILE(/tmp/ones.bin)=====")
    xfer_handle_2 = agent.initialize_xfer("READ", t2_xfer_descs, file_xfer_files, "LOCAL_TEST")
    if not xfer_handle_2:
        print("Read t2 from FILE failed.", file=sys.stderr)
        sys.exit(1)
    state = agent.transfer(xfer_handle_2)
    assert state == "DONE"
    print("t2:", t2)

    print("=====Write t1 to t3=====")
    xfer_handle_3 = agent.initialize_xfer("WRITE", t1_xfer_descs, t3_xfer_descs, "LOCAL_TEST")
    if not xfer_handle_3:
        print("Write t1 to t3 failed.", file=sys.stderr)
        sys.exit(1)
    state = agent.transfer(xfer_handle_3)
    assert state == "DONE"
    print("t3:", t3)

    # cleanup
    agent.release_xfer_handle(xfer_handle_1)
    agent.release_xfer_handle(xfer_handle_2)
    agent.release_xfer_handle(xfer_handle_3)
    agent.deregister_memory(t1_reg_descs)
    agent.deregister_memory(t2_reg_descs)
    agent.deregister_memory(t3_reg_descs)
    agent.deregister_memory(file_descs)
    os.close(fd)
```

The output is shown below. It uses the three torch.Tensor objects t1, t2, and t3 for three transfers. First, t1 is written to a file. Next, the file is read back into t2. Finally, t1 is copied directly into t3 in memory.

```
> python nixl_local_example.py
Backend LOCAL was instantiated
Initialized NIXL agent: LOCAL_TEST
=====Init=====
t1: [tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])]
t2: [tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])]
t3: [tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])]
=====Write t1 to FILE(/tmp/ones.bin)=====
$ hexdump -Cv /tmp/ones.bin
00000000  00 00 80 3f 00 00 80 3f  00 00 80 3f 00 00 80 3f  |...?...?...?...?|
00000010  00 00 80 3f 00 00 80 3f  00 00 80 3f 00 00 80 3f  |...?...?...?...?|
00000020  00 00 80 3f 00 00 80 3f                           |...?...?|
00000028

=====Read t2 from FILE(/tmp/ones.bin)=====
t2: [tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])]
=====Write t1 to t3=====
t3: [tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])]
```

Summary

In this article, I explained the NVIDIA Inference Xfer Library (NIXL), a high-speed, low-latency abstracted transfer library designed for LLM inference frameworks. I also showed how to run its examples and how to implement a Custom Plugin.

NIXL makes it easier to reduce latency and to scale out in multi-GPU and multi-node environments. In the future, its plugins should also make it possible to support new storage technologies and communication protocols.

[1] Mastering LLM Techniques: Inference Optimization - NVIDIA Technical Blog
[2] https://github.com/ai-dynamo/nixl/commit/b0085154d2aa4347c332bb121293a77ab733a871
[3] Abstract class - cppreference.com
[4] https://github.com/ai-dynamo/nixl/blob/503fe5ccb86b5963b828ee5663672fcba66b92d2/src/plugins/ucx/ucx_backend.cpp#L892-L898