Multichannel transcription streaming is a feature of Amazon Transcribe that you can often use from a web browser. Creating the stream source comes with some constraints, but the JavaScript Web Audio API lets you connect and combine a variety of audio sources, such as videos, audio files, and hardware like microphones, to produce transcripts.

In this post, we show how to use two microphones as audio sources, merge them into a single dual-channel audio stream, apply the required encoding, and stream the result to Amazon Transcribe. We provide the source code for a Vue.js application that requires two microphones connected to your browser. The approach is not limited to this use case, however: you can adapt it to a wide range of devices and audio sources.

With this approach, you get transcripts for two sources in a single Amazon Transcribe session, which offers cost savings and other benefits compared with using a separate session for each source.

Challenges when using two microphones

For this use case, it might be enough to use a single-channel stream for the two microphones and enable Amazon Transcribe speaker label identification to tell the speakers apart, but there are a few points to consider:

- Speaker labels are assigned randomly at session start, so your application has to map the results after the stream has started.
- Speakers with similar voice tones can be mislabeled; they are hard to distinguish even for a human.
- Voices can overlap when two speakers talk at the same time into one audio source.

Using two audio sources, one per microphone, addresses these concerns because each transcript comes from a fixed input source. By assigning a device to each speaker, the application knows in advance which transcript belongs to whom. Voices can still overlap if two nearby microphones pick up multiple voices. You can mitigate this with directional microphones, volume management, and Amazon Transcribe word-level confidence scores.

Solution overview

The following diagram illustrates the solution workflow.

[Figure: Application diagram for two microphones]

We use two audio inputs with the Web Audio API. With this API, we can merge the two inputs, Mic A and Mic B, into a single audio data source, with the left channel representing Mic A and the right channel representing Mic B.

Next, we convert this audio source to PCM (pulse-code modulation) audio. PCM is a common format for audio processing and one of the formats that Amazon Transcribe requires for audio input. Finally, we stream the PCM audio to Amazon Transcribe for transcription.

Prerequisites

You need to have the following in place:

- The source code from the GitHub repository.
- Bun or Node.js installed as the JavaScript runtime.
- A web browser compatible with the Web Audio API. This solution has been tested on Google Chrome version 135.0.7049.85.
- Two microphones connected to your computer, with the browser allowed to access them.
- An AWS account with Amazon Transcribe permissions. As an example, you can use the following AWS Identity and Access Management (IAM) policy for Amazon Transcribe:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DemoWebAudioAmazonTranscribe",
          "Effect": "Allow",
          "Action": "transcribe:StartStreamTranscriptionWebSocket",
          "Resource": "*"
        }
      ]
    }

Start the application

To launch the application, complete the following steps:

1. Go to the root directory where you downloaded the code.
2. From the env.sample file, create a .env file to set up your AWS access keys.
3. Install the packages by running bun install (if you are using Node.js, run npm install).
4. Start the web server by running bun dev (if you are using Node.js, run npm run dev).
5. Open http://localhost:5173/ in your browser.

[Figure: The application running on http://localhost:5173 with two connected microphones]

Code walkthrough

In this section, we examine the key pieces of code in the implementation.

The first step is to list the connected microphones using the browser API navigator.mediaDevices.enumerateDevices():

    const devices = await navigator.mediaDevices.enumerateDevices();
    // Keep audio inputs (microphones) only
    return devices.filter((d) => d.kind === 'audioinput');

Next, you need to obtain a MediaStream object for each connected microphone. You can do this with the navigator.mediaDevices.getUserMedia() API, which gives access to the user's media devices (such as cameras and microphones). It returns a MediaStream object that represents the audio or video data from those devices:

    const streams = []
    // `device` is one of the entries returned by enumerateDevices()
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        deviceId: device.deviceId,
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true,
      },
    })
    if (stream) streams.push(stream)
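
The snippets above handle a single device. As a minimal sketch of how the two steps might be tied together, the following helpers select the microphones and build per-device constraints. The names `pickMicrophones` and `micConstraints` are hypothetical and not part of the demo; the sketch also assumes that the `default` and `communications` device IDs some browsers list are aliases rather than extra hardware, and it uses the `exact` constraint so that the browser does not silently fall back to another device.

```typescript
// Minimal shape of the entries returned by enumerateDevices(); declared locally
// so the sketch type-checks without DOM typings.
interface DeviceInfoLike {
  deviceId: string
  kind: string
  label: string
}

// Assumption: these IDs are browser aliases for an existing device, not extra microphones.
const ALIAS_IDS = new Set(['default', 'communications'])

// Hypothetical helper: keep physical microphones only and return the first `count`.
function pickMicrophones(devices: DeviceInfoLike[], count = 2): DeviceInfoLike[] {
  const mics = devices.filter(
    (d) => d.kind === 'audioinput' && !ALIAS_IDS.has(d.deviceId),
  )
  if (mics.length < count) {
    throw new Error(`Expected at least ${count} microphones, found ${mics.length}`)
  }
  return mics.slice(0, count)
}

// Hypothetical helper: getUserMedia() constraints pinned to one specific device.
function micConstraints(deviceId: string) {
  return {
    audio: {
      deviceId: { exact: deviceId },
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  }
}
```

In the browser, each selected device would then be passed to `navigator.mediaDevices.getUserMedia(micConstraints(device.deviceId))`, and the resulting streams collected in order so that index 0 is Mic A and index 1 is Mic B.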

To combine the audio from multiple microphones, you need to create an AudioContext interface for audio processing. Within this AudioContext, you can use a ChannelMergerNode to merge the audio streams from the different microphones. The arguments of the connect(destination, src_idx, ch_idx) method are as follows:

- destination: The destination node, in this example mergerNode.
- src_idx: The source output index, in this example 0 for both sources (because each microphone is a single-channel audio stream).
- ch_idx: The channel index on the destination, in this example 0 and 1 respectively, to create a stereo output.

    // AudioContext instance
    const audioContext = new AudioContext({
      sampleRate: SAMPLE_RATE,
    })
    // Used to process the microphone stream data
    const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {...})
    // Microphone A
    const audioSourceA = audioContext.createMediaStreamSource(mediaStreams[0]);
    // Microphone B
    const audioSourceB = audioContext.createMediaStreamSource(mediaStreams[1]);
    // Audio node that merges the two inputs
    const mergerNode = audioContext.createChannelMerger(2);
    // Connect each audio source to its destination channel on mergerNode
    audioSourceA.connect(mergerNode, 0, 0);
    audioSourceB.connect(mergerNode, 0, 1);
    // Connect mergerNode to the AudioWorkletNode
    mergerNode.connect(audioWorkletNode);

This microphone data is processed in an AudioWorklet, which emits a data message every time a defined number of frames has been recorded. These messages contain the audio data encoded in PCM format, ready to send to Amazon Transcribe. With the p-event library, you can iterate asynchronously over the events from the Worklet. The Worklet itself is described in more detail in the next section of this post.

    import { pEventIterator } from 'p-event'
    ...
    // Register the worklet
    try {
      await audioContext.audioWorklet.addModule('./worklets/recording-processor.js')
    } catch (e) {
      console.error('Failed to load audio worklet')
    }
    // Async iterator over the messages posted by the worklet
    const audioDataIterator = pEventIterator<'message', MessageEvent<AudioWorkletMessageDataType>>(
      audioWorkletNode.port,
      'message',
    )
    ...
    // AsyncIterableIterator: each time the worklet emits an event containing a
    // `SHARE_RECORDING_BUFFER` message, this iterator yields the required AudioEvent object
    const getAudioStream = async function* (
      audioDataIterator: AsyncIterableIterator<MessageEvent<AudioWorkletMessageDataType>>,
    ) {
      for await (const chunk of audioDataIterator) {
        if (chunk.data.message === 'SHARE_RECORDING_BUFFER') {
          const { audioData } = chunk.data
          yield {
            AudioEvent: {
              AudioChunk: audioData,
            },
          }
        }
      }
    }

To start streaming the data to Amazon Transcribe, use the iterator you created and set NumberOfChannels: 2 and EnableChannelIdentification: true to enable dual-channel transcription. For more information, refer to the AWS SDK StartStreamTranscriptionCommand documentation.

    import {
      LanguageCode,
      MediaEncoding,
      StartStreamTranscriptionCommand,
    } from '@aws-sdk/client-transcribe-streaming'

    const command = new StartStreamTranscriptionCommand({
      LanguageCode: LanguageCode.EN_US,
      MediaEncoding: MediaEncoding.PCM,
      MediaSampleRateHertz: SAMPLE_RATE,
      NumberOfChannels: 2,
      EnableChannelIdentification: true,
      ShowSpeakerLabel: true,
      // Pass the async iterator created from the worklet port
      AudioStream: getAudioStream(audioDataIterator),
    })

After you send the request, a WebSocket connection is created to exchange the audio stream data and the Amazon Transcribe results:

    // `client` is the Transcribe streaming client and `callback` the application's result handler
    const data = await client.send(command)
    for await (const event of data.TranscriptResultStream) {
      // TranscriptEvent is absent on non-transcript events, so guard the access
      for (const result of event.TranscriptEvent?.Transcript?.Results || []) {
        callback({ ...result })
      }
    }

The result object includes a ChannelId property, such as ch_0 or ch_1, that you can use to identify the microphone source.
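
As an illustration of how the ChannelId property might be used, the following sketch turns a result into a labeled line of text. It is a sketch under assumptions, not the demo's code: `formatResult` and `SPEAKER_BY_CHANNEL` are hypothetical names, the `ResultLike` interface declares only the subset of result fields used here (`ChannelId`, `IsPartial`, `Alternatives[].Transcript`), and skipping partial results is a design choice rather than a requirement.

```typescript
// Subset of the streaming result shape used in this sketch; declared locally
// so the example stays standalone.
interface ResultLike {
  ChannelId?: string
  IsPartial?: boolean
  Alternatives?: { Transcript?: string }[]
}

// Hypothetical mapping from channel ID to the microphone (or person) assigned to it.
const SPEAKER_BY_CHANNEL: Record<string, string> = {
  ch_0: 'Mic A',
  ch_1: 'Mic B',
}

// Returns a display line for final results, and null for partial or empty ones.
function formatResult(result: ResultLike): string | null {
  if (result.IsPartial) return null
  const text = result.Alternatives?.[0]?.Transcript?.trim()
  if (!text) return null
  const speaker = SPEAKER_BY_CHANNEL[result.ChannelId ?? ''] ?? 'Unknown'
  return `${speaker}: ${text}`
}
```

Because the channel order is fixed by how the sources are connected to the merger node, this mapping can be defined before the stream starts, unlike speaker labels, which have to be mapped after the session begins.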

Deep dive: Audio Worklet

Audio Worklets run in a separate thread, which enables very low-latency audio processing. The implementation and demo source code are in the public/worklets/recording-processor.js file.

In our case, we use the Worklet to perform two main tasks:

1. Iterate over the audio from mergerNode. This node contains both audio channels and is the input to the Worklet.
2. Encode the data bytes from the mergerNode node into PCM signed 16-bit little-endian audio format. We do this on each iteration or whenever a message payload needs to be emitted to the application.

The general code structure to implement this is as follows:

    class RecordingProcessor extends AudioWorkletProcessor {
      constructor(options) {
        super()
      }
      process(inputs, outputs) {...}
    }
    registerProcessor('recording-processor', RecordingProcessor)

You can pass custom options to this Worklet instance using the processorOptions attribute. In the demo, we set maxFrameCount: (SAMPLE_RATE * 4) / 10, that is, the number of frames in 0.4 seconds of audio, as a bitrate guide that determines when to emit a new message payload. The following is an example message:

    this.port.postMessage({
      message: 'SHARE_RECORDING_BUFFER',
      buffer: this._recordingBuffer,
      recordingLength: this.recordedFrames,
      audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)), // PCM encoded audio format
    })

PCM encoding for two channels

One of the most important parts is how to encode two channels as PCM. According to the AWS documentation in the Amazon Transcribe API Reference, the size of an AudioChunk is defined as Duration (s) * Sample Rate (Hz) * Number of Channels * 2. For two channels, 1 second at 16000 Hz is 1 * 16000 * 2 * 2 = 64000 bytes. The encoding function looks like the following:

    // Note that the input is an array in which each element is a channel of
    // Float32 values between -1.0 and 1.0 from the AudioWorkletProcessor.
    const pcmEncodeArray = (input: Float32Array[]) => {
      const numChannels = input.length
      const numSamples = input[0].length
      const bufferLength = numChannels * numSamples * 2 // 2 bytes per sample per channel
      const buffer = new ArrayBuffer(bufferLength)
      const view = new DataView(buffer)
      let index = 0
      for (let i = 0; i < numSamples; i++) {
        // Encode each channel in turn, so samples are interleaved frame by frame
        for (let channel = 0; channel < numChannels; channel++) {
          const s = Math.max(-1, Math.min(1, input[channel][i]))
          // Convert the 32-bit float to a 16-bit PCM audio waveform sample.
          // Max value: 32767 (0x7FFF), min value: -32768 (-0x8000)
          view.setInt16(index, s < 0 ? s * 0x8000 : s * 0x7fff, true)
          index += 2
        }
      }
      return buffer
    }

For more information about how audio data blocks are handled, see the AudioWorkletProcessor: process() method. For more information about PCM format encoding, see Multimedia Programming Interface and Data Specifications 1.0.
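
As a worked check of the chunk-size formula and the interleaved layout described in this section, the following sketch computes the expected payload sizes and splits an encoded buffer back into its channels. The helper names `pcmChunkBytes` and `deinterleavePcm16` are hypothetical, and the 16 kHz sample rate is an assumption taken from the 16000 Hz example above; the demo's actual SAMPLE_RATE value is not shown in this post.

```typescript
// Bytes in a 16-bit PCM chunk: duration (s) * sample rate (Hz) * channels * 2.
function pcmChunkBytes(durationSec: number, sampleRateHz: number, channels: number): number {
  return durationSec * sampleRateHz * channels * 2
}

// Splits interleaved signed 16-bit little-endian PCM into one array per channel.
function deinterleavePcm16(buffer: ArrayBuffer, channels: number): Int16Array[] {
  const view = new DataView(buffer)
  const frames = buffer.byteLength / (2 * channels)
  const out = Array.from({ length: channels }, () => new Int16Array(frames))
  for (let i = 0; i < frames; i++) {
    for (let ch = 0; ch < channels; ch++) {
      // Frame i stores channel 0 first, then channel 1, two bytes each
      out[ch][i] = view.getInt16((i * channels + ch) * 2, true)
    }
  }
  return out
}

// Assumed sample rate (16 kHz), combined with the demo's maxFrameCount expression.
const SAMPLE_RATE = 16000
const maxFrameCount = (SAMPLE_RATE * 4) / 10 // 6400 frames, i.e. 0.4 s of audio
const bytesPerMessage = maxFrameCount * 2 * 2 // two channels, two bytes per sample
```

Under these assumptions, each message carries 25600 bytes of audio. Running `deinterleavePcm16` on the output of `pcmEncodeArray` with two channels should return Mic A's samples at index 0 and Mic B's at index 1, which is a quick way to confirm that the channels were not swapped.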

Conclusion

In this post, we explored the implementation details of a web application that uses the browser's Web Audio API and Amazon Transcribe streaming to provide real-time dual-channel transcription. By combining AudioContext, ChannelMergerNode, and AudioWorklet, we were able to process and encode the audio data from two microphones and send it to Amazon Transcribe for transcription. The AudioWorklet in particular allowed us to achieve low-latency audio processing, providing a smooth and responsive user experience.

You can build on this demo to create more advanced real-time transcription applications for a wide range of use cases, from meeting recordings to voice-controlled interfaces.

Try out the solution for yourself, and leave your feedback in the comments.

About the Author

Jorge Lanzarotti is a Sr. Prototyping SA at Amazon Web Services (AWS) based in Tokyo, Japan. He helps customers in the public sector by creating innovative solutions to challenging problems.