2025-08-29/gpt-realtimeを追う

Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.

「こんにちは。元気？」をテキストで聞いた時の返答

「こんにちは！元気だよ、ありがとう。あなたはどう？最近調子はどうかな？」

明らかにフランクになっている

インストラクションを適当に書き換えてみる

「Your voice and personality should be warm and engaging, with a lively and playful tone.」を消してそれっぽい文を追加

code:custom_instruction.txt

Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you're asked about them.

ユーザーからの口調の指示がなければ、できるだけ敬語(丁寧語)で、事務的だが協力的に対応する。

「こんにちは。元気？」をテキストで聞いた時の返答

「こんにちは、お声がけありがとうございます。はい、元気ですよ！そちらはいかがですか？何かお手伝いできることがあれば、遠慮なく言ってくださいね。」

調整は可能だった

インストラクションがよりしっかり反映されるようになったということかな

声「Marin」「Cedar」は、gpt-realtimeで試すと確かに他の声より自然かも。あと雑音が少ない

finish_session を試す

プロンプトガイドに書いてある関数名と説明文をそのままセッションに指定

code:session.json

{ tools: [{

type: "function",

name: "finish_session",

description: "Call this when a customer says they're done with the session or doesn't want to continue. If it's ambiguous, confirm with the customer before calling.",

parameters: {

type: "object",

properties: {},

required: [],

}] }

「もういいよ、ありがとう」と言ってみたら、即 finish_session({}) を呼び出してくれた。いいね

実運用だと『「お電話ありがとうございました」と言ってからcallして』みたいなプロンプトにする感じかな

gpt-4o-realtime-preview 2024-12-17 で試したらシステムエラーになった

Azure側

日本時間 2025-09-04 9:00 に確認したらリリースされてた。

gpt-realtime-mini は、 2025-10-09 9:00 に確認したらモデル一覧に追加されてる

自分の個人アカウントだとロックされてるけど

https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/announcing-gpt-realtime-on-azure-ai-foundry/4449666

引き続き、リージョンはeastus2, swedencentralのみ？

Azure AI Foundry上の廃止日は2026-09-01 9:00

gpt-4o(-mini)-realtime-preview 2024-12-17 の廃止日が空欄なのは変わらず

新機能を使わない場合、旧APIのデプロイ名を書き換えるだけで使える

https://your-endpoint.openai.azure.com/openai/realtime?api-version=2025-04-01-preview&deployment=

旧APIでも新しい声「Marin」と「Cedar」は使える (プレイグラウンドで使われてる)

Azureでの新APIの呼び出し方

WebSocketの場合、 https://your-endpoint.openai.azure.com/openai/v1/realtime?model=デプロイ名に繋ぐ

9/12時点で公式ドキュメントに書いてあるURLだと404 Resource Not Foundで繋がらない。。。

「Was this page helpful?」→「No」から、「ここ違うよ」というフィードバックはしておいた

10/14に直された

openai-node ライブラリを使う例 (v2.19.1)

code:javascript

const openAIClient = new OpenAI({

apiKey: apiKey,

baseURL: https://your-endpoint.openai.azure.com/openai/v1,

});

const realtimeClient = new OpenAIRealtimeWS(

{

model: deploymentName,

options: {

headers: { "api-key": apiKey }, // これ無いとエラーになった

openAIClient,

);

旧APIからの移行

セッションの型が変わった

https://platform.openai.com/docs/api-reference/realtime_client_events/session/update や、新しい型を見て書き直す必要がある

openai-nodeライブラリだと、importからbeta/を外すと新しい型になる。型エラーを直せばよい

code:ts

// before

import type { OpenAIRealtimeError } from "openai/beta/realtime/internal-base";

import { OpenAIRealtimeWS } from "openai/beta/realtime/ws";

import type { SessionUpdateEvent } from "openai/resources/beta/realtime/realtime";

// after

import type { OpenAIRealtimeError } from "openai/realtime/internal-base";

import { OpenAIRealtimeWS } from "openai/realtime/ws";

import type { SessionUpdateEvent } from "openai/resources/realtime/realtime";

細かいところでは SessionUpdateEvent.Session を SessionUpdateEvent["session"] に変えないといけなかった

あるいは、実際にsession.updateを投げてみて、型が違うとエラーが返るのでその都度直していく

code:エラー例.json

{"type":"invalid_request_error","code":"missing_required_parameter","message":"Missing required parameter: 'session.type'.","param":"session.type","event_id":null}

イベント名も変わっている

table:イベント名変更例

旧新

response.text.delta response.output_text.delta

response.audio.delta response.output_audio.delta

response.audio_transcript.delta response.output_audio_transcript.delta

これも旧イベント名だと型エラーになるので対応可能

使えるようになった機能

画像アップロード

session.audio.output.speed による、AIの応答の再生速度調整

使えなくなった機能

session.temperature

code:json

{"type":"invalid_request_error","code":"unknown_parameter","message":"Unknown parameter: 'session.temperature'.","param":"session.temperature","event_id":null}

「変えてもほぼ意味なかったので 0.8 に固定しといたよ」みたいな感じっぽい

他の謎

What's new (GitHub) にあるこれ何

Conversation Mode: Real-world turn-taking behavior for natural interactions. Conversation mode uses VAD to prompt users if no response is detected, improving real-world usability for phone-like interactions.

以下Azureリリース当時のメモ（読まなくていい）

2025-09-03 9:00

AzureのWhat's newが更新された。「gpt-4o-realtime」「gpt-4o-audio」としてリリース？

eastus2, swedencentral のモデル一覧に出てこない……。

What's newとmodels, quotaだけ更新されてる。。。

https://github.com/MicrosoftDocs/azure-ai-docs/commit/005fa2f16dd28f91c58f75f2a8023630f1fa4aca

2025-09-03 6:35:31 AM にマージされている

modelsを見ると、gpt-4o-realtimeが追加されてて、gpt-4o-audioは追加されてない。。。

リリースされた時に試すこと

クライアントコードはどのくらい変わるか

v1-preview 使う？以前のまま？

speedパラメータは使えるか

新ボイスは使えるか (Marin, Cedar)

finish_sessionはどんな感じか

https://cookbook.openai.com/examples/realtime_prompting_guide#no-audio-or-unclear-audio

What's new にある Conversation Mode とは何か関連しているのだろうか

OpenAIとの機能差が埋まったものがあるかどうか

speed

transcribeの途中経過

conversation.item.input_audio_transcription.delta

Semantic Turn Detection

WebRTC対応と同時に来てて、 api-version=2025-04-01-preview の時点で使えてた。。。気づかんかった

https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/realtime-audio#semantic-vad

https://learn.microsoft.com/en-us/azure/ai-foundry/openai/realtime-audio-reference#realtimeturndetection

2025-09-04 9:00

Azureのeastus2, swedencentralに来た

結局、名前の -4o は消えた

https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/announcing-gpt-realtime-on-azure-ai-foundry/4449666

Azure AI Foundry上の廃止日は2026-09-01 9:00

gpt-4o(-mini)-realtime-preview 2024-12-17 の廃止日が空欄なのは変わらず

2025-09-04 18:30

接続先URL変わった？

https://github.com/MicrosoftDocs/azure-ai-docs/commit/b35320f1f04abb13eb48554b990fb79505b6266c

https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/realtime-audio-websockets?tabs=ga#connection-and-authentication

クエリパラメータがdeploymentからmodelに代わっている

この書き方だと、api-version 2025-08-28 だと旧URL使えない？

最新の openai ライブラリがこの変更に対応してないし

とりあえず openai-node ライブラリで接続時に model, deployment を両方つけるように改造して api-version=2025-08-28 に繋いでみたけど、 404 Resource Not Found のまま・。・・・・

api-version=2025-04-01-preview だとこの状態でも繋がるから、 2025-08-28 がまだ準備されてない？

準備されてないAPIをドキュメントに記載するのどうなんだ

イベント名とかセッションのプロパティ名も変わってる？

https://github.com/openai/openai-node/commit/477e5038a147f9ce49f88d58c65ed98a0fefa054

OpenAI playgroundで試した限り元のままに見えるけど

とりあえず api-version=2025-04-01-preview だと普通に gpt-realtime モデルも使えるな

新しい声はまだ使えなさそう。marin、cedarを使おうとするとエラーが返った

code:json

{

"type": "invalid_request_error",

"code": "invalid_value",

"message": "Voice marin is not available for your organization.",

"param": "session.voice",

"event_id": null

}

ただ、存在しない音声を指定した時とは別のエラーだった。謎

code:json

{

"type": "invalid_request_error",

"code": "invalid_value",

"message": "Invalid value: 'me'. Supported values are: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', and 'verse'.",

"param": "session.voice",

"event_id": null

}

2025-09-08 10:00

Azure、/openai/realtime じゃなくて /openai/v1/realtime なら新しいAPIで繋がる！！！！

modelパラメータかつapi-keyリクエストヘッダーの指定が必須 (Authorization: Bearer api-key だけだと404)。

やっぱりsession.updateに以前のものをそのままは渡せない

code:json

{"type":"invalid_request_error","code":"missing_required_parameter","message":"Missing required parameter: 'session.type'.","param":"session.type","event_id":null}

https://platform.openai.com/docs/api-reference/realtime_client_events/session/update や、新しい型を見て書き直す必要がある

openai-nodeライブラリだと、importからbeta/を外すと新しい型になる。基本は型エラーを直せばよい

code:json

// before

import type { OpenAIRealtimeError } from "openai/beta/realtime/internal-base";

import { OpenAIRealtimeWS } from "openai/beta/realtime/ws";

import type { SessionUpdateEvent } from "openai/resources/beta/realtime/realtime";

// after

import type { OpenAIRealtimeError } from "openai/realtime/internal-base";

import { OpenAIRealtimeWS } from "openai/realtime/ws";

import type { SessionUpdateEvent } from "openai/resources/realtime/realtime";

table:イベント名変更例

旧新

response.text.delta response.output_text.delta

response.audio.delta response.output_audio.delta

response.audio_transcript.delta response.output_audio_transcript.delta

使える機能

"marin", "cedar" はまだ使えない。。。

code:json

{

"type": "invalid_request_error",

"code": "invalid_value",

"message": "Voice marin is not available for your organization.",

"param": "session.audio.output.voice",

"event_id": null

}

2025-09-09 10:00 使えるようになってた！！！

サーバーイベント conversation.item.input_audio_transcription.delta は来てなさそう

session.audio.output.speed は使える

使えなくなった機能

session.temperature が無くなった

code:json

{"type":"invalid_request_error","code":"unknown_parameter","message":"Unknown parameter: 'session.temperature'.","param":"session.temperature","event_id":null}

何故？

OpenAIのフォーラムで聞いている人が居るけど、9/12時点で公式からの返答は無い。。。

↓ 公式説明があることに気づいた（その旨返答した）

「変えてもほぼ意味なかったので 0.8 に固定しといたよ」みたいな感じっぽい

非公開メモ: /heguro0/2025-08-29/gpt-realtimeを追う