Assistant

Disclaimer

This document is a preliminary version and a work in progress.
Details discussed in this document are subject to future modifications.

PBX Speech

The dedicated PBX Speech server, which manages speech recognition and speech synthesis, invokes the Delos instance hosting the account of the user. The PBX Speech submits the transcribed speech as a JSON conversation similar to the OpenAI model, and obtains in response a JSON conversation in the same format. The response includes the inferred assistant message, which the PBX synthesizes as speech in the audio stream of the call.

The workflow for PBX Speech is summarized in the following major steps:

1. When a new incoming call arrives at the PBX Speech, the PBX invokes the preconfigured WebHook URL /SpeechStart at the regional server fr1.buzzee.tel. The response contains the subsequent WebHook URL to invoke on the host where the user is located (http://fr01a.buzzee.tel/JSON/SpeechAssistant). This initial response also contains the basic instructions and the greeting message; the PBX synthesizes as speech the last message of role "assistant".
2. At each turn of the user conversation, the PBX invokes the WebHook URL /SpeechAssistant obtained in the initial response, and provides the last user message (and possibly the multi-turn conversation as received from the previous call to the Delos instance).
The Delos instance returns the deflated conversation, whose last message of role "assistant" is the text to synthesize as speech on the audio channel.
3. When the caller hangs up, the PBX Speech invokes the preconfigured WebHook URL /SpeechHangup to terminate the session of the assistant for this WMSG_ID.
4. The Delos instance may request the PBX Speech to initiate a speech call by invoking /SpeechMakeCall. The PBX originates the call and starts the workflow from step #1 /SpeechStart.
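
The steps above can be sketched as a single session loop on the PBX side. This is a minimal illustration, not the actual PBX implementation: the function name run_session and the injected callables http_get, http_post and speak are hypothetical, and the sketch assumes that each HTTP helper returns the response body already parsed as JSON.

```python
from typing import Callable, Iterable


def run_session(
    http_get: Callable[[str], dict],
    http_post: Callable[[str, dict], dict],
    start_url: str,
    user_sentences: Iterable[str],
    speak: Callable[[str], None],
) -> dict:
    """Drive one call: start, one turn per user sentence, then hangup."""
    # Step 1: /SpeechStart returns the WebHook URLs and the greeting.
    start = http_get(start_url)
    conversation = start["Body"]
    speak(start["Message"])

    # Step 2: one POST to the "Assistant" WebHook per completed user sentence.
    for sentence in user_sentences:
        conversation["messages"].append({"role": "user", "content": sentence})
        reply = http_post(start["Assistant"], conversation)
        conversation = reply["Body"]  # Delos returns the updated conversation
        speak(reply["Message"])

    # Step 3: notify Delos that the call has ended.
    return http_get(start["Hangup"])
```

Injecting the transport and the speech output as callables keeps the sketch independent of any HTTP library or synthesis engine, which this document does not specify.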

HTTP Authorization and Protocol

Authorization for communications between the PBX and the host servers fr01a.buzzee.tel (fr02a.buzzee.tel, ...) is bypassed per IP address.
The API consists of HTTP requests.
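
A per-IP bypass could be checked as in the sketch below. The network 192.0.2.0/24 is a documentation placeholder and the name is_bypassed is hypothetical; the actual addresses of the PBX and host servers are not given in this document.

```python
import ipaddress

# Hypothetical allow-list; replace with the real PBX / host server networks.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_bypassed(remote_addr: str) -> bool:
    """Return True when the peer address belongs to a trusted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address: never bypass authorization.
        return False
    return any(addr in network for network in TRUSTED_NETWORKS)
```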

/SpeechStart (PBX Speech -> Regional server)

When a new incoming call arrives at the PBX Speech server, the PBX invokes the regional server fr1.buzzee.tel to obtain the main WebHook URLs "Assistant" and "Hangup", which locate the server endpoint hosting the user of this speech line.
TO BE DISCUSSED: This first call may also return the initial system instructions and the greeting to be synthesized as voice by the PBX and played to the user.
The term "user" designates the contact on the phone line.
The PBX Speech server invokes /SpeechStart when starting a new speech session (a call being answered on the PBX side).

PBX Speech will invoke an HTTP GET request to the regional Delos instance:

http://fr1.buzzee.tel/JSON/SpeechStart?CalledID=33612345678&CallerID=33698765432&CallID=123456_abcdef

Params Description
CalledID The phone number of the user being called.
The CalledID may instead be the LineID in the case of a custom dedicated line, when the user has not redirected their phone number to the generic LineID.
In this document: 33612345678 for the phone number of the user,
or 33912345678 for the dedicated line associated with the user having the phone number 33612345678.
CallerID The phone number of the caller.
In this document: 33698765432
CallID The UniqueID of the Asterisk server. It is an opaque ID of the call, used to identify the session of the assistant for this call.
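
As an illustration, the request URL can be assembled from the three parameters as follows. The helper name speech_start_url is hypothetical; the base URL is the regional server used in this document.

```python
from urllib.parse import urlencode

REGIONAL_BASE = "http://fr1.buzzee.tel/JSON"  # regional server named in this document


def speech_start_url(called_id: str, caller_id: str, call_id: str) -> str:
    """Build the /SpeechStart URL; urlencode escapes any unsafe character."""
    query = urlencode({"CalledID": called_id, "CallerID": caller_id, "CallID": call_id})
    return f"{REGIONAL_BASE}/SpeechStart?{query}"
```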

The Delos instance responds with the main fields "Assistant" and "Hangup", which are the WebHook URLs to invoke to pursue the speech session. It may also return the initial system instructions and the opening message (greeting) to be synthesized as voice by the PBX and played to the user.

{
  "Assistant": "http://fr01a.buzzee.tel/JSON/SpeechAssistant?XMLC_UserID=100001&WMSG_ID=2002002",
  "Hangup": "http://fr01a.buzzee.tel/JSON/SpeechHangup?XMLC_UserID=100001&WMSG_ID=2002002",
  "Message": "Welcome to Cafe Paname. How can I help?",
  "Language": "fr",
  "Voice": "Joy",
  "Body": {
    "model": "gpt-oss-120b",
    "messages": [
      { "role": "system", "content": "You are a reservation agent." },
      { "role": "assistant", "content": "Welcome to Cafe Paname. How can I help?" }
    ]
  }
}

 

Fields Description
Assistant The WebHook URL to invoke each time the user completes a sentence during the call.
The response contains the deflated multi-turn conversation to pursue.
This URL is the endpoint URI of the server where the user is hosted.
Example: http://fr01a.buzzee.tel/JSON/SpeechAssistant?XMLC_UserID=100001&WMSG_ID=2002002
Hangup The WebHook URL to invoke when the call is terminated.
This URL is the endpoint URI of the server where the user is hosted.
Example: http://fr01a.buzzee.tel/JSON/SpeechHangup?XMLC_UserID=100001&WMSG_ID=2002002
Message The opening message (the greeting) that the PBX Speech should synthesize as voice.
Example: "Message": "Welcome to Cafe Paname. How can I help?"
Note: This message is also duplicated as the last entry with "role": "assistant" in the multi-turn conversation.
The PBX Speech needs to locate this last message because, if the call is re-established after an interruption, the multi-turn conversation continues from this last "role": "assistant" message.
Example: "messages": [ { "role": "assistant", "content": "Welcome to Cafe Paname. How can I help?" } ]
Language Optional. Locale code for the language of the message to be synthesized as speech and played to the user.
fr = FRANCAIS, en = ENGLISH, es = ESPANOL, de = DEUTSCH, it = ITALIANO
If this parameter is not provided, the PBX Speech will synthesize using its default language.
Voice Optional. The voice to be used to synthesize the message.
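
Since the greeting appears both in "Message" and as the last "assistant" entry of the conversation, the PBX side could locate the text to synthesize as sketched below. The function name text_to_speak is hypothetical, and falling back to the conversation when "Message" is absent is an assumption, not a documented behavior.

```python
from typing import Optional


def text_to_speak(response: dict) -> Optional[str]:
    """Return the text to synthesize from a /SpeechStart style response."""
    message = response.get("Message")
    if message:
        return message
    # Assumed fallback: scan the conversation backwards for the last assistant entry.
    for entry in reversed(response.get("Body", {}).get("messages", [])):
        if entry.get("role") == "assistant":
            return entry.get("content")
    return None
```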

The following table summarizes the different error codes:

Errors Description
ERR_USER_NOT_FOUND Cannot find user associated with the CalledID phone number.
ERR_INVALID_CALLERID Invalid phone number for the parameter CallerID
ERR_INVALID_CALLEDID Invalid phone number for the parameter CalledID
ERR_BLANK_ASSISTANT Assistant not found for CalledID.
The configuration is probably not set up properly.
PBX Speech should play a message to inform the caller.

/SpeechAssistant (PBX Speech -> Delos)

After obtaining the main WebHook URL "Assistant", the PBX Speech invokes this WebHook whenever the user completes a sentence during the call, and POSTs the transcript of what was said since the last invocation.
The term "user" designates the contact on the phone line.

PBX Speech will invoke an HTTP POST request to the WebHook URL "Assistant" : "/SpeechAssistant" received in the initial call to /SpeechStart:

http://fr01a.buzzee.tel/JSON/SpeechAssistant?XMLC_UserID=100001&WMSG_ID=2002002

{
  "model": "gpt-oss-120b",
  "messages": [
    { "role": "system", "content": "You are a reservation agent." },
    { "role": "assistant", "content": "Welcome to Cafe Paname. How can I help?" },
    { "role": "user", "content": "I would like a table for 2 this evening" }
  ]
}

The body of the request shows the latest sentence of the user: "I would like a table for 2 this evening"
Note: the PBX Speech may decide to skip the previous messages of the multi-turn conversation and send only the last user message, since the whole session is also maintained on the Delos server.
The 2 parameters of the WebHook URL "Assistant" can be considered opaque, since they are part of the WebHook:

Params Description
XMLC_UserID The ID of the user. It is used to identify the Scope (the database).
e.g. 100001
WMSG_ID The ID of the session of the assistant on the Delos side, obtained in the response to /SpeechStart.
e.g. WMSG_ID=2002002
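
The request body can be built from the conversation held by the PBX, as in the following sketch. The helper name build_assistant_body and the last_only flag are hypothetical; whether "model" must still be sent when only the last user message is posted is an assumption.

```python
def build_assistant_body(conversation: dict, user_sentence: str, last_only: bool = False) -> dict:
    """Return the JSON body to POST to the "Assistant" WebHook."""
    user_message = {"role": "user", "content": user_sentence}
    if last_only:
        # The session is also kept on the Delos side, so history may be skipped.
        messages = [user_message]
    else:
        # Copy the history so the caller's conversation is not mutated.
        messages = list(conversation.get("messages", [])) + [user_message]
    return {"model": conversation.get("model"), "messages": messages}
```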

The Delos instance responds with the deflated multi-turn conversation, augmented with the message to be synthesized as speech as the last message with "role": "assistant":

{
  "Message": "At what name would you like the reservation?",
  "Language": "fr",
  "Voice": "Joy",
  "Body": {
    "model": "gpt-oss-120b",
    "messages": [
      { "role": "system", "content": "You are a reservation agent." },
      { "role": "assistant", "content": "Welcome to Cafe Paname. How can I help?" },
      { "role": "user", "content": "I would like a table for 2 this evening" },
      { "role": "assistant", "content": "At what name would you like the reservation?" }
    ]
  }
}

The response to be synthesized as speech is the last message with "role": "assistant":
"At what name would you like the reservation?"

Fields Description
Message The message that the PBX Speech should synthesize as voice.
Example: "Message": "At what name would you like the reservation?"
Note: This message is also duplicated as the last entry with "role": "assistant" in the multi-turn conversation.
Example: "messages": [ { "role": "assistant", "content": "At what name would you like the reservation?" } ]
Language Optional. Locale code for the language of the message to be synthesized as speech and played to the user.
fr = FRANCAIS, en = ENGLISH, es = ESPANOL, de = DEUTSCH, it = ITALIANO
If this parameter is not provided, the PBX Speech will synthesize using its default language.
Voice Optional. The voice to be used to synthesize the message.
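
Because "Language" and "Voice" are optional, the PBX side needs defaults when preparing the synthesis. The sketch below shows one way to resolve them; the function name synthesis_request, the output keys and the default language "fr" are assumptions, since the actual PBX defaults are not stated in this document.

```python
from typing import Optional


def synthesis_request(
    response: dict,
    default_language: str = "fr",  # assumed PBX default language
    default_voice: Optional[str] = None,  # None means "let the engine choose"
) -> dict:
    """Resolve the text, language and voice to hand to the speech synthesizer."""
    return {
        "text": response["Message"],
        "language": response.get("Language") or default_language,
        "voice": response.get("Voice") or default_voice,
    }
```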

The following table summarizes the different error codes:

Errors Description
ERR_USER_NOT_FOUND Cannot find user associated with the CalledID phone number.
ERR_WMSG_NOT_FOUND Invalid session number for this call.
Cannot locate session of the assistant for this CallID.
ERR_BLANK_ASSISTANT Assistant not defined for this session.
The configuration is probably not set up properly.
PBX Speech should play a message to inform the user.

The following table summarizes the different error codes:

Errors Description
ERR_USER_NOT_FOUND XMLC_UserID does not exist at this server
ERR_CALLID_NOT_FOUND Cannot locate session of the assistant for this CallID.
Probably /Phone_NewAssistant has not been called.

/SpeechHangup (PBX Speech -> Delos)

When the caller hangs up or the PBX terminates the call, the PBX Speech server invokes the Delos instance with an HTTP GET on the WebHook URL "Hangup" : "/SpeechHangup"

http://fr01a.buzzee.tel/JSON/SpeechHangup?XMLC_UserID=100001&WMSG_ID=2002002

The 2 parameters of the WebHook URL "Hangup" can be considered opaque, since they are part of the WebHook:

Params Description
XMLC_UserID The ID of the user. It is used to identify the Scope (the database).
e.g. 100001
WMSG_ID The ID of the session of the assistant on the Delos side obtained in response to /SpeechStart

The Delos instance acknowledges the termination of the call with Status=OK.
The Delos instance then terminates this session of the assistant.

{ "Status": "OK" } 

The following table summarizes the different error codes:

Errors Description
ERR_USER_NOT_FOUND XMLC_UserID does not exist at this server
ERR_WMSG_NOT_FOUND Invalid session number for this call.
Cannot locate session of the assistant for this CallID.
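
On the Delos side, the hangup handling and its two error codes could look like the sketch below. The in-memory SESSIONS store, the SpeechError exception and the function name speech_hangup are hypothetical; how an error code is actually returned over HTTP is not specified in this document, so the sketch only raises it.

```python
class SpeechError(Exception):
    """Carries one of the error codes listed in this document."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


# Hypothetical in-memory store: XMLC_UserID -> set of active WMSG_ID values.
SESSIONS = {"100001": {"2002002"}}


def speech_hangup(user_id: str, wmsg_id: str) -> dict:
    """Terminate the assistant session and acknowledge with Status=OK."""
    if user_id not in SESSIONS:
        raise SpeechError("ERR_USER_NOT_FOUND")
    if wmsg_id not in SESSIONS[user_id]:
        raise SpeechError("ERR_WMSG_NOT_FOUND")
    SESSIONS[user_id].remove(wmsg_id)
    return {"Status": "OK"}
```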

/SpeechMakeCall (Delos -> PBX Speech)

Delos may initiate a call to start a new speech assistant session.

Delos will invoke an HTTP GET request to the PBX Speech server:

http://frspeech.buzzee.tel/SpeechMakeCall?Source=33612345678&Line=33912345678&Destination=33609876543

Fields Description
Source This is the phone number of the user,
formatted as 33612345678
Line Optional. The line to be used to initiate the call; the caller should still appear to be Source.
Destination This is the phone number of the contact to call,
formatted as 33609876543
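
The request URL, with its optional Line parameter, can be assembled as in this sketch. The helper name make_call_url is hypothetical; the host is the PBX Speech server used in this document.

```python
from typing import Optional
from urllib.parse import urlencode

PBX_BASE = "http://frspeech.buzzee.tel"  # PBX Speech server named in this document


def make_call_url(source: str, destination: str, line: Optional[str] = None) -> str:
    """Build the /SpeechMakeCall URL, adding Line only when it is provided."""
    params = {"Source": source}
    if line is not None:
        params["Line"] = line
    params["Destination"] = destination
    return f"{PBX_BASE}/SpeechMakeCall?{urlencode(params)}"
```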

The PBX server will respond with Status=Originating:

{
  "Status": "Originating"
} 

The following table summarizes the different error codes:

Errors Description
ERR_INVALID_SOURCE Invalid phone number for the parameter Source
ERR_INVALID_DESTINATION Invalid phone number for the parameter Destination

Assistant

The workflow for Assistant describes the major calls between the PhoneApp and the Delos instance.

1. PhoneApp queries /Phone_ObtainAssistant on the Delos instance to get the phone number to which call forwarding should be set up.
2. Delos relays this query to the Regional server fr1.buzzee.tel via /ObtainLineSpeech, and obtains in response the Line to use for the redirection.
3. PhoneApp dials the redirection of all calls to the line of the assistant.
4..8 These steps are described in the section PBX Speech.
9. During the multi-turn conversation, the Delos instance may require a validation from the user; it then triggers a notification to wake up PhoneApp.
10..11 PhoneApp queries the state of the assistant; the user selects a custom answer and invokes /Phone_Stream with the text to be streamed in the audio channel of the ongoing call maintained by the assistant.
12. When the call is hung up, the PBX Speech invokes /Phone_Hangup on the Delos instance to terminate the session of the assistant.
13. The Delos instance may initiate a new speech session by invoking /MakeCallSpeech; the PBX Speech initiates the call and starts the workflow over from step #3 /Phone_NewAssistant.

/Phone_ObtainAssistant (PhoneApp -> Delos)

PhoneApp will GET /Phone_ObtainAssistant on the Delos server of the user to obtain the assistant phone number to which call forwarding should be set up:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_ObtainAssistant?Phone=33612345678

Fields Description
Phone Optional. The line to be redirected to the assistant.
It is not necessarily the user's phone number; it may be a secondary line.
If the parameter is not provided, the primary phone number of the user is assumed to be the one configured for call forwarding.
This parameter may be used by the PBX server to pool the redirections of different users.

Delos instance will forward this request to the Regional server fr1.buzzee.tel, and will respond with a JSON document:

{ "Line" : "33912345678" }

The major fields are:

Fields Description
Line The line to which call forwarding is set up.

The following table summarizes the different error codes:

Errors Description
ERR_BLANK_PHONE The phone number cannot be located from the Phone number supplied.

/ObtainLineSpeech (Delos -> fr1.buzzee.tel)

Delos forwards the /Phone_ObtainAssistant request received from the PhoneApp to the Regional server by invoking /ObtainLineSpeech:

https://fr1.buzzee.tel/XML/ObtainLineSpeech?Phone=33612345678

Fields Description
Phone The phone line of the user to be redirected to the PBX Speech service.
The Regional connect server may use this information to select a Line from a pool, to avoid all users being redirected to a single phone number managed by the PBX Speech.
It is especially useful when the user does not want to redirect their phone line and lets the PBX Speech operate the assistant on a dedicated line.

Regional connect server fr1.buzzee.tel will respond with a JSON document:

{ "Line" : "33912345678" }

The major fields are:

Fields Description
Line The line to which call forwarding is set up.
In this document, it is considered to be 33912345678

The following table summarizes the different error codes:

Errors Description
ERR_POOL_EXCEEDED The pool of available LineID has been exhausted. No more LineID is available at this time.
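
The pool selection and the ERR_POOL_EXCEEDED case could be sketched as follows on the Regional server. The class names LinePool and PoolExceeded are hypothetical, and returning the same Line when the same Phone asks again is an assumption; the actual allocation policy is not described in this document.

```python
class PoolExceeded(Exception):
    """Raised when no LineID is available any more."""

    code = "ERR_POOL_EXCEEDED"


class LinePool:
    def __init__(self, lines):
        self._free = list(lines)  # LineIDs not yet assigned
        self._assigned = {}  # Phone -> LineID

    def obtain(self, phone: str) -> dict:
        """Return the JSON document {"Line": ...} for this Phone."""
        if phone in self._assigned:
            # Assumed policy: a Phone keeps the Line it was given earlier.
            return {"Line": self._assigned[phone]}
        if not self._free:
            raise PoolExceeded(PoolExceeded.code)
        line = self._free.pop(0)
        self._assigned[phone] = line
        return {"Line": line}
```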