AI Assist

Disclaimer

This document is a preliminary version and a work in progress.
Details discussed in this document are subject to future modifications.

Playground

Developers may use the following URL to help implement the specifications described in this document.

https://fr01a.buzzee.tel/Delos/Phone_FormTestAssist?Assistant=EditSummary&WMSG_ID=1001001

Speech assistant to customize instructions

The speech assistant in the PhoneApp submits the user's speech as text to the Delos instance. The response contains the text message to be synthesized as speech and played to the user.

The workflow for speech sockets is summarized in the following major steps:

1. PhoneApp starts a new session for the speech assistant by invoking /Phone_AssistStart, which returns the greeting message to be synthesized as speech and played to the user.
Recording then starts.
2. When SpeechRecognition detects that a sentence is complete (or detects a silence), PhoneApp submits the user message to /Phone_Assist, which assembles the conversation, performs the inference and returns the generated message.
PhoneApp synthesizes the message as speech and plays it to the user. During SpeechSynthesis, recording may be suspended to avoid recording the audio being played.
3. To terminate the assistant session, PhoneApp invokes /Phone_AssistTerminate.
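
The three steps above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `run_session`, `call`, `utterances` and `speak` are hypothetical names, the HTTP transport and the speech APIs are injected as plain callables, and the early exit on Hangup follows the Hangup field described in the /Phone_Assist section.

```python
from typing import Callable, Iterable


def run_session(
    call: Callable[[str, dict], dict],   # hypothetical transport: (action, params) -> decoded JSON
    assistant: str,
    utterances: Iterable[str],           # sentences produced by speech recognition
    speak: Callable[[str], None],        # hypothetical speech-synthesis hook
) -> None:
    # Step 1: start the session and play the greeting.
    start = call("Phone_AssistStart", {"Assistant": assistant})
    session_id = start["SessionID"]
    speak(start["Message"])

    # Step 2: submit each completed sentence and play the generated reply.
    for sentence in utterances:
        reply = call("Phone_Assist", {"SessionID": session_id, "Message": sentence})
        speak(reply["Message"])
        if str(reply.get("Hangup", "")) == "1":
            return  # the server ended the session; no terminate call is needed

    # Step 3: terminate the session explicitly.
    call("Phone_AssistTerminate", {"SessionID": session_id})


# Demonstration with a fake transport standing in for the Delos instance.
calls = []
spoken = []


def fake_call(action: str, params: dict) -> dict:
    calls.append(action)
    if action == "Phone_AssistStart":
        return {"SessionID": "123456", "Message": "Tell me what you are looking to do."}
    if action == "Phone_Assist":
        return {"Message": "Alright."}
    return {"Status": "OK"}


run_session(
    fake_call,
    "PhoneCallAssist",
    ["I would like to edit the summary of this phone call"],
    spoken.append,
)
print(calls)
```

Injecting the transport keeps the sketch independent of any particular HTTP client and makes the sequence of calls easy to check without a server.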

HTTP Authorization and Protocol

Authorization between PhoneApp and the user's Delos instance requires XMLC_Credential, a token value encoding XMLC_UserID + XMLC_Session.
All HTTP requests from the PhoneApp to the Delos instance are therefore written in this specification as:
https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_AssistStart
where 1xxx1abcdef is the XMLC_Credential encoding XMLC_UserID + XMLC_Session.
This credential is obtained after initial registration.
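
As a minimal sketch of the URL layout described above, the helper below assembles a request URL from a credential and an action name. `build_action_url` and `BASE_URL` are hypothetical names; the host is copied from the examples in this document and may differ per deployment.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Host taken from the examples in this document; an assumption for any real deployment.
BASE_URL = "https://fr01a.buzzee.tel"


def build_action_url(credential: str, action: str, params: Optional[dict] = None) -> str:
    """Return BASE_URL/JSON/<credential>/<action>, plus an optional query string."""
    url = f"{BASE_URL}/JSON/{quote(credential, safe='')}/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


print(build_action_url("1xxx1abcdef", "Phone_AssistStart"))
```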

/Phone_AssistStart (PhoneApp -> Delos)

PhoneApp starts a new session for the speech assistant by invoking /Phone_AssistStart:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_AssistStart?Assistant=PhoneCallAssist&WMSG_ID=1001002

Params     Description
Assistant  The name of the intake assistant that drives the conversation to edit, modify, collect or customize information.
           Example: PhoneCallAssist in the context of the phone call form.
           Example: ContactAssist in the context of the contact overview form.
WMSG_ID    Optional. Used to update the proper field of this message.
           This parameter may be used with the assistant PhoneCallAssist.
CPSN_ID    Optional. Used to customize the overview of a specific contact.
           This parameter may be used with the assistant ContactAssist.
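
The parameters above could be assembled into a query string as follows. This is a sketch, not the PhoneApp implementation: `start_query` is a hypothetical helper, and the client-side check simply mirrors the server's ERR_BLANK_ASSISTANT error so that an empty Assistant is caught before the request is sent.

```python
from typing import Optional
from urllib.parse import urlencode


def start_query(
    assistant: str,
    wmsg_id: Optional[int] = None,
    cpsn_id: Optional[int] = None,
) -> str:
    """Build the query string for /Phone_AssistStart, omitting unset optional parameters."""
    if not assistant:
        # The server would answer ERR_BLANK_ASSISTANT; fail early on the client instead.
        raise ValueError("Assistant is mandatory")
    params = {"Assistant": assistant}
    if wmsg_id is not None:
        params["WMSG_ID"] = wmsg_id
    if cpsn_id is not None:
        params["CPSN_ID"] = cpsn_id
    return urlencode(params)


print(start_query("PhoneCallAssist", wmsg_id=1001002))
```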

The Delos server returns a SessionID for this new conversation with the assistant:

{ "SessionID" : "123456", "Language": "en", 
  "Message": "You are assisted by AI. Tell me what you are looking to do.", "Voice": "Joy" }

The major fields are:

Fields      Description
SessionID   This value should be retained by the client and passed through on subsequent calls to /Phone_Assist.
Language    Locale code for the language of the message to be played.
            fr = FRANCAIS, en = ENGLISH, es = ESPANOL, de = DEUTSCH, it = ITALIANO
SpeechLang  The parameter value required by the SpeechRecognition and SpeechSynthesis Web APIs.
            Example: fr-FR, en-US, es-ES, de-DE, it-IT
Message     The opening message (the greeting) that the client application needs to synthesize as voice and play.
Voice       Optional. The voice to be used for synthesis of the message.
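
A client might decode the start response along these lines. The sketch assumes the response body is the JSON object shown above; `AssistStart`, `parse_start` and `DEFAULT_SPEECH_LANG` are hypothetical names, and the Language to SpeechLang pairing is an assumption built by matching the two example lists in order. It is used only as a fallback when the response carries no SpeechLang.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed pairing of Language codes with the SpeechLang examples listed in this document.
DEFAULT_SPEECH_LANG = {"fr": "fr-FR", "en": "en-US", "es": "es-ES", "de": "de-DE", "it": "it-IT"}


@dataclass
class AssistStart:
    session_id: str
    language: str
    speech_lang: str
    message: str
    voice: Optional[str]


def parse_start(body: str) -> AssistStart:
    """Decode the /Phone_AssistStart response into a typed record."""
    data = json.loads(body)
    language = data["Language"]
    return AssistStart(
        session_id=data["SessionID"],
        language=language,
        # Prefer the server-provided value; fall back to the assumed mapping.
        speech_lang=data.get("SpeechLang") or DEFAULT_SPEECH_LANG.get(language, language),
        message=data["Message"],
        voice=data.get("Voice"),  # optional field
    )


sample = (
    '{ "SessionID" : "123456", "Language": "en", '
    '"Message": "You are assisted by AI. Tell me what you are looking to do.", "Voice": "Joy" }'
)
print(parse_start(sample))
```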

The following table summarizes the different error codes:

Errors               Description
ERR_BLANK_ASSISTANT  The mandatory parameter Assistant is missing.

/Phone_Assist (PhoneApp -> Delos)

Once the greeting has been played, the client app starts recording the user's instructions. When the client library detects the end of a sentence or a silence, the client app POSTs the transcription to /Phone_Assist and waits for the server to return the next turn of the conversation:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_Assist
SessionID=123456
Message=I would like to edit the summary of this phone call
or once the proper assistant takes over:
Message=Add the emoticon on the first line of the summary to display the importance or priority of this conversation

Params     Description
SessionID  The ID of the session returned by the initial action /Phone_AssistStart.
           Example: 123456
Message    The sentence transcribed by the client library.
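
The POST body for /Phone_Assist could be encoded as sketched below. The SessionID=... and Message=... lines in the example above suggest a form-encoded body, but the exact content type is an assumption here (application/x-www-form-urlencoded), and `assist_body` is a hypothetical helper.

```python
from urllib.parse import urlencode


def assist_body(session_id: str, message: str) -> bytes:
    """Encode SessionID and Message as a form body (assumed content type)."""
    return urlencode({"SessionID": session_id, "Message": message}).encode("utf-8")


print(assist_body("123456", "I would like to edit the summary of this phone call"))
```

Percent-encoding matters in practice because transcribed sentences routinely contain spaces, apostrophes and accented characters.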

The Delos instance responds with the message to be synthesized as speech and played to the user.

{
  "Language": "en",
  "Message": "I have added your instructions and regenerated the summary. Are you pleased with this new version? You can still add more instructions to refine the summary further.",
  "Voice": "Joy",
  "Assistant": "EditSummary",
  "RefreshFields": "WMSG_INFO",
  "Text": "Here is the new version of the summary",
  "HTML": "Here is the HTML version of the markdown version of the text"
}

 

Fields           Description
Language         Locale code for the language of the message to be played.
                 fr = FRANCAIS, en = ENGLISH, es = ESPANOL, de = DEUTSCH, it = ITALIANO
Message          The message inferred by the model.
                 The switching assistant asks for confirmation and, once the user has confirmed their intention, returns "Alright." and switches to the editing assistant.
                 The editing assistant usually returns a brief explanation of what has been applied to the Text (see below).
Voice            Optional. The voice to be used for synthesis of the message.
Assistant        Optional. NEW-AS-OF 2025-11-19 The assistant currently running the AI logic. Since the introduction of the switching assistant, the client application may use this information to determine where the user is in the AI Assist session.
RefreshFields    Optional. NEW-AS-OF 2025-11-19 The fields to which the Text applies.
                 Example: "RefreshFields": "WMSG_INFO"
Text             Optional. NEW-AS-OF 2025-10-25 The text as modified after applying the user's instructions.
                 Depending on the assistant and the original Text stored in the backend, it may be markdown.
HTML             Optional. NEW-AS-OF 2025-10-25 The HTML version of the markdown Text.
RefreshArtifact  Optional. A hint for the client app to refresh the content from which the session was started.
                 When the custom instructions provided by the user have been applied, the Delos instance has run functions and tools that altered the content of the view.
                 The client app may refresh the view by querying the server on the same URL from which it started this AI Assist session.
Hangup           Optional. Set when the model has ended the AI assistance because the user asked to close the session. The application can use this flag to close speech recognition after playing the Message.
                 When Hangup=1, there is no need to call Phone_AssistTerminate.
                 Example: Hangup=1
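
One way a client could turn these fields into an ordered list of actions is sketched below. `plan_actions` and the action labels are hypothetical; the sketch also assumes that Hangup may arrive either as the number 1 or the string "1", and that any non-empty RefreshArtifact value counts as a refresh hint, since this document does not fix either representation.

```python
import json
from typing import List, Tuple


def plan_actions(body: str) -> List[Tuple]:
    """Map a /Phone_Assist response to client actions, in the order they should run."""
    data = json.loads(body)
    actions: List[Tuple] = []

    # Always play the message first; Voice is optional.
    if data.get("Message"):
        actions.append(("speak", data["Message"], data.get("Voice")))

    # Text applies to the fields named in RefreshFields; HTML is its rendered form.
    if data.get("RefreshFields") and data.get("Text") is not None:
        actions.append(("update_fields", data["RefreshFields"], data["Text"], data.get("HTML")))

    # RefreshArtifact is only a hint to reload the view the session started from.
    if data.get("RefreshArtifact"):
        actions.append(("refresh_view",))

    # Hangup: stop recognition after the message; no Phone_AssistTerminate call needed.
    if str(data.get("Hangup", "")) == "1":
        actions.append(("stop_recognition",))

    return actions


reply = json.dumps({
    "Language": "en",
    "Message": "Alright.",
    "Voice": "Joy",
    "RefreshFields": "WMSG_INFO",
    "Text": "Here is the new version of the summary",
    "Hangup": 1,
})
for action in plan_actions(reply):
    print(action)
```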

The following table summarizes the different error codes:

Errors                   Description
ERR_SESSIONID_NOT_FOUND  Cannot locate the assistant session for this conversation.
                         /Phone_AssistStart has probably not been called.

/Phone_AssistTerminate (PhoneApp -> Delos)

The user may decide to terminate the session at any time. The client app informs the server of the SessionID to terminate.

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_AssistTerminate?SessionID=123456

Params     Description
SessionID  The ID of the session returned by the initial action /Phone_AssistStart.
           Example: 123456

The Delos instance acknowledges the termination of the session with Status=OK.
No further /Phone_Assist requests will be accepted by the server for this SessionID.

{ "Status": "OK" } 
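
A terminate request and its acknowledgement could be handled as in the sketch below. `terminate_url` and `is_terminated` are hypothetical helpers; the action name is taken from this section's heading and the host from the examples in this document.

```python
import json
from urllib.parse import quote, urlencode


def terminate_url(credential: str, session_id: str) -> str:
    """Build the /Phone_AssistTerminate URL for a given session."""
    query = urlencode({"SessionID": session_id})
    return f"https://fr01a.buzzee.tel/JSON/{quote(credential, safe='')}/Phone_AssistTerminate?{query}"


def is_terminated(body: str) -> bool:
    """Return True when the server acknowledged the termination with Status=OK."""
    return json.loads(body).get("Status") == "OK"


print(terminate_url("1xxx1abcdef", "123456"))
print(is_terminated('{ "Status": "OK" }'))
```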

The following table summarizes the different error codes:

Errors                   Description
ERR_SESSIONID_NOT_FOUND  Cannot locate the assistant session for this conversation.
                         /Phone_AssistStart has probably not been called.