Record (R2T)

Disclaimer

This document is a preliminary version and a work in progress.
Details discussed in this document are subject to future modifications.

Workflow

The RecordToText (R2T) workflow has three major steps: audio recording, upload, and transcription.
The R2T application starts the audio recording with a contact, uploads the recording in the background as multiple parts (one every minute or so), and finally retrieves the transcription to display it to the user.
The workflow is described below:

1. The R2T application queries /Phone_GetCPSN on the Delos instance to look up the contact details based on the phone number of the contact.
2. If the contact is unknown or its details need to be updated, R2T invokes /Phone_BatchCPSN to update the contact details on the Delos instance. The contact details provide valuable hints to the server in charge of the transcription.
3. R2T uploads the audio recording at once, or only its first part, by invoking /Phone_InsertRecord on the Delos instance.
4. R2T uploads the subsequent parts of the audio recording every minute or so using /Phone_ConcatRecord, and sets the parameter EOF=1 for the last part.
Delos transcribes every part in the background and concatenates the text while waiting for the final part to be uploaded.
5. Optional. R2T may invoke /Phone_Transcript to initiate the transcription of the conversation, even though Delos now manages the transcription in the background for every part of the audio recording being uploaded.
6. The Delos server invokes /sendNotification on the PhoneNotifications server to signal that the requested transcript is ready for this WMSG[WMSG_ID].
7. The PhoneNotifications server relays the notification through FCM (Firebase Cloud Messaging) to the R2T mobile app.
8. The R2T mobile app intercepts the notification and requests /Phone_FormWMSG on the Delos server to obtain the summary and transcript of the conversation.

HTTP Authorization and Protocol

Authorization between the RecordToText (R2T) phone application and the user's Delos instance requires XMLC_Credential, a token value encoding XMLC_UserID + XMLC_Session.
In this specification, all HTTP requests sent from the R2T phone application to the Delos instance are formatted as:
https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_ListWMSG
where 1xxx1abcdef is the XMLC_Credential encoding XMLC_UserID + XMLC_Session.
This credential is obtained after the initial registration.
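The URL layout above can be sketched as a small helper. This is only an illustration: the function name delos_url and its arguments are hypothetical, and the sketch assumes that action parameters are passed as an ordinary query string, as in the examples of this document.

```python
from urllib.parse import quote, urlencode


def delos_url(host, credential, action, params=None):
    """Build https://<host>/JSON/<XMLC_Credential>/<action>[?query].

    Hypothetical helper: the layout is taken from the example URL in this
    section; nothing here is part of the Delos API itself.
    """
    url = f"https://{host}/JSON/{quote(credential, safe='')}/{action}"
    if params:
        # urlencode percent-encodes values such as dates containing spaces.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(delos_url("fr01a.buzzee.tel", "1xxx1abcdef", "Phone_ListWMSG"))
    print(delos_url("fr01a.buzzee.tel", "1xxx1abcdef", "Phone_GetCPSN",
                    {"Phone": "33609876543"}))
```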

Audio File Format

The SpeechToText server (S2T) requires an uncompressed WAV file.
Before uploading, the R2T application converts any MP4 or MP3 file to WAV format on the client device.
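Before uploading, a client could check that a file really is uncompressed PCM WAV. The sketch below uses Python's standard wave module, which only reads PCM data and rejects other encodings. The helper names are hypothetical, and the sketch does not perform the MP3/MP4 conversion itself, which needs an external codec.

```python
import io
import wave


def make_pcm_wav(seconds=1, rate=8000):
    """Create a silent mono 16-bit PCM WAV in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


def is_pcm_wav(data):
    """Return True if data parses as a non-empty, uncompressed PCM WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as src:
            return src.getcomptype() == "NONE" and src.getnframes() > 0
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated, or a compressed WAV variant.
        return False


if __name__ == "__main__":
    print(is_pcm_wav(make_pcm_wav()))
    print(is_pcm_wav(b"ID3\x04\x00\x00\x00\x00\x00\x00 not a wav file"))
```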

/Phone_GetCPSN (R2T -> Delos)

The R2T application SHOULD look up the contact details on the Delos instance to check whether its information is up to date.
This data facilitates the transcription of the recorded conversation.
R2T sends an HTTP GET request to the Delos instance:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_GetCPSN?Phone=33609876543

Fields	Description
Phone	The phone number used to look up the CPSN_ID and its information. Use either the parameter Phone or CPSN_ID.

Delos instance will respond with a JSON document:

{ "CPSN" : 
  { "CPSN_ID" : "1000202", "CPSN_VERSION" : "1000042",
    "CPSN_FIRST_NAME" : "Dave", "CPSN_LAST_NAME" : "THEUGLY",
    "CCOMS" : { "CCOM" :
          [ { "CCOM_ID" : "1001", "CCOMKND_ID" : "-3515", "CCOM_VALUE" : "33609876543" },
            { "CCOM_ID" : "1002", "CCOMKND_ID" : "-3501", "CCOM_VALUE" : "dave@acme.com" },  ….
          ] }
  },
  "CLOC" : { "CLOC_ID" : "10000201", "CLOC_VERSION" : "1000041", "CLOC_NAME" : "ACME" }
}

The major fields are:

Fields	Description
CPSN_LAST_NAME	Last name of the contact
CPSN_FIRST_NAME	First name of the contact
CLOC_NAME	Name of the organization
CLOC_CITY	City of the organization/contact

The following table summarizes the different error codes:

Errors	Description
ERR_BLANK_CPSN_ID	The Contact cannot be located from the Phone number supplied.
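As an illustration of how a client might read this response, the sketch below extracts the major fields from a document shaped like the sample above. The function name contact_summary and the output keys are hypothetical; the sketch does not interpret CCOMKND_ID values, whose meaning is not defined in this section.

```python
import json

# Sample shaped like the /Phone_GetCPSN response shown above.
SAMPLE = json.loads("""
{ "CPSN" :
  { "CPSN_ID" : "1000202", "CPSN_VERSION" : "1000042",
    "CPSN_FIRST_NAME" : "Dave", "CPSN_LAST_NAME" : "THEUGLY",
    "CCOMS" : { "CCOM" :
          [ { "CCOM_ID" : "1001", "CCOMKND_ID" : "-3515", "CCOM_VALUE" : "33609876543" },
            { "CCOM_ID" : "1002", "CCOMKND_ID" : "-3501", "CCOM_VALUE" : "dave@acme.com" }
          ] }
  },
  "CLOC" : { "CLOC_ID" : "10000201", "CLOC_VERSION" : "1000041", "CLOC_NAME" : "ACME" }
}
""")


def contact_summary(doc):
    """Collect the major contact fields, tolerating missing sections."""
    cpsn = doc.get("CPSN", {})
    cloc = doc.get("CLOC", {})
    ccoms = cpsn.get("CCOMS", {}).get("CCOM", [])
    first = cpsn.get("CPSN_FIRST_NAME", "")
    last = cpsn.get("CPSN_LAST_NAME", "")
    return {
        "id": cpsn.get("CPSN_ID"),
        "version": cpsn.get("CPSN_VERSION"),
        "name": f"{first} {last}".strip(),
        "organization": cloc.get("CLOC_NAME"),
        "city": cloc.get("CLOC_CITY"),      # absent in the sample above
        "values": [c.get("CCOM_VALUE") for c in ccoms],
    }


if __name__ == "__main__":
    print(contact_summary(SAMPLE))
```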

/Phone_BatchCPSN (R2T -> Delos)

If the PhoneApp detects that the contact information differs between its local version and the Delos database,
it will POST a request to the Delos instance with a JSON document as the body:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_BatchCPSN

{ "CPSN" : 
  { "CPSN_ID" : "1000202", "CPSN_VERSION" : "1000042",
    "CPSN_FIRST_NAME" : "Dave", "CPSN_LAST_NAME" : "THEUGLY",
    "CCOMS" : { "CCOM" :
          [ { "CCOM_ID" : "1001", "CCOMKND_ID" : "-3515", "CCOM_VALUE" : "33609876543" },
            { "CCOM_ID" : "1002", "CCOMKND_ID" : "-3501", "CCOM_VALUE" : "dave@acme.com" },  ….
          ] }
  },
  "CLOC" : { "CLOC_ID" : "10000201", "CLOC_VERSION" : "1000041", "CLOC_NAME" : "ACME" }
}

The major fields are:

Fields	Description
CPSN_ID	The contact ID. This is the value WMSGPSN_PSN from the WMSG.
CPSN_LAST_NAME	Last name of the contact
CPSN_FIRST_NAME	First name of the contact
CLOC_ID	ID of the organization
CLOC_NAME	Name of the organization
CLOC_CITY	City of the organization/contact

The following table summarizes the different error codes:

Errors	Description
ERR_BLANK_CPSN_ID	The parameter CPSN_ID is missing

/Phone_InsertRecord (R2T -> Delos)

To upload the recorded conversation as a WAV file, the HTTP request uses POST with multipart/form-data.

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_InsertRecord?
CPSN_ID=1001002
WMSG_DATE=2023-12-26 12:34:56
WMSG_DURATION=375
File=Conversation.wav (or Audio1.wav)

Fields	Description
CPSN_ID	Contact ID retrieved from the response of one of the previous request Phone_GetCPSN or Phone_BatchCPSN
WMSG_DATE	Date and time of the recording, for example 2023-12-26 12:34:56, or formatted according to the regional settings of the user (en: m/d/yyyy, fr: dd/mm/yyyy). If blank, the current date and time of the server is used.
WMSG_DURATION	Duration of the recording in seconds. Example: 375 means 6'15".
WMSG_INFO	Description of the recording.
EOF	New as of 2025-06-26. If EOF is not transmitted, the file is considered to be the complete audio recording. With EOF=1, the uploaded file is also considered to be a complete audio recording. EOF=0 explicitly means that the uploaded file is only the first part (the first minute or so), and that subsequent parts will be concatenated by invoking /Phone_ConcatRecord with the WMSG_ID returned by /Phone_InsertRecord.
File	The filename should include a valid extension: .wav. The file format should be uncompressed PCM WAV with a proper header. As of 2025-06-26, if the file is a partial audio recording, the filename should be formatted as Audio1.wav. Subsequent parts (Audio2.wav, Audio3.wav, ...) are uploaded using /Phone_ConcatRecord.

The Delos instance will respond with the WMSG_ID of the message to which the new file has been attached:

{
  "WMSG_ID": "1001002"
}
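For illustration, a multipart/form-data body for this upload can be assembled with the standard library alone. This is a sketch under assumptions: the document lists the parameters after the URL but does not state whether they travel in the query string or as form fields, so sending them as form fields is a guess, as are the helper name multipart_body and the audio/wav content type.

```python
import uuid


def multipart_body(fields, file_field, filename, content):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    Hypothetical helper, not part of the Delos API.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        (f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"').encode(),
        b"Content-Type: audio/wav",   # assumption: media type for WAV
        b"",
        content,
        f"--{boundary}--".encode(),   # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = multipart_body(
        {"CPSN_ID": "1001002", "WMSG_DATE": "2023-12-26 12:34:56",
         "WMSG_DURATION": 375, "EOF": 0},
        "File", "Audio1.wav", b"RIFF...demo bytes...")
    print(content_type)
    print(len(body), "bytes")
```

The returned body and Content-Type value can then be handed to any HTTP client, for example urllib.request.Request with method="POST".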

/Phone_ConcatRecord (R2T -> Delos)

New as of 2025-06-26: while the audio recording is in progress, this action is used to upload a partial WAV file every minute or so.
The HTTP request will use POST multipart/form-data.

http://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_ConcatRecord?
WMSG_ID=1001002
File=Audio2.wav

Fields	Description
WMSG_ID	The value returned by the action /Phone_InsertRecord
EOF	Optional for the parts from Audio2.wav up to the part before the last one. EOF=1 is mandatory for the last part of the audio recording.
File	The file format should be uncompressed PCM WAV with a proper header. The filename should include the valid extension ".wav" and the number of the part: Audio2.wav, Audio3.wav, and so on, given that the first part Audio1.wav was uploaded by the initial action /Phone_InsertRecord.

Delos instance will respond with Status=OK

{
  "Status": "OK"
}

Delos will asynchronously queue a request to transcribe this WAV file. When the transcription is returned, Delos concatenates it with the previous ones for this UniqueID, until Phone_InsertWMSG is invoked at the end of the call.
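The part naming and EOF rules of /Phone_InsertRecord and /Phone_ConcatRecord can be summarized in a small decision function. This is a sketch, assuming the client knows when it is uploading the last part; the name upload_step is hypothetical, and EOF is simply omitted for intermediate parts since it is optional there.

```python
def upload_step(part_number, is_last):
    """Return (action, filename, extra_params) for one part of a recording.

    part_number starts at 1. /Phone_ConcatRecord additionally needs the
    WMSG_ID returned by /Phone_InsertRecord, which the caller must add.
    """
    if part_number < 1:
        raise ValueError("part_number starts at 1")
    if part_number == 1:
        if is_last:
            # Whole recording in one file: EOF=1 (or EOF omitted).
            return "Phone_InsertRecord", "Conversation.wav", {"EOF": 1}
        # First part of a longer recording: EOF=0 announces more parts.
        return "Phone_InsertRecord", "Audio1.wav", {"EOF": 0}
    params = {"EOF": 1} if is_last else {}   # EOF=1 is mandatory on the last part
    return "Phone_ConcatRecord", f"Audio{part_number}.wav", params


if __name__ == "__main__":
    for number, last in [(1, False), (2, False), (3, True)]:
        print(upload_step(number, last))
```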

/Phone_Transcript (R2T -> Delos)

R2T may trigger the transcription of the conversation by sending an HTTP GET request:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_Transcript?WMSG_ID=1001002

Fields	Description
WMSG_ID	Message ID of the phone call.
CPSN_TRANSCRIPT	Optional. 0=Never. 1=Always. 2=Prompt. CPSN_TRANSCRIPT=1 is a hint for the Delos instance to store this setting for the contact, so that future uploaded conversations are transcribed without prompting the user.

Delos instance will respond with the Status Queued, meaning the request has been forwarded to S2T server:

{ "Status" : "Queued" }

Note: PhoneApp MAY invoke Phone_UpdateCPSN_TRANSCRIPT to store, for a contact (CPSN_ID), the behavior to apply to new phone call recordings. 0=Never. 1=Always. 2=Prompt.
https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_UpdateCPSN_TRANSCRIPT?CPSN_ID=1000202&CPSN_TRANSCRIPT=0
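On the client side, the three CPSN_TRANSCRIPT values could be modeled as an enumeration. The sketch below is illustrative only: the class name TranscriptMode and the helper transcript_query are hypothetical, and only the numeric values come from this specification.

```python
from enum import IntEnum
from urllib.parse import urlencode


class TranscriptMode(IntEnum):
    """Values of CPSN_TRANSCRIPT as listed in this section."""
    NEVER = 0
    ALWAYS = 1
    PROMPT = 2


def transcript_query(wmsg_id, mode=None):
    """Build the query string for /Phone_Transcript.

    CPSN_TRANSCRIPT is optional, so it is only added when a mode is given.
    """
    params = {"WMSG_ID": wmsg_id}
    if mode is not None:
        params["CPSN_TRANSCRIPT"] = int(TranscriptMode(mode))
    return urlencode(params)


if __name__ == "__main__":
    print(transcript_query("1001002"))
    print(transcript_query("1001002", TranscriptMode.ALWAYS))
```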

Workflow to transcribe a wav file

The workflow to transcribe a single WAV file involves 3 requests:
1. The Delos instance invokes S2T /TranscribeFile to request the transcription of a WAV file by submitting a DownloadURL and a CallbackURL, along with a custom prompt, the language, and the model to use for the transcription.
2. When ready to process the transcription, the S2T server downloads the WAV file (GET) from the Delos instance using the DownloadURL parameter.
3. Once S2T has completed the transcription of the WAV file, it POSTs the transcription to the Delos instance at the CallbackURL received in the TranscribeFile request.

/TranscribeFile (Delos -> S2T)

The Delos instance uses the initial transcript to extract the main subjects of the discussion and to split both streams of the WAV file into tracks.
Delos sends an HTTP POST request to the S2T server to initiate the transcription of a custom file URL, with a custom prompt as initial context:

http://s2t.buzzee.tel/TranscribeFile?
XMLC_UserID=1000001
XMLC_Domain=fr01.buzzee.tel
XMLC_Host=fr01a.buzzee.tel
Language=en (new as of 2024-05-28)
Model=whisper-large
Prompt=This is the record of a contact for a job interview (new as of 2024-05-28)
Summary=1
DownloadURL=http://fr01a.buzzee.tel/JSON/DownloadFile?XMLC_UserID=100001&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90
CallbackURL=http://fr01a.buzzee.tel/JSON/PhoneUpdateTrack?XMLC_UserID=100001&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90

Fields	Description
XMLC_UserID	Can be used to trace request per user.
XMLC_Host	Can be used to trace request per server.
Language	Hint about the language probably used by the participants in the recorded file. Example: fr (en, es, ...). If the Language is blank or not registered, S2T uses its default value or simply leaves it blank.
Model	Possible values: whisper-large. If the Model is blank or not registered, S2T uses its default backend.
Prompt	Custom text containing user and contact names, possibly a summary of previous conversations. When TrackID is present, it may contain the initial transcript of the conversation, reduced to the duration of the track.
Summary	Optional parameter. If Summary=1, the response should contain the summary of the transcription.
DownloadURL	URL to download the WAV file (uncompressed PCM). Example: http://fr01a.buzzee.tel/JSON/DownloadFile?XMLC_UserID=100001&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90
CallbackURL	URL to which the transcription is POSTed once completed. Example: http://fr01a.buzzee.tel/JSON/PhoneUpdateWMSG?XMLC_UserID=100001&WMSG_ID=10010027 or http://fr01a.buzzee.tel/JSON/PhoneUpdateTrack?XMLC_UserID=100001&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90

S2T server will respond with Status=Queued :

{
  "Status": "Queued"
}
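Because DownloadURL and CallbackURL are themselves URLs containing ? and &, they have to be percent-encoded when passed as parameters, otherwise their inner parameters would be read as parameters of /TranscribeFile. The sketch below shows this with urllib.parse; the function name is hypothetical, and whether Delos sends these parameters in the query string or in a form body is not stated here, so the encoded string is produced without choosing either.

```python
from urllib.parse import parse_qs, urlencode


def transcribe_file_params(download_url, callback_url, **options):
    """Encode /TranscribeFile parameters; blank options are left out."""
    params = {"DownloadURL": download_url, "CallbackURL": callback_url}
    params.update({k: v for k, v in options.items() if v not in (None, "")})
    return urlencode(params)


DOWNLOAD = ("http://fr01a.buzzee.tel/JSON/DownloadFile?XMLC_UserID=100001"
            "&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90")
CALLBACK = ("http://fr01a.buzzee.tel/JSON/PhoneUpdateTrack?XMLC_UserID=100001"
            "&WMSG_ID=10010027&TrackID=1&Stream=in&Start=0&stop=90")

if __name__ == "__main__":
    encoded = transcribe_file_params(
        DOWNLOAD, CALLBACK, XMLC_UserID="1000001", Language="en",
        Model="whisper-large", Summary=1)
    print(encoded)
    # Decoding restores the nested URLs unchanged.
    print(parse_qs(encoded)["DownloadURL"][0] == DOWNLOAD)
```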

/Phone_DownloadFile (S2T -> Delos)

The SpeechToText server invokes the DownloadURL provided as a parameter in the request to TranscribeFile.
S2T downloads the file by sending an HTTP GET request to the Delos instance:

http://fr01a.buzzee.tel/JSON/Phone_DownloadFile?XMLC_UserID=1000001&WMSG_ID=1001001&Stream=in&TrackID=1&Start=0&Stop=30

All the parameters of DownloadURL are provided when invoking /TranscribeFile.

Fields	Description
XMLC_UserID	The UserID parameter provided with action /TranscribeFile
WMSG_ID	The Message ID for the conversation to transcribe.
Stream	Optional. This parameter may be blank, 'all', 'in' or 'out'.
TrackID	Optional. This parameter is used when splitting a larger conversation with multiple subjects into chunks (tracks). It needs to be transmitted back when S2T invokes Phone_UpdateTrack. Important: if this parameter is not present, S2T MUST invoke Phone_UpdateWMSG. This parameter is transmitted when invoking TranscribeFile without a WAV file being uploaded.
Start/Stop	Optional. These parameters are used to extract the portion of the WAV file between Start and Stop, expressed in seconds. Example: Start=90&Stop=180 extracts from 1'30" after the beginning up to 3'00". These parameters are transmitted when invoking TranscribeFile without a WAV file being uploaded.

The following table summarizes the different error codes:

Errors	Description
ERR_BLANK_WMSG_ID	The parameter WMSG_ID is missing
ERR_FILE_NOT_FOUND	There is no file attached to this message
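The Start/Stop extraction can be illustrated with Python's standard wave module. This is a sketch of the idea only, not the Delos implementation: the helper names are hypothetical, the Stream parameter is ignored, and out-of-range values are simply clamped to the length of the file.

```python
import io
import wave


def make_wav(seconds, rate=8000):
    """Create a silent mono 16-bit PCM WAV in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


def slice_wav(data, start, stop):
    """Return a new WAV holding the audio between start and stop seconds."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = min(max(0, int(start * rate)), src.getnframes())
        src.setpos(first)
        # readframes returns fewer frames if stop is past the end of file.
        frames = src.readframes(max(0, int(stop * rate) - first))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)        # frame count in header is fixed on close
        dst.writeframes(frames)
    return out.getvalue()


def duration_seconds(data):
    with wave.open(io.BytesIO(data), "rb") as src:
        return src.getnframes() / src.getframerate()


if __name__ == "__main__":
    track = slice_wav(make_wav(5), start=1, stop=3)
    print(duration_seconds(track))
```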

/Phone_UpdateWMSG (S2T -> Delos)

Once the transcription has been completed, the S2T server invokes the CallbackURL (provided as a parameter in the request to TranscribeFile) to update the transcript of the message (and optionally the summary, to be discussed).

The HTTP request will be POST:

http://fr01a.buzzee.tel/JSON/Phone_UpdateWMSG?XMLC_UserID=1000001&WMSG_ID=1001002

The content of the request will contain the transcript formatted as JSON

WMSG_SUMMARY={"summary": "John doe, and 33609876543 are discussing improving...", "title": "Performance optimization",
WMSG_TRANSCRIPT=JSON document containing the details of the sequences

Most of the parameters of the CallbackURL are provided when invoking /TranscribeFile.

Fields	Description
XMLC_UserID	The UserID
WMSG_ID	The MessageID parameter provided
Stream	Optional. This parameter may be blank, 'all', 'in' or 'out'.
TrackID	New parameter as of 2024-05-28. Optional. This parameter is used when splitting a larger conversation in chunks (tracks).
Start/Stop	Optional. These parameters are used to split the WAV file between Start and Stop expressed in seconds. Example: Start=90&Stop=180 means to extract from 1'30" from the beginning up to 3'00"
WMSG_SUMMARY	{"summary": "John doe, and 33609876543 are discussing improving...", "title": "Performance optimization", "usage": { "prompt_tokens": 8, "completion_tokens": 16, "total_tokens": 24 }, "importants": [ {"type": "RDV", "date": "18 to 24 march", "description": "Travel to NY"} ] }
WMSG_TRANSCRIPT	JSON transcript of the conversation formatted as follows: [ { "user": "1000002", "start":1600, "stop":1900, "message":"Hello Dave" }, { "user": "1000202", "start": 2720, "stop":3440, "message":"Hi Bob" }, ... ]

Delos instance will respond with the fields WMSG_ID:

{
  "WMSG_ID": "1001002"
}
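A possible encoding of the callback body is sketched below. It assumes that WMSG_SUMMARY and WMSG_TRANSCRIPT are sent as form fields (application/x-www-form-urlencoded) whose values are JSON strings; this section shows them as NAME=value pairs but does not state the content type, so this is a guess. The function names are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def callback_form(summary, transcript):
    """Serialize summary and transcript as JSON inside form fields."""
    return urlencode({
        "WMSG_SUMMARY": json.dumps(summary),
        "WMSG_TRANSCRIPT": json.dumps(transcript),
    })


def parse_callback_form(body):
    """Inverse of callback_form, as a receiver might implement it."""
    fields = parse_qs(body)
    return (json.loads(fields["WMSG_SUMMARY"][0]),
            json.loads(fields["WMSG_TRANSCRIPT"][0]))


SUMMARY = {"summary": "John doe, and 33609876543 are discussing improving...",
           "title": "Performance optimization"}
TRANSCRIPT = [
    {"user": "1000002", "start": 1600, "stop": 1900, "message": "Hello Dave"},
    {"user": "1000202", "start": 2720, "stop": 3440, "message": "Hi Bob"},
]

if __name__ == "__main__":
    body = callback_form(SUMMARY, TRANSCRIPT)
    print(body[:60] + "...")
    print(parse_callback_form(body) == (SUMMARY, TRANSCRIPT))
```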

/Phone_UpdateTrack (S2T -> Delos)

Once the transcription has been completed, the S2T server invokes the CallbackURL of the Delos instance.
This CallbackURL was provided in the request to /TranscribeFile.
The HTTP request will be POST:

http://fr01a.buzzee.tel/JSON/Phone_UpdateTrack?XMLC_UserID=1000001&WMSG_ID=1001002&TrackID=1&Stream=in&Start=0&Stop=90

The content of the request will contain the transcript formatted as JSON

WMSG_SUMMARY={"summary": "John doe, and 33609876543 are discussing improving...", "title": "Performance optimization",
WMSG_TRANSCRIPT=JSON document containing the details of the sequences
TrackID=1

Most of the parameters of the CallbackURL are provided when invoking /TranscribeFile.

Fields	Description
XMLC_UserID	The XMLC_UserID parameter provided in /TranscribeFile
WMSG_ID	The WMSG_ID parameter provided in /TranscribeFile
TrackID	The TrackID parameter provided in /TranscribeFile
Stream	The Stream parameter provided in /TranscribeFile. The value may be blank, 'all', 'in' or 'out'.
Start/Stop	Optional. These parameters are used to split the WAV file between Start and Stop expressed in seconds. Example: Start=90&Stop=180 means to extract from 1'30" from the beginning up to 3'00"
WMSG_SUMMARY	{"summary": "John doe, and 33609876543 are discussing improving...", "title": "Performance optimization", "usage": { "prompt_tokens": 8, "completion_tokens": 16, "total_tokens": 24 }, "importants": [ {"type": "RDV", "date": "18 to 24 march", "description": "Travel to NY"} ] }
WMSG_TRANSCRIPT	Raw transcript of the conversation formatted as follows: { [ { "user":"1000002","start":1000,"stop":1200,"message":"Hello Bob. How are you?"}, { "user":"1000202","start":2000,"stop":2200,"message":"Great, and you?"}, ... ] }

Delos instance will respond with a Status OK

{
  "Status": "OK"
}
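The rule stated for TrackID (present: Phone_UpdateTrack; absent: Phone_UpdateWMSG) can be expressed as a small check on the URL parameters. This sketch is illustrative: the function name is hypothetical, a blank TrackID is treated as absent, and in practice S2T simply invokes the CallbackURL it received.

```python
from urllib.parse import parse_qs, urlsplit


def callback_action(url):
    """Pick the callback action from the parameters of a DownloadURL."""
    query = parse_qs(urlsplit(url).query)   # blank values are dropped
    return "Phone_UpdateTrack" if query.get("TrackID") else "Phone_UpdateWMSG"


if __name__ == "__main__":
    print(callback_action(
        "http://fr01a.buzzee.tel/JSON/Phone_DownloadFile?XMLC_UserID=1000001"
        "&WMSG_ID=1001001&Stream=in&TrackID=1&Start=0&Stop=30"))
    print(callback_action(
        "http://fr01a.buzzee.tel/JSON/Phone_DownloadFile?XMLC_UserID=1000001"
        "&WMSG_ID=1001001"))
```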

/sendNotification [type=transcript_ready] (Delos -> PhoneNotifications)

Once the transcription has been completed by the S2T server, the Delos instance sends an HTTP GET request to the PhoneNotifications server with the following parameters:

https://phoneappnotifications.buzzee.tel/sendNotification?
XMLC_UserID=1000001&
XMLC_Credential=1xx1abcdef&
Host=fr01a.buzzee.tel&
Domain=fr01.buzzee.tel&
type=transcript_ready&
Application=Record&
Token=abcdef0123456789&
WMSG_ID=1001002

Fields	Description
XMLC_UserID	Delos internal user ID
XMLC_Credential	Credential of the user. The PhoneNotifications server can use it to invoke the Delos instance in order to invalidate the device registration token of the user: http://fr01a.buzzee.tel/JSON/1xx1abcdef/Phone_Register?Application=Record&Token=...
Host	Host of the Delos instance where the user of the message is located. This is the host to use in the URL to invoke the Delos instance.
Domain	Domain of the user. XMLC_Domain + XMLC_UserID is used to look up the connected PhoneApp at the PhoneNotifications server.
type	transcript_ready. So that the PhoneNotifications server exposes a single endpoint URL, the action /sendNotification requires this additional parameter: type=transcript_ready
Application	The target application for this notification. "Record" in the case of the mobile application "R2T"
Token	This is the device registration token which has been stored using /Phone_Register?Application=Record&Token=abcdef0123456789
WMSG_ID	Message ID of the phone call. This ID will be used to request Phone_FormWMSG

/Phone_FormWMSG (PhoneApp -> Delos)

At this stage, PhoneApp may retrieve the summary and the transcript by sending an HTTP GET request:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_FormWMSG?WMSG_ID=1001002

Fields	Description
WMSG_ID	Message ID of the phone call.

Delos instance will respond with a JSON document corresponding to the message:

{ "WMSG" : {
    "WMSG_ID": "1001002",
    "WMSG_DATE" : "26/12/2023 09:00:00",   
    "WMSG_DURATION" : "00:10:00",
    "WMSG_INOUT" : "1",
    "WMSG_MEDIA" : "-3515",
    "WMSG_INFO" : "Once upon a time…",
    "WMSG_SUMMARY" : "Once upon a time…",
    "WMSG_TRANSCRIPT" : { "Sequence" :
      [ { "ID" : "00:00:00.980", "Time" : "09:00:00", "From" : "1000202", "Message" : "Hello Bob" },
        { "ID" : "00:00:02.140", "Time" : "09:00:02", "From" : "1000003", "Message" : "Hi Dave" },  ….
      ] }
} }

On real Delos servers, the JSON document contains many more fields; those shown here are the minimum useful in the context of PhoneApp.

Fields	Description
WMSG_INFO	Summary of the conversation. WMSG_INFO can be edited with custom notes. At this stage WMSG_INFO and WMSG_SUMMARY have the same value.
WMSG_SUMMARY	Summary of the conversation
WMSG_TRANSCRIPT	Dataset containing the multiple "Sequence" of the conversation
Sequence	One line per person speaking
ID	Elapsed time (seconds and milliseconds) since the beginning of the call. This value can be used to identify a "Sequence".
Time	Time of the sequence calculated from WMSG_DATE + elapsed time. This value is rounded to the second.
From	The CPSN_ID of the person speaking in this sequence. It can be either the User or the Contact.
Message	The text of the transcript for this sequence

To obtain the details of the contact, the R2T mobile app may request /Phone_GetCPSN?CPSN_ID=1000202
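As an illustration, the sketch below reads a /Phone_FormWMSG response shaped like the sample above: it converts a Sequence ID to elapsed milliseconds and renders one display line per sequence. The helper names are hypothetical, and the sketch assumes that "Sequence" is always a list.

```python
SAMPLE = {"WMSG": {
    "WMSG_ID": "1001002",
    "WMSG_TRANSCRIPT": {"Sequence": [
        {"ID": "00:00:00.980", "Time": "09:00:00", "From": "1000202",
         "Message": "Hello Bob"},
        {"ID": "00:00:02.140", "Time": "09:00:02", "From": "1000003",
         "Message": "Hi Dave"},
    ]},
}}


def elapsed_ms(sequence_id):
    """Convert an ID such as "00:00:02.140" to milliseconds (2140)."""
    clock, _, fraction = sequence_id.partition(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    millis = int(fraction.ljust(3, "0")[:3]) if fraction else 0
    return (hours * 3600 + minutes * 60 + seconds) * 1000 + millis


def transcript_lines(doc):
    """Return 'Time From: Message' lines ordered by elapsed time."""
    sequences = doc["WMSG"]["WMSG_TRANSCRIPT"]["Sequence"]
    ordered = sorted(sequences, key=lambda seq: elapsed_ms(seq["ID"]))
    return [f'{seq["Time"]} {seq["From"]}: {seq["Message"]}'
            for seq in ordered]


if __name__ == "__main__":
    for line in transcript_lines(SAMPLE):
        print(line)
```

The From value is a CPSN_ID, so a client would typically replace it with a display name obtained through /Phone_GetCPSN.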

Workflow when opening R2T

When the R2T mobile app starts, it retrieves the list of messages (of type recorded conversation) by sending a Phone_ListWMSG request to Delos, and may request details for each conversation by invoking /Phone_FormWMSG.

/Phone_ListWMSG (R2T -> Delos)

R2T may retrieve the list of recent conversations by sending an HTTP GET request:

https://fr01a.buzzee.tel/JSON/1xxx1abcdef/Phone_ListWMSG?WMSGKND_ID=-3528

The Delos instance responds with a JSON document corresponding to the list of messages:

{ "WMSGS" : { "WMSG" : [ 
   { "WMSG_ID": "1001002", "WMSG_DATE": "12/12/2023 09:20:00", "WMSG_INOUT" : "2", 
     "WMSG_INFO" : "Custom summary for confcall #2",
     "WMSG_SUMMARY" : "Once upon a time...",
     "WMSGUSR_PSN" : "1000002", "WMSGUSR_FIRST_NAME" : "Bob", "WMSGUSR_LAST_NAME" : "THEGREAT",
     "WMSGPSN_PSN" : "1000202", "WMSGPSN_FIRST_NAME" : "Dave", "WMSGPSN_LAST_NAME" : "THEUGLY" },
   { "WMSG_ID": "1001004", "WMSG_DATE": "12/12/2023 09:00:00", "WMSG_INOUT" : "1",
     "WMSG_INFO" : "Summary ConfCall #1",
     "WMSG_SUMMARY" : "Once upon a time...",
     "WMSGUSR_PSN" : "1000002", "WMSGUSR_FIRST_NAME" : "Bob", "WMSGUSR_LAST_NAME" : "THEGREAT", 
     "WMSGPSN_PSN" : "1000202", "WMSGPSN_FIRST_NAME" : "Dave", "WMSGPSN_LAST_NAME" : "THEUGLY" },
   ...
   ] } 
}

On real Delos servers, the JSON document contains many more fields; those shown here are the minimum PhoneApp needs to display the recent calls.
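To display the recent calls, a client might sort this list by date and build one label per conversation. The sketch below is illustrative: the helper names are hypothetical, and it assumes WMSG_DATE uses the dd/mm/yyyy hh:mm:ss layout of the samples, whereas the real format may follow the regional settings of the user.

```python
from datetime import datetime

SAMPLE = {"WMSGS": {"WMSG": [
    {"WMSG_ID": "1001004", "WMSG_DATE": "12/12/2023 09:00:00",
     "WMSG_INOUT": "1", "WMSG_INFO": "Summary ConfCall #1",
     "WMSGPSN_PSN": "1000202", "WMSGPSN_FIRST_NAME": "Dave",
     "WMSGPSN_LAST_NAME": "THEUGLY"},
    {"WMSG_ID": "1001002", "WMSG_DATE": "12/12/2023 09:20:00",
     "WMSG_INOUT": "2", "WMSG_INFO": "Custom summary for confcall #2",
     "WMSGPSN_PSN": "1000202", "WMSGPSN_FIRST_NAME": "Dave",
     "WMSGPSN_LAST_NAME": "THEUGLY"},
]}}


def recent_first(doc, date_format="%d/%m/%Y %H:%M:%S"):
    """Return the messages sorted from most recent to oldest."""
    messages = doc.get("WMSGS", {}).get("WMSG", [])
    return sorted(
        messages,
        key=lambda msg: datetime.strptime(msg["WMSG_DATE"], date_format),
        reverse=True)


def call_label(msg):
    """One display line: date, contact name, and the editable summary."""
    name = f'{msg["WMSGPSN_FIRST_NAME"]} {msg["WMSGPSN_LAST_NAME"]}'
    return f'{msg["WMSG_DATE"]} | {name} | {msg["WMSG_INFO"]}'


if __name__ == "__main__":
    for message in recent_first(SAMPLE):
        print(call_label(message))
```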