Implementing speech recognition.
detect_speech <mod_name> <gram_name> <gram_path> [<addr>]
detect_speech grammar <gram_name> [<path>]
detect_speech grammaron <gram_name>
detect_speech grammaroff <gram_name>
detect_speech grammarsalloff
detect_speech nogrammar <gram_name>
detect_speech param <name> <value>
detect_speech pause
detect_speech resume
detect_speech start_input_timers
detect_speech stop
-- Not recognized 1
local sounds_dir = "/usr/local/freeswitch/sounds/ivr3/";
local ivr_dir = "/usr/local/freeswitch/scripts/ivr3/";
-- The helpers used below (onInput, dump, tableHasKey) are not defined in this
-- script, so they are expected to come from detect.lua.
dofile("/usr/local/freeswitch/scripts/detect.lua");
-- Maps a recognized phrase (whitespace removed) to the script that handles it.
trans = {
    ["NOT_RECOG_1"] = ivr_dir .. "divr_notrecog_1.lua",
    ["NOT_RECOG_2"] = ivr_dir .. "divr_notrecog_2.lua",
    -- no
    ["нет"] = ivr_dir .. "divr_no_definitely_1.lua",
    -- not recognized
    ["нетда"] = ivr_dir .. "divr_notrecog_1.lua",
    -- yes
    ["да"] = ivr_dir .. "divr_yes_info.lua",
}
local message1 = sounds_dir .. "notrecog_1.wav";
-- Create an empty table to store the results.
results = {};
session:setInputCallback("onInput");
session:sleep(200);
-- Play the greeting message.
session:streamFile(message1);
-- Start recognition and specify the grammar.
session:execute("detect_speech", "pocketsphinx yesno yesno");
while (session:ready() == true) do
    session:sleep(3000);
    session:sleep(3000);
    if (results.text ~= nil) then
        session:execute("detect_speech", "stop");
        freeswitch.consoleLog("info", dump(results));
        -- results.text, results.score
        -- A missing or non-numeric score is treated as 0 (not recognized).
        local score = tonumber(results.score) or 0;
        local ftext = results.text:gsub("%s+", "");
        results = {};
        if (score > 0 and tableHasKey(trans, ftext) ~= false) then
            session:execute("lua", trans[ftext]);
        else
            session:execute("lua", trans["NOT_RECOG_1"]);
        end
    else
        -- Nothing was recognized during the wait: resume and report it.
        results = {};
        session:execute("detect_speech", "resume");
        session:execute("lua", trans["NOT_RECOG_1"]);
    end
end
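The script above loads detect.lua but the file itself is not shown. The following is only a sketch of what such a file might contain, under the assumption that the recognizer delivers its result as the XML body shown in the detected-speech event on this page. The function parseResult, the use of the confidence attribute as the score, and the implementations of dump and tableHasKey are all hypothetical; only the names onInput, dump and tableHasKey come from the script.

```lua
-- Hypothetical contents of detect.lua; the real file is not shown on this page.

-- Global table filled by onInput and read by the main script.
results = {}

-- Return true if tbl contains key, false otherwise.
function tableHasKey(tbl, key)
    return tbl[key] ~= nil
end

-- Render a flat table as sorted "key=value" pairs for logging.
function dump(tbl)
    local parts = {}
    for k, v in pairs(tbl) do
        parts[#parts + 1] = tostring(k) .. "=" .. tostring(v)
    end
    table.sort(parts)
    return "{" .. table.concat(parts, ", ") .. "}\n"
end

-- Extract the recognized text and its confidence from the result XML.
-- Assumes the <interpretation confidence="..."><input>...</input> layout.
function parseResult(xml)
    local parsed = {}
    parsed.score = xml:match('<interpretation[^>]-confidence="(%d+)"')
    parsed.text = xml:match("<input[^>]*>(.-)</input>")
    return parsed
end

-- Input callback registered with session:setInputCallback("onInput").
function onInput(s, input_type, obj)
    if input_type ~= "event" then
        return
    end
    if obj:getHeader("Speech-Type") == "detected-speech" then
        local body = obj:getBody()
        if body then
            results = parseResult(body)
        end
        -- Returning "break" follows the FreeSWITCH Lua examples, where it
        -- interrupts whatever the session is currently doing.
        return "break"
    end
end
```

String patterns are used here instead of an XML library to keep the sketch dependency-free; a real grammar with nested results would probably need a proper parser.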
Starting recognition and assigning the grammar in a single event:
SendMsg e2d1c628-f32c-4497-b813-7474ce406317
call-command: execute
execute-app-name: detect_speech
execute-app-arg: pocketsphinx yesno yesno
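Since every command on this page has the same three-header shape, the message text can be generated rather than typed. The sketch below only builds the string; build_sendmsg is a hypothetical helper, and actually sending the text over an event socket connection is left out. It writes the command name in lower case (sendmsg), on the assumption that the server does not care about the case.

```lua
-- Build the text of an event socket "sendmsg" command that executes a
-- dialplan application on the channel identified by uuid.
-- build_sendmsg is a hypothetical helper, not part of FreeSWITCH.
function build_sendmsg(uuid, app, arg)
    local lines = {
        "sendmsg " .. uuid,
        "call-command: execute",
        "execute-app-name: " .. app,
    }
    -- The argument header is optional.
    if arg ~= nil and arg ~= "" then
        lines[#lines + 1] = "execute-app-arg: " .. arg
    end
    -- A blank line terminates the command.
    return table.concat(lines, "\n") .. "\n\n"
end

io.write(build_sendmsg("e2d1c628-f32c-4497-b813-7474ce406317",
                       "detect_speech", "pocketsphinx yesno yesno"))
```

The same helper covers the resume and stop messages used later on this page by passing "resume" or "stop" as the argument.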
You should see DETECTED_SPEECH events with «Speech-Type: begin-speaking» when the recognizer notices the start of speech. For example (using «plain» events):
Content-Length: 1605
Content-Type: text/event-plain
Event-Name: DETECTED_SPEECH
Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
FreeSWITCH-Hostname: vm1
FreeSWITCH-IPv4: 192.168.1.241
FreeSWITCH-IPv6: %3A%3A1
Event-Date-Local: 2010-03-09%2010%3A39%3A48
Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A48%20GMT
Event-Date-Timestamp: 1268149188380725
Event-Calling-File: switch_ivr_async.c
Event-Calling-Function: speech_thread
Event-Calling-Line-Number: 2430
Speech-Type: begin-speaking
Channel-State: CS_EXECUTE
Channel-State-Number: 4
Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
Call-Direction: outbound
Presence-Call-Direction: outbound
Channel-Presence-ID: 1000%40192.168.1.241
Answer-State: answered
Channel-Read-Codec-Name: PCMU
Channel-Read-Codec-Rate: 8000
Channel-Write-Codec-Name: PCMU
Channel-Write-Codec-Rate: 8000
Caller-Username: 1001
Caller-Dialplan: inline
Caller-Caller-ID-Name: Extension%201001
Caller-Caller-ID-Number: 1001
Caller-Network-Addr: 192.168.1.104
Caller-ANI: 1001
Caller-Destination-Number: 1000
Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
Caller-Source: mod_sofia
Caller-Context: default
Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
Caller-Profile-Index: 2
Caller-Profile-Created-Time: 1268149185069331
Caller-Channel-Created-Time: 1268149168974894
Caller-Channel-Answered-Time: 1268149169744923
Caller-Channel-Progress-Time: 1268149169164940
Caller-Channel-Progress-Media-Time: 0
Caller-Channel-Hangup-Time: 0
Caller-Channel-Transfer-Time: 0
Caller-Screen-Bit: true
Caller-Privacy-Hide-Name: false
Caller-Privacy-Hide-Number: false
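Header values in «plain» events are percent-encoded, as the %3A and %40 sequences above show, so a client has to decode them before use. A minimal sketch of reading such a header block follows; url_decode and parse_headers are hypothetical helper names, and the sample lines are copied from the event above.

```lua
-- Decode the %XX escapes used in "plain" event header values.
function url_decode(value)
    return (value:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

-- Parse "Name: value" lines into a table of decoded values.
function parse_headers(text)
    local headers = {}
    for line in text:gmatch("[^\r\n]+") do
        local name, value = line:match("^([^:]+):%s*(.*)$")
        if name then
            headers[name] = url_decode(value)
        end
    end
    return headers
end

local sample = table.concat({
    "Event-Name: DETECTED_SPEECH",
    "Speech-Type: begin-speaking",
    "Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104",
    "Event-Date-Local: 2010-03-09%2010%3A39%3A48",
}, "\n")

local headers = parse_headers(sample)
print(headers["Speech-Type"])      --> begin-speaking
print(headers["Channel-Name"])     --> sofia/internal/sip:1000@192.168.1.104
print(headers["Event-Date-Local"]) --> 2010-03-09 10:39:48
```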
If recognition is successful, you should also see a DETECTED_SPEECH event with «Speech-Type: detected-speech» and some XML describing what was detected. For example:
Content-Length: 1791
Content-Type: text/event-plain
Event-Name: DETECTED_SPEECH
Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
FreeSWITCH-Hostname: vm1
FreeSWITCH-IPv4: 192.168.1.241
FreeSWITCH-IPv6: %3A%3A1
Event-Date-Local: 2010-03-09%2010%3A39%3A49
Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A49%20GMT
Event-Date-Timestamp: 1268149189731224
Event-Calling-File: switch_ivr_async.c
Event-Calling-Function: speech_thread
Event-Calling-Line-Number: 2430
Speech-Type: detected-speech
Channel-State: CS_EXECUTE
Channel-State-Number: 4
Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
Call-Direction: outbound
Presence-Call-Direction: outbound
Channel-Presence-ID: 1000%40192.168.1.241
Answer-State: answered
Channel-Read-Codec-Name: PCMU
Channel-Read-Codec-Rate: 8000
Channel-Write-Codec-Name: PCMU
Channel-Write-Codec-Rate: 8000
Caller-Username: 1001
Caller-Dialplan: inline
Caller-Caller-ID-Name: Extension%201001
Caller-Caller-ID-Number: 1001
Caller-Network-Addr: 192.168.1.104
Caller-ANI: 1001
Caller-Destination-Number: 1000
Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
Caller-Source: mod_sofia
Caller-Context: default
Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
Caller-Profile-Index: 2
Caller-Profile-Created-Time: 1268149185069331
Caller-Channel-Created-Time: 1268149168974894
Caller-Channel-Answered-Time: 1268149169744923
Caller-Channel-Progress-Time: 1268149169164940
Caller-Channel-Progress-Media-Time: 0
Caller-Channel-Hangup-Time: 0
Caller-Channel-Transfer-Time: 0
Caller-Screen-Bit: true
Caller-Privacy-Hide-Name: false
Caller-Privacy-Hide-Number: false
Content-Length: 165
<?xml version="1.0"?>
<result grammar="holdr">
<interpretation grammar="yesno" confidence="98">
<input mode="speech">YES</input>
</interpretation>
</result>
Note: the XML body at the end, which carries the result, has a Content-Length of 165. Those 165 bytes are included in the overall count of 1791 given at the beginning.
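The nesting of the two lengths can be illustrated with a small sketch. It assumes the payload passed in is everything covered by the outer Content-Length, that is, the event headers, a blank line, and the optional body; split_event is a hypothetical helper name.

```lua
-- Split a "plain" event payload into its header text and optional body.
-- The payload is assumed to be the bytes counted by the outer
-- Content-Length: event headers, a blank line, then the body (if any).
function split_event(payload)
    -- The event headers end at the first blank line.
    local sep_start, sep_end = payload:find("\r?\n\r?\n")
    if not sep_start then
        return payload, nil
    end
    local header_text = payload:sub(1, sep_start - 1)
    -- The Content-Length among the event headers gives the body size.
    local length = tonumber(header_text:match("Content%-Length:%s*(%d+)"))
    if not length then
        return header_text, nil
    end
    return header_text, payload:sub(sep_end + 1, sep_end + length)
end

local xml = '<input mode="speech">YES</input>'
local payload = "Event-Name: DETECTED_SPEECH\n"
    .. "Speech-Type: detected-speech\n"
    .. "Content-Length: " .. #xml .. "\n\n" .. xml

local header_text, body = split_event(payload)
print(body == xml)  --> true
-- The outer count covers headers, separator and body together.
print(#payload == #header_text + 2 + #body)  --> true
```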
It is common to play prompts while detecting speech. Making a change like this to the media will pause the recognizer. For example, if you start to play a file:
SendMsg ad375c14-ba41-46c8-b800-4aa2ef295bba
call-command: execute
execute-app-name: playback
execute-app-arg: say-yes-or-no.wav
you should immediately resume the recognizer:
SendMsg e2d1c628-f32c-4497-b813-7474ce406317
call-command: execute
execute-app-name: detect_speech
execute-app-arg: resume
Recognition will happen while the file is playing. You will need to have divert_events on to receive the ASR events while the file is being played.
Each start of the recognizer detects only one phrase so if you want a somewhat continuous recognition, you will need to resume the recognizer after each successful recognition as well.
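The resume-after-each-phrase rule can be sketched as a small event handler. The function handle_speech_event and its on_phrase callback are hypothetical; session stands for any object with an execute method, which in practice would be the FreeSWITCH session.

```lua
-- Handle one DETECTED_SPEECH event for continuous recognition.
-- session:   object with an execute(app, arg) method
-- on_phrase: hypothetical callback receiving the XML body of each phrase
-- Returns true if the recognizer was resumed, false otherwise.
function handle_speech_event(session, speech_type, body, on_phrase)
    if speech_type ~= "detected-speech" then
        return false
    end
    if body then
        on_phrase(body)
    end
    -- Each start detects only one phrase, so re-arm the recognizer.
    session:execute("detect_speech", "resume")
    return true
end
```

Passing the session in as a parameter, rather than relying on the global, keeps the handler easy to exercise outside FreeSWITCH with a stand-in object.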
When you are done, you'll want to stop the recognizer to save precious CPU cycles:
SendMsg e2d1c628-f32c-4497-b813-7474ce406317
call-command: execute
execute-app-name: detect_speech
execute-app-arg: stop