Speech recognition in ASP.NET Core SpeechToText control

26 Mar 2025 | 15 minutes to read

Retrieving transcripts

You can use the transcript property to retrieve the text transcribed from speech. This property lets you display the transcribed text once the speech recognition process has started.

@using Syncfusion.EJ2.Inputs

@{
    string transcript = "Hi, hello! How are you?";
}

<div id='speechtotext-container'>
    <ejs-speechtotext id="speech-to-text" created="onCreated" transcript="@transcript" transcriptChanged="onTranscriptChanged"></ejs-speechtotext>
    <ejs-textarea id="output-textarea" rows="5" cols="50" created="onTextAreaCreated" resizeMode="None" placeholder="Transcribed text will be shown here..."></ejs-textarea>
</div>

<script>
    var textareaObj;
    var speechToTextObj;
    function onCreated() {
        speechToTextObj = ej.base.getComponent(document.getElementById("speech-to-text"), "speech-to-text");
    }
    function onTextAreaCreated() {
        textareaObj = ej.base.getComponent(document.getElementById("output-textarea"), "textarea");
        textareaObj.value = speechToTextObj.transcript;
    }
    function onTranscriptChanged() {        
        textareaObj.value = speechToTextObj.transcript;
    }
</script>

<style>
    #speechtotext-container {
        gap: 20px;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
</style>

public ActionResult Transcript()
{
    return View();
}


Setting language

You can use the lang property to specify the language for speech recognition. Setting this property ensures that the recognition engine interprets the spoken words according to the specified locale, such as en-US for American English or fr-FR for French.

@using Syncfusion.EJ2.Inputs

<div id='speechtotext-container'>
    <ejs-speechtotext id="speech-to-text" lang="fr-FR" transcriptChanged="onTranscriptChanged"></ejs-speechtotext>
    <ejs-textarea id="output-textarea" rows="5" cols="50" value="" resizeMode="None" placeholder="Transcribed text will be shown here..."></ejs-textarea>
</div>

<script>
    function onTranscriptChanged(args) {
        var textareaObj = ej.base.getComponent(document.getElementById("output-textarea"), "textarea");
        textareaObj.value = args.transcript;
    }
</script>

<style>
    #speechtotext-container {
        gap: 20px;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
</style>

public ActionResult Language()
{
    return View();
}
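
Locale values such as en-US and fr-FR are BCP 47 language tags, and a tag in the wrong shape (for example fr_FR) may not be interpreted as intended. The following is a minimal sketch, not part of the SpeechToText API, of canonicalizing a tag before assigning it to the lang property. The helper name normalizeLangTag and the en-US fallback are assumptions made for illustration; Intl.getCanonicalLocales is a standard JavaScript function.

```javascript
// Hypothetical helper (not part of the SpeechToText API): canonicalize a
// BCP 47 language tag, falling back to a default when the tag is malformed.
function normalizeLangTag(tag, fallback) {
    var defaultTag = fallback || "en-US"; // assumed default for this sketch
    try {
        // Intl.getCanonicalLocales throws a RangeError for malformed tags
        // and returns an empty array when the input is undefined.
        return Intl.getCanonicalLocales(tag)[0] || defaultTag;
    } catch (e) {
        return defaultTag;
    }
}
```

The result could then be assigned to the lang property of the control instance, for example with the user's preferred browser language (navigator.language) as input.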


Allowing interim results

You can use the allowInterimResults property to enable or disable interim results. When set to true, the recognized speech will be displayed in real time as words are spoken. When set to false, only final results will be displayed after recognition is complete. By default, the value is true.

@using Syncfusion.EJ2.Inputs

<div id='speechtotext-container'>
    <ejs-speechtotext id="speech-to-text" allowInterimResults=false transcriptChanged="onTranscriptChanged"></ejs-speechtotext>
    <ejs-textarea id="output-textarea" rows="5" cols="50" value="" resizeMode="None" placeholder="Transcript will be displayed here once speech recognition is complete."></ejs-textarea>
</div>

<script>
    function onTranscriptChanged(args) {
        var textareaObj = ej.base.getComponent(document.getElementById("output-textarea"), "textarea");
        textareaObj.value = args.transcript;
    }
</script>

<style>
    #speechtotext-container {
        gap: 20px;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
</style>

public ActionResult InterimResults()
{
    return View();
}
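
To illustrate the difference between interim and final results, the sketch below separates the two in a result list shaped like the one the browser's Web Speech API delivers, where each result carries an isFinal flag and a list of alternatives. This shows the underlying browser concept only; how the control processes results internally is not described here, and the helper name splitResults is hypothetical.

```javascript
// Hypothetical helper: split a SpeechRecognitionResultList-like array into
// final text (stable) and interim text (may still change as the user speaks).
function splitResults(results) {
    var finalText = "";
    var interimText = "";
    for (var i = 0; i < results.length; i++) {
        // Index 0 is the first alternative of each result.
        var text = results[i][0].transcript;
        if (results[i].isFinal) {
            finalText += text;
        } else {
            interimText += text;
        }
    }
    return { finalText: finalText, interimText: interimText };
}
```

With interim results disabled, a consumer would only ever see the equivalent of finalText.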


Managing listening state

You can use the listeningState property to manage the listening state of the control. The possible values are Inactive, Listening and Stopped. By default, the value is Inactive.

Inactive

The control is idle, with no active speech recognition.

Listening

The control is actively listening: it captures and transcribes speech, and the button shows a stop icon with a blinking animation.

Stopped

Speech recognition has ended, and no further speech is being processed.

The following sample demonstrates the usage of the listeningState property.

@using Syncfusion.EJ2.Inputs

<div id="container">
    <div id="status-box-container" class="status-box inactive">
        <span>Status: <strong id="status-text">Inactive</strong></span>
    </div>
    <ejs-speechtotext id="speech-to-text" listeningState="Inactive" onStart="function(args){updateListeningState(args.listeningState)}" onStop="function(args){updateListeningState(args.listeningState)}"></ejs-speechtotext>
    <div class="waveform-container">
        <div id="waveform-item" class="waveform" style="display: none;">
            <span></span><span></span><span></span><span></span><span></span>
        </div>
        <p id="instruction-text">Click the button to start listening.</p>
    </div>
</div>

<script>
    function updateListeningState(state) {
        document.getElementById("status-text").innerText = state;

        var statusBox = document.getElementById("status-box-container");
        var waveform = document.getElementById("waveform-item");
        var instructionText = document.getElementById("instruction-text");

        if (state === "Listening") {
            statusBox.className = "status-box listening";
            waveform.style.display = "flex";
            instructionText.innerText = "Listening... Speak now!";
        } else if (state === "Stopped") {
            statusBox.className = "status-box stopped";
            waveform.style.display = "none";
            instructionText.innerText = "Recognition Stopped.";
        } else {
            statusBox.className = "status-box inactive";
            waveform.style.display = "none";
            instructionText.innerText = "Click the button to start listening.";
        }
    }
</script>

<style>
    .waveform-container {
        margin-top: 20px;
        font-weight: bold;
    }

    .waveform {
        display: flex;
        justify-content: center;
        align-items: center;
        height: 40px;
        gap: 5px;
    }

    .waveform span {
        display: block;
        width: 6px;
        height: 20px;
        background: #28a745;
        animation: wave-animation 1.2s infinite ease-in-out;
    }

    .waveform span:nth-child(1) {
        animation-delay: 0s;
    }

    .waveform span:nth-child(2) {
        animation-delay: 0.2s;
    }

    .waveform span:nth-child(3) {
        animation-delay: 0.4s;
    }

    .waveform span:nth-child(4) {
        animation-delay: 0.6s;
    }

    .waveform span:nth-child(5) {
        animation-delay: 0.8s;
    }

    @@keyframes wave-animation {
        0%, 100% {
            height: 10px;
        }
        50% {
            height: 30px;
        }
    }

    #container {
        text-align: center;
        margin: 50px auto;
        max-width: 400px;
        padding: 20px;
        border-radius: 10px;
        box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
        background: #fff;
    }

    .status-box {
        padding: 10px;
        border-radius: 5px;
        margin-bottom: 40px;
        font-weight: bold;
    }

    .status-box.listening {
        background-color: #d1e7dd;
        color: #0f5132;
    }

    .status-box.stopped {
        background-color: #f8d7da;
        color: #842029;
    }

    .status-box.inactive {
        background-color: #e2e3e5;
        color: #6c757d;
    }

    .visual-indicator {
        margin-top: 20px;
    }
</style>

public ActionResult ListeningState()
{
    return View();
}


Show or hide tooltip

You can use the showTooltip property to show or hide the tooltip that appears when hovering over the SpeechToText button. By default, the value is true.

@using Syncfusion.EJ2.Inputs

<div id='speechtotext-container'>
    <ejs-speechtotext id="speech-to-text" showTooltip=false transcriptChanged="onTranscriptChanged"></ejs-speechtotext>
    <ejs-textarea id="output-textarea" rows="5" cols="50" value="" resizeMode="None" placeholder="Transcribed text will be shown here..."></ejs-textarea>
</div>

<script>
    function onTranscriptChanged(args) {
        var textareaObj = ej.base.getComponent(document.getElementById("output-textarea"), "textarea");
        textareaObj.value = args.transcript;
    }
</script>

<style>
    #speechtotext-container {
        gap: 20px;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
</style>

public ActionResult Tooltip()
{
    return View();
}


Setting disabled

You can use the disabled property to disable the SpeechToText control, which prevents user interaction when set to true. By default, the value is false.

@using Syncfusion.EJ2.Inputs

<div id='speechtotext-container'>
    <ejs-speechtotext id="speech-to-text" disabled=true transcriptChanged="onTranscriptChanged"></ejs-speechtotext>
    <ejs-textarea id="output-textarea" rows="5" cols="50" value="" resizeMode="None" placeholder="Transcribed text will be shown here..."></ejs-textarea>
</div>

<script>
    function onTranscriptChanged(args) {
        var textareaObj = ej.base.getComponent(document.getElementById("output-textarea"), "textarea");
        textareaObj.value = args.transcript;
    }
</script>

<style>
    #speechtotext-container {
        gap: 20px;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
</style>

public ActionResult Disabled()
{
    return View();
}


Setting HTML attributes

You can use the htmlAttributes property to assign custom attributes to the button element of the SpeechToText control.

Error handling

The SpeechToText control handles various errors that may occur during speech recognition. The following table lists the possible errors and their causes:

| Error | Cause |
|---|---|
| no-speech | The microphone did not detect any speech input. |
| aborted | The speech recognition process was intentionally terminated. |
| audio-capture | The system was unable to detect a microphone device. |
| not-allowed | Access to the microphone was denied by the user or browser settings. |
| service-not-allowed | The current context does not permit the use of the speech recognition service. |
| network | A network issue is preventing the speech recognition service from functioning. |
| unsupported-browser | The browser being used does not support the SpeechRecognition API. |
| default | An unidentified error occurred during the speech recognition process. |
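
The error codes listed above can be turned into user-facing messages with a simple lookup. The sketch below is one possible approach, not the control's built-in behavior; the names SPEECH_ERROR_CAUSES and describeSpeechError are hypothetical, and the messages are copied from the table.

```javascript
// Hypothetical lookup table built from the documented error codes.
var SPEECH_ERROR_CAUSES = {
    "no-speech": "The microphone did not detect any speech input.",
    "aborted": "The speech recognition process was intentionally terminated.",
    "audio-capture": "The system was unable to detect a microphone device.",
    "not-allowed": "Access to the microphone was denied by the user or browser settings.",
    "service-not-allowed": "The current context does not permit the use of the speech recognition service.",
    "network": "A network issue is preventing the speech recognition service from functioning.",
    "unsupported-browser": "The browser being used does not support the SpeechRecognition API.",
    "default": "An unidentified error occurred during the speech recognition process."
};

// Return the documented cause for an error code, or the default message
// when the code is not one of the listed values.
function describeSpeechError(code) {
    if (Object.prototype.hasOwnProperty.call(SPEECH_ERROR_CAUSES, code)) {
        return SPEECH_ERROR_CAUSES[code];
    }
    return SPEECH_ERROR_CAUSES["default"];
}
```

If the control exposes an error event whose arguments carry the error code (an assumption; check the API reference for the exact event and argument names), its handler could pass that code to describeSpeechError and show the returned message to the user.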

Browser support

The SpeechToText control relies on the Speech Recognition API for processing the speech input. Ensure that the browser supports this API before implementation.

| Browser | Supported versions |
|---|---|
| Chrome | 25+ |
| Edge | 79+ |
| Firefox | Not supported |
| Safari | 12+ |
| Opera | 30+ |
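
Browser support can also be checked at run time with feature detection instead of version checks. The following minimal sketch tests for the standard SpeechRecognition constructor and the prefixed webkitSpeechRecognition variant used by some browsers. The helper name isSpeechRecognitionSupported and its optional parameter are assumptions for illustration; the parameter exists only so that the check can be exercised outside a browser.

```javascript
// Hypothetical helper: report whether the Speech Recognition API is available.
// globalObj defaults to window in a browser; pass a stand-in object for tests.
function isSpeechRecognitionSupported(globalObj) {
    var target = globalObj || (typeof window !== "undefined" ? window : {});
    // Some browsers expose the API only under the webkit prefix.
    return "SpeechRecognition" in target || "webkitSpeechRecognition" in target;
}
```

A page could call this before rendering the control and show a fallback message, such as a plain text input, when it returns false.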