Speech Recognition Python – Converting Speech to Text

Have you ever wondered how modern devices, which are not living things, can listen to your voice and even respond? It may look like fantasy, but today’s technology does surprising things that were not possible in the past. So guys, welcome to my new tutorial, Speech Recognition Python. In this tutorial we will learn the concept of speech recognition and its implementation in Python. So let’s get started.

Technology is advancing rapidly and new capabilities keep emerging; speech recognition is one of them. It has evolved quickly over the past few years and is now one of the most popular features in computing. It has numerous applications: it can boost convenience, enhance security, and help law enforcement efforts, to name a few. Let’s start by understanding the concept of speech recognition, how it works, and its applications.

What is Speech Recognition?

  • Speech recognition is a process in which a computer or device records human speech and converts it into text.
  • It is also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech to Text (STT).
  • Linguistics, computer science, and electrical engineering are some of the fields associated with speech recognition.


Working Nature of Speech Recognition

Now let’s discuss how it actually works.

[Figure: working principle of speech recognition]

The figure above shows the working principle of speech recognition. Now let’s understand the concept behind it.

It is based on acoustic modeling and language modeling. So the question is: what are acoustic and language modeling?

  • Acoustic modeling represents the relationship between linguistic units of speech and audio signals.
  • Language modeling matches sounds with word sequences to help distinguish between words that sound similar.

Any speech recognition program is evaluated on two factors:

  • Accuracy (the error rate in converting spoken words to digital data).
  • Speed (extent to which the program can keep up with a human speaker).
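
The accuracy factor above is often reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The article does not name a specific metric, so treat this as one common choice. Below is a minimal sketch; the function name word_error_rate is hypothetical and the input is assumed to be plain whitespace-separated text.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; 0.0 means the recognized text matches the reference word for word.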

Applications

The most common applications of speech recognition are the following:

  • In-car systems
  • Health care – medical documentation and therapeutic use
  • Military – high-performance fighter aircraft, helicopters, training air traffic controllers
  • Telephony and other domains
  • Education and daily life
  • Assistance for people with disabilities

Speech Recognition Python

Have you ever wondered how to add speech recognition to your Python project? If so, then keep reading! It’s easier than you might think.

Implementing speech recognition in Python is simple. Here we will use two libraries: SpeechRecognition and PyAudio.

Creating a New Project

Create a new project and name it SpeechRecognitionExample (the name doesn’t matter; it can be anything). Then create a Python file inside the project. I assume you already know how to create a new project in Python.

Installing Libraries

We have to install two libraries to implement speech recognition.

Installing SpeechRecognition

  • Go to the terminal and type: pip install SpeechRecognition

SpeechRecognition is a library for performing speech recognition in Python. It supports several engines and APIs, online and offline, e.g. Google Cloud Speech API, Microsoft Bing Voice Recognition, IBM Speech to Text, etc.

Installing PyAudio

  • Go to the terminal and type: pip install pyaudio

PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple Mac OS X / macOS.
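
Before moving on, it can help to confirm that both libraries are importable. The sketch below is only a convenience check, not part of either library; the helper name missing_modules is hypothetical. Note that the import names (speech_recognition, pyaudio) differ from the names used with pip.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that Python cannot find for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# pip package name -> import name:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio
REQUIRED = ["speech_recognition", "pyaudio"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both libraries are installed.")
```

If pyaudio is reported as missing even after running pip, the pip output usually shows why; on Linux a common cause is that the PortAudio development headers (portaudio.h) are not installed.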

Performing Speech Recognition

Now let’s jump into the coding part.

This is the code for speech recognition in Python. As you can see, it is quite simple.
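
The listing itself is not present in this copy of the article. The following is a reconstruction based on the line-by-line explanation in the next section, not the author’s exact code. Two things are assumptions added here: the steps are wrapped in a function (the name listen_and_print is hypothetical), and the except clause names the library’s UnknownValueError and RequestError instead of catching everything.

```python
def listen_and_print():
    """Record one phrase from the microphone and print the recognized text."""
    # The package is installed as "SpeechRecognition" but imported
    # as "speech_recognition".
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:  # needs PyAudio and a working microphone
        print("Speak Anything :")
        audio = r.listen(source)  # blocks until a phrase has been captured
    try:
        # Sends the audio to Google's web speech API, so internet access is needed.
        text = r.recognize_google(audio)
        print("You said : {}".format(text))
        return text
    except (sr.UnknownValueError, sr.RequestError):
        # UnknownValueError: the speech could not be understood.
        # RequestError: the recognition service could not be reached.
        print("Sorry could not recognize your voice")
        return None
```

Calling listen_and_print() at the bottom of the file runs one listen-and-recognize cycle.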

Explanation of code

Now let’s go through the code line by line.

  • First of all, we import speech_recognition as sr.
  • Notice that the module is imported as speech_recognition, whereas we installed the package as SpeechRecognition. Pay attention to the spelling, because names are case sensitive.
  • We use the as notation because writing speech_recognition in full every time is inconvenient.
  • Next we initialize r = sr.Recognizer(), which acts as the recognizer for our voice.
  • with sr.Microphone() as source: sets our audio source to sr.Microphone. We can also convert audio files to text, but in this tutorial I am using microphone input.
  • Next we print a simple statement that prompts the user to speak.
  • Then we call r.listen(source), which listens to the source and stores the result in audio.
  • Sometimes the audio is not clear and may not be recognized correctly, so we put the recognition call inside a try and except block.
  • Inside the try block we write text = r.recognize_google(audio). There are other options such as recognize_bing(), recognize_google_cloud(), recognize_ibm(), etc., but here I am using recognize_google(), passing it our audio.
  • This converts our audio into text.
  • Now we just print it with print("You said : {}".format(text)), which prints whatever you said.
  • In the except block we write print("Sorry could not recognize your voice"), which informs you if your voice was not recorded clearly.
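
The explanation above mentions that audio files can be converted to text as well. A minimal sketch of that variant follows, using the library’s AudioFile source and Recognizer.record(), which reads the whole file. The function name transcribe_file and any file path you pass in are hypothetical; AudioFile expects WAV, AIFF, or FLAC input.

```python
def transcribe_file(path):
    """Return the text recognized in an audio file, or None on failure."""
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = r.record(source)  # read the entire file into memory
    try:
        return r.recognize_google(audio)  # needs internet access
    except (sr.UnknownValueError, sr.RequestError):
        # Speech was not understood, or the service could not be reached.
        return None
```

For example, print(transcribe_file("example.wav")) would print the recognized text for a file named example.wav in the current directory.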

Output

The output of the above code is shown below.

[Screenshot: output of the program]

So, it’s working fine. You must have enjoyed it, am I right?

If you are working on a desktop that does not have a mic, you can try an Android app such as Wo Mic from the Play Store to use your smartphone as a mic. And if you have a real mic or headphones with a mic, you can use them too.
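
When more than one input device is attached (for example a headset plus a phone acting as a mic), it can be useful to see which devices the library detects and pick one explicitly. A small sketch under that assumption: Microphone.list_microphone_names() is part of SpeechRecognition, while the helper name find_microphone_index and the keyword matching are hypothetical.

```python
def find_microphone_index(keyword):
    """Print detected microphones; return the index of the first name containing keyword."""
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    for index, name in enumerate(names):
        print("{}: {}".format(index, name))
    for index, name in enumerate(names):
        if keyword.lower() in name.lower():
            return index
    return None  # no device name matched the keyword
```

The returned index can be passed as sr.Microphone(device_index=index) in place of the default sr.Microphone().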

That completes the Speech Recognition Python tutorial. Friends, if you have any questions, leave a comment. If you found this tutorial helpful, please SHARE it with your friends. Thank you 🙂

25 thoughts on “Speech Recognition Python – Converting Speech to Text”

  1. Errors on
    pip install pyaudio

    [1]
    Easily install SpeechRecognition 3.8.1 with
    !pip install SpeechRecognition
    The leading ! is needed since I am within a cell in a Jupyter Notebook on Microsoft Azure (http://www.notebooks.azure.com)

    [2]
    Errors on
    !pip install pyaudio
    Looks like the gcc build failed since there is no portaudio.h
    Any hints about pyaudio?
    DETAILS:
    Collecting pyaudio
    Downloading https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
    Building wheels for collected packages: pyaudio
    Running setup.py bdist_wheel for pyaudio … error
    Complete output from command /home/nbuser/anaconda3_501/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-hgcg4y3h/pyaudio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-xnk_drv5 --python-tag cp36:
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    copying src/pyaudio.py -> build/lib.linux-x86_64-3.6
    running build_ext
    building '_portaudio' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/src
    gcc -pthread -B /home/nbuser/anaconda3_501/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/nbuser/anaconda3_501/include/python3.6m -c src/_portaudiomodule.c -o build/temp.linux-x86_64-3.6/src/_portaudiomodule.o
    src/_portaudiomodule.c:29:23: fatal error: portaudio.h: No such file or directory
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
    Command "/home/nbuser/anaconda3_501/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-hgcg4y3h/pyaudio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ftuiec6_/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-hgcg4y3h/pyaudio/

  2. This is on some Microsoft server that hosts Microsoft Azure and Jupyter Notebooks.

    I am using the Chrome browser on Windows 10, but that should not matter.

    I login at https://notebooks.azure.com/

    In a Jupyter Notebook, the 2 Python commands:

    [1]

    os.path

    returns

    [2]

    os.name

    returns

    ‘posix’

    Hope that helps.

    Thanks.

    Edward Bujak

  3. Thanks for the post, it is very helpful. I tried and it worked fine for me.
    But it converted only the first 4-5s of the audio file. (1 short sentence)
    What if I want to convert longer audio files? Do you have any recommendations?

    Thanks in advance.

  4. Hello sir, thank you so much. I tried this code and it works fine. I have one query: the code takes some time to return the response (text). Can I add a loop to this code (if so, can you tell me the code), or is there any other method to improve the speed? Please help me with this, sir. Waiting for your response.
    Thanks in advance.

    • First of all, thanks for your comment. Yes, it takes some time to respond. It may depend on your internet speed or the speaker’s quality.

  5. If the voice is unclear, how can the surrounding noise be eliminated so that the voice can be distinguished and returned as text? Do you have any way to do this?

  6. Hello sir! I ran the code and it showed no error, but when I try to say something it can’t hear me. I tried this on my Sony Vaio Core i3 laptop.
    It can’t record my voice. I am really in trouble, please help me solve this.
    Thanks

  7. Hi, I am unable to install pyaudio. I am getting the following error:

    ERROR: Command "'c:\users\ganesh.marella\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\\Users\\GANESH~1.MAR\\AppData\\Local\\Temp\\pip-install-afndru1v\\pyaudio\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\GANESH~1.MAR\AppData\Local\Temp\pip-record-lqg1dul4\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\GANESH~1.MAR\AppData\Local\Temp\pip-install-afndru1v\pyaudio\

    Please help me with this.

    • Because you have not installed pip on your system. Search on youtube how to install pip according to your system type. Thanks

  8. It is easy to write “import SpeechRecognition”, but it only works if you have your system set up to provide it.
    The hard part is to tell people precisely how to collect the libraries on all those platforms. It’s not just “pip install SpeechRecognition”.

