Implement a speech-to-text bot using wit.ai API
wit.ai provides APIs for natural language processing.
It takes human voice or text and converts it into structured information for further use.
With this API and Telegram’s bot API, I’ll walk you through building a speech-to-text bot that receives your voice message and responds with the converted text.
0. Prerequisite
All code in this post is written in Go, so you may need to install it first.
Also, because the Telegram bot API and the wit.ai API use different voice file formats, you’ll need to install ffmpeg to convert .ogg files to .mp3.
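Before going further, it can help to confirm that ffmpeg is actually reachable from Go. The following is a minimal sketch, not part of the sample code; the helper name findTool is hypothetical, and it only uses exec.LookPath from the standard library.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTool (hypothetical helper) looks up an executable on PATH and
// returns its location, or an error if it cannot be found.
func findTool(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	path, err := findTool("ffmpeg")
	if err != nil {
		fmt.Println("ffmpeg not found on PATH:", err)
		return
	}
	fmt.Println("ffmpeg found at", path)
}
```

If this prints the "not found" message, install ffmpeg or fix your PATH before running the bot.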
1. Install needed libraries
I will use my Go libraries for the Telegram bot API and the wit.ai API:
2. Generate your Telegram and wit.ai API tokens
You can create your own Telegram bot with this guide, and a new wit.ai application on this page.
Generated tokens will be used later in this post.
3. Get a sample code
Here’s the gist for the sample:
Download it,
and edit the variables marked with the comment // XXX - Edit this value to yours.
It will not run as expected unless you set your TelegramApiToken and WitaiApiToken values correctly.
4. Build and run
Run the edited code:
If nothing goes wrong, you’ll be able to start a chat with your bot and send it a voice message:
The converted text will be returned, as you can see.
5. Step by step explanation of the sample code
a. Bot receives a new message from you
We need to convert voice to text, so check whether the received message contains a voice.
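To illustrate the check, here is a minimal sketch that works on the raw update JSON instead of going through the library used in the sample. The struct and function names (Update, Message, Voice, voiceFileID) are my own assumptions; the JSON keys (message, voice, file_id, duration, mime_type) follow the Telegram bot API, and the sample payloads are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Voice mirrors the fields of Telegram's Voice object used here.
type Voice struct {
	FileID   string `json:"file_id"`
	Duration int    `json:"duration"`
	MimeType string `json:"mime_type"`
}

// Message keeps only the fields this sketch needs.
type Message struct {
	MessageID int    `json:"message_id"`
	Text      string `json:"text"`
	Voice     *Voice `json:"voice"` // nil when the message carries no voice
}

// Update is one incoming update from the bot API.
type Update struct {
	UpdateID int      `json:"update_id"`
	Message  *Message `json:"message"`
}

// voiceFileID (hypothetical helper) returns the file_id of the voice
// in an update and whether a voice was present at all.
func voiceFileID(raw []byte) (string, bool, error) {
	var u Update
	if err := json.Unmarshal(raw, &u); err != nil {
		return "", false, err
	}
	if u.Message == nil || u.Message.Voice == nil {
		return "", false, nil
	}
	return u.Message.Voice.FileID, true, nil
}

func main() {
	// Made-up payloads: one voice message and one plain text message.
	withVoice := []byte(`{"update_id":1,"message":{"message_id":10,"voice":{"file_id":"abc123","duration":3,"mime_type":"audio/ogg"}}}`)
	textOnly := []byte(`{"update_id":2,"message":{"message_id":11,"text":"hello"}}`)

	for _, raw := range [][]byte{withVoice, textOnly} {
		id, ok, err := voiceFileID(raw)
		fmt.Printf("file_id=%q hasVoice=%v err=%v\n", id, ok, err)
	}
}
```

Using a pointer for the voice field makes "no voice in this message" a simple nil check, so text-only messages can be skipped early.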
b. Bot converts received voice file into monaural .mp3 format
If the message contains a voice, convert it with ffmpeg.
The Telegram bot API delivers voices in .ogg format and the wit.ai API doesn’t accept .ogg,
so we need to convert the file to monaural .mp3 format.
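The conversion itself can be sketched as a call to the ffmpeg command-line tool through os/exec. This is not necessarily how the sample does it; the function names (mp3Path, ffmpegArgs, convertToMonoMP3) and the file name voice.ogg are assumptions for illustration. The flags are standard ffmpeg options: -i names the input, -ac 1 downmixes to a single channel, and -y overwrites an existing output file.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// mp3Path derives the output file name, e.g. "voice.ogg" -> "voice.mp3".
func mp3Path(oggPath string) string {
	return strings.TrimSuffix(oggPath, filepath.Ext(oggPath)) + ".mp3"
}

// ffmpegArgs builds the arguments for a mono MP3 conversion.
func ffmpegArgs(oggPath, outPath string) []string {
	return []string{"-y", "-i", oggPath, "-ac", "1", outPath}
}

// convertToMonoMP3 runs ffmpeg and returns the path of the converted file.
// It requires ffmpeg to be installed and available on PATH.
func convertToMonoMP3(oggPath string) (string, error) {
	out := mp3Path(oggPath)
	cmd := exec.Command("ffmpeg", ffmpegArgs(oggPath, out)...)
	if output, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("ffmpeg failed: %v: %s", err, output)
	}
	return out, nil
}

func main() {
	// Only print the command here; call convertToMonoMP3 on a real file to run it.
	args := ffmpegArgs("voice.ogg", mp3Path("voice.ogg"))
	fmt.Println("ffmpeg " + strings.Join(args, " "))
	// prints "ffmpeg -y -i voice.ogg -ac 1 voice.mp3"
}
```

Capturing the combined output in the error makes ffmpeg failures (missing codec, unreadable input) much easier to diagnose than a bare exit status.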
c. Bot sends a request to wit.ai’s speech API and reads the result
Send a request to the speech API and return the result.
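For readers who want to see what happens underneath the library, here is a rough sketch of the request using only net/http. It is not the code from the sample. It assumes the endpoint https://api.wit.ai/speech, a Bearer token in the Authorization header, and the audio/mpeg3 content type for MP3 data. The response handling is also an assumption: it expects a single JSON object and looks for the transcript under either the text or the _text key, since the key has differed between API versions. All function and type names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// speechURL is the assumed wit.ai speech endpoint.
const speechURL = "https://api.wit.ai/speech"

// newSpeechRequest builds the POST request that carries the MP3 bytes.
func newSpeechRequest(url, token string, mp3 []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(mp3))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "audio/mpeg3")
	return req, nil
}

// speechResult keeps only the transcript; which key is present
// depends on the API version (assumption).
type speechResult struct {
	Text    string `json:"text"`
	OldText string `json:"_text"`
}

// parseTranscript extracts the transcript from a single JSON response body.
func parseTranscript(body []byte) (string, error) {
	var r speechResult
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Text != "" {
		return r.Text, nil
	}
	return r.OldText, nil
}

// transcribe sends the audio to wit.ai and returns the recognized text.
func transcribe(client *http.Client, token string, mp3 []byte) (string, error) {
	req, err := newSpeechRequest(speechURL, token, mp3)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("speech API returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseTranscript(body)
}

func main() {
	// Offline demonstration with a made-up response body.
	text, err := parseTranscript([]byte(`{"_text":"hello world"}`))
	fmt.Println(text, err)
}
```

To make a real call, read the converted file with os.ReadFile and pass the bytes to transcribe along with http.DefaultClient and your WitaiApiToken value.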
d. Bot sends the converted text back to you
Once the result is returned successfully, send it back through the Telegram bot API.
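As a rough idea of what this last step looks like without the library, the sketch below calls the bot API’s sendMessage method directly with a form-encoded POST. The chat_id and text parameters and the https://api.telegram.org/bot<token>/sendMessage URL pattern come from the Telegram bot API; the function names, the placeholder token, and the chat ID 42 are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// sendMessageURL builds the sendMessage endpoint for a bot token.
func sendMessageURL(token string) string {
	return "https://api.telegram.org/bot" + token + "/sendMessage"
}

// sendMessageForm builds the form fields for a plain text reply.
func sendMessageForm(chatID int64, text string) url.Values {
	return url.Values{
		"chat_id": {strconv.FormatInt(chatID, 10)},
		"text":    {text},
	}
}

// sendText posts the converted text back to the chat it came from.
func sendText(token string, chatID int64, text string) error {
	resp, err := http.PostForm(sendMessageURL(token), sendMessageForm(chatID, text))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("sendMessage returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Offline demonstration: show what would be sent, without calling the API.
	fmt.Println(sendMessageURL("<TOKEN>"))
	fmt.Println(sendMessageForm(42, "hello world").Encode())
	// prints "chat_id=42&text=hello+world"
}
```

In the bot, the chat ID would come from the message that carried the voice, so the reply lands in the same conversation.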
Wrap-up
wit.ai is not only for speech-to-text; its powerful natural language processing also handles much more complicated tasks, such as building a personal assistant.
This is just a simple example, so I hope whoever reads this post will build more useful things from it :-)