How to build a text-to-speech Discord voice bot in Python
In this post, you'll learn to use the Uberduck API to build a Discord bot that can send text-to-speech messages in voice channels.
In this post I'm going to show you how to build your very own text-to-speech Discord bot using Python, the nextcord library, and the Uberduck API. Once we're finished, you'll have a bot that can join Discord voice channels and speak text-to-speech messages in them at your command. You can even clone your own voice and use it for text-to-speech in Discord with our private voice clone plan.
Here's a demo of what we'll have at the end:
You can find the full implementation of the bot on our GitHub.
Create a bot in the Discord Developer Portal
Before we dive into coding, we need to do some Discord setup. Head to the Discord Developer Portal and create a new application.
I'll call my application "Uberduck TTS Demo".
Once you create the Discord application, create a Discord bot user.
To let other users add your bot to their Discord servers, go to the OAuth2 section and select "In-app Authorization" as the Authorization Method. Then select the bot and applications.commands scopes, and check the Send Messages, Connect, and Speak bot permissions.
Then, to add the bot to your own server, go to the OAuth2 URL Generator, select the same scopes and permissions, and paste the generated URL into your browser.
You should see something like this:
Go ahead and select one of your servers and click Continue to add the bot. Now the bot should be in your server and it's time to strap in and write some code!
Implement the bot code
Remember, you can find the full implementation in this GitHub repository.
The recommended way to implement Discord interactions is with slash commands—commands that begin with the / character and pop up a completion menu inside Discord. Our bot will implement three slash commands:
- /vc-join invites the bot to join a voice channel.
- /vc-kick kicks the bot out of a voice channel.
- /vc-quack generates text-to-speech audio from a specific voice and plays it in the voice channel.
You'll also need a Python environment with nextcord and a few other Python packages installed—follow the instructions in the README to get set up.
Create the bot and set up commands
We'll create the Discord bot and implement slash commands using the nextcord library.
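As a rough starting point, the skeleton below registers the three commands with placeholder bodies. It is a minimal sketch rather than the code from the repository: `create_bot` and `COMMAND_DESCRIPTIONS` are hypothetical names chosen for illustration, and the `voice` and `speech` options on /vc-quack are an assumption about how that command takes its input.

```python
# Slash commands from this post, with the descriptions Discord shows in its menu.
COMMAND_DESCRIPTIONS = {
    "vc-join": "Invite the bot to your voice channel.",
    "vc-kick": "Kick the bot out of its voice channel.",
    "vc-quack": "Speak a message in the voice channel with a chosen voice.",
}


def create_bot():
    """Build the bot and register placeholder slash commands (hypothetical helper)."""
    # Imported here so the table above can be inspected without nextcord installed.
    from nextcord import Interaction
    from nextcord.ext import commands

    bot = commands.Bot()

    @bot.slash_command(name="vc-join", description=COMMAND_DESCRIPTIONS["vc-join"])
    async def vc_join(interaction: Interaction):
        await interaction.response.send_message("I'm not implemented yet!")

    @bot.slash_command(name="vc-kick", description=COMMAND_DESCRIPTIONS["vc-kick"])
    async def vc_kick(interaction: Interaction):
        await interaction.response.send_message("I'm not implemented yet!")

    @bot.slash_command(name="vc-quack", description=COMMAND_DESCRIPTIONS["vc-quack"])
    async def vc_quack(interaction: Interaction, voice: str, speech: str):
        # nextcord turns the annotated parameters into slash command options.
        await interaction.response.send_message("I'm not implemented yet!")

    return bot
```

To start the bot, call `create_bot().run(token)` with the bot token from the Developer Portal, ideally read from an environment variable rather than hard-coded.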
You can run your code now and try out the /vc-join command to check that the bot replies with the message "I'm not implemented yet!"
Handle leaving and joining voice channels
Let's implement the bodies of the /vc-join and /vc-kick commands. We'll use a Python dict to keep track of voice channel clients and the time that the client was last used. (We'll use that last used time later to clean up idle voice clients.)
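One way to model that dict is sketched below. The names `VoiceEntry`, `voice_entries`, `remember_client`, `mark_used`, and `forget_client` are hypothetical, and keying the dict by guild (server) ID is an assumption based on a bot holding one voice connection per server.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VoiceEntry:
    """One connected voice client plus the time it was last used."""
    client: Any  # a nextcord.VoiceClient in the real bot
    last_used: float = field(default_factory=time.time)


# Assumption: one voice connection per server, so the guild ID is the key.
voice_entries: Dict[int, VoiceEntry] = {}


def remember_client(guild_id: int, client: Any) -> VoiceEntry:
    """Store (or replace) the voice client for a guild and stamp it as just used."""
    entry = VoiceEntry(client)
    voice_entries[guild_id] = entry
    return entry


def mark_used(guild_id: int) -> None:
    """Refresh the timestamp whenever the bot speaks in that guild."""
    voice_entries[guild_id].last_used = time.time()


def forget_client(guild_id: int):
    """Drop the entry when the bot leaves; returns it, or None if absent."""
    return voice_entries.pop(guild_id, None)
```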
Now we can implement /vc-join and /vc-kick, including some edge case handling around switching from one voice channel to another and printing error messages if there are no voice channels to join or kick from.
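The command bodies might look roughly like the sketch below. `join_voice` and `kick_voice` are hypothetical helper names meant to be called from the slash command handlers, and the reply wording is made up. The calls on the interaction, channel, and voice client (`user.voice`, `connect`, `move_to`, `is_connected`, `disconnect`) are the nextcord ones, but the functions are duck-typed so the logic can be exercised without a live Discord connection.

```python
import time

voice_clients = {}  # guild ID -> (voice client, last-used timestamp)


async def join_voice(interaction):
    """Body of /vc-join: connect to, or move to, the caller's voice channel."""
    voice_state = interaction.user.voice
    if voice_state is None or voice_state.channel is None:
        await interaction.response.send_message(
            "Join a voice channel first, then run /vc-join."
        )
        return
    channel = voice_state.channel
    entry = voice_clients.get(interaction.guild_id)
    if entry is not None and entry[0].is_connected():
        client = entry[0]
        await client.move_to(channel)  # already connected: switch channels
    else:
        client = await channel.connect()
    voice_clients[interaction.guild_id] = (client, time.time())
    await interaction.response.send_message(f"Joined {channel.name}.")


async def kick_voice(interaction):
    """Body of /vc-kick: disconnect and forget this guild's voice client."""
    entry = voice_clients.pop(interaction.guild_id, None)
    if entry is None:
        await interaction.response.send_message("I'm not in a voice channel.")
        return
    await entry[0].disconnect()
    await interaction.response.send_message("Left the voice channel.")
```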
Generate text-to-speech and play audio over the channel
Now that we have /vc-kick and /vc-join, let's implement /vc-quack. First, write code to query the Uberduck API. You'll need to generate an API key and secret, which you can do on your Uberduck account page.
The Uberduck API is free to use, but requests on the free tier are queued alongside those of every other free user of our site. If you want faster API responses, you can upgrade to the Creator plan or the Clone plan (which also gets you a custom clone of your own voice).
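To give a feel for the request itself, here is a hedged sketch that only builds the HTTP request without sending it. The endpoint URL, the `speech` and `voice` JSON field names, and the use of HTTP Basic auth with the key and secret are all assumptions, so check them against the current Uberduck API documentation. `build_speak_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm against the Uberduck API documentation.
SPEAK_URL = "https://api.uberduck.ai/speak"


def build_speak_request(text, voice, api_key, api_secret):
    """Build (but do not send) a text-to-speech request.

    Assumes the API key and secret are sent as HTTP Basic credentials
    and that the JSON body uses the fields "speech" and "voice".
    """
    credentials = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    body = json.dumps({"speech": text, "voice": voice}).encode()
    return urllib.request.Request(
        SPEAK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen` blocks the calling thread, so inside the bot you would either use an async HTTP client or run the call through `asyncio.to_thread`.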
With the Uberduck API call in place, now we can implement /vc-quack.
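A possible shape for the command body is sketched below, under a few assumptions. `quack` is a hypothetical helper, `synthesize` stands in for whatever coroutine calls the Uberduck API and returns a local audio file path, and `make_source` wraps that file for playback (in a real bot this could be `nextcord.FFmpegPCMAudio`, which needs ffmpeg installed). The reply text is invented for illustration.

```python
import time

voice_clients = {}  # guild ID -> [voice client, last-used timestamp]


async def quack(interaction, voice, speech, synthesize, make_source):
    """Body of /vc-quack. Returns True if audio playback was started.

    synthesize(speech, voice): hypothetical coroutine returning an audio file path.
    make_source(path): wraps the file for playback, e.g. nextcord.FFmpegPCMAudio.
    """
    entry = voice_clients.get(interaction.guild_id)
    if entry is None or not entry[0].is_connected():
        await interaction.response.send_message(
            "I'm not in a voice channel. Run /vc-join first."
        )
        return False
    # Speech generation can outlast Discord's three-second reply window, so defer.
    await interaction.response.defer()
    audio_path = await synthesize(speech, voice)
    client = entry[0]
    if client.is_playing():
        client.stop()  # play() raises if audio is already playing
    client.play(make_source(audio_path))
    entry[1] = time.time()  # mark the client as recently used
    await interaction.followup.send(f"Now speaking as {voice}.")
    return True
```

Deferring the response first matters on the free API tier in particular, since queued requests can easily take longer than the reply window.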
Alright, now we have a working Discord bot! Run your bot script, and you should be able to join a voice channel, run /vc-join to invite the bot, generate speech by running /vc-quack voice:zwf speech:I like working on Uberduck, and then kick out the bot with /vc-kick.
Clean up idle voice clients
We just have one more step before our bot is ready for prime time. We don't want an idle bot to hang out unused in voice chat forever, so we'll build a mechanism to disconnect the bot from voice channels it is no longer being used in. This is where the last_used timestamp stored alongside each voice client comes into play. Every five seconds, we'll loop over all the voice clients and disconnect any that haven't been used in the last 10 minutes.
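The cleanup pass described above can be sketched as follows. The function and constant names are hypothetical, and the dict layout (guild ID mapped to a client and timestamp pair) is an assumption; only `disconnect()` is called on the voice client.

```python
import asyncio
import time

IDLE_TIMEOUT_SECONDS = 10 * 60  # disconnect after 10 minutes without use
CHECK_INTERVAL_SECONDS = 5      # how often to look for idle clients

voice_clients = {}  # guild ID -> (voice client, last-used timestamp)


async def disconnect_idle_clients(now=None):
    """Disconnect every client idle for longer than the timeout.

    Returns the guild IDs that were cleaned up. `now` is injectable for testing.
    """
    now = time.time() if now is None else now
    # Collect first, so the dict is not mutated while being iterated.
    idle = [
        guild_id
        for guild_id, (_, last_used) in voice_clients.items()
        if now - last_used > IDLE_TIMEOUT_SECONDS
    ]
    for guild_id in idle:
        client, _ = voice_clients.pop(guild_id)
        await client.disconnect()
    return idle


async def cleanup_forever():
    """Run the cleanup pass every few seconds for the lifetime of the bot."""
    while True:
        await disconnect_idle_clients()
        await asyncio.sleep(CHECK_INTERVAL_SECONDS)
```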
Now we can modify our script to run the cleanup loop concurrently with the bot using asyncio.gather.
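In outline, that could look like the sketch below. `run_bot_with_cleanup` is a hypothetical name, and `cleanup` is assumed to be a coroutine function that loops over the voice clients as described above. The key detail is to await `bot.start(token)`, the coroutine counterpart of the blocking `bot.run(token)`, so that both jobs share one event loop.

```python
import asyncio


async def run_bot_with_cleanup(bot, token, cleanup):
    """Run the Discord client and a background cleanup coroutine side by side."""
    # bot.start(token) is awaitable, unlike bot.run(token), which blocks
    # and manages its own event loop.
    await asyncio.gather(bot.start(token), cleanup())
```

The script's entry point then becomes a single `asyncio.run(run_bot_with_cleanup(bot, token, cleanup))` call in place of `bot.run(token)`.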
That's it! We now have a working Discord bot.
If you enjoyed this article or want to build a bot of your own, let us know in our Discord (where you can use the Uberduck Discord bot, a version of the bot we just built with a few more features) or email me at email@example.com.