Using Google Cloud Text-to-Speech API with Python

This guide is long, but if you follow each step carefully, you will end up with a working setup.

Basic Python knowledge will help. If you haven’t learned Python or installed it on your computer yet, I recommend checking out this informative tutorial (with VS Code or PyCharm code editor) or enrolling in a Python course available on edX.org.

1. Go to the Google Cloud Console page (console.cloud.google.com). Click on “Get started for free”. Google provides a free trial with $300 in credit.

2. Click on “Start a free trial”.

3. Sign in to your Google account. If you don’t have one, create a new Google account.

4. Now set up your Google Cloud profile.

5. Create your first project.

6. Name your project. Your project ID will be generated automatically. If you want to customise it, do it now, because you cannot change it later.

7. After creating your project, enable billing for your Google Cloud account if you haven’t already. If billing is already enabled, go to the Dashboard and skip to Step 12, or click on “APIs and services” and skip to Step 14.

8. Enable your billing and fill in the necessary fields. Then click “Continue”.

9. Select your account type.

10. Complete the rest of the fields. You can leave the optional fields blank.

11. Select your payment method (credit card, debit card, or bank account, depending on your country).

12. On your project console, click on the menu in the top left corner of the page.

13. In the “APIs and services” section, click “Library”.

14. Search for “Google Cloud Text-to-Speech API”.

15. Select Cloud Text-to-Speech API.

16. Enable the API.

17. Next, you need to set up the authentication credentials to access the API. Click on the menu in the top left corner of the page. Then, in the “APIs and services” section, click “Credentials”.

18. At the top of the page, click on “Create credentials”.

19. Choose “Service account”.

20. Fill in the necessary details for the service account, such as the service account name and role. Then click “Done”.

21. Click on your service account.

22. Click on the “Keys” tab.

23. In the “Add key” section, click “Create new key”.

24. Select the key type as JSON. This will download the JSON file containing your service account credentials.

25. Now that the key file has been downloaded, check your Downloads folder. Save the JSON file in a secure location, as you will need it to authenticate your requests.

Congratulations! You’ve finished the first part, and you’re almost there!
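Before moving on, it can help to confirm that the downloaded key file is readable and looks like a service account key. The sketch below is only a sanity check and not part of the official setup. The function name check_key_file and the field list are my own choices for illustration, based on the fields a service account key file normally contains.

```python
import json
from pathlib import Path

# Fields a service account key file normally contains.
# This list is an assumption for illustration, not an exhaustive schema.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in EXPECTED_FIELDS if name not in data
    ]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


if __name__ == "__main__":
    # Placeholder path: replace it with the location of your own key file.
    print(check_key_file("path/to/your/service-account-key.json"))
```

An empty list means the file passed these basic checks. It does not prove that the key is still valid on Google's side.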

26. Next, open VS Code or your Python application and install the client library by running the following command in your terminal:

pip install --upgrade google-cloud-texttospeech

27. Create a “*.py” file for your script. Copy the following code.

The code below is for texts shorter than 5000 bytes. If your text is longer than that, you have to split it, synthesize each part separately, and merge the audio files afterwards.

Google Cloud Text-to-Speech API can also work with long texts. However, that service is currently only available for texts in English and Spanish. Click here if you want to use the Long Audio API for Google Cloud Text-to-Speech.

If you want to keep your text in a *.txt file instead of writing it directly in your Python script, read the Appendix section at the end of this post.

"""
FOR SHORT TEXTS LESS THAN 5000 BYTES

Synthesizes speech from the input string of text or ssml.
Make sure to be working in a virtual environment.

Note: ssml must be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""
from google.cloud import texttospeech
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your/service-account-key.json"

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized, write your text after "text=" below:
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Set the voice parameters:
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", name="en-US-Neural2-J"
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Set the desired output file path:
output_file_path = "Your folder path/output file name.mp3"

# Perform the text-to-speech request on the text input with the selected voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the audio to the output file
with open(output_file_path, "wb") as output_file:
    output_file.write(response.audio_content)
print("Audio file created:", output_file_path)

(1) Replace “path/to/your/service-account-key.json” with the actual path to your service account key JSON file. Remember where you store the key file.

(2) Replace “Hello, World!” with your text.

(3) Choose your language_code and voice name. You can go to the Google Cloud Text-to-Speech API product overview page and test all the voices available for the language of your text, in the Demo section. You can find the language_code and voice name of your choice in the JSON script below the voice settings.

(4) Set where the output file is to be stored. Replace “Your folder path/output file name.mp3” with your desired file path and output name.
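As an alternative to the Demo page in (3), you can ask the API which voices exist for a language. The sketch below assumes the client library's list_voices method with its language_code argument. The helper name voices_for_language is my own, and the function takes the client as a parameter so that it works with the client object created in the script above.

```python
def voices_for_language(client, language_code):
    """Return the sorted voice names the API reports for one language code."""
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices)


# Example usage, assuming the credentials from (1) are already configured:
#
#     from google.cloud import texttospeech
#     client = texttospeech.TextToSpeechClient()
#     for name in voices_for_language(client, "en-US"):
#         print(name)
```

Any name printed this way can be used as the name value in VoiceSelectionParams, together with the matching language_code.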

Once your parameters are all set, you can run the code and find the audio output file in the output file folder.

Congratulations! You have done all the essential steps to use Google Cloud Text-to-Speech API in Python. Now you can create audio files from texts.
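If your text is longer than the 5000-byte limit mentioned in Step 27, it has to be split first. The sketch below shows one possible way to do that, not an official method. It splits on sentence endings and measures each chunk in UTF-8 bytes rather than characters, because accented and non-Latin characters take more than one byte. The names split_text and MAX_BYTES are my own.

```python
import re

MAX_BYTES = 5000  # the limit mentioned in Step 27


def split_text(text, max_bytes=MAX_BYTES):
    """Split text into chunks whose UTF-8 size stays below max_bytes.

    Chunks break after sentence-ending punctuation where possible and on
    word boundaries otherwise. A single word that is itself too long is
    returned as an oversized chunk (a limitation of this sketch).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) < max_bytes:
            current = candidate
            continue
        # The sentence does not fit, so close the current chunk first.
        if current:
            chunks.append(current)
            current = ""
        if len(sentence.encode("utf-8")) < max_bytes:
            current = sentence
            continue
        # The sentence alone is too long: fall back to word boundaries.
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if len(candidate.encode("utf-8")) < max_bytes:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to SynthesisInput(text=...) and saved to its own numbered audio file, using the same script as above. The audio files still have to be merged afterwards.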

Appendix

Before using Google Cloud Text-to-Speech API, I usually proofread and edit the text to adjust the language and the flow of the speech. I do this in a text file (*.txt), not in my Python script.

To keep the script as clean as possible, rather than putting all the text in it, I use this code:

# Read the text from the .txt file
file_path = 'your text file folder/your text file name.txt'
with open(file_path, 'r', encoding='utf-8') as file:
    mytext = file.read()
# The "with" block closes the file automatically, so no file.close() is needed

The variable “mytext” contains the text from the *.txt file. Therefore, the rest of the script will look like this:

from google.cloud import texttospeech
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your/service-account-key.json"

output_file_path = "Your folder path/output file name.mp3"

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(text=mytext)

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", name="en-US-Neural2-J"
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

response = client.synthesize_speech(
    input=input_text, voice=voice, audio_config=audio_config
)

with open(output_file_path, "wb") as output_file:
    output_file.write(response.audio_content)
print("Audio file created:", output_file_path)