I am developing a website where users can enter text and click a button to hear it as audio. I have decided to use the Google Cloud Text-to-Speech (TTS) API. All the documentation I could find online shows how to use Google TTS on a local computer by installing the Google Cloud client library. I don't know how to get those client libraries onto my website's hosting panel and use the Google API.
So my question is: how do I use the Google Cloud Text-to-Speech API on a website using the client libraries?
PS: I don't want to use the Web Speech API.
Answer
All you need to do is recreate, on your client side, the calls that the documentation shows in curl.
The documentation asks you to install tools locally only so that they can obtain authentication credentials for you. In the example below, gcloud auth application-default print-access-token
runs a command on your computer that prints an access token string; if you replace it with your own access token, the request works without installing anything. (An API key, by contrast, is not sent as a Bearer token; it goes in the X-Goog-Api-Key header or the key query parameter.)
# The '\'' sequence closes the single-quoted string, adds a literal apostrophe, and reopens it.
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{ "input": { "text": "I'\''ve added the event to your calendar." }, "voice": { "languageCode": "en-gb", "name": "en-GB-Standard-A", "ssmlGender": "FEMALE" }, "audioConfig": { "audioEncoding": "MP3" } }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize"
In JavaScript this would be something like:
// Automatically generated with: https://kigiri.github.io/fetch/
// Note: `await` must run inside an async function or an ES module.
const result = await fetch('https://texttospeech.googleapis.com/v1/text:synthesize', {
  method: 'POST',
  headers: {
    // Replace ACCESS-TOKEN with a token such as the one printed by the gcloud command.
    Authorization: 'Bearer ACCESS-TOKEN',
    'Content-Type': 'application/json; charset=utf-8'
  },
  body: JSON.stringify({
    input: { text: "I've added the event to your calendar." },
    voice: { languageCode: 'en-gb', name: 'en-GB-Standard-A', ssmlGender: 'FEMALE' },
    audioConfig: { audioEncoding: 'MP3' }
  })
})
The Google Text-to-Speech API is like any other HTTP-based API: you can communicate with it purely by sending HTTPS requests.
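The response to text:synthesize is a JSON object whose audioContent field carries the audio as a base64 string, so one more step is needed before the user hears anything. Here is a minimal sketch of that step. The names audioContentToDataUrl and playSpeech are hypothetical, and the audio/mpeg MIME type assumes the request asked for MP3 encoding.

```javascript
// Hypothetical helper: wrap the base64 audio in a data URL.
// Assumes audioEncoding was "MP3", hence the audio/mpeg MIME type.
function audioContentToDataUrl(audioContent, mimeType = 'audio/mpeg') {
  return `data:${mimeType};base64,${audioContent}`;
}

// Browser-only sketch: `result` is the Response returned by fetch().
async function playSpeech(result) {
  if (!result.ok) {
    throw new Error(`Synthesis failed with HTTP ${result.status}`);
  }
  // The API returns JSON of the form { "audioContent": "<base64>" }.
  const { audioContent } = await result.json();
  const audio = new Audio(audioContentToDataUrl(audioContent));
  await audio.play();
}
```

Calling playSpeech from the button's click handler keeps playback tied to a user gesture, which browsers generally require before they allow audio to start.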
WARNING
There may be libraries or examples meant for client-side use as well, but doing this on the client side is unlikely to be a good decision.
The reason is that to talk to the Text-to-Speech API you must send your credentials with each request, and by doing that on the client side you are effectively leaking your credentials to everyone in the world.
To avoid that, set up a proxy: have the page communicate with your own backend, which then communicates with the Text-to-Speech API…
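A proxy of this kind could look roughly like the following sketch, written for Node.js 18 or later (which has a global fetch). Everything specific here is an assumption made for illustration: the /speak route, the GOOGLE_TTS_API_KEY environment variable, the fixed voice settings, and the choice to authenticate with an API key in the X-Goog-Api-Key header rather than an access token.

```javascript
const http = require('node:http');

const TTS_URL = 'https://texttospeech.googleapis.com/v1/text:synthesize';

// Build the JSON body for the API from the user-supplied text.
// The voice settings mirror the curl example and are fixed server-side.
function buildSynthesizeBody(text) {
  return JSON.stringify({
    input: { text },
    voice: { languageCode: 'en-gb', name: 'en-GB-Standard-A', ssmlGender: 'FEMALE' },
    audioConfig: { audioEncoding: 'MP3' },
  });
}

// Handle POST /speak (hypothetical route) with a body like {"text": "..."}.
async function handleSpeak(req, res) {
  req.setEncoding('utf8');
  let raw = '';
  for await (const chunk of req) raw += chunk;
  const { text } = JSON.parse(raw);
  if (typeof text !== 'string' || text.length === 0) {
    res.writeHead(400);
    res.end();
    return;
  }
  const upstream = await fetch(TTS_URL, {
    method: 'POST',
    headers: {
      // The credential stays on the server; the browser never sees it.
      'X-Goog-Api-Key': process.env.GOOGLE_TTS_API_KEY, // assumed variable name
      'Content-Type': 'application/json; charset=utf-8',
    },
    body: buildSynthesizeBody(text),
  });
  // Relay the API's status and JSON ({ audioContent: ... }) to the page.
  res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
  res.end(await upstream.text());
}

function createServer() {
  return http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/speak') {
      handleSpeak(req, res).catch(() => {
        res.writeHead(500);
        res.end();
      });
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}

// To run it: createServer().listen(3000);
```

The page then sends its fetch request to /speak on your own domain instead of to Google, with no Authorization header at all. A real deployment would also want rate limiting and a cap on text length, since anyone who can reach the route can spend your quota.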
Beyond that, a backend lets you store already generated speech so that you don't convert the same text countless times, which can save a lot in costs.
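The caching idea can be sketched as an in-memory map keyed by everything that affects the audio. This is only an illustration: getOrSynthesize, cacheKey, and the synthesize callback are hypothetical names, and a real backend would more likely persist the audio to disk or object storage so it survives restarts.

```javascript
// In-memory cache: request key -> Promise of the synthesized audio.
const speechCache = new Map();

// Build a key from every input that changes the resulting audio.
function cacheKey(text, voiceName, audioEncoding) {
  return JSON.stringify([text, voiceName, audioEncoding]);
}

// Return the cached promise if present; otherwise call `synthesize`
// once and remember its promise, so concurrent requests for the same
// text share a single API call.
function getOrSynthesize(text, voiceName, audioEncoding, synthesize) {
  const key = cacheKey(text, voiceName, audioEncoding);
  if (!speechCache.has(key)) {
    const pending = Promise.resolve(synthesize(text, voiceName, audioEncoding))
      .catch((err) => {
        // Do not keep failures around; let the next request retry.
        speechCache.delete(key);
        throw err;
      });
    speechCache.set(key, pending);
  }
  return speechCache.get(key);
}
```

Here synthesize stands for whatever function actually calls the Text-to-Speech API and resolves to the audioContent string. Because the map stores promises rather than finished results, two users requesting the same sentence at the same moment still trigger only one billed request.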