Summary

Each of the two authentication mechanisms, API Keys and Access Tokens, has its own use cases in properly securing your API calls.

In this section, we provide a short summary of both. Please refer to the following sections for in-depth descriptions and usage scenarios.

API Keys are randomly generated strings that you can obtain through the dashboard. They:

are easy to implement,
are long-lived and do not require refreshing,
can be revoked immediately if exposed.

At the same time, API Keys:

are not scoped and have full permissions, including deleting personal voices,
are NOT suitable for authenticating public clients, such as web application front-end code or native mobile and desktop applications,
require a database lookup to match the key to your account.

Access tokens are JWTs obtained programmatically through our OAuth 2.0-compatible API. They:

are short-lived,
can be scoped to the minimal required set of permissions,
require regular re-issuing,
are specially designed for authenticating public clients,
are faster to validate (no DB request needed).

Additionally, it is important to note that you must authenticate the end-users of your application; otherwise, it's impossible to control and secure the usage of the AI API through your account. This is an inherent property of distributed systems, not something specific to the Speechify AI API: if you let unauthenticated users query the API (directly or through any kind of proxy), then anybody can make the same request at your expense.

API Keys

What are API Keys

API Keys are randomly generated strings that you can obtain through the dashboard. Each key is stored in the database, linked to your account.

When to Use API Keys

You CAN use the API Key to authenticate your Speechify AI API requests when ALL of the following conditions are met:

your application qualifies as confidential, i.e. it runs entirely on the server,
your application doesn't directly or indirectly expose any Speechify-related calls to your users.

An example of such an application is a call center app that sends Speechify requests from the server and plays the audio back to the user, over the phone or otherwise.

Additionally, a valid API Key is needed to issue Access Tokens to your users; see the Access Tokens section for details.

When NOT to Use API Keys

You should never use API Keys:

directly from a front-end app,
directly from a native mobile or desktop app,
indirectly through an unauthenticated proxy endpoint (see below for an in-depth discussion of this scenario).

Using the API Key directly from a public client is equivalent to exposing your key to the entire Internet. This gives anybody full access to all API actions on your behalf, including:

creating and deleting personal voices under your name, which can either violate other people's privacy or disrupt your application's functioning,
generating audio on your behalf, which incurs expenses on your account.

It is important to emphasize once again that there's truly no way to secure API Key usage in such scenarios. Nothing that may come to mind (requesting the key from a separate /config endpoint, encrypting it, storing it in the native app resources, encrypted or unencrypted) improves the security. If your public application sends a request that includes the API Key in the Authorization header, that request can be intercepted and the key extracted.

What About a Proxy?

You may think that implementing a Proxy/Façade on your server can address the above-mentioned concerns.

A typical implementation goes like this:

the API Key is securely stored on your server,
a new API endpoint is created on your server, something like /speechify-proxy,
your public client (e.g. a browser-based app) sends the request to this endpoint, such as POST /speechify-proxy/v1/audio/speech,
your server receives the request, adds the Authorization header, and forwards the request to the Speechify AI API: POST https://api.sws.speechify.com/v1/audio/speech,
Speechify AI API receives the request, matches the API Key to your account, synthesizes speech audio, bills you, and sends the response,
your server receives the response from the Speechify AI API and proxies it back to the client.

This may look like a good idea because it doesn't expose the API Key to the public client, which is true. However, anybody can now send requests to the Speechify AI API through your proxy. In other words, if you create such a wildcard proxy, anybody can use it to perform any action on your behalf.

You can address this issue, but only if both of the following conditions are met:

you don't blindly forward all requests, and instead expose a specialized endpoint on your side, e.g. one that only sends speech generation requests but doesn't let the user send, for example, a voice deletion request,
you authenticate your users and validate the authentication on the proxy endpoint.
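The two conditions above can be sketched as a restricted proxy handler. This is a minimal, framework-agnostic TypeScript sketch, not a reference implementation. The ProxyRequest shape, the userId field (assumed to be set by your own authentication middleware), and the assumption that the upstream response is JSON are all hypothetical. Only the Speechify URL, the /speechify-proxy path naming and the Authorization header come from this page. The fetch function is injected so the logic can be exercised without network access.

```typescript
// Minimal fetch-compatible signature; the real global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical request shape after YOUR authentication middleware has run.
interface ProxyRequest {
  userId: string | null; // null when the caller is not authenticated
  path: string; // path requested on your server
  body: unknown; // JSON body sent by the client
}

interface ProxyResponse {
  status: number;
  body: unknown;
}

const SPEECHIFY_SPEECH_URL = "https://api.sws.speechify.com/v1/audio/speech";
// The only operation this proxy exposes; every other path is rejected.
const ALLOWED_PATH = "/speechify-proxy/v1/audio/speech";

async function handleProxyRequest(
  req: ProxyRequest,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<ProxyResponse> {
  // Condition 2: only authenticated end-users may spend money on your account.
  if (req.userId === null) {
    return { status: 401, body: { error: "authentication required" } };
  }
  // Condition 1: a specialized endpoint instead of a wildcard pass-through.
  if (req.path !== ALLOWED_PATH) {
    return { status: 404, body: { error: "not found" } };
  }
  // The API Key is attached here and never leaves the server.
  const upstream = await fetchImpl(SPEECHIFY_SPEECH_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(req.body),
  });
  // Assumes the upstream response body is JSON.
  return { status: upstream.status, body: await upstream.json() };
}
```

In a real deployment you would likely also validate the request body before forwarding it, so the endpoint stays as narrow as possible.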

At the same time, the proxy approach still has two disadvantages that are impossible to work around:

by proxying the requests, you add extra latency on top of the Speechify response time,
in addition to paying for the Speechify server time and traffic, you now pay a second time for the same on your hosting.

Storing Your API Keys

While we've established that you shouldn't use API Keys to authenticate requests on behalf of public clients, you still need to store your API Key server-side to be able to issue Access Tokens.

Please follow these security recommendations:

do NOT store your keys in your source code, even if the repository is private,
if you use an on-disk .env file for storing secrets (which is generally not recommended), do NOT commit it to your repository; always add it to .gitignore or the equivalent file for your version control system,
instead, DO use your hosting platform's tooling for secrets management. The specific approach depends on the PaaS or hosting that you use; for example, Vercel, Netlify, Cloud Run, and many other platforms support environment variables (and sometimes secrets) configurable from the platform UI or CLI.
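As an illustration of the last point, here is a minimal TypeScript sketch that reads the key from an environment variable populated by your hosting platform and fails fast when it's missing. The variable name SPEECHIFY_API_KEY and the loadApiKey helper are hypothetical choices; use whatever name you configure on your platform.

```typescript
// Hypothetical variable name; configure it in your platform's secrets / env settings.
const API_KEY_ENV_VAR = "SPEECHIFY_API_KEY";

// Reads the API Key from an environment map (e.g. process.env in Node.js).
function loadApiKey(env: Record<string, string | undefined>): string {
  const key = env[API_KEY_ENV_VAR]?.trim();
  if (!key) {
    // Fail fast at startup instead of sending unauthenticated requests later.
    throw new Error(`${API_KEY_ENV_VAR} is not set; configure it in your hosting platform's secrets settings`);
  }
  return key;
}
```

In Node.js you would typically call loadApiKey(process.env) once at server startup and keep the result in server memory only.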

How to Use API Keys

With all the nuances taken into account, the actual usage of API Key-based authentication is trivial: you only need to add the key to the Authorization header when querying the Speechify AI API:

Authorization: Bearer YOUR_API_KEY

This applies to every request, including the request to issue the access token for your users.
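For example, a server-side request authenticated with the API Key could look like the following TypeScript sketch. The callWithApiKey helper is hypothetical, the JSON Content-Type is an assumption, and the request body for each endpoint is left to the caller because its schema is not covered on this page.

```typescript
// Minimal fetch-compatible signature; the real global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const SPEECHIFY_API_BASE = "https://api.sws.speechify.com";

// Server-side (confidential client) request authenticated with the API Key.
async function callWithApiKey(
  apiKey: string,
  method: string,
  path: string,
  body: unknown,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(`${SPEECHIFY_API_BASE}${path}`, {
    method,
    headers: {
      Authorization: `Bearer ${apiKey}`, // same header for every request
      "Content-Type": "application/json",
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Speechify AI API request failed with status ${res.status}`);
  return res.json();
}
```

A call would look like callWithApiKey(apiKey, "POST", "/v1/audio/speech", requestBody, fetch), where requestBody follows the endpoint's own reference.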

Access Tokens

What are Access Tokens

Logically, access tokens are strings of characters that can be used to authenticate API calls. In that sense, they are similar to API Keys.

At the same time, they have several unique properties that make them suitable for client authentication, especially for public clients such as in-browser web apps or native mobile and desktop apps. Let's look at the differences:

Access Tokens are JSON Web Tokens (JWTs), which means that:
- they embed important information, such as the link to your account, as well as the token's own properties like lifetime and scope,
- they are cryptographically signed, which means the server can validate the token without a round-trip to the database, making API calls resolve faster,
Access Tokens have a limited lifetime, which means that an exposed token eventually expires and can no longer be used,
Access Tokens are issued with a limited usage scope (see below), letting you, for example, authorize the client app to make TTS calls without enabling it to delete your personal voices.
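To illustrate the first point, the claims embedded in a JWT can be read by decoding its middle (payload) part. The TypeScript sketch below relies only on the standard JWT structure and the standard exp claim (expiry as Unix time in seconds). Which other claims Speechify includes is not documented here, so treat anything beyond exp as an assumption. The sketch deliberately does not verify the signature: validating the token is the Speechify AI API's job, and decoded claims must never be trusted for authorization decisions on your side.

```typescript
// Decodes a JWT payload for inspection only. It does NOT verify the signature.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("A JWT has three dot-separated parts");
  // base64url -> base64, then restore the padding that base64url strips.
  let b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  // atob yields a binary string, which is fine for ASCII-only claims.
  return JSON.parse(atob(b64)) as Record<string, unknown>;
}

// Seconds left before expiry, based on the standard exp claim (Unix time, seconds).
function secondsUntilExpiry(token: string, nowMs: number = Date.now()): number {
  const exp = decodeJwtPayload(token)["exp"];
  if (typeof exp !== "number") throw new Error("Token has no numeric exp claim");
  return exp - Math.floor(nowMs / 1000);
}
```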

When to Use Access Tokens

You should always use access token-based auth if your code is running in a public client, such as an in-browser web app or a native mobile or desktop app. Access Tokens provide the necessary security while letting the public client talk directly to the Speechify AI API, saving you unnecessary costs related to your server's computation time and traffic.

You can also use tokens for pure server-to-server communication if you want to limit your actions to those intended, so that your code cannot accidentally perform a destructive operation (such as sending a DELETE request instead of POST).

When NOT to Use Access Tokens

There's truly no limitation: you can use access token-based authentication for any request to the Speechify AI API (save for the one used to issue the access token itself). Tokens can be used for both confidential client access (i.e. server-to-server) and public clients (browser-based web apps, native mobile and desktop apps).

The advantages of access tokens apply everywhere:

they're short-lived, which means that a leaked token only exposes you for a limited time,
they are scoped, which means that, even in theory, you cannot perform a destructive operation (e.g. deletion of a personal voice) with a token that is only authorized for creative activities (i.e. TTS),
their validation is faster as it doesn't require a trip to the database.

Just don't forget to refresh the token well before it expires. See the Refreshing Access Tokens section below for the detailed explanation and a Recipe.

There is one disadvantage of access token-based auth, though: there's no way to immediately revoke a leaked token. It will eventually expire, but because the token is not stored in the DB (see What are Access Tokens for the explanation), it cannot be revoked before then. This is the trade-off we accept in exchange for access tokens being validated almost immediately, without consulting a central authority (the database).

How to Use Access Tokens

With the access token API we're following the OAuth 2.0 Client Credentials flow. If you already know what it is, you're good. If not, let's look into it together.

For a client to use a token, you first have to issue the token by making a request from your server to the Speechify AI API. Please note that the only way to reliably secure the usage of the API on your behalf is to authenticate your app's end-users (see the End-User Authentication section below).

Issuing the Token

To issue a token, make a POST request to https://api.sws.speechify.com/v1/auth/token.

According to the OAuth 2.0 specification, the request body must be in the application/x-www-form-urlencoded format. But honestly (pssst!), we also accept JSON.

The grant_type param is mandatory and must be set to client_credentials, in conformance with the spec.

The scope param is optional, and we recommend always passing the minimal scope you need. It should contain either a single scope value or multiple space-separated values. See below for the list of scopes and the default value.

You must authorize this request using the API Key (which is why it must only be done from your server, i.e. a confidential client):

Authorization: Bearer YOUR_API_KEY

The response will look like this:

{
  "access_token": "abc.def.xyz",
  "token_type": "bearer",
  "expires_in": 3600,
  "scope": "audio:speech"
}

access_token is the token that you can use from now on to authorize the requests to the Speechify AI API.

token_type is always bearer, meaning you pass the token using the Bearer scheme; see the next section.

expires_in is the token lifetime in seconds, counted from the moment it was issued. At the time of writing, it's always 1 hour, or 3600 seconds.

scope repeats the scope passed when requesting the token, or the default scope if the param was omitted. See below for the explanation of scopes.
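Putting the request and response together, issuing a token from your server could look like this minimal TypeScript sketch. The issueAccessToken name and the expiresAtMs field are hypothetical additions. The URL, grant_type, space-separated scope, Authorization header and response fields are the ones described above. It sends the form-urlencoded body required by the spec rather than JSON.

```typescript
// Minimal fetch-compatible signature; the real global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shape of the token response documented above.
interface TokenResponse {
  access_token: string;
  token_type: string;
  expires_in: number; // seconds
  scope: string;
}

const TOKEN_URL = "https://api.sws.speechify.com/v1/auth/token";

// Server-side only: this request is authenticated with the API Key.
async function issueAccessToken(
  apiKey: string,
  scopes: string[],
  fetchImpl: FetchLike,
): Promise<TokenResponse & { expiresAtMs: number }> {
  const params = new URLSearchParams({ grant_type: "client_credentials" });
  // Omitting scope yields the default scope; passing the minimal one is recommended.
  if (scopes.length > 0) params.set("scope", scopes.join(" "));

  const requestedAtMs = Date.now();
  const res = await fetchImpl(TOKEN_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: params.toString(),
  });
  if (!res.ok) throw new Error(`Token request failed with status ${res.status}`);
  const token = (await res.json()) as TokenResponse;
  // expires_in is relative to issuance; store an absolute expiry for refresh logic.
  return { ...token, expiresAtMs: requestedAtMs + token.expires_in * 1000 };
}
```

Measuring expiresAtMs from the moment the request was sent errs slightly on the early side, which is the safe direction when scheduling refreshes.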

Authorizing Requests

This works exactly the same way as with API Keys, using the Authorization header:

Authorization: Bearer your.access.token

Refreshing Access Tokens

Access tokens expire; that's one of the distinguishing properties that make them usable for public client authentication.

This means that you need to refresh them regularly. There's no special procedure for that: you just request a new token when your old token is about to expire.

How do you know when it's about to expire? Add the token's expires_in value (in seconds), returned by the token endpoint, to the time you requested the token.

The recommended approach for a client application is as follows:

when your end-user authenticates, request the access token from your server and store it in memory,
similarly, when the authenticated user comes back (i.e. visits the webpage or launches the mobile app), request a new access token and store it in memory,
expose a reusable method to access this in-memory token and use it to authorize all requests coming from your client app to the Speechify AI API,
after successfully receiving the access token, immediately schedule an asynchronous procedure (e.g. using setTimeout in JS) to request a new one, for example, after half of the token's lifetime (in our example: 30 minutes),
let this async token refresh run, regularly obtaining a new token and storing it in memory, replacing the old token,
if your end-user logs out, stop the async refresh (it would fail anyway) and erase the previously stored token.
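The steps above can be sketched as a small TypeScript class. This is not the Recipe's code but a minimal illustration under a few assumptions: fetchToken is a hypothetical function that calls your own server's token endpoint (which in turn uses the API Key), the 10-second retry delay after a failed refresh is an arbitrary choice, and setTimeout/clearTimeout are the standard browser and Node.js timers.

```typescript
// Response of YOUR server's token endpoint (assumed to relay Speechify's fields).
interface TokenResponse {
  access_token: string;
  expires_in: number; // seconds
}

class AccessTokenManager {
  private token: string | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private generation = 0; // bumped on stop() to invalidate in-flight refreshes
  private readonly fetchToken: () => Promise<TokenResponse>;

  // fetchToken must call your own backend, never Speechify with an API Key.
  constructor(fetchToken: () => Promise<TokenResponse>) {
    this.fetchToken = fetchToken;
  }

  // Call when the end-user signs in or comes back to the app.
  async start(): Promise<void> {
    this.stop(); // drop any previous session state
    await this.refresh();
  }

  // Reusable accessor used to authorize every Speechify AI API request.
  getToken(): string {
    if (this.token === null) throw new Error("No access token: is the user signed in?");
    return this.token;
  }

  // Call on logout: stop the refresh loop and erase the stored token.
  stop(): void {
    this.generation++;
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    this.token = null;
  }

  private async refresh(): Promise<void> {
    const gen = this.generation;
    const res = await this.fetchToken();
    if (gen !== this.generation) return; // stopped or restarted while in flight
    this.token = res.access_token; // replaces the previous token in memory
    // Refresh after half of the lifetime, e.g. 30 minutes for expires_in = 3600.
    this.schedule((res.expires_in * 1000) / 2);
  }

  private schedule(delayMs: number): void {
    const gen = this.generation;
    this.timer = setTimeout(() => {
      this.refresh().catch(() => {
        // Keep the current, still valid token and retry after 10 s (arbitrary choice).
        if (gen === this.generation) this.schedule(10000);
      });
    }, delayMs);
  }
}
```

Typical usage: create the manager with a function that calls your backend, await manager.start() after sign-in, call manager.getToken() when building the Authorization header, and call manager.stop() on logout.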

Check our Recipe for the reference JS implementation:

[ADVANCED] Using access tokens for the client auth

Access Token Scopes

When requesting the token you can pass the optional scope parameter to the API.

The parameter can be a single scope value, or several values, separated by spaces.

The supported scopes are:

audio:speech
audio:stream
audio:all (combines the previous two)
voices:read
voices:create
voices:delete
voices:all (combines the previous three)

If you do not provide the scope, it defaults to audio:all voices:read, meaning that the token is authorized to make any audio-related calls (normal TTS and streaming) and to GET the list of available voices.
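If you want to check locally which permissions a given scope string grants (for example, before requesting a token), the rules above can be modeled as follows. This TypeScript sketch is only a client-side mirror of the scope list on this page, assumed complete at the time of writing. The effectiveScopes helper is hypothetical, and the Speechify AI API remains the authority on what a token may do.

```typescript
type Scope =
  | "audio:speech"
  | "audio:stream"
  | "audio:all"
  | "voices:read"
  | "voices:create"
  | "voices:delete"
  | "voices:all";

const KNOWN_SCOPES: Scope[] = [
  "audio:speech", "audio:stream", "audio:all",
  "voices:read", "voices:create", "voices:delete", "voices:all",
];

// Combined scopes and the individual scopes they stand for.
const COMBINED_SCOPES: { [combined: string]: Scope[] } = {
  "audio:all": ["audio:speech", "audio:stream"],
  "voices:all": ["voices:read", "voices:create", "voices:delete"],
};

// Applied by the API when the scope parameter is omitted.
const DEFAULT_SCOPE = "audio:all voices:read";

function isScope(value: string): value is Scope {
  return (KNOWN_SCOPES as string[]).indexOf(value) !== -1;
}

// Expands a space-separated scope parameter into the individual permissions it grants.
function effectiveScopes(scopeParam?: string): Set<Scope> {
  const raw = (scopeParam ?? "").trim() || DEFAULT_SCOPE;
  const granted = new Set<Scope>();
  for (const value of raw.split(/\s+/)) {
    if (!isScope(value)) throw new Error(`Unknown scope: ${value}`);
    for (const scope of COMBINED_SCOPES[value] ?? [value]) granted.add(scope);
  }
  return granted;
}
```

For example, effectiveScopes().has("voices:delete") is false, matching the default described above, while effectiveScopes("audio:all").has("audio:stream") is true.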

The token scope is checked on each relevant API call. For example, if you try to DELETE a voice and authenticate your request with an access token that doesn't have the relevant scope, your request will be rejected with 401 Unauthorized.

End-User Authentication

It is important to understand that unless you authenticate your app's end-users, there's no way for you to secure and control the Speechify AI API usage on your behalf. If, for example, your application is a user-facing web app and you provide a TTS generation endpoint without authentication, anybody can repeat the TTS request with the same or altered input, effectively using the Speechify AI API at your expense.
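In practice, this means the server endpoint that hands access tokens to your client (see Issuing the Token) should only do so for signed-in users. Below is a minimal TypeScript sketch of such a handler. The Session type, the issueToken function (assumed to wrap the token request made with your API Key) and the endpoint itself are hypothetical. The audio:speech scope is an example of requesting the minimal scope for a TTS-only client.

```typescript
// Hypothetical session object produced by YOUR authentication layer.
interface Session {
  userId: string;
}

interface TokenForClient {
  access_token: string;
  expires_in: number; // seconds
}

// Assumed to call the Speechify token endpoint server-side with the API Key.
type IssueToken = (scope: string) => Promise<TokenForClient>;

// Handler for a hypothetical endpoint on your server that clients call for tokens.
async function handleTokenRequest(
  session: Session | null,
  issueToken: IssueToken,
): Promise<{ status: number; body: unknown }> {
  if (session === null) {
    // No authenticated end-user: issuing a token here would let anyone use the API at your expense.
    return { status: 401, body: { error: "sign in required" } };
  }
  // Request only the minimal scope the client needs (TTS in this example).
  const token = await issueToken("audio:speech");
  return { status: 200, body: { access_token: token.access_token, expires_in: token.expires_in } };
}
```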