Overview

Our API is synchronous: you send a request with your input parameters to an endpoint, and the result comes back in the response to that same request. There is no separate job to poll.
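As a rough illustration of the synchronous pattern, the sketch below sends one request and reads the result from the same call. The endpoint URL, the bearer-token header, and the payload fields are placeholders assumed for this example, not values taken from this page; substitute the ones documented for your endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders: use your own endpoint URL, credentials and input schema.
ENDPOINT_URL = "https://api.example.com/v1/inference"
API_KEY = "YOUR_API_KEY"


def build_request(payload):
    """Build a JSON POST request for the (assumed) endpoint."""
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the service accepts a bearer token.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def call_endpoint(payload, timeout=120):
    """Send the request and return the decoded JSON response.

    urlopen blocks until the server replies, so the result is
    available as soon as this function returns.
    """
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as response:
        return json.loads(response.read())
```

Because inference can take a while, the sketch sets a generous client-side timeout; the right value depends on the model you run.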

Service API

⚙️ Create your first endpoint
🔎 Check your endpoint's performance

Inference API

😎 Run GPU inference on:

Automatic1111 (e.g. SD-XL 1.0, Stable Diffusion v2.1)
vLLM (e.g. Mistral, LLaMA-2, MPT)
Audio (speech-to-text) (e.g. Whisper)

Troubleshooting

Handling HTTP 429 Too Many Requests Error

The HTTP 429 status code indicates that too many requests were sent in a given period of time (“rate limiting”). Our service rejects excess requests to protect its resources from overload, typically when a single user or application sends too many requests in a short period.

Implement retry logic on the client side to handle responses with a 429 status code.
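One common way to implement such retry logic is exponential backoff with jitter. The following is a minimal sketch under stated assumptions: the function names, retry count, and delay limits are illustrative choices, and the `Retry-After` header is honored only if the server happens to send it, which this page does not confirm.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Return the number of seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: Retry-After, if present, is given in seconds.
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # It may also be an HTTP date; fall back to backoff.
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def post_with_retry(url, body, headers=None, max_retries=5,
                    send=urllib.request.urlopen, sleep=time.sleep):
    """POST `body` to `url`, retrying only when the server answers 429."""
    request = urllib.request.Request(
        url, data=body, headers=headers or {}, method="POST"
    )
    for attempt in range(max_retries + 1):
        try:
            with send(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and 429 once the retries are used up.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
```

The jitter spreads retries from many clients over time so they do not all hit the service again at the same moment. The `send` and `sleep` parameters exist only to make the function easy to test without network access or real waiting.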

Agree to and access a repository on Hugging Face

Some repositories (models) on Hugging Face require you to grant permission and share your contact information (email address and username) with the repository authors. When you open such a repository, you may see a message asking you to agree to these terms before you can access it.

Click to give your consent; this is required to use the model. After you click, a confirmation message appears.

Next, generate an access token in your Hugging Face account and follow the instructions.