Overview
Our API is synchronous: you send an API request with input parameters to our endpoint, and the result is returned in the response to that same request.
Service API
Inference API
😎 Run GPU inference on:
- Automatic1111 (e.g. SD-XL 1.0, Stable Diffusion v2.1)
- vLLM (e.g. Mistral, LLaMA-2, MPT)
- Audio (speech-to-text) (e.g. Whisper)
Troubleshooting
Handling HTTP 429 Too Many Requests Error
The HTTP 429 status code indicates that too many requests were sent in a given time period (“rate limiting”). Our service rejects excess requests to protect its resources from overload. This typically happens when a single user or application sends many requests in a short period.
Implement retry logic on the client side to handle responses with a 429 status code, for example by waiting before resending the rejected request.
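One common way to implement such retry logic is exponential backoff with jitter. The following is a minimal sketch using only the Python standard library, not an official client. The function name `post_with_retry`, the example URL, and the default attempt count and delays are illustrative assumptions. It also assumes the service may send a `Retry-After` header, which this page does not confirm; if the header is absent, the sketch falls back to its own backoff schedule.

```python
import random
import time
import urllib.error
import urllib.request


def post_with_retry(url, body, headers=None, max_attempts=5, base_delay=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """POST `body` (bytes) to `url`, retrying only when the server answers 429.

    `opener` and `sleep` are injectable so the function can be tested
    without network access or real waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    request = urllib.request.Request(
        url, data=body, headers=headers or {}, method="POST"
    )

    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Re-raise anything that is not rate limiting, and give up
            # once the last allowed attempt has failed.
            if error.code != 429 or attempt == max_attempts - 1:
                raise

            # Assumption: the server may suggest a wait time in seconds.
            retry_after = None
            if error.headers is not None:
                retry_after = error.headers.get("Retry-After")

            if retry_after is not None and retry_after.isdigit():
                delay = float(retry_after)
            else:
                # Exponential backoff (1x, 2x, 4x, ...) plus random jitter
                # so that many clients do not retry at the same moment.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)

            sleep(delay)
```

A call could look like `post_with_retry("https://api.example.invalid/v1/inference", b'{"prompt": "hello"}', headers={"Content-Type": "application/json"})`, where the URL and payload are placeholders to replace with your actual endpoint and parameters. Errors other than 429 are raised immediately, since retrying them unchanged is unlikely to help.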
Agreeing to terms and accessing a repository on Hugging Face
Some repositories (models) on Hugging Face require users to grant permission and provide their contact information (email address and username) to the repository authors. Upon accessing the repository, you may see the following message:
Click to give your consent. This is required to use the model. After clicking, the following message will appear:
Next, generate an access token in your Hugging Face account and follow the instructions.
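Once you have a token, you may want to confirm that it actually grants access to the gated repository before using it elsewhere. The sketch below is one possible check using only the Python standard library. It assumes the token is passed as a `Bearer` value in the `Authorization` header and that files are served under the Hub's `resolve` URL pattern. The helper names, the choice of `config.json` as the probe file, and the repository ID in the usage note are hypothetical.

```python
import urllib.error
import urllib.request


def build_gated_file_request(repo_id, filename, token, revision="main"):
    """Build an authenticated request for one file in a Hugging Face repository."""
    url = f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def can_access(request, opener=urllib.request.urlopen):
    """Return True if the request succeeds, False if access is denied.

    A 401 or 403 response is treated as "token missing, invalid, or consent
    not yet granted". Any other HTTP error is re-raised to the caller.
    `opener` is injectable so the check can be tested without network access.
    """
    try:
        with opener(request) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as error:
        if error.code in (401, 403):
            return False
        raise
```

For example, `can_access(build_gated_file_request("some-org/some-gated-model", "config.json", token))` would return `False` if consent has not been given for that repository or the token is not valid. Read the token from an environment variable or secret store rather than hard-coding it.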