Endpoints

Create first Endpoint

The endpoints are fully managed and scale to handle any AI workload. They are designed for various applications and support environments such as Automatic1111, vLLM and Whisper. Each endpoint is powered by a number of GPU workers that depends on the current load.

To configure your initial endpoint, you will require:

an active modelserve AI account
a generated Access Token
sufficient funds (in USD) in your account

Discover more in the 🚀 Quickstart section.

To create your first AI Endpoint, send a POST request to the endpoint below:

https://api.modelserve.ai/api/v1/clusters/

curl python javascript

curl -s -X POST \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer X' \
     -d '{"name": "string", "gpu_segment": 0, "package_type": "automatic", "model_repo_name": "string", "model_url": "string", "huggingface_auth_token": "string", "startup_script_url": "string"}' \
     'https://api.modelserve.ai/api/v1/clusters/'

import requests

r = requests.post(
    "https://api.modelserve.ai/api/v1/clusters/",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer X",
    },
    # json= serializes the body as JSON, matching the Content-Type header
    json={
        "name": "string",
        "gpu_segment": 0,
        "package_type": "automatic",
        "model_repo_name": "string",
        "model_url": "string",
        "huggingface_auth_token": "string",
        "startup_script_url": "string",
    },
)

fetch('https://api.modelserve.ai/api/v1/clusters/', {
  "method": "POST",
  "headers": {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer X"
  },
  "body": JSON.stringify({"name": "string", "gpu_segment": 0, "package_type": "automatic", "model_repo_name": "string", "model_url": "string", "huggingface_auth_token": "string", "startup_script_url": "string"})
});

Values:

"name": "string" - custom name for the endpoint
"gpu_segment": 0 - segment ID (learn more in the Segments section)
"package_type": "string" - type of environment; choose one of:
- automatic - Automatic1111
- vllm - vLLM
- speech2text - audio (e.g. Whisper)
"model_repo_name": "string" - name of the model repository on Hugging Face (for example, mistralai/Mistral-7B-v0.1)
"model_url": "string" - URL of the model file on Hugging Face
"huggingface_auth_token": "string" (optional) - Hugging Face authentication token
"startup_script_url": "string" (optional) - URL of an additional script that modifies the environment

Example payloads:

{
    "name": "My own Stable Diffusion",
    "gpu_segment": 8,
    "package_type": "automatic",
    "model_url": "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt"
}

{
    "name": "My own Mistral",
    "gpu_segment": 9,
    "package_type": "vllm",
    "model_repo_name": "mistralai/Mistral-7B-v0.1"
}

{
    "name": "My own Whisper",
    "gpu_segment": 9,
    "package_type": "speech2text",
    "model_repo_name": "openai/whisper-tiny"
}

⚠️ Use model_repo_name for models running on vLLM and Whisper, and model_url for models running on Automatic1111.
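
The rule above can be captured in a small helper. This is a minimal sketch, not part of any official client: the function name build_endpoint_payload and the MODEL_FIELD_BY_PACKAGE mapping are hypothetical, while the field names and package types are the ones listed in the Values section.

```python
# Hypothetical helper (not part of the modelserve API): picks the model field
# that matches the package type, following the rule stated above.
MODEL_FIELD_BY_PACKAGE = {
    "automatic": "model_url",          # Automatic1111
    "vllm": "model_repo_name",         # vLLM
    "speech2text": "model_repo_name",  # Whisper
}


def build_endpoint_payload(name, gpu_segment, package_type, model):
    """Return a request body for POST /api/v1/clusters/."""
    if package_type not in MODEL_FIELD_BY_PACKAGE:
        raise ValueError(f"unknown package_type: {package_type!r}")
    return {
        "name": name,
        "gpu_segment": gpu_segment,
        "package_type": package_type,
        MODEL_FIELD_BY_PACKAGE[package_type]: model,
    }


payload = build_endpoint_payload(
    "My own Mistral", 9, "vllm", "mistralai/Mistral-7B-v0.1"
)
print(payload)
```

The resulting dictionary can be sent as the JSON body of the create request. Rejecting unknown package types locally is a design choice of this sketch; the API may report such errors itself.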

Display list of Endpoints

After creating your first Endpoint, you can list your Endpoints to obtain its ID. The list also shows each Endpoint's settings and current status.

To display the list of Endpoints, send a GET request to the endpoint below:

https://api.modelserve.ai/api/v1/clusters/

curl python javascript

curl -s -X GET \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer X' \
     'https://api.modelserve.ai/api/v1/clusters/'

import requests

r = requests.get(
    "https://api.modelserve.ai/api/v1/clusters/",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer X",
    },
)

fetch('https://api.modelserve.ai/api/v1/clusters/', {
  "method": "GET",
  "headers": {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer X"
  }
});

Response (JSON)

"results": [
    {
        "id": "eca2785b-d094-432b-8722-6c7d2f957eb2",
        "name": "SD 1.5",
        "gpu_segment": 3,
        "package_type": "automatic",
        "status": "terminated",
        "model_repo_name": null,
        "model_url": "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt",
        "startup_script_url": null,
        "address": "https://6c7d2f957eb2.app.modelserve.ai",
        "running_workers": [],
        "created_at": "2024-01-22T09:24:19.947509Z",
        "last_update": "2024-01-22T12:38:42.907606Z"
    }
]

Example values:

"id": "eca2785b-d094-432b-8722-6c7d2f957eb2" - unique ID of the endpoint
"name": "SD 1.5" - name of the endpoint
"gpu_segment": 3 - segment ID
"package_type": "automatic" - type of environment
"status": "terminated" - current status
"model_repo_name": null - name of the model repository on Hugging Face
"model_url": "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt" - URL of the model file on Hugging Face
"startup_script_url": null - URL of an additional script that modifies the environment
"address": "https://6c7d2f957eb2.app.modelserve.ai" - URL of your endpoint
"running_workers": [] - list of workers currently running for the endpoint
"created_at": "2024-01-22T09:24:19.947509Z" - date and time the endpoint was created
"last_update": "2024-01-22T12:38:42.907606Z" - date and time the endpoint was last updated
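
The created_at and last_update values are ISO 8601 timestamps with a trailing Z (UTC). As a rough sketch, they can be parsed with the Python standard library; the helper name parse_timestamp is hypothetical, and the two timestamps are the ones from the example response.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2024-01-22T09:24:19.947509Z."""
    # Replacing the trailing "Z" keeps this working on Python versions
    # older than 3.11, where fromisoformat() does not accept it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp("2024-01-22T09:24:19.947509Z")
updated = parse_timestamp("2024-01-22T12:38:42.907606Z")

# Time between creation and the last update of the endpoint
print(updated - created)
```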

The most important values are:

"id": "eca2785b-d094-432b-8722-6c7d2f957eb2" - unique ID of the endpoint
"address": "https://6c7d2f957eb2.app.modelserve.ai" - URL of your endpoint
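
A short sketch of how these two values could be pulled out of the list response. It assumes the response body is a JSON object with a top-level "results" key, as the example response suggests; the function name endpoint_ids_and_addresses is hypothetical, and the sample data is trimmed from the example response.

```python
def endpoint_ids_and_addresses(response_json):
    """Map each endpoint ID to its address from a list response."""
    # Assumption: endpoints are listed under a top-level "results" key.
    results = response_json.get("results", [])
    return {item["id"]: item["address"] for item in results}


# Trimmed copy of the example response, used here instead of a live API call
sample = {
    "results": [
        {
            "id": "eca2785b-d094-432b-8722-6c7d2f957eb2",
            "name": "SD 1.5",
            "status": "terminated",
            "address": "https://6c7d2f957eb2.app.modelserve.ai",
        }
    ]
}

for endpoint_id, address in endpoint_ids_and_addresses(sample).items():
    print(endpoint_id, address)
```

With the requests example above, the same function could be called as endpoint_ids_and_addresses(r.json()).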

Remember to replace X in "Bearer X" with your real Access Token. To find out where to get your Access Token, see the 🚀 Quickstart section.
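
One way to avoid hard-coding the token is to read it from an environment variable and build the headers in one place. This is only a sketch: the function name auth_headers and the variable name MODELSERVE_ACCESS_TOKEN are assumptions, not something defined by the API.

```python
import os


def auth_headers(token=None):
    """Build the request headers used in the examples above."""
    # MODELSERVE_ACCESS_TOKEN is a hypothetical variable name for this sketch.
    token = token or os.environ.get("MODELSERVE_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("no Access Token provided")
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


headers = auth_headers("example-token")
print(headers["Authorization"])
```

The returned dictionary can be passed as the headers argument of requests.get or requests.post.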