vLLM

vLLM provides an HTTP server that implements OpenAI's Completions and Chat APIs.

Use the API with vLLM to send your first inference request. Take your endpoint address and append one of the following paths:

Chat Completions

/v1/chat/completions

Example: https://71bd1b92256e.app.modelserve.ai/v1/chat/completions

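Chat Completions are a specialized type of completion model designed to respond to messages within the context of a conversation. These models are optimized for dialogue and are trained to take earlier messages in the conversation into account.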
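As a minimal sketch of how conversational context is carried, the snippet below keeps the history as a list of messages and resends the whole list on every turn. The helper name `add_turn` and the sample dialogue are hypothetical; the `role`/`content` message shape follows the payload format used on this page.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended (hypothetical helper)."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role}")
    return history + [{"role": role, "content": content}]


history: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "Who won the World Cup in 2022?")
# The assistant reply would normally come from the API response; it is hard-coded here.
history = add_turn(history, "assistant", "Argentina won the 2022 World Cup.")
history = add_turn(history, "user", "Who did they beat in the final?")

# The full history is sent as "messages", so the model can resolve "they".
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "max_tokens": 300,
    "temperature": 0,
    "messages": history,
}
```

Because the server does not appear to store conversation state between requests in this setup (an assumption based on the OpenAI-style API), the client is responsible for keeping and trimming the history.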

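Typical uses of Chat Completion models include:

Customer support: automatically respond to customer inquiries, resolve issues, and provide information.
Virtual assistants: help manage tasks, answer questions, and support everyday activities.
Education and training: learning tools that answer students' questions and deliver interactive lessons.
Entertainment: create game characters that hold realistic conversations with players.

An example Chat Completions API request looks like this:

Payload: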

{
  "model": "mistralai/Mistral-7B-Instruct-v0.3",
  "max_tokens": 300,
  "temperature": 0,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Who won the World Cup in 2022?"
    }
  ]
}

curl:

curl -s -X POST \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer X' \
     -d '{"model": "mistralai/Mistral-7B-Instruct-v0.3", "max_tokens": 300, "temperature": 0, "messages": [{"role": "system","content": "You are a helpful assistant."},{"role": "user","content": "Who won the World Cup in 2022?"}]}' \
     'https://{address}/v1/chat/completions'

Python:

import requests

# Replace {address} with your endpoint address and X with your access token.
r = requests.post(
    "https://{address}/v1/chat/completions",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer X",
    },
    # json= serializes the body as JSON; data= would send it form-encoded.
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "max_tokens": 300,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Cup in 2022?"},
        ],
    },
)

JavaScript:

fetch('https://{address}/v1/chat/completions', {
  "method": "POST",
  "headers": {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer X"
  },
  "body": JSON.stringify({"model": "mistralai/Mistral-7B-Instruct-v0.3", "max_tokens": 300, "temperature": 0, "messages": [{"role": "system","content": "You are a helpful assistant."},{"role": "user","content": "Who won the World Cup in 2022?"}]})
});

Remember to replace "Bearer X" with your actual access token. For details on where to find your access token (Bearer), see the Quickstart section.
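A small sketch of the two steps around the request: reading the token from an environment variable instead of hard-coding it, and pulling the reply text out of the response. The variable name `MODELSERVE_TOKEN` and both helper names are assumptions for illustration; the `choices[0].message.content` path is the OpenAI Chat Completions response shape, which this server is stated to implement.

```python
import os
from typing import Any, Dict


def auth_headers(env_var: str = "MODELSERVE_TOKEN") -> Dict[str, str]:
    """Build request headers from a token stored in an environment variable.

    The variable name is an assumption; use whatever fits your deployment.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set {env_var} to your access token")
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


def extract_reply(response_json: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]
```

With `requests`, these would typically be combined as `extract_reply(requests.post(url, headers=auth_headers(), json=payload).json())`, where `url` and `payload` are the endpoint and request body shown above.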

🚀 Quickstart

Click the link below to learn more about what the vLLM API can do. Also check the Notebooks section, where you will find ready-made solutions for testing.

Learn more

Embeddings

/v1/embeddings

Example: https://71bd1b92256e.app.modelserve.ai/v1/embeddings

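Embeddings models convert text into numerical vectors (called "embeddings") that represent the meaning of the text.

Typical applications of Embeddings models include:

Semantic search: find texts that are semantically similar to a given query.
Text classification: assign categories based on the meaning of the text.
Sentiment analysis: assess the sentiment expressed in the text.
Text clustering: group texts with similar content.
Text comparison: measure the degree of similarity between different text fragments.
Recommendations: suggest content based on its similarity to other texts.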
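Most of the applications above reduce to comparing vectors. The sketch below uses cosine similarity, a common choice for this, to rank documents against a query. The function names and the toy 3-dimensional vectors are illustrative assumptions; real embeddings returned by the API have many more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    documents: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Sort (text, vector) pairs from most to least similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
docs = [
    ("cats", [1.0, 0.0, 0.0]),
    ("dogs", [0.8, 0.6, 0.0]),
    ("tax law", [0.0, 0.0, 1.0]),
]
ranking = rank_by_similarity([1.0, 0.1, 0.0], docs)
```

Here `ranking` lists the documents from closest to farthest from the query vector. For larger collections, a vector index or a numerical library would usually replace the pure-Python loops.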

An example Embeddings API request looks like this:

Payload:

{
  "input": "Your text string goes here"
}

curl:

curl -s -X POST \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer X' \
     -d '{"input": "Your text string goes here"}' \
     'https://{address}/v1/embeddings'

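Python:

import requests

# Replace {address} with your endpoint address and X with your access token.
r = requests.post(
    "https://{address}/v1/embeddings",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer X",
    },
    # json= serializes the body as JSON; data= would send it form-encoded.
    json={"input": "Your text string goes here"},
)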

JavaScript:

fetch('https://{address}/v1/embeddings', {
  "method": "POST",
  "headers": {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer X"
  },
  "body": JSON.stringify({"input": "Your text string goes here"})
});

Remember to replace "Bearer X" with your actual access token. For details on where to find your access token (Bearer), see the Quickstart section.
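Once the request succeeds, the vectors have to be read out of the response body. The sketch below assumes the OpenAI-style embeddings response, in which each entry of `data` carries an `index` and an `embedding`; the helper name and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def extract_embeddings(response_json: Dict[str, Any]) -> List[List[float]]:
    """Return embedding vectors in input order from an OpenAI-style response."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample shaped like an embeddings response; values are made up.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
vectors = extract_embeddings(sample)
```

Sorting by `index` keeps each vector aligned with its input when several texts are sent in one request.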
