メインコンテンツまでスキップ

test-models-with-playground

Playground is an interactive environment that allows you to test deployed models in real time and verify their performance.

You can run tests directly from your browser without writing any code.

  • By default, it supports the OpenAI Compatible API.
  • For other formats, you can use the Custom API Request feature to perform tests.

Getting Started

In the list of deployed models, click the Action menu on the right and select Playground.

  • Playground only supports models that are already deployed.
  • If your model has not been deployed yet, please first complete the deployment process by following the user guide to Deploy a Model to a Cluster.

  • You can only test models in Playground that have been fully deployed.
  • If the Status in the Inference Service List is not Running, the Playground button will be disabled.

Key Features of Playground

  • **Automatic Registration:** When you select a model type, it is automatically registered in Playground.
  • **Instant Testing:** Check model responses without writing any code.
  • **Real-Time Responses:** For chat and text models, responses can be viewed via streaming.
  • **Supports Multiple Types:** Compatible with Chat, Text, Embedding, Image, Audio, and Custom API models.
  • **Parameter Adjustment:** You can configure settings such as Temperature, Max Tokens, and more.

Testing Process

Step 1: Select a Model

  • Choose the model you want to test from the Inference Service List.
  • On the model detail page, click Open in Playground.

Step 2: Register the Model

  • The model will be automatically registered through the Model Registration process.

  • Once registration is complete, click Test Model.

  • If you wish to cancel the registration, click Disconnect instead.

  • You can click “Re-register” to register the model again.

  • If the model isn’t running properly, registration may fail.
  • If the model is already registered, the Test Model button will appear immediately.

Step 3: Select the Model Type

Playground supports the following six model types:

TypeDescription
🗣️ ChatConversational models
✏️ Text CompletionText auto-completion
🧩 EmbeddingSentence-to-vector conversion
🔧 Custom APICustom API requests and testing

Click the type that matches your model and proceed with the test.

If you select a type that doesn’t match your model, it will not respond correctly.

For example: Using Chat type for a Stable Diffusion model → ❌


Step 4: Adjust Model Options (Optional)

If needed, you can modify the model settings and parameters.

Not all models support every available setting.

Chat Model Testing

Chat model testing is used to validate models with conversational structures such as GPT and LLaMA. Chat models are specialized for multi-turn conversation scenarios, remembering previous exchanges as context and generating the most appropriate and consistent responses to the user's current questions. Users can also assign specific roles or behaviors to the model through the 'System Prompt'.

Context

Chat models use conversation history as context and rely on the Attention Mechanism and System Prompt to generate appropriate responses and maintain a consistent persona.

Purpose 

Allows user to test the model via a web interface to evaluate accuracy, consistency, creativity, and to define the AI's role using the System Prompt.

How to Use

  1. Select the Chat model type and open the interface
  2. Enter your message and send
  3. Press Enter or click the send button
  4. View the model’s response

Adjusting Model Parameters

By clicking the settings button in the upper left corner, you can fine-tune various parameters to control the model’s responses:

ParameterRange/OptionsDescriptionRecommended Value
System PromptTextInstructions defining AI’s role and behavior-
Temperature0.0~2.0Controls creativity consistency (lower = more consistent)0.7 (general chat)
Max Tokens100~20,000Maximum number of tokens to generate (length limit)4,000 (medium length)
Top P0.0~1.0Controls diversity (used with Temperature)0.9~1.0
Frequency Penalty-2.0~2.0Discourages repetitive words/phrases0.1~0.3
Presence Penalty-2.0~2.0Encourages introduction of new topics0.0~0.2
Stop SequencesArray of textStops generation at specified text (can set multiple)-
SeedNumber (optional)Seed value for reproducible results-
MetadataKey-value pairsAdditional info for logging/analysis (optional)-
  • Max Tokens: Be careful not to exceed the maximum token limit for your model.
  • Stop Sequences: If any stop sequence is encountered, generation will halt.
  • Seed: Using the same input and seed will produce repeatable results.

Text Completion Test

Text Completion is used to test models that automatically complete sentences after a prompt is given. Text Completion models analyze the patterns and context of the user-provided input text (prompt) and continuously generate the most probable next words and phrases. Through this process, the model can automatically complete sentences, paragraphs, or even long texts.

Context

Text Completion models are based on auto-regressive generation, sequentially generating the next token using the model’s predicted token probability distribution. During this process, parameters intervene in token sampling to control the diversity of the response—this is the theoretical core.

Purpose 

Users input a prompt to automatically generate subsequent sentences and verify the quality of the generated text. In particular, the Suffix parameter can be used to add a suffix after the generated text, allowing tests that guide text completion in a specific direction.

How to Use

  1. Select the Text Completion type.

  2. Enter your prompt.

  3. Click Generate and the model will automatically produce the continuation of your prompt.

  4. You can copy the result or regenerate as needed.

Adjusting Model Parameters

By clicking the settings button in the upper left, you can fine-tune the response parameters for the Text Completion model.

ParameterRange/OptionsDescriptionRecommended Value
Temperature0.0~2.0Controls creativity and consistency1.0 (creative)
Max Tokens100~4,000Maximum number of tokens to generate1,000 (medium length)
Top P0.0~1.0Controls diversity (used with Temperature)1.0
Frequency Penalty-2.0~2.0Discourages repeated words/phrases0.0~0.3
Presence Penalty-2.0~2.0Encourages introduction of new topics0.0~0.2
SuffixText (optional)Adds a suffix to the generated text-
Stop SequencesArray of textStops generation at specified text-
Streamtrue/falseEnables real-time streaming for long texttrue (recommended)

Suffix Usage Examples

Use CaseSuffix ExampleEffect
Request a summary"In summary:"Guides the model to produce a summary
Conclusion"In conclusion,"Finish the text with a logical conclusion
Specific Format"### Main Points:"Generates text in a structured format
  • Stop Sequences: Use \\n\\n, ###, or end to create natural stopping points.
  • Stream: Enable streaming to view long text as it’s generated in real time.
  • Suffix: Use a suffix to guide the model’s completion direction.

Embedding Generation Test

Embedding is used to test models that convert text into meaningful vector values. The model transforms input text into a numerical array in high-dimensional space, where these vector values represent the semantic features of the text. The more similar the vectors, the more similar the original texts are in meaning.

Context

Embedding is based on the theory of distributed representation, converting the meaning of words or sentences into numerical vectors in a high-dimensional space. In this space, similarity measures such as “Cosine Similarity” are used to calculate the similarity between vectors, indicating the semantic similarity between the original texts.

Purpose 

By inputting single or multiple sentences, users can examine and verify the vector values generated by the model and the similarities between sentences. Additionally, parameters like Encoding Format and Dimensions can be adjusted to explore the optimal vector form for the intended environment.

How to Use

  1. Select the Embedding type

  2. Input one or multiple sentences (separate multiple sentences with line breaks)

  3. Click Generate Embeddings

  4. Check the vector values and similarity scores

Adjusting Model Parameters

You can customize the parameters for embedding vector generation.

ParameterOptions/RangeDescriptionRecommended Value
Encoding Formatfloat / base64Vector encoding formatfloat (default)
Dimensions512, 1024, 1536, 3072Number of vector dimensions (model-dependent)Model default
User IdentifierText (optional)Identifier for request tracking/monitoring-

Encoding Format Comparison

FormatCharacteristicsUse Cases
FloatArray of numbers, intuitiveDevelopment/debugging, manual computation
Base64Compressed string, efficientStorage/transmission, API responses
Example Use Cases:
  • Semantic Search: Find documents with similar meaning
  • Similarity-Based Recommendations: Content recommendation systems
  • Document Clustering: Group similar documents together
  • Duplicate Detection: Identify documents with similar content

  • Dimensions: Only some models support dimension customization
  • Encoding Format: Impacts API response size (Base64 is more efficient for large data)

Custom API Testing

Custom API is an advanced feature that enables you to configure and send HTTP requests directly. It is especially useful for testing specialized APIs or custom endpoints that are not covered by the standard model types.

Context

Custom API testing is based on RESTful architecture and HTTP protocol. Users manually set HTTP methods (GET, POST, etc.), headers, and body to send requests to the model server’s endpoint, then analyze the status code and response data to verify correct API operation.

Purpose 

It allows user to quickly verify and debug request/response structures during API development by checking endpoint behavior and response handling in a REST API environment. Complex requests can be easily built in a GUI.

How to Use

  1. Select the Custom API type

  2. Choose a predefined endpoint or enter a custom URL

  3. Configure your request (set headers, body, and files as needed)

  4. Click Send Request and review the results

Supported Endpoints

Custom API supports 8 predefined endpoints as well as custom endpoints.

EndpointDescriptionHTTP MethodBody Type
Chat CompletionsConversational model APIPOSTJSON
Text CompletionsText auto-completion APIPOSTJSON
EmbeddingsSentence vector transformationPOSTJSON
Image GenerationsText-to-image generation APIPOSTJSON
Audio Speech (TTS)Text-to-speech conversion APIPOSTJSON
Audio Transcriptions (STT)Speech-to-text conversion APIPOSTForm Data
Audio TranslationsSpeech translation APIPOSTForm Data
Custom...User-defined endpointSelectableSelectable

Request Configuration

Selecting HTTP Method

MethodDescriptionTypical Use Case
GETRetrieve dataCheck model information, view status
POSTCreate/process dataModel inference, file upload
PUTUpdate all dataUpdate settings
DELETEDelete dataRemove resources
PATCHUpdate partial dataChange specific settings
HEADView headers onlyStatus check
OPTIONSCheck supported methodsCORS preflight requests

**Headers **

  • **Automatic headers:** Authorization, Content-Type, and X-Model-Name are added by default.
  • **Custom headers:** Add additional headers as needed.
  • **API key management:** Easily copy the API key from the Authorization header.

Body Type Settings

Body TypeDescriptionUse Case
NoneNo request bodyGET, HEAD, OPTIONS requests
JSONJSON formatted dataMost API requests
Form DataForm dataFile uploads, form submissions
TextPlain textSimple text transmission

File Upload

  • **Supported formats:** All file types
  • Size limits: Limits based on browser and server settings
  • **Multiple files:** Upload several files at once

Response Handling

Response Modes

ModeDescriptionUse Case
FormattedStructured JSON tree viewAnalyzing complex JSON responses
RawOriginal text formViewing simple responses

JSON Tree View Features

  • **Auto-expand:** All nodes are expanded by default
  • **Copy individual values:** Easily copy data from specific nodes
  • **Color coding by type:** Strings (green), numbers (blue), booleans (purple), etc.
  • **Nested structures:** Objects and arrays are visually represented

Response Statistics

ItemDescriptionDisplay Format
HTTP StatusResponse status code200, 404, 500, etc.
DurationProcessing time1.23s, 500ms, etc.
SizeResponse size1.2KB, 5.4MB, etc.
TimestampRequest time2024-01-15 14:30:25

Advanced Features

cURL Command Generation

  • **Auto-generation:** Requests can be converted to cURL commands
  • **Clipboard copy:** Copy the generated cURL command with one click

URL Management

  • **Real-time URL display:** See the final request URL as you configure
  • **URL copy:** Copy the completed URL to your clipboard

API Key Management

  • **Auto-injection:** The API key is automatically added to the Authorization header
  • **Key copy:** Copy just the API key separately
  • Development: Quickly verify request and response formats while developing APIs
  • Debugging: Analyze unexpected responses in detail
  • Documentation: Document API usage with generated cURL commands
  • Testing: Validate API behavior across different parameter combinations

  • API Key Security: Be careful not to expose your API keys
  • Request Limits: Watch for API server rate limits
  • File Size: Check network status for large file uploads
  • Endpoint Accuracy: Incorrect endpoints may result in 404 error

Model Usage – Internal Connection Endpoint

The internal Inference Service endpoint is a URL that can be used only within the GPU Cluster. Depending on the deployed model, it may have different API address patterns and request formats, and it cannot be used from outside the cluster.

Context

To enable fast and secure model invocation from other workloads running within the same cluster where the model is deployed (e.g., training servers, batch servers).

Purpose 

Minimize intra-cluster network latency and ensure security by performing model inference without external exposure.

Main Function

  • An internal Inference Service endpoint usable only inside the GPU cluster.
  • Supports multiple endpoint patterns depending on the deployed model type and runtime.
  • No separate API key authentication is required.

How to check

The Endpoint field in the model list shown when you navigate to Model Serving > Inference Service List from the home page or the left navigation menu.

Precautions

  • For internal endpoints, the endpoint URL and request format can vary depending on the deployment runtime, model type, and configuration.
  • Usable only within the GPU cluster.

Model Usage – External Connection Endpoint

The external connection endpoint shown in the Playground is a URL reachable from outside the cluster where the model is deployed. This endpoint is guaranteed to work only for OpenAI API–compatible models.

Context

Used when you need to access the model inference API from external web services, mobile apps, or development environments outside the cluster.

Purpose 

To call the model API from outside the cluster for integration with external services, and to perform OpenAI-compatible API communication in the same way as the Playground.

Main Function

  • An external access URL discoverable via the Playground and reachable from outside the cluster.
  • Guaranteed to work only for OpenAI API–compatible models.
  • An API key is mandatory for access.

How to check

  • Endpoint: Open Playground from Model Serving > Inference Service List > Action, choose Custom API Request, then check the endpoint shown in Request Configuration.
  • API Key: On the Custom API Request screen, in Headers, click the copy button next to Authorization.

Precautions

  • When using the external connection endpoint, you must specify the model name in the request body to distinguish deployed models (per the OpenAI API spec).
  • Because it follows the OpenAI API spec, the external connection address does not support endpoints that are not available in OpenAI.