What is GPU Management?
The GPU management function lets operators monitor the status of all GPU components through the GPU Dashboard, and tracks GPU usage through physical and logical capacity, pool requests, and allocations of GPU resources.
You can manage the entire lifecycle, from GPU pool creation → assignment → optimization → monitoring, through a single interface.
Key Features
- GPU Capacity: Increase utilization by managing the actual capacity of physical GPU hardware as a pool.
  - NodeCapacity: Overall capacity of each GPU node
  - GpuCapacity: Details of individual GPU devices
  - Profile: Maps capacity to the pods actually assigned to it (workload tracking)
  - SpecAndCount: Used to manage provisioned MIG or GPU resources
- GPU Pool Management: Group or separate GPUs by workload type and manage as pools at the project level.
- GPU Monitoring: Measure GPU usage by power consumption, SM activity, and memory errors
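As an illustration of how node-level and device-level capacity relate, the following is a minimal sketch in Python. The class names mirror the NodeCapacity and GpuCapacity objects listed above, but every field (`uuid`, `model`, `memory_gb`, `allocated`) and method is a hypothetical choice made for this example, not the actual GPUaaS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GpuCapacity:
    """One physical GPU device (all fields are hypothetical)."""
    uuid: str
    model: str
    memory_gb: int
    allocated: bool = False


@dataclass
class NodeCapacity:
    """Aggregate capacity of one GPU node (hypothetical structure)."""
    node_name: str
    gpus: List[GpuCapacity] = field(default_factory=list)

    def total(self) -> int:
        # Number of physical GPUs installed in the node.
        return len(self.gpus)

    def free(self) -> int:
        # GPUs that are not yet allocated to any pool.
        return sum(1 for gpu in self.gpus if not gpu.allocated)

    def utilization(self) -> float:
        # Share of GPUs on this node that are currently allocated.
        if not self.gpus:
            return 0.0
        return 1 - self.free() / self.total()


# Example: an 8-GPU node with 6 GPUs already allocated.
node = NodeCapacity(
    node_name="gpu-node-01",
    gpus=[GpuCapacity(f"GPU-{i}", "H100", 80, allocated=(i < 6)) for i in range(8)],
)
print(node.total(), node.free(), node.utilization())  # 8 2 0.75
```

Deriving the node-level figures from the per-device records keeps the two views consistent, which is one plausible way to support the pool-based utilization tracking described above.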
Architecture References
GPUs in NVIDIA H100 and H200 configurations are managed through the GPU management stack provided by NVIDIA, and the GPU lifecycle (container scheduling, per-workload resource allocation, isolation, etc.) is managed in an integrated manner on Kubernetes. Allocated GPUs therefore follow the same workload principles as any other Kubernetes resource (for example, resources are allocated when an application is deployed).
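Because allocated GPUs behave like any other Kubernetes resource, a workload typically requests them through container resource limits. The sketch below builds such a Pod manifest in Python. It assumes the cluster exposes GPUs under the `nvidia.com/gpu` extended resource name, which is the name the NVIDIA device plugin advertises; whether GPUaaS uses that name is an assumption, and the pod name and image are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1, resource_name="nvidia.com/gpu"):
    """Build a Pod manifest (as a dict) that requests GPUs via resource limits."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {resource_name: gpu_count}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("train-job", "example.registry/trainer:latest", gpu_count=2)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.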

| Layer | Element | Role | Feature |
|---|---|---|---|
| Physical | Kubernetes Node | Physical server | The physical infrastructure that hosts the server resources |
| Physical | GPU Hardware | Physical GPU | Performs AI-related computation |
| Logical | Kubernetes API Server | API server | Control over Kubernetes |
| Logical | Operator | GPU management | GPU resource management and provisioning |
| Logical | MCM (Multi-Cluster Management) | Cluster management | Management of Kubernetes clusters and GPU resources |
| Logical | Capacity | Capacity management | Resource management objects between the GPU pool and the physical GPU hardware |
| Logical | GPU Pool | Usage management | A managed object that is assigned to an end user as a logical resource |
GPU Management Method
GPUaaS manages GPU resources in two forms: full physical GPUs and MIG instances. MIG is a hardware-based GPU partitioning technology, introduced by NVIDIA with the A100, that divides a physical GPU into independent GPU instances.
MIG (Multi-Instance GPU) Provisioning
Multi-Instance GPU (MIG) is an NVIDIA technology that allows a single GPU to be partitioned into up to seven independent GPU instances. Each instance has dedicated memory, cache, and streaming multiprocessors, and operates in isolation at the hardware level. This provides a secure multi-tenant environment in which a failure on one instance does not affect the others.

- Maximize resource efficiency
  - Multiple users can share one expensive GPU (e.g., A100) at the same time
  - Cost-effective for workloads that don't require full GPU performance
  - Improves ROI by minimizing idle GPU resources
- Multi-tenancy support
  - Securely share the same GPU across multiple projects or teams
  - Isolation between instances (memory, compute resources)
  - A failure in one instance does not affect the others
- Custom assignment of workloads
  - Inference tasks: Sufficient performance even with a small amount of memory
  - Dev/Test: Rapid iterative development with small resources
  - Batch processing: Run multiple small jobs in parallel
- Improve operational efficiency
  - Reduced GPU queue latency (more instances available)
  - Increased flexibility in scheduling and allocating resources
  - Accurate cost allocation based on usage
- Scalability and ease of management
  - Accommodates more users without adding physical GPUs
  - Centralized GPU resource management
  - Optimization through dynamic resource reallocation
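To make the partitioning limit concrete, the sketch below checks whether a set of requested MIG instances fits within the slice budget of one GPU. The profile table is an assumption based on NVIDIA's published profiles for 80 GB devices (7 compute slices, 8 memory slices), and the function name is hypothetical. The check is a necessary condition only: real MIG placement has additional positional rules that this example does not model.

```python
# Assumed slice costs per profile on an 80 GB GPU: (compute slices, memory slices).
MIG_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_on_one_gpu(requested):
    """Return True if the requested profiles stay within one GPU's slice budget.

    Simplified check: it sums slice costs and ignores placement constraints.
    """
    compute = sum(MIG_PROFILES[profile][0] for profile in requested)
    memory = sum(MIG_PROFILES[profile][1] for profile in requested)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


print(fits_on_one_gpu(["1g.10gb"] * 7))                    # True: seven small instances
print(fits_on_one_gpu(["3g.40gb", "4g.40gb"]))             # True: uses the whole GPU
print(fits_on_one_gpu(["3g.40gb", "3g.40gb", "1g.10gb"]))  # False: exceeds memory slices
```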
Full GPU Provisioning
MIG is not suitable for all workloads. In environments that require high-performance, massively parallel processing, a full GPU is most effective. Full GPU provisioning gives one workload exclusive use of the entire GPU. It is mainly used for large-scale deep learning training, LLM training (GPT-3, etc.), inference of large models, multi-GPU training, fine-tuning, high-performance simulations, and scientific computing.

| Category | MIG (Multi-Instance GPU) | Full GPU |
|---|---|---|
| Resource Structure | Hardware-level partitioning of the GPU (1 to 7 instances) | Uses the entire GPU as a single resource |
| Isolation Level | Hardware-level isolation (isolation of memory and cache) | Shared resource (resource based on a single GPU) |
| Performance | High-density multi-tenancy | Maximum performance, high bandwidth utilization |
| Suitable Workloads | Inference, small Fine-Tuning | Large-scale training, multi-GPU training |
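The comparison above can be read as a simple selection rule. The following sketch encodes one possible version of it. The workload labels, the 40 GB threshold (the largest partial MIG instance assumed for an 80 GB GPU), and the function name are all assumptions made for illustration, not GPUaaS policy.

```python
# Hypothetical workload labels that always call for a full GPU.
FULL_GPU_WORKLOADS = {"large-scale-training", "multi-gpu-training"}

# Assumed memory of the largest partial MIG instance on an 80 GB GPU.
LARGEST_PARTIAL_MIG_GB = 40


def choose_provisioning(workload_type, memory_gb, gpu_count=1):
    """Return "mig" or "full-gpu" for a workload (illustrative rule only)."""
    if gpu_count > 1 or workload_type in FULL_GPU_WORKLOADS:
        # Multi-GPU and large-scale training need whole devices.
        return "full-gpu"
    if memory_gb > LARGEST_PARTIAL_MIG_GB:
        # No partial MIG instance is large enough for this request.
        return "full-gpu"
    return "mig"


print(choose_provisioning("inference", memory_gb=10))                            # mig
print(choose_provisioning("fine-tuning", memory_gb=60))                          # full-gpu
print(choose_provisioning("large-scale-training", memory_gb=80, gpu_count=8))    # full-gpu
```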
GPUaaS provides a billing service that automatically tracks GPU Pool usage and generates billing data. GPU usage becomes billable after approval of a Pool usage request at the project level, and resources allocated to a Pool are dedicated to that project and cannot be shared with other projects.
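Since a Pool's resources are dedicated to its project from the moment the request is approved, one plausible way to derive billing data is to count GPU-hours from approval until release, regardless of actual load. The sketch below shows that arithmetic. The function name, the GPU-hour unit, and the release timestamp are assumptions, and the actual GPUaaS billing rules may differ.

```python
from datetime import datetime, timezone


def billable_gpu_hours(approved_at, released_at, gpu_count):
    """GPU-hours for a dedicated pool, counted from approval to release (assumed rule)."""
    if released_at < approved_at:
        raise ValueError("released_at must not be earlier than approved_at")
    if gpu_count < 0:
        raise ValueError("gpu_count must not be negative")
    hours = (released_at - approved_at).total_seconds() / 3600
    return hours * gpu_count


# Example: a pool of 4 GPUs approved at midnight and released 36 hours later.
approved = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
released = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
print(billable_gpu_hours(approved, released, gpu_count=4))  # 144.0
```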