
What is GPU Management?

GPU Management tracks the status of all GPU components through the operator-oriented GPU Dashboard, and tracks GPU usage through physical and logical capacity, pool requests, and allocations of GPU resources.

You can manage the entire lifecycle, from GPU pool creation → assignment → optimization → monitoring, through a single interface.

Key Features

  • GPU Capacity: Increase utilization by managing the actual capacity of physical GPU hardware as a pool.
    • NodeCapacity: Information about the overall capacity of each GPU node
    • GpuCapacity: Details of individual GPU devices
    • Profile: Maps capacity to the pods actually assigned to it (workload tracking)
    • SpecAndCount: Used to manage provisioned MIG or GPU resources
  • GPU Pool Management: Group or separate GPUs by workload type and manage as pools at the project level.
  • GPU Monitoring: Measure GPU usage by power consumption, SM activity, and memory errors
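
As a rough illustration of how the capacity objects listed above could relate to each other, the sketch below models `NodeCapacity` and `GpuCapacity` as plain Python dataclasses and derives a pool-wide utilization figure. Only the two object names come from this page; every field, the `pool_utilization` helper, and the sample data are hypothetical and are not the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class GpuCapacity:
    """One physical GPU device (fields are hypothetical)."""
    uuid: str
    model: str
    allocated: bool = False


@dataclass
class NodeCapacity:
    """Overall capacity of one GPU node (fields are hypothetical)."""
    node_name: str
    gpus: list[GpuCapacity] = field(default_factory=list)

    def total(self) -> int:
        return len(self.gpus)

    def allocated(self) -> int:
        return sum(1 for g in self.gpus if g.allocated)


def pool_utilization(nodes: list[NodeCapacity]) -> float:
    """Fraction of physical GPUs currently allocated across all nodes."""
    total = sum(n.total() for n in nodes)
    if total == 0:
        return 0.0
    return sum(n.allocated() for n in nodes) / total


# Made-up sample data: two nodes with two GPUs each, three GPUs allocated.
nodes = [
    NodeCapacity("node-a", [GpuCapacity("gpu-0", "H100", True), GpuCapacity("gpu-1", "H100")]),
    NodeCapacity("node-b", [GpuCapacity("gpu-2", "H200", True), GpuCapacity("gpu-3", "H200", True)]),
]
print(f"{pool_utilization(nodes):.0%}")
```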

Architecture References

The GPUs, in NVIDIA H100 and H200 configurations, are managed through the GPU management stack provided by NVIDIA, and the GPU lifecycle (container scheduling, per-workload resource allocation, isolation, etc.) is managed in an integrated manner on Kubernetes. Allocated GPUs therefore follow the same workload principles as any other Kubernetes resource (for example, resources are allocated when an application is deployed).
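
Because allocated GPUs behave like ordinary Kubernetes resources, a workload typically asks for them in the container's resource limits. The sketch below builds such a Pod manifest as a Python dictionary. The resource names `nvidia.com/gpu` and `nvidia.com/mig-1g.10gb` are the ones the NVIDIA device plugin commonly advertises, but which names a given cluster exposes depends on its configuration, so treat them as assumptions here. The `gpu_pod_manifest` helper, pod names, and image URLs are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpu_resource: str, count: int = 1) -> dict:
    """Build a minimal Pod manifest that requests GPU resources via limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {gpu_resource: count}},
                }
            ],
        },
    }


# Hypothetical workloads: one full GPU, one MIG instance (resource names assumed).
full_gpu_pod = gpu_pod_manifest("train-job", "example.com/trainer:latest", "nvidia.com/gpu")
mig_pod = gpu_pod_manifest("infer-job", "example.com/server:latest", "nvidia.com/mig-1g.10gb")
print(json.dumps(mig_pod, indent=2))
```

The printed JSON can be saved to a file and applied with standard Kubernetes tooling; the scheduler then places the pod only on a node that advertises the requested resource.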

| Layer | Element | Role | Feature |
|---|---|---|---|
| Physical | Kubernetes Node | Physical Servers | The physical infrastructure that hosts the server resources |
| Physical | GPU Hardware | Physical GPU | In charge of AI-related calculations |
| Logical | Kubernetes API Server | API Server | Control over Kubernetes |
| Logical | Operator | GPU Management | GPU resource management and provisioning |
| Logical | MCM (Multi-Cluster Management) | Cluster Management | Kubernetes cluster management and GPU resources |
| Logical | Capacity | Capacity Management | Resource management objects between the GPU pool and the physical GPU hardware |
| Logical | GPU Pool | Manage your usage | A managed object that is assigned to an end user as a logical resource |

GPU Management Method

GPUaaS manages GPU resources in two forms: full physical GPUs, and MIG instances, which are independent GPU instances carved out of a physical GPU. MIG is a hardware-based GPU virtualization technology that NVIDIA introduced with the A100.

MIG (Multi-Instance GPU) Provisioning

Multi-Instance GPU (MIG) is an NVIDIA technology that allows a single GPU to be partitioned into up to seven independent GPU instances. Each instance has dedicated memory, cache, and streaming multiprocessors, and operates in isolation at the hardware level. This provides a secure multi-tenant environment where a failure on one instance does not affect the others.

  • Maximize resource efficiency
    • Multiple users utilizing one expensive GPU (e.g., A100) at the same time
    • Cost-effective for workloads that don't require full GPU performance
    • Improve ROI by minimizing idle GPU resources
  • Multi-tenancy support
    • Securely share the same GPU across multiple projects or teams
    • Ensuring complete isolation between each instance (memory, compute resources)
    • A failure in one instance does not affect the others
  • Custom assignment of workloads
    • Inference tasks: Provide sufficient performance even with a small amount of memory
    • Dev/Test: Enables rapid iterative development with small resources
    • Batch processing: Run multiple small jobs in parallel at the same time
  • Improve operational efficiency
    • Reduced GPU queue latency (more instances available)
    • Increased flexibility in scheduling and allocating resources
    • Accurate cost allocation based on usage
  • Scalability and ease of management
    • Accommodates more users without adding a physical GPU
    • Centralized GPU resource management
    • Optimize with dynamic resource reallocation
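
To make the "up to seven instances" limit concrete, the sketch below checks whether a set of requested MIG profiles stays within one GPU's slice budget. The slice counts follow NVIDIA's published profiles for an 80 GB H100 and are an assumption for any other model. The check is deliberately simplified: real MIG placement also restricts where each instance may start, so passing it is necessary but not sufficient. The `fits_on_one_gpu` helper is hypothetical and not part of GPUaaS.

```python
# Compute slices (out of 7) and memory slices (out of 8) per MIG profile.
# Values follow NVIDIA's published profiles for an 80 GB H100; treat them
# as an assumption for any other GPU model.
MIG_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_on_one_gpu(requested: list[str]) -> bool:
    """Return True if the requested profiles stay within one GPU's slice budget.

    Simplified: only the slice totals are checked, not placement positions.
    """
    compute = sum(MIG_PROFILES[p][0] for p in requested)
    memory = sum(MIG_PROFILES[p][1] for p in requested)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


print(fits_on_one_gpu(["1g.10gb"] * 7))         # seven small instances
print(fits_on_one_gpu(["3g.40gb", "3g.40gb"]))  # two mid-size instances
print(fits_on_one_gpu(["4g.40gb", "4g.40gb"]))  # needs 8 compute slices
```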

Full GPU Provisioning

MIG is not suitable for every workload. In environments that require high-performance, massively parallel processing, a full GPU is most effective. With Full GPU provisioning, one workload has exclusive use of the entire GPU. It is mainly used for large-scale deep learning training, LLM training (GPT-3, etc.), inference of large models, high-performance simulations, and scientific computing, and is the usual choice for multi-GPU training and fine-tuning.

| Category | MIG (Multi-Instance GPU) | Full GPU |
|---|---|---|
| Resource Structure | Hardware-level partitioning of the GPU (1 to 7 instances) | Uses the entire GPU as a single resource |
| Isolation Level | Hardware-level isolation (isolation of memory and cache) | Shared resource (resource based on a single GPU) |
| Performance | High-density multi-tenancy | Maximum performance, high bandwidth utilization |
| Suitable Workloads | Inference, small fine-tuning | Large-scale training, multi-GPU training |

Billing Method

GPUaaS provides a billing service that automatically tracks GPU Pool usage and generates billing data. GPU usage becomes billable after approval of a Pool usage request at the project level, and resources allocated to a Pool are dedicated to that project and cannot be shared with other projects.
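
As a minimal sketch of how billable usage might be derived, the example below counts GPU-hours starting from the moment a Pool usage request is approved. This page only states that usage becomes billable after approval; the GPU-hour metering unit, the release timestamp as the end of the billing period, and the `billable_gpu_hours` helper are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def billable_gpu_hours(approved_at: datetime, released_at: datetime, gpu_count: int) -> float:
    """GPU-hours from request approval until the pool is released (assumed model)."""
    if released_at < approved_at:
        raise ValueError("released_at must not precede approved_at")
    hours = (released_at - approved_at) / timedelta(hours=1)
    return hours * gpu_count


# Hypothetical pool: 4 GPUs held for exactly one day after approval.
approved = datetime(2025, 1, 1, 9, 0)
released = datetime(2025, 1, 2, 9, 0)
print(billable_gpu_hours(approved, released, gpu_count=4))
```

Since resources in a Pool are dedicated to one project, the sketch bills the whole allocation for the entire period rather than only the time GPUs were busy.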