
Architecture

Control Plane

  • Model Registry: Manages model metadata and versions, supports migration from Hugging Face to Gitea, and allows uploading internal model repositories to Git.
  • Model Deploy/Orchestrator: Handles deployment workflows, target cluster selection, and generates InferenceService (ISVC) specifications.
  • AI Gateway: Serves as the API entry point and handles authentication/authorization and request routing.
  • Playground: UI for real-world validation and testing of deployed models.
  • MCM Service: Manages and monitors multi-cluster resources (collects K8s resource status from each cluster, issues deployment commands).
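
As a rough illustration of how the Model Deploy/Orchestrator and MCM Service roles fit together, the sketch below picks a target cluster and builds an InferenceService manifest. It is not the actual implementation. `ClusterStatus`, `select_cluster`, `build_isvc`, the "most free GPUs" selection rule, the `huggingface` model format, and the runtime and PVC names are all assumptions made for the example. Only the `serving.kserve.io/v1beta1` manifest layout follows KServe's published InferenceService schema.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    """Hypothetical per-cluster snapshot, as the MCM Service might report it."""
    name: str
    ready: bool
    free_gpus: int


def select_cluster(clusters, gpus_needed):
    """Pick the ready cluster with the most free GPUs that can fit the model.

    The selection rule is an assumption; the real orchestrator may use
    different criteria.
    """
    candidates = [c for c in clusters if c.ready and c.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no ready cluster has enough free GPUs")
    return max(candidates, key=lambda c: c.free_gpus)


def build_isvc(name, namespace, pvc_name, model_path, runtime, gpus):
    """Build an InferenceService (ISVC) manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    # Model format and runtime name are illustrative values.
                    "modelFormat": {"name": "huggingface"},
                    "runtime": runtime,
                    # Points at weights already stored in a PVC.
                    "storageUri": f"pvc://{pvc_name}/{model_path}",
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            }
        },
    }
```

A caller could pass the cluster list reported by the MCM Service to `select_cluster`, then send the dict returned by `build_isvc` to the chosen cluster, where KServe turns it into Pods.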

Data Plane (in each K8s cluster)

  • KServe (Agent/Controller): Receives InferenceService (ISVC) specifications and creates the actual Pods/Containers.
  • Model Artifact Puller: Downloads models from Gitea and stores them in PVCs for reuse.
  • Serving Runtime (e.g., vLLM): Loads models and provides inference via runtime containers.
  • Health/Log/Metric Endpoints: Exposes status (Ready/NotReady), log streaming, and resource usage metrics.
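
The "stores them in PVCs for reuse" behavior of the Model Artifact Puller can be sketched as a download-once cache. This is a minimal sketch under stated assumptions, not the actual puller. `ensure_model`, the `<repo>/<revision>` directory layout, the `.complete` marker file, and the injected `fetch` callable (which stands in for the real download from Gitea) are all hypothetical.

```python
from pathlib import Path


def ensure_model(cache_root, repo, revision, fetch):
    """Return the cached model directory, downloading only on a cache miss.

    `cache_root` is assumed to be the PVC mount path. `fetch(repo, revision,
    dest)` is a hypothetical callable that downloads the repository into
    `dest`, for example by cloning it from Gitea.
    """
    dest = Path(cache_root) / repo / revision
    # The marker is written last, so a partial download is never reused.
    marker = dest / ".complete"
    if marker.exists():
        return dest
    dest.mkdir(parents=True, exist_ok=True)
    fetch(repo, revision, dest)
    marker.touch()
    return dest
```

With this layout, a second Pod that mounts the same PVC and requests the same repository and revision finds the marker and skips the download, so the Serving Runtime can load the weights directly from disk.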