gpu-workload/t5/kubernetes/hpa.yaml

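# HorizontalPodAutoscaler for the t5-inference Deployment: scales between 1 and 5
# replicas to keep the average ts_queue_latency_microseconds per pod at or below
# 7000000 microseconds (7 seconds). The prometheus.googleapis.com|<metric>|<kind>
# metric name format suggests the metric is exposed through Google Cloud Managed
# Service for Prometheus (an assumption based on the prefix; adjust the metric
# name for whichever custom metrics adapter your cluster uses).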
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: t5-inference
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: t5-inference
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|ts_queue_latency_microseconds|counter
      target:
        type: AverageValue
        averageValue: "7000000"