In Kubernetes, you can define the importance of Pods relative to one another using PriorityClasses. This helps ensure that critical services are scheduled and keep running even when resources are constrained.
Key Points:
- Scheduling Priority: The scheduler orders pending Pods by the priority of their assigned PriorityClass. Higher-priority Pods are scheduled before lower-priority ones, provided their resource requirements can be met.
- Preemption: If a high-priority Pod cannot be scheduled due to lack of resources, the scheduler might evict lower-priority Pods to make room.
- PriorityClasses: Kubernetes comes with pre-defined classes like system-cluster-critical and system-node-critical for critical system components. You can create custom classes for your applications.
- PriorityClass Properties:
- Non-namespaced object: Applies cluster-wide.
- Priority Value: Higher integer value indicates higher priority (up to 1 billion). Values exceeding that are reserved for critical system Pods.
- Global Default: Only one PriorityClass can have this set, defining the default class for Pods without an explicit assignment.
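- Preemption Policy: When set to Never, the Pod is placed ahead of lower-priority Pods in the scheduling queue but won't preempt running Pods. The default value, PreemptLowerPriority, allows preemption to make room for higher-priority Pods.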
- Priority Admission Controller: When a Pod is created, it resolves the Pod's PriorityClass name to the class's integer priority value and rejects the Pod if the named PriorityClass does not exist.
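The properties listed above can be seen together in a single manifest. The following is a minimal sketch of a cluster-wide default class; the name default-priority and the value 1000 are assumptions chosen for illustration, not values from a real cluster.

```yaml
# Hypothetical default class; name and value are illustrative only.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass              # non-namespaced, so there is no metadata.namespace
metadata:
  name: default-priority         # assumed name
value: 1000                      # user-defined classes may use values up to 1000000000
globalDefault: true              # only one PriorityClass in the cluster may set this
preemptionPolicy: PreemptLowerPriority   # the default, written out for clarity
description: "Default priority for Pods that do not set priorityClassName."
```

A global default only applies to Pods created after the class exists; Pods that are already running keep the priority they were admitted with.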
Use Cases:
- Critical Services: Guarantee high availability for mission-critical Pods like metrics collectors, logging agents, or payment services by assigning them a high-priority class.
- Data Science Workloads: Prioritize data science jobs without disrupting existing workloads. Use a high-priority class with preemptionPolicy: Never to schedule them ahead of other queued Pods when resources become available.
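As a sketch of the data science use case, a batch Job can reference a non-preempting class from its Pod template. This assumes a PriorityClass named high-priority-nonpreempting with preemptionPolicy: Never already exists in the cluster; the Job name, image, and resource requests are hypothetical placeholders.

```yaml
# Illustrative Job; name, image, and requests are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
spec:
  template:
    spec:
      # Jumps ahead of lower-priority Pods in the queue, but never evicts running Pods.
      priorityClassName: high-priority-nonpreempting
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/trainer:1.0   # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```

With this setup the Job's Pod waits until enough capacity is free, then is considered before lower-priority pending Pods.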
Binding PriorityClass to a Pod:
Use the priorityClassName field in the Pod spec to assign a PriorityClass. The priority admission controller enforces this setting.
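For workloads managed by a controller, the field goes in the Pod template rather than on the controller object itself. A minimal sketch for a Deployment follows, assuming a PriorityClass named high-priority exists; the Deployment name, labels, and image are hypothetical.

```yaml
# Illustrative Deployment; name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
    spec:
      priorityClassName: high-priority   # applies to every Pod created from this template
      containers:
        - name: payments
          image: registry.example.com/payments:1.0   # placeholder image
```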
Preemption Process:
- Pods enter a queue and wait for scheduling.
- The scheduler attempts to place a Pod on a suitable Node.
- If no Node meets the requirements, preemption logic activates for the pending Pod (let’s call it app-2).
- Preemption searches for Nodes where evicting lower-priority Pods would allow scheduling app-2.
- If a suitable Node is found, the scheduler evicts lower-priority Pods from it and records that Node in the nominatedNodeName field of app-2's status.
- Once the evicted Pods have terminated and their resources are free, app-2 is scheduled. It usually lands on the nominated Node, although this is not guaranteed.
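To make the last steps concrete, the fragment below sketches what kubectl get pod app-2 -o yaml might show while app-2 waits for the evicted Pods to terminate. It is an illustration, not captured output: the Node name node-1 is made up, and the priority value assumes a high-priority class with value 1000000.

```yaml
# Illustrative fragment only; node-1 is a hypothetical Node name.
apiVersion: v1
kind: Pod
metadata:
  name: app-2
spec:
  priorityClassName: high-priority
  priority: 1000000          # filled in by the priority admission controller
status:
  phase: Pending             # still waiting for the evicted Pods to terminate
  nominatedNodeName: node-1  # Node on which lower-priority Pods were preempted
```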
By effectively utilizing PriorityClasses, you can manage resource allocation and ensure the smooth operation of critical services within your Kubernetes cluster.
Example YAML Manifests:
- High-Priority Class (Preemptive):
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
    description: "This priority class should be used for critical pods only."
- High-Priority Class (Non-Preemptive):
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority-nonpreempting
    value: 1000000
    preemptionPolicy: Never
    description: "This priority class will not cause other pods to be preempted."
- Pod Using High-Priority Class:
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        env: production
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          imagePullPolicy: IfNotPresent
      priorityClassName: high-priority
These examples demonstrate how to create PriorityClasses with different preemption policies and assign them to Pods in your deployments.