Kousa4 Stack
2026-05-01
Cloud Computing

Kubernetes v1.36 Beta: Dynamically Adjust Job Resources While Suspended – No More Recreations

Kubernetes v1.36 promotes mutable pod resources for suspended Jobs to beta, enabling queue controllers to adjust CPU, GPU, memory, and other resources without recreating Jobs.

Breaking: Kubernetes v1.36 Promotes Mutable Pod Resources for Suspended Jobs to Beta

April 23, 2026 – The Kubernetes community has announced that the ability to modify container resource requests and limits in the pod template of a suspended Job is now beta in v1.36, up from alpha in v1.35. This change allows queue controllers and cluster administrators to adjust CPU, memory, GPU, and extended resource specifications on a Job while it is suspended, before it starts or resumes running.
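
Concretely, the feature applies to a Job created in a suspended state. The sketch below is illustrative: the Job name, container name, image, and GPU resource key are placeholders modeled on the article's example, not values from the release itself:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job                     # illustrative name
spec:
  suspend: true                       # Job starts suspended; no Pods are created yet
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: example.com/trainer:latest        # placeholder image
        resources:
          requests:
            example-hardware-vendor.com/gpu: "4"
          limits:
            example-hardware-vendor.com/gpu: "4"
```

While spec.suspend is true, the resources fields in this pod template can be updated in place under v1.36's relaxed validation.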


“This feature eliminates a major pain point for batch and machine learning workloads,” said Dr. Anika Patel, a Kubernetes SIG contributor. “Previously, if a queue controller like Kueue needed to change resources for a suspended Job, the only option was to delete and recreate it—losing all metadata and history. Now, adjustments happen in place.”

Background: Why Mutable Resources Matter for Suspended Jobs

Batch and machine learning workloads often have resource requirements that aren’t precisely known at Job creation time. Optimal allocation depends on current cluster capacity, queue priorities, and availability of specialized hardware like GPUs.

Before v1.36 beta, resource requirements in a Job’s pod template were immutable once set. If a queue controller determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job—discarding its metadata, status, and history. The relaxed rule also lets a Job spawned by a CronJob proceed slowly with reduced resources on a heavily loaded cluster, rather than failing outright.

Example: Machine Learning Training Job Scaling Down GPUs

Consider a machine learning training Job initially requesting 4 GPUs. A queue controller managing cluster resources might determine that only 2 GPUs are available. With this beta feature, the controller can update the Job’s resource requests before resuming it:

  • Before: resources.requests."example-hardware-vendor.com/gpu": "4"
  • After: resources.requests."example-hardware-vendor.com/gpu": "2"

Once the resources are updated, the controller resumes the Job by setting spec.suspend to false. The new Pods are then created with the adjusted resource specifications.
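
As a sketch, the controller's update could be expressed as a strategic merge patch. The Job name (train-job) and container name (trainer) below are illustrative assumptions:

```yaml
# patch-gpus.yaml -- strategic merge patch for the suspended Job (illustrative names)
spec:
  template:
    spec:
      containers:
      - name: trainer                              # container whose resources change
        resources:
          requests:
            example-hardware-vendor.com/gpu: "2"   # was "4"
          limits:
            example-hardware-vendor.com/gpu: "2"
```

An operator could apply it with `kubectl patch job train-job --patch-file patch-gpus.yaml`, then resume the Job with `kubectl patch job train-job --type merge -p '{"spec":{"suspend":false}}'`.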

How It Works

The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for suspended Jobs. No new API types have been introduced; the existing Job and pod template structures accommodate the change through relaxed validation.

This approach preserves backward compatibility—only suspended Jobs benefit from mutability. Active Jobs remain immutable to prevent disruption of running workloads.
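
The actual validation lives in the Kubernetes API server, but the rule it enforces can be modeled in a few lines. The following Python sketch (job shapes and the helper names are ours, for illustration only) captures the idea: a resource change to the pod template is accepted only when the existing Job is suspended:

```python
def resource_mutation_allowed(old_job: dict, new_job: dict) -> bool:
    """Model of the relaxed rule: container resources in the pod template
    may differ between the old and new Job only if the old Job is suspended."""
    def resources(job):
        containers = job["spec"]["template"]["spec"]["containers"]
        return [c.get("resources") for c in containers]

    unchanged = resources(old_job) == resources(new_job)
    suspended = old_job["spec"].get("suspend", False)
    return unchanged or suspended


def make_job(suspended: bool, gpus: str) -> dict:
    """Build a minimal, hypothetical Job object for the demonstration."""
    return {"spec": {"suspend": suspended, "template": {"spec": {"containers": [
        {"name": "trainer",
         "resources": {"requests": {"example-hardware-vendor.com/gpu": gpus}}}]}}}}


# A suspended Job may have its GPU request lowered from 4 to 2 ...
assert resource_mutation_allowed(make_job(True, "4"), make_job(True, "2"))
# ... but the same change on an active (unsuspended) Job is rejected.
assert not resource_mutation_allowed(make_job(False, "4"), make_job(False, "2"))
# Updates that leave resources untouched are always allowed.
assert resource_mutation_allowed(make_job(False, "4"), make_job(False, "4"))
```

This mirrors why backward compatibility is preserved: any update that does not touch resources behaves exactly as before, and active Jobs stay immutable.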

What This Means for Cluster Operators and Queue Controllers

“This is a game-changer for batch scheduling efficiency,” said James Chen, a platform engineer at a large e-commerce company. “We can now adjust resources on the fly without recreating Jobs, preserving all job history and metadata. It reduces operational overhead and improves resource utilization.”

The feature directly empowers queue controllers like Kueue and Volcano to dynamically adapt to cluster conditions. For example, a CronJob that would normally stall due to insufficient GPUs can now proceed with a reduced GPU count instead of failing entirely.

Additionally, large-scale training jobs can be staged with conservative resource requests and then scaled up as capacity becomes available—all without deleting and recreating the Job.

Key Benefits at a Glance

  1. No Data Loss: Metadata, status, and history of the Job are preserved.
  2. Reduced Downtime: Jobs no longer need to be killed and respawned.
  3. Better Utilization: Resources can be adjusted to match real-time cluster capacity.
  4. Simpler Automation: Queue controllers can mutate Jobs directly, reducing logic complexity.

The feature is available immediately in clusters running Kubernetes v1.36. Users should ensure the MutableResourcesForSuspendedJobs feature gate is enabled (it is on by default in v1.36) to begin using it.

For more details, see the official Kubernetes documentation on Mutable Pod Resources for Suspended Jobs.