Kubernetes Job

A Job creates one or more Pods and ensures that those Pods terminate successfully.

A simple scenario: create a Job to guarantee that a Pod runs to completion reliably. If the first Pod fails or is deleted before finishing, the Job will start a new Pod.

A Simple Example

Compute π to 2000 decimal places. The manifest has been uploaded to GitHub at: https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/job/pi.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

In this example, restartPolicy is set to Never. Can it be omitted? No. A Job's Pod template must set this field explicitly, because Jobs only allow Never and OnFailure, whereas the default value of restartPolicy for a Pod is Always.
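For comparison, here is a sketch of the same Job using the other permitted value, OnFailure. The name pi-onfailure and the backoffLimit value of 4 are illustrative assumptions, not part of the original example. With OnFailure, a failed container is restarted inside the same Pod; with Never, the Job controller creates a replacement Pod instead.

```yaml
# Hypothetical variant of the pi Job; the name and backoffLimit value are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-onfailure
spec:
  backoffLimit: 4                # retries before the Job is marked failed (default is 6)
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure   # failed containers are restarted in place, in the same Pod
```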

Creating a Job

Run the command:

kubectl apply -f https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/job/pi.yaml

Process:

[Figure: creating the Job]

Job Types

Before describing Job types, two relevant fields need to be introduced:

.spec.completions: Specifies the desired number of successfully completed Pods for this Job. The Job is considered complete only after that many Pods have terminated successfully. When both this field and .spec.parallelism are left unset, it defaults to 1.

.spec.parallelism: Specifies the maximum number of Pods the Job should run at any given time. Default value is 1.

Non-parallel Job

Neither of the two fields above needs to be specified.

Only one Pod is started, unless that Pod fails. The Job enters the completed state as soon as that Pod terminates successfully.

The example above is a non-parallel Job.
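To make the defaults visible, the non-parallel Job can also be written with both fields spelled out. This is only a sketch: the name pi-explicit is hypothetical, and the manifest should behave the same as the original pi example, since both fields default to 1 when both are left unset.

```yaml
# Hypothetical manifest: the pi Job with the default values written explicitly.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-explicit
spec:
  completions: 1    # one successful Pod completes the Job
  parallelism: 1    # at most one Pod runs at a time
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```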

Parallel Job with a Fixed Completion Count

.spec.completions must be set to a positive integer, say N. The Job enters the completed state only once N Pods have terminated successfully.

.spec.parallelism is optional. In the example below, .spec.completions is 6 and .spec.parallelism is 2. GitHub link: https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/job/pi-fixed.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  completions: 6
  parallelism: 2
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

The execution of this example is shown below:
[Figure: execution of the fixed completion count Job]

At most 2 Pods are in the creating or running state at any time; a new Pod is created only when one of them finishes. The Job completes once 6 Pods have terminated successfully.

Parallel Job with a Work Queue

.spec.completions must not be specified, and .spec.parallelism must be set to a non-negative integer N. The Job starts N Pods. Once any Pod terminates successfully, no new Pods are created, and the Job enters the completed state when at least one Pod has succeeded and all Pods have terminated. After the first Pod exits successfully, the remaining Pods should not continue doing any work or producing any output for this Job; they should all begin their exit process.

In the example below, .spec.completions is not specified and .spec.parallelism is 2. GitHub link: https://raw.githubusercontent.com/chengqing-su/kubernetes-learning/master/job/pi-work.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 2
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

The execution of this example is shown below:
[Figure: execution of the work queue Job]

Cleanup

After a Job completes, no new Pods are created, but the existing Pods are not deleted either. This allows you to inspect the logs of completed containers to check for errors, warnings, or other diagnostic output. The Job object itself is also retained, but in most cases a system does not need completed Jobs. Keeping them around puts pressure on the API server. How do you clean up these Jobs?

  • If the Job is directly managed by a CronJob, cleanup can be handled by specifying the CronJob’s cleanup policy.

  • Delete manually. For example, all of the examples above can be cleaned up with kubectl delete jobs pi.

  • Use the TTL mechanism (.spec.ttlSecondsAfterFinished) to automatically delete completed or failed Jobs. This feature was still in alpha when this was written; it has been stable since Kubernetes v1.23.
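The two automatic options above can be sketched as manifests. These are illustrations under stated assumptions: the names pi-ttl and pi-hourly, the 100-second TTL, and the hourly schedule are all hypothetical, and the CronJob uses apiVersion batch/v1, which requires Kubernetes v1.21 or later. The first document sets ttlSecondsAfterFinished on a Job; the second uses the CronJob history limits, which cap how many finished Jobs the CronJob keeps.

```yaml
# Hypothetical Job that is deleted automatically after it finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-ttl
spec:
  ttlSecondsAfterFinished: 100   # delete the Job and its Pods 100s after it completes or fails
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
---
# Hypothetical CronJob that cleans up the Jobs it creates.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pi-hourly
spec:
  schedule: "0 * * * *"           # illustrative: run at the top of every hour
  successfulJobsHistoryLimit: 3   # keep the 3 most recent successful Jobs (the default)
  failedJobsHistoryLimit: 1       # keep the most recent failed Job (the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never
```

Setting ttlSecondsAfterFinished to 0 would make the Job eligible for deletion immediately after it finishes, which leaves no window for inspecting Pod logs.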