Principles and Practices for Automation Scripts

A critical prerequisite for continuous integration is an automated build.

An automated build must satisfy one condition: both people and computers must be able to run the application's build, test, and deployment process from the command line, without manual intervention. Automation scripts are how that automated build is expressed in code.

This post introduces some principles and practices for writing automation scripts.

Automation scripts should live in the same repository as the application code

Automation scripts are just as important as application code and should be under version control. Keeping them in the same repository enables better collaboration between developers and operations teams. It is recommended to place all automation scripts under an auto directory.

The final shape of a pipeline is not determined at the very beginning of software development. Initially, you may only need unit tests and code style checks. As the software evolves, steps like build, smoke-test, and deploy-to-production are gradually added. Both the pipeline and its automation scripts evolve incrementally.

Create an automation script for every step in every stage of the pipeline

A pipeline typically defines multiple stages, such as test -> build -> deploy, and stages must run serially. Each stage contains one or more steps, which may run serially or in parallel. For example, the test stage might run unit tests and code style checks; since neither depends on the other, they can run in parallel.
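To make the stage/step model concrete, here is a minimal Python sketch of how a runner might execute it: stages run one after another, steps inside a stage run concurrently, and the pipeline stops after the first stage that contains a failing step. Modeling each step as a Python callable that returns True on success is an assumption for illustration; a real CI tool would instead invoke a script such as auto/test.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumption for this sketch: a step is any callable returning True on success.
Step = Callable[[], bool]


def run_pipeline(stages: List[Dict[str, Step]]) -> Tuple[bool, List[str]]:
    """Run stages serially and the steps inside each stage in parallel.

    Returns (succeeded, names of the steps that were executed). Execution
    stops after the first stage in which any step fails."""
    executed: List[str] = []
    for stage in stages:
        # All steps of one stage are submitted at once, so they run concurrently.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(step) for name, step in stage.items()}
            results = {name: future.result() for name, future in futures.items()}
        executed.extend(results)
        if not all(results.values()):
            return False, executed  # later stages never start
    return True, executed
```

For example, with hypothetical step functions, run_pipeline([{"unit-test": run_unit_tests, "lint": run_lint}, {"build": run_build}]) would run the two test-stage steps together and start build only if both succeed.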

You should create an automation script for every step in every stage of the pipeline. The benefits of doing so are:

  • Each script does exactly one thing. A step within a stage already defines the script's functional boundary, so the script is responsible for that step alone.
  • Scripts are easy to name meaningfully. Because each step's purpose is well defined, the script can simply be named after it. The recommended convention is verb + noun style; for example, a script that deploys to the QA environment would be named deploy-to-qa. It is also recommended to omit file extensions and let a shebang line (for example, #!/usr/bin/env bash) select the interpreter.
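As a sketch of how a team might enforce these conventions, the hypothetical helper below checks each file in the auto directory for a file extension, a lowercase hyphen-separated name, and a leading shebang line. The regular expression only checks the shape of the name; whether it starts with a verb is left to code review. The function names and messages are illustrative and not part of any existing tool.

```python
import re
from pathlib import Path
from typing import Dict, List

# Lowercase words joined by hyphens, e.g. "test", "build", "deploy-to-qa".
# This only checks the shape of the name, not that the first word is a verb.
NAME_PATTERN = re.compile(r"^[a-z]+(-[a-z0-9]+)*$")


def check_script(name: str, first_line: str) -> List[str]:
    """Return a list of convention problems for one script (empty if none)."""
    problems = []
    if "." in name:
        problems.append("has a file extension; rely on the shebang instead")
    if not NAME_PATTERN.match(name):
        problems.append("name is not lowercase and hyphen-separated")
    if not first_line.startswith("#!"):
        problems.append("missing shebang line, e.g. #!/usr/bin/env bash")
    return problems


def check_auto_dir(auto_dir: Path) -> Dict[str, List[str]]:
    """Check every regular file in the auto directory; map name -> problems."""
    report: Dict[str, List[str]] = {}
    for script in sorted(auto_dir.iterdir()):
        if not script.is_file():
            continue
        with script.open(encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline()
        problems = check_script(script.name, first_line)
        if problems:
            report[script.name] = problems
    return report
```

Calling check_auto_dir(Path("auto")) from the repository root returns an empty dict when every script follows the conventions.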

Scripts should produce identical results on a local PC and on CI

This practice implies that automation scripts must run on a developer's local machine, not only on CI, and that a script produces the same result in both places. For instance, running a unit test script such as auto/test during development helps you catch problems early; if it passes locally, it must also pass on CI.

However, when the local environment differs from the CI environment, scripts may fail; for example, jq can behave differently on macOS than on Linux. In this case, consider containerization to guarantee a consistent environment.
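One way to apply containerization is to run each auto script inside the same Docker image the CI agents use. The sketch below builds the docker run command in Python, assuming a hypothetical image name (CI_IMAGE), that the repository is mounted at /app inside the container, and that the image has no ENTRYPOINT that would intercept the script path.

```python
import os
import subprocess
from typing import List

# Hypothetical image: use whatever image your CI agents actually run.
CI_IMAGE = "registry.example.com/ci-toolbox:1.0"


def containerized(script: str, workdir: str, image: str = CI_IMAGE) -> List[str]:
    """Build a `docker run` command that runs an auto/ script inside `image`."""
    return [
        "docker", "run", "--rm",  # remove the container when the script exits
        "-v", f"{workdir}:/app",  # bind-mount the checkout into the container
        "-w", "/app",             # start in the repository root
        image,
        script,
    ]


def run_in_container(script: str, workdir: str = ".", image: str = CI_IMAGE) -> int:
    """Run the script in the container and return its exit code."""
    # Pass an absolute host path so Docker treats it as a bind mount.
    command = containerized(script, os.path.abspath(workdir), image)
    return subprocess.run(command).returncode
```

With this, run_in_container("auto/test") on a laptop uses the same tool versions as a CI agent that runs the same image.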

Use the same script to deploy to all environments

During software development, you may have a test environment, a QA environment, a staging environment, a production environment, and so on. These environments differ in their resource configuration, so a single script, auto/deploy, should be able to deploy the application to any of them. As noted above, each pipeline step should have its own script, and deploying to the test environment and deploying to production are clearly two distinct steps. Put the test environment's configuration in auto/deploy-to-test and the production configuration in auto/deploy-to-production; both scripts then call auto/deploy to perform the actual deployment.
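The structure described above can be sketched as follows. In a real repository, auto/deploy, auto/deploy-to-test, and auto/deploy-to-production would be three separate executables; here they are shown as three functions in one Python module for brevity. The EnvConfig fields and their values are hypothetical, and deploy() only returns the plan it would apply instead of calling a real deployment tool.

```python
from dataclasses import dataclass
from typing import Dict, Union

Plan = Dict[str, Union[str, int]]


@dataclass(frozen=True)
class EnvConfig:
    # Hypothetical per-environment settings; real ones depend on your platform.
    name: str
    replicas: int
    instance_type: str


def deploy(config: EnvConfig, version: str) -> Plan:
    """Role of auto/deploy: one code path shared by every environment.

    A real script would invoke your deployment tool here; this sketch only
    returns the plan it would apply."""
    return {
        "env": config.name,
        "version": version,
        "replicas": config.replicas,
        "instance_type": config.instance_type,
    }


def deploy_to_test(version: str) -> Plan:
    """Role of auto/deploy-to-test: hold test config, delegate to deploy()."""
    return deploy(EnvConfig("test", replicas=1, instance_type="small"), version)


def deploy_to_production(version: str) -> Plan:
    """Role of auto/deploy-to-production: hold production config, delegate."""
    return deploy(EnvConfig("production", replicas=3, instance_type="large"), version)
```

Because every environment goes through deploy(), a deployment that works in test exercises the same logic that will later run in production.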

Ensure the deployment process is idempotent

Regardless of the state of the target environment at the start of a deployment, the deployment process should always bring the target environment to the same end state.

Likewise, for the same commit, the build should produce a consistent result no matter how many times it is triggered.
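A common way to get idempotency is to describe the desired end state and have the deployment compute and apply only the difference from the current state. The sketch below models an environment as a hypothetical mapping from service name to deployed version; applying the same desired state a second time produces an empty plan and leaves the environment unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical model of an environment: service name -> deployed version.
State = Dict[str, str]
Action = Tuple[str, str, str]  # (action, service, version)


def plan(current: State, desired: State) -> List[Action]:
    """Compute only the changes needed to move `current` to `desired`."""
    actions: List[Action] = []
    for service, version in sorted(desired.items()):
        if current.get(service) != version:
            actions.append(("deploy", service, version))
    for service in sorted(current.keys() - desired.keys()):
        actions.append(("remove", service, current[service]))
    return actions


def apply_desired(current: State, desired: State) -> State:
    """Apply the plan and return the resulting state of the environment."""
    new_state = dict(current)
    for action, service, version in plan(current, desired):
        if action == "deploy":
            new_state[service] = version
        else:
            del new_state[service]
    return new_state
```

Whether the environment starts empty, partially deployed, or already up to date, apply_desired ends in the same state.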

Automation scripts should be decoupled from the specific CI tool

During deployment, the application needs a version number. CI tools already provide a build number, for example GitLab's CI_PIPELINE_ID or Buildkite's BUILDKITE_BUILD_NUMBER, so it is tempting to read these environment variables directly in scripts. This is not recommended: it tightly couples your automation scripts to one CI tool and makes switching CI tools painful later.
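A minimal sketch of a decoupled approach, assuming a hypothetical CI-neutral variable named APP_VERSION: the script reads only that variable, and the pipeline definition is the one place that copies the CI tool's build number into it (for example, from CI_PIPELINE_ID in GitLab or BUILDKITE_BUILD_NUMBER in Buildkite). Switching CI tools then means changing that mapping, not the scripts. The local fallback value is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback for local runs where no pipeline sets a version.
LOCAL_VERSION = "0.0.0-local"


def app_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the version from the CI-neutral APP_VERSION variable.

    APP_VERSION is a hypothetical name chosen for this sketch. The script never
    reads CI_PIPELINE_ID or BUILDKITE_BUILD_NUMBER itself; the pipeline
    definition is responsible for copying one of them into APP_VERSION."""
    source = os.environ if env is None else env
    return source.get("APP_VERSION", LOCAL_VERSION)
```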

Use relative paths

Using absolute paths creates a hard dependency between the build process and a specific machine, making it difficult to use the scripts for setting up and maintaining other servers.
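In practice, a script can locate files relative to its own location instead of hard-coding an absolute path. The Python sketch below assumes scripts live in <repo>/auto/ and derives the repository root from the script's path; the build output directory name is hypothetical.

```python
from pathlib import Path


def repo_root(script_path: str) -> Path:
    """Return <repo> for a script stored at <repo>/auto/<name>."""
    # resolve() turns a relative __file__ into an absolute path at run time,
    # so nothing machine-specific is hard-coded in the script.
    return Path(script_path).resolve().parent.parent


def build_dir(script_path: str) -> Path:
    """Hypothetical output directory, located relative to the repository root."""
    return repo_root(script_path) / "build"
```

Inside a script, ROOT = repo_root(__file__) works wherever the repository happens to be checked out.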

Eliminate manual steps

Eliminate manual or GUI-based steps for deploying software. The deployment process should be carried out entirely by automation scripts, not by someone following a detailed document that lists every step and every command to run. Manual deployments increase deployment costs and raise the likelihood of errors.

Do not put any build artifacts into version control

A pipeline may produce artifacts such as a Docker image, a Java WAR file, or a zip package of AWS Lambda code. Store these artifacts in the corresponding artifact storage system (for example, a container registry for Docker images), not back in the version control repository. If an artifact is lost or needs to be regenerated, regenerate it by triggering the pipeline again.
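To keep artifacts out of the repository, a pipeline step could fail whenever version control tracks files that look like build outputs. The sketch below assumes a hypothetical list of filename patterns and uses git ls-files to list tracked files. Docker images are not files in the working tree, so this kind of check only covers file-based artifacts.

```python
import fnmatch
import subprocess
from typing import Iterable, List, Sequence

# Hypothetical patterns for build outputs that should never be committed.
ARTIFACT_PATTERNS: Sequence[str] = ("*.war", "*.zip", "*.tar.gz")


def tracked_artifacts(paths: Iterable[str],
                      patterns: Sequence[str] = ARTIFACT_PATTERNS) -> List[str]:
    """Return the tracked paths whose file name matches an artifact pattern."""
    return [
        path for path in paths
        if any(fnmatch.fnmatch(path.rsplit("/", 1)[-1], pattern) for pattern in patterns)
    ]


def git_tracked_files() -> List[str]:
    """List files tracked by Git in the current repository."""
    result = subprocess.run(["git", "ls-files"], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()
```

A hypothetical step such as auto/check-no-artifacts could fail the build whenever tracked_artifacts(git_tracked_files()) is non-empty.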