GitHub Actions Reusable Workflow: A Complete Implementation of Zero-Config Unified CI/CD

This is the third post in the “Unified CI/CD Pipeline Governance” series. It breaks down how a platform team uses Reusable Workflows to achieve “zero-config onboarding for business repositories,” covering architecture design, JWT/OIDC authentication, multi-environment routing, container builds, and lessons learned. Everything described here comes from a production deployment covering 500+ repositories.


Part 1: Architecture Overview

The .github Repository as a Platform Boundary

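Within a GitHub Organization, the specially named .github repository plays two roles here. It hosts the Organization-level default Community Health Files (such as CODE_OF_CONDUCT.md), and, by convention, it stores the Reusable Workflow files that every repository in the Org calls. Reusable workflows can live in any repository; other repositories can call them only if that repository's visibility and Actions access settings allow it.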

The platform team centralizes all CI/CD logic in the .github repository, establishing a clear platform boundary:

OrgA/.github
├── .github/
│   └── workflows/
│       ├── platform-ci-core.yml              # Orchestrator / entry point
│       ├── platform-ci-check.yml             # lint + unit tests
│       ├── platform-ci-security.yml          # static code scanning
│       ├── platform-ci-build.yml             # container build + multi-registry push + signing
│       ├── platform-ci-prepare-release.yml
│       ├── platform-ci-release.yml
│       └── platform-ci-deploy.yml
└── actions/
    ├── prepare-release/
    ├── release/
    └── security-scan/

All 500+ business repositories call this same set of workflow files. Each business repository only needs to maintain a minimal ci.yml:

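# .github/workflows/ci.yml (business repository)
name: CI

on:
  push:
    branches: [trunk, releases/latest]
  pull_request:
    branches: [trunk, releases/latest]
  pull_request_target:
    branches: [trunk, releases/latest]

# Called workflows can only keep or reduce these permissions, never raise
# them, so id-token: write has to be allowed here. Add any other scopes the
# platform jobs need (e.g. contents: write to create tags/Releases).
permissions:
  contents: read
  id-token: write

jobs:
  pipeline:
    # push always runs; same-repo PRs run on pull_request;
    # fork PRs run only on pull_request_target
    if: >-
      github.event_name == 'push' ||
      (github.event_name == 'pull_request' &&
      github.event.pull_request.head.repo.full_name == github.repository) ||
      (github.event_name == 'pull_request_target' &&
      github.event.pull_request.head.repo.full_name != github.repository)
    uses: OrgA/.github/.github/workflows/platform-ci-core.yml@main
    # no with: block; all config is read from .ci-config/config.yaml
    # no secrets: block; credentials are handled internally by the platform pipeline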

Why pull_request_target Is Necessary

Workflows triggered by pull_request from a fork receive no Org Secrets and no id-token: write permission, so every job that needs Vault credentials fails. pull_request_target runs the workflow in the context of the base repository, which restores access to Secrets and OIDC tokens. That same access is what makes it risky; see the lessons learned section for details.
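
The rule stated in the ci.yml comment (fork PRs use pull_request_target, same-repo PRs use pull_request) can be sanity-checked outside Actions. The following is a minimal bash sketch. should_run_pipeline and its run/skip output are hypothetical names, and the three arguments stand in for github.event_name, the PR head repository, and github.repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether the "pipeline" job should run for an
# event, following "fork PRs use pull_request_target, same-repo PRs use
# pull_request". Pushes always run.
set -euo pipefail

should_run_pipeline() {
  local event="$1"      # github.event_name
  local head_repo="$2"  # github.event.pull_request.head.repo.full_name
  local base_repo="$3"  # github.repository

  case "${event}" in
    push)
      echo "run" ;;
    pull_request)
      # Same-repo PRs only: fork PRs get no Secrets or OIDC token here
      if [ "${head_repo}" = "${base_repo}" ]; then echo "run"; else echo "skip"; fi ;;
    pull_request_target)
      # Fork PRs only: same-repo PRs are already covered by pull_request
      if [ "${head_repo}" != "${base_repo}" ]; then echo "run"; else echo "skip"; fi ;;
    *)
      echo "skip" ;;
  esac
}

# Each PR should run exactly once, on one of the two PR events
for head in OrgA/my-app someone/my-app; do
  printf '%-16s pull_request=%s pull_request_target=%s\n' "${head}" \
    "$(should_run_pipeline pull_request "${head}" OrgA/my-app)" \
    "$(should_run_pipeline pull_request_target "${head}" OrgA/my-app)"
done
```

A row showing "run" for both events means duplicate pipelines for that PR. A row showing "skip" for both means the PR is never checked.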


Workflow File Responsibilities

File | Responsibility
platform-ci-core.yml | Orchestrator: contains the config job and calls all downstream reusable workflows
platform-ci-check.yml | pylint, shellcheck, unit tests
platform-ci-security.yml | SonarQube / Semgrep scanning
platform-ci-build.yml | buildx build, multi-registry push, image signing
platform-ci-prepare-release.yml | Calculates the next semantic version number
platform-ci-release.yml | Creates the Git tag and the GitHub Release
platform-ci-deploy.yml | Status reporting, Docker info push, health check

Part 2: The config Job — The Key Design That Replaces the Groovy Merger

Why a Dedicated config Job Is Needed

GitHub Actions workflow structure is static. The following are all fixed when the workflow is parsed:

  • the jobs in a workflow
  • the uses: target of each job, which cannot contain expressions
  • the types of with: inputs

A job-level if: can only read contexts such as github, needs, vars, and inputs. There is no equivalent of the way Jenkins Groovy merges configuration at runtime.

The solution is to set up a config job in the orchestrator that reads the business repository’s .ci-config/config.yaml, outputs all derived configuration as outputs, and lets downstream jobs consume them via needs.config.outputs.*.

.ci-config/config.yaml (business repository declaration)
        │
        ▼
config job (parse + compute, runs once)
        │
        ├──► platform-ci-check.yml     (python_version, pylint_module_paths)
        ├──► platform-ci-security.yml  (sonar_project_key)
        ├──► platform-ci-build.yml     (dockerfile, image_url_list)
        └──► platform-ci-deploy.yml    (content_name, image_url_list)

The Complete Shell Implementation of the config Job

# platform-ci-core.yml (config job excerpt)
jobs:
  config:
    name: Resolve Config
    runs-on: [self-hosted, linux]
    outputs:
      python_version: ${{ steps.resolve.outputs.python_version }}
      pylint_module_paths: ${{ steps.resolve.outputs.pylint_module_paths }}
      pylint_rc_file: ${{ steps.resolve.outputs.pylint_rc_file }}
      has_unit_test: ${{ steps.resolve.outputs.has_unit_test }}
      unit_test_script: ${{ steps.resolve.outputs.unit_test_script }}
      dockerfile: ${{ steps.resolve.outputs.dockerfile }}
      image_url_list: ${{ steps.resolve.outputs.image_url_list }}
      sonar_project_key: ${{ steps.resolve.outputs.sonar_project_key }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha || github.sha }}
          persist-credentials: false

      - name: Install yq
        run: |
          # Install into a user-writable directory: /usr/local/bin usually
          # requires root on self-hosted runners
          if ! command -v yq &>/dev/null; then
            mkdir -p "$HOME/.local/bin"
            curl -fsSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 \
              -o "$HOME/.local/bin/yq" && chmod +x "$HOME/.local/bin/yq"
            echo "$HOME/.local/bin" >> "$GITHUB_PATH"
          fi

      - name: Parse .ci-config/config.yaml
        id: resolve
        env:
          REPO_NAME: ${{ github.event.repository.name }}
          REPO_OWNER: ${{ github.repository_owner }}
        run: |
          CFG=".ci-config/config.yaml"

          # Python version: extract the version from the runtime image tag
          # e.g.: platform-registry.example.com/python:3.14 → 3.14
          RAW_IMAGE=$(yq '.containers[] | select(.name=="project-runtime") | .image' \
            "${CFG}" 2>/dev/null || echo "")
          PYTHON_VERSION=$(echo "${RAW_IMAGE}" | \
            grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?$' | head -1 || true)
          [ -z "${PYTHON_VERSION}" ] && PYTHON_VERSION="3.x"

          # pylint sourceSets → space-separated string
          PYLINT_PATHS=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.sourceSets[]' \
            "${CFG}" 2>/dev/null | tr '\n' ' ' | xargs || echo "")

          # pylint rcFile: filter out yq's null output, keep the first match
          PYLINT_RC=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.rcFile' \
            "${CFG}" 2>/dev/null | grep -v '^null$' | head -1 || echo "")

          # Unit test detection
          UNIT_TEST_SCRIPT=$(yq '.jobs[] | select(.name=="unit-test") | .steps[].script.workspace' \
            "${CFG}" 2>/dev/null | grep -v '^null$' | head -1 || echo "")
          HAS_UNIT_TEST="false"
          [ -n "${UNIT_TEST_SCRIPT}" ] && HAS_UNIT_TEST="true"

          # containerBuild config
          DOCKERFILE=$(yq '.containerBuild.path' "${CFG}" 2>/dev/null | grep -v '^null$' || echo "")
          IMAGE_TYPE=$(yq '.containerBuild.registryType' "${CFG}" 2>/dev/null | \
            grep -v '^null$' || echo "internet")

          # Compute the target registry from registryType
          case "${IMAGE_TYPE}" in
            private)  PRIMARY_REG="internal-private.platform-registry.example.com" ;;
            public)   PRIMARY_REG="internal-public.platform-registry.example.com" ;;
            internet) PRIMARY_REG="internet.platform-registry.example.com" ;;
            *)        PRIMARY_REG="internal.platform-registry.example.com" ;;
          esac
          PRIMARY_URL="${PRIMARY_REG}/${REPO_OWNER}/${REPO_NAME}"

          # The internet type also pushes to the public registry
          if [ "${IMAGE_TYPE}" = "internet" ]; then
            PUBLIC_URL="public.platform-registry.example.com/${REPO_OWNER}/${REPO_NAME}"
            IMAGE_URL_LIST="${PRIMARY_URL},${PUBLIC_URL}"
          else
            IMAGE_URL_LIST="${PRIMARY_URL}"
          fi
          IMAGE_URL_LIST="${IMAGE_URL_LIST,,}"  # registry repository names must be lowercase

          # (sonar_project_key resolution is omitted from this excerpt)
          echo "python_version=${PYTHON_VERSION}" >> "$GITHUB_OUTPUT"
          echo "pylint_module_paths=${PYLINT_PATHS}" >> "$GITHUB_OUTPUT"
          echo "pylint_rc_file=${PYLINT_RC}" >> "$GITHUB_OUTPUT"
          echo "has_unit_test=${HAS_UNIT_TEST}" >> "$GITHUB_OUTPUT"
          echo "unit_test_script=${UNIT_TEST_SCRIPT}" >> "$GITHUB_OUTPUT"
          echo "dockerfile=${DOCKERFILE}" >> "$GITHUB_OUTPUT"
          echo "image_url_list=${IMAGE_URL_LIST}" >> "$GITHUB_OUTPUT"

          # Job Summary: config resolution table (debug-friendly, critical for supporting 500+ repos)
          {
            echo "### Config resolved from .ci-config/config.yaml"
            echo "| Key | Value |"
            echo "|-----|-------|"
            echo "| python_version | \`${PYTHON_VERSION}\` |"
            echo "| pylint_module_paths | \`${PYLINT_PATHS}\` |"
            echo "| has_unit_test | \`${HAS_UNIT_TEST}\` |"
            echo "| dockerfile | \`${DOCKERFILE:-<none>}\` |"
            echo "| image_url_list | \`${IMAGE_URL_LIST}\` |"
          } >> "$GITHUB_STEP_SUMMARY"
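
Pulling the registry-routing case statement out of the step makes it testable without a runner. The sketch below assumes the same four registryType values and hostnames as the excerpt. resolve_image_urls is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Hypothetical extraction of the config job's registry routing so the rules
# can be unit-tested locally. Hostnames match the excerpt above.
set -euo pipefail

resolve_image_urls() {
  local image_type="${1:-internet}" owner="$2" repo="$3"
  local primary_reg urls

  case "${image_type}" in
    private)  primary_reg="internal-private.platform-registry.example.com" ;;
    public)   primary_reg="internal-public.platform-registry.example.com" ;;
    internet) primary_reg="internet.platform-registry.example.com" ;;
    *)        primary_reg="internal.platform-registry.example.com" ;;
  esac
  urls="${primary_reg}/${owner}/${repo}"

  # The internet type is additionally published to the public registry
  if [ "${image_type}" = "internet" ]; then
    urls="${urls},public.platform-registry.example.com/${owner}/${repo}"
  fi

  # Registry repository names must be lowercase
  echo "${urls,,}"
}

resolve_image_urls private OrgA My-App
```

Keeping the routing in one function means a new registryType is a one-line change with a test next to it, rather than an edit buried in a workflow step.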

Job summaries matter more at 500+ repositories. When a business team reports that “CI behavior doesn’t match expectations,” the platform team needs to tell quickly whether config.yaml was parsed wrongly or the pipeline logic is at fault. With the resolved config rendered as a table on the workflow run's Summary page, that diagnosis goes from digging through logs to opening a single page.
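
The summary table can also be generated from a list of key=value pairs instead of one echo per row, which keeps it in sync as outputs are added. This is a sketch. render_summary is a hypothetical helper; in a real step its output would be appended to "$GITHUB_STEP_SUMMARY".

```bash
#!/usr/bin/env bash
# Hypothetical helper: render "key=value" arguments as a Markdown table for
# the job summary. Empty values are shown as <none>.
set -euo pipefail

render_summary() {
  local title="$1"; shift
  printf '### %s\n| Key | Value |\n|-----|-------|\n' "${title}"
  local pair key value
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "="
    printf '| %s | `%s` |\n' "${key}" "${value:-<none>}"
  done
}

render_summary "Config resolved from .ci-config/config.yaml" \
  "python_version=3.12" "dockerfile=" "image_url_list=a.example.com/x,b.example.com/x"
# In a workflow step: render_summary ... >> "$GITHUB_STEP_SUMMARY"
```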

Downstream Consumption

build:
  name: Build Container Image
  needs: [config, security]
  if: ${{ needs.config.outputs.dockerfile != '' }}
  uses: OrgA/.github/.github/workflows/platform-ci-build.yml@main
  with:
    dockerfile: ${{ needs.config.outputs.dockerfile }}
    image_url_list: ${{ needs.config.outputs.image_url_list }}

Part 3: Deep Dive into the JWT/OIDC Credential Architecture

The Nature of the job_workflow_ref Claim

The GitHub Actions OIDC token contains a key claim, job_workflow_ref. For a job that runs inside a reusable workflow, its value is the path and ref of the called workflow file, not the name of the calling repository.

When business repository OrgA/my-app (one of 500) calls platform-ci-build.yml:

job_workflow_ref = "OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main"
repository = "OrgA/my-app" ← this is the business repo, not .github

Vault’s bound_claims binds on job_workflow_ref — regardless of which business repository triggers it, as long as the platform workflow file is being called, authentication succeeds. 500 repositories, 1 Vault role, 0 static credentials.
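
The glob is evaluated by Vault, not by the workflow, but its effect can be approximated in bash. Bash's [[ == ]] pattern match, like Vault's glob mode, lets * match any run of characters, including /. This is a sketch for reasoning about which refs pass, not Vault's implementation. The sample refs are made up.

```bash
#!/usr/bin/env bash
# Approximate Vault's glob check on job_workflow_ref with bash pattern
# matching: in both, "*" matches any characters, including "/".
set -euo pipefail

BOUND_REF='OrgA/.github/.github/workflows/*@refs/heads/main'

claim_matches() {
  local job_workflow_ref="$1"
  # Unquoted right-hand side => treated as a glob pattern, not a literal
  if [[ "${job_workflow_ref}" == ${BOUND_REF} ]]; then
    echo "allow"
  else
    echo "deny"
  fi
}

for ref in \
  'OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main' \
  'OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/feature-x' \
  'OrgA/my-app/.github/workflows/ci.yml@refs/heads/main'; do
  printf '%-5s %s\n' "$(claim_matches "${ref}")" "${ref}"
done
```

The second ref is denied by the branch lock. The third is denied because a business repository's own workflow never produces a job_workflow_ref under OrgA/.github.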

The Complete Authentication Flow

Business repo ci.yml (any one of 500)
  │  uses: OrgA/.github/...platform-ci-build.yml@main
  ▼
platform-ci-build.yml (job level)
  │  permissions:
  │    id-token: write   ← must be declared at the sub-workflow job level
  │
  ├─1─► GHE OIDC endpoint (issuer: https://ghe.example.com/_services/token)
  │       └── returns a JWT containing the job_workflow_ref claim
  │
  ├─2─► vault-action (method: jwt)
  │       ├── POST /v1/auth/jwt/login
  │       │     { jwt: <oidc_token>, role: "platform-ci" }
  │       │
  │       └── Vault verification flow:
  │             ├── fetch the OIDC discovery document and its JWKS endpoint
  │             ├── verify the JWT signature, issuer, and audience
  │             ├── check bound_claims: job_workflow_ref must glob-match
  │             │     OrgA/.github/.github/workflows/*@refs/heads/main
  │             │     (the @refs/heads/main suffix is the branch lock)
  │             └── return a batch token (TTL: 5 min)
  │
  └─3─► use the batch token to read KV secrets
          (registry credentials, code signing certificates, etc.)
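
When a login is rejected, it helps to check which claims the token actually carried. A JWT payload is base64url-encoded JSON, so it can be decoded without verifying the signature. This is a debugging sketch. jwt_payload is a hypothetical helper, and a real token should never be printed to job logs.

```bash
#!/usr/bin/env bash
# Decode the (unverified) payload segment of a JWT for debugging.
# This does NOT validate the signature; Vault does that against the JWKS.
set -euo pipefail

jwt_payload() {
  local token="$1" payload
  payload="$(printf '%s' "${token}" | cut -d'.' -f2)"
  # base64url -> base64: restore the standard alphabet, then the padding
  payload="$(printf '%s' "${payload}" | tr '_-' '/+')"
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "${payload}" | base64 -d
}

# Build a fake, unsigned token locally to demonstrate the round trip
b64url() { base64 | tr -d '\n=' | tr '/+' '_-'; }
header="$(printf '{"alg":"none"}' | b64url)"
claims="$(printf '{"job_workflow_ref":"OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main"}' | b64url)"
jwt_payload "${header}.${claims}.sig"
echo
```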

Vault Role JSON Configuration

{
  "role_type": "jwt",
  "bound_audiences": ["https://vault.example.com"],
  "bound_claims_type": "glob",
  "bound_claims": {
    "job_workflow_ref": "OrgA/.github/.github/workflows/*@refs/heads/main"
  },
  "user_claim": "repository",
  "claim_mappings": {
    "repository": "repository",
    "ref": "ref",
    "workflow": "workflow",
    "job_workflow_ref": "job_workflow_ref"
  },
  "token_policies": ["platform-ci"],
  "token_ttl": "5m",
  "token_max_ttl": "10m",
  "token_type": "batch",
  "token_no_default_policy": false
}

At 500+ repository scale, the security value of this design is especially significant:

  • None of the 500 business repositories stores any credentials.
  • A compromised business repository cannot obtain platform credentials from workflow code it controls. Its own workflows produce a job_workflow_ref that points at its own files, which does not match the bound claim.
  • Vault only accepts tokens from the platform workflows at @refs/heads/main. With branch protection requiring review on main, every change to those files is reviewed before it can obtain credentials.

Note: map-type parameters such as bound_claims and claim_mappings cannot be expressed cleanly as key=value arguments on the Vault CLI. Passing the whole role definition as JSON on stdin (the trailing - argument) avoids the problem:

vault write auth/jwt/role/platform-ci - <<'EOF'
{ ...full JSON... }
EOF

Three Key Properties of Batch Tokens

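  1. Non-renewable: vault token renew has no effect on batch tokens; they expire when the TTL is reached.
  2. Not persisted: batch tokens have no accessors, so they never appear in vault list auth/token/accessors. The thousands generated daily across 500 repositories do not accumulate in Vault's token store, although the logins are still recorded if an audit device is enabled.
  3. 5-minute TTL: enough for a single vault-action call. A batch token issued directly by an auth method cannot be revoked individually, so the short TTL is what limits the damage if one leaks.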

OIDC Issuer Differences Between GHE and github.com

github.com:  https://token.actions.githubusercontent.com
GHE: https://your-ghe-hostname/_services/token

Vault’s oidc_discovery_url must point to the correct issuer; otherwise Vault cannot fetch the correct JWKS endpoint to verify signatures:

# GHE environment
vault write auth/jwt/config \
  oidc_discovery_url="https://ghe.example.com/_services/token" \
  bound_issuer="https://ghe.example.com/_services/token"
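
Because the issuer depends only on where the workflow runs, it can be derived from the server hostname. This sketch assumes just two cases, github.com and a GHES host. oidc_issuer is a hypothetical helper; GITHUB_SERVER_URL is the default variable Actions sets on every runner.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the issuer URL that Vault's
# oidc_discovery_url / bound_issuer should use for a given GitHub host.
set -euo pipefail

oidc_issuer() {
  local host="${1,,}"
  if [ "${host}" = "github.com" ]; then
    echo "https://token.actions.githubusercontent.com"
  else
    # GitHub Enterprise Server serves its issuer under /_services/token
    echo "https://${host}/_services/token"
  fi
}

# Inside a workflow, GITHUB_SERVER_URL (e.g. https://ghe.example.com) gives the host
host="${GITHUB_SERVER_URL:-https://ghe.example.com}"
oidc_issuer "${host#https://}"
```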

permissions: id-token: write Must Be Declared at the Job Level of Each Sub-Workflow

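Permissions declared by a caller such as platform-ci-core.yml are not a grant to the workflows it calls through uses:; they are a ceiling. A reusable workflow that declares no permissions inherits the caller's GITHUB_TOKEN permissions, and one that does declare them can reduce but never raise them. This has two consequences:

  • Every workflow in the chain (business ci.yml → platform-ci-core.yml → sub-workflow) must allow id-token: write.
  • Each job that requests an OIDC token should declare it explicitly, so the requirement does not depend on what the caller happens to set.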

# platform-ci-build.yml
jobs:
  build:
    runs-on: [self-hosted, linux]
    permissions:
      id-token: write   # request explicitly; callers' permissions only set the upper bound
      contents: read
    steps:
      - uses: hashicorp/vault-action@v3
        with:
          url: ${{ vars.VAULT_URL }}
          namespace: ${{ vars.VAULT_NAMESPACE }}
          method: jwt
          role: ${{ vars.VAULT_ROLE }}
          jwtGithubAudience: ${{ vars.VAULT_AUDIENCE }}
          secrets: |
            secret/data/platform/${{ vars.VAULT_ENV }}/registry username | REGISTRY_USER ;
            secret/data/platform/${{ vars.VAULT_ENV }}/registry password | REGISTRY_PASS

Part 4: Multi-Environment Routing — The Org Variables Approach

Design Motivation

The traditional approach embeds if/else environment checks directly in the workflow, tightly coupling the workflow files to environments. At 500+ repository scale, any environment configuration change requires modifying the platform workflow files and re-testing.

The platform team adopted an Organization Variable injection approach: when each Org is created, a platform script writes environment-specific variables once. The workflow code contains absolutely no environment branching logic, and uses exactly the same code across all three environments (dev/stg/prod).

The Six Org Variables

Variable | OrgA-Dev | OrgA-Stg | OrgA (prod)
VAULT_URL | https://vault.example.com | same | same
VAULT_NAMESPACE | platform/dev | platform/stg | platform/prod
VAULT_ROLE | platform-ci | same | same
VAULT_AUDIENCE | https://vault.example.com | same | same
VAULT_ENV | dev | stg | prod
API_URL | https://api.platform-dev.example.com | https://api.platform-stg.example.com | https://api.platform.example.com

In the secrets: input of vault-action, the Actions expression engine substitutes ${{ vars.VAULT_ENV }} before vault-action receives the value:

secrets: |
  secret/data/platform/${{ vars.VAULT_ENV }}/registry username | REGISTRY_USER

In OrgA-Dev → secret/data/platform/dev/registry
In OrgA-Stg → secret/data/platform/stg/registry
In OrgA → secret/data/platform/prod/registry

Not a single line of workflow code changes; all three environments (covering 500+ repositories) route automatically.

The Idempotent apply-org-variables.sh Implementation

#!/usr/bin/env bash
# Idempotent write of Organization Variables
# Already exists → PATCH (update), doesn't exist → POST (create)

set -euo pipefail

apply_org_vars() {
  local ORG=$1
  local ENV=$2
  local KEY VALUE API_HOST

  # prod uses the bare API hostname (see the table above)
  if [ "${ENV}" = "prod" ]; then
    API_HOST="api.platform.example.com"
  else
    API_HOST="api.platform-${ENV}.example.com"
  fi

  declare -A VARS=(
    ["VAULT_URL"]="https://vault.example.com"
    ["VAULT_NAMESPACE"]="platform/${ENV}"
    ["VAULT_ROLE"]="platform-ci"
    ["VAULT_AUDIENCE"]="https://vault.example.com"
    ["VAULT_ENV"]="${ENV}"
    ["API_URL"]="https://${API_HOST}"
  )

  for KEY in "${!VARS[@]}"; do
    VALUE="${VARS[$KEY]}"
    # GET exits non-zero (HTTP 404) when the variable does not exist yet.
    # Test the exit code directly so set -e / pipefail do not abort the script.
    if gh api "orgs/${ORG}/actions/variables/${KEY}" --silent >/dev/null 2>&1; then
      gh api --method PATCH "orgs/${ORG}/actions/variables/${KEY}" \
        -f value="${VALUE}" -f visibility="all" --silent
      echo "[UPDATE] ${ORG} / ${KEY}=${VALUE}"
    else
      gh api --method POST "orgs/${ORG}/actions/variables" \
        -f name="${KEY}" -f value="${VALUE}" -f visibility="all" --silent
      echo "[CREATE] ${ORG} / ${KEY}=${VALUE}"
    fi
  done
}

apply_org_vars "OrgA-Dev" "dev"
apply_org_vars "OrgA-Stg" "stg"
apply_org_vars "OrgA" "prod"
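
A dry-run mode makes a script like this safer to point at a production Org. The sketch below compares desired values with the current ones and classifies each variable as CREATE, UPDATE, or NOOP. It is an assumption, not part of the original script. plan_org_vars is hypothetical, and in practice the current values would be fetched with gh api orgs/ORG/actions/variables.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run planner: compare desired Org variables with the
# current ones and report what an apply would change. Needs bash >= 4.3.
set -euo pipefail

plan_org_vars() {
  # $1 and $2 are the NAMES of associative arrays: desired and current
  local -n desired_ref="$1"
  local -n current_ref="$2"
  local key
  for key in $(printf '%s\n' "${!desired_ref[@]}" | sort); do
    if [ -z "${current_ref[$key]+set}" ]; then
      echo "CREATE ${key}=${desired_ref[$key]}"
    elif [ "${current_ref[$key]}" != "${desired_ref[$key]}" ]; then
      echo "UPDATE ${key}: ${current_ref[$key]} -> ${desired_ref[$key]}"
    else
      echo "NOOP   ${key}"
    fi
  done
}

declare -A DESIRED=( [VAULT_ENV]="stg" [VAULT_ROLE]="platform-ci" [API_URL]="https://api.platform-stg.example.com" )
declare -A CURRENT=( [VAULT_ENV]="dev" [VAULT_ROLE]="platform-ci" )
plan_org_vars DESIRED CURRENT
```

Running the planner first, then the apply, gives a reviewable diff for each of the three Orgs. It also confirms that a second run reports only NOOP lines.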

Part 5: Engineering Details of Container Builds

Multi-Registry Push Design

image_url_list is a comma-separated list of URLs:

internet type:
  internet.platform-registry.example.com/orga/my-app
  + public.platform-registry.example.com/orga/my-app

private type:
  internal-private.platform-registry.example.com/orga/my-app (only this one)
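
In the build script that follows, the first entry of image_url_list is built and pushed directly, and the remaining entries are copied from it. The sketch below shows that split, assuming a non-empty comma-separated list. split_image_urls is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split image_url_list into the primary URL (first
# entry) and the extra URLs (every entry after the first).
set -euo pipefail

split_image_urls() {
  local -a urls=()
  IFS=',' read -ra urls <<< "$1"
  if [ "${#urls[@]}" -eq 0 ]; then
    echo "error: empty image_url_list" >&2
    return 1
  fi
  echo "primary: ${urls[0]}"
  local url
  for url in "${urls[@]:1}"; do
    echo "extra:   ${url}"
  done
}

split_image_urls "internet.platform-registry.example.com/orga/my-app,public.platform-registry.example.com/orga/my-app"
```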

Cross-Registry Copying with docker buildx imagetools create

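After the build completes, docker buildx imagetools create publishes the image under each additional registry URL by copying the manifest and its blobs through the registry API. Nothing is pulled into the runner's local image store, unpacked, or re-pushed. When source and target are different registries, layer data may still stream through the runner's network connection. Even so, skipping the pull/tag/push round trip adds up across 500+ repositories building frequently: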

# Parse the URL list first: the first entry is the primary registry
IFS=',' read -ra ALL_URLS <<< "${IMAGE_URL_LIST}"
PRIMARY_URL="${ALL_URLS[0]}"

# Build and push to the primary registry
docker buildx build \
  --platform linux/amd64 \
  --tag "${PRIMARY_URL}:${TAG}" \
  --push \
  --file "${DOCKERFILE}" .

# Copy to the remaining registries with imagetools (no local pull/tag/push)
for EXTRA_URL in "${ALL_URLS[@]:1}"; do
  EXTRA_REGISTRY="${EXTRA_URL%%/*}"
  # Exact hostname match: a substring test for "public.platform" would also
  # match internal-public.platform-registry.example.com
  if [ "${EXTRA_REGISTRY}" = "public.platform-registry.example.com" ]; then
    echo "${PUBLIC_REGISTRY_PASS}" | docker login "${EXTRA_REGISTRY}" \
      -u "${PUBLIC_REGISTRY_USER}" --password-stdin
  else
    echo "${REGISTRY_PASS}" | docker login "${EXTRA_REGISTRY}" \
      -u "${REGISTRY_USER}" --password-stdin
  fi

  docker buildx imagetools create --tag "${EXTRA_URL}:${TAG}" "${PRIMARY_URL}:${TAG}"
  echo "Pushed ${EXTRA_URL}:${TAG}"
done
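
The login branch in this loop selects credentials by registry hostname. Pulling that rule into functions makes it testable. Matching the full hostname, rather than testing for the substring public.platform, avoids also matching internal-public.platform-registry.example.com. registry_host and creds_prefix are hypothetical names; the REGISTRY_* and PUBLIC_REGISTRY_* variable names follow the excerpt.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: find the registry host of an image URL and pick
# which credential variables to use for docker login.
set -euo pipefail

registry_host() {
  # Everything before the first "/" is the registry host (may include a port)
  echo "${1%%/*}"
}

creds_prefix() {
  case "$1" in
    public.platform-registry.example.com) echo "PUBLIC_REGISTRY" ;;
    *)                                    echo "REGISTRY" ;;
  esac
}

for url in \
  internet.platform-registry.example.com/orga/my-app \
  public.platform-registry.example.com/orga/my-app; do
  host="$(registry_host "${url}")"
  prefix="$(creds_prefix "${host}")"
  echo "${host}: use \$${prefix}_USER / \$${prefix}_PASS"
done
```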

Image Signing (Signify)

The platform uses an internal Signify service for image signing with mTLS client certificate authentication. All tags across all registries require signing:

IFS=',' read -ra ALL_URLS <<< "${IMAGE_URL_LIST}"
IFS=',' read -ra ALL_TAGS <<< "${TAG_LIST}"

for URL in "${ALL_URLS[@]}"; do
  for TAG in "${ALL_TAGS[@]}"; do
    IMAGE="${URL}:${TAG}"
    DIGEST=$(docker buildx imagetools inspect "${IMAGE}" \
      --format '{{json .Manifest}}' | jq -r '.digest' | sed 's/^sha256://')
    MANIFEST=$(docker manifest inspect "${IMAGE}" 2>/dev/null)
    BYTE_SIZE=$(echo "${MANIFEST}" | jq -r '.config.size // 0')

    GUN="${IMAGE%:*}"  # image reference without its tag
    PAYLOAD="{\"trustedCollections\":[{\"gun\":\"${GUN}\",\"targets\":[{\"name\":\"${TAG}\",\"digest\":\"${DIGEST}\",\"byteSize\":${BYTE_SIZE}}]}]}"

    curl -sf -X POST \
      --cert "${CERT_FILE}" \
      --key "${KEY_FILE}" \
      --pass "${KEY_PASS}" \
      "${SIGNIFY_ENDPOINT}/trusted-collections/publish" \
      -H "Content-Type: application/json" \
      -d "${PAYLOAD}" || echo "Warning: signing failed for ${IMAGE}, continuing"
  done
done
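
The signing loop needs the GUN, that is, the image reference without its tag. Bash parameter expansion extracts it without subprocesses, and a small guard catches references that have a registry port but no tag. This is a sketch. split_ref is a hypothetical helper, and it relies on Docker tags never containing "/" or ":".

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "registry[:port]/path:tag" into the GUN
# (everything before the last ":") and the tag.
set -euo pipefail

split_ref() {
  local image="$1"
  local gun="${image%:*}"   # drop the shortest ":..." suffix
  local tag="${image##*:}"  # drop the longest "...:" prefix
  # If the text after the last ":" contains "/", that colon belonged to a
  # registry port and the reference has no tag
  if [[ "${tag}" == */* ]]; then
    echo "error: no tag in ${image}" >&2
    return 1
  fi
  printf '%s %s\n' "${gun}" "${tag}"
}

split_ref "internet.platform-registry.example.com/orga/my-app:1.4.0"
split_ref "registry.example.com:5000/orga/my-app:latest"
```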

Compatibility Issues with upload-artifact@v4 on GHE

Certain versions of GitHub Enterprise Server do not support the new API used by actions/upload-artifact@v4:

Error: GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and
download-artifact@v4+ are not currently supported on GHES.

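Because all 500+ repositories run the same platform workflows, one incompatible action version breaks CI for every repository at once. On the affected GHES versions, pin the v3 artifact actions. v3 is deprecated on github.com, but it is the version those GHES releases support. Keep upload and download on the same major version, since artifacts uploaded by v4 cannot be downloaded by v3 and vice versa: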

- uses: actions/upload-artifact@v3
  with:
    name: lint-report
    path: reports/
- uses: actions/download-artifact@v3
  with:
    name: lint-report

Part 6: Per-Repo Secrets — Vault Enterprise Secrets Sync

The Division of Two Credential Types

Credential Type | How It’s Obtained | Example | Security Level
Platform shared credentials | Fetched from Vault at runtime via JWT/OIDC | Registry credentials, signing certificates | High (cross-Org permissions)
Business repository-specific credentials | Pushed as GitHub Secrets by Vault Secrets Sync | Database connection strings, business API keys | Medium (single-repository permissions)

At 500+ repository scale, per-repo Secrets management requires automation — you cannot manually configure 500 repositories. Vault Enterprise Secrets Sync provides this capability.

Vault Enterprise Secrets Sync Configuration

# 1. Create a GitHub sync destination for one repository
#    (Fine-Grained PAT; only needs the repository Secrets: read and write permission)
vault write sys/sync/destinations/gh/my-app-prod \
  access_token="github_pat_xxxx" \
  repository_owner="OrgA" \
  repository_name="my-app" \
  secret_name_template="{{.SecretKey | uppercase}}"

# 2. Associate a KV secret with the destination (KV path → GitHub Actions Secret)
vault write sys/sync/destinations/gh/my-app-prod/associations/set \
  mount="secret" \
  secret_name="apps/my-app/prod/db"
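
The two commands differ only in the repository and KV path, so onboarding many repositories comes down to a loop. The sketch below prints the commands for review instead of running them. Several details are assumptions: the apps/<repo>/<env>/db path layout, the <repo>-<env> destination naming, and the SYNC_TOKEN placeholder. The sys/sync paths should be checked against the Vault version in use.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run generator: print Vault Secrets Sync commands for a
# list of repositories. Nothing is executed.
set -euo pipefail

sync_commands() {
  local owner="$1" env="$2"; shift 2
  local repo dest
  for repo in "$@"; do
    dest="${repo}-${env}"
    printf 'vault write sys/sync/destinations/gh/%s access_token="$SYNC_TOKEN" repository_owner=%q repository_name=%q\n' \
      "${dest}" "${owner}" "${repo}"
    printf 'vault write sys/sync/destinations/gh/%s/associations/set mount=secret secret_name=%q\n' \
      "${dest}" "apps/${repo}/${env}/db"
  done
}

sync_commands OrgA prod my-app billing-api
```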

How Automatic Rotation Sync Works

Vault KV secret updated (manually or via dynamic credentials)
        │
        ▼
Vault Sync Engine (background polling, ~5 minute interval)
        │
        ▼
GitHub API: PUT /repos/OrgA/my-app/actions/secrets/DB_PASSWORD
        │
        ▼
GitHub Secret automatically updated (takes effect on next workflow run)

At 500+ repository scale, credential rotation no longer requires notifying each repository owner — the Vault Sync Engine automatically handles the push. The platform team only manages the KV in Vault, and business repository Secrets are automatically synced.


Part 7: Observability at 500+ Scale

When 500+ repositories are running CI simultaneously, observability is not a “nice to have” — it is foundational infrastructure for platform operations.

Runner Capacity Monitoring

# Record runner info at the start of each job
- name: Log runner info
  run: |
    echo "Runner: ${{ runner.name }}"
    echo "OS: ${{ runner.os }}"
    echo "Arch: ${{ runner.arch }}"
    echo "Repo: ${{ github.repository }}"
    echo "Event: ${{ github.event_name }}"
    echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"

CI Health Dashboard

The platform team needs to be able to answer:

  • Over the past 7 days, which 20 repositories had the highest CI failure rates?
  • Average CI duration trend (is there performance regression)?
  • Security scan coverage (which repositories haven’t had a CI run in over 30 days)?

This data can be collected via the GitHub API or through custom telemetry emitted during CI runs.
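
Once run results are collected, ranking failure rates is a small aggregation. This sketch assumes input lines of the form "<repo> <conclusion>", for example extracted from the workflow-runs API with jq. Both that input format and the top_failing name are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical aggregation: read "<repo> <conclusion>" lines on stdin and
# print the N repositories with the highest failure rate.
set -euo pipefail

top_failing() {
  local n="${1:-20}"
  awk '
    { total[$1]++; if ($2 == "failure") failed[$1]++ }
    END {
      for (r in total)
        printf "%s %.1f%% (%d/%d)\n", r, 100 * failed[r] / total[r], failed[r], total[r]
    }' \
  | sort -k2,2 -rn \
  | awk -v n="${n}" 'NR <= n'   # read all input (unlike head) to avoid SIGPIPE
}

printf '%s\n' "app-a success" "app-a failure" "app-b failure" "app-c success" \
  | top_failing 2
```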

Bulk Compliance Check Script

#!/usr/bin/env bash
# Check CI onboarding status for all repositories in an Org
set -u
ORG="${1:?usage: $0 <org>}"

echo "Checking all repositories under ${ORG}..."
gh repo list "${ORG}" --limit 1000 --json name --jq '.[].name' \
  | while IFS= read -r repo; do
      # Check for .ci-config/config.yaml
      if ! gh api "repos/${ORG}/${repo}/contents/.ci-config/config.yaml" \
          --silent &>/dev/null; then
        echo "  [missing config] ${repo}"
        continue
      fi
      # Check whether ci.yml calls the platform workflow. The raw media type
      # returns the file itself instead of base64-encoded JSON.
      CI_YML=$(gh api -H "Accept: application/vnd.github.raw+json" \
        "repos/${ORG}/${repo}/contents/.github/workflows/ci.yml" 2>/dev/null || true)
      if ! grep -q 'platform-ci-core.yml' <<< "${CI_YML}"; then
        echo "  [not onboarded to platform CI] ${repo}"
      fi
    done

Part 8: Lessons Learned

1. Two Ways to Handle Null Values in yq

yq outputs the string null by default when a field doesn’t exist, causing downstream jobs to receive the literal string "null". Across 500+ repositories with significant variation in config.yaml writing styles, this type of issue comes up frequently:

# Option A: yq alternative operator (recommended for simple scalar fields)
RCFILE=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.rcFile // ""' config.yaml)

# Option B: shell filtering (handles all null output scenarios)
RCFILE=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.rcFile' config.yaml | grep -v '^null$' || echo "")
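
Option B's filter is easy to get subtly wrong when several lines come back. The sketch below does the same filtering in pure bash so it can be tested without yq. strip_yq_null is a hypothetical name, and it treats only the exact line null as missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the literal "null" lines yq prints for missing
# fields; an all-null result becomes an empty string.
set -euo pipefail

strip_yq_null() {
  local line="" first=1
  while IFS= read -r line || [ -n "${line}" ]; do
    if [ "${line}" = "null" ]; then
      continue
    fi
    # Newline-separate the surviving lines without a trailing newline
    if [ "${first}" -eq 1 ]; then
      first=0
    else
      printf '\n'
    fi
    printf '%s' "${line}"
  done
}

printf 'null\n.pylintrc\nnull\n' | strip_yq_null; echo
```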

2. Multi-Line Values in $GITHUB_OUTPUT Must Use Heredoc

# Wrong: every line after the first is parsed as a separate output entry
# (a line without "=" makes the step fail)
echo "changelog=${MULTI_LINE_TEXT}" >> "$GITHUB_OUTPUT"

# Correct: heredoc format
{
  echo "changelog<<EOF"
  echo "${MULTI_LINE_TEXT}"
  echo "EOF"
} >> "$GITHUB_OUTPUT"
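
A fixed EOF delimiter breaks if the text itself contains a line reading EOF, which is plausible in changelogs. GitHub's documentation warns that the delimiter must not appear on its own line inside the value, and a random delimiter guarantees that in practice. The sketch below is a helper that picks the single-line or heredoc form automatically. set_output is a hypothetical name, and /dev/urandom plus od are assumed to be available (as on standard Linux runners).

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a value to $GITHUB_OUTPUT, using the heredoc
# form with a random delimiter only when the value spans several lines.
set -euo pipefail

set_output() {
  local name="$1" value="$2"
  local file="${GITHUB_OUTPUT:?GITHUB_OUTPUT is not set}"
  local delim
  if [[ "${value}" != *$'\n'* ]]; then
    printf '%s=%s\n' "${name}" "${value}" >> "${file}"
    return 0
  fi
  # 8 random bytes as 16 hex characters
  delim="ghadelimiter_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  printf '%s<<%s\n%s\n%s\n' "${name}" "${delim}" "${value}" "${delim}" >> "${file}"
}

# Local demo against a temporary file standing in for $GITHUB_OUTPUT
GITHUB_OUTPUT="$(mktemp)"
set_output version "1.4.0"
set_output changelog $'- fix A\n- add B'
cat "${GITHUB_OUTPUT}"
```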

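3. Job Outputs Are Always Strings

Reusable workflow inputs can be typed string, boolean, or number, but needs.<job>.outputs.* values are always strings. The simplest pattern is a string input compared against 'true'. Alternatively, keep a boolean input and convert at the call site with has_unit_test: ${{ needs.config.outputs.has_unit_test == 'true' }}, which evaluates to a real boolean.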

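on:
  workflow_call:
    inputs:
      has_unit_test:
        type: string   # job outputs are strings, so a string input needs no conversion
        default: 'false'

jobs:
  unit-test:
    runs-on: [self-hosted, linux]
    steps:
      - name: Run unit tests
        if: inputs.has_unit_test == 'true'   # string comparison
        run: pytest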

4. The Security Trap of pull_request_target + checkout

pull_request_target runs in the base repository’s context, but actions/checkout defaults to checking out the base branch, not the PR’s code.

You must explicitly specify the PR head SHA:

- uses: actions/checkout@v4
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    persist-credentials: false   # do not leave the token in .git/config

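Security note: the checked-out code comes from the fork. Any job that checks out and executes that code (tests, build scripts, Dockerfile RUN steps) must not also hold Secrets, an OIDC token, or a write-scoped GITHUB_TOKEN. Keep credential access in a separate job that never runs the fork's code, and set persist-credentials: false so the token is not left behind in .git/config.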

5. Correct Syntax for needs.config.outputs in if: Conditions

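build:
  needs: config
  # Both forms work: in if:, ${{ }} is optional and a non-empty string is truthy
  if: ${{ needs.config.outputs.dockerfile != '' }}
  # if: needs.config.outputs.dockerfile

  # Wrong: text outside ${{ }} turns the whole value into a non-empty
  # string, which is always true
  # if: ${{ needs.config.outputs.dockerfile }} != ''

  # Expressions starting with ! must be wrapped, because ! is YAML tag syntax
  # if: ${{ !cancelled() }}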

6. Runner Queue Pressure During 500+ Concurrent Triggers

During the morning commit rush, the runner queue can back up with hundreds of jobs. Monitor queue time, the time from when a job enters the queue to when it starts running; the Actions API exposes it as the created_at and started_at fields of each workflow job. Adjust the runner count accordingly. Queue waits longer than 5 minutes significantly degrade the developer experience.
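
Queue time can be computed per job from the created_at and started_at timestamps that the Actions REST API returns for workflow jobs. The sketch below shows the arithmetic, assuming GNU date and ISO 8601 UTC timestamps. The 300-second threshold is the 5-minute guideline above, and queue_seconds is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds a job waited in the queue, from the
# created_at / started_at timestamps of a workflow job (GNU date required).
set -euo pipefail

queue_seconds() {
  local created="$1" started="$2"
  echo $(( $(date -u -d "${started}" +%s) - $(date -u -d "${created}" +%s) ))
}

wait_s="$(queue_seconds 2024-05-06T08:58:10Z 2024-05-06T09:04:40Z)"
if [ "${wait_s}" -gt 300 ]; then
  echo "queued ${wait_s}s: over the 5-minute threshold"
fi
```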


Summary

At 500+ repository scale, the core value of GitHub Actions Reusable Workflows:

  1. Working around static structure constraints: the config job acts as a dynamic configuration middleware layer, replacing Jenkins Groovy’s runtime merge capability
  2. JWT/OIDC with no long-lived credentials: none of the 500 repositories stores credentials, and the short-lived batch tokens Vault issues are never persisted in its token store
  3. Multi-environment zero-code routing: Organization Variable injection enables three environments to use exactly the same workflow code
  4. Multi-registry container builds: imagetools create publishes to additional registries without a local pull/tag/push round trip, which adds up across thousands of daily builds from 500+ repositories
  5. Layered credential management: platform shared credentials use OIDC; business-specific credentials use Secrets Sync; credential lifecycle for 500 repositories is fully automated

Each business repository ultimately only needs to maintain a short ci.yml and a .ci-config/config.yaml. This “minimal onboarding model” is identical for the 1st repository and the 500th, which is precisely the design goal of a scalable platform.