GitHub Actions Reusable Workflow: A Complete Implementation of Zero-Config Unified CI/CD

Posted on 2026-06-21 in DevOps, CI/CD

DevOps, Cloud, Kubernetes — personal tech notes by ChengQing Su

This is the third post in the “Unified CI/CD Pipeline Governance” series. It breaks down how a platform team uses Reusable Workflows to give business repositories zero-config onboarding, covering architecture design, JWT/OIDC authentication, multi-environment routing, container builds, and lessons learned. The material draws on a production deployment spanning 500+ repositories.

Part 1: Architecture Overview

The `.github` Repository as a Platform Boundary

Within a GitHub Organization, the special repository named .github serves two roles: hosting Organization-level default Community Health Files (such as CODE_OF_CONDUCT.md), and storing Reusable Workflow files that all repositories in the Org can call.

The platform team centralizes all CI/CD logic in the .github repository, establishing a clear platform boundary:

OrgA/.github
├── .github/
│   └── workflows/
│       ├── platform-ci-core.yml          # Orchestrator / entry point
│       ├── platform-ci-check.yml         # lint + unit tests
│       ├── platform-ci-security.yml      # static code scanning
│       ├── platform-ci-build.yml         # container build + multi-registry push + signing
│       ├── platform-ci-prepare-release.yml
│       ├── platform-ci-release.yml
│       └── platform-ci-deploy.yml
└── actions/
    ├── prepare-release/
    ├── release/
    └── security-scan/

All 500+ business repositories call this same set of workflow files. Each business repository only needs to maintain a minimal ci.yml:

# .github/workflows/ci.yml (business repository, ~15 lines)
name: CI

on:
  push:
    branches: [trunk, releases/latest]
  pull_request:
    branches: [trunk, releases/latest]
  pull_request_target:
    branches: [trunk, releases/latest]

jobs:
  pipeline:
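    # fork PRs run under pull_request_target, same-repo PRs under pull_request,
    # so every PR runs the pipeline exactly once
    if: >-
      github.event_name == 'push' ||
      (github.event_name == 'pull_request' &&
      github.event.pull_request.head.repo.full_name == github.repository) ||
      (github.event_name == 'pull_request_target' &&
      github.event.pull_request.head.repo.full_name != github.repository)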
    uses: OrgA/.github/.github/workflows/platform-ci-core.yml@main
    # no with: block — all config is read from .ci-config/config.yaml
    # no secrets: block — credentials are handled internally by the platform pipeline

Why `pull_request_target` Is Necessary

On fork PRs, the pull_request event runs with a read-only token: it has no access to Org Secrets and cannot request an OIDC token, so every job that needs Vault credentials fails. Switching to pull_request_target runs the workflow in the base repository’s context, which restores that access. However, it also introduces security risks; see the lessons learned section for details.
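The routing rule behind the `if:` condition can be written out as a small truth table. The sketch below is only an illustration in plain shell: `should_run` is a hypothetical helper, not part of the platform, and it encodes the intended rule (pushes always run, same-repo PRs run under pull_request, fork PRs run under pull_request_target).

```shell
# should_run EVENT HEAD_REPO BASE_REPO
# Hypothetical helper: returns 0 when the pipeline job should run.
should_run() (
  event=$1
  head_repo=$2
  base_repo=$3
  case "$event" in
    push)                exit 0 ;;
    pull_request)        [ "$head_repo" = "$base_repo" ] ;;
    pull_request_target) [ "$head_repo" != "$base_repo" ] ;;
    *)                   exit 1 ;;
  esac
)

# Print the decision for each event / PR-origin combination
for row in \
  "push OrgA/my-app" \
  "pull_request OrgA/my-app" \
  "pull_request contributor/my-app" \
  "pull_request_target OrgA/my-app" \
  "pull_request_target contributor/my-app"
do
  set -- $row
  if should_run "$1" "$2" "OrgA/my-app"; then verdict=run; else verdict=skip; fi
  printf '%-20s head=%-22s %s\n' "$1" "$2" "$verdict"
done
```

Each PR ends up with exactly one pipeline run, which keeps fork PRs from producing a second run that fails for lack of credentials.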

Workflow File Responsibilities

File	Responsibility
`platform-ci-core.yml`	Orchestrator, contains the `config` job, calls all downstream reusable workflows
`platform-ci-check.yml`	pylint, shellcheck, unit tests
`platform-ci-security.yml`	SonarQube / Semgrep scanning
`platform-ci-build.yml`	buildx build, multi-registry push, image signing
`platform-ci-prepare-release.yml`	Calculate the next semantic version number
`platform-ci-release.yml`	Create Git tag, create GitHub Release
`platform-ci-deploy.yml`	Status reporting, Docker info push, health check

Part 2: The config Job — The Key Design That Replaces the Groovy Merger

Why a Dedicated config Job Is Needed

GitHub Actions workflow structure is static: values in with: fields must have their types determined at workflow parse time, and if: conditions also have syntax constraints at the job level. The platform cannot dynamically merge configuration at runtime the way Jenkins Groovy can.

The solution is to set up a config job in the orchestrator that reads the business repository’s .ci-config/config.yaml, outputs all derived configuration as outputs, and lets downstream jobs consume them via needs.config.outputs.*.

.ci-config/config.yaml (business repository declaration)
        │
        ▼
   config job (parse + compute, runs once)
        │
        ├──► platform-ci-check.yml   (python_version, pylint_module_paths)
        ├──► platform-ci-security.yml (sonar_project_key)
        ├──► platform-ci-build.yml   (dockerfile, image_url_list)
        └──► platform-ci-deploy.yml  (content_name, image_url_list)
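As a minimal sketch of the "parse + compute" idea, the function below derives one value (the Python version) from an image reference, falling back to a default when nothing can be derived. The function name `extract_python_version` is hypothetical; the regular expression and the `3.x` fallback follow the config job shown in this article.

```shell
# extract_python_version IMAGE_REF
# Prints the trailing X.Y or X.Y.Z of the image tag, or "3.x" if there is none.
extract_python_version() (
  version=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?$' || true)
  printf '%s\n' "${version:-3.x}"
)

extract_python_version "platform-registry.example.com/python:3.14"
extract_python_version "platform-registry.example.com/python:3.14-slim"
extract_python_version ""
```

Because the pattern is anchored at the end of the string, a tag with a suffix such as `-slim` deliberately falls back to the default rather than guessing.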

The Complete Shell Implementation of the `config` Job

# platform-ci-core.yml (config job excerpt)
jobs:
  config:
    name: Resolve Config
    runs-on: [self-hosted, linux]
    outputs:
      python_version:       ${{ steps.resolve.outputs.python_version }}
      pylint_module_paths:  ${{ steps.resolve.outputs.pylint_module_paths }}
      pylint_rc_file:       ${{ steps.resolve.outputs.pylint_rc_file }}
      has_unit_test:        ${{ steps.resolve.outputs.has_unit_test }}
      unit_test_script:     ${{ steps.resolve.outputs.unit_test_script }}
      dockerfile:           ${{ steps.resolve.outputs.dockerfile }}
      image_url_list:       ${{ steps.resolve.outputs.image_url_list }}
      sonar_project_key:    ${{ steps.resolve.outputs.sonar_project_key }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha || github.sha }}

      - name: Install yq
        run: |
          if ! command -v yq &>/dev/null; then
            curl -fsSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 \
              -o /usr/local/bin/yq && chmod +x /usr/local/bin/yq
          fi

      - name: Parse .ci-config/config.yaml
        id: resolve
        env:
          REPO_NAME:  ${{ github.event.repository.name }}
          REPO_OWNER: ${{ github.repository_owner }}
        run: |
          CFG=".ci-config/config.yaml"

          # Python version: extract semantic version from image tag
          # e.g.: platform-registry.example.com/python:3.14 → 3.14
          RAW_IMAGE=$(yq '.containers[] | select(.name=="project-runtime") | .image' \
            "${CFG}" 2>/dev/null || echo "")
          PYTHON_VERSION=$(echo "${RAW_IMAGE}" | \
            grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?$' || echo "3.x")
          [ -z "${PYTHON_VERSION}" ] && PYTHON_VERSION="3.x"

          # pylint sourceSets → space-separated string
          PYLINT_PATHS=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.sourceSets[]' \
            "${CFG}" 2>/dev/null | tr '\n' ' ' | xargs || echo "")

          # pylint rcFile: filter out yq's null output
          PYLINT_RC=$(yq '.jobs[] | select(.name=="lint") | .steps[].pyLint.rcFile' \
            "${CFG}" 2>/dev/null | grep -v '^null$' || echo "")

          # Unit test detection
          UNIT_TEST_SCRIPT=$(yq '.jobs[] | select(.name=="unit-test") | .steps[].script.workspace' \
            "${CFG}" 2>/dev/null | grep -v '^null$' | head -1 || echo "")
          HAS_UNIT_TEST="false"
          [ -n "${UNIT_TEST_SCRIPT}" ] && HAS_UNIT_TEST="true"

          # containerBuild config
          DOCKERFILE=$(yq '.containerBuild.path' "${CFG}" 2>/dev/null | grep -v '^null$' || echo "")
          IMAGE_TYPE=$(yq '.containerBuild.registryType' "${CFG}" 2>/dev/null | \
            grep -v '^null$' || echo "internet")

          # Compute target registry based on image_type
          case "${IMAGE_TYPE}" in
            private)  PRIMARY_REG="internal-private.platform-registry.example.com" ;;
            public)   PRIMARY_REG="internal-public.platform-registry.example.com" ;;
            internet) PRIMARY_REG="internet.platform-registry.example.com" ;;
            *)        PRIMARY_REG="internal.platform-registry.example.com" ;;
          esac
          PRIMARY_URL="${PRIMARY_REG}/${REPO_OWNER}/${REPO_NAME}"

          # internet type also pushes to the public registry
          if [ "${IMAGE_TYPE}" = "internet" ]; then
            PUBLIC_URL="public.platform-registry.example.com/${REPO_OWNER}/${REPO_NAME}"
            IMAGE_URL_LIST="${PRIMARY_URL},${PUBLIC_URL}"
          else
            IMAGE_URL_LIST="${PRIMARY_URL}"
          fi
          IMAGE_URL_LIST="${IMAGE_URL_LIST,,}"  # convert to lowercase

          echo "python_version=${PYTHON_VERSION}"     >> "$GITHUB_OUTPUT"
          echo "pylint_module_paths=${PYLINT_PATHS}"  >> "$GITHUB_OUTPUT"
          echo "pylint_rc_file=${PYLINT_RC}"          >> "$GITHUB_OUTPUT"
          echo "has_unit_test=${HAS_UNIT_TEST}"       >> "$GITHUB_OUTPUT"
          echo "unit_test_script=${UNIT_TEST_SCRIPT}" >> "$GITHUB_OUTPUT"
          echo "dockerfile=${DOCKERFILE}"             >> "$GITHUB_OUTPUT"
          echo "image_url_list=${IMAGE_URL_LIST}"     >> "$GITHUB_OUTPUT"

          # Job Summary: config resolution table (debug-friendly, critical for supporting 500+ repos)
          echo "### Config resolved from .ci-config/config.yaml" >> "$GITHUB_STEP_SUMMARY"
          echo "| Key | Value |" >> "$GITHUB_STEP_SUMMARY"
          echo "|-----|-------|" >> "$GITHUB_STEP_SUMMARY"
          echo "| python_version | \`${PYTHON_VERSION}\` |" >> "$GITHUB_STEP_SUMMARY"
          echo "| pylint_module_paths | \`${PYLINT_PATHS}\` |" >> "$GITHUB_STEP_SUMMARY"
          echo "| has_unit_test | \`${HAS_UNIT_TEST}\` |" >> "$GITHUB_STEP_SUMMARY"
          echo "| dockerfile | \`${DOCKERFILE:-<none>}\` |" >> "$GITHUB_STEP_SUMMARY"
          echo "| image_url_list | \`${IMAGE_URL_LIST}\` |" >> "$GITHUB_STEP_SUMMARY"

The Job Summary matters more at 500+ scale: when a business team reports that “CI behavior doesn’t match expectations,” the platform team needs to quickly determine whether the issue is a config.yaml parsing problem or a pipeline logic problem. The config table in the Step Summary turns that diagnosis from “digging through logs” into “opening the workflow run’s summary page.”
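The registry routing in the config job is pure string logic, so it can be exercised outside GitHub Actions. The sketch below repackages it as a function for local dry runs; `image_url_list` as a function name is an assumption, while the registry hostnames and the `internet` default are taken from the job above.

```shell
# image_url_list OWNER REPO [REGISTRY_TYPE]
# Prints the comma-separated, lowercased list of image URLs.
image_url_list() (
  owner=$1
  repo=$2
  image_type=${3:-internet}
  case "$image_type" in
    private)  primary_reg="internal-private.platform-registry.example.com" ;;
    public)   primary_reg="internal-public.platform-registry.example.com" ;;
    internet) primary_reg="internet.platform-registry.example.com" ;;
    *)        primary_reg="internal.platform-registry.example.com" ;;
  esac
  list="${primary_reg}/${owner}/${repo}"
  # internet images are additionally published to the public registry
  if [ "$image_type" = "internet" ]; then
    list="${list},public.platform-registry.example.com/${owner}/${repo}"
  fi
  printf '%s\n' "$list" | tr '[:upper:]' '[:lower:]'
)

image_url_list OrgA my-app internet
image_url_list OrgA my-app private
```

Running a reported configuration through a function like this is a quick way to separate config.yaml parsing problems from pipeline logic problems.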

Downstream Consumption

build:
  name: Build Container Image
  needs: [config, security]
  if: ${{ needs.config.outputs.dockerfile != '' }}
  uses: OrgA/.github/.github/workflows/platform-ci-build.yml@main
  with:
    dockerfile:     ${{ needs.config.outputs.dockerfile }}
    image_url_list: ${{ needs.config.outputs.image_url_list }}

Part 3: Deep Dive into the JWT/OIDC Credential Architecture

The Nature of the `job_workflow_ref` Claim

The GitHub Actions OIDC token contains a key claim: job_workflow_ref. Its value is the path of the called workflow file, not the name of the calling repository.

When business repository OrgA/my-app (one of 500) calls platform-ci-build.yml:

job_workflow_ref = "OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main"
repository       = "OrgA/my-app"   ← this is the business repo, not .github

Vault’s bound_claims binds on job_workflow_ref — regardless of which business repository triggers it, as long as the platform workflow file is being called, authentication succeeds. 500 repositories, 1 Vault role, 0 static credentials.
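To see which callers the binding admits, the glob can be approximated with a shell `case` pattern. This is only an illustration: Vault evaluates its own glob implementation, and `matches_bound_claim` is a hypothetical helper, but for a single `*` wildcard the behaviour should be equivalent.

```shell
# matches_bound_claim JOB_WORKFLOW_REF GLOB
# Returns 0 when the claim value matches the glob pattern.
matches_bound_claim() {
  case "$1" in
    $2) return 0 ;;   # unquoted on purpose: $2 is used as a pattern
    *)  return 1 ;;
  esac
}

pattern='OrgA/.github/.github/workflows/*@refs/heads/main'

for ref in \
  'OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main' \
  'OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/feature-x' \
  'OrgA/my-app/.github/workflows/ci.yml@refs/heads/main'
do
  if matches_bound_claim "$ref" "$pattern"; then
    echo "accept  $ref"
  else
    echo "reject  $ref"
  fi
done
```

The second and third cases show the two things the pattern locks down: the branch the platform workflow is loaded from, and the repository that owns the workflow file.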

The Complete Authentication Flow

Business repo ci.yml (any one of 500)
      │  (uses: OrgA/.github/...platform-ci-build.yml@main)
      ▼
platform-ci-build.yml (job level)
      │  permissions:
      │    id-token: write   ← must be declared at the sub-workflow job level
      │
      ├─1─► GHE OIDC Endpoint (https://ghe.example.com/_services/token)
      │       └── returns JWT containing the job_workflow_ref claim
      │
      ├─2─► vault-action (method: jwt)
      │       ├── POST /v1/auth/jwt/login
      │       │     { jwt: <oidc_token>, role: "platform-ci" }
      │       │
      │       └── Vault verification flow:
      │             ├── fetch OIDC Discovery Document (JWKS endpoint)
      │             ├── verify JWT signature
      │             ├── check bound_claims (job_workflow_ref glob match)
      │             ├── check @refs/heads/main suffix (branch lock)
      │             └── return batch token (TTL: 5min)
      │
      └─3─► use batch token to read KV secrets
              (registry credentials, code signing certificates, etc.)
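When authentication fails, it helps to look at the claims Vault actually receives in step 1. The sketch below decodes the payload segment of a JWT with standard tools; `jwt_payload` and `b64url` are hypothetical helper names, and the token here is fabricated for the demonstration (inside a job, the real token would be requested from the URL in `ACTIONS_ID_TOKEN_REQUEST_URL`).

```shell
# jwt_payload TOKEN
# Prints the decoded JSON payload (second dot-separated segment) of a JWT.
# No signature verification is performed; this is for inspection only.
jwt_payload() (
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 --decode
)

# Demo helper: base64url-encode stdin (used only to fabricate a sample token)
b64url() { base64 | tr -d '=\n' | tr '/+' '_-'; }

claims='{"job_workflow_ref":"OrgA/.github/.github/workflows/platform-ci-build.yml@refs/heads/main","repository":"OrgA/my-app"}'
token="$(printf '%s' '{"alg":"RS256"}' | b64url).$(printf '%s' "$claims" | b64url).fake-signature"

jwt_payload "$token"
echo
```

Comparing the printed `job_workflow_ref` with the role's bound_claims pattern usually explains a rejected login faster than reading Vault's error message.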

Vault Role JSON Configuration

{
  "role_type": "jwt",
  "bound_audiences": ["https://vault.example.com"],
  "bound_claims_type": "glob",
  "bound_claims": {
    "job_workflow_ref": "OrgA/.github/.github/workflows/*@refs/heads/main"
  },
  "user_claim": "repository",
  "claim_mappings": {
    "repository":       "repository",
    "ref":              "ref",
    "workflow":         "workflow",
    "job_workflow_ref": "job_workflow_ref"
  },
  "token_policies":         ["platform-ci"],
  "token_ttl":              "5m",
  "token_max_ttl":          "10m",
  "token_type":             "batch",
  "token_no_default_policy": false
}

At 500+ repository scale, the security value of this design is especially significant:

500 business repositories, none storing any credentials
Even if any business repository is compromised, the attacker still cannot obtain platform credentials (job_workflow_ref won’t match)
Every modification to the platform workflow files must go through a code review on the main branch; @refs/heads/main is enforced at the Vault layer

Note: map-type parameters (bound_claims, claim_mappings) are awkward to pass to the Vault CLI as key=value pairs. The reliable approach is to pass the whole role as JSON on stdin via a heredoc:
vault write auth/jwt/role/platform-ci - <<'EOF'
{ ...full JSON... }
EOF

Three Key Properties of Batch Tokens

Non-renewable: vault token renew has no effect on batch tokens; they expire when the TTL is reached
Non-queryable: batch tokens are not persisted and have no accessors, so they never appear in vault list auth/token/accessors; the thousands of tokens issued daily across 500 repositories add nothing to the token store (the logins themselves are still recorded by Vault’s audit devices)
5-minute TTL: long enough to complete a single vault-action call, and short enough that a token leaked from a finished job has almost certainly expired

OIDC Issuer Differences Between GHE and github.com

github.com: https://token.actions.githubusercontent.com
GHE:        https://your-ghe-hostname/_services/token

Vault’s oidc_discovery_url must point to the correct issuer; otherwise Vault cannot fetch the correct JWKS endpoint to verify signatures:

# GHE environment
vault write auth/jwt/config \
  oidc_discovery_url="https://ghe.example.com/_services/token" \
  bound_issuer="https://ghe.example.com/_services/token"
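A small helper can keep the two issuer forms from being mixed up when the same bootstrap script targets both github.com and GHE. This is a sketch with hypothetical function names; the issuer URLs are the two listed above, and the discovery path is the standard OIDC `/.well-known/openid-configuration` suffix.

```shell
# oidc_issuer_for HOSTNAME
# Prints the Actions OIDC issuer URL for github.com or a GHE hostname.
oidc_issuer_for() {
  case "$1" in
    github.com) printf '%s\n' "https://token.actions.githubusercontent.com" ;;
    *)          printf '%s\n' "https://$1/_services/token" ;;
  esac
}

# oidc_discovery_doc_for HOSTNAME
# Prints the URL of the discovery document that advertises the JWKS endpoint.
oidc_discovery_doc_for() {
  printf '%s/.well-known/openid-configuration\n' "$(oidc_issuer_for "$1")"
}

oidc_issuer_for github.com
oidc_issuer_for ghe.example.com
oidc_discovery_doc_for ghe.example.com
```

Fetching the discovery document with curl from the Vault host is a quick connectivity check before writing auth/jwt/config.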

`permissions: id-token: write` Must Be Declared at the Job Level of Each Sub-Workflow

The permissions declared in the orchestrator platform-ci-core.yml are not automatically propagated to reusable workflows called via uses:. Each job that needs to obtain an OIDC token must declare it independently:

# platform-ci-build.yml
jobs:
  build:
    runs-on: [self-hosted, linux]
    permissions:
      id-token: write    # must be declared here, cannot rely on the orchestrator
      contents: read
    steps:
      - uses: hashicorp/vault-action@v3
        with:
          url: ${{ vars.VAULT_URL }}
          namespace: ${{ vars.VAULT_NAMESPACE }}
          method: jwt
          role: ${{ vars.VAULT_ROLE }}
          jwtGithubAudience: ${{ vars.VAULT_AUDIENCE }}
          secrets: |
            secret/data/platform/${{ vars.VAULT_ENV }}/registry username | REGISTRY_USER ;
            secret/data/platform/${{ vars.VAULT_ENV }}/registry password | REGISTRY_PASS

Part 4: Multi-Environment Routing — The Org Variables Approach

Design Motivation

The traditional approach embeds if/else environment checks directly in the workflow, tightly coupling the workflow files to environments. At 500+ repository scale, any environment configuration change requires modifying the platform workflow files and re-testing.

The platform team adopted an Organization Variable injection approach: when each Org is created, a platform script writes environment-specific variables once. The workflow code contains absolutely no environment branching logic, and uses exactly the same code across all three environments (dev/stg/prod).

The Six Org Variables

Variable	OrgA-Dev	OrgA-Stg	OrgA (prod)
`VAULT_URL`	`https://vault.example.com`	same	same
`VAULT_NAMESPACE`	`platform/dev`	`platform/stg`	`platform/prod`
`VAULT_ROLE`	`platform-ci`	same	same
`VAULT_AUDIENCE`	`https://vault.example.com`	same	same
`VAULT_ENV`	`dev`	`stg`	`prod`
`API_URL`	`https://api.platform-dev.example.com`	`https://api.platform-stg.example.com`	`https://api.platform.example.com`

In the secrets: field of vault-action, ${{ vars.VAULT_ENV }} is automatically interpolated:

secrets: |
  secret/data/platform/${{ vars.VAULT_ENV }}/registry username | REGISTRY_USER

In OrgA-Dev → secret/data/platform/dev/registry
In OrgA → secret/data/platform/prod/registry

Not a single line of workflow code changes; all three environments (covering 500+ repositories) route automatically.
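The table can be expressed as a single function of the environment name, which makes the routing easy to check before anything is written to an Org. A minimal sketch, assuming hypothetical helper names `org_vars_for` and `registry_secret_path`; the values mirror the table, including the prod API hostname that carries no environment suffix.

```shell
# org_vars_for ENV
# Prints the six Organization Variables for dev, stg, or prod as KEY=VALUE lines.
org_vars_for() (
  target_env=$1
  case "$target_env" in
    prod) api_url="https://api.platform.example.com" ;;
    *)    api_url="https://api.platform-${target_env}.example.com" ;;
  esac
  cat <<EOF
VAULT_URL=https://vault.example.com
VAULT_NAMESPACE=platform/${target_env}
VAULT_ROLE=platform-ci
VAULT_AUDIENCE=https://vault.example.com
VAULT_ENV=${target_env}
API_URL=${api_url}
EOF
)

# registry_secret_path ENV
# Prints the KV path that vault-action resolves after VAULT_ENV is interpolated.
registry_secret_path() {
  printf 'secret/data/platform/%s/registry\n' "$1"
}

org_vars_for dev
registry_secret_path prod
```

Only three variables differ between environments (VAULT_NAMESPACE, VAULT_ENV, API_URL), which is what keeps the workflow files free of branching.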

The Idempotent `apply-org-variables.sh` Implementation

#!/usr/bin/env bash
# Idempotent write of Organization Variables
# Already exists → PATCH (update), doesn't exist → POST (create)

set -euo pipefail

apply_org_vars() {
  local ORG=$1
  local ENV=$2
  local KEY VALUE API_URL

  # Prod has no environment suffix in its API hostname (see the table above)
  if [ "${ENV}" = "prod" ]; then
    API_URL="https://api.platform.example.com"
  else
    API_URL="https://api.platform-${ENV}.example.com"
  fi

  declare -A VARS=(
    ["VAULT_URL"]="https://vault.example.com"
    ["VAULT_NAMESPACE"]="platform/${ENV}"
    ["VAULT_ROLE"]="platform-ci"
    ["VAULT_AUDIENCE"]="https://vault.example.com"
    ["VAULT_ENV"]="${ENV}"
    ["API_URL"]="${API_URL}"
  )

  for KEY in "${!VARS[@]}"; do
    VALUE="${VARS[$KEY]}"

    # gh api exits non-zero when the variable does not exist (HTTP 404), so
    # test it directly in the if; capturing the status line in a pipeline
    # would abort the script under set -e / pipefail before the POST branch.
    if gh api "orgs/${ORG}/actions/variables/${KEY}" --silent >/dev/null 2>&1; then
      gh api --method PATCH "orgs/${ORG}/actions/variables/${KEY}" \
        -f value="${VALUE}" -f visibility="all" --silent
      echo "[UPDATE] ${ORG} / ${KEY}=${VALUE}"
    else
      gh api --method POST "orgs/${ORG}/actions/variables" \
        -f name="${KEY}" -f value="${VALUE}" -f visibility="all" --silent
      echo "[CREATE] ${ORG} / ${KEY}=${VALUE}"
    fi
  done
}

apply_org_vars "OrgA-Dev" "dev"
apply_org_vars "OrgA-Stg" "stg"
apply_org_vars "OrgA"     "prod"

Part 5: Engineering Details of Container Builds

Multi-Registry Push Design

image_url_list is a comma-separated list of image URLs, already lowercased by the config job:

internet type:
  internet.platform-registry.example.com/orga/my-app
  + public.platform-registry.example.com/orga/my-app

private type:
  internal-private.platform-registry.example.com/orga/my-app (only this one)
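The convention is that the first entry is the primary registry and everything after it is a copy target. A minimal sketch of that split using only parameter expansion; `primary_url` and `extra_urls` are hypothetical helper names.

```shell
# primary_url LIST
# Prints the first entry of a comma-separated URL list.
primary_url() {
  printf '%s\n' "${1%%,*}"
}

# extra_urls LIST
# Prints every entry after the first, one per line (nothing for a single entry).
extra_urls() {
  case "$1" in
    *,*) printf '%s\n' "${1#*,}" | tr ',' '\n' ;;
  esac
}

list="internet.platform-registry.example.com/orga/my-app,public.platform-registry.example.com/orga/my-app"
echo "primary: $(primary_url "$list")"
extra_urls "$list" | while read -r url; do
  echo "copy to: $url"
done
```

A private image yields a single-entry list, so the copy loop simply does not run.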

Cross-Registry Copying with `docker buildx imagetools create`

After the build completes, imagetools create copies the image from the primary registry to each additional registry, registry to registry, with no docker pull, re-tag, and push cycle on the runner and nothing loaded into the runner’s local image store. At the scale of 500+ repositories with high-frequency builds, the cumulative savings from this optimization are significant:

# Split the URL list first: the primary registry is the first entry
IFS=',' read -ra ALL_URLS <<< "${IMAGE_URL_LIST}"
PRIMARY_URL="${ALL_URLS[0]}"

# Build and push to the primary registry
docker buildx build \
  --platform linux/amd64 \
  --tag "${PRIMARY_URL}:${TAG}" \
  --push \
  --file "${DOCKERFILE}" .

# Cross-registry copy (registry to registry, no local pull on the runner)

for EXTRA_URL in "${ALL_URLS[@]}"; do
  [ "${EXTRA_URL}" = "${PRIMARY_URL}" ] && continue

  EXTRA_REGISTRY=$(echo "${EXTRA_URL}" | cut -d'/' -f1)
  # Anchor the match so internal-public.* is not mistaken for the public registry
  if echo "${EXTRA_REGISTRY}" | grep -q '^public\.platform'; then
    echo "${PUBLIC_REGISTRY_PASS}" | docker login "${EXTRA_REGISTRY}" \
      -u "${PUBLIC_REGISTRY_USER}" --password-stdin
  else
    echo "${REGISTRY_PASS}" | docker login "${EXTRA_REGISTRY}" \
      -u "${REGISTRY_USER}" --password-stdin
  fi

  docker buildx imagetools create --tag "${EXTRA_URL}:${TAG}" "${PRIMARY_URL}:${TAG}"
  echo "Pushed ${EXTRA_URL}:${TAG}"
done

Image Signing (Signify)

The platform uses an internal Signify service for image signing with mTLS client certificate authentication. All tags across all registries require signing:

IFS=',' read -ra ALL_URLS <<< "${IMAGE_URL_LIST}"
IFS=',' read -ra ALL_TAGS <<< "${TAG_LIST}"

for URL in "${ALL_URLS[@]}"; do
  for TAG in "${ALL_TAGS[@]}"; do
    IMAGE="${URL}:${TAG}"
    DIGEST=$(docker buildx imagetools inspect "${IMAGE}" \
      --format '{{json .Manifest}}' | jq -r '.digest' | sed 's/^sha256://')
    MANIFEST=$(docker manifest inspect "${IMAGE}" 2>/dev/null)
    BYTE_SIZE=$(echo "${MANIFEST}" | jq -r '.config.size // 0')

    GUN=$(echo "${IMAGE}" | rev | cut -d':' -f2- | rev)
    PAYLOAD="{\"trustedCollections\":[{\"gun\":\"${GUN}\",\"targets\":[{\"name\":\"${TAG}\",\"digest\":\"${DIGEST}\",\"byteSize\":${BYTE_SIZE}}]}]}"

    curl -sf -X POST \
      --cert "${CERT_FILE}" \
      --key  "${KEY_FILE}"  \
      --pass "${KEY_PASS}"  \
      "${SIGNIFY_ENDPOINT}/trusted-collections/publish" \
      -H "Content-Type: application/json" \
      -d "${PAYLOAD}" || echo "Warning: signing failed for ${IMAGE}, continuing"
  done
done
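The payload assembly is the part of the signing loop most worth testing in isolation, since a malformed body only shows up as a failed request. The sketch below builds the same JSON shape as the loop above from an image reference; `signing_payload` is a hypothetical helper, the digest and size in the demo are made-up values, and the inputs are assumed to contain no characters that need JSON escaping.

```shell
# signing_payload IMAGE DIGEST BYTE_SIZE
# Prints the request body for one image tag, in the shape used by the loop above.
signing_payload() (
  image=$1
  digest=$2
  byte_size=$3
  gun=${image%:*}    # everything before the last colon (repository)
  tag=${image##*:}   # everything after the last colon (tag)
  printf '{"trustedCollections":[{"gun":"%s","targets":[{"name":"%s","digest":"%s","byteSize":%s}]}]}\n' \
    "$gun" "$tag" "$digest" "$byte_size"
)

signing_payload "internet.platform-registry.example.com/orga/my-app:1.4.2" "0123abcd" 1469
```

Splitting on the last colon keeps registries with an explicit port (for example `registry.example.com:5000/orga/my-app:1.4.2`) intact in the gun field.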

Compatibility Issues with `upload-artifact@v4` on GHE

Certain versions of GitHub Enterprise Server do not support the new API used by actions/upload-artifact@v4:

Error: GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES.

At 500+ repository scale, the blast radius of this kind of compatibility issue is total. On affected GHES versions you must stay on v3 (v3 is deprecated on github.com but remains the supported line for GHES):

- uses: actions/upload-artifact@v3
  with:
    name: lint-report
    path: reports/
- uses: actions/download-artifact@v3
  with:
    name: lint-report

Part 6: Per-Repo Secrets — Vault Enterprise Secrets Sync

The Division of Two Credential Types

Credential Type	How It’s Obtained	Example	Security Level
Platform shared credentials	Fetched from Vault at runtime via JWT/OIDC	Registry credentials, signing certificates	High (cross-Org permissions)
Business repository-specific credentials	Pushed as GitHub Secrets by Vault Secrets Sync	Database connection strings, business API keys	Medium (single-repository permissions)

At 500+ repository scale, per-repo Secrets management requires automation — you cannot manually configure 500 repositories. Vault Enterprise Secrets Sync provides this capability.

Vault Enterprise Secrets Sync Configuration

# 1. Create a GitHub sync destination (destination type gh; Fine-Grained PAT, only needs secrets:write)
vault write sys/sync/destinations/gh/my-app-prod \
  access_token="github_pat_xxxx" \
  repository_owner="OrgA" \
  repository_name="my-app" \
  secret_name_template="{{.SecretKey | uppercase}}"

# 2. Create an Association: KV path → GitHub Actions Secret
vault write sys/sync/destinations/gh/my-app-prod/associations/set \
  mount="secret" \
  secret_name="apps/my-app/prod/db"

How Automatic Rotation Sync Works

Vault KV secret updated (manually or via dynamic credentials)
      │
      ▼
Vault Sync Engine (background polling, ~5 minute interval)
      │
      ▼
GitHub API: PUT /repos/OrgA/my-app/actions/secrets/DB_PASSWORD
      │
      ▼
GitHub Secret automatically updated (takes effect on next workflow run)

At 500+ repository scale, credential rotation no longer requires notifying each repository owner — the Vault Sync Engine automatically handles the push. The platform team only manages the KV in Vault, and business repository Secrets are automatically synced.

Part 7: Observability at 500+ Scale

When 500+ repositories are running CI simultaneously, observability is not a “nice to have” — it is foundational infrastructure for platform operations.

Runner Capacity Monitoring

# Record runner info at the start of each job
- name: Log runner info
  run: |
    echo "Runner: ${{ runner.name }}"
    echo "OS: ${{ runner.os }}"
    echo "Arch: ${{ runner.arch }}"
    echo "Repo: ${{ github.repository }}"
    echo "Event: ${{ github.event_name }}"
    echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"

CI Health Dashboard

The platform team needs to be able to answer:

Over the past 7 days, which 20 repositories had the highest CI failure rates?
Average CI duration trend (is there performance regression)?
Security scan coverage (which repositories haven’t had a CI run in over 30 days)?

This data can be collected via the GitHub API or through custom telemetry emitted during CI runs.
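For the first question, the aggregation itself is a few lines of awk once the run data is flattened. The sketch below assumes the runs have already been exported as one `repo conclusion` pair per line (how they are exported, for example from the workflow runs REST endpoint, is left open); `failure_rates` is a hypothetical helper name and the sample data is invented.

```shell
# failure_rates
# Reads "repo conclusion" lines on stdin and prints "repo failure-percentage",
# highest failure rate first.
failure_rates() {
  LC_ALL=C awk '
    { total[$1]++ }
    $2 == "failure" { failed[$1]++ }
    END {
      for (repo in total)
        printf "%s %.1f\n", repo, 100 * failed[repo] / total[repo]
    }
  ' | LC_ALL=C sort -k2,2nr -k1,1
}

# Invented sample: five runs across three repositories
failure_rates <<'EOF' | head -n 20
my-app success
my-app failure
billing failure
billing failure
docs success
EOF
```

Piping the result through `head -n 20` gives the "top 20 by failure rate" view directly.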

Bulk Compliance Check Script

#!/usr/bin/env bash
# Check CI onboarding status for all repositories
# Usage: set ORG to the organization name before running
: "${ORG:?set ORG to the organization name}"

echo "Checking all repositories under ${ORG}..."
gh repo list "${ORG}" --limit 1000 --json name \
  | jq -r '.[].name' \
  | while read -r repo; do
      # Check for .ci-config/config.yaml
      if ! gh api "repos/${ORG}/${repo}/contents/.ci-config/config.yaml" \
          --silent &>/dev/null; then
        echo "  [missing config] ${repo}"
        continue
      fi
      # Check if the platform workflow is being called. The contents API
      # returns the file base64-encoded, so decode it before grepping
      # (--silent is not used here because it would discard the body).
      if ! gh api "repos/${ORG}/${repo}/contents/.github/workflows/ci.yml" \
          --jq '.content' 2>/dev/null | base64 --decode 2>/dev/null \
          | grep -q 'platform-ci-core.yml'; then
        echo "  [not onboarded to platform CI] ${repo}"
      fi
    done

Part 8: Lessons Learned

1. Two Ways to Handle Null Values in `yq`

yq outputs the string null by default when a field doesn’t exist, causing downstream jobs to receive the literal string "null". Across 500+ repositories with significant variation in config.yaml writing styles, this type of issue comes up frequently:

# Option A: yq inline default value (recommended for simple scalar fields)
RCFILE=$(yq '.jobs[].pyLint.rcFile // ""' config.yaml)

# Option B: shell filtering (handles all null output scenarios)
RCFILE=$(yq '.jobs[].pyLint.rcFile' config.yaml | grep -v '^null$' || echo "")
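Option B can be wrapped once and reused, which also removes the risk of forgetting the `|| echo ""` fallback. A minimal sketch; `drop_null` is a hypothetical helper name, and the input is simulated with printf so it runs without yq.

```shell
# drop_null
# Passes stdin through, dropping lines that are exactly "null".
# Always succeeds, even when every line is dropped.
drop_null() {
  grep -v '^null$' || true
}

# Simulated yq output: a missing field, then a real value
printf 'null\n'                | drop_null
printf 'null\n.pylintrc\n'     | drop_null
printf 'nullable-rules.cfg\n'  | drop_null
```

The pattern is anchored on both ends, so a legitimate value that merely starts with "null" is kept.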

2. Multi-Line Values in `$GITHUB_OUTPUT` Must Use Heredoc

# Wrong: a value containing newlines breaks the single-line key=value format
echo "changelog=${MULTI_LINE_TEXT}" >> "$GITHUB_OUTPUT"

# Correct: heredoc format
{
  echo "changelog<<EOF"
  echo "${MULTI_LINE_TEXT}"
  echo "EOF"
} >> "$GITHUB_OUTPUT"
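One weakness of a fixed `EOF` delimiter is that the value itself may contain a line reading `EOF`, which ends the block early. The sketch below generates a per-call delimiter and refuses to write on a collision; `write_multiline_output` is a hypothetical helper, and the target file is passed explicitly so the function can be tried outside a runner.

```shell
# write_multiline_output NAME VALUE FILE
# Appends NAME with a multi-line VALUE to FILE in the name<<DELIM format.
write_multiline_output() (
  name=$1
  value=$2
  file=$3
  delim="EOF_$$_$(date +%s)"
  # Refuse a value that already contains the delimiter as a whole line
  if printf '%s\n' "$value" | grep -qxF "$delim"; then
    echo "delimiter collision while writing ${name}" >&2
    exit 1
  fi
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$file"
)

# Local demonstration against a temporary file instead of $GITHUB_OUTPUT
demo_file=$(mktemp)
write_multiline_output changelog "$(printf 'fix: registry login\nEOF\nfeat: signing')" "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

In a workflow step the third argument would be "$GITHUB_OUTPUT".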

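3. `with:` Values Fed from Job Outputs Arrive as Strings

Reusable workflow inputs can be declared as string, number, or boolean, but job outputs are always strings. A value passed straight from needs.config.outputs.* therefore has to be declared as string (or converted at the call site) and compared as a string:

inputs:
  has_unit_test:
    type: string   # fed from a job output, so it arrives as 'true' or 'false'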

steps:
  - name: Run unit tests
    if: inputs.has_unit_test == 'true'   # string comparison
    run: pytest

4. The Security Trap of `pull_request_target` + checkout

pull_request_target runs in the base repository’s context, but actions/checkout defaults to checking out the base branch, not the PR’s code.

You must explicitly specify the PR head SHA:

- uses: actions/checkout@v4
  with:
    ref: ${{ github.event.pull_request.head.sha }}

Security note: the code being checked out belongs to the fork. The Secrets access logic must be isolated in a separate job from the code checkout to prevent malicious scripts in the fork from reading Secrets.

5. Correct Syntax for `needs.config.outputs` in `if:` Conditions

build:
  needs: config
  # Correct: compare explicitly against the empty string
  if: ${{ needs.config.outputs.dockerfile != '' }}

  # Fragile: outputs are strings, and any non-empty string is truthy,
  # including 'false' and 'null'
  # if: needs.config.outputs.dockerfile

6. Runner Queue Pressure During 500+ Concurrent Triggers

During the morning commit rush, the runner queue can back up with hundreds of jobs. You need to monitor queue_time (the time from when a job enters the queue to when it starts running) and adjust runner count accordingly. Queue waits exceeding 5 minutes significantly degrade the developer experience.
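Queue time can be derived from two timestamps per job. The sketch below assumes each job record exposes a creation time and a start time in ISO 8601 form (the field names depend on where the data comes from) and that GNU date is available for `-d` parsing; `queue_seconds` and `queue_is_slow` are hypothetical helpers, and the 300-second threshold is the 5-minute figure mentioned above.

```shell
# queue_seconds CREATED_AT STARTED_AT
# Prints the seconds a job spent waiting for a runner. Requires GNU date.
queue_seconds() (
  created=$(date -u -d "$1" +%s)
  started=$(date -u -d "$2" +%s)
  echo $(( started - created ))
)

# queue_is_slow SECONDS
# Returns 0 when the wait exceeds the 5-minute threshold.
queue_is_slow() {
  [ "$1" -gt 300 ]
}

wait_s=$(queue_seconds "2026-06-21T09:00:00Z" "2026-06-21T09:06:30Z")
if queue_is_slow "$wait_s"; then
  echo "queued ${wait_s}s: over threshold"
else
  echo "queued ${wait_s}s: ok"
fi
```

Tracking the share of jobs over the threshold per hour tends to be a more useful scaling signal than the average wait.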

Summary

At 500+ repository scale, the core value of GitHub Actions Reusable Workflows:

Working around static structure constraints: the config job acts as a dynamic configuration middleware layer, replacing Jenkins Groovy’s runtime merge capability
JWT/OIDC zero long-lived credentials: none of the 500 repositories stores any credentials; batch tokens are non-queryable, maximizing the security boundary
Multi-environment zero-code routing: Organization Variable injection enables three environments to use exactly the same workflow code
Multi-registry container builds: imagetools create copies at the manifest layer, saving bandwidth across thousands of daily builds from 500+ repositories
Layered credential management: platform shared credentials use OIDC; business-specific credentials use Secrets Sync; credential lifecycle for 500 repositories is fully automated

Each business repository ultimately only needs to maintain a 15-line ci.yml and a .ci-config/config.yaml — this “minimal onboarding model” is identical for the 1st repository and the 500th, which is precisely the design goal of a scalable platform.
