How I'm Using AI to Speed Up My CI/CD Pipelines
A codebase outgrows the people who built it. That's the part nobody likes to say out loud. You start a project, you know every integration and every abstraction cold, and then a year goes by and even you have lost the map of where things connect. So when a change lands and someone asks "are we sure this is safe to ship," the honest answer is nobody fully knows. And the way teams buy back that certainty is to run the whole end-to-end suite, every time, because a full pass feels safe even when the change didn't need most of it.
That feeling is expensive. In CI/CD, time is money in a pretty literal way: a pipeline that takes hours, multiplied by every engineer pushing every day, is real money and real waiting. If you can cut that time and trust the cut, it's worth chasing. The catch is the trust. You can't shorten a pipeline by guessing. So I put AI on the one question that's hard to answer by hand: does this change actually need that test?
Where the time goes
On my pipelines the end-to-end tests are the long pole, by a wide margin. The unit tests run fast and they confirm the small pieces fit the way I expected; that layer was never the bottleneck. The end-to-end suite is. Some of those tests stand up real flows, and on an older project a full vetted run can take hours once you fold in things like DNS propagation, never mind the high-availability and disaster-recovery checks, which are their own world.
I already keep every end-to-end test in the same repo as the code it covers, so a pull request can target a subset instead of the whole catalog. That worked when the list was short, but the list isn't short anymore, and picking the right subset by hand stopped being something I'd trust myself to do on a tired Friday.
The selector, in one script
So the picking comes down to one script the pipeline runs before the tests. It reads the diff, hands it to a model along with a map of which code areas feed which suites, and prints the list of suites to run. Trimmed to the parts that matter:
#Requires -Version 7
param(
    [Parameter(Mandatory)] [string] $BaseRef, # e.g. origin/main
    [Parameter(Mandatory)] [string] $MapPath  # the markdown test map
)
Set-StrictMode -Version Latest
$ErrorActionPreference = 'Stop'
$map = Get-Content -Raw $MapPath
# @() keeps each list an array even when the map holds a single entry.
$allSuites = @(Get-SuiteCatalog $map) # the full list, parsed out of the map
$pinned = @(Get-AlwaysRunList $map) # the "always run" entries from the map
function Set-Output($name, $value) { "$name=$value" | Out-File $env:GITHUB_OUTPUT -Append }
# The one safe move when anything looks off: run everything.
function Use-FullSuite($why) {
    Write-Host "Running the full suite: $why"
    Set-Output 'suites' ($allSuites | ConvertTo-Json -Compress -AsArray)
    exit 0
}
$changed = git diff --name-only "$BaseRef...HEAD"
if (-not $changed) { Use-FullSuite 'no diff to read' }
$prompt = @"
Map of code areas to end-to-end suites:
$map
Files changed in this pull request:
$($changed -join "`n")
Return ONLY a JSON array of suite names from the map that this change could
affect. If you are unsure about any of it, return every suite.
"@
try { $picked = (Invoke-SelectionModel $prompt) | ConvertFrom-Json }
catch { Use-FullSuite "model or JSON parse failed: $($_.Exception.Message)" }
# Trust, then verify: drop anything not in the map, then force the pinned suites on.
$valid = $picked | Where-Object { $_ -in $allSuites }
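# Wrap both sides in @() first: a lone picked suite is a plain string, and string + array would concatenate text.
$final = @(@($valid) + @($pinned) | Select-Object -Unique)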
if (-not $final) { Use-FullSuite 'nothing usable came back' }
Set-Output 'suites' ($final | ConvertTo-Json -Compress -AsArray)
Write-Host "Picked $($final.Count) of $($allSuites.Count): $($final -join ', ')"
I left the model call abstract on purpose. Invoke-SelectionModel just sends the prompt to whatever model you already pay for and hands back its text; point it wherever you like. The interesting part is everything around it. The diff comes from git, so the model can't be wrong about what changed. Anything it returns gets checked against the real map, so a made-up suite name is dropped instead of crashing the run. The pinned suites get welded back on after the model has had its say. And every failure I could think of (a bad diff, a model timeout, garbage JSON) routes to the same place, which is running everything.
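For a sense of where the suites output goes next, here is one plausible way to wire it into a GitHub Actions workflow. Treat it as a sketch under assumptions: the job names, the script paths (ci/Select-Suites.ps1, ci/Run-Suite.ps1), and the map location are hypothetical. Only the suites output and the BaseRef and MapPath parameters come from the script above.

```yaml
# Hypothetical wiring; names and paths are placeholders, not the real pipeline.
on: pull_request

jobs:
  select:
    runs-on: ubuntu-latest
    outputs:
      suites: ${{ steps.pick.outputs.suites }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so the three-dot diff can find the merge base
      - id: pick
        shell: pwsh
        run: ./ci/Select-Suites.ps1 -BaseRef "origin/${{ github.base_ref }}" -MapPath ./docs/test-map.md

  e2e:
    needs: select
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # The selector wrote a compact JSON array; fromJSON turns it into one job per suite.
        suite: ${{ fromJSON(needs.select.outputs.suites) }}
    steps:
      - uses: actions/checkout@v4
      - shell: pwsh
        run: ./ci/Run-Suite.ps1 -Name "${{ matrix.suite }}"
```

Because the selector always emits an array (the full catalog on any failure), the matrix never ends up empty, and the fallback path needs no special handling in the workflow.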
The part that's actually hard, and the bet
The script is the easy half. The hard half is the map it leans on: knowing that when service X changes, the controllers behind test Y are in the line of fire. Living in a codebase you carry that in your head; writing it down where the model can use it is the actual work, and it's the spot where this can quietly go wrong.
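To make "writing it down" concrete, here is a minimal sketch of one possible map layout and the two helpers the selector calls. The layout is an assumption for illustration, not the map this pipeline actually uses: an "Always run" section of bullets, and an "Areas" section where each bullet reads "path -> suite, suite". Get-MapSection is a helper introduced here, and the bodies of Get-SuiteCatalog and Get-AlwaysRunList are stand-ins, since the article shows only their names.

```powershell
# Hypothetical map layout; every path and suite name below is made up.
$sampleMap = @'
## Always run
- smoke-login

## Areas
- src/billing/ -> checkout-flow, invoices
- src/dns/ -> dns-propagation
'@

# Emit the bullet text found under "## <Heading>", stopping at the next "## " heading.
function Get-MapSection([string] $Map, [string] $Heading) {
    $inside = $false
    foreach ($line in ($Map -split '\r?\n')) {
        if ($line -match '^##\s+(.+?)\s*$') {
            $inside = ($Matches[1] -eq $Heading)
            continue
        }
        if ($inside -and $line -match '^\s*-\s+(.+?)\s*$') { $Matches[1] }
    }
}

# The suites that run no matter what the model says.
function Get-AlwaysRunList([string] $Map) {
    @(Get-MapSection $Map 'Always run')
}

# Every suite named to the right of "->" in Areas, plus the pinned ones, de-duplicated.
function Get-SuiteCatalog([string] $Map) {
    $mapped = foreach ($entry in (Get-MapSection $Map 'Areas')) {
        # Bullets without "->" are skipped rather than treated as errors.
        if ($entry -match '->\s*(.+)$') {
            $Matches[1] -split ',' | ForEach-Object { $_.Trim() }
        }
    }
    @(@($mapped) + @(Get-AlwaysRunList $Map) | Where-Object { $_ } | Sort-Object -Unique)
}

Get-AlwaysRunList $sampleMap   # smoke-login
Get-SuiteCatalog $sampleMap    # checkout-flow, dns-propagation, invoices, smoke-login
```

The parsing is the trivial part. The value is in the Areas bullets themselves, which is where the knowledge in your head has to land in a form both the model and the validation step can read.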
So the bet stays lopsided on purpose. Running a test the change didn't need costs a few minutes of compute. Skipping one that would have caught a break costs you a bug behind a green check, found a week later. The AI only earns the right to skip on the changes it can scope with confidence, and everything else runs the lot.

