How the Best Teams Build Confidence
Turning Releases From Heroics Into Routine Progress
Why confidence, not speed, is the real differentiator
High-performing teams focus on building confidence in their releases rather than just shipping faster.
How a Path to Production encodes culture
Making safety, speed, and compliance unavoidable through systematic engineering practices.
What leaders should look for before shipping
Clear criteria from CEO to Legal that changes must meet before earning the right to ship.
Path to Production as a System, Not a Phase
A good pipeline is not a tool. It's a contract: an agreed, shared way for changes to move from idea to customers.
It doesn't replace culture; it encodes it. When the pipeline reflects a shared set of principles, leaders don't have to micromanage behaviour: the system guides it.
That's the point: the pipeline is where culture meets engineering. If your culture values safety, speed, collaboration, and autonomy, the pipeline makes those values unavoidable in day-to-day work.
And when it's designed well, it gives everyone, from engineers to executives, the thing they need most before
releasing a change: confidence.
What Matters More Than Speed
Most companies talk about shipping faster. The best companies, in our experience, talk about shipping with confidence.
Without confidence:
- Teams slow down, adding meetings and approvals
- People hesitate, guessing at side effects
- Releases become events, not habits

Confidence isn't a feeling. It's the product of a delivery system that proves a change is ready in all the ways your organization needs. That system is the Path to Production.
Confidence Across Leadership Roles
As a CEO
Confidence that change won't hurt the business or the brand.
- Check that new features work as promised for customers
- Check that performance holds up under real demand
- Check that delivery protects reputation instead of risking it
As a CISO
Confidence that every release meets security and compliance by default.
- Check that regulatory requirements are met every time
- Check that security controls are enforced consistently
- Check that nothing bypasses compliance for the sake of speed
As a COO
Confidence that operations run efficiently at scale.
- Check that delivery doesn't waste time or people
- Check that costs stay under control as releases accelerate
- Check that resources are optimised, not burned on rework
As a CTO
Confidence that the system enables innovation, not overhead.
- Check that teams can adopt new tech without breaking delivery
- Check that feedback loops support faster iteration
- Check that engineering effort goes into features, not waiting
As Legal / Compliance
Confidence that compliance is provable, not just promised.
- Check that every release leaves a clear audit trail
- Check that evidence of rule-following is visible and accessible
- Check that accountability is built into the process, not added after
A Learning System To Stop Regression
And here is the important reality: even with the best systems, mistakes will happen. Unless you're willing to spend mission-critical budgets chasing near-zero failure, it is a matter of when, not if.
That is why the best teams also use their path to production as a learning system. They don't just aim to stop mistakes from ever reaching customers; they build the ability to never make the same mistake twice.
The learnings from dealing with every incident or bug become encoded into the pipeline as an additional guardrail, making the system more resilient over time. This is how you remove blind spots: your experience becomes part of the infrastructure, with clear visibility for all.
What Is a Path to Production?
It is the road your code travels, from a developer's laptop to a live customer environment. It passes through a series of quality gates, each one answering the question:
Can we trust this change to move forward?
In technical terms, these are automated stages: build, test, package, deploy, validate, and promote. In leadership terms, it's a risk management system that turns "I think this is fine" into "Ready to Ship."

Golden Path vs Path to Production: Much of the industry talks about the Golden
Path in platform engineering. While it overlaps with our idea of the Path to Production, the focus is often on
pipelines and tooling. For us, that misses the deeper point: the real leverage comes from shared agreements between
stakeholders. In our experience, solving this problem starts in the right order: with the conversation about what to
do, not how.
The Stages of Confidence
We've seen this work time and again, in teams large and small, across industries. When you make it clear from the start that this is not a tooling conversation, but an agreement on how changes earn the right to ship, alignment becomes simpler.
01. Agree on principles first
It's easier to agree on principles than on tools. Once those principles are agreed, they become the compass for every decision.
02. Include business stakeholders
Discussing principles allows the business to be part of the conversation. This is essential for cost-first quality decisions.
03. Build shared understanding
With that shared understanding, the path to production works — because the agreement matters more than the exact tools or steps.
Those tools will evolve. What doesn't change is the value of committing to a common set of standards. That's the first big step.
Stage 1: The Build
The path to production begins with the Build step, which creates a Deployable, Immutable, Versioned Artifact (DIVA): a sealed package of the application, stamped with a unique serial number. Like a vehicle identification number (VIN), it tells you exactly where it came from, what's in it, and when it was made.
Application Definition: An application here could be anything from a customer-facing website to a background service processing transactions. It evolves constantly with new features, bug fixes, and security patches, which makes knowing exactly what is running, and where, essential.
DIVA Encapsulation
A DIVA (Deployable, Immutable, Versioned Artifact) is more than compiled code — it's a fully encapsulated unit that carries everything a change needs to earn the right to ship:
- Deployable: Can be promoted through environments without modification
- Immutable: Once built, it never changes — the same thing tested is what runs in production
- Versioned: Every build is uniquely identifiable, reproducible, and traceable
- Artifact (Everything as Code): A single package that includes business logic, infrastructure requirements, configuration, tests, and monitoring
If a build can change after creation, or if different environments are running
slightly different versions, repeatability disappears. You lose the ability to
trace defects with precision, prove what was deployed for audit purposes, or guarantee that a fix tested in one
place will behave the same in another.
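To make the idea concrete, here is a minimal sketch, in Java, of the metadata a DIVA might carry. The field names are assumptions for illustration, not a prescribed schema; the point is that identity, provenance, and contents travel with the artifact itself.

```java
// A sketch only: field names are illustrative assumptions, not a prescribed schema.
import java.util.List;

public record DivaManifest(
        String artifactId,     // e.g. "payments-service"
        String version,        // unique, never reused: the artifact's "serial number"
        String gitCommit,      // the exact source revision it was built from
        String builtAt,        // ISO-8601 build timestamp
        String contentDigest,  // SHA-256 of the sealed package; any change breaks the seal
        List<String> contents  // business logic, infrastructure, configuration, tests, monitoring
) { }
```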
Pre-Seal Validation: Localized Tests & Security
Before an artifact is sealed, the pipeline executes localized tests within a pristine build environment. This mirrors a developer's local checks but guarantees no reliance on prior states. It's also where crucial security scanning begins, including static analysis and dependency checks, ensuring early detection of vulnerabilities.
This process ensures the artifact is a consistent, verifiable unit. Immutability guarantees that the package moving
forward is precisely the one that passed these local tests: unchanged, dependable, and ready for subsequent stages.
This early validation underpins reliability for integration, meaningfulness for further testing, and predictability for production changes. The integrity established here forms the foundation for confidence throughout the entire path to production.
The build environment also performs static code analysis (SAST) and open-source dependency checks (SCA). These act as early-warning systems, akin to detecting a hairline crack in an aircraft wing before takeoff rather than mid-flight. Identifying vulnerabilities here is significantly cheaper and faster to fix than later in the path to production.
People respect what you inspect.
Linting guides coding standards, balancing consistency with developer autonomy. Consolidated visibility over these initial builds is paramount for guiding subsequent development steps effectively.
Cultural Ties: Reliable Builds
When the build stage always works, it's usually because engineers have already run their tests locally before committing. This habit doesn't need a mandate; it emerges naturally, because it keeps the flow unbroken for everyone else, and that is how people genuinely go faster every day.

Versioning and Confidence: Robust versioning makes these guarantees durable. Every build can be traced, reproduced,
and rolled back if needed. That's how a change earns the right to ship: it proves not just that it works once, but
that it can be trusted across time, environments, and teams. Confidence that what you tested is exactly what you will run is the foundation of safe speed.
Stage 2: Fast Feedback
This stage of the pipeline exists to keep developers in flow. The goal is simple: give them realistic, real-world feedback on their changes in 15 minutes or less. Quick signals let developers fix issues immediately, instead of losing hours or days to context switching.
But fast feedback is about more than speed. It marks the shift from code delivery to service delivery. From this
point on, developers aren't just writing software: they're delivering a running, observable, reliable service.
Operational readiness becomes part of the developer's craft, not something bolted on later. All tests from here forward are expected to run on production-grade systems, using the same tooling that production itself relies on.
Substage 2.1: Fast Feedback with Stubbed Dependencies

The quicker a change shows it works, the less it costs to fix when it doesn't. Waiting until a service is surrounded by all its real dependencies is like discovering the foundations are crooked when the building is already half-finished.
The Fast Feedback with Stubbed Dependencies stage prevents that. The artifact from Stage 1 is deployed into a production-like environment and tested in isolation using stubs.
What are stubs? Stubs are lightweight, simulated versions of the external systems
a service depends on. Instead of waiting for every real system to be available, teams use stubs to mimic their
behaviour, returning predictable responses to specific requests. A widely adopted tool for this is WireMock, created
by Tom Akehurst, whose work has had a lasting impact on the industry.
Because these stubs are under our control, we can script them to be fully co-operative or intentionally problematic. This allows us to verify both the ideal and the difficult scenarios. In fact, there are many behaviours we can test with stubs that would be impossible or unsafe to test against a real dependency.
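As an illustration of what scripting a stub can look like, here is a minimal WireMock sketch in Java. The /accounts endpoints, port, and response bodies are hypothetical assumptions; only the WireMock calls themselves are real API.

```java
import com.github.tomakehurst.wiremock.WireMockServer;
import com.github.tomakehurst.wiremock.http.Fault;

import static com.github.tomakehurst.wiremock.client.WireMock.*;

public class AccountStubs {
    public static void main(String[] args) {
        WireMockServer stub = new WireMockServer(8089); // port is an arbitrary choice
        stub.start();

        // Co-operative behaviour: a predictable, happy-path response.
        stub.stubFor(get(urlEqualTo("/accounts/42"))
                .willReturn(okJson("{\"id\": 42, \"status\": \"ACTIVE\"}")));

        // Intentionally problematic behaviour: a slow dependency...
        stub.stubFor(get(urlEqualTo("/accounts/slow"))
                .willReturn(okJson("{\"id\": 7}").withFixedDelay(5_000)));

        // ...and an abrupt connection failure, unsafe to trigger against a real system.
        stub.stubFor(get(urlEqualTo("/accounts/broken"))
                .willReturn(aResponse().withFault(Fault.CONNECTION_RESET_BY_PEER)));
    }
}
```

Because the stub is just code, the difficult scenarios are as easy to reproduce as the happy path, and they can be re-run on every build.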
2.1.1. Stubbed Functional (Testing Behaviour)
Running the functional tests here gives teams autonomy. They can achieve high confidence in their service without depending on another team to prepare or maintain a shared environment, and without the risk of impacting another team with untested changes.
It's like a dress rehearsal: the service gets to run through its lines and cues in a controlled space before stepping onto the main stage with the rest of the cast.
Historical Challenge
Environments were heavyweight, long-lived, wasteful, costly, shared and slow to provision
Modern Solution
Lightweight, on-demand, self-service environments make it possible to deploy and test in minutes
This is also the first full functional run of the service on the same infrastructure setup as production, but without interference from other teams or systems. If this stage passes, it means we can trust the service's core functionality in isolation.
Cultural Ties: Stubbed Functional Testing
When deploying to this stage is automatic and the environment is ready, writing the functional tests becomes a natural part of a development team's "definition of done."
It can also be something the business chooses to make a requirement, with acceptance criteria executed as part of the pipeline itself.

Teams start thinking about verification while they're building, not afterwards: a habit that emerges naturally
because the pipeline helps drive that culture.
2.1.2. Stubbed Non-Functional (Performance)
Still in isolation, the service is tested for important non-functional qualities: how it performs under load, how it recovers from simulated failures, and whether it remains stable through intermittent disruption.
This is the difference between confirming a car starts, drives, and turns, and proving it can handle a sudden rainstorm or a steep hill.
Load Testing
Testing critical business journeys during fast feedback ensures key flows perform under expected peak demand.
Rolling Deployments
Version compatibility checks validate that one version can run alongside another during gradual rollouts.
Graceful Degradation
Checks that, under slow dependencies, the system fails elegantly rather than cascading errors.
Failure Simulations
Tests resilience to network latency or service discovery issues through chaos testing.
The economics of running performance tests have changed dramatically. With on-demand environments, we can spin up the exact scale we need for testing, run it for 15 minutes, and then tear it down, paying only for what we used.
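As a sketch of what a short, disposable load check against a critical journey could look like, the following uses only the JDK. The endpoint, request volume, thread count, and latency budget are assumptions for illustration, not recommendations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CheckoutLoadCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical critical journey exposed in the stubbed, production-like environment.
        URI checkout = URI.create("http://fast-feedback.internal/checkout/health");
        HttpClient client = HttpClient.newHttpClient();

        int requests = 500;                               // illustrative "expected peak" volume
        ExecutorService pool = Executors.newFixedThreadPool(50);
        List<Future<Long>> timings = new ArrayList<>();

        for (int i = 0; i < requests; i++) {
            timings.add(pool.submit(() -> {
                long start = System.nanoTime();
                client.send(HttpRequest.newBuilder(checkout).timeout(Duration.ofSeconds(5)).build(),
                        HttpResponse.BodyHandlers.discarding());
                return (System.nanoTime() - start) / 1_000_000; // elapsed millis
            }));
        }

        List<Long> millis = new ArrayList<>();
        for (Future<Long> f : timings) millis.add(f.get());
        pool.shutdown();

        Collections.sort(millis);
        long p95 = millis.get((int) (millis.size() * 0.95) - 1);
        System.out.println("p95 latency: " + p95 + " ms");
        if (p95 > 300) {                                  // assumed latency budget in milliseconds
            throw new IllegalStateException("p95 above budget: fail the stage");
        }
    }
}
```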
Cultural Ties: Stubbed Non-Functional Testing (Performance)
The power of having stubs, and why we strongly advocate for them, is that we don't have to wait for it to rain. We choose how much to pour, how long to keep it coming, and repeat it as often as we like.
It puts real-world testing in our control. Many teams face the opposite problem: waiting on other teams, systems, or conditions before they can gain high confidence in their applications.

With stubs, we can trigger precise failure modes and stress conditions that are too risky or rare to test in production, and we enter integration safer, having done our own homework first and proven our part works before trying it out with the group.
Substage 2.2: Integration Tests
After proving that a service works in isolation, the next step is to see how it behaves in the company of others. In this stage, all teams deploy their services into a shared, high-availability environment (Fast Integration).
The Fast Integration environment is treated like production: it must stay alive during working hours, and every team is responsible for keeping it healthy.
Non-disruptive deployments
New versions must come online without breaking what's already running
Production Grade Monitoring
Issues are detected the moment they appear
Clear response expectations
Teams fix problems immediately to keep everyone moving
Own your test data lifecycle
Each team manages the data their tests need, avoiding conflicts
Anyone can start a test at any time
The environment and services are always ready
Centralized Test Suites Run Alongside
Catching broader issues by focusing on critical business journeys
Why these rules matter: In traditional setups, shared environments often suffer
from "data chaos." Teams overwrite each other's test data, leave it in an inconsistent state, or depend on a
specific dataset being present, only to find it changed by someone else.
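One pattern that prevents this, offered here as a sketch rather than a prescription, is for each test run to create, namespace, and clean up its own data. The naming scheme and the setup and teardown steps below are assumptions for illustration.

```java
import java.util.UUID;

// Sketch: each run owns its data from creation to deletion, so no other team can collide with it.
public class OwnedTestData implements AutoCloseable {
    private final String accountId = "it-payments-" + UUID.randomUUID(); // namespaced to team and run

    OwnedTestData() {
        // Create the account this run needs, e.g. through the service's own public API.
    }

    String accountId() {
        return accountId;
    }

    @Override
    public void close() {
        // Delete the account so no later run depends on, or trips over, leftover state.
    }

    public static void main(String[] args) {
        try (OwnedTestData data = new OwnedTestData()) {
            System.out.println("Running integration checks against " + data.accountId());
        }
    }
}
```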
Monitoring and alerting have also historically been disconnected from the application itself, managed by separate teams, on separate lifecycles, and often only connected in production. Thanks to encapsulation provided by the build stage, the DIVA has everything required to operate the service, including monitoring and alerting. When deployed, the service and its operational instrumentation arrive together, tested together, and proven together.
Cultural Ties: Integration Tests
Treating fast integration like production builds habits that carry forward: designing for safe rollouts, fixing issues immediately, and owning the health of the code you commit.
It fosters respect between teams, in the sense that each team arrives having done their homework in isolation, and they contribute to keeping the shared environment running for everyone else.

This marks the shift from code delivery to service delivery. Developers aren't just responsible for software; they're delivering a running, observable, reliable service.
For the organization, this shift means earlier detection of integration issues, faster recovery from problems, and fewer production incidents, all without the bottlenecks and firefighting of traditional release cycles.
Quality Gates
Quality gates are checkpoints after certain stages have completed. They are not about running more tests but about recognizing whether a service has cleared all of the tests so far.
1. Build
- Localized tests and packaging of a test DIVA
- Quality Gate: Ready for Fast Feedback. Promoted to the Fast Feedback testing queue.
2. Fast Feedback
- 2.1. Stubbed Testing
- 2.1.1. Functional Testing
- 2.1.2. Non-Functional Testing
- 2.2. Integration Testing
- Quality Gate: Ready for Extended Testing. Promoted to the extended testing repository.
Reaching this point means the current version is ready for a new class of scrutiny: Extended Testing. That readiness
is signalled by promoting the version into a repository or queue dedicated to extended test environments.
In practical terms, this is the "definition of done" for deployed tests. Passing this gate means the service is
proven in isolation and in the company of others, with operational visibility in place.
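Expressed in code, a quality gate can be as simple as confirming that every earlier check has passed before promotion. The sketch below is illustrative only; the check names mirror the stages in this article rather than any particular tool.

```java
import java.util.EnumSet;
import java.util.Set;

/** A minimal sketch of a quality gate; names and checks are illustrative assumptions. */
public class QualityGate {
    enum Check { LOCALIZED_TESTS, STUBBED_FUNCTIONAL, STUBBED_NON_FUNCTIONAL, INTEGRATION_TESTS }

    /** A version is ready for extended testing only when every earlier check has passed. */
    static boolean readyForExtendedTesting(Set<Check> passed) {
        return passed.containsAll(EnumSet.allOf(Check.class));
    }

    public static void main(String[] args) {
        Set<Check> passed = EnumSet.of(Check.LOCALIZED_TESTS, Check.STUBBED_FUNCTIONAL,
                Check.STUBBED_NON_FUNCTIONAL, Check.INTEGRATION_TESTS);
        System.out.println("Promote to extended testing repository: "
                + readyForExtendedTesting(passed));
    }
}
```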
Stage 3: Extended Testing
Extended testing is where we run the heavyweight tests that can't feasibly be executed on every minor change. These are often fully automated end to end and take hours to run, sometimes an entire night, so we batch them for efficiency.
The latest version to clear Stage 2 is deployed here once a night, and the tests are run in isolation from other services to avoid noise.
Peak Load Tests
Simulating the heaviest bursts of traffic the service is expected to handle, like Black Friday sales or major
event launches.
Soak Tests
Running the service under sustained load for many hours to ensure it doesn't degrade over time from memory leaks
or resource starvation.
Rolling Deployments
Confirming one version can gracefully coexist with another during upgrades.
Fault Simulations
Reproducing real-world conditions: slow downstream services, network latency spikes, DNS failures, or restarts at
inconvenient moments.
At this stage, we also introduce promotion-preventing alerts, automated checks that must pass before a version can move forward.
Overnight, we run simulations of real client interactions and then, using the application's own monitoring and alerting, validate expected behaviour. This ensures we have the right level of operational visibility and automation to always know if the application is working exactly as intended.
When teams must prove functionality themselves, they expose more metrics, giving operators clearer visibility and reducing handoffs.
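A promotion-preventing alert can be as simple as an automated query against the monitoring system before a version is allowed to move on. The sketch below assumes a hypothetical alerts endpoint and response shape; the principle is that the application's own alerting decides whether promotion continues.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of a promotion-preventing alert check; the endpoint and response shape are assumed. */
public class PromotionGate {
    public static void main(String[] args) throws Exception {
        // Hypothetical monitoring endpoint reporting whether any release-blocking alert fired overnight.
        URI alerts = URI.create("http://monitoring.internal/api/alerts?label=blocks-promotion&state=firing");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(alerts).GET().build(), HttpResponse.BodyHandlers.ofString());

        boolean alertsFiring = !response.body().contains("\"alerts\":[]"); // naive check for the sketch
        if (alertsFiring) {
            throw new IllegalStateException("Release-blocking alerts fired: promotion halted");
        }
        System.out.println("No blocking alerts: version may be promoted");
    }
}
```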
Cultural Ties: Extended Testing
When extended testing is part of the normal flow, performance and resilience aren't "special projects." They're baked into every change.
Over time, teams build an instinct for designing features with these demands in mind, knowing they'll be tested and proven before being called 'done'.

Business Impact: Running these tests regularly means fewer surprises in
production, fewer performance bottlenecks during peak events, and more predictable infrastructure costs. It also
gives leaders confidence that the service can handle real-world conditions, proven in controlled, repeatable trials.
Stage 4: Canary and Production
In engineering, "canary" has become shorthand for releasing a change to a small, controlled slice of production to see if it's safe to proceed. The idea is simple: catch trouble early, when it's cheap and easy to fix, rather than late, when it's expensive and public.
For competitive businesses that depend on high-velocity change, canary deployments primarily mitigate risk. Most problems, if they happen, have a much smaller blast radius.
- Deploy to Canary: release to a small subset of production traffic
- Monitor Metrics: watch real performance indicators
- Expand Rollout: gradually increase the traffic percentage
- Validate Success: confirm the service meets expectations
The ability to run two versions side by side, and all the engineering maturity that comes with it, gives teams and leaders the confidence to move faster. Everyone knows a rollback is not just possible, but routine.
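At its core, a canary rollout is a loop: shift a little traffic, watch the metrics, and either continue or roll back. The sketch below shows that loop in outline; the traffic steps, bake time, and health check are placeholders for your own traffic manager and monitoring stack.

```java
import java.util.List;

/** A minimal sketch of a canary rollout loop; steps, bake time, and checks are illustrative. */
public class CanaryRollout {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> trafficSteps = List.of(1, 5, 25, 50, 100); // percent of traffic on the new version

        for (int percent : trafficSteps) {
            routeTraffic(percent);
            Thread.sleep(5 * 60 * 1000L);          // observe real metrics for a bake period
            if (!metricsHealthy()) {
                routeTraffic(0);                   // rollback is routine, not an emergency
                throw new IllegalStateException("Canary failed at " + percent + "%: rolled back");
            }
        }
        System.out.println("Canary validated: new version fully rolled out");
    }

    // Placeholders for calls into your traffic manager and monitoring stack.
    static void routeTraffic(int percentToNewVersion) { /* e.g. update load-balancer weights */ }
    static boolean metricsHealthy() { return true; /* e.g. compare error rate and latency to baseline */ }
}
```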
Cultural Ties: Canary and Production
A well-run canary makes change safer and faster by reinforcing the expectation that all changes can be rolled out to production with a risk mitigation strategy.
This habit replaces the drama of high-stakes releases with the calm pace of small, recoverable steps. Progress becomes a product of design, not a gamble.

Build For Failure: A Reliable Path To Resilience
The cloud forced us to accept that servers crash, networks drop, and availability zones go dark. The lesson was clear: don't try to prevent failure, design systems that recover from it. Fast rollouts carry the same truth. New versions will fail sometimes. That isn't a reason to slow down: it's a reason to design for safe failure.
Why This Works
A strong path to production works because it is more than a set of automated steps: it's a contract that communicates the culture and expectations of delivery across the organization.
Encodes Culture into Engineering
Every stage reinforces the behaviors you want to see: testing before you push, monitoring early, treating shared
environments like production.
Reduces Delivery Friction
Shared rules, clear expectations, and on-demand environments remove slow hand-offs and bottlenecks.
Catches Issues Early
Problems are found when they're easier to diagnose, cheaper to resolve, and less likely to have downstream impact.
Clear Speed vs Risk Dial
Leaders can see exactly how much a change has been proven before it reaches customers.
Encodes Lessons into Systems
When something slips through, the fix is baked into the pipeline and its test environments, so blind spots shrink over time.
Builds Repeatability and Resilience
Rebuilding and validating a service daily proves you could replace it from scratch with confidence.
Developer-Owned Operations
By encapsulating monitoring and alerting with code, teams gain a single source of truth.
Mitigates Risk While Enabling Speed
Techniques like canary deployment make high-velocity change safe: smaller impact per release, at a faster cadence.
Where to Start: Just a Chat
One of the simplest ways to understand the health of your delivery process is to show up curious and supportive, genuinely interested in how things work for the people doing the work.
Talk to one part of the team, then another, and before long you see the whole picture more clearly than any single part could.
Here's one question for each step of a healthy path-to-production pipeline, along with what you might be missing and what to watch for:
Diagnostic Questions for Each Stage
Build
Ask: "When we make any change to our service, is it versioned and traceable through every environment?"
Might be missing: A truly immutable build that is identical in every stage.
Look out for: Manual tweaks, undocumented changes, versioned artifacts built more than once.
Stubbed Functional
Ask: "Are tests written and run automatically every time code changes, as part of the definition of done?"
Might be missing: Clear ownership of writing tests and an environment ready to run them.
Look out for: Tests that rely on manual steps, or only run occasionally.
Stubbed Non-Functional
Ask: "How do we know we have enough CPU and memory to support our expected load, but not so much we're wasting money?"
Might be missing: Right-sizing based on realistic scenarios.
Look out for: Overprovisioned infrastructure or surprises under load.
Integrated Functional
Ask: "How easy is it for a team to run a test in a shared integrated environment at any time?"
Might be missing: Shared ownership agreements, data agreements and stability agreements.
Look out for: Access queues, blocked deployments, or unstable test data.
Extended Tests
Ask: "How do we know that our service will work for key events, like when we get peak load?"
Might be missing: SLAs, SLOs, NFRs, right-sizing, dynamic scaling.
Look out for: Performance issues, resource under-utilisation, environment reliability issues.
Canary & Production
Ask: "Can we release changes in production without affecting all customers at once, and promote changes based on monitoring that actively proves a service is working as intended?"
Might be missing: Rollback confidence, parallel-version capability, automated promotion rules.
Look out for: All-or-nothing deployments, manual sign-offs, or slow rollback processes.
We'd Love to Hear Your Thoughts!