How the Best Teams Build Confidence
Turning Releases From Heroics Into Routine Progress
Why confidence, not speed, is the real differentiator
High-performing teams focus on building confidence in their releases rather than just shipping faster.
How a Path to Production encodes culture
Making safety, speed, and compliance unavoidable through systematic engineering practices.
What leaders should look for before shipping
Clear criteria from CEO to Legal that changes must meet before earning the right to ship.
Path to Production as a System, Not a Phase
A good pipeline is not a tool. It's a contract: an agreed, shared way for changes to move from idea to customers.
It doesn't replace culture; it encodes it. When the pipeline reflects a shared set of principles, leaders don't have to micromanage behaviour: the system guides it.
That's the point: the pipeline is where culture meets engineering. If your culture values safety, speed, collaboration, and autonomy, the pipeline makes those values unavoidable in day-to-day work.
And when it's designed well, it gives everyone, from engineers to executives, the thing they need most before
releasing a change: confidence.
What Matters More Than Speed
Most companies talk about shipping faster. The best companies, in our experience, talk about shipping with confidence.
Without confidence:
- Teams slow down, adding meetings and approvals
- People hesitate, guessing at side effects
- Releases become events, not habits

Confidence isn't a feeling. It's the product of a delivery system that proves a change is ready in all the ways your organization needs. That system is the Path to Production.
Confidence Across Leadership Roles
As a CEO
Confidence that change won't hurt the business or the brand.
- Check that new features work as promised for customers
- Check that performance holds up under real demand
- Check that delivery protects reputation instead of risking it
As a CISO
Confidence that every release meets security and compliance by default.
- Check that regulatory requirements are met every time
- Check that security controls are enforced consistently
- Check that nothing bypasses compliance for the sake of speed
As a COO
Confidence that operations run efficiently at scale.
- Check that delivery doesn't waste time or people
- Check that costs stay under control as releases accelerate
- Check that resources are optimised, not burned on rework
As a CTO
Confidence that the system enables innovation, not overhead.
- Check that teams can adopt new tech without breaking delivery
- Check that feedback loops support faster iteration
- Check that engineering effort goes into features, not waiting
As Legal / Compliance
Confidence that compliance is provable, not just promised.
- Check that every release leaves a clear audit trail
- Check that evidence of rule-following is visible and accessible
- Check that accountability is built into the process, not added after
A Learning System To Stop Regression
And here is the important reality: even with the best systems, mistakes will happen. Unless you're willing to spend mission-critical budgets chasing near-zero failure, it is a matter of when, not if.
That is why the best teams also use their path to production as a learning system. They don't just aim to stop mistakes from ever reaching customers; they build the ability to never make the same mistake twice.
The learnings from dealing with every incident or bug become encoded into the pipeline as an additional guardrail, making the system more resilient over time. This is how you remove blind spots: your experience becomes part of the infrastructure, with clear visibility for all.
What Is a Path to Production?
It is the road your code travels, from a developer's laptop to a live customer environment. It passes through a series of quality gates, each one answering the question:
Can we trust this change to move forward?
In technical terms, these are automated stages: build, test, package, deploy, validate, and promote. In leadership terms, it's a risk management system that turns "I think this is fine" into "Ready to Ship."

Golden Path vs Path to Production: Much of the industry talks about the Golden
Path in platform engineering. While it overlaps with our idea of the Path to Production, the focus is often on
pipelines and tooling. For us, that misses the deeper point: the real leverage comes from shared agreements between
stakeholders. In our experience, solving this problem starts in the right order: with the conversation about what to
do, not how.
The Stages of Confidence
We've seen this work time and again, in teams large and small, across industries. When you make it clear from the start that this is not a tooling conversation, but an agreement on how changes earn the right to ship, alignment becomes simpler.
01. Agree on principles first
It's easier to agree on principles than on tools. Once those principles are agreed, they become the compass for every decision.
02. Include business stakeholders
Discussing principles allows the business to be part of the conversation. This is essential for cost-first quality decisions.
03. Build shared understanding
With that shared understanding, the path to production works — because the agreement matters more than the exact tools or steps.
Those tools will evolve. What doesn't change is the value of committing to a common set of standards. That's the first big step.
Stage 1: The Build
The path to production begins with the Build step, which creates a Deployable, Immutable, Versioned Artifact (DIVA): a sealed package of the application, stamped with a unique serial number. Like a vehicle identification number (VIN), it tells you exactly where it came from, what's in it, and when it was made.
Application Definition: An application here could be anything from a customer-facing website to a background service processing transactions. It evolves constantly with new features, bug fixes, and security patches, which makes knowing exactly what is running, and where, essential.
DIVA Encapsulation
A DIVA (Deployable, Immutable, Versioned Artifact) is more than compiled code — it's a fully encapsulated unit that carries everything a change needs to earn the right to ship:
- Deployable: Can be promoted through environments without modification
- Immutable: Once built, it never changes — the same thing tested is what runs in production
- Versioned: Every build is uniquely identifiable, reproducible, and traceable
- Artifact (Everything as Code): A single package that includes business logic, infrastructure requirements, configuration, tests, and monitoring
If a build can change after creation, or if different environments are running
slightly different versions, repeatability disappears. You lose the ability to
trace defects with precision, prove what was deployed for audit purposes, or guarantee that a fix tested in one
place will behave the same in another.
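To make the idea concrete, here is a minimal sketch, in Java, of the metadata a DIVA might carry. The field names are assumptions for illustration, not a prescribed schema; the point is that identity, provenance, and contents travel with the artifact itself.

```java
// A sketch only: field names are illustrative assumptions, not a prescribed schema.
import java.util.List;

public record DivaManifest(
        String artifactId,     // e.g. "payments-service"
        String version,        // unique, never reused: the artifact's "serial number"
        String gitCommit,      // the exact source revision it was built from
        String builtAt,        // ISO-8601 build timestamp
        String contentDigest,  // SHA-256 of the sealed package; any change breaks the seal
        List<String> contents  // business logic, infrastructure, configuration, tests, monitoring
) { }
```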
Pre-Seal Validation: Localized Tests & Security
Before an artifact is sealed, the pipeline executes localized tests within a pristine build environment. This mirrors a developer's local checks but guarantees no reliance on prior states. It's also where crucial security scanning begins, including static analysis and dependency checks, ensuring early detection of vulnerabilities.
This process ensures the artifact is a consistent, verifiable unit. Immutability guarantees that the package moving
forward is precisely the one that passed these local tests: unchanged, dependable, and ready for subsequent stages.
This early validation underpins reliability for integration, meaningfulness for further testing, and predictability for production changes. The integrity established here forms the foundation for confidence throughout the entire path to production.
The build environment also performs static code analysis (SAST) and open-source dependency checks (SCA). These act as early-warning systems, akin to detecting a hairline crack in an aircraft wing before takeoff rather than mid-flight. Identifying vulnerabilities here is significantly cheaper and faster to fix than later in the path to production.
People respect what you inspect.
Linting guides coding standards, balancing consistency with developer autonomy. Consolidated visibility over these initial builds is paramount for guiding subsequent development steps effectively.
Cultural Ties: Reliable Builds
When the build stage always works, it's usually because engineers have already run their tests locally before committing. This habit doesn't need a mandate; it emerges naturally, because it keeps the flow unbroken for everyone else, and that is how people genuinely go faster every day.

Versioning and Confidence: Robust versioning makes these guarantees durable. Every build can be traced, reproduced,
and rolled back if needed. That's how a change earns the right to ship: it proves not just that it works once, but
that it can be trusted across time, environments, and teams. Confidence that what you tested is exactly what you will run is the foundation of safe speed.
Stage 2: Fast Feedback
This stage of the pipeline exists to keep developers in flow. The goal is simple: give them realistic, real-world feedback on their changes in 15 minutes or less. Quick signals let developers fix issues immediately, instead of losing hours or days to context switching.
But fast feedback is about more than speed. It marks the shift from code delivery to service delivery. From this
point on, developers aren't just writing software: they're delivering a running, observable, reliable service.
Operational readiness becomes part of the developer's craft, not something bolted on later. All tests from here forward are expected to run on production-grade systems, using the same tooling that production itself relies on.
Substage 2.1: Fast Feedback with Stubbed Dependencies

The quicker a change shows it works, the less it costs to fix when it doesn't. Waiting until a service is surrounded by all its real dependencies is like discovering the foundations are crooked when the building is already half-finished.
The Fast Feedback with Stubbed Dependencies stage prevents that. The artifact from Stage 1 is deployed into a production-like environment and tested in isolation using stubs.
What are stubs? Stubs are lightweight, simulated versions of the external systems
a service depends on. Instead of waiting for every real system to be available, teams use stubs to mimic their
behaviour, returning predictable responses to specific requests. A widely adopted tool for this is WireMock, created
by Tom Akehurst, whose work has had a lasting impact on the industry.
Because these stubs are under our control, we can script them to be fully co-operative or intentionally problematic. This allows us to verify both the ideal and the difficult scenarios. In fact, there are many behaviours we can test with stubs that would be impossible or unsafe to test against a real dependency.
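As an illustration of what scripting a stub can look like, here is a minimal WireMock sketch in Java. The /accounts endpoints, port, and response bodies are hypothetical assumptions; only the WireMock calls themselves are real API.

```java
import com.github.tomakehurst.wiremock.WireMockServer;
import com.github.tomakehurst.wiremock.http.Fault;

import static com.github.tomakehurst.wiremock.client.WireMock.*;

public class AccountStubs {
    public static void main(String[] args) {
        WireMockServer stub = new WireMockServer(8089); // port is an arbitrary choice
        stub.start();

        // Co-operative behaviour: a predictable, happy-path response.
        stub.stubFor(get(urlEqualTo("/accounts/42"))
                .willReturn(okJson("{\"id\": 42, \"status\": \"ACTIVE\"}")));

        // Intentionally problematic behaviour: a slow dependency...
        stub.stubFor(get(urlEqualTo("/accounts/slow"))
                .willReturn(okJson("{\"id\": 7}").withFixedDelay(5_000)));

        // ...and an abrupt connection failure, unsafe to trigger against a real system.
        stub.stubFor(get(urlEqualTo("/accounts/broken"))
                .willReturn(aResponse().withFault(Fault.CONNECTION_RESET_BY_PEER)));
    }
}
```

Because the stub is just code, the difficult scenarios are as easy to reproduce as the happy path, and they can be re-run on every build.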
2.1.1. Stubbed Functional (Testing Behaviour)
Running the functional tests here gives teams autonomy. They can achieve high confidence in their service without depending on another team to prepare or maintain a shared environment, and without the risk of impacting another team with untested changes.
It's like a dress rehearsal: the service gets to run through its lines and cues in a controlled space before stepping onto the main stage with the rest of the cast.
Historical Challenge
Environments were heavyweight, long-lived, wasteful, costly, shared and slow to provision
Modern Solution
Lightweight, on-demand, self-service environments make it possible to deploy and test in minutes
This is also the first full functional run of the service on the same infrastructure setup as production, but without interference from other teams or systems. If this stage passes, it means we can trust the service's core functionality in isolation.
Cultural Ties: Stubbed Functional Testing
When deploying to this stage is automatic and the environment is ready, writing the functional tests becomes a natural part of a development team's "definition of done."
It can also be something the business chooses to make a requirement, with acceptance criteria executed as part of the pipeline itself.

Teams start thinking about verification while they're building, not afterwards: a habit that emerges naturally
because the pipeline helps drive that culture.
2.1.2. Stubbed Non-Functional (Performance)
Still in isolation, the service is tested for important non-functional qualities: how it performs under load, how it recovers from simulated failures, and whether it remains stable through intermittent disruption.
This is the difference between confirming a car starts, drives, and turns, and proving it can handle a sudden rainstorm or a steep hill.
Load Testing
Testing critical business journeys during fast feedback ensures key flows perform under expected peak demand.
Rolling Deployments
Version compatibility checks validate that one version can run alongside another during gradual rollouts.
Graceful Degradation
Checks that, under slow dependencies, the system fails elegantly rather than cascading errors.
Failure Simulations
Tests resilience to network latency or service discovery issues through chaos testing.
The economics of running performance tests have changed dramatically. With on-demand environments, we can spin up the exact scale we need for testing, run it for 15 minutes, and then tear it down, paying only for what we used.
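As a sketch of what a short, disposable load check against a critical journey could look like, the following uses only the JDK. The endpoint, request volume, thread count, and latency budget are assumptions for illustration, not recommendations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CheckoutLoadCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical critical journey exposed in the stubbed, production-like environment.
        URI checkout = URI.create("http://fast-feedback.internal/checkout/health");
        HttpClient client = HttpClient.newHttpClient();

        int requests = 500;                               // illustrative "expected peak" volume
        ExecutorService pool = Executors.newFixedThreadPool(50);
        List<Future<Long>> timings = new ArrayList<>();

        for (int i = 0; i < requests; i++) {
            timings.add(pool.submit(() -> {
                long start = System.nanoTime();
                client.send(HttpRequest.newBuilder(checkout).timeout(Duration.ofSeconds(5)).build(),
                        HttpResponse.BodyHandlers.discarding());
                return (System.nanoTime() - start) / 1_000_000; // elapsed millis
            }));
        }

        List<Long> millis = new ArrayList<>();
        for (Future<Long> f : timings) millis.add(f.get());
        pool.shutdown();

        Collections.sort(millis);
        long p95 = millis.get((int) (millis.size() * 0.95) - 1);
        System.out.println("p95 latency: " + p95 + " ms");
        if (p95 > 300) {                                  // assumed latency budget in milliseconds
            throw new IllegalStateException("p95 above budget: fail the stage");
        }
    }
}
```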
Cultural Ties: Stubbed Non-Functional Testing (Performance)
The power of having stubs, and why we strongly advocate for them, is that we don't have to wait for it to rain. We choose how much to pour, how long to keep it coming, and repeat it as often as we like.
It puts real-world testing in our control. Many teams face the opposite problem: waiting on other teams, systems, or conditions before they can gain high confidence in their applications.

With stubs, we can trigger precise failure modes and stress conditions that are too risky or rare to test in production, and we enter integration safer, having done our own homework first and proven our part works before trying it out with the group.
Substage 2.2: Integration Tests
After proving that a service works in isolation, the next step is to see how it behaves in the company of others. In this stage, all teams deploy their services into a shared, high-availability environment (Fast Integration).
The Fast Integration environment is treated like production: it must stay alive during working hours, and every team is responsible for keeping it healthy.
Non-disruptive deployments
New versions must come online without breaking what's already running
Production Grade Monitoring
Issues are detected the moment they appear
Clear response expectations
Teams fix problems immediately to keep everyone moving
Own your test data lifecycle
Each team manages the data their tests need, avoiding conflicts
Anyone can start a test at any time
The environment and services are always ready
Centralized Test Suites Run Alongside
Catching broader issues by focusing on critical business journeys
Why these rules matter: In traditional setups, shared environments often suffer
from "data chaos." Teams overwrite each other's test data, leave it in an inconsistent state, or depend on a
specific dataset being present, only to find it changed by someone else.
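One pattern that prevents this, offered here as a sketch rather than a prescription, is for each test run to create, namespace, and clean up its own data. The naming scheme and the setup and teardown steps below are assumptions for illustration.

```java
import java.util.UUID;

// Sketch: each run owns its data from creation to deletion, so no other team can collide with it.
public class OwnedTestData implements AutoCloseable {
    private final String accountId = "it-payments-" + UUID.randomUUID(); // namespaced to team and run

    OwnedTestData() {
        // Create the account this run needs, e.g. through the service's own public API.
    }

    String accountId() {
        return accountId;
    }

    @Override
    public void close() {
        // Delete the account so no later run depends on, or trips over, leftover state.
    }

    public static void main(String[] args) {
        try (OwnedTestData data = new OwnedTestData()) {
            System.out.println("Running integration checks against " + data.accountId());
        }
    }
}
```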
Monitoring and alerting have also historically been disconnected from the application itself, managed by separate teams, on separate lifecycles, and often only connected in production. Thanks to encapsulation provided by the build stage, the DIVA has everything required to operate the service, including monitoring and alerting. When deployed, the service and its operational instrumentation arrive together, tested together, and proven together.
Cultural Ties: Integration Tests
Treating fast integration like production builds habits that carry forward: designing for safe rollouts, fixing issues immediately, and owning the health of the code you commit.
It fosters respect between teams, in the sense that each team arrives having done their homework in isolation, and they contribute to keeping the shared environment running for everyone else.

This marks the shift from code delivery to service delivery. Developers aren't just responsible for software; they're delivering a running, observable, reliable service.
For the organization, this shift means earlier detection of integration issues, faster recovery from problems, and fewer production incidents, all without the bottlenecks and firefighting of traditional release cycles.
Quality Gates
Quality gates are checkpoints after certain stages have completed. They are not about running more tests but about recognizing whether a service has cleared all of the tests so far.
1. Build
- Localized tests and packaging of a test DIVA
- Quality Gate: Ready for Fast Feedback. Promoted to the Fast Feedback testing queue.
2. Fast Feedback
- 2.1. Stubbed Testing
- 2.1.1. Functional Testing
- 2.1.2. Non-Functional Testing
- 2.2. Integration Testing
- Quality Gate: Ready for Extended Testing. Promoted to the extended testing repository.
Reaching this point means the current version is ready for a new class of scrutiny: Extended Testing. That readiness
is signalled by promoting the version into a repository or queue dedicated to extended test environments.
In practical terms, this is the "definition of done" for deployed tests. Passing this gate means the service is
proven in isolation and in the company of others, with operational visibility in place.
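Expressed in code, a quality gate can be as simple as confirming that every earlier check has passed before promotion. The sketch below is illustrative only; the check names mirror the stages in this article rather than any particular tool.

```java
import java.util.EnumSet;
import java.util.Set;

/** A minimal sketch of a quality gate; names and checks are illustrative assumptions. */
public class QualityGate {
    enum Check { LOCALIZED_TESTS, STUBBED_FUNCTIONAL, STUBBED_NON_FUNCTIONAL, INTEGRATION_TESTS }

    /** A version is ready for extended testing only when every earlier check has passed. */
    static boolean readyForExtendedTesting(Set<Check> passed) {
        return passed.containsAll(EnumSet.allOf(Check.class));
    }

    public static void main(String[] args) {
        Set<Check> passed = EnumSet.of(Check.LOCALIZED_TESTS, Check.STUBBED_FUNCTIONAL,
                Check.STUBBED_NON_FUNCTIONAL, Check.INTEGRATION_TESTS);
        System.out.println("Promote to extended testing repository: "
                + readyForExtendedTesting(passed));
    }
}
```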
Stage 3: Extended Testing
Extended testing is where we run the heavyweight tests that can't feasibly be executed on every minor change. These are often fully automated end to end and take hours to run, sometimes an entire night, so we batch them for efficiency.
The latest version to clear Stage 2 is deployed here once a night, and the tests are run in isolation from other services to avoid noise.
Peak Load Tests
Simulating the heaviest bursts of traffic the service is expected to handle, like Black Friday sales or major
event launches.
Soak Tests
Running the service under sustained load for many hours to ensure it doesn't degrade over time from memory leaks
or resource starvation.
Rolling Deployments
Confirming one version can gracefully coexist with another during upgrades.
Fault Simulations
Reproducing real-world conditions: slow downstream services, network latency spikes, DNS failures, or restarts at
inconvenient moments.
At this stage, we also introduce promotion-preventing alerts, automated checks that must pass before a version can move forward.
Overnight, we run simulations of real client interactions and then, using the application's own monitoring and alerting, validate expected behaviour. This ensures we have the right level of operational visibility and automation to always know if the application is working exactly as intended.
When teams must prove functionality themselves, they expose more metrics, giving operators clearer visibility and reducing handoffs.
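A promotion-preventing alert can be as simple as an automated query against the monitoring system before a version is allowed to move on. The sketch below assumes a hypothetical alerts endpoint and response shape; the principle is that the application's own alerting decides whether promotion continues.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of a promotion-preventing alert check; the endpoint and response shape are assumed. */
public class PromotionGate {
    public static void main(String[] args) throws Exception {
        // Hypothetical monitoring endpoint reporting whether any release-blocking alert fired overnight.
        URI alerts = URI.create("http://monitoring.internal/api/alerts?label=blocks-promotion&state=firing");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(alerts).GET().build(), HttpResponse.BodyHandlers.ofString());

        boolean alertsFiring = !response.body().contains("\"alerts\":[]"); // naive check for the sketch
        if (alertsFiring) {
            throw new IllegalStateException("Release-blocking alerts fired: promotion halted");
        }
        System.out.println("No blocking alerts: version may be promoted");
    }
}
```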
Cultural Ties: Extended Testing
When extended testing is part of the normal flow, performance and resilience aren't "special projects." They're baked into every change.
Over time, teams build an instinct for designing features with these demands in mind, knowing they'll be tested and proven before being called 'done'.

Business Impact: Running these tests regularly means fewer surprises in
production, fewer performance bottlenecks during peak events, and more predictable infrastructure costs. It also
gives leaders confidence that the service can handle real-world conditions, proven in controlled, repeatable trials.
Stage 4: Canary and Production
In engineering, "canary" has become shorthand for releasing a change to a small, controlled slice of production to see if it's safe to proceed. The idea is simple: catch trouble early, when it's cheap and easy to fix, rather than late, when it's expensive and public.
For competitive businesses that depend on high-velocity change, canary deployments primarily mitigate risk. Most problems, if they happen, have a much smaller blast radius.
- Deploy to Canary: release to a small subset of production traffic
- Monitor Metrics: watch real performance indicators
- Expand Rollout: gradually increase the traffic percentage
- Validate Success: confirm the service meets expectations
The ability to run two versions side by side, and all the engineering maturity that comes with it, gives teams and leaders the confidence to move faster. Everyone knows a rollback is not just possible, but routine.
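At its core, a canary rollout is a loop: shift a little traffic, watch the metrics, and either continue or roll back. The sketch below shows that loop in outline; the traffic steps, bake time, and health check are placeholders for your own traffic manager and monitoring stack.

```java
import java.util.List;

/** A minimal sketch of a canary rollout loop; steps, bake time, and checks are illustrative. */
public class CanaryRollout {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> trafficSteps = List.of(1, 5, 25, 50, 100); // percent of traffic on the new version

        for (int percent : trafficSteps) {
            routeTraffic(percent);
            Thread.sleep(5 * 60 * 1000L);          // observe real metrics for a bake period
            if (!metricsHealthy()) {
                routeTraffic(0);                   // rollback is routine, not an emergency
                throw new IllegalStateException("Canary failed at " + percent + "%: rolled back");
            }
        }
        System.out.println("Canary validated: new version fully rolled out");
    }

    // Placeholders for calls into your traffic manager and monitoring stack.
    static void routeTraffic(int percentToNewVersion) { /* e.g. update load-balancer weights */ }
    static boolean metricsHealthy() { return true; /* e.g. compare error rate and latency to baseline */ }
}
```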
Cultural Ties: Canary and Production
A well-run canary makes change safer and faster by reinforcing the expectation that all changes can be rolled out to production with a risk mitigation strategy.
This habit replaces the drama of high-stakes releases with the calm pace of small, recoverable steps. Progress becomes a product of design, not a gamble.

Build For Failure: A Reliable Path To Resilience
The cloud forced us to accept that servers crash, networks drop, and availability zones go dark. The lesson was clear: don't try to prevent failure, design systems that recover from it. Fast rollouts carry the same truth. New versions will fail sometimes. That isn't a reason to slow down: it's a reason to design for safe failure.
Why This Works
A strong path to production works because it is more than a set of automated steps: it's a contract that communicates the culture and expectations of delivery across the organization.
Encodes Culture into Engineering
Every stage reinforces the behaviors you want to see: testing before you push, monitoring early, treating shared
environments like production.
Reduces Delivery Friction
Shared rules, clear expectations, and on-demand environments remove slow hand-offs and bottlenecks.
Catches Issues Early
Problems are found when they're easier to diagnose, cheaper to resolve, and less likely to have downstream impact.
Clear Speed vs Risk Dial
Leaders can see exactly how much a change has been proven before it reaches customers.
Encodes Lessons into Systems
When something slips through, the fix is baked into the pipeline and its test environments, so blind spots shrink over time.
Builds Repeatability and Resilience
Rebuilding and validating a service daily proves you could replace it from scratch with confidence.
Developer-Owned Operations
By encapsulating monitoring and alerting with code, teams gain a single source of truth.
Mitigates Risk While Enabling Speed
Techniques like canary deployment make high-velocity change safe: smaller impact per release, at a faster cadence.
Where to Start: Just a Chat
One of the simplest ways to understand the health of your delivery process is to show up curious and supportive, genuinely interested in how things work for the people doing the work.
Talk to one part of the team, then another, and before long you see the whole picture more clearly than any single part could.
Here's one question for each step of a healthy path-to-production pipeline, along with what you might be missing and what to watch for:
Diagnostic Questions for Each Stage
Build
Ask: "When we make any change to our service, is it versioned and traceable through every environment?"
Might be missing: A truly immutable build that is identical in every stage.
Look out for: Manual tweaks, undocumented changes, versioned artifacts built more than once.
Stubbed Functional
Ask: "Are tests written and run automatically every time code changes, as part of the definition of done?"
Might be missing: Clear ownership of writing tests and an environment ready to run them.
Look out for: Tests that rely on manual steps, or only run occasionally.
Stubbed Non-Functional
Ask: "How do we know we have enough CPU and memory to support our expected load, but not so much we're wasting money?"
Might be missing: Right-sizing based on realistic scenarios.
Look out for: Overprovisioned infrastructure or surprises under load.
Integrated Functional
Ask: "How easy is it for a team to run a test in a shared integrated environment at any time?"
Might be missing: Shared ownership agreements, data agreements and stability agreements.
Look out for: Access queues, blocked deployments, or unstable test data.
Extended Tests
Ask: "How do we know that our service will work for key events, like when we get peak load?"
Might be missing: SLAs, SLOs, NFRs, right-sizing, dynamic scaling.
Look out for: Performance issues, resource under-utilisation, environment reliability issues.
Canary & Production
Ask: "Can we release changes in production without affecting all customers at once, and promote changes based on monitoring that actively proves a service is working as intended?"
Might be missing: Rollback confidence, parallel-version capability, automated promotion rules.
Look out for: All-or-nothing deployments, manual sign-offs, or slow rollback processes.
We'd Love to Hear Your Thoughts!