Automated Landing Zones in GCP Organizations

Author: Derek Mortimer | Posted on: August 12, 2024


What is a Landing Zone?

As cloud usage grows across an organization and more teams deploy resources, it becomes increasingly important for platform operators to stay organized, so they can ensure security best practices are being applied and attribute resources to their owners (e.g., for cost attribution, or to discover the responsible people/teams).

Cloud providers have evolved to allow you to group and organize related resources in similar ways:

  • AWS has Accounts that can be grouped into Organizational Units
  • Azure has Resource Groups that can be grouped into Subscriptions
  • GCP has Projects which can be grouped into Folders

At the same time, the rise of Platform Engineering has shown how much teams benefit from an autonomous onboarding experience to the cloud, and has placed importance on delivering it with a good developer experience.

This has given rise to the term “Landing Zones” to describe the capability of provisioning an area within cloud providers where a team can safely operate. Think of it as “this is the first place you’ll land when your team starts using the org’s cloud provider(s)”. A Landing Zone generally consists of:

  • Some isolated (in a security sense) area within a cloud provider’s hierarchy (e.g., a Project or a Folder within some GCP Organization)
  • A mechanism for the owners of the Landing Zone to operate within it (e.g., a Service Account they can use, or permissions granted to some GCP groups or users)

The rest of this article will focus on Landing Zones in the context of GCP Organizations to better explain some of the service and implementation specifics that can go into enabling Landing Zones as a capability.


GCP 101

Before we dive into more specifics, a quick primer on the resource model employed by Google Cloud:

  • The top level of the hierarchy is a GCP Organization
  • A single Organization can contain multiple Projects
  • A single Organization can contain multiple Folders
  • A single Folder can contain multiple Projects
  • Each Project contains infrastructure deployed by GCP Services

In addition to this, from an access model perspective:

  • Human Users exist with a globally unique email address
  • Machine Service Accounts exist with a globally unique email address
  • Groups have globally unique email addresses
    • Users, service accounts and other groups can all be members of groups
  • Users, service accounts and groups can be generally referred to as Principals
  • Policies grant roles on resources to principals
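    • Policies that are applied at an Organization or Folder level are inherited by everything contained within them (e.g., if you grant owner on a folder, you are granting owner on everything inside that folder). Role grants inherited this way are additive: a lower level can add access, but removing inherited access requires a separate deny policy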

This gives us everything we need to be able to organize resources and grant access to them.
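
To make the inheritance rule concrete, here is a minimal sketch of the hierarchy in Python. The `Node` class, the resource names and the example groups are all hypothetical, and the model assumes role grants are simply unioned down the tree; it is an illustration, not a representation of any GCP API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """An Organization, Folder or Project in the resource hierarchy."""

    name: str
    kind: str  # "organization", "folder" or "project"
    parent: Node | None = None
    # role -> principals granted that role directly on this node
    bindings: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, principal: str) -> None:
        self.bindings.setdefault(role, set()).add(principal)


def effective_bindings(node: Node) -> dict[str, set[str]]:
    """Union the bindings set on the node with those of every ancestor."""
    merged: dict[str, set[str]] = {}
    current: Node | None = node
    while current is not None:
        for role, members in current.bindings.items():
            merged.setdefault(role, set()).update(members)
        current = current.parent
    return merged


# Hypothetical hierarchy: organization -> folder -> project
org = Node("example-org", "organization")
folder = Node("team-a", "folder", parent=org)
project = Node("team-a-dev", "project", parent=folder)

org.grant("roles/viewer", "group:auditors@example.com")
folder.grant("roles/owner", "group:team-a@example.com")

# The project has no bindings of its own, yet inherits both grants.
print(effective_bindings(project))
```

Granting owner on the folder shows up on the project without any project-level binding, which is exactly why care is needed when granting roles high up in the hierarchy.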


When and Why?

If you have more than a couple of teams independently managing cloud resources, or strong requirements on isolation or cost attribution, our experience is that it pays off to be proactive in enabling Landing Zones as an automated capability.

A huge benefit of establishing an automated lifecycle as early as possible is to make the rollout of changes and evolutions to Landing Zones as smooth as possible, and remove the risk of not updating some manually/ad-hoc deployed resources.

Some of the specific benefits CECG have experienced by automating Landing Zones include:

  • It gives us a standardized path to quickly pave brand new GCP Organizations ready for use in a secure and standards compliant manner
    • This has been essential for clients who have strict regulatory and security requirements enforced at all levels from day 0
  • It allows us to safely provision isolated resources in existing GCP organizations without impacting existing operations
    • Where clients have existing workloads running in GCP, this provides confidence that existing and new workloads are isolated from each other unless explicitly permitted and configured
  • It allows us to quickly and safely bootstrap an existing or brand new GCP Organization for the CECG Core Platform offering
    • By automating the from-zero experience, we make it as easy as possible to use the CECG Core Platform in a safe and isolated setting and enable interaction with existing/external workloads when needed
  • It allows us to evolve the definition and requirements of Landing Zones and have automation bring all existing Landing Zones up to date, with controlled rollouts to allow validation and safe promotion.
  • It easily supports a workflow where required approvals and automated checks can be centralized on pull requests for smoother operations.
  • Definitions and workflows can easily be integrated into Developer Portals to allow tools such as Backstage to provision and manage Landing Zones

A Minimal Landing Zone

In the first section we minimally described a Landing Zone as “an isolated area within a cloud provider’s hierarchy” and “a mechanism for the owners to operate within that area”.

From here on out we’ll refer to the owners of a Landing Zone as (Platform) **Tenants**, and the teams maintaining the automation that provisions Landing Zones as (Platform) **Operators**.


A Definition for Tenants

Similarly to managing tenants, applications and infrastructure of Internal Developer Platforms, we’ve had great success modeling Landing Zones as simple YAML (or JSON, TOML, et al.). As part of this we have a minimally useful definition:

  • A Landing Zone name, unique within the organization
  • A responsible contact (GCP user or group) where important cloud provider notifications will be sent
  • One or more principals permitted to manage infrastructure within the Landing Zone

This allows tenants to manage their Landing Zones as YAML objects that can be handled in a variety of ways by operators: the definitions could be custom resources in a Kubernetes cluster, plain files in a source code repository, or objects pushed to some kind of object store.
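
As a rough illustration, the three fields above could be modeled and validated as follows. This sketch uses JSON rather than YAML only because Python's standard library can parse it; the field names (`name`, `contact`, `managers`) and the naming rule are assumptions made for the example, not a published schema.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass

# Assumed naming rule, chosen so a name could double as a GCP project ID:
# 6-30 characters, lowercase letters, digits and hyphens, starting with a
# letter and not ending with a hyphen.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
PRINCIPAL_PREFIXES = ("user:", "group:", "serviceAccount:")


@dataclass(frozen=True)
class LandingZone:
    name: str
    contact: str
    managers: tuple[str, ...]


def validation_errors(data: dict) -> list[str]:
    """Return human-readable problems; an empty list means valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(data.get("name", "")):
        errors.append("name must be 6-30 lowercase letters, digits or hyphens")
    if not data.get("contact", "").startswith(("user:", "group:")):
        errors.append("contact must be a user or group")
    managers = data.get("managers", [])
    if not managers:
        errors.append("at least one manager principal is required")
    for principal in managers:
        if not principal.startswith(PRINCIPAL_PREFIXES):
            errors.append(f"unrecognised principal: {principal!r}")
    return errors


def parse_landing_zone(raw: str) -> LandingZone:
    data = json.loads(raw)
    errors = validation_errors(data)
    if errors:
        raise ValueError("; ".join(errors))
    return LandingZone(data["name"], data["contact"], tuple(data["managers"]))


EXAMPLE = """
{
  "name": "payments-team",
  "contact": "group:payments-leads@example.com",
  "managers": ["group:payments-devs@example.com"]
}
"""

zone = parse_landing_zone(EXAMPLE)
print(zone)
```

Checking that names are unique within the organization needs a pass over all definitions at once, which this sketch leaves out.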


A GCP Project

Based on the minimal definition of a tenant, we define the minimal useful Landing Zone in GCP as an isolated GCP Project, for the following reasons:

  • A lot of GCP quotas apply at the project level. For example, the number of virtual machines and the rate at which they can be spawned are limited, so if you placed two tenants in the same project, one could exhaust these quotas and inadvertently impact the other. This is commonly referred to as the Noisy Neighbour problem, and as a principle we aim to avoid it in all Platform Engineering
  • By default, the Networks (VPCs) created to house tenant infrastructure in separate projects cannot reach each other: Cross Project Networking (e.g., via Shared VPCs) has to be explicitly configured. As platform operators you have the choice to safely grant cross project permissions to enable this kind of interaction when required by the business
  • Similar to networking, a number of interactions between GCP projects are denied by default, and can be enforced as organizational policies requiring specific overrides. Platform operators can automate the validation and provisioning of these overrides to explicitly allow inter-project capabilities when required

In addition to the GCP Project, we need to give principals permission to operate within the project, to provision and manage their infrastructure. We must also take care with the IAM permission boundary so platform tenants cannot make changes to the project that would break it for themselves, or grant anybody wider permissions (i.e., tenants get permissions inside the project and operators get permissions over and inside the project). The two simplest ways to do this are:

  1. Grant roles directly to principals giving them permissions scoped to their new project
  2. Create a GCP Service Account with permissions scoped to the new project, and allow the specified principals to impersonate the service account so they can use its permissions

There are pros and cons to both of these approaches. Granting roles directly to principals can be done on a just-in-time basis using Privileged Access Management, to minimize “standing” permissions available at all times. Service accounts have the benefit that other kinds of automation can impersonate them, such as GitHub Actions using OIDC, which lets you grant CI/CD pipelines permission to operate inside a Landing Zone.

We recommend provisioning two service accounts with bindings allowing your CI/CD tooling of choice to impersonate them:

  • A privileged service account with permissions to actively administer infrastructure in your landing zone
  • A read-only service account with permissions to inspect infrastructure in your landing zone (this is useful for previewing changes in PRs)

The choice of allowing principals to impersonate service accounts vs. granting roles directly to principals is best evaluated depending on your requirements.
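
The service account approach can be sketched as a function that derives the bindings for one Landing Zone. The account IDs (`lz-admin`, `lz-readonly`) and the choice of the broad basic roles `roles/editor` and `roles/viewer` are assumptions for illustration; a real setup would likely use narrower predefined or custom roles. The output is a plain data structure, not a call to any GCP API.

```python
from __future__ import annotations

# Assumed role choices for illustration only.
ADMIN_ROLE = "roles/editor"
READONLY_ROLE = "roles/viewer"
# Role that lets a principal mint tokens for (impersonate) a service account.
IMPERSONATE_ROLE = "roles/iam.serviceAccountTokenCreator"


def service_account_email(account_id: str, project_id: str) -> str:
    """Service account emails have the form ID@PROJECT.iam.gserviceaccount.com."""
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


def landing_zone_iam(project_id: str, impersonators: list[str]) -> dict:
    """Describe the IAM bindings for one Landing Zone project."""
    admin = service_account_email("lz-admin", project_id)
    readonly = service_account_email("lz-readonly", project_id)
    return {
        # Bindings set on the project: what each service account may do.
        "project_bindings": [
            {"role": ADMIN_ROLE, "member": f"serviceAccount:{admin}"},
            {"role": READONLY_ROLE, "member": f"serviceAccount:{readonly}"},
        ],
        # Bindings set on each service account: who may impersonate it.
        "service_account_bindings": {
            sa: [
                {"role": IMPERSONATE_ROLE, "member": member}
                for member in sorted(impersonators)
            ]
            for sa in (admin, readonly)
        },
    }


bindings = landing_zone_iam("payments-team", ["group:payments-devs@example.com"])
print(bindings["project_bindings"])
```

Note that tenants are never granted a role on the project directly in this sketch, which keeps the permission boundary in the hands of the operators who own the bindings.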


Automated Provisioning

Given a schema for defining Landing Zones, the next step is to automate their provisioning and ongoing management. In the previous section we defined a minimal Landing Zone in GCP as:

  • A GCP project
  • admin and readonly service accounts (SAs) to manage the project from CI/CD
  • Some IAM bindings allowing teams to use the SAs and manage the GCP project

We won’t go deep into a single implementation of this automation. Instead, we’ll cover the different Infrastructure-as-Code and orchestration technologies we’ve used to achieve it across multiple engagements. The most appropriate solution often depends on existing technologies and practices in place (e.g., IaC vs reconciliation loop from a k8s cluster, pipelines enacting change vs. gitops tooling pulling changes):

  1. Terraform and GitHub Actions: We defined a Terraform module to represent a single Landing Zone and then created a Terraform project which loads Landing Zone definitions from YAML files and creates an instance of the Landing Zone Terraform module for each LZ defined in the YAML. GitHub Actions was used to coordinate running test and deployment pipelines (e.g., terraform plan … for a pull-request, terraform apply … after a merge)
  2. Pulumi and Tekton: Similar to the previous point, we defined a Pulumi Component Resource representing a single Landing Zone and then a Pulumi project which read Landing Zone definitions from YAML files. Tekton was used to orchestrate test and deployment pipelines for a pull-request driven workflow where new LZs are provisioned after code merges
  3. Crossplane and ArgoCD: Crossplane is a Kubernetes (k8s) native Infra-as-Code tool which allows you to manage infrastructure via objects in your k8s clusters, and to compose multiple resources together as a Composite Resource. We defined a LandingZone custom composite resource which managed all of the GCP infrastructure for each instance created. We then used ArgoCD to deploy all of the LandingZone objects contained in a GitHub repository into a management cluster running Crossplane. A pull-request driven workflow was used to approve changes to the defined LandingZone objects, which are constantly synchronized into the cluster by Argo.
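
Whichever tooling is used, the common core is comparing the desired Landing Zone definitions against what currently exists. The sketch below shows that comparison in isolation, assuming the definitions have already been parsed into dictionaries keyed by Landing Zone name. The function and data are hypothetical; in practice this is the job that terraform plan, a Pulumi preview or a Crossplane reconciliation loop performs for you.

```python
from __future__ import annotations


def plan_changes(
    desired: dict[str, dict], current: dict[str, dict]
) -> dict[str, list[str]]:
    """Work out which Landing Zones to create, update or delete."""
    desired_names = set(desired)
    current_names = set(current)
    return {
        "create": sorted(desired_names - current_names),
        "update": sorted(
            name
            for name in desired_names & current_names
            if desired[name] != current[name]
        ),
        "delete": sorted(current_names - desired_names),
    }


# Desired state, as it might be loaded from definition files after a merge.
desired = {
    "payments-team": {"contact": "group:payments-leads@example.com"},
    "search-team": {"contact": "group:search-leads@example.com"},
}
# Current state, as previously provisioned.
current = {
    "payments-team": {"contact": "group:payments-old@example.com"},
    "legacy-team": {"contact": "group:legacy@example.com"},
}

changes = plan_changes(desired, current)
print(changes)
# {'create': ['search-team'], 'update': ['payments-team'], 'delete': ['legacy-team']}
```

Running this on a pull request gives reviewers a preview of the change, and running it after a merge drives the actual provisioning.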

Taking Landing Zones Further

By adopting the steps we’ve outlined so far, you’ll have an automated loop for managing new and existing Landing Zones for platform tenants, and you’ll be able to easily update the definition of those Landing Zones as platform operators. This allows you to add new capabilities as your needs evolve over time. The rest of this post details the directions we’ve taken this in across various engagements.


Extending the Landing Zone Model

The minimal useful model for a Landing Zone includes a unique name, a responsible contact for GCP notifications and a list of principals to manage infrastructure. This is a good start but we’ve seen clients extend the model in multiple ways:

  • Specify cost centers for cost attribution and FinOps
  • Finer grained responsible contacts such as “security”, “billing”, “technical”
  • Flags to indicate whether a Landing Zone contains PII data
  • Flags to indicate whether a Landing Zone is production or customer facing

Specifically in GCP, adding additional flags to Landing Zones can be used to add labels and tags to GCP projects which include or exclude them from inherited GCP IAM rules and Organizational policies.
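
As a sketch of how such flags might translate into project labels, the function below maps an extended definition onto a label dictionary. The field names (`cost_center`, `contains_pii`, `production`) and the label keys are hypothetical, and the regular expressions are a deliberately conservative subset of the GCP label rules (lowercase letters, digits, underscores and hyphens, at most 63 characters, keys starting with a letter).

```python
from __future__ import annotations

import re

# Conservative subset of the GCP label rules.
LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")
LABEL_VALUE = re.compile(r"[a-z0-9_-]{0,63}")


def project_labels(zone: dict) -> dict[str, str]:
    """Derive project labels from an extended Landing Zone definition."""
    labels = {
        "landing-zone": zone["name"],
        "cost-center": str(zone.get("cost_center", "unassigned")).lower(),
        "contains-pii": "true" if zone.get("contains_pii", False) else "false",
        "production": "true" if zone.get("production", False) else "false",
    }
    for key, value in labels.items():
        if not LABEL_KEY.fullmatch(key) or not LABEL_VALUE.fullmatch(value):
            raise ValueError(f"invalid label {key}={value!r}")
    return labels


zone = {"name": "payments-team", "cost_center": "CC-1042", "contains_pii": True}
labels = project_labels(zone)
print(labels)
```

Defaulting the flags to "false" is a design choice for this example only; for something like PII you may prefer to make the flag mandatory so that nobody opts out by omission.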


GCP Folders as Landing Zones

Sometimes your platform tenants may require multiple projects that are logically related, or the ability to organize GCP projects into their own hierarchy of folders. In this case it can be beneficial to implement a Landing Zone as a GCP Folder within which tenants can manage their own projects. This has been useful at engagements where a Landing Zone is owned by teams operating an Internal Developer Platform and they wish to manage different environments (e.g., dev, stage and prod) as separate GCP folders under the same “IDP” Landing Zone.

In this case we usually recommend:

  1. For each Landing Zone, create a GCP folder into which the tenant will place and manage all of their projects and infrastructure
  2. For each Landing Zone, create a GCP project to be used as a management project for the Landing Zone. This project will be owned by Platform Operators and used to provision Service Accounts which are given permission to operate within the Landing Zone’s GCP folder.

The reason we maintain a separate management project in this scenario is so that there is always some GCP project over which the platform operators have control which the platform tenants cannot accidentally break and lock themselves out of.


Automated OIDC Bindings

One of the most used additions to automated Landing Zones is setting up OIDC (or other federated identity) bindings which allow integration with on-prem identity management solutions and CI/CD tooling.

Our most commonly leveraged one is setting up GCP Workload Identity configuration for each Landing Zone to allow GitHub Actions to impersonate the admin and readonly GCP service accounts to manage infrastructure, all without having to rely on any stored credentials. The bindings can be configured to be very broad (e.g., any pipelines in this GitHub organization can use this GCP SA) down to extremely finely grained (e.g., only one specific commit hash of one specific workflow in a specific repository running from a specific branch can use this GCP SA), allowing you to satisfy most security use-cases.
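
The broad-to-narrow range of bindings can be illustrated with two small helpers. The first builds an attribute condition over claims from GitHub's OIDC token (`repository_owner`, `repository`, `ref`); the second builds the `principalSet://` member that would be granted `roles/iam.workloadIdentityUser` on a service account. The project number, pool ID, organization and repository names are hypothetical, and the second helper assumes the provider maps `attribute.repository` from `assertion.repository`.

```python
from __future__ import annotations


def attribute_condition(
    owner: str, repo: str | None = None, ref: str | None = None
) -> str:
    """Build a condition that gets narrower as more arguments are given."""
    clauses = [f"assertion.repository_owner == '{owner}'"]
    if repo is not None:
        clauses.append(f"assertion.repository == '{owner}/{repo}'")
    if ref is not None:
        clauses.append(f"assertion.ref == '{ref}'")
    return " && ".join(clauses)


def repository_principal_set(
    project_number: str, pool_id: str, owner: str, repo: str
) -> str:
    """IAM member matching every identity from one GitHub repository."""
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/attribute.repository/{owner}/{repo}"
    )


# Broad: any pipeline in the GitHub organization.
print(attribute_condition("example-org"))
# Narrow: one repository, running from one branch.
print(attribute_condition("example-org", "infra", "refs/heads/main"))

binding = {
    "role": "roles/iam.workloadIdentityUser",
    "member": repository_principal_set(
        "123456789012", "github-pool", "example-org", "infra"
    ),
}
print(binding)
```

Generating these strings from the Landing Zone definition means the level of strictness becomes a reviewed, per-tenant setting rather than something configured by hand.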


Bootstrapping Infrastructure as Code Tool Usage

After a tenant has been provisioned with a Landing Zone, we’ve frequently had requests that tenants be able to use the same infrastructure-as-code tools that the platform operators use, for consistency and to share existing knowledge. For tools such as Terraform and Pulumi this can require bootstrapping one or more initial Google Cloud Storage buckets that can be used to store state files.

The Landing Zone implementation can be extended to bootstrap all the things needed by a tenant to begin using Terraform or Pulumi straight away:

  • A GCS bucket used to store state
  • A GCP service account with the required permissions to preview and apply IaC changes

This capability pairs nicely with the GCP Folders as Landing Zones capability where the IaC tooling bucket and SAs can be created as part of the management project.
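
Once the bucket exists, the tenant still needs to point their tooling at it. The sketch below derives that configuration from the project ID: a Terraform `gcs` backend block (in Terraform's JSON syntax, using the backend's `bucket` and `prefix` settings) and the `gs://` URL that Pulumi accepts as a backend. The `-iac-state` bucket naming convention and the default prefix are assumptions for the example.

```python
from __future__ import annotations

import json


def state_bucket_name(project_id: str) -> str:
    """Hypothetical naming convention for the per-Landing-Zone state bucket."""
    name = f"{project_id}-iac-state"
    # GCS bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name has invalid length: {name!r}")
    return name


def terraform_backend(project_id: str, prefix: str = "terraform/state") -> str:
    """Render a Terraform JSON configuration selecting the gcs backend."""
    config = {
        "terraform": {
            "backend": {
                "gcs": {"bucket": state_bucket_name(project_id), "prefix": prefix}
            }
        }
    }
    return json.dumps(config, indent=2)


def pulumi_backend_url(project_id: str) -> str:
    """URL a tenant could pass to `pulumi login`."""
    return f"gs://{state_bucket_name(project_id)}"


print(terraform_backend("payments-team"))
print(pulumi_backend_url("payments-team"))
```

Handing tenants a generated file like this, rather than instructions, is one way to make the "straight away" part of the experience real.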


Automated Access to Shared Resources

It’s very common to run organization wide services such as Container Registries and GitOps repositories. Once Landing Zones are automated, extending them to provision the configuration that grants the required access to these shared resources becomes trivial. For example:

  • Provision GCP IAM policies allowing a Landing Zone specific SA to read any image from a shared container registry and to write any image under a team specific /path in the registry
  • Provision GitHub configuration (e.g., CODEOWNERS) allowing a team to push YAML k8s manifests to a specific directory in some shared GitOps repository.
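
For the GitOps case, a CODEOWNERS file can be generated directly from the Landing Zone definitions. The sketch below assumes each Landing Zone gets a directory under a hypothetical `tenants/` folder and is owned by a GitHub team given in `@org/team` form. Bear in mind that CODEOWNERS only assigns required reviewers for a path; enforcing it depends on branch protection being configured on the repository.

```python
from __future__ import annotations


def codeowners(teams: dict[str, str], base_dir: str = "tenants") -> str:
    """Render one CODEOWNERS entry per Landing Zone directory."""
    lines = ["# Generated from Landing Zone definitions; do not edit by hand."]
    for name, owner in sorted(teams.items()):
        # Pattern first, then the owner(s) of everything under that directory.
        lines.append(f"/{base_dir}/{name}/ {owner}")
    return "\n".join(lines) + "\n"


content = codeowners(
    {
        "search-team": "@example-org/search",
        "payments-team": "@example-org/payments",
    }
)
print(content)
```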

Service Perimeters

GCP allows you to define Service Perimeters around one or more projects which govern all requests made to the GCP APIs affecting resources within the perimeter. It’s beyond the scope of this article to go in-depth on the usage of Service Perimeters but they are a powerful tool that can support security and regulatory requirements, and lock down the capability for one Landing Zone to affect resources in another.


Hierarchical IAM and Organizational Policies

Some GCP IAM roles and permissions can be granted at the Organization and Folder level, and are inherited by any projects contained within them. Organizational policies are inherited in the same way, which gives you the power to establish a baseline that applies to all projects unless an organizational policy is specifically overridden at a finer grained level (e.g., override Org policy at the Folder level, override Folder policy at the Project level).

The GCP docs give a good overview of the policies offered by these services. Some of the most common we see used include:

  • Completely disable the generation of static JSON credentials for GCP Service Accounts
  • Completely disable cross GCP organization IAM permission grants
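
The override behaviour can be modeled, in simplified form, as "the nearest setting wins". The sketch below resolves a boolean constraint by walking from a project up to the organization. It uses the real constraint name for disabling service account key creation, but the resource names and the policy data are hypothetical, and list-style constraints (which can merge with their parents) are out of scope.

```python
from __future__ import annotations

DISABLE_SA_KEYS = "constraints/iam.disableServiceAccountKeyCreation"


def effective_boolean_policy(
    ancestry: list[str],
    policies: dict[str, dict[str, bool]],
    constraint: str,
    default: bool = False,
) -> bool:
    """Resolve a boolean constraint for a resource.

    `ancestry` lists resource names from most specific (the project) to
    least specific (the organization). The first resource with an explicit
    setting decides the outcome.
    """
    for resource in ancestry:
        enforced = policies.get(resource, {}).get(constraint)
        if enforced is not None:
            return enforced
    return default


policies = {
    # Baseline: nobody may create static service account keys.
    "organizations/example": {DISABLE_SA_KEYS: True},
    # Explicit, reviewed override for a single project.
    "projects/legacy-batch": {DISABLE_SA_KEYS: False},
}

normal = ["projects/payments-team", "folders/teams", "organizations/example"]
legacy = ["projects/legacy-batch", "folders/teams", "organizations/example"]

print(effective_boolean_policy(normal, policies, DISABLE_SA_KEYS))
print(effective_boolean_policy(legacy, policies, DISABLE_SA_KEYS))
```

Keeping overrides like the one for `legacy-batch` in the Landing Zone definition makes every exception to the baseline visible and subject to pull-request review.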

Billing Account Management

The GCP billing model is that you can define one or more Billing Accounts in an Organization, and then projects are individually associated with one of these billing accounts. It’s quite common for clients to have a single Billing Account defined for their organization, so this functionality can be as simple as “explicitly associate all projects with the billing account”.

In larger organizations we’ve seen multiple Billing Accounts configured. We solved this by extending the Landing Zone definition to include which billing account the projects under the Landing Zone should be associated with.


That’s all, for now!

Hopefully we’ve convinced you that the benefits make it worth automating Landing Zones as a capability, and that our experience has given you food for thought about how you can leverage and extend them to fit your organizational needs.