Kubeadmiral, Karmada, and Multi-cluster Federation Standards
A client recently asked us for a comparison of two multi-cluster Kubernetes management technologies, specifically Kubeadmiral and Karmada. In this post we’ll introduce these technologies, and the Kubernetes standards that influenced their architecture, to explain how they enable multi-cluster workload orchestration. The focus will be entirely on managing the manifests inside existing Kubernetes clusters, specifically excluding standards such as the Cluster API, which are concerned with provisioning new clusters from scratch.
Introduction
When we talk about multi-cluster orchestration (also known as federation), we are referring to having multiple Kubernetes clusters, each of which is managed by some centralized authoritative source of truth for what manifests should be deployed. For clarity, we will refer to them as hub and spoke clusters:
- Hub clusters are the authoritative single source of truth that defines what manifests should be deployed to all managed clusters
- Spoke clusters are associated with a hub cluster; the local state on a spoke cluster is constantly reconciled towards the desired state defined by the hub cluster
Two approaches exist for synchronizing this configuration between the hub and spoke clusters, the push and pull models:
- In the push model, the hub cluster proactively manages manifests deployed on the spoke clusters (e.g., by directly interacting with its Kubernetes API), ensuring that they stay reconciled against the defined configurations.
- In contrast, the pull model moves the reconciliation into spoke clusters themselves. Each spoke cluster monitors the hub cluster, autonomously pulling updates and reconciling local state as needed.
In either case, this hub-and-spoke model simplifies how platform engineering and application teams can deploy workloads consistently across an estate of managed clusters. By aligning all spoke clusters to a single source of truth, the system can automatically detect configuration drift and trigger reconciliation.
Crucially, this approach also leverages a Kubernetes-native approach to define, manage, and automate the federated rollout process, aligning with the broader cloud-native ecosystem for a familiar user experience.
Evaluation
We’ll evaluate the standards and solutions in terms of:
Capability | Reasoning |
---|---|
Dynamic Placement | Can manifests be dynamically placed on a subset of spoke clusters? Can replicas be dynamically spread? |
Dynamic Configuration | Can Kubernetes resources of all types be synchronized to spoke clusters? Can Kubernetes resources have per-cluster overrides applied to them when synchronized? |
Operational Complexity | How difficult/complex are the tools to deploy? How difficult/complex are the tools to run? How are the solutions architected and designed? |
Operational Mode Support | Is push-based management supported? Is pull-based management supported? |
Before we get into evaluating Kubeadmiral, we’ll take a bit of a detour to look at the first set of Kubernetes working group APIs targeting the use case of multi-cluster federation: Kubefed.
Standards: Kubefed
Across its v1 and v2 incarnations, Kubefed was a Kubernetes SIG Multicluster standard that introduced the terminology of a “federation control plane” (i.e., the hub cluster) to manage resources across multiple federated Kubernetes clusters (i.e., the spoke clusters). The hub is made aware of other clusters using Kubernetes objects: in v1 this was the Cluster object, while in v2 the FederatedCluster object served the same purpose. The federation control plane acts as a central manager, ensuring resources are synced and distributed across these clusters following the push model.
Kubefed v1 provided a predefined set of objects such as FederatedNamespace, FederatedDeployment, FederatedReplicaSet, and FederatedIngress, which were automatically synchronized across all spoke clusters. While functional, this version was rigid: it only supported a fixed set of resource types, and applied them identically across all known clusters in the federation.
```yaml
apiVersion: federation/v1
kind: Cluster
metadata:
  name: cluster-a
spec:
  # ... cluster endpoint and credentials ...
---
apiVersion: federation/v1
kind: Cluster
metadata:
  name: cluster-b
spec:
  # ... cluster endpoint and credentials ...
---
apiVersion: extensions/v1beta1
kind: FederatedDeployment
metadata:
  name: nginx-deployment
  namespace: nginx
spec:
  # ... deployment template ...
```
The above example shows YAML defining two spoke clusters (cluster-a and cluster-b), and a FederatedDeployment (to deploy nginx). Under Kubefed v1, the spec of the FederatedDeployment would be identically applied across all known spoke clusters (a and b).
Kubefed v2 iterated on three particular pain points identified in v1:
- Only being able to federate a restricted set of types (e.g., Deployment, Ingress, ConfigMap)
- The inability to selectively target specific clusters for resource deployment
- The inability to do cluster-specific overrides on the configuration of manifests that are federated
To solve the first issue, v2 introduced the FederatedTypeConfig object, which allowed Kubernetes administrators to specify any resource type they wanted to federate. For example, by creating a FederatedTypeConfig for a resource like apps/v1/Deployment, a corresponding FederatedDeployment type would be automatically generated, enabling it to be synchronized across clusters.
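To illustrate, the sketch below shows roughly what such a FederatedTypeConfig looked like in the Kubefed v2 reference implementation; it mirrors the kind of object kubefedctl would generate when enabling a type, and the exact group, version, and namespace values should be treated as illustrative rather than authoritative:

```yaml
# Illustrative FederatedTypeConfig enabling federation of apps/v1 Deployments.
# Kubefed's controllers would generate a corresponding FederatedDeployment CRD from this.
apiVersion: core.kubefed.io/v1beta1
kind: FederatedTypeConfig
metadata:
  name: deployments.apps
  namespace: kube-federation-system
spec:
  propagation: Enabled
  targetType:                  # The existing Kubernetes type to federate
    group: apps
    version: v1
    kind: Deployment
    pluralName: deployments
    scope: Namespaced
  federatedType:               # The generated Federated* wrapper type
    group: types.kubefed.io
    version: v1beta1
    kind: FederatedDeployment
    pluralName: federateddeployments
    scope: Namespaced
```

Once generated, the FederatedDeployment wrapper type could carry the template, placement, and overrides described next.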
The second and third issues were solved by adding placement and overrides support to all generated Federated* object types, which allow configuration of which clusters resources are federated to, and any cluster-specific overrides to be applied before synchronization.
```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: nginx-deployment
  namespace: nginx
spec:
  template:
    # ... deployment template ...
  placement:
    clusters:
      - name: cluster1
      - name: cluster2
  overrides:
    - clusterName: cluster1
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5
```
This example shows the placement and overrides properties being used to explicitly state which clusters the deployment should be federated across (cluster1 and cluster2), and that the spec.replicas field should be set to 5 on the resource when it is synchronized to cluster1.
Despite the advancements in v2, Kubefed failed to gain significant community traction, being seen as complex, lacking popular community-supported solutions, and competing against emerging alternatives such as Google Anthos. These factors culminated in the archival and abandonment of the Kubefed standard and reference implementations.
Technology: Kubeadmiral
The first popular federation solution we’ll look at is Kubeadmiral, a system inspired by Kubefed v2, but not a direct implementation of it. It follows the push model of a hub cluster managing the resources across multiple spoke Kubernetes clusters.
At a high level, Kubeadmiral allows you to define PropagationPolicy objects which target arbitrary Kubernetes objects and mark them for propagation to federated clusters, which are defined via FederatedCluster objects. PropagationPolicy objects allow you to target:
- All Kubernetes objects of a given group, version, and kind (e.g., apps/v1/Deployment)
- A subset of objects of a given group, version, and kind, using typical matchLabels and matchExpressions
- A specifically named object of a given group, version, and kind, explicitly by name

NOTE: PropagationPolicy objects are namespace-scoped, so the objects they target must be in the same namespace to be synchronized into federated clusters. A ClusterPropagationPolicy type exists that allows you to manage cluster-wide syncing of Kubernetes objects.
Kubeadmiral supports the placement and overrides functionality of Kubefed v2, where you can explicitly list a subset of spoke clusters to synchronize resources across, but it also adds more sophisticated capabilities, notably:
- clusterAffinity, clusterSelector & tolerations – Dynamically deploying federated resources to clusters based on cluster affinity, match labels, and match expressions
- autoMigration – Dynamically migrating resources away from clusters that have no capacity or have failed
- followerScheduling – Specify whether dependent resources should always follow their leader (e.g., a Deployment depends on a ConfigMap that must be present in the same cluster) to the target cluster(s)
- maxClusters – Specify an upper limit on how many clusters resources can be propagated across
- replicaStrategy – Configure whether Kubeadmiral uses a spread or bin-packing strategy when scheduling replicas across multiple clusters
- schedulingMode – “Duplicate” mode deploys a specified number of replicas across all selected clusters while “Divide” will use a configured strategy to place an overall desired number of replicas across available clusters
- reschedulePolicy – Specify when targeted resources should be rescheduled across target clusters, such as when clusters are added or removed, or manifests and policies are updated
The example YAML below demonstrates some of these features being exercised:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  namespace: default
  labels:
    app: echo-server
spec:
  replicas: 6
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
        - name: echo-server
          image: ealen/echo-server:latest
          ports:
            - containerPort: 8080
---
apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: policy-echo-server
  namespace: default
spec:
  # Divide replicas dynamically across clusters
  schedulingMode: Divide
  clusterSelector: {}           # Apply to all clusters
  reschedulePolicy:
    replicaRescheduling:        # Enables replica rescheduling
      avoidDisruption: true     # Avoids moving replicas to prevent service disruption
      intervalSeconds: 300      # Reschedules every 5 minutes if needed
```
In this example we:
- Define a Deployment whose replicas should be spread across spoke clusters
- Define a PropagationPolicy that targets the Deployment for federation:
  - Selecting all known spoke clusters for federation
  - Dividing the replicas (6) across all spoke clusters (e.g., 1 * 6, 2 * 3, 3 * 2)
  - Allowing rescheduling of replicas across known clusters on a regular basis, while avoiding disruptions
Issues and Conclusion
While the functionality offered by Kubeadmiral is a sensible evolution of the capabilities offered by Kubefed v2, which served as its inspiration, the technology itself has a few issues which make adoption hard to recommend.
Notably, the documentation isn’t particularly comprehensive; you will often have to dive into the code to find the available configuration properties and understand their impact. Compared to the alternative products (which we’ll get to later), Kubeadmiral is the most lacking in terms of docs. On top of this, Kubeadmiral’s contributions seem to have slowed down, and no official support beyond Kubernetes 1.24 is mentioned anywhere.
From a technical/deployment perspective, Kubeadmiral makes some interesting choices. When you deploy Kubeadmiral to a Kubernetes cluster (which Kubeadmiral refers to as your meta-cluster), it actually creates a brand new virtual Kubernetes cluster inside your meta-cluster. This virtual cluster is your Kubeadmiral cluster. This means you have your meta-cluster, which runs your **hub** cluster virtually inside it, targeting all of your desired spoke clusters. The below figure shows a meta-cluster, containing a Kubeadmiral cluster, managing two resources in two spoke clusters.
This virtual Kubeadmiral cluster deployment, at the time of writing, uses Kubernetes 1.20 to deploy a customized apiserver and kube-controller-manager; these old versions are missing a lot of core API changes that have landed since that release. The reason Kubeadmiral runs its own apiserver and controller manager is so it can prevent pods actually being scheduled inside your virtual Kubeadmiral cluster, and instead ensure they run only across the managed clusters.
The biggest issues with this approach are:
- The Kubernetes version of the virtual hub cluster determines the API capabilities your developers can use on the objects they ultimately want to synchronize across other clusters, and is likely to be a complete showstopper.
- You end up with a virtual cluster configured entirely by Kubeadmiral (which may not be in line at all with your security or internal standards), plus the added complexity of that cluster subverting standard Kubernetes pod execution outside recommended approaches.
The approach taken by Kubeadmiral, avoiding the generation of a paired Federated* type for every resource you want to sync, is aimed directly at addressing the perceived complexity of the original Kubefed standards.
However, Kubernetes already supports augmenting workloads with new functionality (e.g., scheduling across multiple clusters) by registering your own controllers to be called during the regular scheduling of pods. Kubeadmiral’s choice to instead subvert management by the standard Kubernetes control plane components makes it a hard choice to recommend, especially when modern alternatives already address all of these pain points.
Standards: Work API
The Kubefed v2 standard had two main issues that hampered adoption and usefulness:
- The only supported model was push, where the hub proactively manages Kubernetes manifests across all spoke clusters. Requiring the central cluster to have privileged access to every cluster it manages can be both a security risk and a scaling risk as the number of clusters under management grows.
- There was a proliferation of Kubernetes API types resulting from the dynamic generation of Federated* objects, which can be avoided using modern Kubernetes capabilities such as inserting your own controllers into the scheduling process as middleware.
The Work API simplifies “manifests to be synced” down to Work objects (which live on the hub cluster), defining manifests to be deployed to a given set of clusters, and AppliedWork objects (which can live in the hub or a spoke), which represent the reconciled status of manifests applied within a specific cluster.
```yaml
apiVersion: work/v1
kind: Work
metadata:
  name: example-work
  namespace: cluster1
spec:
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: nginx
          namespace: nginx
        spec:
          # ... deployment spec ...
---
apiVersion: work/v1
kind: AppliedWork
metadata:
  name: example-work
  namespace: fleet-system
spec:
  clusterName: cluster1
  workName: example-work
status:
  conditions:
    - type: Applied
      status: "True"
      lastTransitionTime: "2023-10-01T12:00:00Z"
    - type: Completed
      status: "True"
      lastTransitionTime: "2023-10-01T12:05:00Z"
```
You can think of these objects (Work and AppliedWork) as a “signal that some reconciliation needs to happen” and the “status of some ongoing reconciliation” respectively. This simplified kernel allows you to support both push and pull models:
- You can support a push model by having the hub cluster actively manage resources on spoke clusters, and manage the status of AppliedWork objects
- You can support a pull model by allowing spoke clusters to watch for Work objects on the hub cluster, and then reconcile their own local changes as required
You may have noticed that we’ve made no mention of placement; the Work API is deliberately focused on describing the result of a decision, that is, “this set of manifests should be deployed to interested clusters.” Work objects are namespace-scoped, with the expectation that spoke clusters “subscribe” to one or more namespaces they are interested in for Work objects.
Another important note is that the manifests property of Work objects is only validated when it is applied to spoke clusters; the Work API makes no assumptions about validation before a Work object is generated, and will capture failures to apply in AppliedWork and status fields.
This intentionally loose coupling allows you to define your own placement strategies and workflows; whatever is making the decisions simply needs to create Work objects to signal the outcome of those decisions, to be reconciled against the spoke clusters. This is shown at an extremely high level in the below figure:
Another deliberate omission here is how you define spoke clusters from the hub cluster. In a pull model, the hub cluster may act simply as a source of truth from which all the spoke clusters pull; they only need to be able to read via the standard Kubernetes API, without the hub knowing anything about the spokes. For a push model you would require the hub to be configured with knowledge of, and credentials for, the spoke clusters it proactively manages, but this is specifically out of scope of the Work API as defined.
Technology: Karmada
Karmada, currently an incubating CNCF project, is a Kubernetes-native, modern approach to cross-cluster orchestration and federation. It supports both push and pull models: in push mode, the control plane components in the hub cluster connect directly to the spoke clusters to push updates out, while in pull mode a karmada-agent runs in each spoke cluster, pulling changes observed in the hub and reconciling them locally.
Karmada runs its control plane components in the hub cluster, documented here. As described above, spoke clusters are managed either directly from the hub (push) or via the karmada-agent running in the spoke (pull). The following figure shows an example architecture of a hub cluster managing two spoke clusters, one via push, and one via pull.
Karmada implements the Work API but also builds on top of it to offer dynamic placement capabilities:
- The Work API defines manifests to be deployed to subscribed spoke clusters
  - The status field of this object takes the place of a dedicated AppliedWork object, more in line with Kubernetes API norms
- Around the Work API, Karmada offers:
  - Resource Templates: these are your Kubernetes manifests (e.g., Deployment, ConfigMap, or any valid type), which you would like to deploy to one or more spoke clusters
  - Propagation Policy: a PropagationPolicy object targets resources by group, version, kind, name, and possibly namespace, and selects which spoke clusters they should propagate to
  - Resource Binding: a ResourceBinding object represents the result of a resource template being selected for propagation to a spoke cluster, referencing an individual resource template and a known cluster
  - Override Policy: an OverridePolicy object applies cluster-specific settings to resources after they are bound by a ResourceBinding object, but before they are propagated into a Work object to be reconciled on the spoke cluster
Karmada offers the following capabilities for dynamic placement:
- Explicitly list the (sub)set of spoke clusters to deploy to
- Dynamically select the spoke clusters to deploy to using matchLabels or matchExpressions
- Strategies for replica placement and balancing (bin packing, spread, duplicate, divide)
- Trigger rebalancing when clusters come and go
- Dynamic rebalancing based on replica health
- Explicit dependencies between propagated resources
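To make the placement and override concepts concrete, here is a minimal sketch of a PropagationPolicy and OverridePolicy using Karmada’s policy.karmada.io/v1alpha1 API; the cluster names, labels, and the echo-server Deployment being targeted are illustrative assumptions rather than anything prescribed by Karmada:

```yaml
# Hypothetical policies propagating an echo-server Deployment.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: echo-server-propagation
  namespace: default
spec:
  resourceSelectors:                   # Which resource templates this policy applies to
    - apiVersion: apps/v1
      kind: Deployment
      name: echo-server
  placement:
    clusterAffinity:
      labelSelector:                   # Dynamically select clusters by label (assumed label)
        matchLabels:
          region: eu-west
    replicaScheduling:
      replicaSchedulingType: Divided   # Divide replicas across the selected clusters
      replicaDivisionPreference: Weighted
---
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: echo-server-overrides
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: echo-server
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-a                  # Illustrative cluster name
      overriders:
        plaintext:                     # JSON-patch style override applied before the Work is created
          - path: /spec/replicas
            operator: replace
            value: 5
```

Karmada’s controllers then resolve these policies into ResourceBinding and Work objects for the selected clusters, which are reconciled in push or pull mode as configured.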
In addition to these capabilities, Karmada offers multi-cluster Ingress and Service support, although these capabilities require additional configuration to enable the necessary network connectivity.
Summary
In the evaluation section, we outlined a set of capabilities used to evaluate the standards and technologies in this blog post; the table below summarizes those capabilities against each of them.
Capability / Tech | Kubefed v1 API | Kubefed v2 API | Work API | Kubeadmiral | Karmada |
---|---|---|---|---|---|
Can manifests be dynamically placed on a subset of spoke clusters? | ⛔️ | ✅ | N/A | ✅ | ✅ |
Can replicas be dynamically spread? | ⛔️ | ⛔️ | N/A | ✅ | ✅ |
Can Kubernetes resources of any type be synchronized to spoke clusters? | ⛔️ | ✅ | ✅ | ✅ | ✅ |
Can Kubernetes resources have per-cluster overrides applied to them when synchronized? | ⛔️ | ✅ | ✅ | ✅ | ✅ |
Complexity of operation? | N/A | N/A | N/A | 🔥🔥 | 🔥 |
Complexity of architecture? | N/A | N/A | N/A | 🔥🔥 | 🔥 |
Is push based management supported? | ✅ | ✅ | ✅ | ✅ | ✅ |
Is pull based management supported? | ⛔️ | ⛔️ | ✅ | ⛔️ | ✅ |
Distilling complexity of operation and architecture down to a pictogram isn’t particularly easy; the intention here is to indicate that Karmada has a cleaner, more Kubernetes-friendly approach to architecture and operation when compared to Kubeadmiral.
Conclusion
Karmada is a useful evolution in technology for federated resource management across multiple clusters. It can flexibly support a varied cluster estate operating in push and pull mode as required, and can offer additional niceties such as multi-cluster ingress and multi-cluster services if you wish to expose your workloads from a central cluster.
The functionality, community and sponsor support, CNCF adoption, and overall Kubernetes-friendly design (i.e., avoiding the non-standard deployment and configuration pitfalls that Kubeadmiral fell into) make Karmada an easy and solid choice for a cloud-vendor-agnostic, or on-premise, approach to multi-cluster Kubernetes resource management.
Honorable Mentions
While we were tasked with evaluating and comparing Kubeadmiral and Karmada, a number of other tools exist in similar and adjacent areas. We’ll very briefly touch on them below.
Anthos
Anthos is a managed service offered by GCP, targeted specifically at the application of resources across multiple Kubernetes clusters, benefitting from the enterprise-level security and policy support offered by GCP. It supports a wide range of target clusters, including on-prem and across other Cloud Service Providers (e.g., AWS and Azure). Anthos is a solid offering with great documentation that comes with the usual trade-offs of locking in to a vendor-specific solution.
When directly compared to Karmada, Anthos doesn’t provide intelligent placement of replicas across multiple clusters; similar to Kubefed v1, it simply ensures that a set of manifests is uniformly applied across multiple clusters. However, it is important to note that the Anthos suite can provide multi-cluster meshing, which allows you to pursue similar ends via different means.
Argo CD and Flux CD (GitOps)
Argo CD and Flux CD can both satisfy the use case of multi-cluster manifest management in a couple of ways:
- Locally reconciling manifests pulled from supported repositories (typically GitOps repos)
- Pulling manifests from supported repositories and remotely pushing them to other clusters
As with Anthos, these technologies are not aware of individual replicas and their placement across an estate of clusters, but they do support dynamic generation of per-cluster configuration using techniques such as Argo CD’s ApplicationSets.
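As an illustration of that per-cluster generation, here is a minimal sketch of an Argo CD ApplicationSet using the cluster generator; the repository URL, path, and namespaces are placeholder assumptions:

```yaml
# Hypothetical ApplicationSet that stamps out one Argo CD Application per registered cluster.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: nginx-fleet
  namespace: argocd
spec:
  generators:
    - clusters: {}               # One Application per cluster known to Argo CD
  template:
    metadata:
      name: '{{name}}-nginx'     # Cluster name substituted per generated Application
    spec:
      project: default
      source:
        repoURL: https://github.com/example/fleet-manifests.git  # Placeholder repo
        targetRevision: main
        path: apps/nginx
      destination:
        server: '{{server}}'     # Target cluster API server substituted per Application
        namespace: nginx
```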
Clusterpedia
Definitely on the edge of related technologies, Clusterpedia allows you to federate the status of existing objects across multiple clusters into a single place, allowing aggregated views of objects across your entire estate. It does not support synchronizing resources outwards to other clusters, but may form part of a strategy to monitor your estate.
Farewell
I hope you found this quick sojourn through multi-cluster federation standards and technologies useful and that it helped illuminate how these tools work and why they work the way they do!