Writing a custom Kubernetes Operator for the first time can be a bit of a challenge. It’s not obvious where to look or where to start!
There are various ways to build one, so sometimes all that choice gets in the way. Should you use client-go directly, Kubebuilder, or Operator SDK? And writing a production-grade operator for the first time will inevitably force you to deal with snags that can be real time sinks.
The following is a collection of tips that could save you some time and clarify some of the finer details of Kubernetes operators. Note that this article will talk a lot about “reconciliation”. In brief, Kubernetes tries to “reconcile” the current state to a desired state. For example, if you have a Deployment resource and you update it with kubectl to scale from 3 replicas to 5, a built-in controller will see that the Deployment should have 5 replicas but in reality there are only 3. The controller will reconcile this difference by creating 2 more pods to get to 5.
Note that this article assumes you have basic knowledge of Kubernetes, Kubernetes custom resources, the Kubernetes Operator Pattern, and working knowledge of built-in Kubernetes controllers. Parts of this blog are tailored to operators written in Go.
We hope it serves you well!
1. Use Operator SDK
Why? It allows you to focus mainly on the reconciliation of resources.
You do not need to worry about watches, informers, and the nitty-gritty details of client-go.
In practice, this means that if you are building a Go operator, Operator SDK will generate your controller Go file with a stubbed-out “Reconcile” method, and your job is to fill in the logic for that method. This is very convenient, since it keeps your focus on your domain logic.
Operator SDK creates most of the scaffolding, for example, Makefile, your custom CRD struct types if using the Go version, along with all the setup you need for the main.go file. Out of the box, you can deploy it to a local cluster on your machine and debug using your favourite IDE debugger.
Also, updating your Operator SDK operator is usually a matter of updating only your go.mod file. Another observation when comparing Kubebuilder and Operator SDK is that Operator SDK’s user experience is more polished. Kubebuilder’s home is an actual GitHub repository, which is fine, but Operator SDK has a fully fleshed-out production website with lots of information and tutorials, which is just a bit more professional looking.
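To make the division of labour concrete, here is a minimal sketch of the kind of Reconcile method that gets stubbed out for you. It assumes controller-runtime (which Operator SDK builds on); the `MyApp` type and the `example.com/myapp/api/v1` package are hypothetical stand-ins for your generated API.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	myv1 "example.com/myapp/api/v1" // hypothetical generated API package
)

// MyAppReconciler reconciles a hypothetical MyApp custom resource.
type MyAppReconciler struct {
	client.Client
}

func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var app myv1.MyApp
	if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
		// The resource may already be gone; nothing to do in that case.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Compare desired state (app.Spec) with what is actually running and
	// create/update/delete child resources to close the gap.
	return ctrl.Result{}, nil
}
```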
2. Read the Kubebuilder book
The Kubebuilder book is free and gives you a great rundown of most features you would use when building a basic operator. There are nice examples, and the book is quite short and readable within a few hours. It is a great overview of what you can do out of the box with Operator SDK, which uses Kubebuilder behind the scenes anyway.
3. Beware, all your applied custom resources could be processed on start-up by default, EVERY TIME
The interesting thing about operators here is that when your operator is running as a pod on your cluster and it goes down for some reason, on starting back up it will process every watched resource that exists on the cluster unless you do something extra to limit this.
Why does it do this? Because behind the scenes, the scaffolding you get from Operator SDK has code that sets up a Kubernetes watch, and that initial watch does not have a resource version. The watch request to the Kube API server therefore fetches the current state of your tracked resources and sends ‘synthetic’ changes to the “Reconcile” function in your generated operator Go file. These synthetic changes simply mean that your Reconcile function will be called X times, X being the number of instances of the resource you are watching. Keep in mind that since you are processing the entire custom resource dataset on your cluster on pod restarts, you should ensure, from a performance perspective, that your operator can cope with processing that data quickly. There are ways to change this behaviour, one being to limit your operator to watch only a specific namespace if feasible.
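If limiting the watch to one namespace fits your use case, it can be done where the manager is created. A sketch, assuming controller-runtime v0.15+ (this field has moved between releases; older versions exposed a `Namespace` option on the manager instead), with "team-a" as a hypothetical namespace:

```go
// Restrict the manager's cache to a single namespace so the start-up
// resync only lists resources from that namespace.
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			DefaultNamespaces: map[string]cache.Config{"team-a": {}}, // hypothetical namespace
		},
	})
}
```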
4. Infinite loops
The reconcile function always gets called when there is some change to a watched resource.
If you manage some custom resource and then update its status or some metadata on the resource, e.g., a custom label, that will trigger another reconcile call. Therefore, when reconciling, be cognisant of how you update your resources.
For example:
If you blindly update your resources all the time without checking if something changed on the resource, you might end up reconciling indefinitely.
This issue can also be missed because, even though you are reconciling infinitely, it might not show up as a failed test. One way to avoid this is to configure your controller to skip update events that do not change the custom resource’s metadata.generation field; see GenerationChangedPredicate.
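Wiring that predicate in is a one-liner in the controller builder. A sketch, assuming controller-runtime, with `MyApp` as a hypothetical custom resource type:

```go
// Filter out update events where metadata.generation did not change
// (i.e., status-only or metadata-only updates), avoiding reconcile loops
// triggered by the controller's own status writes.
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&myv1.MyApp{}). // hypothetical custom resource type
		WithEventFilter(predicate.GenerationChangedPredicate{}).
		Complete(r)
}
```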
5. Custom operators do not handle events, they only know that SOMETHING happened
When reconciling, we do not know if an update, create or delete happened.
We only know that the resource with a specific name and located possibly in some namespace has had a change.
This fits into the model of checking your current custom resource state with your desired state and determining if they match. If they do, you’re done! If not, then you need to bring the desired state to reality.
This results in calling the Kubernetes API server and asking it for current snapshots of the resources you care about (assuming they all live in Kubernetes), and then creating/updating/deleting some set of resources to satisfy the desired state.
A great side effect of this is that you do not have to worry about event ordering since you do not get events, you just know something happened, and it is up to you to view the changes in the cluster at that moment.
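This level-based model can be sketched in plain Go, with no Kubernetes client at all: given only the desired and actual sets of (hypothetical) pod names, compute the actions to take, regardless of which event woke us up.

```go
package main

import "fmt"

// reconcileActions compares desired pod names against those actually running
// and returns which to create and which to delete. It never needs to know
// what "event" triggered the reconcile, only the two states.
func reconcileActions(desired, actual []string) (create, remove []string) {
	have := make(map[string]bool)
	for _, name := range actual {
		have[name] = true
	}
	want := make(map[string]bool)
	for _, name := range desired {
		want[name] = true
		if !have[name] {
			create = append(create, name)
		}
	}
	for _, name := range actual {
		if !want[name] {
			remove = append(remove, name)
		}
	}
	return create, remove
}

func main() {
	create, remove := reconcileActions(
		[]string{"web-0", "web-1", "web-2"}, // desired
		[]string{"web-0", "web-3"},          // actually running
	)
	fmt.Println(create, remove) // [web-1 web-2] [web-3]
}
```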
6. Resource Version Conflicts
The way Kubernetes resource updates work is like optimistic locking.
If you examine an applied Kubernetes resource, for example, a Deployment resource, you will see that it has a metadata field called ‘resourceVersion’ which contains some number.
When you update your custom resource during reconciliation, Kubernetes will check this number. If no changes occurred since your last read, it will allow the update and bump the ‘resourceVersion’ number; otherwise, it will return a conflict. The thing to note here is that if you plan to do successive updates to your resource, ensure that you always work with the latest version.
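The mechanism can be modelled in a few lines of plain Go: a toy store rejects any update carrying a stale version, and the writer re-reads and retries, mirroring the RetryOnConflict pattern from client-go.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrConflict = errors.New("resourceVersion conflict")

// store is a toy model of the API server's optimistic locking: every update
// must carry the version it read, and version bumps on each successful write.
type store struct {
	value   string
	version int // stands in for metadata.resourceVersion
}

func (s *store) get() (string, int) { return s.value, s.version }

func (s *store) update(newValue string, readVersion int) error {
	if readVersion != s.version {
		return ErrConflict // someone else updated since our read
	}
	s.value = newValue
	s.version++
	return nil
}

// updateWithRetry mirrors client-go's RetryOnConflict pattern: re-read the
// latest object and re-apply the change until it lands.
func updateWithRetry(s *store, newValue string) {
	for {
		_, v := s.get()
		if err := s.update(newValue, v); err == nil {
			return
		}
	}
}

func main() {
	s := &store{value: "replicas=3", version: 1}
	_, stale := s.get()
	s.update("replicas=4", stale)              // succeeds, bumps version
	fmt.Println(s.update("replicas=5", stale)) // stale read: conflict
	updateWithRetry(s, "replicas=5")
	fmt.Println(s.value, s.version) // replicas=5 3
}
```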
7. Webhooks are annoying when running locally for debugging
If you decide to add a mutating or validating webhook to your operator, be aware that running locally with it enabled can be a pain out of the box. The reason is that if your operator runs locally in your dev environment against a local Kubernetes cluster, the Kubernetes API server on the cluster will have a hard time calling out to your locally running program due to networking/certificate issues. One quick workaround is to add program arguments that skip installing the webhook resources on your local cluster and disable the webhook registration in your code.
It is then quite easy to skip installing it locally and rely on functional and integration tests to verify webhook correctness on a real cluster.
8. Don’t panic! Or should you?
Panic handling in your Go operator is an interesting thing.
If you do not handle the panic, the pod running the operator will restart. If you handle the panic by recovering, maybe you are in a corrupted state, so is it wise to continue?
One thing to note, though, is that if your panic is linked to a particular resource being reconciled, it can potentially cause a noisy-neighbour effect. A panic in some reconcile call will cause the pod to restart, and on restart the operator will process every resource you are watching, so it will try the offending resource again and potentially panic again. This can make for a bumpy ride for the resources that do not panic: the panicking one might block or slow down the reconciliation of the others. Make the choice that is right for your project.
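If you do choose to recover, one option is to confine the blast radius to the single reconcile call. A plain-Go sketch of that wrapper (controller-runtime also has a `RecoverPanic` option on its controller settings that does something similar):

```go
package main

import "fmt"

// result is a stand-in for controller-runtime's ctrl.Result.
type result struct{ requeue bool }

// safeReconcile wraps a reconcile-like function so a panic for one resource
// becomes a returned error instead of a pod restart.
func safeReconcile(name string, reconcile func(string) (result, error)) (res result, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic reconciling %q: %v", name, r)
		}
	}()
	return reconcile(name)
}

func main() {
	_, err := safeReconcile("bad-resource", func(string) (result, error) {
		panic("corrupt spec") // hypothetical failure for one resource
	})
	fmt.Println(err) // the panic surfaces as an ordinary error
}
```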
9. Use vclusters to test your operator in isolation when needing full cluster functionality and isolation
Some operators need to work on cluster-wide resources, or on every instance of a specific resource type even if it is namespace-scoped. This can be a pain if you need to test these operators on clusters that are shared by multiple tenants, and you could have a hard time writing deterministic tests.
One way to mitigate this is to use vclusters. So what do vclusters do for us? In just a few steps you can set up a Kubernetes cluster within a cluster. It sounds more complicated than it is.
By having this cluster in a cluster, you do not have to worry about anyone interfering with your operator; you can write deterministic tests and test in isolation. Why not use envtest? You can use envtest for a lot of isolation testing, but its big limitation is that it is not an actual Kubernetes cluster, so built-in controllers are not available. That being said, decide what works best for your project.
10. Keep in mind how multiple workers would pan out
When you write your custom operator, it is possible to use multiple workers to process reconciles in parallel. In many cases you do not need this, but it is worth a quick thought about how it would be handled if needed. Keep in mind that out of the box, your only real form of handling reconciles concurrently is with more goroutines; it is not trivial to have multiple pods of your operator running and processing data in parallel.
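Opting into parallel reconciles within one pod is a one-line change in the controller builder. A sketch, assuming controller-runtime, with `MyApp` as a hypothetical custom resource type:

```go
// Run up to 4 reconcile workers (goroutines) for this controller.
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&myv1.MyApp{}). // hypothetical custom resource type
		WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
		Complete(r)
}
```

Even with multiple workers, the underlying workqueue does not hand the same object to two workers at once, so the parallelism is across distinct resources rather than within one.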
11. Manage metadata state carefully
There might be instances where you need to keep track of some state via metadata labels on the resources you are reconciling. This is easier to reason about if the metadata spans only the one resource you are updating, because that custom state can then be managed with a single Kubernetes API server call, i.e., one transaction. If your custom state spans multiple resources, you will need multiple Kubernetes API server calls to establish it, and you have to think about what happens if some of those metadata updates fail. Is it easy to recover from, and easy to reason about? If a single resource holds your state, the updates are atomic, so the picture is much clearer.
Head over to Operator SDK and try out the quick tutorial located here. That tutorial plus the above points should give you a good footing going forward. If you are facing challenges in writing custom Kubernetes operators, we would love to hear from you!
Happy coding!