Kubernetes Operators
Kubernetes Operators are a pattern that helps us extend the behavior of the cluster. Operators let us treat an application deployed on Kubernetes as a single item. That is, your application can be composed of a Pod, a Service, a ConfigMap, a Deployment, etc., but you get to manage it as one item and have much better control of its lifecycle. The lifecycle includes, but is not limited to, installation and configuration, as well as failover and recovery, relying on the APIs and patterns provided by Kubernetes.
If you’re not new to Kubernetes, you have probably used a couple of operators, like Grafana and Prometheus, and a couple I blogged about in the past, like Strimzi for the deployment and management of Kafka and Cass Operator for doing the same with Cassandra or DSE.
Now that we know what operators are, how do we build one?
There are a couple of ways to build an operator. However, we will focus on building a simple one with Operator SDK. It is part of the Operator Framework, a set of developer tools and Kubernetes components that aid in Operator development and central management on a multi-tenant cluster. There are three options for Operator development with Operator SDK: Go, Ansible, or Helm. This post focuses on doing this in Go. To get started, we first need to install the Operator SDK.
Kubebuilder is a framework for building Kubernetes Operators. Operator SDK uses Kubebuilder under the hood to do so for Go projects.
Putting it all together
Putting it all together, we can build a Mock Operator. It won’t do much, but we will get to use the Operator SDK to build a custom Operator that basically launches a Kubernetes Deployment and helps us manage its lifecycle. This is available on GitHub at https://github.com/malike/mock-operator.
Before we start building we can define our Mock Operator specification.
i. Operator Specification
apiVersion: app.malike.kendeh.com/v1alpha1
kind: SampleKind
metadata:
  name: mock-sample
spec:
  image:
    repository: ghcr.io/malike/mock-operator/sample-mock-service
    tag: latest
    pullPolicy: Always
    pullSecretName:
      - name: regcred
  nodes: 2
  containerPort: 80
  servicePort: 80
Things to note:
i. The apiVersion of the Operator is app.malike.kendeh.com/v1alpha1.
ii. The Operator has one kind, which is SampleKind.
iii. The Operator basically deploys a custom image, which will be passed as image.repository.
iv. Other parameters that describe the image are in the image: {} block.
v. We use nodes to specify the number of instances we want to deploy.
vi. The ports for the service and the pod are configured as servicePort and containerPort respectively.
After defining the specification, we can proceed to the next step of actually building the operator.
ii. Start coding
Once you have Operator SDK installed, we can generate the project files for our Mock Operator. Running this command creates the initial package.
operator-sdk init --domain malike.kendeh.com --repo github.com/malike/mock-operator
After generating the project, we can proceed and generate the API and its controller. We want our API to be called SampleKind, with app as the group.
operator-sdk create api --group app --version v1alpha1 --kind SampleKind --resource --controller
iv. Building Operator
Before we start coding let us look at the structure of the source generated by operator-sdk using the two commands.
This link on Operator SDK helps us understand the project layout structure much better.
Our changes will mostly be focused on the api and the controllers folders.
Based on our specification, we can update the API spec of the operator to match what we defined. Our API has four main parameters, that is image: {}, nodes: 2, containerPort: 80, and servicePort: 80. We can update the specification to include these parameters:
// SampleKindSpec defines the desired state of SampleKind
type SampleKindSpec struct {
	//+kubebuilder:validation:Type:=object
	// Image defines image configuration
	Image ImageSpec `json:"image,omitempty"`

	//+kubebuilder:validation:Type:=number
	//+kubebuilder:default:=2
	// Nodes defines number of instance
	Nodes int32 `json:"nodes,omitempty"`

	//+kubebuilder:validation:Type:=number
	//+kubebuilder:default:=80
	// ContainerPort defines port for container
	ContainerPort int32 `json:"containerPort,omitempty"`

	//+kubebuilder:validation:Type:=number
	//+kubebuilder:default:=80
	// ServicePort defines port for service
	ServicePort int32 `json:"servicePort,omitempty"`
}
Using kubebuilder CRD validation markers, we can enforce rules for these parameters. For example, nodes should always be an int32. Once defined, we can run the command make generate manifests and, using the helper tooling set up by Operator SDK, generate the CRD and the code containing the DeepCopy, DeepCopyInto, and DeepCopyObject method implementations that the Go types need in order to be used as Kubernetes API objects.
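The ImageSpec type referenced by SampleKindSpec is not shown in this post. As a rough sketch, assuming its fields simply mirror the image: {} block of the specification, it could look something like the following. The field names and the SecretRef helper type are my assumptions for illustration, not necessarily what the project uses. Decoding a JSON form of the sample spec shows how the json tags line up with the keys in the CR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SecretRef is a hypothetical stand-in for a reference to an image pull secret.
type SecretRef struct {
	Name string `json:"name"`
}

// ImageSpec is an assumed shape for the image block of the specification.
type ImageSpec struct {
	Repository     string      `json:"repository,omitempty"`
	Tag            string      `json:"tag,omitempty"`
	PullPolicy     string      `json:"pullPolicy,omitempty"`
	PullSecretName []SecretRef `json:"pullSecretName,omitempty"`
}

// SampleKindSpec mirrors the struct above, minus the kubebuilder markers.
type SampleKindSpec struct {
	Image         ImageSpec `json:"image,omitempty"`
	Nodes         int32     `json:"nodes,omitempty"`
	ContainerPort int32     `json:"containerPort,omitempty"`
	ServicePort   int32     `json:"servicePort,omitempty"`
}

// sampleSpec is the spec section of the mock-sample CR, written as JSON.
const sampleSpec = `{
  "image": {
    "repository": "ghcr.io/malike/mock-operator/sample-mock-service",
    "tag": "latest",
    "pullPolicy": "Always",
    "pullSecretName": [{"name": "regcred"}]
  },
  "nodes": 2,
  "containerPort": 80,
  "servicePort": 80
}`

// decodeSpec parses the JSON form of a SampleKind spec.
func decodeSpec(raw string) (SampleKindSpec, error) {
	var spec SampleKindSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := decodeSpec(sampleSpec)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%s nodes=%d\n", spec.Image.Repository, spec.Image.Tag, spec.Nodes)
}
```

Note that plain JSON decoding does not apply the kubebuilder defaults. Those are filled in by the API server from the generated CRD schema.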
After defining our API, we can move to the controller package. There we should see the controller for the Kind generated by Operator SDK, called samplekind_controller. It has a Reconcile function, which is responsible for enforcing the desired state of the system based on the CR applied. So if we need certain changes made based on the CR applied, we can write code for that in this function.
The logic for the reconciliation function is pretty simple.
- Confirm if the resource needs to be created.
- If the resource is not found but should have existed, create it.
- If the resource exists, confirm that it is the same as specified in the CR.
- If the resource is not found and is not supposed to be created, do nothing.
Pretty simple logic. Remember this function will be called in cycles.
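To make those four steps concrete, here is a small sketch of the decision as a pure function, with no Kubernetes client involved. The decide helper and the action names are hypothetical, and treating a mismatch as "update" is my assumption about what the comparison step leads to:

```go
package main

import "fmt"

// action is what a single reconcile pass decides to do for one child resource.
type action string

const (
	actionCreate action = "create"
	actionUpdate action = "update"
	actionNone   action = "none"
)

// decide is a hypothetical helper that maps the observed state to an action.
//
//	wanted:  the CR says the resource should exist
//	exists:  the resource was found in the cluster
//	matches: the found resource already matches the CR
func decide(wanted, exists, matches bool) action {
	switch {
	case wanted && !exists:
		return actionCreate
	case wanted && exists && !matches:
		return actionUpdate // assumption: drift is corrected by updating
	default:
		// Either everything is in sync, or nothing is wanted.
		return actionNone
	}
}

func main() {
	fmt.Println(decide(true, false, false)) // create
	fmt.Println(decide(true, true, false))  // update
	fmt.Println(decide(true, true, true))   // none
	fmt.Println(decide(false, false, false)) // none
}
```

Because the function is called in cycles, returning "none" when things are already in sync is what keeps repeated calls harmless.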
For our MockOperator, we will need two k8s resources.
- A Deployment
- Service to expose our deployment
Putting this together we will have something like this in our reconcile function:
// Fetch the SampleKind instance
sampleApp := &appv1alpha1.SampleKind{}
err := r.Get(ctx, req.NamespacedName, sampleApp)
if err != nil {
	if errors.IsNotFound(err) {
		// Request object not found, could have been deleted after reconcile request.
		// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
		// Return and don't requeue
		log.Info("SampleKind resource not found. Ignoring since object must be deleted")
		return ctrl.Result{}, nil
	}
	// Error reading the object
	return ctrl.Result{}, err
}
log.V(1).Info("Detected existing SampleKind", "SampleKind.Name", sampleApp.Name)

// Check if the Deployment already exists, if not create a new one
deployment := &appsv1.Deployment{}
deploymentName := sampleApp.Name
err = r.Get(ctx, types.NamespacedName{Name: deploymentName, Namespace: sampleApp.Namespace}, deployment)
if err != nil && errors.IsNotFound(err) {
	// Define a new Deployment
	deployment = r.newSampleAppDeployment(deploymentName, sampleApp)
	log.Info("Creating a new SampleApp", "SampleKind.Namespace", sampleApp.Namespace, "SampleKind.Name", sampleApp.Name)
	err = r.Create(ctx, deployment)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Requeue so the next pass continues with the Service
	return ctrl.Result{Requeue: true}, nil
} else if err != nil {
	// Error reading the Deployment
	return ctrl.Result{}, err
}

// Check if the Service already exists, if not create a new one
service := &corev1.Service{}
serviceName := getServiceName(deploymentName)
err = r.Get(ctx, types.NamespacedName{Name: serviceName, Namespace: sampleApp.Namespace}, service)
if err != nil && errors.IsNotFound(err) {
	// Define a new Service
	service = r.newSampleAppService(deploymentName, sampleApp)
	log.Info("Creating a new Service for SampleApp", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
	err = r.Create(ctx, service)
	if err != nil {
		// Failed to create the Service
		return ctrl.Result{}, err
	}
	return ctrl.Result{Requeue: true}, nil
} else if err != nil {
	// Error reading the Service
	return ctrl.Result{}, err
} else {
	log.V(1).Info("Detected existing Service", "Service.Name", service.Name)
}
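The snippet above calls getServiceName, which is not shown in this post. A minimal sketch, assuming it just appends a -service suffix to the Deployment name. That assumption is based only on the lewis-sample-service name the CI pipeline port-forwards to for the lewis-sample resource, so treat it as a guess at the convention rather than the project's actual code:

```go
package main

import "fmt"

// serviceSuffix is an assumed naming convention, not confirmed project code.
const serviceSuffix = "-service"

// getServiceName derives the Service name from the Deployment name.
func getServiceName(deploymentName string) string {
	return deploymentName + serviceSuffix
}

func main() {
	fmt.Println(getServiceName("mock-sample")) // prints "mock-sample-service"
}
```

Deriving the name from the CR keeps the lookup deterministic, so every reconcile pass finds the same Service instead of creating a new one.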
One other important thing: we need to make sure the MockOperator has access to create, update, delete, and read these two k8s resources. So we add these three lines and, with the help of kubebuilder, the next time we run make generate manifests the right permissions will be given to the operator.
//+kubebuilder:rbac:groups="apps",resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;update;delete;patch
//+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
v. Sample Image for the Operator
Now this part is optional, but it is needed so we can test the Kubernetes deployment the operator manages. It is a simple Docker image, called sample-mock-service, that packages a custom HTML page in nginx. This can be found in the folder.
vi. Testing with Ginkgo and Gomega
Testing an Operator is a large topic, and it is not something I can fully expand on in this subsection. There are better resources like this and this.
Our sample test looks like this. It just uses BDD to confirm that when we create a SampleKind, a deployment also gets created.
var _ = Describe("Deployment test", func() {
	const (
		name      = "deployment-test"
		namespace = "default"
	)

	Context("When SampleKind is created, Deployment is created", func() {
		It("allows deployment to be created and deleted", func() {
			By("set up deployment", func() {
				skMockOperator := &samplekind.SampleKind{
					ObjectMeta: metav1.ObjectMeta{
						Name:      name,
						Namespace: namespace,
					},
					Spec: samplekind.SampleKindSpec{
						Nodes: 1,
					},
				}
				Expect(k8sClient.Create(ctx, skMockOperator)).Should(Succeed())

				// Poll until the Deployment named after the SampleKind can be fetched.
				// Plain Eventually is used here: the first argument of
				// EventuallyWithOffset is a stack offset, not a timeout.
				Eventually(func() bool {
					smDeployment := &v1.Deployment{}
					err := k8sClient.Get(ctx, types.NamespacedName{Name: skMockOperator.Name, Namespace: skMockOperator.Namespace}, smDeployment)
					return err == nil
				}).WithTimeout(20 * time.Second).Should(BeTrue())

				// Delete the SampleKind, which in turn deletes the Deployment
				Expect(k8sClient.Delete(ctx, skMockOperator)).To(Succeed())
			})
		})
	})
})
As you can see it is not extensive but since the MockOperator does little, the test coverage is pretty high as well.
vii. Automated Testing
Now we need to add a simple integration test to confirm our operator works as expected. Using https://github.com/helm/kind-action, we can set up a simple Kind (Kubernetes in Docker) cluster in the pipeline, test the deployment of MockOperator, and then use curl to confirm that we can access the custom HTML page deployed in nginx.
A section of the GitHub Actions pipeline looks like this:
name: Build and Test for Operator
on: workflow_call
jobs:
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v1
      - name: Create k8s Kind Cluster
        uses: helm/kind-action@v1.3.0
      - name: Login to Github Packages
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: $
          password: $
      - name: Operator deployment
        run: |
          kubectl cluster-info
          kubectl get pods -n kube-system
          echo "current-context:" $(kubectl config current-context)
          echo "environment-kubeconfig:" ${KUBECONFIG}
          kubectl create ns mock-operator-system --save-config
          kubectl create secret generic regcred --from-file=.dockerconfigjson=${HOME}/.docker/config.json --type=kubernetes.io/dockerconfigjson -n mock-operator-system
          make deploy | grep created
          kubectl rollout status deployment mock-operator-controller-manager -n mock-operator-system --timeout=30s
          kubectl get crd | grep samplekind
      - name: Create deployment
        run: |
          kubectl create secret generic regcred --from-file=.dockerconfigjson=${HOME}/.docker/config.json --type=kubernetes.io/dockerconfigjson -n default
          kubectl apply -f ci/sample.yaml | grep "lewis-sample"
          sleep 5 ; kubectl get all
          kubectl wait pods --selector app.kubernetes.io/instance=lewis-sample --for condition=Ready --timeout=40s | grep "condition met"
          kubectl get po --show-labels | grep lewis-sample | grep "1/1"
          kubectl port-forward svc/lewis-sample-service 8080:80 &
          sleep 5
          curl localhost:8080 | grep mock
      - name: Delete operator deployment
        run: |
          kubectl delete samplekind lewis-sample | grep deleted
Conclusion
Hopefully you found this useful and are able to kick-start your operator journey with this. The source code for this MockOperator can be found here.
References
https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
https://kubebyexample.com/learning-paths
https://www.infracloud.io/blogs/testing-kubernetes-operator-envtest/
https://betterprogramming.pub/write-tests-for-your-kubernetes-operator-d3d6a9530840