Introduction
I’m often asked for best practices when it comes to OpenShift GitOps, but I’ve never been a fan of the term “best practices” as it implies a “one true way” to do things. When it comes to GitOps, so much depends on a variety of factors such as organizational structure (DevOps versus traditional silos), YAML management tools (Helm versus Kustomize versus others), and more.
As a result I like to categorize these practices into a set of buckets as follows:
- Recommended. Practices that are recommended for all organizations and situations.
- Suggested. Practices that apply to most organizations and use cases but may vary in specific situations.
- Situational. Practices that are highly dependent on the organization, use case and other factors.
In the subsequent sections we will dive into each of these categories and the practices that fall into them.
Recommended Practices
Keep Source Code and Manifests in Different Repositories
It is not uncommon to see folks starting with GitOps mix source code (i.e. Java, Go, Python, etc) and manifests (i.e. Deployment, Service, Ingress/Route YAML) in the same repository. This is not recommended since the two typically have different lifecycles, often with different teams maintaining each. It also leads to Argo CD potentially doing more reconciliation work, as source code changes drive repository updates even though the manifests themselves haven’t changed.
Use a YAML Management Tool
Do not manage raw YAML directly in the git repository as this will lead to a lot of YAML duplication. Instead use a YAML management tool like Helm or Kustomize which will enable you to deploy largely the same YAML across multiple environments and clusters with the specific changes required for the target cluster and/or environment.
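For example, here is a minimal sketch of a Kustomize layout (hypothetical paths and application name) where a shared base is deployed to multiple environments and a production overlay applies only the environment-specific differences:

```yaml
# base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml

# overlays/prod/kustomization.yaml (rendered with: kustomize build overlays/prod)
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: my-app        # hypothetical application name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```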
As a corollary, while it is OK to state that a specific tool is the preferred one, do not get locked into only that tool. For example, between Helm and Kustomize I have a strong personal preference for Kustomize, however there are use cases where Helm is the better fit (e.g. when templating is needed). I use Helm in those cases rather than try to contort Kustomize to make it work. A good carpenter uses the right tool for the job.
Version Manifests
Regardless of the YAML management tool, you should always deploy versioned manifests rather than deploying from the HEAD of the git repository or the latest tag of an OCI repository. Versioning ensures that changes can be rolled out across environments and clusters in a controlled manner with adequate testing and safeguards. The Argo CD documentation does a good job of covering this (Tracking and Deployment Strategies) but it is often overlooked.
With Kustomize I like using tag tracking or commit pinning. With Helm I like keeping my versioned charts in a Helm or OCI repository (i.e. not accessing the chart directly from git) and using tag tracking or commit pinning with the value files.
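As a minimal sketch (hypothetical repository and tag names), pinning an Argo CD Application to a git tag rather than tracking HEAD looks like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: openshift-gitops
spec:
  project: my-project
  source:
    repoURL: https://git.example.com/org/manifests.git
    targetRevision: v1.2.3   # a git tag, not HEAD
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```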
Validate Manifests via Linting
It's not uncommon to introduce an error when creating and modifying Kubernetes YAML manifests. Validating manifests in the git repository, or even better validating Pull Requests before they are merged, helps to ensure the correctness of the repository and catches these errors early.
My colleague Trevor Royer wrote a great blog on how to validate manifests, including ones generated by Kustomize and Helm, that is well worth checking out.
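While the blog covers the details, a rough sketch of the idea, assuming a GitHub Actions style pipeline and the kubeconform tool (not necessarily what the linked blog uses), is to render the manifests and validate the output on every Pull Request:

```yaml
# Hypothetical CI step: render the Kustomize output and validate it
- name: Validate rendered manifests
  run: |
    kustomize build overlays/prod | kubeconform -strict -summary
```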
Use Annotation Tracking
By default Argo CD uses label tracking, where it adds a label to the resources it manages. The issue is that label values in Kubernetes are limited to 63 characters, which means Argo CD can only include a limited amount of tracking information. Also, when operators or controllers create resources they often copy the tracking label, which leads Argo CD to think these operator-created resources are managed by it; since they do not exist in git, the Application ends up with an Out Of Sync status.
Annotation tracking enables Argo CD to include much more information and automatically weed out these false positives without having to rely on cumbersome additional annotations like IgnoreExtraneous to work around the issue. Note that annotation tracking will become the default in Argo CD 3.0.
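With the OpenShift GitOps operator, annotation tracking can be enabled via the ArgoCD CR; a minimal sketch, assuming a recent operator version:

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  # Switch from the default label tracking to annotation tracking
  resourceTrackingMethod: annotation
```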
Do not use the Default AppProject
Argo CD Application Projects, or AppProject, provide a way to logically group Applications. The AppProject can be used to restrict who has access to the Applications as well as restrict what resources these Applications are permitted to deploy.
Organizations starting with Argo CD often just roll with the Default AppProject that is available out-of-the-box with Argo CD. The issue is that as more and more Applications rely on the Default AppProject, you lose the ability to segment and manage these Applications.
Always define your own AppProject(s) and never use the Default.
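A minimal sketch of a dedicated AppProject (hypothetical tenant and repository names) that restricts a tenant to its own namespace and repositories might look like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: openshift-gitops
spec:
  description: Applications for team A
  sourceRepos:
    - https://git.example.com/team-a/*
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a
  # An empty whitelist denies all cluster-scoped resources
  clusterResourceWhitelist: []
```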
Define Tenant RBAC in AppProject
When defining Role Based Access Controls (RBAC), Argo CD allows you to define them globally in the argocd-rbac-cm ConfigMap (or the ArgoCD CR for the OpenShift GitOps operator) or in the individual AppProjects.
When defining RBAC for tenants in a multi-tenant Argo CD you should always be creating separate AppProjects for each tenant (see recommendation above) and defining the RBAC for that tenant in the same AppProject. This prevents the global RBAC from becoming overly complicated, messy and difficult to maintain.
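Extending the hypothetical team-a AppProject from above, tenant RBAC can be embedded directly in the project as a role; a sketch:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: openshift-gitops
spec:
  roles:
    - name: developers
      description: Team A developers may view and sync their Applications
      policies:
        - p, proj:team-a:developers, applications, get, team-a/*, allow
        - p, proj:team-a:developers, applications, sync, team-a/*, allow
      groups:
        - team-a-devs   # hypothetical OIDC/LDAP group
```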
Suggested Practices
Use Global AppProject for Common Settings
A lesser known feature of Argo CD is the ability of an AppProject to inherit settings from a global AppProject. This can be very useful for centralizing common settings, such as resource inclusions and exclusions, across a multitude of tenant AppProjects. It also enables centralizing Sync Windows, allowing platform teams to define common maintenance windows without having to define them individually in each AppProject.
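Global projects are configured in the argocd-cm ConfigMap; a hedged sketch (hypothetical project and label names, verify the keys against your Argo CD version) follows:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  # Projects matching the selector inherit settings from global-settings
  globalProjects: |-
    - labelSelector:
        matchExpressions:
          - key: example.com/opt-out-global
            operator: DoesNotExist
      projectName: global-settings
```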
Define Custom Health Checks for Custom Resources
In addition to health checks for standard Kubernetes resources, Argo CD provides Out-Of-The-Box health checks for a number of custom resources which can be found here. These health checks enable Argo CD to determine the health of the associated resources and aggregate them up to the Application with the appropriate health status (Degraded, Progressing, Healthy, etc).
Monitoring and alerting on the health of the Application essentially provides free, low-effort health monitoring for all of the resources managed by the Application that have defined health checks.
However, Argo CD does not include health checks for all types of resources, and in an operator-rich environment like OpenShift this can be an issue. Writing custom health checks for critical resources that are not covered by the OOTB ones is a great way to enhance operational readiness.
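Custom health checks are written in Lua and configured in the argocd-cm ConfigMap; here is a minimal sketch for a hypothetical Widget custom resource that reports a status.phase field:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  # Key format is resource.customizations.health.<group>_<kind>
  resource.customizations.health.example.com_Widget: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for Widget to become ready"
    if obj.status ~= nil and obj.status.phase == "Ready" then
      hs.status = "Healthy"
      hs.message = "Widget is ready"
    end
    return hs
```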
Separate Argo CD instances for Cluster Configuration versus Application Deployments
In Argo CD a single service account is responsible for deploying all resources on the cluster. This service account requires Kubernetes permissions sufficient to deploy the resources for all users of that Argo CD instance, and in the case of cluster configuration this typically means cluster-admin level permissions.
While it is possible to limit what resources application teams can deploy via the Argo CD AppProject RBAC and resource inclusions/exclusions, it is not uncommon to make mistakes and accidentally leave holes resulting in the possibility of privilege escalation.
As a result, separating cluster configuration and application deployment use cases into separate Argo CD instances for maximum isolation is recommended. Typically with OpenShift GitOps, using the instance in the openshift-gitops namespace for cluster configuration and spinning up a new instance in a different namespace for application teams is the recommended approach.
Note: There is a new alpha feature in Argo CD (Developer Preview in OpenShift GitOps) that supports impersonation to mitigate privilege escalation. This feature will be very useful in multi-tenant Argo CD for additional isolation, however my personal preference at this time is still to maintain separate instances for the two use cases, given the level of privilege the cluster configuration use case requires and the typically different personas interacting with each instance.
Minimize the Application-Controller Privileges
As per the previous section, the application-controller service account requires sufficient Kubernetes permissions to deploy all of the Argo CD managed resources. You should ensure that this service account has only the minimum level of privileges needed to support the use case. For example, for the Application Deployment use case it should not have permissions to deploy cluster level resources like ClusterRole, ClusterRoleBinding, OLM’s Subscription, etc.
In a previous blog when discussing the CONTROLLER_CLUSTER_ROLE setting I mentioned for the Application Deployment use case I like to leverage Kubernetes cluster role aggregation to tie the application-controller service account to the default Kubernetes admin role. You can see this in the blog in the gitops-controller-admin ClusterRole where it uses a clusterRoleSelectors to match the label rbac.authorization.k8s.io/aggregate-to-admin.
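For reference, the aggregation described above looks roughly like the following (names as per the referenced blog; treat this as a sketch rather than the exact manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitops-controller-admin
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
# rules are populated automatically by the aggregation controller
rules: []
```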
This grants it the permissions that the Kubernetes project and Red Hat have deemed necessary for a namespace administrator to deploy application-level resources, without you having to manage this manually. Additionally, when new operators are installed, OLM will automatically add permissions for the new operators’ Custom Resources (CRs) to the standard Kubernetes cluster roles (admin, edit and view).
Of course if you operate in a highly regulated or secure environment you can choose to define the Kubernetes privileges independently and not use the default admin role.
Use Apps-In-Any-Namespace for Multi-Tenant Argo CD
When using multi-tenant Argo CD, a common challenge is how to allow tenants to define Applications declaratively while preventing them from circumventing security by modifying the referenced AppProject. Fortunately the Applications in Any Namespace feature neatly solves this issue by enabling tenants to define Applications in their own namespaces while enforcing security by binding them to a specific AppProject controlled by the platform team.
Note that this feature requires a cluster scoped instance of Argo CD so some additional consideration is required when configuring cluster roles for the Argo CD application-controller as per the previous suggested practice. One side benefit of using a cluster scoped instance is better scalability since namespace scoped instances tend to scale poorly past a significant number of namespaces (typically in the 50-100 range).
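A hedged sketch of enabling this with the OpenShift GitOps operator (hypothetical tenant namespace): both the ArgoCD CR and the AppProject need to opt in.

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  # Namespaces in which tenants may declare Application resources
  sourceNamespaces:
    - team-a
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: openshift-gitops
spec:
  # The AppProject must also allow Applications from these namespaces
  sourceNamespaces:
    - team-a
```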
Situational Practices
Use Resource Inclusion/Exclusion to Minimize Managed Resources
OpenShift is very operator heavy and as a result it includes a large number of Custom Resource Definitions (CRDs) out of the box. This can sometimes be a challenge for Argo CD since, by default, the Application Controller will monitor all resource types in the cluster, potentially leading to higher resource utilization in Argo CD, the Kubernetes API server and etcd.
The number of resources that Argo CD is monitoring can be minimized by using the resource inclusion/exclusion feature in Argo CD. When this is set Argo CD will completely ignore any resources that are not defined in this setting. One downside of this setting is that it can be onerous to manage on top of the Kubernetes RBAC permissions. A recent feature in Argo CD, Auto Respect RBAC, enables Argo CD to automatically exclude resources for which it has no privileges, minimizing this configuration.
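As an illustrative sketch (a deliberately tiny allow-list with hypothetical groups), resource inclusions can be set in the ArgoCD CR:

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  # Only these API groups will be watched; everything else is ignored
  resourceInclusions: |
    - apiGroups:
        - ""
        - "apps"
        - "route.openshift.io"
      kinds:
        - "*"
      clusters:
        - "*"
```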
Persist Health Status in Redis
Argo CD will by default persist resource health status in the Argo CD Application object. There is a setting in Argo CD 2.x that will cause Argo CD to persist this information in Redis instead which can reduce the number of writes on the Application thereby improving performance and reducing load on the API server and Etcd. Benefits are typically modest but it can be useful if you are having performance issues and need to squeeze a little bit more out of things.
Note that if you have tools that rely on reading this information out of the Application object, changing this setting could impact those tools.
This setting is planned to be the default in Argo CD 3.0.
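To the best of my knowledge the relevant knob lives in the argocd-cmd-params-cm ConfigMap; a sketch (verify the exact key for your version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
data:
  # "false" stores resource health in Redis instead of the Application CR
  controller.resource.health.persist: "false"
```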
Monorepo Scaling Considerations
A monorepo is a single git repository that stores multiple, often unrelated, projects. This is a pattern that some larger organizations such as Google have adopted, and it is popular when organizations need to maintain a consistent commit history across multiple projects.
However, monorepos introduce additional scaling considerations with Argo CD. By default Argo CD maintains caches and detects changes at the repository level, so a change anywhere in the repository can lead to Argo CD invalidating the cache for all Applications as well as doing unnecessary reconciliation work for Applications that are not impacted by the change.
The Argo CD documentation does a great job of laying out these challenges in Monorepo Scaling Considerations and how to mitigate them. In particular, users should be aware of the manifest-generate-paths annotation, which enables you to specify the path(s) in the monorepo that an Application is tied to.
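A minimal sketch (hypothetical monorepo paths) of the annotation on an Application:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: openshift-gitops
  annotations:
    # Only changes under these paths trigger manifest regeneration;
    # "." is relative to spec.source.path
    argocd.argoproj.io/manifest-generate-paths: .;../shared
spec:
  project: team-a
  source:
    repoURL: https://git.example.com/org/monorepo.git
    targetRevision: main
    path: apps/my-app/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```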
Conclusion
In this blog we went through a set of OpenShift GitOps practices and when to consider them. Feedback on these practices, as well as ones I may not have covered, is greatly appreciated; feel free to leave a note in the blog comments with your thoughts.