The client is a large multinational that operates in different parts of the world. In each of these geolocations, there are different products, which means different developer platforms with different security requirements. For this reason, we needed to come up with a flexible solution that would allow us to have configurable rules and integrations per region.
Overview
Beyond following certain rules or using a specific set of technologies, the main, and probably most interesting, challenge for this client was the sheer variety of environments the platform had to serve.
This is a large multinational client: multi-tenanted, multi-cloud, multi-platform, multi-everything! Besides the obvious scalability challenge, among many other interesting topics, this also created a big security problem, as each deployment had different security requirements. For example, for host vulnerability scanning, some deployments used CrowdStrike, others used Qualys, and some used both.
There were two fronts to think about: how to deploy the platform itself across these environments, and how tenants would deploy their applications onto the platform.
For the Platform
The platform was developed as a sort of “white-label” product. It had enough levels of abstraction that we could simply input the “branding”, security rules and configurations that each deployment required.
Some things were customizable, such as:
- Host scanning
  - Support for multiple tools and vendors to perform host scanning
- Image vulnerability scanning
  - Ability to scan images with a chosen tool as part of the CI, and to block unscanned images from being deployed
- Governance policies
  - Denying containers running as the root user
  - Denying privileged containers
  - Enforcing that images are pulled from a specific URL
- EDR
  - Endpoint Detection and Response, configurable per platform/cluster
- SIEM
  - Each platform could have its own rules and its own set of data that had to be sent to the SIEM.
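The customizable settings above can be pictured as one security profile per platform. The following is a minimal sketch of that idea, not the client's actual implementation: the `SecurityProfile` class, its field names, the `validate_profile` helper and the registry URL are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Host scanners named in this article; a real platform might support others.
SUPPORTED_HOST_SCANNERS = {"crowdstrike", "qualys"}


@dataclass(frozen=True)
class SecurityProfile:
    """Security settings for one platform (all field names are illustrative)."""
    host_scanners: tuple = ()               # one or more scanning vendors
    image_scanner: str = "trivy"            # tool used to scan images in CI
    deny_root_containers: bool = True       # governance policy
    deny_privileged_containers: bool = True # governance policy
    allowed_registry: str = ""              # images must come from this URL prefix
    edr_enabled: bool = False               # EDR is configurable per platform
    siem_sources: tuple = ()                # data sets forwarded to the SIEM


def validate_profile(profile):
    """Return a list of human-readable problems; an empty list means valid."""
    unknown = sorted(set(profile.host_scanners) - SUPPORTED_HOST_SCANNERS)
    return [f"unsupported host scanner: {name}" for name in unknown]


# Hypothetical profile for a platform that uses both scanning vendors.
platform_x = SecurityProfile(
    host_scanners=("crowdstrike", "qualys"),
    allowed_registry="registry.example.com/",
    edr_enabled=True,
    siem_sources=("audit-logs",),
)
```

Keeping the profile as plain data means the same deployment code can serve every region, with only the profile changing between them.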
Others were standard across all clusters:
- Secure communication
  - All communication with the clusters was encrypted.
  - RBAC was implemented to ensure only specific users could access specific resources. This was integrated with Azure AD.
- Network policies
  - Every cluster had the same basic network policy setup, but each cluster could add custom rules.
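As an illustration of a shared baseline combined with per-cluster rules, here is a minimal sketch that builds Kubernetes `NetworkPolicy` manifests as plain dictionaries. It assumes the baseline is a default-deny ingress policy, which the original setup may or may not have used; the function names are hypothetical.

```python
import copy


def base_network_policy(namespace):
    """Baseline applied everywhere: deny all ingress to pods in the namespace.

    Assumption: the article does not say what the shared baseline contained.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector selects every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def policies_for_cluster(namespace, custom_policies):
    """Return the baseline policy followed by the cluster's custom policies."""
    policies = [base_network_policy(namespace)]
    for policy in custom_policies:
        policy = copy.deepcopy(policy)  # never mutate the caller's manifests
        policy.setdefault("metadata", {})["namespace"] = namespace
        policies.append(policy)
    return policies
```

Because network policies are additive, a custom policy can only open traffic that the baseline denies, so the baseline stays identical across clusters.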
For the Tenants
Tenants should have as seamless an experience as possible. That meant offering a simple, standard way to deploy to all of these different platforms. To achieve this, a CI/CD tool was developed that supplies the configuration each platform requires.
This means that if a tenant wants to deploy to Platform X, they are required to scan the image using Trivy, which can be implemented as a step in the CI.
Breaking changes or new implementations could also be rolled out in an almost transparent way, provided the tenant was using this tool. This can speed up the release of features, since there is no need to wait for hundreds of tenants to make the change themselves.
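A scan step like the Trivy one mentioned above could be gated roughly as follows. This is a sketch rather than the actual CI/CD tool: it assumes the report was produced with `trivy image --format json --output report.json IMAGE`, and the blocking threshold of HIGH and CRITICAL is an assumption, as are the function names.

```python
import json

# Assumed threshold: which severities should fail the pipeline.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report, blocking=BLOCKING_SEVERITIES):
    """Collect (target, vulnerability id) pairs at a blocking severity."""
    findings = []
    # "Results" and "Vulnerabilities" may be missing or null in a clean report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (result.get("Target", ""), vuln.get("VulnerabilityID", ""))
                )
    return findings


def gate(report_json):
    """Return a process exit code: 1 blocks the deployment, 0 lets it through."""
    return 1 if blocking_findings(json.loads(report_json)) else 0
```

Trivy can also fail a build on its own through its `--severity` and `--exit-code` options; parsing the report is mainly useful when the tool needs to apply platform-specific rules to the findings.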
Security Team
For the security of the platform as a whole, a new team was created to assess vulnerabilities discovered in the system. In one case, it was found that a special Jenkins agent granted high privileges in the system, allowing a team to change resources in the Kubernetes platform that it should not have had permission to modify.
As with any other vulnerability, the team has to:
- Assign a CVSS score, based on factors such as whether the vulnerability allows for privilege escalation, to understand how severe it is.
- Fix or mitigate a major vulnerability within 30 days.
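The scoring and deadline steps can be sketched as below. The score bands follow the CVSS v3.x qualitative severity scale; treating "major" as high or critical is an assumption, since the article does not define it, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "major" means a high or critical rating.
MAJOR_SEVERITIES = {"high", "critical"}
MAJOR_FIX_WINDOW = timedelta(days=30)


def cvss_severity(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def fix_deadline(score, found_on):
    """Return the fix/mitigation deadline, or None if no 30-day rule applies."""
    if cvss_severity(score) in MAJOR_SEVERITIES:
        return found_on + MAJOR_FIX_WINDOW
    return None
```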
The solutions vary. One potential way to restrict certain things from happening in the cluster is to create a validating admission webhook that restricts certain keys in the objects being created.
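The core of such a webhook is a function that receives a Kubernetes `AdmissionReview` request and answers whether the object is allowed. The sketch below shows only that decision logic for Pod objects, with two illustrative checks; the function names are hypothetical, and the HTTPS server and the `ValidatingWebhookConfiguration` that a real deployment needs are omitted.

```python
def violations(pod):
    """List the restricted settings found in a Pod manifest.

    Illustrative only: checks container-level securityContext and ignores
    pod-level settings and the image's default user.
    """
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    found = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("privileged") is True:
            found.append(f"container {name} is privileged")
        if context.get("runAsUser") == 0:
            found.append(f"container {name} runs as root (runAsUser: 0)")
    return found


def review(admission_review):
    """Build the AdmissionReview response for an incoming AdmissionReview."""
    request = admission_review["request"]
    problems = violations(request.get("object") or {})
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        # The message is shown to the user whose request was rejected.
        response["status"] = {"message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The response must echo the request's `uid`, which is how the API server matches the verdict to the object being created.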
Besides this, the team is also responsible for maintaining and developing the security tools such as the policy controller and its rules.
Lessons Learnt
Having a secure platform has many layers! Not only do you need to ensure that the platform itself is secure, but you also need secure tooling for things like CI/CD, and you need to give tenant applications ways to comply with the security requirements.
Because of all this complexity, we often see security being ignored or pushed to the backlog, where it only becomes a priority once it’s already too late and a vulnerability has caused issues. That’s why it’s important to invest in it from the get-go and make it useful and easy to use for everyone involved.