# Horizontal Scaling Strategy
## Context and Problem Statement
As the number of WorkspaceClaim resources managed by the cloudapi-operator grows, a single instance of the operator may become a bottleneck. This can lead to performance degradation, increased reconciliation times, and reduced availability. To ensure the operator can handle a large number of resources and maintain high availability, a horizontal scaling strategy is required. This ADR explores different options for horizontally scaling the cloudapi-operator.
The key challenge is to distribute the reconciliation load of WorkspaceClaim resources across multiple operator replicas without causing conflicts or race conditions, where multiple replicas might try to reconcile the same resource simultaneously.
## Glossary
- Sharding: The process of dividing the full set of resources into disjoint subsets, called shards. Each operator replica is then responsible for managing a single shard.
- Lease: A Kubernetes object used for leader election and coordination between replicas.
## Considered Options
### 1. Leader Election (Default Kubebuilder Behavior)
The default behavior for operators built with Kubebuilder is to use a leader election mechanism. All operator replicas are running, but only one, the "leader," is actively reconciling resources. The other replicas are on standby and will only take over if the leader fails.
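For illustration, a minimal sketch of how leader election is enabled in the manager options that Kubebuilder scaffolds (the lease ID string below is illustrative):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With leader election enabled, every replica starts, but only the
	// elected leader runs the controllers; the others wait to take over.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:   true,
		LeaderElectionID: "cloudapi-operator.team-orca.dev", // illustrative lease ID
	})
	if err != nil {
		panic(err)
	}
	// ... controller registration elided ...
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

The election itself is coordinated through a Lease object, as noted in the glossary.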
#### Pros
- Simplicity: This is the default implementation and requires no additional configuration.
- Safety: It inherently prevents conflicts by ensuring only one replica is active at a time.
#### Cons
- No Horizontal Scaling: This model does not distribute the workload. It only provides high availability (failover). A single replica still has to handle all resources, so it doesn't solve the performance bottleneck issue.
### 2. Controller Sharding
This approach involves dividing the WorkspaceClaim resources into multiple shards and assigning each operator replica to manage a specific shard. The sharding can be implemented by labeling the WorkspaceClaim resources with a shard identifier. Each operator replica is configured to only watch and reconcile resources belonging to its assigned shard.
A mechanism is needed to assign shards to replicas and to rebalance shards if a replica fails. This can be achieved using a coordinator pattern where a separate process or a designated leader assigns shard leases to the available replicas. The kubernetes-controller-sharding repository provides a reference implementation for this pattern.
#### Sharding Mechanism
- Resource Labeling: Each WorkspaceClaim resource carries a label indicating which shard it belongs to, e.g., `cloudapi.team-orca.dev/shard: "shard-1"`.
- Operator Configuration: Each operator replica is started with an environment variable or command-line flag specifying the shard it is responsible for, e.g., `-shard-name=shard-1`.
- Controller Predicate: The controller is configured with a predicate that filters out resources that do not belong to its shard (see the sketch after this list).
- Shard Coordination: A separate coordination mechanism (e.g., a stateful set of shard masters or a leader-elected coordinator) is responsible for:
- Discovering available replicas.
- Assigning shards to replicas.
- Monitoring replica health and re-assigning shards if a replica becomes unavailable.
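For the controller predicate step, a minimal sketch using controller-runtime, assuming a `-shard-name` flag, the label key shown above, and a hypothetical import path for the WorkspaceClaim API types:

```go
package controller

import (
	"context"
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	cloudapiv1alpha1 "example.com/cloudapi-operator/api/v1alpha1" // hypothetical import path
)

const shardLabel = "cloudapi.team-orca.dev/shard"

// shardName is the shard this replica is responsible for, e.g. "shard-1".
var shardName = flag.String("shard-name", "", "shard this replica reconciles")

// WorkspaceClaimReconciler follows the usual Kubebuilder scaffolding.
type WorkspaceClaimReconciler struct {
	client.Client
}

func (r *WorkspaceClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconciliation logic elided ...
	return ctrl.Result{}, nil
}

// SetupWithManager registers the controller with a predicate so this replica
// only receives events for WorkspaceClaims labeled with its shard.
func (r *WorkspaceClaimReconciler) SetupWithManager(mgr ctrl.Manager) error {
	shardPredicate := predicate.NewPredicateFuncs(func(obj client.Object) bool {
		return obj.GetLabels()[shardLabel] == *shardName
	})
	return ctrl.NewControllerManagedBy(mgr).
		For(&cloudapiv1alpha1.WorkspaceClaim{}, builder.WithPredicates(shardPredicate)).
		Complete(r)
}
```

Note that a predicate only filters events client-side: each replica still watches and caches all WorkspaceClaims unless the manager's cache is also restricted, for example with a label selector.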
#### Architecture Diagram
```mermaid
---
config:
  layout: elk
---
graph TD
    subgraph Kubernetes Cluster
        subgraph Shard 1
            Replica1["Operator Replica 1<br>(manages shard-1)"]
            Claim1["WorkspaceClaim 1<br>(shard=shard-1)"]
        end
        subgraph Shard 2
            Replica2["Operator Replica 2<br>(manages shard-2)"]
            Claim2["WorkspaceClaim 2<br>(shard=shard-2)"]
        end
        subgraph Shard N
            ReplicaN["Operator Replica N<br>(manages shard-n)"]
            ClaimN["WorkspaceClaim N<br>(shard=shard-n)"]
        end
        Coordinator["Shard Coordinator<br>(assigns shards to replicas)"]
    end
    Replica1 -- reconciles --> Claim1
    Replica2 -- reconciles --> Claim2
    ReplicaN -- reconciles --> ClaimN
    Coordinator -- assigns --> Replica1
    Coordinator -- assigns --> Replica2
    Coordinator -- assigns --> ReplicaN
```
#### Pros
- True Horizontal Scaling: Distributes the reconciliation load across multiple replicas, allowing the operator to handle a much larger number of resources.
- Improved Performance: Reduces the reconciliation time for individual resources as each replica has a smaller set of resources to manage.
- High Availability: If a replica fails, its shard can be reassigned to another replica.
#### Cons
- Increased Complexity: Requires implementing a sharding and coordination mechanism.
- Management Overhead: The sharding mechanism itself needs to be managed and monitored.
- Potential for Uneven Distribution: If not designed carefully, some shards might end up with more "active" resources than others, leading to an uneven load distribution.
### 3. Hash-based Sharding Without Coordinator
This approach enables horizontal scaling without requiring an external coordinator service. Each operator replica independently determines which WorkspaceClaim resources to reconcile based on a consistent hash function applied to the resource's unique identifier (e.g., UID or namespace/name combination) modulo the total number of replicas.
#### Sharding Mechanism
- Replica Identification: The operator is deployed as a StatefulSet, providing each replica with an ordinal index (0, 1, 2, ...).
- Replica Count Discovery: Each replica watches the StatefulSet's `spec.replicas` field to obtain the current total replica count.
- Resource Assignment: A replica reconciles a resource only if `hash(resource_id) % total_replicas == replica_ordinal`.
- Hash Function: Use a simple, consistent hash like CRC32 on the resource UID to ensure even distribution.
- Controller Predicate: Implement a predicate in the controller to filter out resources not assigned to the current replica.
This yields decentralized, automatic load distribution: when replicas are added or removed, each replica recomputes its assignments as soon as it observes the new replica count, without explicit coordination.
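A minimal sketch of the hash-based assignment, assuming the pod name and replica count are injected as environment variables (`POD_NAME` via the downward API and a hypothetical `TOTAL_REPLICAS` value); a production version would watch the StatefulSet's `spec.replicas` as described above:

```go
package controller

import (
	"hash/crc32"
	"os"
	"strconv"
	"strings"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// hashShardPredicate admits only the resources whose UID hashes onto this
// replica's ordinal: crc32(uid) % totalReplicas == ordinal.
func hashShardPredicate(ordinal, totalReplicas int) predicate.Predicate {
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		if totalReplicas <= 0 {
			return true // degrade to reconciling everything rather than nothing
		}
		h := crc32.ChecksumIEEE([]byte(obj.GetUID()))
		return int(h%uint32(totalReplicas)) == ordinal
	})
}

// shardingFromEnv derives the ordinal from the StatefulSet pod name
// (e.g. "cloudapi-operator-2") and reads the replica count from an
// environment variable. TOTAL_REPLICAS is a hypothetical simplification;
// a production version would watch the StatefulSet's spec.replicas instead.
func shardingFromEnv() (ordinal, totalReplicas int, err error) {
	podName := os.Getenv("POD_NAME") // injected via the downward API
	ordinal, err = strconv.Atoi(podName[strings.LastIndex(podName, "-")+1:])
	if err != nil {
		return 0, 0, err
	}
	totalReplicas, err = strconv.Atoi(os.Getenv("TOTAL_REPLICAS"))
	return ordinal, totalReplicas, err
}
```

Because the modulo result changes whenever the replica count changes, this sketch inherits the rebalancing behavior described in the cons below.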
#### Pros
- Lightweight Implementation: No external services, coordination logic, or shard management—pure hash-based assignment.
- Automatic Scaling: Kubernetes handles replica changes; assignments redistribute as soon as each replica observes the new count.
- Simplicity: Easy to implement with existing controller-runtime predicates.
- No Operational Overhead: Avoids managing a coordinator (unlike Option 2).
#### Cons
- Rebalancing on Scale Events: Adding or removing replicas changes the modulo result for most resources, so a large fraction of them are reassigned at once, temporarily increasing reconciliation load.
- Potential Uneven Load: Basic hashing may not perfectly balance active resources; consistent hashing could mitigate this.
- StatefulSet Requirement: Requires StatefulSet deployment for ordinals; not compatible with plain Deployments.
- Temporary Orphaning: During scale-down, resources assigned to a removed replica may briefly lack an owner until the remaining replicas observe the updated replica count.
## Decision Outcome
Chosen Option: Hash-based Sharding Without Coordinator (Initial Implementation), with Controller Sharding as Future Evolution
Reasoning:
While the default leader election model is simple, it does not address the core problem of performance bottlenecks under high load. The cloudapi-operator is expected to manage a large and growing number of WorkspaceClaim resources, making horizontal scaling a critical requirement.
The hash-based sharding approach offers a lightweight path to true horizontal scaling with minimal complexity, enabling quick implementation and immediate benefits. It avoids the operational overhead of coordinating shards while providing automatic load distribution. If monitoring reveals issues with uneven load distribution or rebalancing overhead, we can evolve to the more sophisticated controller sharding model (Option 2) for finer control.
The controller sharding approach remains valuable for long-term scalability with advanced rebalancing, drawing inspiration from solutions like the kubernetes-controller-sharding repository.
Implementation Status: Horizontal scaling is not yet implemented. As a prerequisite, we will add monitoring for operator load, reconciliation latency, and load distribution so we can identify when the scaling investment is justified. Once real-world load requires the additional capacity, we will implement Option 3 first as a lightweight horizontal scaling mechanism, continue monitoring, and evolve to Option 2 if usage shows the need for more controlled sharding.