Evaluation
This document summarizes the findings from evaluating the Keycloak deployment and integration in Platform-Mesh, including advantages, disadvantages, and open questions.
Advantages
| Finding | Description |
|---|---|
| Bitnami Charts & Images | Leverages well-maintained Bitnami Helm charts and container images, reducing custom maintenance burden. |
| No External IdP Dependency | Self-managed Keycloak instance avoids dependencies on external IdPs and potential configuration drift that could cause system-wide issues. |
| Operator-Managed Lifecycle | Security Operator manages realms, users, and clients programmatically. Whether it only bootstraps these resources or continuously reconciles them is still unconfirmed (see Open Questions). |
Disadvantages
| Finding | Description |
|---|---|
| Mirrored Bitnami Images | Bitnami images are mirrored internally, presumably to work around the "latest-only-free" limitation. |
| Image Replacement Difficulty | Experience with PostgreSQL has shown that Bitnami images cannot easily be swapped for alternatives (e.g., Chainguard) since Bitnami adds additional initialization logic. |
| Weak Auto-Generated Secrets | The Bitnami chart auto-generates relatively weak secrets: 10 characters long, with no special characters. |
| Overly Permissive Service Accounts | Security Operator and IAM Service use service accounts with full admin privileges on Keycloak. |
| Java Technology Stack | Keycloak is built on Java, which may present a knowledge gap for teams without Java expertise. |
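To put the "Weak Auto-Generated Secrets" finding in numbers, the following sketch compares the entropy of a 10-character secret with a 32-character one. It assumes that "no special characters" means the 62-symbol alphabet `[A-Za-z0-9]` and that a stronger secret would draw from the 94 printable ASCII symbols; neither alphabet is confirmed by the chart documentation here.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a string drawn uniformly at random."""
    return length * math.log2(alphabet_size)


# Assumption: current secrets use [A-Za-z0-9] (62 symbols), length 10.
current = entropy_bits(62, 10)

# Assumption: a stronger secret uses 94 printable ASCII symbols, length 32.
target = entropy_bits(94, 32)

print(f"current: {current:.1f} bits, target: {target:.1f} bits")
```

Under these assumptions the current secrets carry roughly 60 bits of entropy, while a 32-character secret with special characters carries roughly 210 bits.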
Neutral Observations
| Finding | Description |
|---|---|
| Minimal Chart Customization | Currently using mostly Bitnami chart defaults with minimal customization. |
| Open Network Access | By default, the chart's NetworkPolicy leaves the PostgreSQL port and the Keycloak UI/API reachable from all pods. The Helm chart supports configuration to restrict this. |
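As an illustration of what a tighter rule for the "Open Network Access" observation could look like, the sketch below builds a standard `networking.k8s.io/v1` NetworkPolicy that admits only Keycloak pods to the PostgreSQL port. The policy name, namespace, and pod labels are hypothetical placeholders, not values taken from the Platform-Mesh deployment.

```python
import json


def restrict_ingress(name, namespace, target_labels, allowed_labels, port):
    """Build a NetworkPolicy admitting only pods that match allowed_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods in the same namespace with these labels may connect.
                    "from": [{"podSelector": {"matchLabels": allowed_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = restrict_ingress(
    name="keycloak-postgresql-ingress",                      # hypothetical
    namespace="platform-mesh-system",                        # hypothetical
    target_labels={"app.kubernetes.io/name": "postgresql"},  # assumed label
    allowed_labels={"app.kubernetes.io/name": "keycloak"},   # assumed label
    port=5432,
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(policy, indent=2))
```

NetworkPolicies are additive, so a rule like this only narrows access once the chart's own permissive rule has been tightened or disabled through its values.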
Open Questions
| Question | Context |
|---|---|
| Backup & Recovery Strategy? | No backup/recovery strategy is currently defined. Should Velero be used for Keycloak state? |
| Bootstrap vs. Full Management? | Does the Security Operator only bootstrap Keycloak resources, or does it continuously reconcile them? |
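The practical difference behind the bootstrap-versus-reconcile question can be shown with a toy in-memory model. This is not the Security Operator's code and makes no Keycloak API calls; the realm name and the dictionaries are illustrative stand-ins for desired and actual state.

```python
def bootstrap(desired: dict, actual: dict) -> None:
    """Create missing resources once; never modify existing ones."""
    for name, spec in desired.items():
        actual.setdefault(name, dict(spec))


def reconcile(desired: dict, actual: dict) -> list:
    """Create missing resources and overwrite drifted ones; return changed names."""
    changed = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actual[name] = dict(spec)
            changed.append(name)
    return changed


# Hypothetical desired state for a single realm.
desired = {"realm-a": {"enabled": True, "sslRequired": "external"}}
keycloak = {}  # stand-in for the state held by Keycloak

bootstrap(desired, keycloak)
keycloak["realm-a"]["sslRequired"] = "none"  # simulated manual change (drift)

bootstrap(desired, keycloak)  # a second bootstrap leaves the drift in place
drift_survives_bootstrap = keycloak["realm-a"]["sslRequired"] == "none"

changed = reconcile(desired, keycloak)  # reconciliation restores desired state
```

If the operator only bootstraps, manual changes made in the Keycloak admin console persist; if it reconciles, they are reverted on the next pass. Which behavior applies determines whether the admin console can be treated as a source of truth.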
Analysis
When Self-Managed Keycloak Makes Sense
Self-managing Keycloak within Platform-Mesh provides value when:
- Full control over IdP configuration and lifecycle is required
- Air-gapped or isolated environments cannot reach external IdPs
- Multi-tenant realm isolation is needed with programmatic provisioning
- Tight integration with platform operators is needed for user/client lifecycle management
When External IdP May Be Preferable
Consider using an external IdP when:
- Managed IdP services (e.g., Auth0, Okta, Microsoft Entra ID, formerly Azure AD) are already in use
- Operational overhead of managing Keycloak is not justified
- Enterprise SSO integration is already established elsewhere
- High availability requirements exceed what can be self-managed
Recommended Next Steps
Short-Term
- Strengthen secret generation - Override Bitnami defaults to generate stronger secrets (special characters, minimum 32 characters)
- Review service account permissions - Apply principle of least privilege to Security Operator and IAM Service Keycloak clients
- Restrict network access - Configure NetworkPolicies to limit PostgreSQL and Keycloak API access to required components only
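A minimal sketch of the first short-term item, generating a secret that meets the proposed policy (at least 32 characters, including special characters) with Python's standard `secrets` module. The symbol set and the function name are illustrative choices, and how the value is handed to the chart (for example through a pre-created Kubernetes Secret) depends on the chart version in use.

```python
import secrets
import string

# Hypothetical symbol set; adjust to what the consuming components accept.
SYMBOLS = "!#%&*+-_=?@^"
ALPHABET = string.ascii_letters + string.digits + SYMBOLS


def generate_secret(length: int = 32) -> str:
    """Return a random secret with lower, upper, digit, and symbol characters."""
    if length < 32:
        raise ValueError("secrets must be at least 32 characters long")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present.
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in SYMBOLS for c in candidate)
        ):
            return candidate


password = generate_secret()
```

`secrets.choice` draws from the operating system's cryptographically secure random source, which is the reason to prefer it over the `random` module for credentials.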
Medium-Term
- Define backup strategy - Evaluate Velero or PostgreSQL-native backups for Keycloak data
- Document lifecycle management - Clarify whether Security Operator bootstraps or continuously reconciles Keycloak resources
- Evaluate image alternatives - Assess feasibility of using hardened images (Chainguard, Ironbank) with required initialization logic
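For the PostgreSQL-native backup option, one possible shape is a scheduled job that runs `pg_dump` against the Keycloak database. The sketch below only assembles the command line and does not execute it; the host, database name, user, and output directory are hypothetical and would need to match the actual deployment.

```python
from datetime import datetime, timezone


def pg_dump_command(host, database, user, out_dir, port=5432, now=None):
    """Build a pg_dump argument list writing a timestamped custom-format dump."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outfile = f"{out_dir}/{database}-{stamp}.dump"
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        "--no-password",    # never prompt; rely on PGPASSWORD or a .pgpass file
        "--format=custom",  # compressed archive, restorable with pg_restore
        f"--file={outfile}",
        database,
    ]


cmd = pg_dump_command(
    host="keycloak-postgresql",   # hypothetical service name
    database="bitnami_keycloak",  # hypothetical database name
    user="bn_keycloak",           # hypothetical database user
    out_dir="/backups",           # hypothetical mount path
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
print(" ".join(cmd))
```

A dump like this covers only the database. Whether it is sufficient on its own, or whether Velero is also needed for Kubernetes resources and volumes, is part of the evaluation this step calls for.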
Long-Term
- High availability - Plan for Keycloak HA deployment with multiple replicas and session replication
- Monitoring & alerting - Implement Keycloak metrics collection and alerting for authentication failures, latency, etc.
- Disaster recovery testing - Regularly test backup/restore procedures
Component Trade-off Matrix
| Aspect | Self-Managed Keycloak | External IdP (e.g., Auth0, Okta) |
|---|---|---|
| Complexity | High | Low |
| Operational overhead | Significant | Minimal |
| Control | Full | Limited |
| Cost | Infrastructure only | License/subscription fees |
| Multi-tenancy | Native (realms) | Depends on provider |
| Customization | Unlimited | Provider constraints |
| Compliance | Responsibility of the operating team | Provider-dependent |
Risk Mitigation
| Risk | Mitigation |
|---|---|
| Data loss | Implement automated backups with tested restore procedures |
| Weak credentials | Override Bitnami secret generation with strong defaults |
| Privilege escalation | Reduce service account permissions to minimum required |
| Network exposure | Restrict access via NetworkPolicies |
| Java vulnerabilities | Keep Keycloak updated, monitor CVEs |
| Knowledge gap | Document operational procedures, consider training |