
Resource Broker - Overview & Architecture Analysis

Relevance for CloudAPI Project

NOT IMMEDIATELY RELEVANT - Resource Broker is a future enhancement ("cherry on top") for our CloudAPI/CLI/Portal scenario.

Current Priority: CloudAPI, CLI, Portal implementation

Future Consideration: Resource Broker as cloud broker layer once core CloudAPI is stable

Why Document Now?: Understanding Platform Mesh's full capabilities helps inform our architecture decisions, even though we won't implement this advanced feature initially.

Executive Summary

Resource Broker is a Kubernetes operator that abstracts multiple vendor APIs into unified, transferable specifications, enabling zero-downtime resource migration and dynamic provider selection. It solves vendor lock-in by allowing platforms to standardize how databases, certificates, and similar resources are requested and provisioned across diverse backends.

Key Innovation: Change a single field (e.g., domain suffix) and resources automatically migrate between providers without application code changes.

Our Use Case: After CloudAPI/CLI/Portal are stable, Resource Broker could provide multi-cloud brokering capabilities - automatically routing cloud resources (compute, storage, databases) to different providers based on policies.

Status: Alpha (v1alpha1) | License: Apache 2.0 | Language: Go (84%)

Repository: https://github.com/platform-mesh/resource-broker


Quick Facts

Category | Details
Purpose | Multi-provider resource abstraction & zero-downtime migration
Maturity | Alpha (active development, 360 commits)
API Version | v1alpha1 (broker.platform-mesh.io)
Core CRDs | AcceptAPI, Migration, MigrationConfiguration
Integration | KCP APIExport/APIBinding, Virtual Workspaces
Key Feature | Zero-downtime lifecycle management with staged migrations
Use Cases | Multi-tenant SaaS, hybrid cloud, DR, cost optimization
Dependencies | KCP, cert-manager (example), KRO

What is Resource Broker?

Core Value Proposition

Organizations can offer generic APIs while dynamically routing fulfillment to different providers based on policies and capabilities. No vendor lock-in, no manual migrations, no downtime.

Resource Broker abstracts multiple vendor APIs into unified specifications, solving the fundamental problem that "running services requires other services".

The Problem It Solves

Traditional Approach Problems

Vendor Lock-In

  • Applications directly depend on specific provider APIs (AWS RDS, Azure Database)
  • Changing providers requires code refactoring

Manual Migrations

  • Moving workloads requires downtime windows
  • Complex coordination across teams
  • High risk of data loss or service disruption

Platform Complexity

  • Each provider has different APIs and configurations
  • Operations teams must learn multiple systems
  • No unified monitoring or management

No Abstraction

  • Consumers must know provider-specific details
  • Cannot switch providers without application changes
  • Testing against multiple providers is difficult

Resource Broker Solution

The Magic of Abstraction

Change app.internal.corp → app.corp.com and the broker automatically switches providers without code changes, downtime, or manual intervention.

┌─────────────────────────────────────────────────────────────┐
│                    Consumer Application                      │
│                  (Generic Certificate API)                   │
│                                                              │
│    kubectl apply -f certificate.yaml                        │
│    spec:                                                    │
│      fqdn: app.internal.corp  ← Change this one field!     │
└────────────────────────┬────────────────────────────────────┘
                         │ Creates: Certificate
┌─────────────────────────────────────────────────────────────┐
│                  Resource Broker                             │
│              (Routing & Lifecycle Management)                │
│                                                              │
│  • Filter Matching    • Zero-Downtime Migration            │
│  • Provider Selection • Automated Cutover                   │
└──────┬──────────────────────────────────┬───────────────────┘
       │                                   │
       │ *.internal.corp                  │ *.corp.com
       │                                   │
       ▼                                   ▼
┌─────────────────┐              ┌─────────────────┐
│  InternalCA     │              │  ExternalCA     │
│  Provider       │              │  Provider       │
│  (cert-manager) │              │  (Let's Encrypt)│
│  Private PKI    │              │  Public CA      │
└─────────────────┘              └─────────────────┘

Real-World Scenario

Before Resource Broker: Migrating from internal CA to Let's Encrypt

  • Update application code (certificate paths, CA bundles)
  • Change deployment configs (secrets, volumes)
  • Schedule downtime window (coordinate with stakeholders)
  • Manual certificate reissuance (for all apps)
  • Update secret references (in multiple places)
  • Test thoroughly (high risk)
  • Downtime: 2-4 hours

With Resource Broker:

  • Change one field: fqdn: app.corp.com (or label)
  • Broker detects change and matches new provider
  • Routes to Let's Encrypt automatically
  • New certificate issued in background
  • Zero downtime cutover
  • Automatic secret sync to consumer
  • Downtime: 0 seconds

Architecture

Three-Tier Design Philosophy

Resource Broker follows a separation-of-concerns architecture in which consumers, coordination logic, and providers are completely decoupled. This enables dynamic routing, zero-downtime migrations, and vendor independence.

Three Distinct Roles

1. Platform/Coordination Cluster

Platform Layer Details

Purpose: Central control plane for routing and lifecycle management

Hosts:

  • Resource Broker Operator
  • KCP Control Plane
  • APIExports (AcceptAPI, Generic Resource APIs)
  • Virtual Workspaces

Responsibilities:

  • Route requests to matching providers
  • Record migration states
  • Orchestrate zero-downtime cutover
  • Manage provider health and availability
  • Enforce routing policies

Location: Typically shared infrastructure cluster

2. Consumer Clusters/Workspaces

Consumer Layer Details

Purpose: Where users create high-level resource requests

User Experience:

# Generic API - no provider-specific details!
apiVersion: example.com/v1alpha1
kind: Certificate
metadata:
  name: my-app-cert
spec:
  fqdn: app.internal.corp
  validity: 90d

Benefits:

  • Simplified API: No provider knowledge needed
  • Abstraction: Same API across all providers
  • Portability: Move between providers seamlessly
  • Consistency: Uniform resource model
  • Security: No direct provider access needed

Location: Per-team/per-tenant workspaces

3. Provider Clusters/Workspaces

Provider Layer Details

Purpose: Execute actual resource provisioning

Components:

  • Specialized controllers (cert-manager, database operators)
  • AcceptAPI declarations (capability advertising)
  • Physical infrastructure (databases, CAs, storage)

Capabilities:

# Provider declares: "I can serve Certificates for *.internal.corp"
kind: AcceptAPI
spec:
  gvr:
    resource: certificates
  filters:
  - key: spec.fqdn
    suffix: ".internal.corp"

Benefits:

  • Specialization: Each provider optimized for specific use cases
  • Isolation: Providers don't see other providers
  • Flexibility: Add/remove providers dynamically
  • Cost Control: Route to cheaper providers for dev/test

Location: Per-provider infrastructure (cloud regions, on-prem, etc.)

Component Diagram

┌────────────────────────────────────────────────────────────────┐
│                Platform Cluster (KCP)                           │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │        Resource Broker Operator                           │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐  │ │
│  │  │   Routing    │  │  Migration   │  │   Provider    │  │ │
│  │  │    Logic     │  │ Orchestrator │  │   Selection   │  │ │
│  │  └──────────────┘  └──────────────┘  └───────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                 │
│  ┌──────────────────┐         ┌───────────────────────────┐  │
│  │ AcceptAPI        │         │ Generic Resource APIs      │  │
│  │   (APIExport)    │         │   (Certificate, Database)  │  │
│  │                  │         │    (APIExport)             │  │
│  │ Providers bind   │         │ Consumers bind             │  │
│  │ to declare caps  │         │ to create resources        │  │
│  └──────────────────┘         └───────────────────────────┘  │
│                                                                 │
└────────┬────────────────────────────────────────┬──────────────┘
         │                                         │
         │ APIBinding                              │ APIBinding
         │ (Provider capabilities)                 │ (Consumer requests)
         │                                         │
    ┌────▼─────────────────┐              ┌───────▼──────────────┐
    │ Provider Workspace   │              │ Consumer Workspace   │
    │  (InternalCA)        │              │  (Team-A)            │
    │                      │              │                      │
    │  ┌────────────────┐  │              │  ┌─────────────────┐│
    │  │ AcceptAPI      │  │              │  │ Certificate     ││
    │  │  - GVR: Cert   │  │              │  │  - FQDN: app... ││
    │  │  - Filter:     │  │              │  │  - Status: Ready││
    │  │    *.internal  │  │              │  └─────────────────┘│
    │  └────────────────┘  │              │                      │
    │                      │              │  Secret synced       │
    │  cert-manager +      │              │     automatically    │
    │     KRO              │              │                      │
    │                      │              └──────────────────────┘
    │  Physical PKI        │
    │                      │
    └──────────────────────┘

Key Features

1. AcceptAPI Declaration

Provider Capability Advertising

Providers declare which APIs they can serve, optionally narrowing the declaration with filters on resource properties.

Example: Accept certificates only for internal domains

apiVersion: broker.platform-mesh.io/v1alpha1
kind: AcceptAPI
metadata:
  name: internal-ca-certificates
  namespace: internal-ca-provider
spec:
  gvr:
    group: example.com
    version: v1alpha1
    resource: certificates
  filters:
  - key: spec.fqdn
    suffix: ".internal.corp"
  - key: spec.certType
    valueIn: ["server", "client"]
status:
  conditions:
  - type: Available
    status: "True"

Filter Capabilities

  • Suffix Matching: suffix: ".internal.corp" → Matches app.internal.corp
  • Value Lists: valueIn: ["postgres", "mysql"] → Matches specific values
  • Numeric Ranges: boundary: {min: 10, max: 1000} → Storage between 10-1000 GB
  • Multiple Filters: Logical AND - resource must satisfy ALL filters
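The matching rules above can be sketched in a few lines of Python. This is an illustration only, not the operator's implementation (which is written in Go): the function names `lookup`, `filter_matches`, and `accepts`, the dict representation of resources, and the rule that a missing field never matches are all assumptions made for the example. Only the filter keys (`key`, `suffix`, `valueIn`, `boundary`) come from the AcceptAPI examples.

```python
from functools import reduce

def lookup(resource: dict, dotted_key: str):
    """Resolve a dotted key such as 'spec.fqdn' against a nested dict."""
    def step(node, part):
        return node.get(part) if isinstance(node, dict) else None
    return reduce(step, dotted_key.split("."), resource)

def filter_matches(resource: dict, flt: dict) -> bool:
    """Evaluate one filter (assumption: a missing field never matches)."""
    value = lookup(resource, flt["key"])
    if value is None:
        return False
    if "suffix" in flt and not str(value).endswith(flt["suffix"]):
        return False
    if "valueIn" in flt and value not in flt["valueIn"]:
        return False
    if "boundary" in flt:
        # Assumes the field holds a plain number, not a quantity like "100Gi".
        bounds = flt["boundary"]
        if not (bounds["min"] <= value <= bounds["max"]):
            return False
    return True

def accepts(resource: dict, filters: list) -> bool:
    """Logical AND: the resource must satisfy every filter."""
    return all(filter_matches(resource, f) for f in filters)

filters = [
    {"key": "spec.fqdn", "suffix": ".internal.corp"},
    {"key": "spec.certType", "valueIn": ["server", "client"]},
]
cert = {"spec": {"fqdn": "app.internal.corp", "certType": "server"}}
print(accepts(cert, filters))  # prints True
```

Changing `fqdn` to `app.corp.com` makes `accepts` return False for this provider, which is the trigger for the broker to look for a different one.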

2. Zero-Downtime Lifecycle Management

The Killer Feature

When a provider can no longer back a resource, the operator provisions a replacement elsewhere before switching consumers over. Critical for stateful services like databases.

How it Works:

sequenceDiagram
    participant C as Consumer
    participant RB as Resource Broker
    participant P1 as Provider A
    participant P2 as Provider B

    C->>RB: Resource no longer matches Provider A
    RB->>P2: Provision replacement on Provider B
    P2->>RB: Resource Ready
    RB->>RB: Initial data copy (if stateful)
    RB->>C: Switch pointer to Provider B
    RB->>P1: Cleanup old resource
  1. Detect Change: Resource properties no longer match current provider
  2. Provision New: Create replacement on matching provider (background)
  3. Data Copy: Replicate state if stateful (databases, storage)
  4. Cutover: Switch consumer reference (atomic operation)
  5. Cleanup: Remove old resource after confirmation
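The ordering of these five steps is what keeps the consumer's reference valid throughout. The sketch below models it with in-memory objects; the `Provider` class, the `migrate` function, and the `pointer` dict are hypothetical stand-ins invented for this example and do not reflect the operator's actual code.

```python
class Provider:
    """Hypothetical in-memory stand-in for a provider backend."""
    def __init__(self, name):
        self.name = name
        self.resources = {}  # resource name -> state dict

    def provision(self, res_name):
        self.resources[res_name] = {"ready": True, "data": None}

    def delete(self, res_name):
        self.resources.pop(res_name, None)

def migrate(res_name, pointer, old, new, log):
    """Move res_name from old to new; pointer['active'] is what the consumer uses."""
    log.append("detect")            # 1. resource no longer matches `old`
    new.provision(res_name)         # 2. provision the replacement in the background
    log.append("provision")
    new.resources[res_name]["data"] = old.resources[res_name]["data"]  # 3. copy state
    log.append("copy")
    if not new.resources[res_name]["ready"]:
        raise RuntimeError("replacement not ready; consumer stays on old provider")
    pointer["active"] = new.name    # 4. cutover is a single assignment
    log.append("cutover")
    old.delete(res_name)            # 5. cleanup happens only after cutover
    log.append("cleanup")

a, b = Provider("provider-a"), Provider("provider-b")
a.provision("production-db")
a.resources["production-db"]["data"] = "rows"
pointer, log = {"active": a.name}, []
migrate("production-db", pointer, a, b, log)
print(pointer["active"])  # prints provider-b
```

Because the old resource is deleted last, a failure at any earlier step leaves the consumer pointing at a resource that still exists.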

Database Migration Example

Scenario: Move PostgreSQL from AWS RDS to Azure Database

Traditional Approach:

  • Downtime: 4-6 hours
  • Risk: Data loss, connection errors
  • Team coordination: DBAs, DevOps, App teams

With Resource Broker:

  • Downtime: 0 seconds (read-only mode: ~30 seconds)
  • Automated: Data copy → Cutover → Cleanup
  • Single action: Update a label or field

3. Migration Framework

Staged Migration Control

Controlled transitions between providers via Migration and MigrationConfiguration resources with defined stages.

Migration Stages:

stages:
1. Initial Data Copy
   ├─ Backup current data
   ├─ Replicate to new provider
   └─ Verify consistency

2. Cutover
   ├─ Enable read-only mode (brief)
   ├─ Final delta sync
   ├─ Switch active pointer
   └─ Verify application connectivity

3. Finalize
   ├─ Monitor new provider
   ├─ Cleanup old resources
   └─ Remove migration artifacts

Example MigrationConfiguration:

apiVersion: broker.platform-mesh.io/v1alpha1
kind: MigrationConfiguration
metadata:
  name: database-aws-to-azure
spec:
  from:
    group: database.example.com
    version: v1alpha1
    kind: Database
  to:
    group: azure.database.example.com
    version: v1alpha1
    kind: AzureDatabase
  stages:
  - name: initial-copy
    successConditions:
    - "status.copyProgress == 100"
    - "status.consistencyCheck == 'passed'"
    templates:
      backup-job: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: db-backup-{{ .From.Name }}
        spec:
          template:
            spec:
              restartPolicy: Never  # Job pods must use Never or OnFailure
              containers:
              - name: pg-dump
                image: postgres:15
                command: ["pg_dump", "-h", "{{ .From.Host }}"]
    progress: true  # Allow progression to next stage

  - name: cutover
    successConditions:
    - "status.ready == true"
    - "status.connections > 0"
    templates:
      switch-config: |
        # Update connection strings
        # Enable new endpoint

Success Conditions

Each stage has CEL expressions that must evaluate to true before progressing. This ensures data integrity and prevents premature cutover.
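The gating behaviour can be illustrated as follows. This sketch does not evaluate real CEL: plain Python callables stand in for the CEL expressions, and the function name `current_stage` and the dict layout are assumptions for the example. The condition contents mirror the `successConditions` in the MigrationConfiguration above.

```python
def current_stage(stages, status):
    """Return the first stage whose success conditions are not all met,
    or None once every stage has passed."""
    for stage in stages:
        if not all(cond(status) for cond in stage["successConditions"]):
            return stage["name"]
    return None

stages = [
    {"name": "initial-copy", "successConditions": [
        lambda s: s.get("copyProgress") == 100,
        lambda s: s.get("consistencyCheck") == "passed",
    ]},
    {"name": "cutover", "successConditions": [
        lambda s: s.get("ready") is True,
        lambda s: s.get("connections", 0) > 0,
    ]},
]

print(current_stage(stages, {"copyProgress": 80}))  # prints initial-copy
```

A migration whose status never satisfies a stage's conditions stays in that stage indefinitely, which is the first thing to check when a migration appears stuck.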


Custom Resources (CRDs)

AcceptAPI

Purpose

Provider capability declaration - advertises which APIs a provider can serve

Structure:

apiVersion: broker.platform-mesh.io/v1alpha1
kind: AcceptAPI
metadata:
  name: my-provider-api
  namespace: provider-workspace
spec:
  gvr:  # GroupVersionResource to accept
    group: example.com
    version: v1alpha1
    resource: databases
  filters:  # Optional resource constraints
  - key: spec.engine
    valueIn: ["postgres", "mysql"]
  - key: spec.storage
    boundary:
      min: 10
      max: 1000
status:
  conditions:
  - type: Available
    status: "True"
    reason: ProviderReady

Key Fields:

  • spec.gvr: The API this provider can serve
  • spec.filters: Optional constraints (which resources to accept)
  • status.conditions: Provider availability status

Migration

Purpose

Resource migration between providers with zero downtime

Structure:

apiVersion: broker.platform-mesh.io/v1alpha1
kind: Migration
metadata:
  name: db-migration-aws-to-azure
spec:
  from:  # Source resource
    gvk:
      group: example.com
      version: v1alpha1
      kind: Database
    name: production-db
    namespace: default
    clusterName: aws-provider
  to:  # Target resource
    gvk:
      group: example.com
      version: v1alpha1
      kind: Database
    name: production-db
    namespace: default
    clusterName: azure-provider
status:
  state: CutoverCompleted  # Pending | InitialInProgress | CutoverCompleted | Failed
  stage: finalize
  id: migration-12345
  conditions:
  - type: DataCopied
    status: "True"
  - type: CutoverComplete
    status: "True"

Key Fields:

  • spec.from: Source resource reference
  • spec.to: Target resource reference
  • status.state: Migration phase (lifecycle progression)

MigrationConfiguration

Purpose

Define HOW to migrate between two resource types (the migration playbook)

Structure:

apiVersion: broker.platform-mesh.io/v1alpha1
kind: MigrationConfiguration
metadata:
  name: database-migration-config
spec:
  from:  # Source GVK
    group: example.com
    version: v1alpha1
    kind: Database
  to:  # Target GVK
    group: azure.example.com
    version: v1alpha1
    kind: AzureDatabase
  stages:  # Ordered migration steps
  - name: initial-copy
    successConditions:  # CEL expressions
    - "status.copyProgress == 100"
    templates:  # Kubernetes resources to create
      backup-job: |
        # Job YAML template
    progress: true  # Advance to next stage on success

Key Fields:

  • spec.from/to: Source and target GVKs
  • spec.stages: Ordered migration steps with success criteria
  • spec.stages[].templates: Kubernetes resources to create per stage

Integration with Platform Mesh

KCP Workspaces

KCP APIExport/APIBinding Pattern

Resource Broker leverages KCP's APIExport/APIBinding mechanism for workspace isolation and virtual resource access.

Architecture:

Platform Workspace (root:platform-mesh)
├── AcceptAPI (APIExport) ← Providers bind this to advertise capabilities
├── Certificate (APIExport) ← Consumers bind this to request resources
└── Resource Broker Operator (sees all via Virtual Workspace)

Provider Workspace (root:providers:internal-ca)
├── APIBinding → AcceptAPI (from platform)
├── AcceptAPI CR (declares: "I serve *.internal.corp certs")
└── Compute Cluster (physical cert-manager)

Consumer Workspace (root:orgs:acme:production)
├── APIBinding → Certificate (from platform)
└── Certificate CR (requests: cert for app.internal.corp)

Benefits of this Architecture

  • No Direct Access: Consumers don't need provider kubeconfigs or credentials
  • Workspace Isolation: Providers can't see other providers or consumers
  • Virtual Workspaces: Resource Broker sees all resources through APIExport virtual workspaces
  • Multi-Tenancy: Perfect for SaaS platforms with per-customer workspaces

Virtual Workspace Magic

Request Flow Through Virtual Workspaces

1. Consumer creates Certificate
   ├─ Workspace: root:orgs:acme:production
   └─ Resource: Certificate(fqdn: app.internal.corp)

2. APIExport Virtual Workspace exposes it
   ├─ Resource Broker sees it via VirtualWorkspace
   └─ All consumers' resources visible in one view

3. Resource Broker matches filters
   ├─ InternalCA: *.internal.corp MATCH
   ├─ ExternalCA: *.corp.com NO MATCH
   └─ Routes to InternalCA

4. Provider sees resource via APIBinding
   ├─ InternalCA workspace has APIBinding to Certificate APIExport
   └─ Certificate appears in provider workspace

5. Provider fulfills request
   ├─ KRO creates cert-manager Certificate
   ├─ cert-manager issues certificate
   └─ Stores in Secret

6. Secret syncs back through Virtual Workspace
   ├─ api-syncagent syncs from provider cluster to KCP
   ├─ Secret appears in consumer workspace
   └─ Consumer sees Certificate status: Ready + Secret reference

Use Cases

1. Multi-Tenant SaaS Platforms

Scenario: Different Backends Per Customer

Problem: Different customers need different database backends based on compliance, performance, or cost requirements.

Solution with Resource Broker:

# Customer A (GDPR compliance - needs EU data residency)
apiVersion: example.com/v1alpha1
kind: Database
metadata:
  name: customer-a-db
  labels:
    region: eu-central-1
spec:
  engine: postgres
  storage: 100Gi

# Broker routes to EU-based provider automatically

Providers:

  • EU Provider: Accepts region: eu-*
  • US Provider: Accepts region: us-*
  • On-Prem Provider: Accepts deployment-model: on-premises

Customer Experience: Same API, automatic routing based on labels/fields

2. Disaster Recovery & Migration

Scenario: Provider Failover

Problem: Need to migrate databases from failing provider to healthy one with minimal downtime.

Solution:

apiVersion: broker.platform-mesh.io/v1alpha1
kind: Migration
metadata:
  name: failover-migration
spec:
  from:
    name: production-db
    clusterName: failing-provider
  to:
    name: production-db
    clusterName: healthy-provider

Timeline:

  • 00:00 - Migration started
  • 00:15 - Initial data copy complete (15 min)
  • 00:16 - Cutover (30 seconds downtime)
  • 00:20 - Finalize and cleanup
  • Total Downtime: 30 seconds (versus hours with the traditional approach)

3. Cost Optimization

Scenario: Environment-Based Routing

Problem: Want to use cheaper providers for dev/staging, expensive enterprise providers for production.

Solution:

# Dev/Staging: Free Let's Encrypt
kind: AcceptAPI
metadata:
  name: letsencrypt-ca
spec:
  gvr:
    resource: certificates
  filters:
  - key: metadata.labels.environment
    valueIn: ["dev", "staging"]

---
# Production: Paid Enterprise CA with SLA
kind: AcceptAPI
metadata:
  name: enterprise-ca
spec:
  gvr:
    resource: certificates
  filters:
  - key: metadata.labels.environment
    valueIn: ["production"]

Cost Savings: ~70% reduction by routing non-prod to free providers

4. Hybrid Cloud

Scenario: Unified API Across Cloud and On-Prem

Problem: Need to support both on-premises PKI and cloud-based certificate authorities.

Solution: Single Certificate API, multiple providers

Certificate API (Generic)
├─ InternalCA (on-prem PKI) → *.internal.corp
├─ Let's Encrypt (cloud) → *.corp.com
└─ DigiCert (enterprise) → *.public.corp.com

Developer Experience: Same kubectl apply -f certificate.yaml regardless of target provider
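One detail worth noting in the mapping above: a name such as app.public.corp.com ends with both .corp.com and .public.corp.com, so two providers could match it. This document does not state how the broker resolves such overlaps. The sketch below assumes a longest-suffix-wins policy purely for illustration; `select_provider` and the `providers` dict are hypothetical names.

```python
def select_provider(fqdn, providers):
    """Pick the provider whose suffix matches fqdn.
    Assumed tie-break: the longest (most specific) suffix wins."""
    candidates = [(suffix, name) for name, suffix in providers.items()
                  if fqdn.endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c[0]))[1]

providers = {
    "InternalCA": ".internal.corp",
    "LetsEncrypt": ".corp.com",
    "DigiCert": ".public.corp.com",
}

print(select_provider("app.public.corp.com", providers))  # prints DigiCert
```

Whatever policy the broker actually applies, overlapping filters are worth avoiding or documenting explicitly when providers are defined.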


Technical Details

Technology Stack

Component | Technology | Version
Language | Go | 1.21+
Framework | controller-runtime | v0.22+
API Machinery | k8s.io/apimachinery | v0.35+
KCP Integration | github.com/kcp-dev/kcp | Latest
Testing | Ginkgo + Gomega | -
Build | Docker + make | -

Repository Structure

resource-broker/
├── api/
│   ├── broker/v1alpha1/          # Core CRD definitions
│   │   ├── acceptapi_types.go
│   │   ├── migration_types.go
│   │   └── migrationconfiguration_types.go
│   └── example/v1alpha1/          # Example resource types
├── cmd/                           # Operator binary entry points
│   └── main.go
├── pkg/                           # Core operator logic
│   ├── controllers/               # Reconciliation loops
│   │   ├── acceptapi_controller.go
│   │   └── migration_controller.go
│   └── routing/                   # Provider selection logic
├── config/                        # Kubernetes manifests
│   ├── crd/                       # CRD YAML files
│   ├── rbac/                      # RBAC permissions
│   └── manager/                   # Operator deployment
├── examples/
│   └── kcp-certs/                 # Certificate brokering example
│       ├── README.md              # Step-by-step walkthrough
│       └── setup.sh
├── contrib/kcp/                   # KCP integration components
├── dist/chart/                    # Helm chart for deployment
└── test/                          # Test suites
    ├── e2e/                       # End-to-end tests
    └── integration/               # Integration tests

Comparison with Alternatives

vs. Crossplane

Feature | Resource Broker | Crossplane
Primary Focus | Dynamic routing & zero-downtime migration | Infrastructure provisioning (IaC)
Provider Model | KCP workspaces + APIExport | Provider packages
Migration Support | Built-in with staged migrations | Manual (requires custom logic)
Multi-Tenancy | Native (KCP integration) | Requires additional setup
Abstraction Level | High (generic APIs, filters) | Medium (compositions)
Use Case | Multi-tenant SaaS, frequent migrations | GitOps IaC, cloud provisioning

When to Choose

Resource Broker: Multi-tenant platforms, zero-downtime requirements, dynamic provider selection

Crossplane: Single-tenant, GitOps workflows, cloud infrastructure provisioning

vs. Service Catalog

Feature | Resource Broker | Service Catalog (DEPRECATED)
Status | Active development | Deprecated (archived)
Routing | Dynamic, policy-based | Static broker selection
Migration | Automated with stages | Not supported
KCP Integration | Native | None
Filter Matching | Advanced (suffix, valueIn, boundaries) | Basic plans

Deployment

Prerequisites

# Required components
- Kubernetes 1.28+
- KCP v0.30.0+
- kubectl 1.28+
- Helm 3.12+ (optional, for Helm installation)

Helm Installation

# Add Helm repository (when available)
helm repo add platform-mesh https://platform-mesh.github.io/helm-charts
helm repo update

# Install resource-broker
helm install resource-broker platform-mesh/resource-broker \
  --namespace platform-mesh-system \
  --create-namespace \
  --set image.tag=latest

# Verify installation
kubectl get deploy -n platform-mesh-system resource-broker
kubectl get crd | grep broker.platform-mesh.io

Manual Installation

# Clone repository
git clone https://github.com/platform-mesh/resource-broker.git
cd resource-broker

# Install CRDs
make install

# Deploy operator
make deploy IMG=platform-mesh/resource-broker:latest

# Verify CRDs installed
kubectl get crd | grep broker.platform-mesh.io
# Expected output:
#   acceptapis.broker.platform-mesh.io
#   migrations.broker.platform-mesh.io
#   migrationconfigurations.broker.platform-mesh.io

# Check operator running
kubectl get pods -n resource-broker-system

Configuration

Operator Environment Variables

Configure Resource Broker behavior via deployment environment variables:

env:
- name: ENABLE_MIGRATION
  value: "true"                   # Enable migration features
- name: RECONCILE_INTERVAL
  value: "30s"                    # Reconciliation frequency
- name: LEADER_ELECTION
  value: "true"                   # Enable HA leader election
- name: METRICS_ADDR
  value: ":8080"                  # Prometheus metrics endpoint
- name: HEALTH_PROBE_ADDR
  value: ":8081"                  # Health/readiness probes
- name: LOG_LEVEL
  value: "info"                   # Logging level (debug, info, warn, error)

Monitoring & Observability

Prometheus Metrics

Exposed Metrics

Resource Broker exposes Prometheus metrics on port 8080 by default:

# AcceptAPI availability
resource_broker_acceptapi_available{name="internal-ca", namespace="provider"} 1

# Active migrations
resource_broker_migrations_active{state="CutoverInProgress"} 3
resource_broker_migrations_total{state="Completed"} 45

# Provider selection timing
resource_broker_routing_duration_seconds{provider="internal-ca"} 0.05

# Reconciliation metrics
resource_broker_reconcile_duration_seconds{controller="acceptapi"} 0.12
resource_broker_reconcile_errors_total{controller="migration"} 2
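For quick checks without a full Prometheus setup, output in this text format can be parsed directly. The sketch below handles only simple `name{labels} value` lines like the samples above, not the complete exposition format. The metric names are taken from this document and have not been verified against the operator; `parse_metrics` is a hypothetical helper.

```python
import re

# Matches lines of the form: metric_name{label="value", ...} 1.23
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m:
            labels = dict(LABEL.findall(m.group("labels") or ""))
            samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

sample = '''
# Reconciliation metrics
resource_broker_reconcile_duration_seconds{controller="acceptapi"} 0.12
resource_broker_reconcile_errors_total{controller="migration"} 2
'''

failing = [labels["controller"] for name, labels, value in parse_metrics(sample)
           if name == "resource_broker_reconcile_errors_total" and value > 0]
print(failing)  # prints ['migration']
```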

Health Checks

# Liveness probe
curl http://localhost:8081/healthz
# Returns: 200 OK if operator is alive

# Readiness probe
curl http://localhost:8081/readyz
# Returns: 200 OK if operator is ready to serve requests

# Leader election status
kubectl get lease -n platform-mesh-system resource-broker

Logging

# View operator logs
kubectl logs -n platform-mesh-system deployment/resource-broker -f

# Filter for specific events
kubectl logs -n platform-mesh-system deployment/resource-broker | grep -i "routing\|migration"

# Enable debug logging
kubectl set env -n platform-mesh-system deployment/resource-broker LOG_LEVEL=debug

Troubleshooting

Common Issues

Resources Not Routed to Provider

Symptom

Consumer creates resource but it never gets fulfilled. Resource stays in Pending state.

Debugging:

# Check AcceptAPI status
kubectl get acceptapi -A
kubectl describe acceptapi internal-ca -n provider-workspace

# Verify filters match resource
kubectl get acceptapi internal-ca -n provider-workspace -o yaml | grep -A 10 filters
kubectl get certificate my-cert -o yaml | grep -A 10 spec

# Check resource-broker routing logs
kubectl logs -n platform-mesh-system deployment/resource-broker \
  | grep -i "routing\|filter\|provider selection"

Common Causes:

  • Filters don't match resource properties
  • AcceptAPI not in Available state
  • Provider workspace not bound to AcceptAPI APIExport
  • Virtual Workspace not accessible to Resource Broker

Migration Stuck in InitialInProgress

Symptom

Migration starts but never progresses past InitialInProgress state.

Debugging:

# Check migration status
kubectl get migration db-migration -o yaml

# Check stage conditions (CEL expressions)
kubectl get migration db-migration -o jsonpath='{.status.conditions}' | jq .

# Check MigrationConfiguration success conditions
kubectl get migrationconfiguration db-config -o yaml | grep -A 5 successConditions

# View migration controller logs
kubectl logs -n platform-mesh-system deployment/resource-broker \
  | grep -i "migration\|stage\|success"

Common Causes:

  • Success conditions never satisfied (CEL expression always false)
  • Backup/copy job failed
  • Target provider unavailable
  • Insufficient permissions

AcceptAPI Not Available

Symptom

AcceptAPI CR shows Available: False in conditions.

Debugging:

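# Check provider workspace APIBinding
# (APIBinding and APIExport are cluster-scoped in KCP; switch to the workspace
#  first, e.g. with the kubectl ws plugin, instead of passing -n)
kubectl get apibinding

# Verify AcceptAPI APIExport exists in the platform workspace
kubectl get apiexport acceptapi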

# Check if Resource Broker can see AcceptAPI (via Virtual Workspace)
kubectl get acceptapi --all-namespaces

# Check provider cluster connectivity
kubectl get clusters -n provider-workspace

Development

Building from Source

# Clone repository
git clone https://github.com/platform-mesh/resource-broker.git
cd resource-broker

# Build operator binary
make build

# Run unit tests
make test

# Run integration tests
make test-integration

# Run locally (against current kubeconfig)
export KUBECONFIG=~/.kube/config
make run

Running Examples

# Run KCP certificate brokering example
cd examples/kcp-certs

# Setup clusters and install components
./setup.sh

# Follow step-by-step walkthrough
cat README.md

# Cleanup
./cleanup.sh

Why Document Resource Broker Now?

Architecture Decisions:

  • Design CloudAPI with abstraction layers
  • Avoid tight coupling to specific cloud providers
  • Use resource models compatible with future brokering

API Design:

  • Include label/annotation fields for future policies
  • Support generic resource specs (not provider-specific)
  • Design for eventual provider abstraction

Example - Good CloudAPI Design (Phase 1, but broker-ready):

apiVersion: cloudapi.example.com/v1alpha1
kind: VirtualMachine
metadata:
  name: web-server
  labels:
    environment: production  # Future: Route to premium providers
    region: eu-central-1     # Future: GDPR-compliant providers only
spec:
  # Generic specs, not AWS-specific
  cpu: 4
  memory: 16Gi
  storage: 100Gi
  # Provider field optional - can be auto-selected later
  provider: aws  # Explicit now, automatic in Phase 3
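The "explicit now, automatic later" idea in this example can be sketched as a small resolution function. Everything here is hypothetical: `resolve_provider`, the policy list format, and the provider names are invented for illustration and are not part of CloudAPI or Resource Broker.

```python
def resolve_provider(resource, policies, default=None):
    """Return the provider for a resource.
    An explicit spec.provider wins; otherwise the first policy whose
    required labels all match is used; otherwise the default."""
    explicit = resource.get("spec", {}).get("provider")
    if explicit:
        return explicit
    labels = resource.get("metadata", {}).get("labels", {})
    for required_labels, provider in policies:
        if all(labels.get(k) == v for k, v in required_labels.items()):
            return provider
    return default

# Hypothetical label-based policies, evaluated in order.
policies = [
    ({"environment": "production", "region": "eu-central-1"}, "eu-premium"),
    ({"environment": "production"}, "premium"),
]

vm = {
    "metadata": {"labels": {"environment": "production", "region": "eu-central-1"}},
    "spec": {"cpu": 4, "memory": "16Gi", "storage": "100Gi"},
}
print(resolve_provider(vm, policies))  # prints eu-premium
```

Keeping the spec generic and the routing hints in labels means the same manifest works whether the provider is set by hand today or chosen by a broker later.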

References

  • Repository: https://github.com/platform-mesh/resource-broker
  • KCP Project: https://github.com/kcp-dev/kcp
  • Platform Mesh: https://github.com/platform-mesh
  • Examples: https://github.com/platform-mesh/resource-broker/tree/main/examples

Project Status

Alpha Maturity

Resource Broker is in alpha stage (v1alpha1). The API may change, and production use should be carefully evaluated.

Activity Indicators:

  • 360 commits
  • 7 open issues
  • 6 pull requests
  • Active development
  • Apache 2.0 License