Cluster

Distributed worker coordination, leader election, heartbeats, and work stealing.

The cluster package lets multiple Dispatch instances coordinate work. At any given time exactly one instance is the cluster leader; the rest are followers.

How it works

  1. Every Dispatch instance registers a cluster.Worker record at startup with its hostname, queues, and concurrency.
  2. Each worker sends a heartbeat every HeartbeatInterval (default: 10s) to update its LastSeen timestamp.
  3. Leadership is determined by AcquireLeadership, which uses optimistic locking in the store; the current leader renews its claim before it expires (the whole loop is sketched after this list).
  4. The leader periodically:
    • Fires due cron entries
    • Scans for stale workers (workers whose LastSeen is older than StaleJobThreshold)
    • Reclaims stale workers' in-flight jobs and re-enqueues them (work stealing)
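
A minimal sketch of that loop, assuming a simplified Store interface and Worker struct; these are illustrative stand-ins, not the package's actual types or method names:

import (
    "context"
    "time"
)

// Worker mirrors the record described in step 1 (fields are assumed).
type Worker struct {
    ID          string
    Hostname    string
    Queues      []string
    Concurrency int
}

// Store is a hypothetical view of the coordination store.
type Store interface {
    Register(ctx context.Context, w Worker) error
    Heartbeat(ctx context.Context, id string) error // bumps LastSeen
    AcquireLeadership(ctx context.Context, id string) (bool, error)
}

func runWorker(ctx context.Context, s Store, w Worker, hb time.Duration) error {
    // Step 1: register this instance's worker record at startup.
    if err := s.Register(ctx, w); err != nil {
        return err
    }
    t := time.NewTicker(hb) // HeartbeatInterval, default 10s
    defer t.Stop()
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-t.C:
            // Step 2: heartbeat so the leader does not consider us stale.
            if err := s.Heartbeat(ctx, w.ID); err != nil {
                return err
            }
            // Step 3: acquire or renew leadership via optimistic locking;
            // at most one instance holds it at any moment.
            if isLeader, err := s.AcquireLeadership(ctx, w.ID); err == nil && isLeader {
                // Step 4: leader-only duties (cron, stale scan, reclaim).
                runLeaderDuties(ctx, s)
            }
        }
    }
}

func runLeaderDuties(ctx context.Context, s Store) { /* see Leadership below */ }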

Worker states

State      Meaning
active     Healthy and processing jobs
draining   Finishing in-flight jobs, not accepting new ones (shutdown)
dead       Stopped responding; in-flight jobs eligible for reassignment
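
One way to picture these states is as string constants; a hypothetical encoding, since the package's actual representation is not shown here:

type WorkerState string

const (
    StateActive   WorkerState = "active"   // healthy and processing jobs
    StateDraining WorkerState = "draining" // finishing in-flight jobs, accepting no new ones
    StateDead     WorkerState = "dead"     // stopped responding; jobs eligible for reassignment
)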

Leadership

Only the leader performs:

  • Cron scheduling
  • Stale-job recovery
  • Work rebalancing

If leadership is lost mid-operation, the operation returns dispatch.ErrLeadershipLost. This is expected and safe to ignore; another worker will acquire leadership shortly.
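
In practice, leader-only code can treat the error as a signal to stand down. A sketch, assuming ErrLeadershipLost is a sentinel error comparable with errors.Is, and with fireDueCron as a hypothetical stand-in for any leader-only operation:

import (
    "context"
    "errors"

    "github.com/xraph/dispatch"
)

// fireDueCron is a hypothetical leader-only operation.
func fireDueCron(ctx context.Context) error { return nil }

func leaderTick(ctx context.Context) error {
    err := fireDueCron(ctx)
    if errors.Is(err, dispatch.ErrLeadershipLost) {
        // Expected during failover: stop leader-only work and let the
        // next leader take over; there is nothing to retry here.
        return nil
    }
    return err
}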

Kubernetes consensus

For Kubernetes deployments, use cluster/k8s which implements leader election via the Kubernetes API (client-go):

import "github.com/xraph/dispatch/cluster/k8s"

leaderElector := k8s.NewLeaderElector(kubeClient, namespace, leaseName)
eng := engine.Build(d,
    engine.WithLeaderElector(leaderElector),
)

This ensures correct behavior across pod restarts and rolling updates.

Configuration

Config field        Default   Meaning
HeartbeatInterval   10s       How often workers update LastSeen
StaleJobThreshold   30s       How long before a worker is considered dead
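
Assuming these fields live on a config struct (the struct name Config is a guess; the field names come from the table above), tuning them might look like:

import (
    "time"

    "github.com/xraph/dispatch/cluster"
)

var cfg = cluster.Config{
    HeartbeatInterval: 10 * time.Second, // lower values detect failures faster but add store traffic
    StaleJobThreshold: 30 * time.Second, // keep this several heartbeats long
}

Keeping StaleJobThreshold at a few multiples of HeartbeatInterval means a single delayed heartbeat does not get a worker declared dead and its in-flight jobs reclaimed.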
