
baldur.core.exceptions — Exception Hierarchy

Top-level re-export selection rule: a name is exported by baldur/__init__.py iff it is (a) a domain base class, or (b) a leaf class raised by code reachable from a top-level public surface. Everything else stays in baldur.core.exceptions and is reachable via the nested import path.

When-to-catch (13 top-level exceptions)

| Exception | When raised | Recovery hint |
| --- | --- | --- |
| BaldurError | Catch-all base | Last-resort logging + alert |
| AdapterError | Adapter base (Redis/SQL/Kafka misuse) | Reconnect; fall back to in-memory |
| AdapterNotFoundError | ProviderRegistry.resolve(...) finds no match | Wire the missing adapter |
| CircuitBreakerError | CB base (domain-specific subclasses) | Do NOT retry (non-retryable) |
| DLQError | DLQ base | Inspect DLQ entry; manual replay if needed |
| DLQReplayError | ReplayService.replay(...) could not complete | Mark for manual review |
| ResilienceError | Resilience-pattern base | Domain-specific fallback |
| RetryExhaustedError | All retry attempts failed | Send to DLQ; alert on SLA breach |
| TimeoutPolicyError | protect(timeout=...) exceeded | Cancel downstream; degrade |
| RateLimitExceeded | @rate_limit rejected the call | Surface 429; honor reset_at |
| IdempotencyDuplicateError | @idempotent detected a duplicate | Return cached result; do NOT retry |
| DomainValidationError | Domain-tag validation rejected the call | Caller bug: fix the domain identifier |
| ConfigurationError | Settings or wiring misconfiguration | Crash fast: operator must fix env |
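
The recovery hints above only work if handlers are ordered from most specific to most general, since `except BaldurError` would otherwise swallow every subclass. A minimal sketch of that ordering follows; the classes are local stand-ins that mirror the documented hierarchy (the real ones would be imported from baldur), and the returned strings are illustrative labels, not library values.

```python
# Local stand-ins mirroring the documented hierarchy; not the library's classes.
class BaldurError(Exception):
    pass


class ResilienceError(BaldurError):
    pass


class RetryExhaustedError(ResilienceError):
    pass


class ConfigurationError(BaldurError):
    pass


def classify(exc: Exception) -> str:
    """Return an illustrative recovery label, checking leaf classes first."""
    try:
        raise exc
    except RetryExhaustedError:
        return "send-to-dlq"
    except ResilienceError:
        return "domain-fallback"
    except ConfigurationError:
        return "crash-fast"
    except BaldurError:
        # Catch-all base goes last so it cannot shadow the handlers above.
        return "log-and-alert"


print(classify(RetryExhaustedError("gave up")))
print(classify(BaldurError("unknown")))
```

Anything that is not a `BaldurError` propagates out of `classify` unchanged, which keeps unrelated bugs from being mislabeled as library failures.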

Nested-only classes

exceptions

Baldur library exception hierarchy.

All library exceptions inherit from BaldurError. Callers can use except BaldurError to catch any library error.

Top-level re-export selection rule

A name in core/exceptions.__all__ is re-exported by baldur/__init__.py iff it is either (a) a domain base class, or (b) a leaf class raised by code reachable from a top-level public surface (protect, decorators, ...).

Re-exported at baldur top-level (12 names):

Base classes: BaldurError, AdapterError, CircuitBreakerError, DLQError, ResilienceError, ConfigurationError

Leaf classes: AdapterNotFoundError, RetryExhaustedError, TimeoutPolicyError, RateLimitExceeded, IdempotencyDuplicateError, DLQReplayError

Internal / nested-only (baldur.core.exceptions): AdapterInitializationError, AdapterConnectionError, RecoveryAdapterError, StoreError, CircuitBreakerTransitionError, InvalidStateTransitionError, DLQEntryNotFoundError, AuditError, RunbookError, SettingsValidationError, StepExecutionError, StepTimeoutError, CompensationError, ConcurrencyConflictError.

BaldurError

BaldurError(message: str = '', *, code: str = '')

Bases: Exception

Base exception for all baldur library errors.

extra_context

extra_context() -> dict[str, Any]

Return structlog-bindable context. Override in subclasses.
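
As an illustration of overriding `extra_context`, the sketch below defines a stand-in `BaldurError` with the documented constructor signature and a hypothetical subclass, `PaymentAdapterError`, that merges its own field into the base context. What the real base method returns is not shown on this page; returning the `code` is an assumption made here.

```python
from typing import Any


class BaldurError(Exception):
    """Stand-in with the documented signature; not the library's implementation."""

    def __init__(self, message: str = "", *, code: str = "") -> None:
        super().__init__(message)
        self.message = message
        self.code = code

    def extra_context(self) -> dict[str, Any]:
        # Assumed base behavior: expose only the error code.
        return {"code": self.code}


class PaymentAdapterError(BaldurError):
    """Hypothetical subclass that adds a provider name to the log context."""

    def __init__(self, message: str = "", *, provider: str = "", code: str = "") -> None:
        super().__init__(message, code=code)
        self.provider = provider

    def extra_context(self) -> dict[str, Any]:
        # Extend rather than replace what the base class contributes.
        return {**super().extra_context(), "provider": self.provider}


err = PaymentAdapterError("charge failed", provider="acme", code="PAY-001")
print(err.extra_context())
```

The resulting dictionary contains only plain keys and values, so it could be passed as keyword arguments to a structlog `bind(...)` call.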

AdapterError

AdapterError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for adapter-related errors.

AdapterNotFoundError

AdapterNotFoundError(
    message: str = "",
    *,
    adapter_type: str = "",
    adapter_name: str = "",
    code: str = ""
)

Bases: AdapterError

Raised when a requested adapter is not registered in ProviderRegistry.

AdapterInitializationError

AdapterInitializationError(
    message: str = "", *, code: str = ""
)

Bases: AdapterError

Raised when an adapter fails to initialize.

AdapterConnectionError

AdapterConnectionError(
    message: str = "", *, code: str = ""
)

Bases: AdapterError

Raised when an external system connection fails.

RecoveryAdapterError

RecoveryAdapterError(
    message: str = "",
    *,
    service_name: str = "",
    replicas: int | None = None,
    namespace: str = "",
    code: str = ""
)

Bases: AdapterError

Raised when the recovery adapter encounters an error.

Used for input validation and workload detection failures.

StoreError

StoreError(message: str = '', *, code: str = '')

Bases: AdapterError

Base exception for domain state store errors.

Used by domain-specific stores (ConfigHistoryStore, CanaryRolloutStore, ChaosExperimentStore, CrossClusterStore) for data corruption or store-level logic errors.

Infrastructure failures (Redis down) are handled transparently by ResilientStorageBackend's silent fallback — this exception covers cases where the store itself encounters an unrecoverable problem (e.g. unparseable data, schema mismatch).

CircuitBreakerError

CircuitBreakerError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for circuit breaker errors.

CircuitBreakerTransitionError

CircuitBreakerTransitionError(
    message: str = "", *, code: str = ""
)

Bases: CircuitBreakerError

Raised when a circuit breaker state transition fails.

InvalidStateTransitionError

InvalidStateTransitionError(
    message: str = "",
    *,
    current: str = "",
    target: str = "",
    entity_id: str = "",
    code: str = ""
)

Bases: BaldurError

Raised when an invalid state transition is attempted.

Follows the same pattern as CircuitBreakerTransitionError but is domain-agnostic (Recovery Session, etc.).

DLQError

DLQError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for DLQ (Dead Letter Queue) errors.

DLQEntryNotFoundError

DLQEntryNotFoundError(message: str = '', *, code: str = '')

Bases: DLQError

Raised when a DLQ entry is not found.

DLQStateConflictError

DLQStateConflictError(message: str = '', *, code: str = '')

Bases: DLQError

Raised when a DLQ operation violates an entry's state precondition.

Covers resolved/archived/at-cap/not-in-replayable-state conflicts (e.g. a double-click force-redrive, or a retry of an already-resolved entry). Maps to HTTP 409 Conflict at the handler layer, distinct from a not-found (404) or an unexpected replay-execution failure (500).
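
The status-code mapping described above could be sketched as follows. The exception classes are local stand-ins for the documented DLQ hierarchy, and `http_status_for` is a hypothetical helper, not part of the library's handler layer.

```python
# Local stand-ins mirroring the documented DLQ hierarchy.
class BaldurError(Exception):
    pass


class DLQError(BaldurError):
    pass


class DLQEntryNotFoundError(DLQError):
    pass


class DLQStateConflictError(DLQError):
    pass


class DLQReplayError(DLQError):
    pass


def http_status_for(exc: DLQError) -> int:
    """Hypothetical helper: translate a DLQ error into an HTTP status code."""
    if isinstance(exc, DLQEntryNotFoundError):
        return 404  # entry does not exist
    if isinstance(exc, DLQStateConflictError):
        return 409  # entry exists but is in the wrong state
    return 500  # unexpected replay-execution failure or other DLQ error


print(http_status_for(DLQStateConflictError("already resolved")))
```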

DLQReplayError

DLQReplayError(message: str = '', *, code: str = '')

Bases: DLQError

Raised when a DLQ replay operation fails.

ResilienceError

ResilienceError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for resilience pattern errors (bulkhead, hedging, retry).

RetryExhaustedError

RetryExhaustedError(message: str = '', *, code: str = '')

Bases: ResilienceError

Raised when all retry attempts are exhausted.

TimeoutPolicyError

TimeoutPolicyError(
    timeout_seconds: float, message: str = ""
)

Bases: ResilienceError

Raised when a protected call exceeds its timeout budget.

RateLimitExceeded

RateLimitExceeded(
    message: str = "",
    *,
    key: str = "",
    limit: int = 0,
    window_seconds: int = 0,
    reset_at: int = 0
)

Bases: ResilienceError

Raised by @rate_limit when a call is rejected by the limiter.

Function-level rejection signal — distinct from RateLimitStorageError (storage-backend failure).
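
One way to surface a rejection as an HTTP 429 is sketched below. It assumes `reset_at` is a Unix timestamp in seconds, which this page does not state; the stand-in class copies the documented constructor and stores each argument as an attribute, and `to_http_response` is a hypothetical helper.

```python
import time
from typing import Optional


class RateLimitExceeded(Exception):
    """Stand-in with the documented constructor; attribute storage is assumed."""

    def __init__(self, message: str = "", *, key: str = "", limit: int = 0,
                 window_seconds: int = 0, reset_at: int = 0) -> None:
        super().__init__(message)
        self.key = key
        self.limit = limit
        self.window_seconds = window_seconds
        self.reset_at = reset_at


def to_http_response(exc: RateLimitExceeded,
                     now: Optional[int] = None) -> tuple[int, dict[str, str]]:
    """Hypothetical helper: build a 429 status plus a Retry-After header."""
    current = int(time.time()) if now is None else now
    # Assumption: reset_at is epoch seconds; never report a negative wait.
    retry_after = max(0, exc.reset_at - current)
    return 429, {"Retry-After": str(retry_after)}


status, headers = to_http_response(
    RateLimitExceeded("too many calls", key="user:1", limit=10, reset_at=1030),
    now=1000,
)
print(status, headers)
```

Passing `now` explicitly keeps the helper deterministic in tests; production code would leave it unset.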

IdempotencyDuplicateError

IdempotencyDuplicateError(
    message: str = "",
    *,
    key: str = "",
    domain: str = "",
    decision: str = ""
)

Bases: BaldurError

Raised by @idempotent on a detected duplicate or in-flight collision.

Inherits BaldurError directly (correctness contract, not a resilience stage). Non-retryable by default — outer @dlq_protect layers should treat this as a terminal signal.

IdempotencyUnavailableError

IdempotencyUnavailableError(
    message: str = "", *, key: str = "", error: str = ""
)

Bases: BaldurError

Raised when an idempotency check cannot complete due to a cache I/O failure (e.g. Redis unreachable) on an enabled, explicitly-requested gate.

Distinct from IdempotencyDuplicateError (a successful dedup verdict): this signals the verdict is unknown, so the caller can assume neither "safe to skip" nor "safe to run". Fail-closed by default — opt into fail-open via BALDUR_IDEMPOTENCY_FAIL_OPEN_ON_CACHE_ERROR or the per-call idempotency_fail_open=True. Wraps the original cache exception (raised from it) so a backend-specific error never leaks across the boundary.
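
The wrapping behavior can be illustrated with a minimal sketch. `FlakyCache` and `check_duplicate` are hypothetical names, the `fail_open` flag stands in for the documented opt-in, and the stand-in exception simply stores its documented arguments as attributes.

```python
class IdempotencyUnavailableError(Exception):
    """Stand-in with the documented constructor; attribute storage is assumed."""

    def __init__(self, message: str = "", *, key: str = "", error: str = "") -> None:
        super().__init__(message)
        self.key = key
        self.error = error


class FlakyCache:
    """Hypothetical cache whose backend is unreachable."""

    def get(self, key: str):
        raise ConnectionError("redis unreachable")


def check_duplicate(cache, key: str, *, fail_open: bool = False) -> bool:
    """Return True if the key was already seen; fail closed on cache errors."""
    try:
        return cache.get(key) is not None
    except Exception as exc:
        if fail_open:
            # Verdict unknown; the caller opted in to proceeding anyway.
            return False
        # "from exc" keeps the backend error as __cause__ without exposing
        # its type to callers that only catch the library exception.
        raise IdempotencyUnavailableError(
            "idempotency check could not complete", key=key, error=str(exc)
        ) from exc


try:
    check_duplicate(FlakyCache(), "order-42")
except IdempotencyUnavailableError as err:
    print(type(err.__cause__).__name__, err.key)
```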

DomainValidationError

DomainValidationError(
    message: str = "",
    *,
    original_domain: str = "",
    reason: Any = None
)

Bases: BaldurError

Raised when a domain input string fails validation.

Carries the original (pre-normalization) input and a typed reject reason for downstream logging / metric labelling.

Modeled on RecoveryAdapterError: raised at validation sites that have a loud failure mode (decoration-time, where a CI/dev surface can recover via test or rename). Runtime APIs catch this and fall back to FALLBACK_DOMAIN instead of propagating.
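
The split between loud decoration-time failures and quiet runtime fallback might look like the sketch below. The validation rule, the reason string, the helper names, and the value of `FALLBACK_DOMAIN` are all placeholders; only the exception's constructor shape comes from this page.

```python
import re
from typing import Any

FALLBACK_DOMAIN = "default"  # placeholder; the library's actual value is not shown here


class DomainValidationError(Exception):
    """Stand-in with the documented constructor; attribute storage is assumed."""

    def __init__(self, message: str = "", *, original_domain: str = "",
                 reason: Any = None) -> None:
        super().__init__(message)
        self.original_domain = original_domain
        self.reason = reason


_DOMAIN_RE = re.compile(r"[a-z][a-z0-9_]*")  # hypothetical validation rule


def validate_domain(raw: str) -> str:
    """Loud path: normalize, then raise with the pre-normalization input."""
    normalized = raw.strip().lower()
    if not _DOMAIN_RE.fullmatch(normalized):
        raise DomainValidationError(
            f"invalid domain: {raw!r}", original_domain=raw, reason="bad_format"
        )
    return normalized


def resolve_domain_at_runtime(raw: str) -> str:
    """Quiet path: fall back instead of propagating the validation error."""
    try:
        return validate_domain(raw)
    except DomainValidationError:
        return FALLBACK_DOMAIN


print(validate_domain(" Payments "))
print(resolve_domain_at_runtime("bad domain!"))
```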

AuditError

AuditError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for audit-related errors (cascade, WAL, mmap buffer).

RunbookError

RunbookError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for runbook errors.

ConfigurationError

ConfigurationError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for configuration and settings errors.

SettingsValidationError

SettingsValidationError(
    message: str = "", *, code: str = ""
)

Bases: ConfigurationError

Raised when settings validation fails.

StepExecutionError

StepExecutionError(message: str = '', *, code: str = '')

Bases: BaldurError

Base exception for step execution engine errors.

Shared by Saga, Runbook, and other step-based execution engines. Domain-specific subclasses live in each service module.

StepTimeoutError

StepTimeoutError(
    step_type: str | float = "",
    timeout_seconds: float | int = 0,
    message: str = "",
)

Bases: StepExecutionError

Step execution timed out.

Raised by TimeoutExecutor when a handler does not complete within timeout_seconds.

Supported call conventions:

    StepTimeoutError(step_type="X", timeout_seconds=30)  # keyword
    StepTimeoutError("X", 30)                            # positional (legacy compat)
    StepTimeoutError(timeout_seconds=30)                 # timeout only (TimeoutExecutor)
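
A constructor that accepts each of these call forms could be as simple as the sketch below. It is a stand-in that copies the documented parameter order; the default message format and the attribute names are assumptions.

```python
from typing import Union


class StepTimeoutError(Exception):
    """Stand-in that copies the documented parameter order."""

    def __init__(self, step_type: Union[str, float] = "",
                 timeout_seconds: Union[float, int] = 0,
                 message: str = "") -> None:
        self.step_type = step_type
        self.timeout_seconds = timeout_seconds
        # Assumed default message; the library's wording may differ.
        super().__init__(
            message or f"step {step_type!r} timed out after {timeout_seconds}s"
        )


by_keyword = StepTimeoutError(step_type="X", timeout_seconds=30)
by_position = StepTimeoutError("X", 30)
timeout_only = StepTimeoutError(timeout_seconds=30)

print(by_keyword.step_type, by_position.timeout_seconds, timeout_only.step_type == "")
```

Because both parameters have defaults and are not keyword-only, the keyword and legacy positional forms resolve to the same attributes.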

CompensationError

CompensationError(message: str = '', *, code: str = '')

Bases: StepExecutionError

Raised when step compensation fails.

Domain-specific subclasses (SagaCompensationError, RunbookCompensationError) add service-specific extra_context.

ConcurrencyConflictError

ConcurrencyConflictError(
    message: str = "",
    *,
    entity_id: str = "",
    expected_version: int = 0,
    actual_version: int = 0
)

Bases: BaldurError

Raised on optimistic concurrency control (OCC) conflicts.

Covers version CAS failures in Saga, Runbook, and other engines.
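
A version compare-and-set that raises this exception could be sketched as below. `VersionedStore` is a hypothetical in-memory store, not one of the library's engines, and the stand-in exception stores its documented arguments as attributes.

```python
class ConcurrencyConflictError(Exception):
    """Stand-in with the documented constructor; attribute storage is assumed."""

    def __init__(self, message: str = "", *, entity_id: str = "",
                 expected_version: int = 0, actual_version: int = 0) -> None:
        super().__init__(message)
        self.entity_id = entity_id
        self.expected_version = expected_version
        self.actual_version = actual_version


class VersionedStore:
    """Hypothetical in-memory store with compare-and-set on a version number."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, str]] = {}

    def read(self, entity_id: str) -> tuple[int, str]:
        return self._rows.get(entity_id, (0, ""))

    def write(self, entity_id: str, value: str, *, expected_version: int) -> int:
        actual, _ = self.read(entity_id)
        if actual != expected_version:
            # Another writer got there first; report both versions.
            raise ConcurrencyConflictError(
                "version mismatch", entity_id=entity_id,
                expected_version=expected_version, actual_version=actual,
            )
        self._rows[entity_id] = (actual + 1, value)
        return actual + 1


store = VersionedStore()
v1 = store.write("saga-1", "started", expected_version=0)

conflict = None
try:
    store.write("saga-1", "stale update", expected_version=0)
except ConcurrencyConflictError as exc:
    conflict = exc

print(v1, conflict.expected_version, conflict.actual_version)
```

A caller would typically re-read the entity, reapply its change to the fresh state, and write again with the new version.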

non_retryable_exceptions

non_retryable_exceptions() -> tuple[type[Exception], ...]

Exceptions that must never be retried.

CircuitBreakerError: an OPEN circuit breaker means 'stop sending traffic', so retrying defeats circuit breaker semantics. This follows the convention used by Hystrix, Resilience4j, and Polly.
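
A retry loop can consult this tuple before deciding whether to try again. In the sketch below, `non_retryable_exceptions` is a stand-in that returns only the class named above (the real function may return more), and `call_with_retry` is a hypothetical helper rather than the library's retry policy.

```python
# Local stand-ins mirroring the documented hierarchy.
class BaldurError(Exception):
    pass


class CircuitBreakerError(BaldurError):
    pass


def non_retryable_exceptions() -> tuple[type[Exception], ...]:
    # Stand-in: only the class named in the description above.
    return (CircuitBreakerError,)


def call_with_retry(fn, attempts: int = 3):
    """Hypothetical helper: retry fn, but re-raise non-retryable errors at once."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except non_retryable_exceptions():
            raise  # breaker is open: stop sending traffic
        except Exception as exc:
            last_error = exc  # transient failure: try again
    raise last_error


calls = {"n": 0}


def behind_open_breaker():
    calls["n"] += 1
    raise CircuitBreakerError("breaker is OPEN")


try:
    call_with_retry(behind_open_breaker)
except CircuitBreakerError:
    print("attempts made:", calls["n"])
```

The non-retryable handler must come before the generic `except Exception`, otherwise the breaker error would be treated as transient and retried.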