Workflows: definitions, publishing, running, bindings

A workflow is the atom of the Axon system: a universal, typeless container ("Magic Box"). This manual covers how a programmer writes a definition, how it is published, how a manager binds connectors, and how an operator starts and manages a run. The source of truth is WORKFLOW-ARCHITECTURE.md, ARCHITECTURE-V6.md, and the code; where this manual disagrees with them, the canon and the code win.

1. What it is and why

A workflow is the atom (VISION invariant 2): everything in Axon is a workflow, at any scale, fractally nestable (SubFlows). Magic Box: a workflow is a universal container; there are no workflow types — the execution strategy is inferred from the config shape, and execution_mode decides whether there are children:

execution_mode = leaf — no children; runs its own steps / agent_config / planner_config (one of three leaf strategies, by config shape);
execution_mode = graph — children only; orchestrates child workflows by a DAG;
execution_mode = hybrid — pre_steps → children graph → post_steps.

Authoring vs Runtime (strictly separated): the developer writes a WorkflowSpec (authoring Pydantic, IDE autocomplete) in modules/workflows/<name>/definition.py; axon push (= the publish_definition command) splits the file into an execution payload + metadata and stores an immutable snapshot in workflow_definitions.definition. The runtime (Temporal workers, Console, API) knows nothing about the filesystem — it reads only PostgreSQL. The Workflow row (a running instance) is created later by a separate create_workflow / start_workflow — push itself runs nothing and activates no trigger/schedule/ambient.

PostgreSQL = source of truth, Temporal = orchestration. Workers are forbidden to scan folders.

Four notions that are easy to confuse:

Notion	What it is	Where it lives	Table
Definition	the blueprint of a business process (this is what a developer writes)	`modules/workflows/` → DB	`workflow_definitions`
Workflow	a running instance of the blueprint (processing a specific email)	DB + Temporal	`workflows`
Template	an anonymized snapshot for the marketplace (no private keys)	DB	`module_catalog`
Catalog Entry	a global card for a reusable definition for search/install (not the SoT for runtime)	DB	`workflow_definition_catalog`

2. Roles and access

The permissions below come from core/models/auth.py; the full matrix is in Roles-And-Permissions.md.

Action	Permission	owner	admin	manager	operator	reviewer	read_only	system
Create / edit a workflow	`create_workflow` / `edit_workflow`	✅	✅	✅	❌	❌	❌	❌
Publish a definition (`axon push`)	`publish_definition`	✅	✅	✅	❌	❌	❌	❌
Run a workflow (Run Wizard)	`start_workflow`	✅	✅	✅	✅	❌	❌	—
Pause / resume / cancel a run	`pause_workflow` / `resume_workflow` / `cancel_workflow`	✅	✅	✅	❌	❌	❌	❌
Retry / replan a run	`retry_workflow` / `replan_workflow`	✅	✅	❌	❌	❌	❌	❌
Configure a binding profile	`workflow:configure_bindings`	✅	✅	✅	❌	❌	❌	❌
Collections / labels (UI navigation)	`manage_workflow_classification`	✅	✅	❌	❌	❌	❌	❌
Catalog: install into a project	`catalog:install`	✅	✅	✅	❌	❌	❌	❌
Catalog: publish/archive	`catalog:manage`	✅	✅	❌	❌	❌	❌	❌
Replay	`replay`	✅	✅	❌	❌	❌	❌	❌
Approve / reject a step (`human_gate`, `external_write`)	`approve` / `reject_approval`	✅	✅	✅	❌	✅	❌	❌
Pin a run's binding snapshot	`workflow_run:pin_bindings`	❌	❌	❌	❌	❌	❌	✅*

* pin_workflow_run_bindings is system-only and runs as a sub-step of start_workflow (actor workflow_start_dispatcher); workflow_binding:invalidate is reserved (no command). manager has the workflow CRUD and lifecycle permissions except retry_workflow/replan_workflow and manage_workflow_classification/replay, all four of which are owner/admin only.
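The run-related rows of the table can be read as a simple lookup. The sketch below is illustrative only: the real check is server-side in core/models/auth.py, and the names `ALLOWED_ROLES` and `is_allowed` are assumptions made for this example.

```python
# Roles allowed per command, copied from the run-related rows of the table above.
ALLOWED_ROLES = {
    "start_workflow": {"owner", "admin", "manager", "operator"},
    "pause_workflow": {"owner", "admin", "manager"},
    "resume_workflow": {"owner", "admin", "manager"},
    "cancel_workflow": {"owner", "admin", "manager"},
    "retry_workflow": {"owner", "admin"},
    "replan_workflow": {"owner", "admin"},
    "replay": {"owner", "admin"},
    "approve": {"owner", "admin", "manager", "reviewer"},
}


def is_allowed(role: str, command: str) -> bool:
    """Fail closed: unknown commands and unknown roles are denied."""
    return role in ALLOWED_ROLES.get(command, set())
```

With this lookup an operator can start a run but cannot pause it, and a manager can pause a run but cannot retry it.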

The "code ↔ Console" boundary (engineer): WorkflowDefinition/WorkflowSpec, agent/prompt definitions, and custom activities are written in code (modules/workflows/, git) and deployed. Publishing (publish_definition) goes through the Console/API with a manager+ role; bindings, running, and managing runs happen in the Console. engineer is not an RBAC role (see Roles-And-Permissions.md §1).

3. Where it is in Console

The Workflows sidebar section (+ Catalog under Administration).

Screen	What's on it
Definition list	published `WorkflowDefinition`s (name, `definition_key`, version/`definition_id`, status, collections/labels filters)
Definition (detail)	steps/graph (visualized with React Flow), `ConnectorRequirement[]`, trigger, versions; buttons: open Bindings, Run Wizard
Bindings (binding profile)	the list of the definition's `ConnectorRequirement` slots → pick a connector instance for each (`Auto-map` / manual); profile status (`incomplete` / `complete` / `auto_configured`)
Run Wizard	pick a `case` (for case-aware) → `input_data` → optionally override bindings (incl. multi-credential choice) → `Create + Start`
Run list	running `workflows` (status, definition, case, step progress, cost)
Run (detail)	steps (timeline: status, duration, tokens, cost), artifacts, current step, pending approvals; actions: Pause / Resume / Cancel / Retry / Replan
Catalog	instance-wide cards for reusable definitions; `Install` into a project (latest-by-default); for admin/owner — `Publish` / `Archive`

The Console reads only read.* projections and writes via CommandEnvelope; it never touches app.* directly. Collections/labels are UI navigation only (they don't affect execution semantics).

4. Concepts (mental model)

4.1 The Step model

Each step is a Step model (core/models/workflow_definitions.py) with a unique step_key (a string). Branching is via on_success / on_failure (references by key), not by numeric indices. Terminal routing: on_success=None → the workflow completed; on_failure=None → failed. For condition: on_success = the path on True, on_failure = on False.
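The routing rule can be sketched in a few lines. The `Step` dataclass here is a stand-in for the real Pydantic model and keeps only the routing fields; `next_step_key`, `terminal_status`, and the three example steps are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """Stand-in for the real Step model; only the routing fields are kept."""
    step_key: str
    step_type: str
    on_success: Optional[str] = None
    on_failure: Optional[str] = None


def next_step_key(step: Step, succeeded: bool) -> Optional[str]:
    """Return the next step_key, or None when the route is terminal.

    For a `condition` step, `succeeded` is the boolean result of the
    expression: True follows on_success, False follows on_failure.
    """
    return step.on_success if succeeded else step.on_failure


def terminal_status(succeeded: bool) -> str:
    """Status of the run once next_step_key() has returned None."""
    return "completed" if succeeded else "failed"


# Hypothetical three-step definition, indexed by step_key.
STEPS = {
    step.step_key: step
    for step in (
        Step("check_amount", "condition", on_success="ask_manager", on_failure="wrap_up"),
        Step("ask_manager", "human_gate", on_success="wrap_up"),
        Step("wrap_up", "finalize"),
    )
}
```

Because routes are keys rather than indices, inserting a step between two others only changes the neighbours' `on_success` / `on_failure` references.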

The canonical step_type taxonomy (13):

step_type	Purpose	side_effect	where it runs
`load_context`	load the workflow/task context from the DB	none	activity-side
`retrieve_knowledge`	query the knowledge base	none	activity-side
`model_call`	an LLM call with structured output (Instructor)	none	activity-side
`tool_call`	call a registered tool/activity (or a connector — see §4.3)	varies	activity-side
`transform`	transform data: `extract` / `merge` / `format` / `map` (implemented); `compute` — planned future	none	activity-side
`validate`	validation (schema / semantic)	none	activity-side
`human_gate`	a human approval point (`assignee_role`, `deadline_seconds`, `on_timeout`, `decisions`)	none	activity-side
`external_write`	write to an external system through the policy gate	external_high	activity-side
`finalize`	finalization, preparing the result	internal	activity-side
`condition`	if/else: `on_success` (true) / `on_failure` (false)	none	workflow-side
`wait`	Temporal sleep (`duration_seconds`) and/or wait for a signal (`wait_for_signal`); zero resources while waiting	none	workflow-side
`subflow`	start a child workflow (fractal nesting); the runtime validates the pinned `definition_id`, the depth limit, propagates the budget + cancel cascade	none	workflow-side
`fan_out`	process a collection: N parallel child workflows per item (`source_key`, `child_definition`, `max_parallel`, `on_item_failure` ∈ `skip`/`fail_all`/`retry`)	none	workflow-side
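To make the `max_parallel` and `on_item_failure` options of `fan_out` concrete, here is a rough analogy in plain asyncio. It is not the runtime's implementation: the real step starts Temporal child workflows, while this sketch runs coroutines. The function signature and the `double` child are assumptions, and `retry` is left out because its attempt policy is not described here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def fan_out(
    items: Iterable[Any],
    run_child: Callable[[Any], Awaitable[Any]],
    max_parallel: int,
    on_item_failure: str = "skip",
) -> list:
    """Run one child per item, at most max_parallel at a time."""
    if on_item_failure not in ("skip", "fail_all"):
        raise ValueError(f"unsupported in this sketch: {on_item_failure}")
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(item: Any) -> Any:
        async with semaphore:  # caps the number of children in flight
            return await run_child(item)

    results = await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )
    failures = [r for r in results if isinstance(r, Exception)]
    if failures and on_item_failure == "fail_all":
        raise failures[0]
    # "skip": drop failed items and keep the rest in input order
    return [r for r in results if not isinstance(r, Exception)]


async def double(item: int) -> int:
    """Hypothetical child: fails on negative input."""
    if item < 0:
        raise ValueError("negative item")
    return item * 2


print(asyncio.run(fan_out([1, -1, 3], double, max_parallel=2)))  # [2, 6]
```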

The runtime partition (core/workflows/step_runner/types.py): the 9 activity-side step types run via the Temporal activity execute_workflow_step; the 4 workflow-side ones run inside WorkflowRunner._run_steps() via Temporal primitives. subflow is a first-class step type, not a tool_call hack.
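The partition can be written down as two sets. This is a sketch assuming the 9/4 split listed in the table; the constant and function names are illustrative and need not match core/workflows/step_runner/types.py.

```python
# The 9 step types that run inside the execute_workflow_step activity.
ACTIVITY_SIDE = frozenset({
    "load_context", "retrieve_knowledge", "model_call", "tool_call", "transform",
    "validate", "human_gate", "external_write", "finalize",
})
# The 4 step types that run inside the workflow via Temporal primitives.
WORKFLOW_SIDE = frozenset({"condition", "wait", "subflow", "fan_out"})


def execution_side(step_type: str) -> str:
    """Return "activity" or "workflow"; unknown step types are rejected."""
    if step_type in ACTIVITY_SIDE:
        return "activity"
    if step_type in WORKFLOW_SIDE:
        return "workflow"
    raise ValueError(f"unknown step_type: {step_type!r}")
```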

How the StepRunner executes a step: stagnation check (>N transitions → StagnationError; the limit comes from ExecutorProfile.max_steps, default 50) → step-type validation (executor profile) → tool allowlist check (for tool_call) → policy gate (if side_effect_class ∈ {external_low, external_high}) → budget check (if model_call) → execute the handler → handle interrupts (human_gate/external_write → awaiting_approval) → record the result (timing, tokens, cost).
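The order of the pre-execution checks is easier to see as code. The sketch below only lists which checks apply to a step, in order; it performs none of them, and it omits interrupt handling and result recording, which happen after the handler. `checks_for` and the check labels are hypothetical names.

```python
class StagnationError(RuntimeError):
    """Raised when a run exceeds the allowed number of transitions."""


EXTERNAL_CLASSES = {"external_low", "external_high"}


def checks_for(
    step_type: str,
    side_effect_class: str,
    transitions: int,
    max_steps: int = 50,  # ExecutorProfile.max_steps default
) -> list:
    """List the checks the StepRunner applies before and including the handler."""
    if transitions > max_steps:
        raise StagnationError(f"{transitions} transitions exceed max_steps={max_steps}")
    checks = ["step_type_validation"]
    if step_type == "tool_call":
        checks.append("tool_allowlist")
    if side_effect_class in EXTERNAL_CLASSES:
        checks.append("policy_gate")
    if step_type == "model_call":
        checks.append("budget_check")
    checks.append("execute_handler")
    return checks
```

For example, an `external_write` step with `external_high` passes the policy gate before its handler runs, while a `model_call` with `none` passes only the budget check.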

Interrupt lifecycle (human_gate / external_write): the handler → StepResult(interrupt_type="awaiting_approval") → the run is in awaiting_approval → waits for the Temporal signal approval_decided → on resume the StepRunner continues from the current step (not on_success) → after approve the handler re-executes → completed → routing by on_success. Approvals — see Approvals.md.

4.2 The execution_mode + leaf-strategy magic

WorkflowDefinition
  ├─ execution_mode = leaf   →  one leaf strategy (by config shape):
  │       steps:  list[Step]            — predefined steps (sequential/branching)
  │       agent_config: AgentConfig     — an AI agent (model profile, tools, system prompt)  → see Agents-And-Prompts.md
  │       planner_config: PlannerConfig — a planner builds a DAG (+ optional reviewer_config)
  │     (ambiguous combos steps+agent_config / steps+planner_config / agent_config+planner_config → reject at materialization)
  ├─ execution_mode = graph  →  children: list[ChildWorkflowSpec] + child_dependencies + graph_config
  └─ execution_mode = hybrid →  graph_config.pre_steps → children graph → post_steps
  +  connector_requirements: list[ConnectorRequirement]   — named slots (what's needed: email/crm/...)
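Inferring the leaf strategy from the config shape might look like the sketch below. It works on a plain dict rather than the real WorkflowDefinition model, and the function name and strategy labels are assumptions. Rejecting a leaf with none of the three fields is also an assumption; the text above only states that ambiguous combinations are rejected.

```python
LEAF_STRATEGY_BY_FIELD = {
    "steps": "step",
    "agent_config": "agent",
    "planner_config": "plan",
}


def infer_leaf_strategy(definition: dict) -> str:
    """Pick the leaf strategy from whichever single config field is set."""
    present = [name for name in LEAF_STRATEGY_BY_FIELD if definition.get(name)]
    if len(present) > 1:
        raise ValueError(f"ambiguous leaf config: {' + '.join(present)}")
    if not present:
        # Assumption of this sketch: an empty leaf is rejected as well.
        raise ValueError("leaf definition has no steps, agent_config or planner_config")
    return LEAF_STRATEGY_BY_FIELD[present[0]]
```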

Magic Box — 10 patterns, one WorkflowRunner: Sequential · AI-enhanced (model_call) · Parallel/DAG (graph children) · Plan-based (planner + reviewer) · Ambient/Daemon (continue_as_new — an observer shell + a separate reaction workflow) · Wait+Signal (sleep+signal — days/months) · Schedule (a Temporal Schedule) · Batch/Fan-out (child workflows + a semaphore) · Saga/Compensation (a compensation chain — see Undo-And-Compensation.md) · Human Pipeline (signals+timers per gate). No special workflow types — only combinations of step_type inside the universal container.

4.3 Connector binding in step.config (Mode A/B/B-ref/C, D27 XOR)

A connector-backed tool_call references the binding via one top-level field in config (a sibling of tool_name/tool_args). Under D27 XOR, exactly one of the following is set:

connector_requirement: Mode A (Stage 3). Authoring declares a named slot; a reviewed binding profile links it to a project-scoped connector instance.
connector_instance_id: Mode B / B-ref (Stage 1/2). Either a literal cred_<ulid> (Stage 1: connector_instance_id == credential_id) or a typed ConnectorInstanceBinding = {"$credential_ref": "input_data.<field>"} (Stage 2, Run Wizard light for multi-credential; the concrete credential is chosen at start_workflow).
connector_key: Mode C (legacy, grandfathered; new publishes are rejected under the D31 quarantine).

external_write with a connector mode is currently fail-closed (connector_mode_not_supported_for_external_write). Secrets are never in config; the runtime resolver resolves the credential via (project_id, credential_id) and decrypts the payload in worker memory — before policy/approval/audit/dispatch/side effect. Details — Connectors-Credentials.md §4.
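A publish-time check for the connector mode field could be sketched as follows. This illustrates the D27 XOR rule and the Mode C rejection only; it is not the handler's actual code. The function name, the `new_publish` flag, and the `send_email` tool in the sample configs are hypothetical.

```python
from typing import Optional

CONNECTOR_MODE_FIELDS = {
    "connector_requirement": "A",
    "connector_instance_id": "B",
    "connector_key": "C",
}


def connector_mode(config: dict, new_publish: bool = True) -> Optional[str]:
    """Return the binding mode of a tool_call config, or None for a plain tool."""
    present = [name for name in CONNECTOR_MODE_FIELDS if name in config]
    if len(present) > 1:
        raise ValueError(f"D27 XOR violated: {', '.join(present)}")
    if not present:
        return None  # no connector field: a plain tool_call
    mode = CONNECTOR_MODE_FIELDS[present[0]]
    if mode == "B" and isinstance(config["connector_instance_id"], dict):
        mode = "B-ref"  # typed {"$credential_ref": ...} instead of a literal id
    if mode == "C" and new_publish:
        raise ValueError("connector_key (Mode C) is rejected in new publishes (D31)")
    return mode


# Sample configs; the tool name and arguments are made up for the example.
MODE_A = {"tool_name": "send_email", "tool_args": {}, "connector_requirement": "email"}
MODE_B_REF = {
    "tool_name": "send_email",
    "tool_args": {},
    "connector_instance_id": {"$credential_ref": "input_data.mailbox"},
}
```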

4.4 The binding layer (D17/D36)

ConnectorRequirement[] (in the definition, written by the engineer) → WorkflowBindingProfile (app.workflow_binding_profiles; admin/manager once: slot → connector instance; review state) → on create/start_workflow the handler pins app.workflow_run_bindings_snapshot (D36, system). The definition is not mutated at start; the runtime reads the snapshot and materializes the concrete instance/credential. Three override layers at run start (D17): profile.bindings ← case.connector_overrides ← run.binding_overrides. See Connectors-Credentials.md §4.
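The three D17 layers can be pictured as a dictionary merge. The sketch assumes each layer maps a slot name to a connector instance id and reads the arrows as "is overridden by", so the run-level layer wins. The function name and the sample ids are hypothetical.

```python
from typing import Optional


def effective_bindings(
    profile_bindings: dict,
    case_overrides: Optional[dict] = None,
    run_overrides: Optional[dict] = None,
) -> dict:
    """Merge the layers; later layers replace earlier ones slot by slot."""
    return {
        **profile_bindings,
        **(case_overrides or {}),
        **(run_overrides or {}),
    }


pinned = effective_bindings(
    {"email": "cred_profile_mail", "crm": "cred_profile_crm"},
    case_overrides={"crm": "cred_case_crm"},
    run_overrides={"email": "cred_run_mail"},
)
```

Here `pinned` ends up with the run-level email instance and the case-level crm instance; a result of this kind is what gets pinned in the run snapshot.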

4.5 Case-aware workflows

A case-aware workflow is bound to a case (the carrier of a business case). The entry step is load_context with a bounded projection invariant; publish_definition rejects case_aware=True without it. Details — Cases.md, CONCEPT-CASE-WORKFLOWS.md.

5. Flows: step-by-step scenarios

Flow 1: Write and publish a definition (`engineer` → `manager+`)

(engineer, code) Create modules/workflows/<name>/ with a definition.py that exports definition: WorkflowSpec (steps, connector_requirements[], trigger, config). Optional: activities.py + __manifest__.py for local activities.
(engineer/CI) axon push → the CLI sends an HTTP POST to app-api → the router builds a CommandEnvelope of type publish_definition → the handler validates (Mode XOR D27, the case_aware invariant, connector mode fields only for tool_call, new connector_key Mode C — reject) → an immutable snapshot into workflow_definitions.definition ({execution, metadata}), a pinned definition_id (wdef_<ulid>). push does NOT create a workflows row and does NOT activate any trigger/schedule/ambient.
On the first publish, if the project already has exactly one usable connector instance per required logical type → the binding profile is auto-created (status=complete, auto_configured=true, a "Review" banner in the Console).
Alternative (no code): Install a reusable definition from the Catalog (catalog:install, manager+) — latest-by-default; a name collision → 409 + rename.

Flow 2: Configure a binding profile (`manager+`)

Console → Workflows → <definition> → Bindings.
The UI shows the ConnectorRequirement slots (name + logical type + required_actions).
Auto-map (ConnectorMapper proposes instances by type: a single candidate → auto; several → choice; none → "needs connecting") or pick a connector instance for each slot manually.
Save → set_workflow_bindings → profile status=complete. Set-once-use-many.
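The Auto-map rule (one candidate, several, none) can be sketched like this. It is a simplified model, not ConnectorMapper itself: slots map a name to a logical type, instances are plain dicts with `id` and `type`, and the status labels are invented for the example.

```python
def auto_map(slots: dict, instances: list) -> dict:
    """Propose a connector instance per slot, matching on logical type."""
    proposals = {}
    for slot_name, logical_type in slots.items():
        candidates = [i["id"] for i in instances if i["type"] == logical_type]
        if len(candidates) == 1:
            # A single candidate is mapped automatically.
            proposals[slot_name] = {"status": "auto", "instance": candidates[0]}
        elif candidates:
            # Several candidates: the manager has to choose.
            proposals[slot_name] = {"status": "choice", "candidates": candidates}
        else:
            proposals[slot_name] = {"status": "needs_connecting"}
    return proposals


proposals = auto_map(
    {"inbox": "email", "customers": "crm", "alerts": "chat"},
    [
        {"id": "inst_mail", "type": "email"},
        {"id": "inst_crm_eu", "type": "crm"},
        {"id": "inst_crm_us", "type": "crm"},
    ],
)
```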

Flow 3: Run a workflow for a case (`operator+`)

Console → Workflows → <definition> → Run Wizard.
Pick a case (for case-aware), set input_data; optionally override bindings for this run (incl. choosing the concrete credential for Mode B-ref slots).
Create + Start → the canonical path create_workflow(definition_id) → start_workflow(workflow_id) (a pinned definition_id, no latest-key fallback) → the pre-dispatch hook pins workflow_run_bindings_snapshot (D36, system) → Temporal starts the WorkflowRunner.
Execution: the StepRunner walks on_success/on_failure; on a connector step it resolves the instance → takes the credential → policy gate / approval / audit / idempotency → side effect; model_call → budget check; human_gate/external_write → awaiting_approval (the signal approval_decided).
Monitoring — Run (detail): the step timeline, tokens/cost, the current step, pending approvals.

Flow 4: Managing a run

Action	Command	Permission	Effect
Pause	`pause_workflow`	`manager+`	the run is paused; no resources spent (Temporal)
Resume	`resume_workflow`	`manager+`	continue from where it stopped
Cancel	`cancel_workflow`	`manager+`	stop the run; for already-performed side effects — compensation (if defined) or a manual runbook (Undo-And-Compensation.md)
Retry	`retry_workflow`	`owner`/`admin`	retry the failed step/run
Replan	`replan_workflow`	`owner`/`admin`	for plan-based: rebuild the DAG (planner + reviewer again)
Replay	`replay`	`owner`/`admin`	a deterministic replay (debugging/recovery)

Flow 5: Running via event/schedule/ambient

event — a connector (gmail/webhook/telegram) → app-api ingress → the canonical create_workflow/start_workflow (the reaction workflow is given by a pinned workflow_definition_id) → Temporal.
schedule — a Temporal Schedule runs the definition periodically (pattern 5.8).
ambient — an observer shell continuously monitors a source; on a condition it dispatches a reaction workflow by a pinned ID (pattern 5.6). Trigger/schedule/ambient are activated not on push but separately.

Flow 6: Publish a definition to the Catalog (`admin`/`owner`)

Console → Catalog → Publish (catalog:manage) — an instance-wide card for a reusable definition; Archive removes it. Managers see it and can Install it into their projects. See Templates-And-Catalog.md.

6. Options reference

6.1 `WorkflowDefinition` / `WorkflowSpec` (key fields)

Field	What it does
`execution_mode`	`leaf` / `graph` / `hybrid`
`steps`	`list[Step]` — for the leaf step strategy
`agent_config`	`AgentConfig` — for the leaf agent strategy (model profile, tools, system prompt)
`planner_config` (+ `reviewer_config`)	for the leaf plan strategy (the planner builds a DAG, the reviewer reviews)
`children` / `child_dependencies` / `graph_config`	for `graph`/`hybrid` (a DAG of child workflows; `graph_config.pre_steps`/`post_steps` for hybrid)
`connector_requirements`	`list[ConnectorRequirement]` — named slots (logical type + `required_actions`); default `[]`
`case_aware`	bool; if `True` — the entry step must be `load_context` with a bounded projection
metadata (`WorkflowSpec`): `name`, `trigger`, `definition_key`, `project_id` hint, `config`	`definition_key` — the authoring/listing identity (not scope authority); if not set — derived from the ASCII name; non-ASCII → fail-closed
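The fail-closed rule for `definition_key` can be illustrated with a small helper. Only the ASCII requirement comes from the table above; the slug format (lowercase, underscores) and the function name are assumptions of this sketch and may differ from what the publish handler actually produces.

```python
import re


def derive_definition_key(name: str) -> str:
    """Derive a listing key from an ASCII name; refuse anything else."""
    if not name.isascii():
        # Fail closed rather than transliterate or guess.
        raise ValueError("non-ASCII name: set definition_key explicitly")
    # Assumed slug format: lowercase, runs of other characters become "_".
    key = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    if not key:
        raise ValueError("name has no usable characters for a key")
    return key
```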

6.2 `Step` (fields)

Field	What it does
`step_key`	a unique string id; routing is by keys
`step_type`	one of 13 (see §4.1)
`config`	a dict; for `tool_call` — `tool_name`/`tool_args` + (optionally) one connector mode field (D27 XOR); for `transform` — `type`+params; for `condition` — `expression`/`language`; for `subflow` — `definition_id`; for `human_gate` — `assignee_role`/`deadline_seconds`/`on_timeout`/`decisions`; for `wait` — `duration_seconds`/`wait_for_signal`; for `fan_out` — `source_key`/`child_definition`/`max_parallel`/`on_item_failure`
`side_effect_class`	`none` / `internal` / `external_low` / `external_high` — gates policy/budget
`model_profile`	for `model_call` (e.g. `haiku`)
`timeout_seconds`	the step timeout
`on_success` / `on_failure`	references to `step_key`; `None` → terminal (`completed` / `failed`)
`compensation`	an activity for the rollback in a Saga compensation (called in reverse order for completed steps with side effects)
`fan_out`	the fan-out config (see §4.1)

6.3 Transform sub-types

extract (pick fields from a dict), merge (combine several dict sources), format (a template + data), and map (field→field) are implemented. compute (a safe expression evaluator) is planned; until it ships, compute falls through like any unknown subtype and acts as a passthrough. For computations, use condition/ConditionEvaluator or an explicitly registered activity.
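As a rough picture of the four implemented subtypes and the passthrough fallback, consider the sketch below. The parameter names (`fields`, `sources`, `template`, `mapping`) and the use of `str.format` templates are assumptions; the real handler's config keys may differ.

```python
def transform(kind: str, data: dict, **params):
    """Apply one transform subtype; unknown subtypes pass the data through."""
    if kind == "extract":
        # Pick the listed fields that exist in the input dict.
        return {key: data[key] for key in params["fields"] if key in data}
    if kind == "merge":
        # Combine several dict sources; later sources win on key clashes.
        merged = {}
        for source in params["sources"]:
            merged.update(source)
        return merged
    if kind == "format":
        # Fill a template with the input data.
        return params["template"].format(**data)
    if kind == "map":
        # Rename fields: source field -> destination field.
        return {dst: data[src] for src, dst in params["mapping"].items() if src in data}
    return data  # unknown subtype (including compute for now): passthrough
```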

6.4 Run states

running → paused (pause_workflow) / awaiting_approval (human_gate/external_write) / paused_budget (a budget interrupt) → back to running → completed / failed / cancelled. The definition is immutable throughout (pinned definition_id); changing the definition doesn't affect runs already in flight.
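The transitions above fit in a small table. The sketch encodes only the arrows stated in this section; whether a paused run can be cancelled directly, for instance, is not covered here, so it is not modelled. The names are illustrative.

```python
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})

TRANSITIONS = {
    "running": frozenset(
        {"paused", "awaiting_approval", "paused_budget"} | TERMINAL_STATES
    ),
    "paused": frozenset({"running"}),             # resume_workflow
    "awaiting_approval": frozenset({"running"}),  # approval_decided signal
    "paused_budget": frozenset({"running"}),      # after the budget is raised
}


def can_transition(current: str, target: str) -> bool:
    """True if the move is one of the arrows listed above; terminal states have none."""
    return target in TRANSITIONS.get(current, frozenset())
```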

7. Lifecycle and maintenance

Definition versions. Every publish is a new immutable snapshot + a new definition_id (wdef_<ulid>). definition_key — a stable listing id; the pinned definition_id — what's used at runtime/replay (no latest-key fallback on the runtime/replay boundary). Runs already started pin their definition_id — changing the definition doesn't touch them.
push doesn't activate side effects. Trigger/schedule/ambient are enabled separately; push only stores the definition.
Run lifecycle. pause→resume; cancel (+ compensation/runbook); retry/replan/replay — owner/admin.
Catalog. Install — latest-by-default; a name collision → 409 + rename; Archive — remove the card; install creates a project-level runnable definition.
Stagnation guard. >N transitions (ExecutorProfile.max_steps, default 50) → StagnationError (loop protection).

8. Troubleshooting

Symptom	Cause	What to do
A workflow won't start: "binding profile incomplete"	not all `ConnectorRequirement` slots are mapped to usable instances	Bindings → `Auto-map` / pick instances; make sure the instances are usable (D42) and the connectors have `Allowlist` ON (Connectors-Credentials.md §8)
`publish_definition` 422	Mode XOR (D27) violated in a connector-backed `tool_call`; or `connector_key` Mode C in a new publish (D31 quarantine); or `connector_mode_not_supported_for_external_write`; or `case_aware=True` without a `load_context` entry step	Keep exactly one connector mode field; for new code — Mode A (`connector_requirement`) or Mode B (`connector_instance_id`); move the connector call into a `tool_call`; add a `load_context` entry step for case-aware
A run stuck in `awaiting_approval`	waiting for approval, but no approver has access	assign or find an approver (a role with the `approve` permission and access to the project); see Approvals.md
A run in `paused_budget`	a budget interrupt on a `model_call`	raise/set the project budget (Budgets-And-Cost.md) → resume
`StagnationError`	a transition loop `A→B→A` longer than `max_steps`	review the routing / raise `max_steps` in the executor profile (deliberately)
A `subflow` fails "definition not found"	`config.definition_id` is not pinned / doesn't exist	give a pinned `wdef_*` (not a `definition_key`) — the runtime doesn't resolve latest-key
Catalog `Install` → 409	the definition name is already taken in the project	rename on install
`tool '<name>' not found in registry`	`tool_name` isn't resolvable via registry / generic dispatch / connector / sandbox	check the tool/activity registration or the correctness of the connector mode field

9. Constraints and invariants

A workflow is the atom (invariant 2): there are no workflow types (Magic Box); the execution strategy is by config shape, not by "type".
Undo is a fundamental right (invariant 3): every process has an inverse flow; to roll back side effects — a compensation chain / a runbook (Undo-And-Compensation.md).
PostgreSQL = SoT, Temporal = orchestration. Workers don't scan the filesystem; definitions reach the runtime only via publish_definition.
Authoring ≠ scope authority. The project_id in an authoring file is only a hint; the canonical scope comes from the CLI/API (CommandEnvelope.project_id); a mismatch is rejected.
A pinned definition_id on the runtime/replay boundary — no latest-key fallback (replay determinism).
Security and budget — before execution (invariants 4, 5): the policy gate (external_* steps) + the budget check (model_call) — before the side effect.
All mutations go via CommandEnvelope (publish_definition, create/start/pause/resume/cancel/retry/replan_workflow, binding commands, catalog commands); RBAC is server-side.
Secrets are not in the definition / profile / run snapshot — only in the encrypted credential payload; the runtime materializes them before the side effect.
subflow is a first-class step type (not a tool_call hack): budget propagation, cancel cascade, depth check.
Collections / labels are UI navigation only — they don't affect execution semantics, RBAC, or routing.

Connectors-Credentials.md — ConnectorRequirement[], connector mode fields, binding profiles, run snapshot.
Cases.md — case-aware workflows, the load_context entry step, identity keys.
Undo-And-Compensation.md — the compensation chain, cancelling/rolling back a run.
Agents-And-Prompts.md — agent_config, executor profiles, model_call vs agent execution.
Approvals.md — the human_gate/external_write interrupt lifecycle, who approves.
Templates-And-Catalog.md — Catalog publish/install, templates.
Roles-And-Permissions.md — the full RBAC matrix; First-Project-Walkthrough.md — where publish/run sit in the overall flow.
Canon: WORKFLOW-ARCHITECTURE.md §1-§5 (authoring/runtime, the Step model, the 10 patterns); ARCHITECTURE-V6.md §5-§6 (execution, graph), §11-§12 (connectors, Step Runner); CONCEPT-CASE-WORKFLOWS.md, CONCEPT-WORKFLOW-LABELS.md, CONCEPT-COMPENSATION-V2.md; core/models/workflow_definitions.py (Step), core/models/auth.py (RBAC).