Workflows: definitions, publishing, running, bindings
A workflow is the atom of the Axon system: a universal, typeless container ("Magic Box"). This manual covers how a programmer writes a definition, how it is published, how a manager binds connectors, and how an operator runs a process and manages a run. The sources of truth are WORKFLOW-ARCHITECTURE.md, ARCHITECTURE-V6.md, and the code; on disagreement, the canon/code wins.
1. What it is and why
A workflow is the atom (VISION invariant 2): everything in Axon is a workflow, at any scale, fractally nestable (SubFlows). Magic Box: a workflow is a universal container; there are no workflow types — the execution strategy is inferred from the config shape, and execution_mode decides whether there are children:
- execution_mode = leaf: no children; runs its own steps / agent_config / planner_config (one of three leaf strategies, chosen by config shape);
- execution_mode = graph: children only; orchestrates child workflows as a DAG;
- execution_mode = hybrid: pre_steps → children graph → post_steps.
Authoring vs Runtime (strictly separated): the developer writes a WorkflowSpec (an authoring Pydantic model, with IDE autocomplete) in modules/workflows/<name>/definition.py; axon push (= the publish_definition command) splits the file into an execution payload + metadata and stores an immutable snapshot in workflow_definitions.definition. The runtime (Temporal workers, Console, API) knows nothing about the filesystem; it reads only PostgreSQL. The Workflow row (a running instance) is created later by separate create_workflow / start_workflow commands; push itself runs nothing and activates no trigger, schedule, or ambient.
PostgreSQL = source of truth, Temporal = orchestration. Workers are forbidden to scan folders.
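A minimal sketch of the publish step, assuming a WorkflowSpec modeled as a plain dict. publish_snapshot, METADATA_KEYS, and the way fields are divided between execution and metadata are hypothetical; the real handler, its field partition, and its ULID generator are not shown in this manual. The point is the shape: one publish yields one immutable {execution, metadata} snapshot under a fresh wdef_ id, and nothing starts running.

```python
import json
import secrets

# Hypothetical split: these top-level WorkflowSpec keys are treated as metadata
# (the metadata fields listed in §6.1); everything else is the execution payload.
METADATA_KEYS = frozenset({"name", "trigger", "definition_key", "project_id", "config"})


def publish_snapshot(spec: dict) -> tuple[str, str]:
    """Return (definition_id, snapshot_json) for one publish.

    Nothing is started here: no workflows row, trigger, schedule or ambient.
    """
    execution = {k: v for k, v in spec.items() if k not in METADATA_KEYS}
    metadata = {k: v for k, v in spec.items() if k in METADATA_KEYS}
    # A JSON string is immutable, so later edits to `spec` cannot alter it.
    snapshot = json.dumps({"execution": execution, "metadata": metadata}, sort_keys=True)
    # Stand-in for wdef_<ulid>: 26 random hex chars (a real ULID is time-sortable).
    definition_id = "wdef_" + secrets.token_hex(13)
    return definition_id, snapshot


spec = {
    "name": "Inbox triage",
    "execution_mode": "leaf",
    "steps": [{"step_key": "load", "step_type": "load_context"}],
}
definition_id, snapshot = publish_snapshot(spec)
spec["steps"].clear()  # editing the source afterwards does not affect the snapshot
assert json.loads(snapshot)["execution"]["steps"][0]["step_key"] == "load"
```

Every call produces a new id, which matches the rule that each publish is a new immutable version.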
Four notions not to confuse:
| Notion | What it is | Where it lives | Table |
|---|---|---|---|
| Definition | the blueprint of a business process (this is what a developer writes) | modules/workflows/ → DB | workflow_definitions |
| Workflow | a running instance of the blueprint (e.g. processing a specific email) | DB + Temporal | workflows |
| Template | an anonymized snapshot for the marketplace (no private keys) | DB | module_catalog |
| Catalog Entry | a global card for a reusable definition, used for search/install (not the SoT for runtime) | DB | workflow_definition_catalog |
2. Roles and access
Source: core/models/auth.py. For the full matrix, see Roles-And-Permissions.md.
| Action | Permission | owner | admin | manager | operator | reviewer | read_only | system |
|---|---|---|---|---|---|---|---|---|
| Create / edit a workflow | create_workflow / edit_workflow | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Publish a definition (axon push) | publish_definition | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Run a workflow (Run Wizard) | start_workflow | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | — |
| Pause / resume / cancel a run | pause_workflow / resume_workflow / cancel_workflow | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Retry / replan a run | retry_workflow / replan_workflow | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Configure a binding profile | workflow:configure_bindings | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Collections / labels (UI navigation) | manage_workflow_classification | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Catalog: install into a project | catalog:install | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Catalog: publish/archive | catalog:manage | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Replay | replay | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Approve / reject a step (human_gate, external_write) | approve / reject_approval | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Pin a run's binding snapshot | workflow_run:pin_bindings | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅* |
(*) pin_workflow_run_bindings is system-only: it runs as a sub-step of start_workflow (actor workflow_start_dispatcher). workflow_binding:invalidate is reserved (no command). manager has workflow CRUD/lifecycle permissions except retry_workflow / replan_workflow (owner/admin only) and manage_workflow_classification / replay (also owner/admin only).
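The matrix reads naturally as a permission → roles lookup. A minimal sketch covering a few rows transcribed from the table above; PERMISSION_ROLES and is_allowed are hypothetical names, the real check is enforced server-side from core/models/auth.py, and denying anything not listed is a fail-closed choice made for this sketch.

```python
# Permission -> roles allowed, transcribed from a few rows of the §2 table.
# Hypothetical structure; the authoritative check is server-side (core/models/auth.py).
PERMISSION_ROLES: dict[str, frozenset[str]] = {
    "publish_definition": frozenset({"owner", "admin", "manager"}),
    "start_workflow": frozenset({"owner", "admin", "manager", "operator"}),
    "cancel_workflow": frozenset({"owner", "admin", "manager"}),
    "retry_workflow": frozenset({"owner", "admin"}),
    "replay": frozenset({"owner", "admin"}),
    "approve": frozenset({"owner", "admin", "manager", "reviewer"}),
    "workflow_run:pin_bindings": frozenset({"system"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Fail-closed lookup: unknown permissions or roles are denied."""
    return role in PERMISSION_ROLES.get(permission, frozenset())


assert is_allowed("operator", "start_workflow")
assert not is_allowed("manager", "retry_workflow")  # owner/admin only
assert is_allowed("reviewer", "approve")
assert not is_allowed("owner", "workflow_run:pin_bindings")  # system-only
```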
The "code ↔ Console" boundary (engineer): WorkflowDefinition/WorkflowSpec, agent/prompt definitions, and custom activities are written in code (modules/workflows/, git) and deployed; publishing (publish_definition) goes through the Console/API and requires a manager+ role; bindings, running, and managing runs happen in the Console. "engineer" is not an RBAC role (see Roles-And-Permissions.md §1).
3. Where it is in the Console
The Workflows sidebar section (+ Catalog under Administration).
| Screen | What's on it |
|---|---|
| Definition list | published WorkflowDefinitions (name, definition_key, version/definition_id, status, collections/labels filters) |
| Definition (detail) | steps/graph (visualized with React Flow), ConnectorRequirement[], trigger, versions; buttons: open Bindings, Run Wizard |
| Bindings (binding profile) | the list of the definition's ConnectorRequirement slots → pick a connector instance for each (Auto-map / manual); profile status (incomplete / complete / auto_configured) |
| Run Wizard | pick a case (for case-aware) → input_data → optionally override bindings (incl. multi-credential choice) → Create + Start |
| Run list | running workflows (status, definition, case, step progress, cost) |
| Run (detail) | steps (timeline: status, duration, tokens, cost), artifacts, current step, pending approvals; actions: Pause / Resume / Cancel / Retry / Replan |
| Catalog | instance-wide cards for reusable definitions; Install into a project (latest-by-default); for admin/owner — Publish / Archive |
The Console reads only read.* projections and writes via CommandEnvelope; it never touches app.* directly. Collections/labels are UI navigation only (they don't affect execution semantics).
4. Concepts (mental model)
4.1 The Step model
Each step is a Step model (core/models/workflow_definitions.py) with a unique step_key (a string). Branching is via on_success / on_failure (references by key), not by numeric indices. Terminal routing: on_success=None → the workflow completed; on_failure=None → failed. For condition: on_success = the path on True, on_failure = on False.
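Routing by key can be sketched as a small walker. This is an illustration under stated assumptions, not the StepRunner: Step is a stripped-down stand-in, the first step is taken as the entry step, execute is a hypothetical callback returning success (or, for a condition, the evaluated boolean), and the stagnation guard counts executed steps against max_steps (default 50, as for ExecutorProfile.max_steps); the real counting rule may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class StagnationError(RuntimeError):
    """Raised when a run exceeds its transition budget (loop protection)."""


@dataclass(frozen=True)
class Step:
    step_key: str
    on_success: Optional[str] = None  # None here: success ends the run as completed
    on_failure: Optional[str] = None  # None here: failure ends the run as failed


def walk(steps: list[Step], execute: Callable[[Step], bool],
         max_steps: int = 50) -> tuple[str, list[str]]:
    """Route by step_key from the first step; return (final_status, visited_keys).

    For a condition step, `execute` returns the evaluated boolean, so
    on_success is the True branch and on_failure the False branch.
    """
    by_key = {s.step_key: s for s in steps}
    current = steps[0]  # assumption: the first step is the entry step
    visited: list[str] = []
    while True:
        if len(visited) >= max_steps:
            raise StagnationError(f"more than {max_steps} transitions")
        visited.append(current.step_key)
        ok = execute(current)
        next_key = current.on_success if ok else current.on_failure
        if next_key is None:
            return ("completed" if ok else "failed"), visited
        current = by_key[next_key]


steps = [
    Step("is_invoice", on_success="extract", on_failure="archive"),
    Step("extract"),
    Step("archive"),
]
print(walk(steps, lambda s: True))  # ('completed', ['is_invoice', 'extract'])
```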
The canonical step_type taxonomy (13):
| step_type | Purpose | side_effect_class | Where it runs |
|---|---|---|---|
| load_context | load the workflow/task context from the DB | none | activity-side |
| retrieve_knowledge | query the knowledge base | none | activity-side |
| model_call | an LLM call with structured output (Instructor) | none | activity-side |
| tool_call | call a registered tool/activity (or a connector; see §4.3) | varies | activity-side |
| transform | transform data: extract / merge / format / map (implemented); compute is planned (future) | none | activity-side |
| validate | validation (schema / semantic) | none | activity-side |
| human_gate | a human approval point (assignee_role, deadline_seconds, on_timeout, decisions) | none | activity-side |
| external_write | write to an external system through the policy gate | external_high | activity-side |
| finalize | finalization, preparing the result | internal | activity-side |
| condition | if/else: on_success (true) / on_failure (false) | none | workflow-side |
| wait | Temporal sleep (duration_seconds) and/or wait for a signal (wait_for_signal); consumes no resources while waiting | none | workflow-side |
| subflow | start a child workflow (fractal nesting); the runtime validates the pinned definition_id and the depth limit, and propagates the budget and the cancel cascade | none | workflow-side |
| fan_out | process a collection: N parallel child workflows, one per item (source_key, child_definition, max_parallel, on_item_failure ∈ skip/fail_all/retry) | none | workflow-side |
The runtime partition (core/workflows/step_runner/types.py): the 9 activity-side step types run via the Temporal activity execute_workflow_step; the 4 workflow-side ones run inside WorkflowRunner._run_steps() via Temporal primitives. subflow is a first-class step type, not a tool_call hack.
How the StepRunner executes a step: stagnation check (>N transitions → StagnationError; the limit comes from ExecutorProfile.max_steps, default 50) → step-type validation (executor profile) → tool allowlist check (for tool_call) → policy gate (if side_effect_class ∈ {external_low, external_high}) → budget check (if model_call) → execute the handler → handle interrupts (human_gate/external_write → awaiting_approval) → record the result (timing, tokens, cost).
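The order of these checks can be written down as data. A minimal sketch, assuming a hypothetical StepInfo with only step_type and side_effect_class; the stage names are labels invented for the sketch, and the interrupt stage reflects the first pass of a human_gate / external_write step only.

```python
from dataclasses import dataclass

EXTERNAL = {"external_low", "external_high"}
INTERRUPTING = {"human_gate", "external_write"}


@dataclass(frozen=True)
class StepInfo:  # hypothetical minimal view of a Step
    step_type: str
    side_effect_class: str = "none"


def pipeline(step: StepInfo) -> list[str]:
    """Ordered stages the StepRunner applies to one step, per the description above."""
    stages = ["stagnation_check", "step_type_validation"]
    if step.step_type == "tool_call":
        stages.append("tool_allowlist_check")
    if step.side_effect_class in EXTERNAL:
        stages.append("policy_gate")
    if step.step_type == "model_call":
        stages.append("budget_check")
    stages.append("execute_handler")
    if step.step_type in INTERRUPTING:
        stages.append("interrupt_awaiting_approval")  # first pass only
    stages.append("record_result")
    return stages
```

Every guard sits before execute_handler, which is the "security and budget before execution" invariant in miniature.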
Interrupt lifecycle (human_gate / external_write): the handler → StepResult(interrupt_type="awaiting_approval") → the run is in awaiting_approval → waits for the Temporal signal approval_decided → on resume the StepRunner continues from the current step (not on_success) → after approve the handler re-executes → completed → routing by on_success. Approvals — see Approvals.md.
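The resume-at-the-current-step rule is the subtle part of the interrupt lifecycle. A minimal in-memory sketch, assuming a hypothetical Run record and treating the approval_decided signal as a plain function call; how a rejection is handled is not specified here, so the sketch simply marks the run failed (see Approvals.md for the real semantics).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    current: str
    status: str = "running"
    decision: Optional[str] = None  # filled in by the approval_decided signal


def run_gate_step(run: Run, on_success: Optional[str]) -> None:
    """One pass of a human_gate / external_write step (sketch)."""
    if run.decision is None:
        run.status = "awaiting_approval"  # interrupt: run.current is NOT advanced
        return
    decision, run.decision = run.decision, None  # consume it for this step only
    if decision != "approve":
        run.status = "failed"  # illustrative only; see Approvals.md for reject
        return
    # Approved: a real runner would re-execute the handler here,
    # then route by on_success.
    if on_success is None:
        run.status = "completed"
    else:
        run.status, run.current = "running", on_success


def approval_decided(run: Run, decision: str) -> None:
    """Signal handler: record the decision; the runner resumes at run.current."""
    run.decision = decision
    run.status = "running"


run = Run(current="send_reply")
run_gate_step(run, on_success="finalize")  # awaiting_approval, still at send_reply
approval_decided(run, "approve")
run_gate_step(run, on_success="finalize")  # approved: routes to finalize
assert (run.status, run.current) == ("running", "finalize")
```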
4.2 The execution_mode + leaf-strategy magic
WorkflowDefinition
├─ execution_mode = leaf → one leaf strategy (by config shape):
│ steps: list[Step] — predefined steps (sequential/branching)
│ agent_config: AgentConfig — an AI agent (model profile, tools, system prompt) → see Agents-And-Prompts.md
│ planner_config: PlannerConfig — a planner builds a DAG (+ optional reviewer_config)
│ (ambiguous combos steps+agent_config / steps+planner_config / agent_config+planner_config → reject at materialization)
├─ execution_mode = graph → children: list[ChildWorkflowSpec] + child_dependencies + graph_config
└─ execution_mode = hybrid → graph_config.pre_steps → children graph → post_steps
+ connector_requirements: list[ConnectorRequirement] — named slots (what's needed: email/crm/...)
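Strategy inference by config shape can be sketched as one function. The strategy labels are hypothetical; the sketch also rejects a leaf definition with none of the three configs, which is an assumption (the text above only states that ambiguous combinations are rejected), and treats a key as present when it is not None.

```python
LEAF_STRATEGIES = {
    "steps": "leaf:steps",
    "agent_config": "leaf:agent",
    "planner_config": "leaf:plan",
}


def infer_strategy(definition: dict) -> str:
    """Pick the execution strategy from execution_mode and the config shape."""
    mode = definition["execution_mode"]
    if mode in ("graph", "hybrid"):
        return mode
    if mode != "leaf":
        raise ValueError(f"unknown execution_mode: {mode!r}")
    present = [key for key in LEAF_STRATEGIES if definition.get(key) is not None]
    if len(present) != 1:
        # Two or three configs: ambiguous, rejected at materialization.
        # Zero configs: also rejected here (an assumption of this sketch).
        raise ValueError(f"leaf needs exactly one of {list(LEAF_STRATEGIES)}, got {present}")
    return LEAF_STRATEGIES[present[0]]
```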
Magic Box — 10 patterns, one WorkflowRunner: Sequential · AI-enhanced (model_call) · Parallel/DAG (graph children) · Plan-based (planner + reviewer) · Ambient/Daemon (continue_as_new — an observer shell + a separate reaction workflow) · Wait+Signal (sleep+signal — days/months) · Schedule (a Temporal Schedule) · Batch/Fan-out (child workflows + a semaphore) · Saga/Compensation (a compensation chain — see Undo-And-Compensation.md) · Human Pipeline (signals+timers per gate). No special workflow types — only combinations of step_type inside the universal container.
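The Batch/Fan-out pattern (bounded parallelism via a semaphore) can be illustrated in-process with asyncio. In Axon the children are Temporal child workflows, not coroutines, so this sketch only shows the control flow of max_parallel and on_item_failure; the retries parameter, and the choice that retry falls back to fail_all once attempts are exhausted, are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def fan_out(
    items: Iterable[Any],
    run_child: Callable[[Any], Awaitable[Any]],
    max_parallel: int,
    on_item_failure: str = "fail_all",
    retries: int = 1,
) -> list:
    """Run one child per item with at most max_parallel in flight."""
    if on_item_failure not in ("skip", "fail_all", "retry"):
        raise ValueError(f"unknown on_item_failure: {on_item_failure!r}")
    semaphore = asyncio.Semaphore(max_parallel)
    attempts = 1 + (retries if on_item_failure == "retry" else 0)

    async def one(item: Any) -> Any:
        async with semaphore:  # caps the number of children in flight
            for attempt in range(attempts):
                try:
                    return await run_child(item)
                except Exception:
                    if attempt == attempts - 1:
                        raise

    results = await asyncio.gather(*(one(i) for i in items), return_exceptions=True)
    failures = [r for r in results if isinstance(r, BaseException)]
    if failures and on_item_failure != "skip":
        raise failures[0]  # fail_all, or retry with attempts exhausted
    return [r for r in results if not isinstance(r, BaseException)]
```

Results keep the input order because asyncio.gather returns them in argument order; with skip, failed items are simply dropped.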
4.3 Connector binding in step.config (Mode A/B/B-ref/C, D27 XOR)
A connector-backed tool_call references the binding via one top-level field in config (a sibling of tool_name/tool_args), D27 XOR — exactly one of:
- connector_requirement — Mode A (Stage 3): authoring declares a named slot; a reviewed binding profile links it to a project-scoped connector instance;
- connector_instance_id — Mode B / B-ref (Stage 1/2): a literal cred_<ulid> (Stage 1: connector_instance_id == credential_id) OR a typed ConnectorInstanceBinding = {"$credential_ref": "input_data.<field>"} (Stage 2 Run Wizard light for multi-credential — the concrete credential is chosen at start_workflow);
- connector_key — Mode C (legacy grandfathered; new publishes are rejected — D31 quarantine).
external_write with a connector mode is currently fail-closed (connector_mode_not_supported_for_external_write). Secrets are never in config; the runtime resolver resolves the credential via (project_id, credential_id) and decrypts the payload in worker memory — before policy/approval/audit/dispatch/side effect. Details — Connectors-Credentials.md §4.
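The D27 XOR rule and the related publish-time rejections can be sketched as one validator. connector_mode and new_publish are hypothetical names, and apart from connector_mode_not_supported_for_external_write the error messages are invented; the cred_ prefix check and the input_data. reference check only mirror the shapes described in the list above.

```python
from typing import Optional

MODE_FIELDS = ("connector_requirement", "connector_instance_id", "connector_key")


def connector_mode(step_type: str, config: dict, new_publish: bool = True) -> Optional[str]:
    """Return "A", "B", "B-ref", "C" or None; raise ValueError on a publish-time violation."""
    present = [f for f in MODE_FIELDS if f in config]
    if not present:
        return None  # not connector-backed
    if len(present) > 1:
        raise ValueError(f"D27 XOR violated: {present}")
    if step_type == "external_write":
        raise ValueError("connector_mode_not_supported_for_external_write")
    if step_type != "tool_call":
        raise ValueError("connector mode fields are only allowed on tool_call")
    field = present[0]
    if field == "connector_requirement":
        return "A"
    if field == "connector_key":
        if new_publish:
            raise ValueError("connector_key (Mode C) is quarantined for new publishes (D31)")
        return "C"  # legacy, grandfathered
    value = config["connector_instance_id"]
    if isinstance(value, dict):  # typed ConnectorInstanceBinding
        ref = value.get("$credential_ref")
        if set(value) != {"$credential_ref"} or not str(ref).startswith("input_data."):
            raise ValueError("B-ref must be {'$credential_ref': 'input_data.<field>'}")
        return "B-ref"
    if not str(value).startswith("cred_"):
        raise ValueError("Mode B expects a literal cred_<ulid>")
    return "B"
```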
4.4 The binding layer (D17/D36)
ConnectorRequirement[] (in the definition, written by the engineer) → WorkflowBindingProfile (app.workflow_binding_profiles; admin/manager once: slot → connector instance; review state) → on create/start_workflow the handler pins app.workflow_run_bindings_snapshot (D36, system). The definition is not mutated at start; the runtime reads the snapshot and materializes the concrete instance/credential. Three override layers at run start (D17): profile.bindings ← case.connector_overrides ← run.binding_overrides. See Connectors-Credentials.md §4.
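The three override layers can be sketched as a per-slot merge. This reads profile.bindings ← case.connector_overrides ← run.binding_overrides as "the later layer wins for each slot", which is an interpretation; pin_run_bindings is a hypothetical name, and the read-only MappingProxyType stands in for the persisted app.workflow_run_bindings_snapshot row.

```python
from types import MappingProxyType
from typing import Mapping, Optional


def pin_run_bindings(
    required_slots: list[str],
    profile: Mapping[str, str],
    case_overrides: Optional[Mapping[str, str]] = None,
    run_overrides: Optional[Mapping[str, str]] = None,
) -> Mapping[str, str]:
    """Merge slot -> instance layers (later wins) and return a read-only snapshot."""
    merged = {**profile, **(case_overrides or {}), **(run_overrides or {})}
    missing = [slot for slot in required_slots if slot not in merged]
    if missing:
        raise ValueError(f"binding profile incomplete: {missing}")
    return MappingProxyType(merged)  # inputs are not modified
```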
4.5 Case-aware workflows
A case-aware workflow is bound to a case (the carrier of a business case). The entry step is load_context with a bounded projection invariant; publish_definition rejects case_aware=True without it. Details — Cases.md, CONCEPT-CASE-WORKFLOWS.md.
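The publish-time invariant can be sketched as a small check. How a "bounded projection" is expressed is not specified here, so the sketch models it as a non-empty explicit list under a hypothetical projection key, and it assumes the first step is the entry step; Cases.md holds the real rule.

```python
def check_case_aware(definition: dict) -> None:
    """Reject case_aware=True unless the entry step is a bounded load_context."""
    if not definition.get("case_aware"):
        return
    steps = definition.get("steps") or []
    entry = steps[0] if steps else {}  # assumption: the first step is the entry step
    if entry.get("step_type") != "load_context":
        raise ValueError("case_aware=True requires a load_context entry step")
    projection = entry.get("config", {}).get("projection")  # hypothetical key
    if not isinstance(projection, list) or not projection:
        raise ValueError("the load_context entry step needs a bounded projection")
```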
5. Flows: step-by-step scenarios
Flow 1: Write and publish a definition (engineer → manager+)
- (engineer, code) Create modules/workflows/<name>/ with a definition.py that exports definition: WorkflowSpec (steps, connector_requirements, trigger, config). Optional: activities.py + __manifest__.py for local activities.
- (engineer/CI) axon push → the CLI sends an HTTP POST to app-api → the router builds a CommandEnvelope of type publish_definition → the handler validates the definition (Mode XOR D27, the case_aware invariant, connector mode fields only on tool_call, rejection of new connector_key Mode C) → it stores an immutable snapshot in workflow_definitions.definition ({execution, metadata}) under a pinned definition_id (wdef_<ulid>). push does NOT create a workflows row and does NOT activate any trigger/schedule/ambient.
- On the first publish, if the project already has exactly one usable connector instance per required logical type, the binding profile is auto-created (status=complete, auto_configured=true, with a "Review" banner in the Console).
- Alternative (no code): Install a reusable definition from the Catalog (catalog:install, manager+), latest-by-default; a name collision → 409 + rename.
Flow 2: Configure a binding profile (manager+)
- Console → Workflows → <definition> → Bindings.
- The UI shows the ConnectorRequirement slots (name + logical type + required_actions).
- Auto-map (ConnectorMapper proposes instances by type: a single candidate → auto; several → choice; none → "needs connecting"), or pick a connector instance for each slot manually.
- Save → set_workflow_bindings → profile status=complete. Set once, use many times.
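The Auto-map proposals and the first-publish auto-configuration rule from Flow 1 can be sketched together. propose, auto_profile, and the instance fields id/type/usable are hypothetical; usable stands in for the D42 usability rules, which this sketch does not model.

```python
from typing import Optional


def propose(slots: dict[str, str], instances: list[dict]) -> dict[str, dict]:
    """slots: slot name -> logical type; instances: dicts with id, type, usable."""
    proposals = {}
    for slot, logical_type in slots.items():
        candidates = [
            inst["id"] for inst in instances
            if inst["type"] == logical_type and inst.get("usable", False)
        ]
        if len(candidates) == 1:
            proposals[slot] = {"action": "auto", "instance": candidates[0]}
        elif candidates:
            proposals[slot] = {"action": "choose", "candidates": candidates}
        else:
            proposals[slot] = {"action": "needs_connecting"}
    return proposals


def auto_profile(proposals: dict[str, dict]) -> Optional[dict]:
    """First-publish rule: auto-create only if every slot has exactly one candidate."""
    if not all(p["action"] == "auto" for p in proposals.values()):
        return None
    bindings = {slot: p["instance"] for slot, p in proposals.items()}
    return {"status": "complete", "auto_configured": True, "bindings": bindings}
```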
Flow 3: Run a workflow for a case (operator+)
- Console → Workflows → <definition> → Run Wizard.
- Pick a case (for case-aware workflows) and set input_data; optionally override bindings for this run (incl. choosing the concrete credential for Mode B-ref slots).
- Create + Start → the canonical path create_workflow(definition_id) → start_workflow(workflow_id) (a pinned definition_id, no latest-key fallback) → the pre-dispatch hook pins workflow_run_bindings_snapshot (D36, system) → Temporal starts the WorkflowRunner.
- Execution: the StepRunner walks on_success/on_failure; on a connector step it resolves the instance → takes the credential → policy gate / approval / audit / idempotency → side effect; model_call → budget check; human_gate/external_write → awaiting_approval (the signal approval_decided).
- Monitoring, in Run (detail): the step timeline, tokens/cost, the current step, pending approvals.
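Why a run is unaffected by later publishes, and why a definition_key is refused where a wdef_ id is expected, can be sketched with an in-memory store. DefinitionStore, DefinitionNotFound, and the method names are hypothetical; the only point is that the runtime lookup accepts an exact definition_id and never falls back to the latest version of a key.

```python
class DefinitionNotFound(LookupError):
    pass


class DefinitionStore:
    """In-memory stand-in for workflow_definitions (hypothetical API)."""

    def __init__(self) -> None:
        self._snapshots: dict[str, str] = {}  # definition_id -> immutable snapshot
        self._latest: dict[str, str] = {}     # definition_key -> newest definition_id

    def publish(self, definition_id: str, definition_key: str, snapshot: str) -> None:
        self._snapshots[definition_id] = snapshot
        self._latest[definition_key] = definition_id

    def latest_id(self, definition_key: str) -> str:
        """Listing / Catalog install only ("latest-by-default")."""
        return self._latest[definition_key]

    def load_pinned(self, definition_id: str) -> str:
        """Runtime/replay boundary: exact id only, never a definition_key."""
        try:
            return self._snapshots[definition_id]
        except KeyError:
            raise DefinitionNotFound(definition_id) from None


store = DefinitionStore()
store.publish("wdef_A", "inbox_triage", "v1")
pinned = "wdef_A"  # what create_workflow(definition_id) received
store.publish("wdef_B", "inbox_triage", "v2")  # a later publish
assert store.load_pinned(pinned) == "v1"  # the in-flight run still sees v1
```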
Flow 4: Managing a run
| Action | Command | Permission | Effect |
|---|---|---|---|
| Pause | pause_workflow | manager+ | the run is paused; no resources spent (Temporal) |
| Resume | resume_workflow | manager+ | continue from where it stopped |
| Cancel | cancel_workflow | manager+ | stop the run; for side effects already performed, compensation (if defined) or a manual runbook (Undo-And-Compensation.md) |
| Retry | retry_workflow | owner/admin | retry the failed step/run |
| Replan | replan_workflow | owner/admin | for plan-based runs: rebuild the DAG (planner + reviewer again) |
| Replay | replay | owner/admin | a deterministic replay (debugging/recovery) |
Flow 5: Running via event/schedule/ambient
- event: a connector (gmail/webhook/telegram) → app-api ingress → the canonical create_workflow / start_workflow (the reaction workflow is given by a pinned workflow_definition_id) → Temporal.
- schedule: a Temporal Schedule runs the definition periodically (pattern 5.8).
- ambient: an observer shell continuously monitors a source; on a condition it dispatches a reaction workflow by a pinned ID (pattern 5.6).

Trigger/schedule/ambient are not activated by push; they are activated separately.
Flow 6: Publish a definition to the Catalog (admin/owner)
Console → Catalog → Publish (catalog:manage) — an instance-wide card for a reusable definition; Archive removes it. Managers see it and can Install it into their projects. See Templates-And-Catalog.md.
6. Options reference
6.1 WorkflowDefinition / WorkflowSpec (key fields)
| Field | What it does |
|---|---|
| execution_mode | leaf / graph / hybrid |
| steps | list[Step], for the leaf step strategy |
| agent_config | AgentConfig, for the leaf agent strategy (model profile, tools, system prompt) |
| planner_config (+ reviewer_config) | for the leaf plan strategy (the planner builds a DAG, the reviewer reviews it) |
| children / child_dependencies / graph_config | for graph/hybrid (a DAG of child workflows; graph_config.pre_steps/post_steps for hybrid) |
| connector_requirements | list[ConnectorRequirement]: named slots (logical type + required_actions); default [] |
| case_aware | bool; if True, the entry step must be load_context with a bounded projection |
| metadata (WorkflowSpec): name, trigger, definition_key, project_id hint, config | definition_key is the authoring/listing identity (not scope authority); if not set, it is derived from the ASCII name; a non-ASCII name → fail-closed |
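The definition_key rule can be sketched as follows. The slug format (lowercase, runs of other characters collapsed to underscores) is an assumption, as is derive_definition_key itself; what the text does state is that an explicit key is used as-is and a non-ASCII name fails closed.

```python
import re
from typing import Optional


def derive_definition_key(name: str, definition_key: Optional[str] = None) -> str:
    """Use the explicit key if given; otherwise slugify an ASCII-only name."""
    if definition_key:
        return definition_key
    if not name.isascii():
        raise ValueError("non-ASCII name: set definition_key explicitly (fail-closed)")
    key = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    if not key:
        raise ValueError("name does not yield a usable definition_key")
    return key
```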
6.2 Step (fields)
| Field | What it does |
|---|---|
| step_key | a unique string id; routing is by keys |
| step_type | one of 13 (see §4.1) |
| config | a dict; for tool_call: tool_name/tool_args + (optionally) one connector mode field (D27 XOR); for transform: type+params; for condition: expression/language; for subflow: definition_id; for human_gate: assignee_role/deadline_seconds/on_timeout/decisions; for wait: duration_seconds/wait_for_signal; for fan_out: source_key/child_definition/max_parallel/on_item_failure |
| side_effect_class | none / internal / external_low / external_high; gates policy/budget |
| model_profile | for model_call (e.g. haiku) |
| timeout_seconds | the step timeout |
| on_success / on_failure | references to a step_key; None → terminal (completed / failed) |
| compensation | an activity for the rollback in a Saga compensation (called in reverse order for completed steps with side effects) |
| fan_out | the fan-out config (see §4.1) |
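The compensation order can be sketched as a reverse walk over completed steps. CompletedStep and compensate are hypothetical; treating every side_effect_class other than none as undo-worthy, and collecting steps without a compensation for a manual runbook, are assumptions drawn from the Cancel row in Flow 4.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompletedStep:
    step_key: str
    side_effect_class: str
    compensation: Optional[Callable[[], None]] = None


def compensate(completed: list[CompletedStep]) -> list[str]:
    """Undo completed side effects newest-first; return keys that need a manual runbook."""
    needs_runbook = []
    for step in reversed(completed):
        if step.side_effect_class == "none":
            continue  # nothing to undo
        if step.compensation is None:
            needs_runbook.append(step.step_key)
        else:
            step.compensation()
    return needs_runbook
```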
6.3 Transform sub-types
extract (pick fields from a dict), merge (combine several dict sources), format (a template + data), and map (field→field) are implemented. compute (a safe expression evaluator) is planned for the future; today it is treated as an unknown subtype, i.e. passed through unchanged. For computations, use condition/ConditionEvaluator or an explicitly registered activity.
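The four implemented sub-types and the passthrough fallback can be sketched as one dispatcher. The params keys (fields, sources, template, mapping), the use of string.Template for format, and "later sources win" for merge are assumptions; the real transform handler and its parameter names may differ.

```python
from string import Template
from typing import Any


def apply_transform(subtype: str, params: dict, data: Any) -> Any:
    if subtype == "extract":  # pick fields from a dict
        return {k: data[k] for k in params["fields"] if k in data}
    if subtype == "merge":  # combine several dict sources (later sources win)
        merged: dict = {}
        for source in params["sources"]:
            merged.update(data[source])
        return merged
    if subtype == "format":  # template + data
        return Template(params["template"]).safe_substitute(data)
    if subtype == "map":  # rename fields: old -> new
        return {params["mapping"].get(k, k): v for k, v in data.items()}
    return data  # "compute" and any unknown subtype: passthrough
```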
6.4 Run states
running → paused (pause_workflow) / awaiting_approval (human_gate/external_write) / paused_budget (a budget interrupt) → back to running → completed / failed / cancelled. The definition is immutable throughout (pinned definition_id); changing the definition doesn't affect runs already in flight.
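The listed states form a small transition table. A minimal sketch that encodes only the transitions stated above; the real state model may allow more (for example, cancelling a paused run), and transition is a hypothetical helper.

```python
# Only the transitions stated in this section; anything else is rejected here.
TRANSITIONS = {
    "running": {"paused", "awaiting_approval", "paused_budget", "completed", "failed", "cancelled"},
    "paused": {"running"},             # resume_workflow
    "awaiting_approval": {"running"},  # approval_decided signal
    "paused_budget": {"running"},      # budget raised, then resume
}
TERMINAL = {"completed", "failed", "cancelled"}


def transition(current: str, target: str) -> str:
    if current in TERMINAL:
        raise ValueError(f"{current} is terminal")
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```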
7. Lifecycle and maintenance
- Definition versions. Every publish creates a new immutable snapshot and a new definition_id (wdef_<ulid>). definition_key is a stable listing id; the pinned definition_id is what is used at runtime/replay (no latest-key fallback on the runtime/replay boundary). Runs already started pin their definition_id, so changing the definition doesn't touch them.
- push doesn't activate side effects. Trigger/schedule/ambient are enabled separately; push only stores the definition.
- Run lifecycle. pause → resume; cancel (+ compensation/runbook); retry/replan/replay are owner/admin only.
- Catalog. Install is latest-by-default; a name collision → 409 + rename; Archive removes the card; install creates a project-level runnable definition.
- Stagnation guard. More than N transitions (ExecutorProfile.max_steps, default 50) → StagnationError (loop protection).
8. Troubleshooting
| Symptom | Cause | What to do |
|---|---|---|
| A workflow won't start: "binding profile incomplete" | not all ConnectorRequirement slots are mapped to usable instances | Bindings → Auto-map / pick instances; make sure the instances are usable (D42) and the connectors are Allowlist ON (Connectors-Credentials.md §8) |
| publish_definition returns 422 | Mode XOR (D27) violated in a connector-backed tool_call; or connector_key (Mode C) in a new publish (D31 quarantine); or connector_mode_not_supported_for_external_write; or case_aware=True without a load_context entry step | keep exactly one connector mode field; for new code use Mode A (connector_requirement) or Mode B (connector_instance_id); move the connector call into a tool_call; add a load_context entry step for case-aware workflows |
| A run is stuck in awaiting_approval | waiting for approve, and no approver has access | assign/find an approver (a role with approve and access to the project); see Approvals.md |
| A run is in paused_budget | a budget interrupt on a model_call | raise/set the project budget (Budgets-And-Cost.md) → resume |
| StagnationError | a transition loop (A→B→A) longer than max_steps | review the routing, or deliberately raise max_steps in the executor profile |
| A subflow fails with "definition not found" | config.definition_id is not pinned or doesn't exist | give a pinned wdef_* (not a definition_key); the runtime doesn't resolve latest-key |
| Catalog Install → 409 | the definition name is already taken in the project | rename on install |
| tool '<name>' not found in registry | tool_name isn't resolvable via the registry / generic dispatch / connector / sandbox | check the tool/activity registration or the correctness of the connector mode field |
9. Constraints and invariants
- A workflow is the atom (invariant 2): there are no workflow types (Magic Box); the execution strategy is by config shape, not by "type".
- Undo is a fundamental right (invariant 3): every process has an inverse flow; to roll back side effects — a compensation chain / a runbook (Undo-And-Compensation.md).
- PostgreSQL = SoT, Temporal = orchestration. Workers don't scan the filesystem; definitions reach the runtime only via publish_definition.
- Authoring ≠ scope authority. The project_id in an authoring file is only a hint; the canonical scope comes from the CLI/API (CommandEnvelope.project_id); a mismatch is rejected.
- A pinned definition_id on the runtime/replay boundary: no latest-key fallback (replay determinism).
- Security and budget before execution (invariants 4, 5): the policy gate (external_* steps) and the budget check (model_call) run before the side effect.
- All mutations go via CommandEnvelope (publish_definition, create/start/pause/resume/cancel/retry/replan_workflow, binding commands, catalog commands); RBAC is server-side.
- Secrets are not in the definition / profile / run snapshot; they live only in the encrypted credential payload, and the runtime materializes them before the side effect.
- subflow is a first-class step type (not a tool_call hack): budget propagation, cancel cascade, depth check.
- Collections / labels are UI navigation only; they don't affect execution semantics, RBAC, or routing.
10. Related manuals and canon
- Connectors-Credentials.md: ConnectorRequirement[], connector mode fields, binding profiles, run snapshot.
- Cases.md: case-aware workflows, the load_context entry step, identity keys.
- Undo-And-Compensation.md: the compensation chain, cancelling/rolling back a run.
- Agents-And-Prompts.md: agent_config, executor profiles, model_call vs agent execution.
- Approvals.md: the human_gate / external_write interrupt lifecycle, who approves.
- Templates-And-Catalog.md: Catalog publish/install, templates.
- Roles-And-Permissions.md: the full RBAC matrix; First-Project-Walkthrough.md: where publish/run sit in the overall flow.
- Canon: WORKFLOW-ARCHITECTURE.md §1-§5 (authoring/runtime, the Step model, the 10 patterns); ARCHITECTURE-V6.md §5-§6 (execution, graph), §11-§12 (connectors, Step Runner); CONCEPT-CASE-WORKFLOWS.md, CONCEPT-WORKFLOW-LABELS.md, CONCEPT-COMPENSATION-V2.md; core/models/workflow_definitions.py (Step), core/models/auth.py (RBAC).