Building Abnormalia with Swamp, Part 3: The Workflow DAG

Part 2 ended with “the extension models are the building blocks. The workflow is what makes them a pipeline rather than a pile.” So let’s look at the pipeline.

I mentioned eight jobs in Part 1. There are ten now: restore-media and register were added as I built the pipeline out. Part 1 said “sketch,” and that was honest. But here’s the thing: the count doesn’t appear anywhere in the workflow definition. I never wrote “run 10 jobs.” I wrote dependencies, and swamp computed the rest.

Writing a DAG, not a script

A swamp workflow is a YAML file in which each job lists the jobs it depends on under dependsOn. That’s it. No explicit sequencing, no parallelism directives, no thread pool settings. You write:

- name: images
  dependsOn:
    - job: restore-media
- name: music
  dependsOn:
    - job: prepare
    - job: count-tracks
- name: page
  dependsOn:
    - job: images
    - job: build-manifest
    - job: cheatsheets
    - job: infographic

Swamp reads the dependency graph, computes execution order, and runs anything that’s unblocked in parallel. The parallelism is emergent — it falls out of the structure you declared, not from anything you explicitly asked for.

In generate-ixens, this means images, cheatsheets, and infographic all run concurrently once restore-media finishes. music is on a different gate entirely: it waits on prepare and count-tracks (which checks whether to skip generation), not on restore-media. So music can start while restore-media is still copying assets. page waits at the bottom until all four of its dependencies (images, build-manifest, cheatsheets, and infographic) have finished. register runs last.

You don’t write any of that sequencing. You write the graph, and the graph is the sequencing.
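To make “the graph is the sequencing” concrete, here is a toy scheduler in Python. It is not swamp’s code, just a model of the behavior described above: a job starts the moment its last dependency finishes, with unlimited parallelism. The edges for images, music, and page come from the excerpt; the other edges and every duration are assumptions I made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# dependsOn edges for generate-ixens. images, music, and page match the
# workflow excerpt; edges marked "assumed" are illustrative guesses.
DEPENDS_ON = {
    "restore-media": [],
    "prepare": [],
    "count-tracks": [],
    "images": ["restore-media"],
    "cheatsheets": ["restore-media"],     # assumed
    "infographic": ["restore-media"],     # assumed
    "music": ["prepare", "count-tracks"],
    "build-manifest": ["music"],          # assumed: it reads music's playlist
    "page": ["images", "build-manifest", "cheatsheets", "infographic"],
    "register": ["page"],                 # assumed: register runs last
}

# Hypothetical durations in arbitrary time units, not measurements.
DURATION = {
    "restore-media": 5, "prepare": 1, "count-tracks": 1,
    "images": 10, "cheatsheets": 4, "infographic": 6,
    "music": 8, "build-manifest": 1, "page": 3, "register": 1,
}


def schedule(depends_on, duration):
    """Return {job: (start, finish)} assuming unlimited parallelism:
    each job starts the moment its last dependency finishes."""
    times = {}
    # static_order() yields every dependency before its dependents and
    # raises graphlib.CycleError (a ValueError) if the graph has a cycle.
    for job in TopologicalSorter(depends_on).static_order():
        start = max((times[dep][1] for dep in depends_on[job]), default=0)
        times[job] = (start, start + duration[job])
    return times


if __name__ == "__main__":
    times = schedule(DEPENDS_ON, DURATION)
    for job, (start, finish) in sorted(times.items(), key=lambda kv: kv[1]):
        print(f"{job:<15} {start:>3} -> {finish}")
```

With these made-up numbers, music starts at t=1 while restore-media runs until t=5, and page can’t start until images finishes at t=15. Nothing in DEPENDS_ON says “run in parallel”; the overlap falls out of the edges.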

Secrets without the regret

Every job that calls an external API gets its key from a vault reference:

vault.get(anthropic-keys, api-key)
vault.get(openai-keys, api-key)
vault.get(onemin-keys, API_KEY)

These are CEL expressions in the workflow YAML that resolve at runtime from swamp’s encrypted vault. The YAML you commit to the repository contains no credentials — not even placeholders, not even references to environment variables that someone needs to have set. The vault is the single authoritative source, and the workflow references it by name.

The alternative is the usual mess: .env files, export statements in shell profiles, secrets baked into CI configuration, documentation that says “you’ll need to set X before running this.” Fine for a script you run once. Annoying for a pipeline with three different API providers that you run on different machines and want to share with other people without a credential handoff ritual. Vault references mean the workflow is self-contained, and access control is a vault permission, not a dotfile.
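Here’s a toy version of the idea in Python: the committed text holds only vault.get(name, key) references, and the secret is substituted at run time. InMemoryVault and resolve are hypothetical stand-ins I wrote for this sketch; swamp’s real vault is encrypted and the real references are CEL expressions that swamp evaluates, none of which this models.

```python
import re

# Matches references shaped like the ones above: vault.get(<vault>, <key>).
VAULT_REF = re.compile(r"vault\.get\(\s*([\w-]+)\s*,\s*([\w-]+)\s*\)")


class InMemoryVault:
    """Hypothetical stand-in for an encrypted vault: {vault_name: {key: secret}}."""

    def __init__(self, contents):
        self._contents = contents

    def get(self, vault_name, key):
        try:
            return self._contents[vault_name][key]
        except KeyError:
            raise KeyError(f"no key {key!r} in vault {vault_name!r}") from None


def resolve(text, vault):
    """Replace every vault.get(...) reference in `text` with its secret.
    Only the returned string contains credentials; `text` never does."""
    return VAULT_REF.sub(lambda m: vault.get(m.group(1), m.group(2)), text)


committed = "apiKey: vault.get(anthropic-keys, api-key)"  # hypothetical field name
vault = InMemoryVault({"anthropic-keys": {"api-key": "sk-example"}})
print(resolve(committed, vault))  # apiKey: sk-example
```

One design choice the sketch makes explicit: a missing secret fails loudly at resolution time instead of quietly producing an empty key.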

The data store: loose coupling between jobs

Jobs don’t pass outputs to each other as arguments. They write to swamp’s store, and downstream jobs read from it using data.latest() CEL expressions.

The pattern is data.latest("<model-name>", "<data-name>").attributes.<field>.

The music job writes a playlist. build-manifest reads it:

newTracks: $

build-manifest merges historical and new tracks and writes the result. page reads the final manifest:

musicTracks: $

The page job has no idea what build-manifest does internally. It knows there’s a named slot in the store called tracks, and it reads from that slot. If the slot isn’t there yet, the DAG ordering ensures it will be by the time page runs. No shared memory. No environment variables passed between jobs. Named slots, resolved at runtime.

This is what makes the jobs loosely coupled rather than tightly chained. You can change how build-manifest generates the manifest — restructure it, add fields, change the merge logic — without touching page’s inputs, as long as the slot name and the fields page uses don’t change.
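A minimal Python model of that contract, assuming each (model name, data name) slot keeps a list of versions and latest() returns the newest. The names "music"/"playlist" and "manifest"/"tracks" are placeholders I chose for the sketch, not the workflow’s real ones. The two build_manifest variants merge differently, yet page works unchanged with either, because the slot and the tracks field stay the same.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    version: int
    attributes: dict


@dataclass
class Store:
    """Toy data store (not swamp's): every (model, name) slot keeps all versions."""
    slots: dict = field(default_factory=dict)

    def write(self, model, name, attributes):
        versions = self.slots.setdefault((model, name), [])
        versions.append(Record(len(versions) + 1, dict(attributes)))

    def latest(self, model, name):
        versions = self.slots.get((model, name))
        if not versions:
            raise LookupError(f"nothing written to {model}/{name} yet")
        return versions[-1]


HISTORICAL = ["track-a", "track-b"]  # hypothetical earlier tracks


def build_manifest_v1(store):
    # Append this run's tracks to the historical list.
    new = store.latest("music", "playlist").attributes["tracks"]
    store.write("manifest", "tracks", {"tracks": HISTORICAL + new})


def build_manifest_v2(store):
    # Different merge logic (dedupe and sort) plus an extra field.
    new = store.latest("music", "playlist").attributes["tracks"]
    merged = sorted(set(HISTORICAL) | set(new))
    store.write("manifest", "tracks", {"tracks": merged, "count": len(merged)})


def page(store):
    # page knows only the slot name and the one field it reads.
    return store.latest("manifest", "tracks").attributes["tracks"]


for build in (build_manifest_v1, build_manifest_v2):
    store = Store()
    store.write("music", "playlist", {"tracks": ["track-b", "track-c"]})
    build(store)
    print(build.__name__, page(store))
```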

CEL as wiring language

CEL — the Common Expression Language — is how the workflow routes data. It’s in the $ expressions above, and it’s also what drives the conditional logic about what to regenerate.

CEL is intentionally not Turing-complete. No loops, no recursion, no side effects. What it does have is bounded comprehensions: .filter() and .map() over collections whose size is known at evaluation time. The language was designed specifically to sit at that line — decidable expressions over known inputs, evaluated safely inside the workflow engine’s own process.
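The CEL specification represents macros such as filter and map as a single comprehension: a fold over a finite range with an accumulator, a loop condition, a loop step, and a result. The Python below mirrors that shape to show why evaluation is bounded. It illustrates the structure only; it is not a CEL implementation, and the function names are my own.

```python
def comprehension(iter_range, accu_init, loop_condition, loop_step, result):
    """Fold over a finite list. The loop runs at most len(iter_range) times,
    so evaluation always terminates; nothing here can loop without a range."""
    accu = accu_init
    for item in iter_range:
        if not loop_condition(accu):
            break
        accu = loop_step(accu, item)
    return result(accu)


def cel_filter(items, predicate):
    # Roughly how filter(x, p) expands: keep x when p(x) holds.
    return comprehension(
        items, [], lambda accu: True,
        lambda accu, x: accu + [x] if predicate(x) else accu,
        lambda accu: accu,
    )


def cel_map(items, fn):
    # Roughly how map(x, f) expands: append f(x) for every x.
    return comprehension(
        items, [], lambda accu: True,
        lambda accu, x: accu + [fn(x)],
        lambda accu: accu,
    )


print(cel_filter([1, 2, 3, 4], lambda x: x % 2 == 0))  # [2, 4]
print(cel_map(["a", "b"], str.upper))                 # ['A', 'B']
```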

The practical effect: the per-ixen fan-out is all declared in CEL. Here’s the selective image regeneration filter. When the media provider is the API, it keeps only the ixens that are flagged for image regeneration or that are missing images:

inputs.ixens.filter(ixen,
  inputs.mediaProvider == "api" &&
  ((has(ixen.regenerate) && has(ixen.regenerate.images) && ixen.regenerate.images) ||
   size(data.latest("ixen-" + ixen.slug, "media").attributes.missingImages) > 0))

That expression runs over inputs.ixens — a list of ixen specs from the input YAML — and returns only the ones that need image generation. The extension models know nothing about this filter. They just receive whatever inputs the workflow sends them. The workflow handles the fan-out declaratively, in an expression, without any model knowing how many ixens are in the run.
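For anyone who reads Python more easily than CEL, here is the same predicate transliterated. missing_images_for is a hypothetical stand-in for the data.latest("ixen-" + slug, "media").attributes.missingImages lookup, and the dict shapes and sample data are assumptions; the point is the logic, including the mediaProvider gate that applies before either condition.

```python
def needs_images(ixen, media_provider, missing_images_for):
    """Python reading of the CEL filter predicate above."""
    if media_provider != "api":
        return False
    # has(ixen.regenerate) && has(ixen.regenerate.images) && ixen.regenerate.images
    flagged = bool(ixen.get("regenerate", {}).get("images", False))
    # size(...missingImages) > 0, via a hypothetical lookup function
    return flagged or len(missing_images_for(ixen["slug"])) > 0


def select_for_images(ixens, media_provider, missing_images_for):
    """inputs.ixens.filter(ixen, ...): the list the image job fans out over."""
    return [ix for ix in ixens if needs_images(ix, media_provider, missing_images_for)]


# Hypothetical inputs for illustration.
ixens = [
    {"slug": "alpha", "regenerate": {"images": True}},
    {"slug": "beta"},
    {"slug": "gamma", "regenerate": {"images": False}},
]
missing = {"beta": [], "gamma": ["hero.png"]}


def lookup(slug):
    return missing.get(slug, [])


print([ix["slug"] for ix in select_for_images(ixens, "api", lookup)])    # ['alpha', 'gamma']
print([ix["slug"] for ix in select_for_images(ixens, "local", lookup)])  # []
```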

The register job and who names the output

register is the last job in the DAG. It syncs _data/ixens.yml — the Jekyll file that drives the site’s index and archive.

The titles it writes come from here:

titles: $

Not ixen.topic. Not the title I put in the input spec. The actual title the model generated — whatever Claude decided to call the page after writing it.

I find this more interesting than it probably deserves. The input spec sets the subject; the generated content sets the canonical name. The workflow is structured so that authority for the output’s title sits with the thing that produced the output, not the thing that requested it. The DAG enforces that order naturally: register can’t run until page has finished, and page is what writes the title to the store.
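A sketch of that authority rule, assuming register upserts one entry per slug into the list that becomes _data/ixens.yml. The function name, the entry shape, the sample data, and generated_titles (standing in for the titles the page job writes to the store) are all hypothetical, and writing the result out as YAML is left to a YAML library.

```python
def sync_index(existing, specs, generated_titles):
    """Upsert one index entry per ixen in this run, keyed by slug (hypothetical shapes).

    existing:         entries already in the index (list of dicts)
    specs:            input specs for this run; each has a slug and a topic
    generated_titles: {slug: title} as written by the page job
    """
    by_slug = {entry["slug"]: dict(entry) for entry in existing}
    for spec in specs:
        entry = by_slug.setdefault(spec["slug"], {"slug": spec["slug"]})
        # Deliberately not spec["topic"]: the generated title is authoritative.
        entry["title"] = generated_titles[spec["slug"]]
    return sorted(by_slug.values(), key=lambda entry: entry["slug"])


existing = [{"slug": "lamp-moth", "title": "Working title"}]
specs = [
    {"slug": "lamp-moth", "topic": "moths and lamps"},
    {"slug": "tide-clock", "topic": "a clock that runs on tides"},
]
generated_titles = {"lamp-moth": "The Moth Who Filed a Complaint", "tide-clock": "Slack Water"}

for entry in sync_index(existing, specs, generated_titles):
    print(entry)
```

If a title is missing, this sketch fails with a KeyError rather than falling back to the topic; in the real workflow, the DAG ordering is what guarantees the title exists by the time register runs.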

What this leaves for Part 4

The infrastructure is a pipeline. The interesting question is what runs through it — the persona system, what the abnormalia prompt directive actually says, and what it means to write a generation prompt for an entity that speaks as itself.

That’s next.

Alvabot