Dossier Engine

The Insight

Every regulated industry has the same problem: structured forms that need to be filled with data that already exists somewhere. A bankruptcy attorney types the debtor's name into Form 101, then types the same name into Schedule A/B, Schedule D, Schedule E/F, the Statement of Financial Affairs, and every other form in the package. The data exists — it was entered once. The routing is what's missing.

Dossier solves this with three ideas:

  1. A shared vocabulary (Schema) — Every data point in a domain gets a canonical name. debtor1.first_name is the debtor's first name, regardless of which form asks for it, regardless of which source provided it.

  2. Bindings that route data to forms — Each form declares how its PDF fields map to schema keys. Enter debtor1.first_name once, and bindings carry it to every form that needs it — across the 69 federal forms in a bankruptcy package.

  3. Data sources that are interchangeable — Manual entry, credit report XML, a case management API, or an LLM reading a pay stub all produce the same thing: schema key = value pairs. The binding engine doesn't care where the data came from.

The engine is the schema + binding resolver + expression engine + PDF filler. It knows nothing about law, bankruptcy, insurance, tax, or any specific domain. All domain knowledge is externalized into three JSON artifacts: schemas, forms (with bindings), and data sources.


How It Works

Data Flow

The Schema sits at the center. DataSources write to it (inbound), Bindings read from it (outbound). Every interaction — manual or automated — produces the same artifact: an Entry.

External data (XML, API, CSV, PDF, manual entry)
    ↓
DataSource (parse rules + field mapping)
    ↓
Schema keys (the shared vocabulary)
    ↓
Entry on Case (batch of key=value with source + timestamp)
    ↓
Binding engine (resolves entries → form fields)
    ↓
Filled PDF

Fill Once, Populate Everywhere

The debtor's first name is entered once. The binding engine routes it to every form:

debtor1.first_name
    → Form 101, field "Debtor1.First name"
    → Schedule A/B, field "Debtor 1 First name"
    → Schedule D, field "Debtor 1 Name"
    → Schedule E/F, field "Debtor 1 Name"
    → Statement of Financial Affairs, field "Debtor 1"
    → Declaration, field "Name of Debtor"
    → ...every form in the package that needs it

This works for every data point. Creditor names cascade to every schedule. Addresses appear on every form that asks. Social security numbers are masked where required. One schema key, many targets.

Any Source, Same Result

All data sources produce the same thing: an Entry with schema key = value pairs. Each entry also carries a raw payload — the full source data before mapping — with values being the schema-mapped subset derived from it.

| Source | What happens | Entry |
| --- | --- | --- |
| Manual entry | Lawyer types 5 fields, clicks save | 1 entry, 5 values, source="manual", auto-confirmed |
| Credit report | MISMO XML parsed by DataSource config | 1 entry, 80 values, source="credit-report", pending review |
| Case management API | REST sync from Clio/MyCase | 1 entry, 12 values, source="case-mgmt", auto-confirmed |
| Document upload | LLM reads a pay stub, maps to schema keys | 1 entry, 5 values, source="pay-stub", pending review |
| Manual correction | Lawyer fixes a creditor name from the credit report | 1 entry, 1 value, overrides the credit report's value |

Current state = merge all confirmed entries by timestamp (latest wins per key). The case activity feed shows every entry: what changed, where it came from, and when. A correction never erases the value it overrides; both the original entry and the correction stay in the timeline.
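The merge rule can be sketched in a few lines. This is a minimal illustration, not the engine's code: the Entry shape, the status field, and the mergeConfirmedEntries name are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real Entry type may differ.
interface SketchEntry {
  source: string;
  timestamp: number; // epoch millis
  status: "confirmed" | "pending";
  values: Record<string, unknown>;
}

// Apply confirmed entries oldest-to-newest so the latest value wins per key.
// Pending entries are ignored; nothing is removed from the timeline itself.
function mergeConfirmedEntries(entries: SketchEntry[]): Record<string, unknown> {
  const confirmed = entries
    .filter((entry) => entry.status === "confirmed")
    .sort((a, b) => a.timestamp - b.timestamp);

  const state: Record<string, unknown> = {};
  for (const entry of confirmed) {
    Object.assign(state, entry.values);
  }
  return state;
}

const timeline: SketchEntry[] = [
  {
    source: "credit-report",
    timestamp: 1,
    status: "confirmed",
    values: { "creditors.secured[0].name": "ACME BNK", "debtor1.ssn": "123-45-6789" },
  },
  {
    source: "manual",
    timestamp: 2,
    status: "confirmed",
    values: { "creditors.secured[0].name": "Acme Bank" },
  },
  {
    source: "pay-stub",
    timestamp: 3,
    status: "pending",
    values: { "income.gross_monthly": 4200 },
  },
];

const currentState = mergeConfirmedEntries(timeline);
```

Here the manual correction shadows the credit report's creditor name, the SSN from the credit report survives because nothing newer touched it, and the pending pay-stub entry contributes nothing until it is confirmed.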

Forms Are Recursive

A form can be a leaf (has a PDF with extractable fields), a composite (groups other forms and routes data between them), or both. Composites contain composites — a Chapter 7 Individual package contains a Petition group, which contains Form 101, Declaration, and other leaf forms.

Chapter 7 Individual (composite)
├── Petition (composite)
│   ├── Form 101 — Voluntary Petition (leaf, 155 fields)
│   ├── Form 101A — Initial Statement (leaf, conditional)
│   └── Declaration (leaf)
├── Schedules (composite)
│   ├── Schedule Summary (leaf)
│   ├── Schedule A/B — Property (leaf)
│   ├── Schedule C — Exemptions (leaf)
│   ├── Schedule D — Secured Creditors (leaf)
│   ├── Schedule E/F — Unsecured Creditors (leaf)
│   ├── Schedule G — Executory Contracts (leaf)
│   ├── Schedule H — Co-debtors (leaf, conditional)
│   ├── Schedule I — Income (leaf)
│   └── Schedule J — Expenses (leaf)
├── SOFA — Statement of Financial Affairs (leaf)
├── Means Test — Form 122A (leaf)
└── ...other forms

Bindings flow downward only. A composite form's binding can reference any descendant. Cross-form bindings (e.g., income from Schedule I flows to the Means Test) live on the common ancestor. Never up, never sideways.
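The downward-only rule amounts to a subtree check. The sketch below assumes a simplified FormNode shape and hypothetical form keys (scheduleI, meansTest); canReference and commonAncestor are illustrative helpers, not engine functions.

```typescript
// Hypothetical form node: composites have children, leaves do not.
interface FormNode {
  key: string;
  children?: FormNode[];
}

// Every key strictly below a node, depth-first.
function descendantKeys(node: FormNode): string[] {
  const keys: string[] = [];
  for (const child of node.children ?? []) {
    keys.push(child.key, ...descendantKeys(child));
  }
  return keys;
}

// A binding owned by `owner` may only point at forms inside owner's subtree.
function canReference(owner: FormNode, targetKey: string): boolean {
  return descendantKeys(owner).indexOf(targetKey) !== -1;
}

// Deepest node whose subtree contains every target: where a cross-form
// binding between those targets would live.
function commonAncestor(root: FormNode, targets: string[]): FormNode | undefined {
  if (!targets.every((target) => canReference(root, target))) return undefined;
  for (const child of root.children ?? []) {
    const deeper = commonAncestor(child, targets);
    if (deeper) return deeper;
  }
  return root;
}

const scheduleI: FormNode = { key: "scheduleI" };
const petition: FormNode = { key: "petition", children: [{ key: "b101" }, { key: "declaration" }] };
const schedules: FormNode = { key: "schedules", children: [scheduleI] };
const chapter7: FormNode = { key: "chapter7", children: [petition, schedules, { key: "meansTest" }] };

const incomeToMeansTestOwner = commonAncestor(chapter7, ["scheduleI", "meansTest"]);
```

In this toy tree the Schedule I to Means Test binding can only live on the Chapter 7 package, because no smaller composite contains both forms.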

Expression Engine

Bindings can do more than simple routing. The expression engine supports 34 Excel-style functions:

  • String: CONCAT(first, ' ', last), JOIN(' ', first, middle, last) (skips blanks — replaces TRIM(CONCAT(...))), FORMAT_NAME('debtor' [, 'no-middle', 'no-suffix', 'last-first']), FORMAT_ADDRESS('debtor.address' [, 'multiline', 'with-county', ...]), UPPER(name), RIGHT(acct, 4), SUBSTITUTE(phone, '-', ''), SPLIT(text, delimiter), LEFT, LOWER, TRIM, LEN
  • Math: SUM(line1, line2), ROUND(amount, 2), MAX(income, 0), MIN, ABS, COUNT(creditors)
  • Logical: IF(joint, debtor2_name, ''), AND(employed, income > 0), OR, NOT, IN(type, 'Secured', 'Mortgage'), ISBLANK(value), ANY_FILLED(a, b, c) / ALL_BLANK(a, b, c) (replaces the NOT(ISBLANK(a)) || … chain), COALESCE(a, b, c, …)
  • Date: TODAY(), YEAR(filed_date), MONTH(opened), DAY
  • Array: INDEX(array, n), LOOKUP(array, keyField, keyValue, returnField)
  • Reference: REF_LOOKUP(table_name, ...keys) — read from an in-process registry (Census median income, IRS standards, etc.) populated at server boot

String literals use single quotes. The tokenizer accepts both '...' and "..." but single quotes are canonical (JSON-friendly inside .bindings.json); a corpus-wide codemod has normalized everything to the single-quote form.

Array iteration syntax lives at the parser level, not as functions:

  • creditors[0].name — element-at-index
  • creditors[].lien_value — projection across every element
  • creditors[? classification == 'secured'].lien_value — JMESPath-style predicate filter
  • SUM / COUNT / MAX / MIN flatten one level so SUM(creditors[? classification == 'secured'].lien_value) works directly
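For readers more used to host-language array methods, the iteration forms above correspond roughly to the following TypeScript. This is an analogy under assumed data, not the evaluator itself; the Creditor shape and the sum helper are illustrative.

```typescript
// Hypothetical element shape for the creditors array.
interface Creditor {
  name: string;
  classification: string;
  lien_value: number;
}

const creditors: Creditor[] = [
  { name: "First Bank", classification: "secured", lien_value: 1000 },
  { name: "Card Co", classification: "unsecured", lien_value: 500 },
  { name: "Auto Lender", classification: "secured", lien_value: 250 },
];

// creditors[0].name : element-at-index
const firstCreditorName = creditors[0].name;

// creditors[].lien_value : projection across every element
const allLiens = creditors.map((c) => c.lien_value);

// creditors[? classification == 'secured'].lien_value : predicate filter, then projection
const securedLiens = creditors
  .filter((c) => c.classification === "secured")
  .map((c) => c.lien_value);

// An aggregator that flattens one level, so it accepts scalars and arrays alike.
function sum(...args: Array<number | number[]>): number {
  let total = 0;
  for (const arg of args) {
    if (Array.isArray(arg)) {
      for (const n of arg) total += n;
    } else {
      total += arg;
    }
  }
  return total;
}

const securedTotal = sum(securedLiens); // mirrors SUM(creditors[? ...].lien_value)
const mixedTotal = sum(allLiens, 50); // an array and a scalar in one call
```

The one-level flatten is what lets a projection be passed straight to an aggregator without an explicit unpacking step.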

Cross-form bindings flow through resolveFiling(forms, schemaValues, options?): a topologically sorted, single-pass resolution backed by dependency-graph.ts. The resolver builds a static dependency graph over the filing's bindings (each $formKey.field / $.field ref becomes an edge from reader to producer), topo-sorts it, and evaluates every binding exactly once in order. Each form's resolved fields are then exposed under $formKey.field (cross-form) and $.field (self-ref).

Single-pass resolve() only sees the schema-values map; any binding referencing a sibling field needs resolveFiling(). The server's listFormFields endpoint always goes through resolveFiling(). There is no single-form fallback, so sibling forms' draft_overrides propagate through cross-form refs automatically.

Two observability hooks are available on FilingResolveOptions:

  • onParseError({ expression, error }) fires once per unique unparseable expression (the silent-OR trap surfaces here).
  • onCycleError({ cycles }) fires synchronously at graph-build time with the SCCs of participating binding ids (<formKey>::<index>). Cycle members are skipped from the walk rather than evaluated.

The old fixed-point loop, maxPasses cap, and onNonConvergence callback are gone.
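The ordering step can be illustrated with Kahn's algorithm over binding ids. This is a simplified sketch, not the contents of dependency-graph.ts: BindingNode and planEvaluation are hypothetical names, and unlike the SCC-based reporting described above, this version also skips bindings that merely sit downstream of a cycle.

```typescript
// Hypothetical node: a binding id plus the binding ids it reads from.
interface BindingNode {
  id: string; // "<formKey>::<index>"
  reads: string[];
}

// Kahn's algorithm: producers are emitted before their readers. Anything that
// never becomes ready is on a cycle (or depends on one) and is skipped.
function planEvaluation(bindings: BindingNode[]): { order: string[]; skipped: string[] } {
  const pending = new Map<string, number>();
  const readers = new Map<string, string[]>();
  for (const binding of bindings) {
    pending.set(binding.id, 0);
    readers.set(binding.id, []);
  }
  for (const binding of bindings) {
    for (const producer of binding.reads) {
      const list = readers.get(producer);
      if (list === undefined) continue; // not a binding in this filing
      list.push(binding.id);
      pending.set(binding.id, (pending.get(binding.id) ?? 0) + 1);
    }
  }

  const ready = bindings.filter((b) => pending.get(b.id) === 0).map((b) => b.id);
  const order: string[] = [];
  for (let next = 0; next < ready.length; next++) {
    const id = ready[next];
    order.push(id);
    for (const reader of readers.get(id) ?? []) {
      const left = (pending.get(reader) ?? 0) - 1;
      pending.set(reader, left);
      if (left === 0) ready.push(reader);
    }
  }

  const skipped = bindings.filter((b) => (pending.get(b.id) ?? 0) > 0).map((b) => b.id);
  return { order, skipped };
}

const plan = planEvaluation([
  { id: "scheduleI::0", reads: [] },
  { id: "meansTest::0", reads: ["scheduleI::0"] },
  { id: "summary::0", reads: ["meansTest::0", "scheduleI::0"] },
  { id: "a::0", reads: ["b::0"] },
  { id: "b::0", reads: ["a::0"] },
]);
```

Each id in plan.order is evaluated exactly once, after everything it reads; the two mutually dependent bindings end up in plan.skipped while the rest of the filing still resolves.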

Binding shape. Every binding is a flat { source, target, when?, note? }. Iteration is expressed at the expression level: an array-projection source (creditors[].name) fans out to one write per element, and target templates may contain {expression} holes evaluated against the iteration scope. There's no kind: 'repeat' and no expansion step — the resolver dispatches on whether the source evaluates to an array.

Two loop variables are bound for array sources:

  • i — 0-based index.
  • e — the current array element (a scalar when the source projected one field, otherwise the whole element).

// One write per element of the maintenance array. The target template uses
// {i + 1} to label rows 1, 2, 3, … because i starts at 0.
{
  "source": "plan.secured.maintenance[].creditor_name",
  "target": "$.Name_{i + 1}"
}

// Blank-first underscore suffix — common in PDF forms where the unsuffixed
// field is the first row and "_2", "_3", … follow.
//   "Name", "Name_2", "Name_3", …
{
  "source": "plan.secured.maintenance[].creditor_name",
  "target": "$.Name{IF(i == 0, '', CONCAT('_', i + 1))}"
}

// Explicit string slots — array-literal subscript renders the slot label.
// "Street address 1b Debtor 1", "...1c...", "...1d..."
{
  "source": "sofa.prior_addresses[].address.street",
  "target": "$.Street address 1{['b','c','d'][i]} Debtor 1"
}

// Non-contiguous numeric slots (question number IS the suffix).
{
  "source": "sofa.environmental_issues[].site_name",
  "target": "$.q{[23,24,25][i]}_site_name"
}

// Multi-target fan-out: one source ⇒ two PDF fields per element.
{
  "source": "plan.secured.maintenance[].creditor_name",
  "target": ["$.s3a_r{i + 1}_creditor", "$.s3b_r{i + 1}_creditor"]
}

// `when` can gate per-element using `e`:
{
  "source": "creditors[]",
  "target": "$.Active_{i + 1}",
  "when":   "e.amount > 0"
}

Template holes are arbitrary expressions — anything the parser accepts. Array literals (['a','b','c']) and computed subscripts (arr[i]) are first-class in the expression grammar. The template parser splits on {...} and feeds each hole through parse().
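A rough picture of the split-and-render step, assuming holes never contain braces of their own. The names splitTemplate, fillTemplate, and demoEvaluate are made up for this sketch, and demoEvaluate is a stand-in that recognises only two hole shapes; in the engine each hole goes through the real expression parser.

```typescript
type TemplatePart =
  | { kind: "text"; value: string }
  | { kind: "hole"; expression: string };

// Split "$.Name_{i + 1}" into literal text and {expression} holes.
function splitTemplate(template: string): TemplatePart[] {
  const parts: TemplatePart[] = [];
  const pattern = /\{([^{}]*)\}/g;
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    if (match.index > cursor) {
      parts.push({ kind: "text", value: template.slice(cursor, match.index) });
    }
    parts.push({ kind: "hole", expression: match[1].trim() });
    cursor = match.index + match[0].length;
  }
  if (cursor < template.length) {
    parts.push({ kind: "text", value: template.slice(cursor) });
  }
  return parts;
}

interface IterationScope {
  i: number; // 0-based index
  e: unknown; // current element
}

type HoleEvaluator = (expression: string, scope: IterationScope) => unknown;

// Render one target name for one array element.
function fillTemplate(template: string, scope: IterationScope, evaluate: HoleEvaluator): string {
  return splitTemplate(template)
    .map((part) => (part.kind === "text" ? part.value : String(evaluate(part.expression, scope))))
    .join("");
}

// Stand-in evaluator: only the two hole shapes used below are understood.
const demoEvaluate: HoleEvaluator = (expression, scope) => {
  if (expression === "i + 1") return scope.i + 1;
  if (expression === "['b','c','d'][i]") return ["b", "c", "d"][scope.i];
  throw new Error(`unsupported demo expression: ${expression}`);
};

const creditorNames = ["First Bank", "Second Bank"];
const rowTargets = creditorNames.map((e, i) => fillTemplate("$.Name_{i + 1}", { i, e }, demoEvaluate));
const slotTarget = fillTemplate("$.Street address 1{['b','c','d'][i]} Debtor 1", { i: 1, e: null }, demoEvaluate);
```

With two creditor names the first template yields $.Name_1 and $.Name_2, and the slot template at i = 1 yields "$.Street address 1c Debtor 1".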

Checkbox value normalization. PDF AcroForms store checkbox state as a string export value ('Yes' / 'On' / 'Off' / 'Yes_2'). normalizeOverrides(overrides, fields) coerces these to true / false for fields with type: 'checkbox' before the resolver sees them. Bindings now use $.check5 == true (or bare $.check5 — JS truthiness on a real boolean does the right thing). The older $.check5 == 'yes' style has been migrated across the corpus by domain_tools/scripts/migrate-checkbox-bool-comparisons.mjs.
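A sketch of what such a coercion could look like. The rule used here ('Off' or an empty string means unchecked, any other export value means checked) is an assumption for illustration, as are the FieldMeta shape and the coerceCheckboxes name; the actual normalizeOverrides may apply different rules.

```typescript
// Hypothetical field metadata; only `type` matters for this sketch.
interface FieldMeta {
  key: string;
  type: "text" | "checkbox";
}

// Assumed rule: 'Off' and '' are unchecked, every other export value is checked.
// Non-checkbox fields and non-string values pass through untouched.
function coerceCheckboxes(
  overrides: Record<string, unknown>,
  fields: FieldMeta[],
): Record<string, unknown> {
  const checkboxKeys = new Set<string>();
  for (const field of fields) {
    if (field.type === "checkbox") checkboxKeys.add(field.key);
  }

  const coerced: Record<string, unknown> = {};
  for (const key of Object.keys(overrides)) {
    const value = overrides[key];
    if (checkboxKeys.has(key) && typeof value === "string") {
      coerced[key] = value !== "" && value.toLowerCase() !== "off";
    } else {
      coerced[key] = value;
    }
  }
  return coerced;
}

const formFields: FieldMeta[] = [
  { key: "check5", type: "checkbox" },
  { key: "check6", type: "checkbox" },
  { key: "check7", type: "checkbox" },
  { key: "nickname", type: "text" },
];

const normalized = coerceCheckboxes(
  { check5: "Yes", check6: "Off", check7: "Yes_2", nickname: "Off" },
  formFields,
);
```

After this step a binding condition such as $.check5 == true compares against a real boolean, while the text field keeps its literal string even though it happens to read "Off".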

Editing model: three override tiers + a buffer

The Forms tab is the live editor. Edits cascade in the browser — resolveFiling and reference tables both ship to the client via @dossier/core and GET /api/cases/:id/reference-bundle. The Form lens no longer writes on keystroke; edits land in an in-memory buffer and are flushed by an explicit commit.

Precedence for a draft filing's field value (low → high):

  1. Schema-derived — value computed from confirmed entries via bindings.
  2. case_field_overrides — case-wide pin (per case + form + field). Every filing on the case sees it unless tier 3 shadows it.
  3. filing_forms.draft_overrides — per-filing pin (per filing + form + field). Wins for one filing only.
  4. Buffer — pending edits in the editor, not yet committed.
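The precedence list reads naturally as a lookup from the highest tier down. A minimal sketch, assuming each tier is a flat field-to-value map; the OverrideLayers shape and effectiveValue name are illustrative only.

```typescript
// Hypothetical flat maps, one per tier, keyed by PDF field name.
interface OverrideLayers {
  schemaDerived: Record<string, unknown>;
  caseFieldOverrides: Record<string, unknown>;
  draftOverrides: Record<string, unknown>;
  buffer: Record<string, unknown>;
}

// Walk the tiers from highest to lowest and return the first one that has the field.
function effectiveValue(field: string, layers: OverrideLayers): unknown {
  const highToLow = [
    layers.buffer,
    layers.draftOverrides,
    layers.caseFieldOverrides,
    layers.schemaDerived,
  ];
  for (const layer of highToLow) {
    if (Object.prototype.hasOwnProperty.call(layer, field)) {
      return layer[field];
    }
  }
  return undefined;
}

const layers: OverrideLayers = {
  schemaDerived: { "Debtor1.First name": "Ana", City: "Chicago", Zip: "60601" },
  caseFieldOverrides: { City: "Evanston", Zip: "60201" },
  draftOverrides: { Zip: "60202" },
  buffer: {},
};

const firstNameValue = effectiveValue("Debtor1.First name", layers); // schema-derived
const cityValue = effectiveValue("City", layers); // case-wide pin
const zipValue = effectiveValue("Zip", layers); // per-filing pin
```

An uncommitted edit in the buffer would shadow all three lower tiers for the same field until it is committed or discarded.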

POST /api/cases/:id/draft/commit dispatches the buffer into three destination tiers — case-data writes to entries, case-fields writes to case_field_overrides, filing-override writes to filing_forms.draft_overrides. The client picks the tier per edit (default: case-fields for direct schema-key bindings, filing-override when the user explicitly pins to one filing). Stateless preview lives at POST /api/cases/:id/filings/:filingId/resolve; the per-filing override map ships from GET /api/cases/:id/filings/:filingId/override-map.

At file-time (status: 'draft' → 'filed'), snapshotOverridesForFiling runs the cross-form resolver once against the case's current schema values, merging case_field_overrides and draft_overrides on top, and freezes the full resolved-field map per form into filing_forms.snapshot (one entry per PDF field bound on that form). draft_overrides is cleared on the same write. Filed envelopes are then immutable: resolveFilingFormFields short-circuits to read the snapshot directly — no binding re-evaluation, no override layer — so subsequent case-level edits never bleed in.
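The freeze-then-short-circuit behavior can be sketched for a single form. Everything below is simplified and hypothetical (FilingFormSketch, fileForm, fieldsFor, and the field names): the real snapshot is taken per form across the whole filing by snapshotOverridesForFiling.

```typescript
type FieldMap = Record<string, unknown>;

// Hypothetical single-form view of a filing.
interface FilingFormSketch {
  status: "draft" | "filed";
  draftOverrides: FieldMap;
  snapshot?: FieldMap;
}

// Live view: schema-derived values with both override tiers layered on top.
function liveFields(schemaDerived: FieldMap, caseOverrides: FieldMap, form: FilingFormSketch): FieldMap {
  return { ...schemaDerived, ...caseOverrides, ...form.draftOverrides };
}

// File-time: freeze the full resolved map and clear draft overrides in one step.
function fileForm(schemaDerived: FieldMap, caseOverrides: FieldMap, form: FilingFormSketch): FilingFormSketch {
  return {
    status: "filed",
    draftOverrides: {},
    snapshot: liveFields(schemaDerived, caseOverrides, form),
  };
}

// Read path: filed forms return the snapshot and never re-evaluate anything.
function fieldsFor(schemaDerived: FieldMap, caseOverrides: FieldMap, form: FilingFormSketch): FieldMap {
  if (form.status === "filed" && form.snapshot !== undefined) {
    return form.snapshot;
  }
  return liveFields(schemaDerived, caseOverrides, form);
}

const draftForm: FilingFormSketch = { status: "draft", draftOverrides: { "Case Number": "24-10001" } };
const atFileTime: FieldMap = { "Debtor Name": "Ana Diaz" };
const filedForm = fileForm(atFileTime, {}, draftForm);

// Later, the case data changes.
const afterEdit: FieldMap = { "Debtor Name": "Ana Diaz-Lopez" };
const filedView = fieldsFor(afterEdit, {}, filedForm);
const draftView = fieldsFor(afterEdit, {}, draftForm);
```

The filed view keeps the name as it stood at file-time, while a form still in draft picks up the later edit.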

Historical note: the column was renamed from overrides to snapshot when this changed. Pre-rename, the column held only the override values and the engine still recomputed schema-derived bindings live — a debtor-name edit after filing would silently rewrite the "Debtor Name" cell on the filed envelope. The full-resolved-map snapshot closes that gap. A one-shot backfill script (packages/server/src/scripts/backfill-snapshots.ts) re-resolves existing filed filings against current schema values; for cases whose data has drifted, the backfilled snapshot is "what the engine would produce today," not "what was produced at the original file-time."

Conditional bindings use when. For a scalar source the condition gates the whole binding; for an array source it is evaluated per element:

{
  "source": "debtor2.first_name",
  "target": ["$b101.Debtor2.First name", "$b106ab.Joint Debtor Name"],
  "when":   "case.is_joint == true"
}

Schema Functions

Some computations are too large for a single expression. The bankruptcy means test reads a dozen schema keys, looks up Census median income by state + household size, computes IRS Local Standards housing/transportation deductions, and emits a verdict plus ~15 derived keys. Authoring that as one mega-expression collapsed under its own weight.

Schema functions are pure (data, ctx) => { values, trace } TS functions whose outputs surface as schema-keyed values to bindings. The signature lives in packages/core/src/functions/types.ts:

interface SchemaFunction {
  name:      string         // referenced by the schema's `x-computed` annotation
  produces:  SchemaKey[]    // keys it writes
  dependsOn: SchemaKey[]    // keys it reads — drives topo-order
  run:       (data, ctx) => SchemaFunctionResult
}

interface SchemaFunctionResult {
  values:       Record<SchemaKey, unknown>
  trace:        TraceStep[]   // line-by-line walk for the audit UI
  warnings?:    WarningEntry[]
  suggestions?: Suggestion[]
}

The registry in packages/core/src/functions/index.ts is an explicit named-import record — no factories, no decorator magic, grep finds every function. At server boot, validateSchemaFunctions cross-checks the schema: every x-computed annotation must point at a registered function, and every produces[] key must be annotated. The resolver pass (evaluateSchemaFunctions in packages/core/src/lib/case-values.ts) topo-sorts the registry by dependsOn ∩ produces, runs each function in order, and merges only declared produces keys back into the values map — extra keys from a buggy function are silently dropped rather than corrupting state.
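The ordering and the "declared keys only" merge can be sketched as follows. This is not evaluateSchemaFunctions itself: the trimmed SketchFunction shape, the depth-first ordering, the evaluateInOrder name, and the toy threshold in the second function are all assumptions for the example.

```typescript
type Values = Record<string, unknown>;

// Trimmed-down stand-in for SchemaFunction (no ctx, no trace).
interface SketchFunction {
  name: string;
  produces: string[];
  dependsOn: string[];
  run: (data: Values) => { values: Values };
}

// Run each function after the functions that produce its inputs, and merge
// back only the keys it declared in `produces`.
function evaluateInOrder(functions: SketchFunction[], input: Values): Values {
  const values: Values = { ...input };
  const producerOf = new Map<string, SketchFunction>();
  for (const fn of functions) {
    for (const key of fn.produces) producerOf.set(key, fn);
  }

  const done = new Set<string>();
  const visiting = new Set<string>();
  const visit = (fn: SketchFunction): void => {
    if (done.has(fn.name)) return;
    if (visiting.has(fn.name)) throw new Error(`cycle through ${fn.name}`);
    visiting.add(fn.name);
    for (const key of fn.dependsOn) {
      const upstream = producerOf.get(key);
      if (upstream) visit(upstream);
    }
    visiting.delete(fn.name);
    done.add(fn.name);

    const result = fn.run(values);
    for (const key of fn.produces) {
      if (key in result.values) values[key] = result.values[key];
    }
  };
  functions.forEach((fn) => visit(fn));
  return values;
}

const toyVerdict: SketchFunction = {
  name: "toyVerdict",
  produces: ["means_test.verdict"],
  dependsOn: ["income.total"],
  // Toy threshold, plus an undeclared key that must be dropped.
  run: (data) => ({
    values: {
      "means_test.verdict": (data["income.total"] as number) > 5000 ? "above" : "below",
      "debtor1.first_name": "oops",
    },
  }),
};

const toyTotals: SketchFunction = {
  name: "toyTotals",
  produces: ["income.total"],
  dependsOn: ["income.wages", "income.other"],
  run: (data) => ({
    values: { "income.total": (data["income.wages"] as number) + (data["income.other"] as number) },
  }),
};

// Registered out of order on purpose; dependsOn drives the actual sequence.
const computed = evaluateInOrder([toyVerdict, toyTotals], {
  "income.wages": 3000,
  "income.other": 500,
  "debtor1.first_name": "Ana",
});
```

Even though toyVerdict is listed first, toyTotals runs before it, and the stray debtor1.first_name the verdict function returns never reaches the values map.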

Functions live next to mergeEntriesValues: the resolver pass runs before bindings, so binding sources can read computed keys as ordinary schema paths. The draft-commit + intake dispatchers reject writes against any x-computed key with schema/computed-readonly — operators see the value but can't shadow it with an entry.

The first wave of bankruptcy functions:

| Function | Writes | Reason |
| --- | --- | --- |
| meansTestCalculator | means_test.{cmi, deductions, disposable_income, verdict} + ~10 sub-keys | Form 122A logic: CMI, IRS Local Standards lookups, presumption math |
| exemptionResolver | exemption.{system, claimed, surplus} per asset row | State vs federal election, per-asset cap math against fed_exemptions / state_exemptions/* |
| ch13PlanCalculator | plan.{distributions.*, feasibility.*} | MAX-of-three-floor required payment, trustee → attorney → secured → priority → unsecured waterfall, feasibility verdict |
| partyFormatter | parties.{debtor, joint_debtor}.{full_name, formatted_address} | Collapses 105 inline FORMAT_NAME / FORMAT_ADDRESS wrappers across the corpus |
| scheduleTotalsResolver | debts.totals.*, property.totals.*, expenses{,_joint}.totals.monthly_total | Collapses 22 long classifier SUM expressions across the corpus |

Schema functions also drive the audit disclosure UI: each TraceStep carries label, formula?, result, optional inputs (operand breakdown: +/− rows above a hairline, summary line below), detail, cite, and warning. The Plan and Means Test sub-tabs render this as an explained worksheet: section-closer rows (Subtotal / Total / Floor / verdict) get a stronger divider, and single-row steps fall through to the existing pill. The reference-data tables are shipped to the client through GET /api/cases/:id/reference-bundle, so the trace renders fully client-side.

DataSource Configuration

A DataSource is a reusable recipe that turns external data into schema key values. Every DataSource config follows the same shape: inputs (source-specific parse config) and mappings (a Binding[] that maps raw fields → schema keys). The mappings reuse the same Binding primitive as form bindings — just inbound (raw → schema) instead of outbound (schema → form field).

{
  "name": "Credit Report (Bankruptcy)",
  "type": "credit-report",
  "schemaId": "bankruptcy.individual",
  "config": {
    "inputs": { "format": "mismo-2.3.1", "bureau": "tri-merge" },
    "mappings": [
      { "source": "CONCAT(borrowers[0].firstName, ' ', borrowers[0].lastName)", "target": ["debtor1.full_name"] },
      { "source": "borrowers[0].ssn", "target": ["debtor1.ssn"] }
    ]
  }
}

DataSource mappings are inbound (external field → schema key). Bindings are outbound (schema key → form field). Both use the same key vocabulary — same expression engine, same [] array syntax.

The schema is the clean boundary. DataSources don't know about PDF fields. Bindings don't know about APIs. Build the key picker once, use it everywhere.
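The inbound direction can be sketched with plain path lookups. This deliberately ignores the expression engine (no CONCAT, no projections): PathMapping, readPath, and toPendingEntry are hypothetical names, and the mapping shape is simplified for the example rather than taken from the real config.

```typescript
// Simplified mapping: one raw path feeds one or more schema keys.
interface PathMapping {
  path: string; // e.g. "borrowers[0].ssn"
  keys: string[]; // schema keys to write
}

interface PendingEntry {
  source: string;
  reviewState: "pending";
  raw: unknown; // full payload before mapping
  values: Record<string, unknown>; // schema-mapped subset
}

// Resolve "a[0].b" against a parsed payload; missing segments yield undefined.
function readPath(raw: unknown, path: string): unknown {
  const segments = path.replace(/\[(\d+)\]/g, ".$1").split(".");
  let cursor: unknown = raw;
  for (const segment of segments) {
    if (cursor === null || typeof cursor !== "object") return undefined;
    cursor = (cursor as Record<string, unknown>)[segment];
  }
  return cursor;
}

// Build an entry that keeps the raw payload and the mapped values side by side.
function toPendingEntry(source: string, raw: unknown, mappings: PathMapping[]): PendingEntry {
  const values: Record<string, unknown> = {};
  for (const mapping of mappings) {
    const value = readPath(raw, mapping.path);
    if (value === undefined) continue; // nothing to map for this path
    for (const key of mapping.keys) values[key] = value;
  }
  return { source, reviewState: "pending", raw, values };
}

const payload = { borrowers: [{ firstName: "Ana", ssn: "123-45-6789" }] };
const importedEntry = toPendingEntry("credit-report", payload, [
  { path: "borrowers[0].ssn", keys: ["debtor1.ssn"] },
  { path: "borrowers[0].firstName", keys: ["debtor1.first_name"] },
  { path: "borrowers[1].ssn", keys: ["debtor2.ssn"] },
]);
```

Nothing in this step knows about PDF fields: the output is only schema keys, which is what lets the outbound bindings stay unaware of where the data came from.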

Example Flows

Every path through the system follows the same pattern: something produces entries on a case, bindings route them to PDF fields.

A. Lawyer fills in debtor information and clicks save

Lawyer types 5 fields, saves → Entry (manual, 5 values, confirmed)
    → Bindings route each value to every form field that needs it
    → No DataSource needed. Source = "manual", auto-confirmed.

B. Credit report fills creditor schedules

MISMO XML → DataSource: credit-report (parse + classify + map)
    → Entry (credit-report, 80 values, pending review)
    → Bindings → Schedule D creditor rows, Schedule E/F creditor rows, ...
    → Lawyer reviews and confirms before values flow to PDFs

C. Case management sync fills debtor demographics

Clio REST API → DataSource: case-mgmt (field mapping)
    → Entry (case-mgmt, 12 values, confirmed)
    → Bindings → Form 101, Schedule A/B, Schedule I, ...
    → Same schema keys, same bindings, different source
    → If lawyer already typed the name manually, the API value doesn't override

D. Pay stub upload fills income

Pay stub PDF → DataSource: pay-stub (LLM extraction)
    → Entry (pay-stub, 5 values, pending review)
    → Bindings → Schedule I income fields
    → First time: lawyer confirms the LLM mapping (saved as reusable template)
    → Second time: auto-applied from the saved template

E. Lawyer corrects a value from the credit report

Fixes creditor name → Entry (manual, 1 value, confirmed)
    → Overrides the credit report's value for creditors.secured[0].name
    → Credit report entry still exists in timeline — activity feed shows what changed and why

F. Same data source, different domain (future)

Same MISMO XML → DataSource: credit-report-auto (different schema mapping)
    → Entry (credit-report, N values)
    → Different schema (insurance.auto.claim) → different bindings → insurance forms
    → The DataSource is scoped to a schema, not to forms

The Extraction Pipeline

Building a new domain follows a repeatable pipeline:

  1. Define the schema — Enumerate every data point in the domain. Give each a canonical key, type, label, and group.
  2. Process the forms — Take each government/standard PDF, extract its AcroForm fields (key, type, page, position, rect).
  3. Generate bindings — Map each PDF field to the appropriate schema key. This is where domain knowledge is captured: understanding that "Debtor1.First name" on Form 101 and "Debtor 1 Name" on Schedule D both mean debtor1.first_name.
  4. Compose form packages — Group leaf forms into composites (Petition, Schedules, Chapter 7 Package). Write cross-child bindings that route data between sibling forms.
  5. Configure data sources — Define how external data (credit reports, APIs, documents) maps to schema keys.

The result: populate a few schema keys from any source, and the binding engine cascades the values to every form that needs them.


What's Been Built

Bankruptcy Domain

Schemas:

  • bankruptcy.individual — ~1,100 schema keys covering debtor identity, income, expenses, assets, liabilities, creditors (secured, unsecured, priority), executory contracts, co-debtors, prior filings, and administrative data
  • bankruptcy.nonindividual — ~640 schema keys covering entity information, officers, revenue, assets, liabilities, and corporate-specific data
  • 12 shared administrative keys (case.*, attorney.*)

Forms processed:

  • 69 federal leaf forms — Every fillable PDF in the bankruptcy form set. Fields extracted, bindings generated, schema keys mapped.
  • Local forms from IL and GA — State-specific local bankruptcy forms processed with the same pipeline, extending the federal form set.
  • 19 composite forms — Chapter packages (Ch.7 Individual, Ch.7 Non-Individual, Ch.13 Individual, etc.), Petition group, Schedules group, and other logical groupings.

What the bindings encode:

The bindings are the captured domain knowledge. They encode:

  • debtor1.first_name appears on 40+ forms under different field names
  • Creditor arrays in the schema map to repeating table rows on Schedule D, E/F, and the creditor matrix
  • The means test (Form 122A) uses income values from Schedule I and expense values from Schedule J
  • Summary totals on the Schedule Summary aggregate values from the individual schedules
  • Conditional forms are included or excluded based on case data (e.g., Schedule H only when case.has_codebtors == true)
  • Cross-form sync ensures the same creditor list stays consistent between Schedules and the Chapter 13 Plan

Engine implementation:

  • PDF field extraction (AcroForm field discovery with position and type metadata)
  • Expression engine: tokenizer → parser → AST → evaluator, 34 functions including JOIN/FORMAT_NAME/FORMAT_ADDRESS/ANY_FILLED/ALL_BLANK/REF_LOOKUP, AST caching, 7 AST node types (literal, reference, formRef, binary, call, path, arrayLiteral)
  • Flat binding shape: { source, target, when?, note? }. Array-source dispatch fans out to one write per element with i / e in scope; target templates carry {expression} holes via parseTemplate / renderTemplate. Checkbox value coercion at the read-side via normalizeOverrides(overrides, fields)
  • Parse-error surfacing via resolveFiling's onParseError callback + the validate-binding-syntax.mjs authoring-time gate (parses every source, when, and target template)
  • Array iteration at the parser level: arr[N] index, arr[expr] computed index, arr[] projection, arr[? predicate] filter; aggregators flatten one level; array literals (['a','b','c']) as first-class expressions
  • Binding resolver: single-pass resolve() for one form, topologically-sorted single-pass resolveFiling() for cross-form refs ($formKey.field) — static dependency graph, ~6.6× faster than the legacy fixed-point loop on a 40-form synthetic filing, byte-identical output against the full bankruptcy corpus; cycles surface via onCycleError (members skipped, non-cycle bindings still run)
  • Reference-data registry: REF_LOOKUP(table_name, ...keys) reads from in-process tables registered at server boot from the file-based layer under packages/core/src/reference-data/*.json — each file declares { name, description, source, lookupBy, rows }. loadReferenceTables(rootDir) walks the dir, builds wildcard-aware ('*') lookup tables, registers them through setReferenceTable. Means-test tables (Census median income, IRS National Standards, IRS Local Standards housing/utilities/transportation) plus fed_exemptions, state_exemptions/*, state_exemption_election, and trustee_fee_by_district all live here. The core build copies the JSON into dist/reference-data/ so the loader sees them at runtime
  • Field behavior: per-field visibleIf / requiredIf / readOnlyIf expressions evaluated against a per-form context that resolves the form's own bindings + overrides under $.<fieldKey> keys; supports schema paths (case.*), self-references in either dotted ($.Check_Box4) or bracketed ($[Check Box8], for PDF-native keys with spaces / dots / hyphens) form, and cross-form refs ($child.field / $child[field]). PDF-native flags.required / flags.readOnly and the conditional requiredIf / readOnlyIf layer additively (OR'd at runtime — both always preserved). Static dependency analysis builds a trigger map (gate → dependents) and detects logic-deadlock cycles at form-publish time.
  • Live behavior preview endpoint: client sends an in-memory snapshot of trigger fields, server returns booleans only — expressions never leave the server.
  • PDF filler (pdf-lib AcroFields) + multi-form export (merged PDF or ZIP)
  • DataSource framework — credit report parsing (live), CSV import, Clio API mapping config, pay-stub / document upload (extraction path scaffolded, LLM mapping planned)
  • Client portal (packages/portal) — tenant-branded intake app with embeddable widget (bubble, panel, full-page), dashboard for invited clients, and four published intake configs (Atlas, DebtStoppers, Greenfield, individual self-file)
  • Public intake + portal routes (/intake/:slug, /portal/:tenantSlug) — tokenized invites, multipart file uploads, rate-limited, write directly to entries on a case

Why It Generalizes

The engine has no concept of "law" or "bankruptcy." It knows five things:

  1. Schemas — vocabularies of typed data points with dotted key notation
  2. Forms — PDFs with extractable AcroForm fields, composable into packages
  3. Bindings — routes from schema keys to form fields, with expressions and conditions
  4. DataSources — recipes for importing external data into schema keys
  5. Entries — batches of key=value changes with source tracking

Swap the schema, forms, and bindings — you have a different domain. The engine, server, database, and API routes are unchanged.

Domain-Agnostic Infrastructure

The platform's infrastructure is fully abstract:

  • Database tables know nothing about law, insurance, or tax. They store schemas (JSONB entries), forms (JSONB fields/bindings), cases (JSONB references/dates), entries (JSONB values), and filings (JSONB snapshots).
  • ~45 API routes handle CRUD for cases, entries, filings, contacts, tasks, notes, events, billing, activity, attachments, data sources — all tenant-scoped and role-aware, all abstract.
  • Schema UI config shapes the client app per domain: status labels, party roles, reference fields, date fields, document checklists, event types, and the label for "Case" (which becomes "Claim" or "Return" or "Application" in other domains).
  • Case management (tasks, notes, calendar, billing, activity, attachments, contacts) is universal across all domains without modification.

What Changes Per Domain

| To target... | What you build | Code changes |
| --- | --- | --- |
| Another law type (immigration, family, PI) | New schemas + forms + bindings + UI config | None |
| Tax preparation | New schemas + forms. Expression engine handles calculations. | None (maybe DataSource for tax tables) |
| Insurance claims | New schemas + forms + 1-2 tables for payouts/settlements | Minimal: new routes for claim financials |
| Real estate closings | New schemas + forms + 1-2 tables for escrow management | Minimal: new routes for escrow |
| Healthcare credentialing | New schemas + forms + 1 table for credential expiry | Minimal: new routes for re-credentialing |
| Government permits | New schemas + forms + 1-2 tables for inspection workflows | Minimal: new routes for inspections |

For any industry where structured forms need to be filled with data from a shared vocabulary, the engine works as-is. The only question is whether the domain needs concepts beyond the core model (cases, entries, filings, contacts, tasks, billing) — and if so, it's 1-2 new tables, not a rewrite.

Cross-Domain Concept Mapping

The core concepts translate directly across industries:

| Dossier Concept | Law | Insurance | Real Estate | Tax | Healthcare | Government |
| --- | --- | --- | --- | --- | --- | --- |
| Case | Bankruptcy case | Claim | Transaction | Return | Provider app | Permit app |
| Schema | Data vocabulary (1,100 keys) | Claim fields | Transaction fields | Tax data | Provider info | Application data |
| Form | Court forms | ACORD forms | Closing docs | IRS forms | Credentialing apps | Application forms |
| Filing | Court filing | Claim submission | County recording | IRS e-file | Board submission | Agency submission |
| Binding | Schema → form fields | Same | Same | Same | Same | Same |
| Validation | Means test, schedule totals | Coverage limits | Loan-to-value | Tax calculations | License expiry | Zoning compliance |
| Contact | Debtor, Attorney, Trustee | Claimant, Adjuster | Buyer, Seller, Agent | Taxpayer, CPA | Provider, Payer | Applicant, Inspector |
| DataSource | Credit report, CSV | Policy system | MLS, title search | W-2 import | NPDB, license DB | GIS, prior permits |
| Status workflow | Intake → Filed → Discharged | Reported → Settled → Closed | Listed → Closing → Recorded | Intake → Filed → Accepted | Submitted → Approved → Enrolled | Submitted → Approved → Issued |

Concepts that DON'T exist in the core model but specific domains would need:

  • Insurance: Claim payout/settlement tracking (reserves, subrogation)
  • Real estate: Escrow management (trust accounting, disbursement)
  • Tax: Tax calculation engine (brackets, phase-outs — beyond simple expressions)
  • Healthcare: License/credential expiry tracking (recurring re-verification cycles)
  • Government: Multi-stage inspection workflows (sequential pass/fail gates)

None of these are needed for law verticals. Switching between bankruptcy, immigration, family law, PI, or estate — zero code changes.

Next Verticals

Ranked by volume and Dossier fit (detailed analysis in next-verticals.md):

  1. Immigration (10/10 fit) — 8-13M USCIS form receipts/year. ~100+ federal fillable PDFs. Same architecture: federal forms, fillable PDFs, schema → bindings → AcroFields. No local forms needed.

  2. Family Law (8/10 fit) — 1.5-2M matters/year. Very form-dense (15-30 forms per contested case). State-by-state build, but schema overlaps with bankruptcy (asset/debt inventories, financial disclosures).

  3. Eviction (7/10 fit) — 3.6M filings/year. Simple forms (3-7 per case) but massive volume. Good for bulk automation.

  4. Probate (7/10 fit) — 2.6M filings/year. Schema overlaps with bankruptcy (asset/debt inventories, creditor lists). State variation is the main cost.

  5. Workers' Compensation (6/10 fit) — 2.5M claims/year. IAIABC standards provide a natural schema. Attorney-side market is open.

For detailed cross-domain analysis including what translates directly, what needs schema config, and what needs new DB/server work, see domain-comparison.md.


Technical Reference

For complete type definitions, expression syntax, scoping rules, database schema, and ID conventions, see Domain Model Reference.

Source: docs/engine.md