Internal · Intake Architecture

Intake — Many Front Doors, One Artifact

Internal architecture reference for the Dossier dev team. Companion to engine.md; this is the zoomed-in view of the "Any Source, Same Result" section.

The invariant: every path produces an Entry

Every way data enters a Dossier case — a lawyer typing in the Data tab, a client filling a portal form on their phone, an MISMO XML credit report, a pay stub drop, a Clio sync, a CSV import — terminates at the same write: one row in the entries table. An Entry has case_id, source, optional data_source_id, user_id (invoker), timestamp, confirmed, values (JSONB), context (JSONB).

The binding engine reads the merged set of entries on a case and routes schema keys to PDF fields. It does not know, and does not care, where an entry came from. Intake variety is absorbed at the schema boundary. This is what lets us plug in new front doors — a new DataSource config is usually enough, with no changes to the server, the client, or the forms.

Path	`source` tag	Invoker	Review state
Manual entry by lawyer in client app	`manual`	authenticated user	auto-confirmed
Portal intake by client (full-page)	`intake` (+ `data_source_id` identifies slug)	public / portal user	pending review (`confirmed=false`)
Embedded widget on firm's own site	`intake` (same pipeline)	public user	pending review
Credit report XML upload	`credit-report`	user who uploaded	pending review
Doc drop (pay stub, bank statement)	`doc-upload` with subtype	user who uploaded	pending review (extraction pipeline scaffolded; mapping templates planned)
API sync (Clio)	`clio`	system	auto-confirmed per config
CSV import	`csv`	user	pending review

Type reference: Entry in packages/core/src/api/types.ts (lines 312–325). The invoker field is populated by a server join and is read-only on the wire.
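
The field list and the path table above can be read together as a type plus a default review rule. The sketch below is not copied from types.ts: the concrete field types, the EntrySource union, the initialConfirmed helper, and the example slug are assumptions made for illustration.

```typescript
// Hypothetical sketch: field names follow the prose above, concrete types are assumed.
type EntrySource = 'manual' | 'intake' | 'credit-report' | 'doc-upload' | 'clio' | 'csv'

interface Entry {
  case_id: string
  source: EntrySource
  data_source_id?: string            // e.g. the intake slug
  user_id: string                    // invoker
  timestamp: string                  // ISO 8601 string (assumed representation)
  confirmed: boolean
  values: Record<string, unknown>    // JSONB: schema key -> value
  context: Record<string, unknown>   // JSONB: submission metadata
}

// Review state on write, per the path table. `autoConfirmSync` stands in for
// the per-config switch on API syncs (hypothetical name).
function initialConfirmed(source: EntrySource, autoConfirmSync: boolean = true): boolean {
  switch (source) {
    case 'manual':
      return true
    case 'clio':
      return autoConfirmSync
    default:
      return false // intake, credit-report, doc-upload, csv wait for review
  }
}

// A portal submission starts life unconfirmed.
const portalEntry: Entry = {
  case_id: 'case-1',
  source: 'intake',
  data_source_id: 'example-intake',
  user_id: 'portal-user-1',
  timestamp: '2024-01-01T00:00:00.000Z',
  confirmed: initialConfirmed('intake'),
  values: { 'debtor1.first_name': 'Ana' },
  context: { referrer: 'https://example.com' },
}
```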

Path A — Manual entry

The shortest path. The Data tab in the client app is read-only by default (per UX rule); edits start with an explicit "New Entry" action, which opens a form over the schema sections and writes one Entry on save.

Route: POST /api/cases/:id/entries (packages/server/src/routes/cases.ts)
Source tag: manual
Invoker: the signed-in user from the JWT
Confirmed: true on write

An Entry produced here is immediately visible to the binding engine, so the PDF preview in the split-pane editor updates on save. Corrections to any prior Entry — whether it came from a credit report, a portal submission, or another lawyer — are just new manual entries that append to the case's entry timeline.
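
As a rough illustration of the manual path, the handler only needs the case id, the signed-in user, and the submitted values to build the row. buildManualEntry, NewEntryRow, and the dotted key in the example are hypothetical; the real handler lives in routes/cases.ts and may differ.

```typescript
// Hypothetical row shape for a manual write; not taken from the codebase.
interface NewEntryRow {
  case_id: string
  source: 'manual'
  user_id: string
  timestamp: string
  confirmed: true
  values: Record<string, unknown>
  context: Record<string, unknown>
}

function buildManualEntry(
  caseId: string,
  userId: string,
  values: Record<string, unknown>,
  now: Date = new Date(),
): NewEntryRow {
  return {
    case_id: caseId,
    source: 'manual',
    user_id: userId,            // the signed-in user from the JWT
    timestamp: now.toISOString(),
    confirmed: true,            // manual entries are confirmed on write
    values: { ...values },
    context: {},
  }
}

// A correction is just another manual Entry carrying the one key that changed.
// The key format is an assumption about how schema keys are spelled.
const correction = buildManualEntry(
  'case-1',
  'user-7',
  { 'creditor.unsecured[0].name': 'Capital One, N.A.' },
  new Date('2024-01-02T00:00:00Z'),
)
```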

Path B — Portal + embeddable widget

This is the biggest intake surface and the one most likely to grow. The portal is a real React app on port 4105:

packages/portal/src/
├── intake/
│   ├── intake-shell.tsx          full-page intake experience
│   ├── intake-modes.ts           grid | spread | wizard | chat | voice
│   ├── landing.tsx
│   └── modes/
│       ├── grid.tsx
│       ├── spread.tsx
│       ├── wizard/
│       ├── chat/                 conversational intake (active)
│       └── voice.tsx
├── pages/
│   ├── dashboard.tsx             invited-client dashboard
│   ├── login.tsx
│   └── complete.tsx
├── widget/
│   ├── widget-entry.tsx          embeddable entry point (Shadow DOM, CSS inlined)
│   ├── widget-bubble.tsx         floating chat-style bubble
│   └── widget-panel.tsx          side panel that hosts the intake mode
└── shared/
    ├── api.ts                    calls /intake/:slug and /portal/:tenantSlug
    ├── intake-resolver.ts        resolves tenant token → IntakeContext
    ├── theme.ts                  applies tenant branding (primary/accent colors)
    └── fallback-*.ts             offline defaults so the widget renders before fetch

Three surfaces

Full-page intake (intake-shell.tsx): rendered at the tenant-branded portal URL. The client fills the whole intake in one session and can optionally pick an intake mode from those the tenant allows (TenantBranding.allowedModes, with legacy allowChat / allowForm fallbacks; see packages/core/src/api/types.ts lines 34–40).
Dashboard (pages/dashboard.tsx): for invited clients returning via an invite token. Shows progress, pending documents, and pending questions.
Embeddable widget (widget/*): firms drop a <script> onto their own website; the widget renders into a Shadow DOM (see cssText import in widget-entry.tsx) so the host page's CSS cannot bleed in. Three display modes:
- Bubble: floating chat-style launcher (widget-bubble.tsx)
- Panel: side panel that hosts the chosen intake mode (widget-panel.tsx)
- Full: the same intake-shell served inline

All three surfaces call the same backend and produce the same artifact.
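
The allowedModes lookup with its legacy fallbacks could be resolved along these lines. This is a sketch under stated assumptions: the mapping of allowForm to the grid, spread, and wizard modes, the mapping of allowChat to chat, and the defaults when a flag is absent are all guesses, not the behavior in types.ts or the portal.

```typescript
type IntakeMode = 'grid' | 'spread' | 'wizard' | 'chat' | 'voice'

// Only the mode-related part of TenantBranding, with assumed field types.
interface TenantBrandingModes {
  allowedModes?: IntakeMode[]
  allowChat?: boolean   // legacy flag
  allowForm?: boolean   // legacy flag
}

// Assumption: "form" in the legacy flag means the non-conversational modes.
const FORM_MODES: IntakeMode[] = ['grid', 'spread', 'wizard']

function resolveAllowedModes(branding: TenantBrandingModes): IntakeMode[] {
  // The explicit list wins whenever it is present and non-empty.
  if (branding.allowedModes && branding.allowedModes.length > 0) {
    return [...branding.allowedModes]
  }
  // Legacy fallback: forms on unless disabled, chat off unless enabled (assumed defaults).
  const modes: IntakeMode[] = []
  if (branding.allowForm !== false) modes.push(...FORM_MODES)
  if (branding.allowChat === true) modes.push('chat')
  return modes
}
```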

Server routes

Defined in packages/server/src/routes/intake.ts and packages/server/src/routes/portal.ts. All three routes below go through checkIntakeRate(ip) from packages/server/src/services/rate-limit.ts, because they are public endpoints called by untrusted clients.

GET /portal/:tenantSlug[?intake=<slug>] — tenant bootstrap. Returns PortalBootstrap { tenant, intake | null } (see types.ts lines 56–72). Used by the portal SPA on first load to pick up branding + the default intake.
GET /intake/:slug — fetch an intake config. Returns { name, slug, schemaId, config, prefill }. If a ?token= is attached and resolves to a live invite whose dataSourceId matches the slug, the server prefills the response with confirmed values from the target case (loadCaseValues(invite.caseId) in services/intake.ts).
POST /intake/:slug — submit. Accepts application/json (values + context) or multipart/form-data (values + context + file parts). File parts are carried as IntakeUploadFile { fieldname, filename, mimeType, buffer } and written into storage/attachments/<caseId>/ with a UUID prefix. See readSubmissionBody in routes/intake.ts and writeAttachmentFiles in services/intake.ts.
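
This document does not describe how checkIntakeRate decides, so the following is only one plausible shape for a per-IP limiter: a fixed-window counter with an injectable clock. The function name, the options, and the limits are placeholders, not the contents of rate-limit.ts.

```typescript
interface RateLimiterOptions {
  limit: number      // max requests per window (placeholder value chosen by caller)
  windowMs: number   // window length in milliseconds
}

// Returns a check(ip) function that answers "is this request allowed?".
// The clock is injectable so the behavior can be exercised without waiting.
function createIntakeRateLimiter(
  opts: RateLimiterOptions,
  now: () => number = Date.now,
): (ip: string) => boolean {
  const windows = new Map<string, { start: number; count: number }>()
  return (ip: string): boolean => {
    const t = now()
    const current = windows.get(ip)
    // Start a fresh window on first sight or once the old window has elapsed.
    if (!current || t - current.start >= opts.windowMs) {
      windows.set(ip, { start: t, count: 1 })
      return true
    }
    current.count += 1
    return current.count <= opts.limit
  }
}
```

A fixed window is the simplest option to reason about; a sliding window or token bucket would smooth out bursts at window boundaries at the cost of more state.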

IntakeConfig shape

IntakeConfig is the config payload on a DataSource row whose type = 'intake'. From packages/core/src/api/types.ts lines 235–256:

interface IntakeQuestion {
  key: string                 // schema key this answer writes to
  prompt?: string
  hint?: string
  options?: Array<{ value: string; label: string }>
  condition?: string          // expression evaluated against current values
  kind?: 'field' | 'upload'
  uploadCategory?: string     // category for the attachments row
}

interface IntakeSection {
  id: string
  title: string
  description?: string
  condition?: string
  sections?: IntakeSection[]  // recursive
  questions?: IntakeQuestion[]
}

interface IntakeConfig { sections: IntakeSection[] }
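
To show how the recursive sections tree and the condition fields interact, here is a minimal walker that returns the questions currently visible for a set of values. The expression syntax of condition is not specified in this document, so the evaluator is passed in as a parameter; visibleQuestions, ConditionEvaluator, and the questions-before-subsections ordering are assumptions of the sketch.

```typescript
// Trimmed copies of the types above, so the sketch stands alone.
interface IntakeQuestion { key: string; prompt?: string; condition?: string; kind?: 'field' | 'upload' }
interface IntakeSection {
  id: string
  title: string
  condition?: string
  sections?: IntakeSection[]
  questions?: IntakeQuestion[]
}
interface IntakeConfig { sections: IntakeSection[] }

type Values = Record<string, unknown>
// Hypothetical: the real expression language lives elsewhere.
type ConditionEvaluator = (expr: string, values: Values) => boolean

function visibleQuestions(
  config: IntakeConfig,
  values: Values,
  evaluate: ConditionEvaluator,
): IntakeQuestion[] {
  const out: IntakeQuestion[] = []
  const walk = (sections: IntakeSection[]): void => {
    for (const section of sections) {
      // A hidden section hides everything nested under it.
      if (section.condition !== undefined && !evaluate(section.condition, values)) continue
      for (const q of section.questions ?? []) {
        if (q.condition === undefined || evaluate(q.condition, values)) out.push(q)
      }
      walk(section.sections ?? [])
    }
  }
  walk(config.sections)
  return out
}
```

Any intake mode (grid, wizard, chat) could call a function like this after each answer to decide what to ask next, which keeps the conditional logic in the config rather than in the UI.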

Four published intake configs live under domains/bankruptcy/data-sources/:

intake-atlas.json — Atlas firm's new-matter intake
intake-debtstoppers.json — DebtStoppers branded intake
intake-greenfield.json — Greenfield branded intake
intake-individual-self.json — self-file individual chapter 7 intake

Each is a single JSON document with slug, tenantSlug, schemaNamespace, published, and a nested config.sections tree. These are the IP — the lawyer's expertise about what to ask, in what order, conditioned on what — externalized into data. Adding a new intake flow means adding a JSON, not writing code.

Tokenized invites

An attorney sends a client an intake link of the form /intake/<slug>?token=<token>. Behaviors:

GET with token: resolves the invite via findInviteByToken(token) (services/intake.ts lines 219–242), validates used_at and expires_at, then prefills the response with merged confirmed values from the target case. The client sees the questions with their existing answers already populated.
POST with token: writes the new Entry against the invite's case_id (not a new case), marks the invite used_at = NOW(), and writes an intake activity row. See submitInviteIntake in services/intake.ts lines 183–217.

Without a token, POST creates a brand-new case in the target tenant (submitPublicIntake, lines 138–176). The tenant is resolved via IntakeRecord.tenantId, falling back to a seeded dev tenant when the intake is platform-level. A cases row is INSERTed with status = 'Intake' and a name derived from debtor1.first_name + debtor1.last_name (see deriveCaseName); the Entry, attachments, and an activity row follow.
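
The two checks described above, whether an invite is still live and how a public submission gets its case name, might look roughly like this. The Invite shape, the rule that a used invite is rejected, the flat dotted value keys, and the 'New intake' fallback are assumptions; findInviteByToken and deriveCaseName in services/intake.ts are the authority.

```typescript
// Hypothetical in-memory view of an invite row.
interface Invite {
  token: string
  caseId: string
  dataSourceId: string
  usedAt: Date | null
  expiresAt: Date | null
}

// An invite is usable only for its own intake slug, once, and before expiry.
function isLiveInvite(invite: Invite, slug: string, now: Date): boolean {
  if (invite.dataSourceId !== slug) return false
  if (invite.usedAt !== null) return false
  if (invite.expiresAt !== null && invite.expiresAt.getTime() <= now.getTime()) return false
  return true
}

// Case name for a tokenless submission: first name plus last name.
// Assumes values are keyed by dotted schema keys; the fallback text is made up.
function deriveCaseName(values: Record<string, unknown>): string {
  const first = String(values['debtor1.first_name'] ?? '').trim()
  const last = String(values['debtor1.last_name'] ?? '').trim()
  const name = [first, last].filter(Boolean).join(' ')
  return name || 'New intake'
}
```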

From submission to Entry

Client fills portal form (values) + uploads (files)
      ↓
POST /intake/:slug (multipart)
      ↓
readSubmissionBody → IntakeSubmission { values, context, files }
      ↓
submitPublicIntake | submitInviteIntake
      ├── writeAttachmentFiles → storage + attachments rows (category from question.uploadCategory)
      ├── INSERT INTO entries (case_id, source='intake', data_source_id=<slug>, confirmed=false, values, context)
      └── INSERT INTO activity ('intake', 'Intake submitted via <slug>', {dataSourceId, attachments})

The Entry is confirmed=false. It doesn't flow through the binding engine to PDFs until a reviewer confirms it from the Data tab. Context captured: UTM params, referrer, user-agent, invite token.
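
The fan-out in the diagram can be expressed as a pure function that plans the three writes before anything touches storage or the database. This is an illustrative sketch: planIntakeWrites, PlannedWrites, the id generator parameter, and the exact path format after the UUID prefix are assumptions, and the file buffer is left out to keep the example small.

```typescript
interface IntakeUploadFile { fieldname: string; filename: string; mimeType: string } // buffer omitted
interface IntakeSubmission {
  values: Record<string, unknown>
  context: Record<string, unknown>
  files: IntakeUploadFile[]
}

interface PlannedWrites {
  attachmentPaths: string[]
  entry: {
    case_id: string
    source: 'intake'
    data_source_id: string
    confirmed: false
    values: Record<string, unknown>
    context: Record<string, unknown>
  }
  activity: { type: 'intake'; message: string; meta: { dataSourceId: string; attachments: string[] } }
}

function planIntakeWrites(
  caseId: string,
  slug: string,
  submission: IntakeSubmission,
  newId: () => string, // e.g. a UUID generator, injected for testability
): PlannedWrites {
  // One stored file per upload, under the case's attachment folder.
  const attachmentPaths = submission.files.map(
    (file) => `storage/attachments/${caseId}/${newId()}-${file.filename}`,
  )
  return {
    attachmentPaths,
    entry: {
      case_id: caseId,
      source: 'intake',
      data_source_id: slug,
      confirmed: false, // stays out of the binding engine until reviewed
      values: submission.values,
      context: submission.context,
    },
    activity: {
      type: 'intake',
      message: `Intake submitted via ${slug}`,
      meta: { dataSourceId: slug, attachments: attachmentPaths },
    },
  }
}
```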

AI chat in the portal

The portal's chat intake mode lives at packages/portal/src/intake/modes/chat/ (engine, UI, types). Recent commits (8c20180, "feat(portal): richer chat header + AI avatars, align seed with branding", and 9d2ddd6, "client uses server") indicate this surface is under active work. From an architectural standpoint it's just another intake mode: the chat engine collects values and passes them to the same submitIntake call that the grid and wizard modes use. It produces the same Entry as every other path.

Path C — Document upload (credit report, pay stub, etc.)

Credit report (live)

The credit-report DataSource parses MISMO 2.3.1 XML (tri-bureau merge) and emits one Entry with roughly 80 values. The mapping is documented in domains/bankruptcy/data-sources/credit-report-mapping.md — every CREDIT_LIABILITY becomes a creditor, classified into creditor.secured[] or creditor.unsecured[] based on _AccountType and collateral.

Source tag: credit-report
Invoker: the user who uploaded
Confirmed: false — the reviewer must confirm before values flow to Schedule D / Schedule E/F
Volume: ~80 schema keys per submission, mostly repeating creditor rows

What the credit report provides: creditor identity, balance, account number, account type, date opened, collateral description. What it does not provide (from credit-report-mapping.md): secured/unsecured/priority legal classification, contingent/unliquidated/disputed flags, collateral value, priority type, who-owes (debtor1/debtor2/joint) beyond Individual/Joint/AuthorizedUser. Those require attorney judgment and arrive as follow-on manual entries.
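
A provisional secured/unsecured bucketing could be sketched as follows. The rule used here (a Mortgage account type or any reported collateral means secured) is an assumption for illustration only; the actual rules are in credit-report-mapping.md, and the legal classification still belongs to the attorney. The CreditLiability shape is a simplified stand-in for the parsed MISMO element.

```typescript
// Simplified, hypothetical view of one parsed CREDIT_LIABILITY.
interface CreditLiability {
  creditorName: string
  accountType: string   // value of _AccountType, e.g. 'Mortgage' or 'Revolving'
  balance: number
  collateral?: string   // collateral description, when the report carries one
}

// Assumed rule, not the documented mapping: mortgage or any collateral => secured.
function isProvisionallySecured(liability: CreditLiability): boolean {
  const hasCollateral = (liability.collateral ?? '').trim() !== ''
  return liability.accountType === 'Mortgage' || hasCollateral
}

// Splits liabilities into the two repeating groups named in the mapping.
function bucketLiabilities(
  liabilities: CreditLiability[],
): Record<'creditor.secured' | 'creditor.unsecured', CreditLiability[]> {
  return {
    'creditor.secured': liabilities.filter(isProvisionallySecured),
    'creditor.unsecured': liabilities.filter((l) => !isProvisionallySecured(l)),
  }
}
```

Because the resulting Entry is written with confirmed=false, a wrong bucket from a heuristic like this is caught at review rather than on a filed schedule.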

Generic doc drop (scaffolded)

Pay stubs, bank statements, 1099s, leases. Current state:

The attachment is uploaded through POST /api/cases/:id/attachments (multipart) and lands in the attachments table.
An extraction step is planned but not implemented end-to-end. The LLM-based mapper that reads a pay stub and proposes an Entry with debtor1.income.wages, debtor1.employer, etc. exists as a design in engine.md (Example Flow D) and as a scaffolded path in our DataSource model, but the code path from attachment to reviewable Entry for arbitrary documents is not yet wired in production.
Design intent: the first time an attorney confirms an extraction, the field mapping is saved as a reusable template keyed on document shape. Subsequent uploads of the same document type auto-apply. This is also planned, not built.

The invariant holds regardless: whenever the extraction path completes, its output is a doc-upload Entry that gets reviewed like any other. The PDF never sees unconfirmed values.

Path D — API sync

One mapping spec lives in domains/bankruptcy/data-sources/, fully documented but not yet wired to a running external API in this codebase.

Clio — `clio-api-mapping.md`

Clio is a general practice-management platform, not a bankruptcy system. The mapping splits cleanly:

CRM layer → app tables (fully specified): Matter → cases, Contact → contacts, Relationship → case_contacts, Task → tasks, CalendarEntry → events, Note → notes, Bill / Activity → billing_entries, Document → attachments.
Bankruptcy schema → Clio custom fields (documented, firm-specific): Clio stores chapter/SSN/trustee etc. in per-firm custom fields. The per-firm mapping has to be configured by the user.

Clio does not carry creditor schedules, property, income/expense detail, means-test data, SOFA data, or plan data. It provides the case management shell; everything form-specific comes from another source (credit report, direct intake).

Status: documented mapping, not yet wired. There is no running Clio OAuth + sync loop in packages/server. The doc is the spec; hooking it up is a future task.

The mechanism

When wired, this uses the existing DataSource mechanism:

External payload → DataSource.config (field map) → schema keys → Entry (source='clio')

Same database write. Same review flow (or auto-confirmed per config). Same binding engine on the other side. Adding a new API source is a parser plus a DataSource JSON — the Entry shape doesn't change.
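
The field-map step in the pipeline above amounts to a lookup from external paths to schema keys. The sketch below assumes a flat map of dotted external paths to schema keys; the FieldMap shape, the payload field names, and the schema keys in the usage note are invented for illustration and are not taken from clio-api-mapping.md or the Clio API.

```typescript
// Hypothetical field map: external dotted path -> schema key.
type FieldMap = Record<string, string>

// Reads a dotted path out of an arbitrary payload, returning undefined on any miss.
function getPath(payload: unknown, path: string): unknown {
  let current: unknown = payload
  for (const part of path.split('.')) {
    if (typeof current !== 'object' || current === null) return undefined
    current = (current as Record<string, unknown>)[part]
  }
  return current
}

// Produces the `values` object for an Entry; unmapped or missing fields are skipped.
function mapPayload(payload: unknown, fieldMap: FieldMap): Record<string, unknown> {
  const values: Record<string, unknown> = {}
  for (const externalPath of Object.keys(fieldMap)) {
    const schemaKey = fieldMap[externalPath]
    const value = getPath(payload, externalPath)
    if (schemaKey !== undefined && value !== undefined) values[schemaKey] = value
  }
  return values
}
```

For example, a map such as { 'custom.chapter': 'case.chapter' } applied to { custom: { chapter: '7' } } would yield { 'case.chapter': '7' }, which then becomes the values of an Entry with source='clio'.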

The merge model

Current value of any schema key on a case = the value from the latest confirmed Entry that touched that key. loadCaseValues(caseId) in services/intake.ts (lines 244–257) is the canonical implementation:

SELECT "values" FROM entries
WHERE case_id = ? AND confirmed = true
ORDER BY timestamp ASC

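…then Object.assign across all rows. Latest-wins per key, because later objects overwrite earlier ones. The merge is shallow: a later Entry replaces the whole value stored under a top-level key of values rather than merging into it.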
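
The same merge, written against an in-memory list instead of the database, makes the latest-wins behavior easy to check. mergeCaseValues and EntryRow are illustrative names; the canonical implementation remains loadCaseValues in services/intake.ts.

```typescript
// Minimal row shape needed for the merge (assumed field types).
interface EntryRow {
  timestamp: string                 // ISO 8601 string
  confirmed: boolean
  values: Record<string, unknown>
}

function mergeCaseValues(entries: EntryRow[]): Record<string, unknown> {
  const confirmedOldestFirst = entries
    .filter((e) => e.confirmed)                                         // WHERE confirmed = true
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp))  // ORDER BY timestamp ASC
  // Later objects overwrite earlier ones, key by key (shallow merge).
  return Object.assign({}, ...confirmedOldestFirst.map((e) => e.values))
}

const merged = mergeCaseValues([
  { timestamp: '2024-01-03T00:00:00Z', confirmed: false, values: { 'creditor.balance': 9999 } },
  { timestamp: '2024-01-02T00:00:00Z', confirmed: true, values: { 'creditor.name': 'Capital One' } },
  { timestamp: '2024-01-01T00:00:00Z', confirmed: true, values: { 'creditor.name': 'CAP ONE', 'creditor.balance': 1200 } },
])
// merged is { 'creditor.name': 'Capital One', 'creditor.balance': 1200 }
```

The unconfirmed row is ignored entirely, and the later manual correction of the name wins over the earlier import without touching the balance.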

Consequences:

Corrections are writes. A lawyer who fixes a creditor name from a credit-report Entry just adds a new manual Entry with that one key. The credit-report Entry stays in the timeline unchanged.
Append-only. The values of an Entry are never mutated in place; the only state change on an existing Entry is confirmed: false → true.
Reviewable before filing. Unconfirmed Entries are visible in the Data tab and in the Activity feed, but their values don't reach the binding engine. A reviewer has to promote them.
Source-tracked. entries.source, entries.data_source_id, and entries.user_id are always set. Every value on every form is traceable to the Entry that produced it.
Invoker on join. The server populates Entry.invoker = { id, name } on read for UI purposes — it is not a wire-level column on entries.
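
Source tracking falls out of the same walk as the merge: keep a pointer to the Entry that last wrote each key instead of only the value. traceCaseValues, the Provenance shape, and the id field on the row are assumptions for this sketch.

```typescript
// Hypothetical row shape; `id` is assumed to exist on entries.
interface TracedEntry {
  id: string
  source: string
  user_id: string
  timestamp: string
  confirmed: boolean
  values: Record<string, unknown>
}

interface Provenance {
  value: unknown
  entryId: string
  source: string
  userId: string
}

// For every key, report the current value and the confirmed Entry it came from.
function traceCaseValues(entries: TracedEntry[]): Record<string, Provenance> {
  const confirmedOldestFirst = entries
    .filter((e) => e.confirmed)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp))
  const traced: Record<string, Provenance> = {}
  for (const entry of confirmedOldestFirst) {
    for (const key of Object.keys(entry.values)) {
      // Later entries overwrite earlier provenance, mirroring latest-wins.
      traced[key] = {
        value: entry.values[key],
        entryId: entry.id,
        source: entry.source,
        userId: entry.user_id,
      }
    }
  }
  return traced
}
```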

Why this is powerful

One schema, N sources, one artifact. Manual, portal, widget, document, API — all five produce INSERT INTO entries (...) with the same shape. Reviewable, auditable, append-only, invoker-tracked by construction. There is no second code path for "external data" versus "typed data."
Adding a new intake source is data, not code. A new intake JSON under domains/<vertical>/data-sources/ with a slug and published: true is live on /intake/<slug> immediately. A new API sync needs a small parser plus a DataSource config; the Entry write is the same line.
The intake JSONs are the IP. What to ask, in what order, conditioned on what, in what tone — the lawyer's domain expertise — lives in IntakeConfig.sections[].questions[]. This is the same place the four Atlas / DebtStoppers / Greenfield / self-file configs live. Everything below that point — the portal UI, the rate limiter, the attachment writer, the Entry insert, the binding engine — is generic.
Cross-vertical by default. The same portal, widget, invite, and Entry pipeline runs for any domain whose schema + forms are loaded. Immigration intakes, family-law intakes, and bankruptcy intakes share one intake subsystem. Swap the schema and the intake configs; nothing else changes.

Key files

Types: packages/core/src/api/types.ts
Server routes: packages/server/src/routes/intake.ts, packages/server/src/routes/portal.ts
Server services: packages/server/src/services/intake.ts, packages/server/src/services/portal.ts, packages/server/src/services/rate-limit.ts
Portal app: packages/portal/src/intake/, packages/portal/src/widget/, packages/portal/src/pages/, packages/portal/src/shared/
Intake configs: domains/bankruptcy/data-sources/intake-atlas.json, intake-debtstoppers.json, intake-greenfield.json, intake-individual-self.json
DataSource mappings: domains/bankruptcy/data-sources/credit-report-mapping.md, clio-api-mapping.md
Companion doc: docs/engine.md (Any Source, Same Result)

Source: docs/intake.md