Models
DocAI Fabric uses AI models for every intelligent step of the pipeline: OCR, splitting, classification, extraction, and the in-app assistant. The Models page is where a tenant administrator reviews the available models, brings their own, and chooses which model serves each task.
It has three tabs:
| Tab | Purpose |
|---|---|
| Models | Every model and OCR engine available to the tenant. |
| Defaults | Which model is used by default for each task. |
| Usage | Estimated cost tracking and budgets (see Cost & Budgets). |
The Models page is available to tenant administrators (anyone with the tenant.manage permission).
System vs. tenant models
The Models tab lists two kinds of models, each with a badge:
- System models (system): the built-in OCR engines and default LLMs, provided by the platform and shared across tenants. They are read-only for tenants: you can use them and test their endpoints, but only a platform administrator can change them or their cost rates.
- Tenant models (tenant): your own bring-your-own (BYO) LLM endpoints, which you add, configure, and remove. DocAI Fabric load-balances across each model's endpoints using the same routing pool that serves system models.
Each row shows the model's tasks, modalities, context window, endpoint health, and its estimated per-unit cost (see Cost & Budgets).
Bring your own model
To register a custom model, open the Models tab and click Add model. A model is a logical entry that can be served by one or more endpoints (see below).
Model settings
| Field | Meaning |
|---|---|
| Model id (slug) | Lowercase logical identifier (letters, digits, -, _) shared by all of the model's endpoints. It's how you pick the model for per-task defaults and how it appears in usage logs. Immutable after creation. |
| Display name | Human-friendly name shown in the UI. |
| Context window (tokens) | The model's maximum context size; used when fitting prompts. |
| Supported tasks | Which pipeline stages this model may serve: split, classify, extract, assistant, and ocr (only if the model does OCR natively). |
| Modalities | Text, Image (needed for image-based extraction), and Streaming (for the assistant). |
| Usage counted as | Which counters this model contributes to on the Usage tab: pages, input tokens, output tokens. LLMs typically count both input and output tokens; an OCR-style model may count only pages. |
| Estimated cost | Optional per-unit rates used for cost tracking and budgets (see Cost & Budgets). Leave blank to count as 0. |
| Notes | Free-text notes for your own reference. |
At least one endpoint is required. After saving, you can make the model a per-task default on the Defaults tab.
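The rules in the table above can be checked before a model is submitted. The following is a minimal sketch, not the product's own validation: the `ModelSettings` class, its field names, and the exact regular expression are assumptions made for illustration, derived only from the constraints stated above (allowed slug characters, the five task names, and the one-endpoint minimum).

```python
import re
from dataclasses import dataclass, field

# Assumed rule: one or more lowercase letters, digits, "-" or "_".
MODEL_ID_PATTERN = re.compile(r"[a-z0-9_-]+")
ALLOWED_TASKS = {"split", "classify", "extract", "assistant", "ocr"}


@dataclass
class ModelSettings:
    """Hypothetical in-memory shape of a model entry (not the product schema)."""
    model_id: str
    display_name: str
    context_window: int
    tasks: set[str]
    endpoints: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return validation problems; an empty list means the entry looks valid."""
        found = []
        if not MODEL_ID_PATTERN.fullmatch(self.model_id):
            found.append("model id must use only a-z, 0-9, '-' and '_'")
        unknown = self.tasks - ALLOWED_TASKS
        if unknown:
            found.append(f"unknown tasks: {sorted(unknown)}")
        if not self.endpoints:
            found.append("at least one endpoint is required")
        return found
```

For example, `ModelSettings("acme-gpt_4o", "Acme GPT", 128000, {"extract"}, ["primary"]).problems()` returns an empty list, while an id such as `Acme GPT` is rejected because of the uppercase letters and the space.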
Endpoints, and why add more than one
An endpoint is a single network destination that serves the model, such as a deployment, a gateway, or a self-hosted server. A model can have several endpoints, and DocAI Fabric load-balances and fails over across them automatically. Reasons to add more than one:
- Throughput & rate limits: spreading load across several deployments avoids hitting one endpoint's quota (HTTP 429) under heavy batches.
- Resilience: if an endpoint fails or its circuit breaker trips (after repeated errors), the others keep serving with no manual intervention.
- Cost control: keep cheaper endpoints in the Normal tier and pricier ones in Overflow / Emergency, so the expensive capacity is only used when needed.
- Regional compliance: pin specific endpoints to regions so a project that restricts processing to a region only uses eligible endpoints.
Endpoint settings
| Field | Meaning |
|---|---|
| Name | A label for the endpoint. |
| Protocol | OpenAI-compatible, Azure OpenAI, or Azure AI Inference. |
| Endpoint URL / API path | The base URL. OpenAI-compatible endpoints use an API path (default /v1/chat/completions); Azure endpoints use a deployment name and API version instead. |
| Authentication | Bearer token, API-key header, or custom headers (e.g. Cloudflare Access client id/secret, custom gateways). Secrets are stored encrypted; on edit, leave a secret blank to keep the existing value. |
| Region | global, or a specific region for compliance routing. |
| Tier | Routing tier (see below). |
| Priority (1 = highest) | Tiebreaker within a tier. |
| Enabled | Whether the endpoint is eligible for routing. |
On a saved model, each endpoint has a Test button that sends a probe request and reports latency and status; use it to verify credentials and connectivity.
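To make the URL and authentication fields concrete, here is a rough sketch of how an OpenAI-compatible endpoint's settings might translate into a request URL and headers. It is an illustration under assumptions, not a description of what the Test button sends: the `build_request` function, the `auth` dictionary shape, and the `kind` values are hypothetical, and Azure protocols are left out.

```python
def build_request(base_url, api_path="/v1/chat/completions", auth=None):
    """Combine endpoint settings into (url, headers) for an OpenAI-compatible call.

    `auth` is a hypothetical dict whose "kind" is one of:
    "bearer", "api_key_header", or "custom_headers".
    """
    # Join base URL and API path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + api_path.lstrip("/")
    headers = {"Content-Type": "application/json"}

    auth = auth or {}
    kind = auth.get("kind")
    if kind == "bearer":
        headers["Authorization"] = f"Bearer {auth['token']}"
    elif kind == "api_key_header":
        # The header name is whatever the gateway expects.
        headers[auth["header"]] = auth["key"]
    elif kind == "custom_headers":
        # For example, a client id/secret pair required by an access gateway.
        headers.update(auth["headers"])
    return url, headers
```

Calling `build_request("https://llm.example.com/", auth={"kind": "bearer", "token": "..."})` yields the URL `https://llm.example.com/v1/chat/completions` and an `Authorization` header; the host name is a placeholder.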
How routing works
When a model is needed, DocAI Fabric chooses an endpoint like this:
- Tier order: endpoints are tried by tier:
  - Normal: the default; always tried first.
  - Overflow: used only when every Normal endpoint is over its quota cap or has an open circuit breaker.
  - Emergency: last resort (typically the most expensive); used only when Normal and Overflow are exhausted.
- Priority within a tier: a lower priority number is preferred. Endpoints with equal priority are round-robined. Priority never crosses tiers: a Normal endpoint with priority 10 is still tried before any Overflow endpoint.
- Region: if a project restricts processing to a region, only endpoints in that region are eligible.
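The selection rules above can be sketched in a few lines. This is a simplified model for illustration, not DocAI Fabric's actual router: the `Endpoint` and `Router` classes, the `over_quota` and `circuit_open` flags, and the strict region match (a `global` endpoint is treated as ineligible for a region-restricted project) are all assumptions.

```python
from dataclasses import dataclass
from itertools import count

# Lower number = tried earlier.
TIER_ORDER = {"normal": 0, "overflow": 1, "emergency": 2}


@dataclass
class Endpoint:
    name: str
    tier: str = "normal"
    priority: int = 1          # 1 = highest
    region: str = "global"
    enabled: bool = True
    over_quota: bool = False   # assumed health flag
    circuit_open: bool = False # assumed health flag


class Router:
    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._turn = count()   # shared counter used for round-robin

    def pick(self, required_region=None):
        """Return the next endpoint to try, or None if nothing is eligible."""
        eligible = [
            e for e in self.endpoints
            if e.enabled and not e.over_quota and not e.circuit_open
            and (required_region is None or e.region == required_region)
        ]
        if not eligible:
            return None
        # Tier is compared first, so priority never crosses tiers.
        best = min((TIER_ORDER[e.tier], e.priority) for e in eligible)
        group = [e for e in eligible if (TIER_ORDER[e.tier], e.priority) == best]
        # Endpoints tied on tier and priority take turns.
        return group[next(self._turn) % len(group)]
```

Because the sort key is the pair (tier, priority), a Normal endpoint with priority 10 still wins over an Overflow endpoint with priority 1, and Overflow is reached only once every Normal endpoint has been filtered out.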
Defaults tab
The Defaults tab sets which model serves each task (split, classify, extract, assistant) by default for the tenant. Each task can either:
- use the system default for that task, or
- use one of your tenant models that supports the task.
Individual projects can still override these per-task choices in their own workflow configuration; the Defaults tab sets the tenant-wide fallback.
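The precedence described here (project override, then tenant default, then system default) amounts to a simple lookup chain. The sketch below assumes each level is a plain mapping from task name to model id; the function name, the dictionaries, and the model ids in the usage note are hypothetical.

```python
def resolve_model(task, project_overrides, tenant_defaults, system_defaults):
    """Pick the model id for a task, checking the most specific level first."""
    if task in project_overrides:
        return project_overrides[task]   # per-project workflow configuration
    if task in tenant_defaults:
        return tenant_defaults[task]     # tenant-wide choice from the Defaults tab
    return system_defaults[task]         # platform-provided default
```

With `tenant_defaults = {"extract": "acme-llm"}` and no project override, `extract` resolves to `acme-llm`, while a task the tenant has not customized falls through to the system default.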
Estimated cost
Every model, system or tenant, can carry an estimated cost rate (LLMs: per 1M input/output tokens; OCR engines: per page). Tenant-model rates are set in the model's Edit dialog; system-model rates are set by the platform administrator. These rates drive the cost figures and budgets on the Usage tab. See Cost & Budgets for the full picture.
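As a worked illustration of how such rates turn into an estimate, the sketch below multiplies usage counters by per-unit rates. The function and parameter names are assumptions, and the rates in the usage note are made-up values, not real prices; a rate left as `None` counts as 0, mirroring the blank-rate behavior described in the model settings.

```python
def estimate_cost(input_tokens=0, output_tokens=0, pages=0,
                  input_rate_per_1m=None, output_rate_per_1m=None,
                  page_rate=None):
    """Estimate cost from usage counters and per-unit rates (blank rate = 0)."""
    def rate(value):
        return 0.0 if value is None else value

    return (
        input_tokens / 1_000_000 * rate(input_rate_per_1m)      # LLM input tokens
        + output_tokens / 1_000_000 * rate(output_rate_per_1m)  # LLM output tokens
        + pages * rate(page_rate)                               # OCR pages
    )
```

For example, 250,000 input tokens at an illustrative rate of 2.00 per 1M plus 50,000 output tokens at 8.00 per 1M comes to about 0.90 (0.50 + 0.40), and 40 OCR pages at 0.01 per page comes to about 0.40.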