Models

DocAI Fabric uses AI models for every intelligent step of the pipeline: OCR, splitting, classification, extraction, and the in-app assistant. The Models page is where a tenant administrator reviews the available models, brings their own, and chooses which model serves each task.

It has three tabs:

Tab	Purpose
Models	Every model and OCR engine available to the tenant.
Defaults	Which model is used by default for each task.
Usage	Estimated cost tracking and budgets (see Cost & Budgets).

The Models page is available to tenant administrators (anyone with the `tenant.manage` permission).

System vs. tenant models

The Models tab lists two kinds of models, each with a badge:

System models (system): provided by the platform and shared across tenants. These are the built-in OCR engines and the default LLMs. They are read-only for tenants: you can use them and test their endpoints, but only a platform administrator can change them or their cost rates.
Tenant models (tenant): your own bring-your-own (BYO) LLM endpoints, which you add, configure, and remove. DocAI Fabric load-balances across each model's endpoints using the same routing pool that serves system models.

Each row shows the model's tasks, modalities, context window, endpoint health, and estimated per-unit cost (see Cost & Budgets).

Bring your own model

To register a custom model, open the Models tab and click Add model. A model is a logical entry that can be served by one or more endpoints (see Endpoints, and why add more than one).

Model settings

Field	Meaning
Model id (slug)	Lowercase logical identifier (letters, digits, `-`, `_`) shared by all of the model's endpoints. It's how you pick the model for per-task defaults and how it appears in usage logs. Immutable after creation.
Display name	Human-friendly name shown in the UI.
Context window (tokens)	The model's maximum context size; used when fitting prompts.
Supported tasks	Which pipeline stages this model may serve: split, classify, extract, assistant, and ocr (only if the model does OCR natively).
Modalities	Text, Image (needed for image-based extraction), and Streaming (for the assistant).
Usage counted as	Which counters this model contributes to on the Usage tab: pages, input tokens, output tokens. LLMs typically count both input and output tokens; an OCR-style model may count only pages.
Estimated cost	Optional per-unit rates used for cost tracking and budgets (see Cost & Budgets). A rate left blank is counted as 0.
Notes	Free-text notes for your own reference.

At least one endpoint is required. After saving, you can make the model a per-task default on the Defaults tab.
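
As a rough illustration of the two hard rules above (the slug character set and the one-endpoint minimum), here is a minimal Python sketch. The function names and the exact pattern are assumptions made for illustration; any length limit or leading-character rule DocAI Fabric may enforce is not documented here.

```python
import re

# Assumed pattern: one or more lowercase letters, digits, hyphens, or underscores.
# Length limits and leading-character rules are not documented, so none are checked.
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_model_id(slug: str) -> bool:
    """Return True if the slug uses only the documented characters."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def validate_model(slug: str, endpoints: list) -> list:
    """Collect human-readable problems with a draft model (hypothetical helper)."""
    problems = []
    if not is_valid_model_id(slug):
        problems.append("model id must use only lowercase letters, digits, '-' and '_'")
    if not endpoints:
        problems.append("at least one endpoint is required")
    return problems
```

For example, `validate_model("My Model", [])` reports both problems, while `validate_model("my-llm_eu", ["primary"])` reports none.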

Endpoints, and why add more than one

An endpoint is a single network destination that serves the model, such as a deployment, a gateway, or a self-hosted server. A model can have several, and DocAI Fabric load-balances and fails over across them automatically. Reasons to add more than one:

Throughput & rate limits: spreading load across several deployments avoids hitting one endpoint's quota (HTTP 429) under heavy batches.
Resilience: if an endpoint fails or its circuit breaker trips (after repeated errors), the others keep serving with no manual intervention.
Cost control: keep cheaper endpoints in the Normal tier and pricier ones in Overflow / Emergency, so the expensive capacity is only used when needed.
Regional compliance: pin specific endpoints to regions so a project that restricts processing to a region only uses eligible endpoints.
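
The circuit breaker mentioned under Resilience can be pictured with a minimal sketch like the one below. It assumes the breaker opens after a fixed number of consecutive failures; the threshold value, the class name, and the reset behavior are illustrative assumptions, since the actual thresholds and recovery logic are not documented here.

```python
class CircuitBreaker:
    """Opens after N consecutive failures (N is an assumed, illustrative value)."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        # While open, the router would skip this endpoint and use the others.
        return self.consecutive_failures >= self.failure_threshold

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def record_success(self) -> None:
        # Simplification: any success resets the counter and closes the breaker.
        self.consecutive_failures = 0
```

A production breaker would normally also wait for a cool-down period before retrying an open endpoint; that part is left out of this sketch.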

Endpoint settings

Field	Meaning
Name	A label for the endpoint.
Protocol	OpenAI-compatible, Azure OpenAI, or Azure AI Inference.
Endpoint URL / API path	The base URL. OpenAI-compatible endpoints use an API path (default `/v1/chat/completions`); Azure endpoints use a deployment name and API version instead.
Authentication	Bearer token, API-key header, or custom headers (e.g. Cloudflare Access client id/secret, custom gateways). Secrets are stored encrypted; on edit, leave a secret blank to keep the existing value.
Region	`global`, or a specific region for compliance routing.
Tier	Routing tier: Normal, Overflow, or Emergency (see How routing works).
Priority (1 = highest)	Tiebreaker within a tier.
Enabled	Whether the endpoint is eligible for routing.

On a saved model, each endpoint has a Test button that sends a probe request and reports latency and status; use it to verify credentials and connectivity.
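
To make the Endpoint URL / API path row more concrete, the sketch below shows how those fields might combine into a chat-completions request URL. The protocol identifiers and the function name are hypothetical. The Azure OpenAI path follows the shape of Azure's public REST API; how DocAI Fabric builds its requests internally is not documented here, and Azure AI Inference is omitted.

```python
def chat_completions_url(protocol, base_url, api_path="/v1/chat/completions",
                         deployment=None, api_version=None):
    """Build a request URL from endpoint settings (illustrative only)."""
    base = base_url.rstrip("/")
    if protocol == "openai-compatible":
        # Base URL plus the configurable API path.
        return base + "/" + api_path.lstrip("/")
    if protocol == "azure-openai":
        # Azure uses a deployment name and an API version instead of an API path.
        if not deployment or not api_version:
            raise ValueError("Azure OpenAI needs a deployment name and an API version")
        return (f"{base}/openai/deployments/{deployment}"
                f"/chat/completions?api-version={api_version}")
    raise ValueError(f"unsupported protocol: {protocol}")
```

Calling it with `"openai-compatible"` and `"https://llm.example.com/"` yields `https://llm.example.com/v1/chat/completions`.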

How routing works

When a model is needed, DocAI Fabric chooses an endpoint like this:

Tier order: endpoints are tried one tier at a time, in the following order.
- Normal: the default; always tried first.
- Overflow: used only when every Normal endpoint is over its quota cap or has an open circuit breaker.
- Emergency: last resort (typically the most expensive); used only when Normal and Overflow are exhausted.
Priority within a tier: a lower priority number is preferred. Endpoints with equal priority are round-robined. Priority never crosses tiers: a Normal endpoint with priority 10 is still tried before any Overflow endpoint.
Region: if a project restricts processing to a region, only endpoints in that region are eligible.
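
The rules above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the platform's actual implementation: the `Endpoint` class, its field names, and `choose_endpoint` are hypothetical, and the region filter assumes an exact match (whether `global` endpoints are eligible for a region-restricted project is not specified here).

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

TIER_ORDER = ["normal", "overflow", "emergency"]


@dataclass
class Endpoint:
    name: str
    tier: str = "normal"
    priority: int = 1          # 1 = highest
    region: str = "global"
    enabled: bool = True
    over_quota: bool = False
    breaker_open: bool = False

    def available(self) -> bool:
        return self.enabled and not self.over_quota and not self.breaker_open


_round_robin = count()  # shared counter used to rotate among equal-priority endpoints


def choose_endpoint(endpoints: List[Endpoint],
                    required_region: Optional[str] = None) -> Optional[Endpoint]:
    # Region: keep only endpoints in the required region (assumed exact match).
    eligible = [e for e in endpoints
                if e.available()
                and (required_region is None or e.region == required_region)]
    # Tier order: a later tier is used only if the earlier ones have nothing available.
    for tier in TIER_ORDER:
        in_tier = [e for e in eligible if e.tier == tier]
        if not in_tier:
            continue
        # Priority within a tier: the lowest number wins; ties are round-robined.
        best = min(e.priority for e in in_tier)
        candidates = [e for e in in_tier if e.priority == best]
        return candidates[next(_round_robin) % len(candidates)]
    return None  # nothing eligible in any tier
```

With two Normal endpoints at priority 1, consecutive calls alternate between them. If both become unavailable, a Normal endpoint at priority 10 is still chosen ahead of any Overflow endpoint.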

Defaults tab

The Defaults tab sets which model serves each task (split, classify, extract, assistant) by default for the tenant. Each task can either:

use the system default for that task, or
use one of your tenant models that supports the task.

Individual projects can still override these per-task choices in their own workflow configuration; the Defaults tab sets the tenant-wide fallback.
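
The resulting precedence (project override, then tenant default, then system default) can be illustrated with a short sketch. The function and the dictionary shapes are hypothetical and only model the order described above.

```python
def resolve_model(task, project_overrides, tenant_defaults, system_defaults):
    """Return the model id serving `task`, checking the most specific setting first."""
    for source in (project_overrides, tenant_defaults):
        model = source.get(task)
        if model:
            return model
    # No project override or tenant default: fall back to the system default.
    return system_defaults[task]
```

For example, if a tenant sets `extract` to a BYO model and a project overrides nothing, `extract` uses the tenant model while `split` still uses the system default.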

Estimated cost

Every model, system or tenant, can carry an estimated cost rate (LLMs: per 1M input/output tokens; OCR engines: per page). Tenant-model rates are set in the model's Edit dialog; system-model rates are set by the platform administrator. These rates drive the cost figures and budgets on the Usage tab. See Cost & Budgets for the full picture.
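
As a worked example of the arithmetic these rates imply, the sketch below multiplies usage counters by their rates, treating a blank rate as 0. The function name, parameter names, and the sample rates are made up for illustration; how the Usage tab actually aggregates and rounds figures is covered in Cost & Budgets.

```python
def estimate_cost(input_tokens=0, output_tokens=0, pages=0,
                  input_rate_per_1m=None, output_rate_per_1m=None,
                  rate_per_page=None):
    """Estimated cost from usage counters; a blank (None) rate counts as 0."""
    return (
        input_tokens / 1_000_000 * (input_rate_per_1m or 0.0)
        + output_tokens / 1_000_000 * (output_rate_per_1m or 0.0)
        + pages * (rate_per_page or 0.0)
    )


# 2M input tokens at 2.50 per 1M plus 500k output tokens at 10.00 per 1M.
llm_cost = estimate_cost(input_tokens=2_000_000, output_tokens=500_000,
                         input_rate_per_1m=2.50, output_rate_per_1m=10.00)
```

Here `llm_cost` works out to 5.00 for input plus 5.00 for output, so 10.00 in total. A model with no rates configured always contributes 0.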
