Welcome to DocAI Fabric

DocAI Fabric is an enterprise-grade framework for applying Generative AI to document processing, delivering reliable and consistent results.

It automatically splits, classifies, extracts, validates, and analyzes data from complex business documents without templates or model training. It applies company knowledge to reason over each business case, helping automate and support decision-making, with clear and explainable results.

The platform provides a strong foundation for using Generative AI in document processing with high accuracy, consistency, and control. It supports models from OpenAI, Anthropic, Google, and open-source options. You can bring your own models or use a fully managed setup, allowing you to start quickly and adapt your AI strategy over time.

DocAI Fabric works from day one without complex setup and improves as it is used. You can integrate it into your existing processes to get value immediately, then later redesign those processes around DocAI Fabric to take full advantage of Generative AI with full business context.

With its single-container design, DocAI Fabric fits enterprise needs for on-premises and private cloud deployments, and is also available as a scalable SaaS solution.

What Does It Do?

Capability	Description
OCR	Convert scanned pages and images into machine-readable text
Document Splitting	Split scanned pages or uploaded files into individual documents
Document Classification	Automatically categorize uploaded documents by type
Data Extraction	Extract structured fields from unstructured documents
Business Rules	Apply validation and transformation rules to extracted data
Matching & Enrichment	Match documents to company records such as vendors, purchase orders, and GL accounts
Human-in-the-Loop Review	Route uncertain cases to reviewers and improve future results from human feedback
Business Case Reasoning	Reason across customer policies, regulations, documentation, and previously processed cases
Document-Centric Workflow Automation	Automate end-to-end document workflows around each business case
Agent-Discoverable Tooling	Expose reliable document processing as MCP-discoverable tools with escalation to users when needed

Why is it better than established legacy IDP?

No templates, labeling, or training: it works on any layout immediately. The accuracy of GenAI will surprise you.
Natural-language configuration instead of a months-long implementation.
Full explainability instead of black-box results.
A single container instead of complex multi-server infrastructure.

Why is it better than using LLMs directly?

OCR-first architecture eliminates hallucinations on scans.
Structured workflow (split → classify → extract → validate) instead of ad-hoc prompting.
Built-in evaluation catches regressions when models change.
Deterministic validation + human review catches every deviation before output.
Few-shot learning from corrections replaces manual prompt tuning.
No need to build and retain a 2-3 person document-AI team in-house.
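
To make "deterministic validation" concrete, here is a minimal sketch of a rule check on extracted fields. It is an illustration only and not DocAI Fabric's rule engine: the function `validate_invoice`, the field names, and the rule itself (net plus tax must equal total) are assumptions chosen for the example.

```python
from decimal import Decimal
from typing import Dict, List

# Hypothetical field names; a real project would define its own schema.
REQUIRED_FIELDS = ("invoice_number", "net", "tax", "total")


def validate_invoice(fields: Dict[str, str]) -> List[str]:
    """Return a list of rule violations; an empty list means the data passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    if errors:
        return errors
    # Decimal avoids binary floating-point rounding when comparing amounts.
    net, tax, total = (Decimal(fields[k]) for k in ("net", "tax", "total"))
    if net + tax != total:
        errors.append(f"total mismatch: {net} + {tax} != {total}")
    return errors


ok = validate_invoice({"invoice_number": "INV-1", "net": "100.00", "tax": "19.00", "total": "119.00"})
bad = validate_invoice({"invoice_number": "INV-2", "net": "100.00", "tax": "19.00", "total": "120.00"})
print(ok)
print(bad)
```

Because the check is plain code rather than a model call, the same input always produces the same verdict, and any non-empty result can be used to send the document to human review.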

Processing Pipeline

A typical end-to-end document process looks like this:

Upload → OCR → Split → Classification → Extraction → Normalization & Validation → HITL Review & Case-Level Reasoning → Learning → File Format Conversion & Data Redaction

In each project, the workflow is configurable: any step can be excluded, every step can be customized, and human review can be designed as one or multiple stages when required.
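
The idea of a per-project pipeline where steps can be excluded or customized can be sketched as follows. This is a toy model under stated assumptions, not the product's configuration format: the step names, `build_pipeline`, and `custom_extract` are all hypothetical, and each step only records that it ran.

```python
from typing import Callable, Dict, List, Set

Step = Callable[[dict], dict]

# Hypothetical step names loosely mirroring the pipeline described above.
DEFAULT_STEPS: List[str] = ["ocr", "split", "classify", "extract", "validate", "review", "export"]


def make_step(name: str) -> Step:
    """Return a placeholder step that appends its name to the trace."""
    def step(state: dict) -> dict:
        return {**state, "trace": state["trace"] + [name]}
    return step


def build_pipeline(excluded: Set[str], overrides: Dict[str, Step]) -> List[Step]:
    """Keep the default order, skip excluded steps, and swap in custom ones."""
    return [overrides.get(name, make_step(name)) for name in DEFAULT_STEPS if name not in excluded]


def run(pipeline: List[Step], state: dict) -> dict:
    for step in pipeline:
        state = step(state)
    return state


def custom_extract(state: dict) -> dict:
    """A customized replacement for the default extract step."""
    return {**state, "trace": state["trace"] + ["extract(custom)"]}


pipeline = build_pipeline(excluded={"split", "review"}, overrides={"extract": custom_extract})
result = run(pipeline, {"trace": []})
print(result["trace"])
```

Here a project that receives single-document uploads and needs no human review drops `split` and `review` and replaces `extract`, while the remaining steps keep their default order.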

Key Concepts

Tenants & Projects

Your data is organized hierarchically:

Tenant: Top-level organization (your company or team)
Project: A processing configuration for a specific document type or use case
Transaction: A batch of uploaded documents to process

Documents & Pages

Document: An individual document within a transaction (e.g., one invoice in a batch)
Page: A single page of a document, with its own OCR text and image
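
The hierarchy described above (tenant, project, transaction, document, page) can be pictured as nested records. The sketch below is only a mental model: the class and field names are assumptions for illustration and do not reflect the actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    number: int
    ocr_text: str = ""
    image_path: str = ""  # hypothetical field name


@dataclass
class Document:
    doc_type: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Transaction:
    transaction_id: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class Project:
    name: str
    transactions: List[Transaction] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    projects: List[Project] = field(default_factory=list)

    def page_count(self) -> int:
        """Count pages across every project, transaction, and document."""
        return sum(
            len(doc.pages)
            for project in self.projects
            for tx in project.transactions
            for doc in tx.documents
        )


tenant = Tenant(
    name="Acme",
    projects=[Project(name="AP invoices", transactions=[Transaction(
        transaction_id="tx-001",
        documents=[
            Document("invoice", [Page(1, "Invoice 1001"), Page(2, "Line items")]),
            Document("credit_note", [Page(1, "Credit note 55")]),
        ],
    )])],
)
print(tenant.page_count())
```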

Workflows

Workflows define the processing pipeline for a project. Each workflow contains activities connected into a configurable flow. Activities can run in sequence, branch conditionally, loop over documents, pause for human review, or export results.

Current workflow activities include:

Import: Convert uploaded PDFs, TIFFs, and images into normalized page images
OCR: Extract text from page images
Split: Split pages into individual documents
Classify: Determine document types
Extract: Pull structured fields from documents
Validate: Run business rules validation on extracted data
Script / Data Transform (coming soon): Invoke custom processing for external engines, data transformation, and enrichment
Review: Pause processing for human review and approval
Export: Export documents and results to downstream formats
If Condition / Switch: Route documents through different branches based on workflow logic

Some additional activity types, such as For Each Document, also exist in the workflow system and are being expanded.
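
As a rough illustration of how an If Condition activity might combine with a For Each Document loop, the sketch below routes each document either to review or to export. The confidence threshold, the dictionary keys, and the function names are hypothetical and are not taken from the workflow system itself.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 0.9  # hypothetical confidence cutoff


def route(document: dict) -> str:
    """If Condition: invalid or low-confidence documents go to review."""
    if not document["valid"] or document["confidence"] < REVIEW_THRESHOLD:
        return "review"
    return "export"


def run_workflow(documents: List[dict]) -> Dict[str, List[str]]:
    """For Each Document: evaluate the condition and group document ids by branch."""
    branches: Dict[str, List[str]] = {"review": [], "export": []}
    for doc in documents:
        branches[route(doc)].append(doc["id"])
    return branches


docs = [
    {"id": "doc-1", "confidence": 0.97, "valid": True},
    {"id": "doc-2", "confidence": 0.62, "valid": True},
    {"id": "doc-3", "confidence": 0.95, "valid": False},
]
branches = run_workflow(docs)
print(branches)
```

Only `doc-1` passes both checks and is exported directly; the other two would pause at a Review activity until a person approves or corrects them.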

Evaluation

Evaluation lets teams test draft configurations, measure extraction quality, and compare results before promoting changes to production.

Playground: Test draft configurations on documents and verify they perform better before publishing
Evaluation: A controlled test run used to measure extraction quality and validate configuration changes
Benchmarking: Compare results across prompts, models, workflow settings, or project versions
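
One simple way to picture an evaluation run is field-level accuracy against expected outputs, computed for both the published and the draft configuration. The metric, the function `field_accuracy`, and the sample data below are assumptions for illustration; the platform's own quality measures may differ.

```python
from typing import Dict, List

Fields = Dict[str, str]


def field_accuracy(predicted: List[Fields], expected: List[Fields]) -> float:
    """Share of expected fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for pred, gold in zip(predicted, expected):
        for name, value in gold.items():
            total += 1
            if pred.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


# Expected outputs for two test documents (hypothetical data).
expected = [
    {"invoice_number": "INV-1", "total": "119.00"},
    {"invoice_number": "INV-2", "total": "250.00"},
]
published_run = [
    {"invoice_number": "INV-1", "total": "119.00"},
    {"invoice_number": "INV-2", "total": "25.00"},  # one wrong field
]
draft_run = [
    {"invoice_number": "INV-1", "total": "119.00"},
    {"invoice_number": "INV-2", "total": "250.00"},
]

published_score = field_accuracy(published_run, expected)
draft_score = field_accuracy(draft_run, expected)
promote = draft_score > published_score
print(published_score, draft_score, promote)
```

Comparing the two scores on the same document set is what makes it possible to promote a draft only when it actually performs better.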

Memory

Memory is the platform's few-shot learning layer for storing correctly labeled examples that improve future classification and extraction quality.

Memory: A dataset of curated examples used to guide AI behavior on similar future documents
Few-Shot Learning: Improve accuracy by showing the model prior examples instead of retraining a custom model
Golden Set: Build a reusable library of high-quality documents and expected outputs for a use case
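
The few-shot idea can be sketched as: pick the stored examples most similar to the new document and place them in the prompt ahead of it. The source does not say how examples are selected, so the word-overlap (Jaccard) similarity used here is a stand-in, and `select_examples`, `build_prompt`, and the prompt wording are hypothetical.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (document text, correct label)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(memory: List[Example], text: str, k: int = 2) -> List[Example]:
    """Return the k stored examples most similar to the new document."""
    return sorted(memory, key=lambda ex: jaccard(ex[0], text), reverse=True)[:k]


def build_prompt(memory: List[Example], text: str, k: int = 2) -> str:
    """Assemble a few-shot prompt: instruction, selected examples, then the new document."""
    parts = ["Classify the document type."]
    for ex_text, label in select_examples(memory, text, k):
        parts.append(f"Document: {ex_text}\nType: {label}")
    parts.append(f"Document: {text}\nType:")
    return "\n\n".join(parts)


memory: List[Example] = [
    ("invoice number 1001 total due 250 EUR", "invoice"),
    ("purchase order 77 deliver 10 units", "purchase_order"),
    ("invoice number 1002 total due 90 EUR", "invoice"),
]
new_text = "invoice number 1003 total due 40 EUR"
selected = select_examples(memory, new_text)
prompt = build_prompt(memory, new_text)
print(prompt)
```

Adding a corrected example to `memory` changes what the model sees on the next similar document, without retraining any model.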

Next Steps

Deployment Overview: Choose how you want to run DocAI Fabric
Quick Start: Process your first document
Projects: Learn about project configuration
API Reference: Integrate with the REST API
