Welcome to DocAI Fabric
DocAI Fabric is an enterprise-grade framework for applying Generative AI to document processing, delivering reliable and consistent results.
It automatically splits, classifies, extracts, validates, and analyzes data from complex business documents without templates or model training. It applies company knowledge to reason over each business case, helping to automate and support decision-making with clear, explainable results.
The platform provides a strong foundation for using Generative AI in document processing with high accuracy, consistency, and control. It supports models from OpenAI, Anthropic, and Google, as well as open-source models. You can bring your own models or use a fully managed setup, so you can start quickly and adapt your AI strategy over time.
The system works and improves from day one without complex setup. You can integrate it into your existing processes to start getting value immediately, and later redesign those processes around DocAI Fabric to apply Generative AI with full context and get the maximum value.
With its single-container design, DocAI Fabric fits enterprise needs for on-premises and private cloud deployments, and is also available as a scalable SaaS solution.
What Does It Do?
| Capability | Description |
|---|---|
| OCR | Convert scanned pages and images into machine-readable text |
| Document Splitting | Split scanned pages or uploaded files into individual documents |
| Document Classification | Automatically categorize uploaded documents by type |
| Data Extraction | Extract structured fields from unstructured documents |
| Business Rules | Apply validation and transformation rules to extracted data |
| Matching & Enrichment | Match documents to company records such as vendors, purchase orders, and GL accounts |
| Human-in-the-Loop Review | Route uncertain cases to reviewers and improve future results from human feedback |
| Business Case Reasoning | Reason across customer policies, regulations, documentation, and previously processed cases |
| Document-Centric Workflow Automation | Automate end-to-end document workflows around each business case |
| Agent-Discoverable Tooling | Expose reliable document processing as MCP-discoverable tools with escalation to users when needed |
Why Is It Better Than Established Legacy IDP?
- No templates, labeling, or training: works on any layout immediately.
- High accuracy from Generative AI out of the box.
- Natural-language configuration instead of months-long implementations.
- Full explainability instead of black-box results.
- A single container instead of complex multi-server infrastructure.
Why Is It Better Than Using LLMs Directly?
- OCR-first architecture eliminates hallucinations on scanned documents.
- A structured workflow (split → classify → extract → validate) instead of ad-hoc prompting.
- Built-in evaluation catches regressions when models change.
- Deterministic validation and human review catch every deviation before output.
- Few-shot learning from corrections replaces manual prompt tuning.
- No need to build and retain a 2-3 person document-AI team in-house.
Processing Pipeline
A typical end-to-end document process looks like this:
Upload → OCR → Split → Classification → Extraction → Normalization & Validation → HITL Review & Case-Level Reasoning → Learning → File Format Conversion & Data Redaction
In each project, the workflow is configurable: any step can be excluded, every step can be customized, and human review can be designed as one or multiple stages when required.
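The following minimal Python sketch makes the configurable pipeline concrete by showing how a project could decide which steps to run. The stage names, the `ProjectConfig` class, and its `excluded` and `review_stages` fields are hypothetical illustrations, not DocAI Fabric's actual configuration format.

```python
from dataclasses import dataclass, field

# Hypothetical stage names mirroring the pipeline above; not the product's API.
PIPELINE = [
    "upload", "ocr", "split", "classify", "extract",
    "normalize_validate", "review", "learn", "convert_and_redact",
]


@dataclass
class ProjectConfig:
    # Steps listed here are skipped for this project (assumed config shape).
    excluded: set[str] = field(default_factory=set)
    # Number of human review stages; 0 means no review step at all.
    review_stages: int = 1


def plan_steps(config: ProjectConfig) -> list[str]:
    """Return the ordered list of steps a project would run."""
    steps: list[str] = []
    for step in PIPELINE:
        if step in config.excluded:
            continue
        if step == "review":
            # Expand review into one or more stages: review_1, review_2, ...
            steps.extend(f"review_{i}" for i in range(1, config.review_stages + 1))
        else:
            steps.append(step)
    return steps


# A project that skips splitting and uses two review stages.
print(plan_steps(ProjectConfig(excluded={"split"}, review_stages=2)))
```

Skipping the split step fits cases where every upload is already a single document. Setting the number of review stages to more than one models multi-stage human review.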
Key Concepts
Tenants & Projects
Your data is organized hierarchically:
- Tenant: Top-level organization (your company or team)
- Project: A processing configuration for a specific document type or use case
- Transaction: A batch of uploaded documents to process
Documents & Pages
- Document: An individual document within a transaction (e.g., one invoice in a batch)
- Page: A single page of a document, with its own OCR text and image
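The hierarchy can be pictured as nested containers. The Python sketch below models it with dataclasses. The class and field names (`doc_type`, `ocr_text`, `image_path`) and the `count_pages` helper are assumptions for illustration and do not reflect the platform's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative data model for Tenant > Project > Transaction > Document > Page.
# Names are hypothetical, not DocAI Fabric's schema.


@dataclass
class Page:
    number: int
    ocr_text: str = ""      # text produced by OCR for this page
    image_path: str = ""    # hypothetical reference to the page image


@dataclass
class Document:
    doc_type: Optional[str] = None              # set by classification
    pages: list[Page] = field(default_factory=list)


@dataclass
class Transaction:
    transaction_id: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Project:
    name: str
    transactions: list[Transaction] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    projects: list[Project] = field(default_factory=list)


def count_pages(tenant: Tenant) -> int:
    """Walk the whole hierarchy and count pages across all documents."""
    return sum(
        len(doc.pages)
        for project in tenant.projects
        for tx in project.transactions
        for doc in tx.documents
    )


# One uploaded batch that was split into two documents.
batch = Transaction("tx-001", documents=[
    Document("invoice", [Page(1), Page(2)]),
    Document("receipt", [Page(1)]),
])
acme = Tenant("acme", [Project("ap-invoices", [batch])])
print(count_pages(acme))  # → 3
```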
Workflows
Workflows define the processing pipeline for a project. Each workflow contains activities connected into a configurable flow. Activities can run in sequence, branch conditionally, loop over documents, pause for human review, or export results.
Current workflow activities include:
- Import: Convert uploaded PDFs, TIFFs, and images into normalized page images
- OCR: Extract text from page images
- Split: Split pages into individual documents
- Classify: Determine document types
- Extract: Pull structured fields from documents
- Validate: Run business rules validation on extracted data
- Script / Data Transform (coming soon): Invoke external engines or custom code for data transformation and enrichment
- Review: Pause processing for human review and approval
- Export: Export documents and results to downstream formats
- If Condition / Switch: Route documents through different branches based on workflow logic
Some additional activity types, such as For Each Document, also exist in the workflow system and are being expanded.
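The following Python sketch shows how activities, a Switch, and a Review gate could compose into a flow. It is a toy model under stated assumptions. The `Activity`, `Switch`, and `Review` classes, the `run_workflow` function, the extractor logic, and the 0.8 confidence threshold are all hypothetical and are not DocAI Fabric's workflow engine or API.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical workflow primitives; a document's state is a plain dict here.
Doc = dict


@dataclass
class Activity:
    name: str
    run: Callable[[Doc], None]


@dataclass
class Switch:
    # Route a document to one branch based on a key (e.g. its type).
    key: Callable[[Doc], str]
    branches: dict[str, list["Step"]]
    default: list["Step"]


@dataclass
class Review:
    # Pause processing when the predicate says a human should look.
    needs_review: Callable[[Doc], bool]


Step = Union[Activity, Switch, Review]


def run_workflow(steps: list[Step], doc: Doc) -> str:
    """Run steps in order; return 'paused' at a review gate, else 'done'."""
    for step in steps:
        if isinstance(step, Activity):
            step.run(doc)
            doc.setdefault("log", []).append(step.name)
        elif isinstance(step, Switch):
            branch = step.branches.get(step.key(doc), step.default)
            if run_workflow(branch, doc) == "paused":
                return "paused"
        elif isinstance(step, Review):
            if step.needs_review(doc):
                return "paused"
    return "done"


def classify(doc: Doc) -> None:
    doc["type"] = "invoice" if "Invoice" in doc["text"] else "other"


def extract_invoice(doc: Doc) -> None:
    # Toy extractor: takes the last token as the total.
    doc["fields"] = {"total": doc["text"].split()[-1]}


def export(doc: Doc) -> None:
    doc["exported"] = True


workflow: list[Step] = [
    Activity("classify", classify),
    Switch(key=lambda d: d["type"],
           branches={"invoice": [Activity("extract_invoice", extract_invoice)]},
           default=[]),
    Review(needs_review=lambda d: d.get("confidence", 1.0) < 0.8),
    Activity("export", export),
]

doc = {"text": "Invoice total 120.00", "confidence": 0.95}
print(run_workflow(workflow, doc), doc["log"])
```

A real engine would also persist state so that a paused document can resume after approval. This sketch only shows where the pause happens.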
Evaluation
Evaluation lets teams test draft configurations, measure extraction quality, and compare results before promoting changes to production.
- Playground: Test draft configurations on documents and verify they perform better before publishing
- Evaluation: A controlled test run used to measure extraction quality and validate configuration changes
- Benchmarking: Compare results across prompts, models, workflow settings, or project versions
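As a rough illustration of measuring extraction quality and benchmarking configurations, the Python sketch below scores field-level exact-match accuracy against expected outputs, such as those in a golden set. The metric, the normalization rule, and the function names are assumptions. The platform's own evaluation metrics may differ.

```python
# Field-level exact-match accuracy; the metric choice is an assumption.


def normalize(value) -> str:
    """Collapse whitespace and lowercase so trivial differences don't count."""
    return " ".join(str(value).split()).lower()


def field_accuracy(predictions: list[dict], expected: list[dict]) -> float:
    """Fraction of expected fields whose predicted value matches."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must be the same length")
    total = correct = 0
    for pred, gold in zip(predictions, expected):
        for name, gold_value in gold.items():
            total += 1
            if name in pred and normalize(pred[name]) == normalize(gold_value):
                correct += 1
    return correct / total if total else 0.0


def benchmark(runs: dict[str, list[dict]], expected: list[dict]) -> list[tuple[str, float]]:
    """Score each configuration's outputs and rank them, best first."""
    scores = [(name, field_accuracy(preds, expected)) for name, preds in runs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


golden = [{"invoice_no": "INV-1", "total": "120.00"},
          {"invoice_no": "INV-2", "total": "80.00"}]
published = [{"invoice_no": "inv-1", "total": "12.00"}, {"invoice_no": "INV-2"}]
draft = [{"invoice_no": "INV-1", "total": "120.00"},
         {"invoice_no": "INV-2", "total": "80.00"}]

print(benchmark({"published": published, "draft": draft}, golden))
```

In this example, the draft configuration matches all four expected fields. The published one matches only two, because one total is wrong and one is missing. That outcome is the kind of evidence you would want before promoting a change.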
Memory
Memory is the platform's few-shot learning layer for storing correctly labeled examples that improve future classification and extraction quality.
- Memory: A dataset of curated examples used to guide AI behavior on similar future documents
- Few-Shot Learning: Improve accuracy by showing the model prior examples instead of retraining a custom model
- Golden Set: Build a reusable library of high-quality documents and expected outputs for a use case
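The following Python sketch shows the general few-shot pattern: verified examples are stored, the most similar ones are retrieved, and they are placed in the prompt ahead of the new document. Token-overlap (Jaccard) similarity, the `Memory` class, and the prompt wording are stand-ins chosen for illustration. They are not a description of DocAI Fabric's actual retrieval method.

```python
import json

# Illustrative few-shot memory; Jaccard similarity is an assumed stand-in.


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class Memory:
    def __init__(self) -> None:
        # Each entry: (document text, verified output), e.g. after human review.
        self.examples: list[tuple[str, dict]] = []

    def add(self, text: str, verified_output: dict) -> None:
        self.examples.append((text, verified_output))

    def nearest(self, text: str, k: int = 2) -> list[tuple[str, dict]]:
        ranked = sorted(self.examples, key=lambda ex: similarity(text, ex[0]), reverse=True)
        return ranked[:k]


def build_prompt(memory: Memory, text: str, k: int = 2) -> str:
    """Prepend the k most similar stored examples as demonstrations."""
    parts = ["Extract the fields as JSON."]
    for example_text, output in memory.nearest(text, k):
        parts.append(f"Document:\n{example_text}\nOutput:\n{json.dumps(output)}")
    parts.append(f"Document:\n{text}\nOutput:")
    return "\n\n".join(parts)


mem = Memory()
mem.add("Invoice INV-7 from Acme total 50.00", {"vendor": "Acme", "total": "50.00"})
mem.add("Purchase order PO-3 for Globex", {"po_number": "PO-3"})
print(build_prompt(mem, "Invoice INV-9 from Acme total 75.00", k=1))
```

Because the corrected examples steer the model at inference time, accuracy can improve as the memory grows without retraining a custom model.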
Next Steps
- Deployment Overview: Choose how you want to run DocAI Fabric
- Quick Start: Process your first document
- Projects: Learn about project configuration
- API Reference: Integrate with the REST API