Welcome to DocAI Fabric

DocAI Fabric is an enterprise-grade framework for applying Generative AI to document processing, delivering reliable and consistent results.

It automatically splits, classifies, extracts, validates, and analyzes data from complex business documents without templates or model training. It applies company knowledge to reason over each business case, helping automate and support decision-making, with clear and explainable results.

The platform provides a strong foundation for using Generative AI in document processing with high accuracy, consistency, and control. It supports models from OpenAI, Anthropic, Google, and open-source options. You can bring your own models or use a fully managed setup, allowing you to start quickly and adapt your AI strategy over time.

The system works and improves from day one without complex setup. You can integrate it into your existing processes to start getting value immediately, and later redesign those processes using DocAI Fabric to fully leverage Generative AI with full context and achieve maximum value.

With its single-container design, DocAI Fabric fits enterprise needs for on-premises and private cloud deployments, and is also available as a scalable SaaS solution.

What Does It Do?

| Capability | Description |
| --- | --- |
| OCR | Convert scanned pages and images into machine-readable text |
| Document Splitting | Split scanned pages or uploaded files into individual documents |
| Document Classification | Automatically categorize uploaded documents by type |
| Data Extraction | Extract structured fields from unstructured documents |
| Business Rules | Apply validation and transformation rules to extracted data |
| Matching & Enrichment | Match documents to company records such as vendors, purchase orders, and GL accounts |
| Human-in-the-Loop Review | Route uncertain cases to reviewers and improve future results from human feedback |
| Business Case Reasoning | Reason across customer policies, regulations, documentation, and previously processed cases |
| Document-Centric Workflow Automation | Automate end-to-end document workflows around each business case |
| Agent-Discoverable Tooling | Expose reliable document processing as MCP-discoverable tools with escalation to users when needed |

Why is it better than established legacy IDP?

No templates, labeling, or training: it works on any layout immediately. The accuracy of GenAI will surprise you. Natural-language configuration instead of a months-long implementation. Full explainability instead of black-box results. A single container instead of complex multi-server infrastructure.

Why is it better than using LLMs directly?

OCR-first architecture reduces hallucinations on scans by grounding the model in recognized text. A structured workflow (split → classify → extract → validate) replaces ad-hoc prompting. Built-in evaluation catches regressions when models change. Deterministic validation plus human review catch deviations before output. Few-shot learning from corrections replaces manual prompt tuning. There is no need to build and retain a 2-3-person document-AI team in-house.

Processing Pipeline

A typical end-to-end document process looks like this:

Upload → OCR → Split → Classification → Extraction → Normalization & Validation → HITL Review & Case-Level Reasoning → Learning → File Format Conversion & Data Redaction

In each project, the workflow is configurable: any step can be excluded, every step can be customized, and human review can be designed as one or multiple stages when required.
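
To make the idea of a configurable pipeline concrete, here is a minimal Python sketch. It is not DocAI Fabric's configuration format: the `ProjectPipeline` class, the step identifiers, and the `review_stages` option are all hypothetical names chosen for illustration. It only shows how excluding steps and adding review stages could change the resulting step order.

```python
from dataclasses import dataclass, field

# Step names mirror the pipeline listed above; the identifiers are hypothetical.
DEFAULT_STEPS = [
    "upload", "ocr", "split", "classification", "extraction",
    "normalization_validation", "hitl_review", "learning",
    "conversion_redaction",
]


@dataclass
class ProjectPipeline:
    """Hypothetical per-project pipeline configuration."""
    excluded: set = field(default_factory=set)
    review_stages: int = 1  # human review can run as one or several stages

    def steps(self) -> list:
        result = []
        for name in DEFAULT_STEPS:
            if name in self.excluded:
                continue  # any step can be left out of a project
            if name == "hitl_review":
                # Expand review into the configured number of stages.
                result.extend(
                    f"hitl_review_{i}" for i in range(1, self.review_stages + 1)
                )
            else:
                result.append(name)
        return result


# A project that skips splitting and learning and uses two review stages.
pipeline = ProjectPipeline(excluded={"split", "learning"}, review_stages=2)
print(" -> ".join(pipeline.steps()))
```

In this sketch the default order is fixed and a project only removes or expands steps, which keeps every project's pipeline comparable to the standard flow.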

Key Concepts

Tenants & Projects

Your data is organized hierarchically:

  • Tenant: Top-level organization (your company or team)
  • Project: A processing configuration for a specific document type or use case
  • Transaction: A batch of uploaded documents to process

Documents & Pages

  • Document: An individual document within a transaction (e.g., one invoice in a batch)
  • Page: A single page of a document, with its own OCR text and image
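
The hierarchy described above (tenant, project, transaction, document, page) can be pictured as nested records. The following Python sketch is an illustration only: the class and field names are assumptions and do not reflect the platform's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    number: int
    ocr_text: str = ""    # each page carries its own OCR text
    image_path: str = ""  # and its own page image (hypothetical field)


@dataclass
class Document:
    doc_type: str
    pages: list = field(default_factory=list)


@dataclass
class Transaction:
    transaction_id: str
    documents: list = field(default_factory=list)


@dataclass
class Project:
    name: str
    transactions: list = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    projects: list = field(default_factory=list)


def count_pages(tenant: Tenant) -> int:
    """Walk the hierarchy from tenant down to pages and count them."""
    return sum(
        len(doc.pages)
        for project in tenant.projects
        for tx in project.transactions
        for doc in tx.documents
    )


# One transaction holding a batch of two invoices (three pages in total).
tenant = Tenant("Acme", [
    Project("Invoices", [
        Transaction("tx-1", [
            Document("invoice", [Page(1, "Invoice A"), Page(2, "Terms")]),
            Document("invoice", [Page(1, "Invoice B")]),
        ]),
    ]),
])
print(count_pages(tenant))
```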

Workflows

Workflows define the processing pipeline for a project. Each workflow contains activities connected into a configurable flow. Activities can run in sequence, branch conditionally, loop over documents, pause for human review, or export results.

Current workflow activities include:

  1. Import: Convert uploaded PDFs, TIFFs, and images into normalized page images
  2. OCR: Extract text from page images
  3. Split: Split pages into individual documents
  4. Classify: Determine document types
  5. Extract: Pull structured fields from documents
  6. Validate: Run business rules validation on extracted data
  7. Script / Data Transform (coming soon): Invoke custom processing for external engines, data transformation, and enrichment
  8. Review: Pause processing for human review and approval
  9. Export: Export documents and results to downstream formats
  10. If Condition / Switch: Route documents through different branches based on workflow logic

Some additional activity types, such as For Each Document, also exist in the workflow system and are being expanded.
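
As a rough illustration of activities running in sequence, branching conditionally, and pausing for review, consider the Python sketch below. The `Activity` and `IfCondition` classes, the `run_workflow` function, and the stub activity functions are hypothetical and are not the platform's workflow engine; the confidence value is simulated rather than produced by a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Activity:
    name: str
    run: Callable[[dict], None]  # mutates the per-document context


@dataclass
class IfCondition:
    predicate: Callable[[dict], bool]
    then_branch: list
    else_branch: list


def run_workflow(nodes: list, ctx: dict) -> bool:
    """Run nodes in order. Returns False if the flow paused for review."""
    for node in nodes:
        if isinstance(node, IfCondition):
            branch = node.then_branch if node.predicate(ctx) else node.else_branch
            if not run_workflow(branch, ctx):
                return False
        else:
            node.run(ctx)
            ctx.setdefault("trace", []).append(node.name)
            if ctx.get("paused"):
                return False
    return True


# Stub activities standing in for real processing.
def classify(ctx):
    ctx["doc_type"] = "invoice" if "invoice" in ctx["text"].lower() else "other"


def extract(ctx):
    ctx["fields"] = {"raw": ctx["text"]}
    ctx["confidence"] = ctx["simulated_confidence"]  # simulated, not computed


def review(ctx):
    ctx["paused"] = True  # wait for a human decision


def export(ctx):
    ctx["exported"] = True


workflow = [
    Activity("classify", classify),
    Activity("extract", extract),
    IfCondition(lambda c: c["confidence"] < 0.8, [Activity("review", review)], []),
    Activity("export", export),
]

low = {"text": "Invoice 42", "simulated_confidence": 0.5}
high = {"text": "Invoice 43", "simulated_confidence": 0.95}
low_finished = run_workflow(workflow, low)
high_finished = run_workflow(workflow, high)
print(low["trace"], low_finished)
print(high["trace"], high_finished)
```

In this sketch a low-confidence document stops at the review activity and is not exported, while a high-confidence document runs straight through to export.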

Evaluation

Evaluation lets teams test draft configurations, measure extraction quality, and compare results before promoting changes to production.

  • Playground: Test draft configurations on documents and verify they perform better before publishing
  • Evaluation: A controlled test run used to measure extraction quality and validate configuration changes
  • Benchmarking: Compare results across prompts, models, workflow settings, or project versions
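
One simple way to measure extraction quality is field-level accuracy against expected outputs. The Python sketch below assumes exact-match scoring and uses made-up sample data; the platform may compute its metrics differently, and the `field_accuracy` function is a hypothetical helper.

```python
def field_accuracy(predicted: list, expected: list) -> float:
    """Share of expected fields whose extracted value matches exactly."""
    total = 0
    correct = 0
    for pred, gold in zip(predicted, expected):
        for name, value in gold.items():
            total += 1
            if pred.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


# Expected outputs for two test documents (illustrative data).
expected = [
    {"invoice_number": "INV-1", "total": "100.00"},
    {"invoice_number": "INV-2", "total": "250.00"},
]

# Results from the published configuration and from a draft configuration.
baseline = [
    {"invoice_number": "INV-1", "total": "100"},
    {"invoice_number": "INV-2", "total": "25.00"},
]
draft = [
    {"invoice_number": "INV-1", "total": "100.00"},
    {"invoice_number": "INV-2", "total": "25.00"},
]

baseline_score = field_accuracy(baseline, expected)
draft_score = field_accuracy(draft, expected)
print(f"baseline={baseline_score:.2f} draft={draft_score:.2f}")

# Promote the draft only if it does not regress against the baseline.
promote = draft_score >= baseline_score
```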

Memory

Memory is the platform's few-shot learning layer for storing correctly labeled examples that improve future classification and extraction quality.

  • Memory: A dataset of curated examples used to guide AI behavior on similar future documents
  • Few-Shot Learning: Improve accuracy by showing the model prior examples instead of retraining a custom model
  • Golden Set: Build a reusable library of high-quality documents and expected outputs for a use case
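
Few-shot learning of this kind usually means placing a handful of similar, correctly labeled examples in front of the model before the new document. The Python sketch below shows one possible approach under stated assumptions: similarity is plain word overlap (Jaccard), which stands in for whatever retrieval the platform actually uses, and the prompt layout and function names are hypothetical.

```python
import json


def tokens(text: str) -> set:
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(memory: list, text: str, k: int = 2) -> list:
    """Pick the k stored examples most similar to the new document."""
    query = tokens(text)
    ranked = sorted(
        memory,
        key=lambda ex: jaccard(tokens(ex["text"]), query),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(memory: list, text: str, k: int = 2) -> str:
    """Prepend the selected examples to the new document as few-shot shots."""
    parts = []
    for ex in select_examples(memory, text, k):
        fields = json.dumps(ex["fields"], sort_keys=True)
        parts.append(f"Document:\n{ex['text']}\nFields:\n{fields}")
    parts.append(f"Document:\n{text}\nFields:")
    return "\n\n".join(parts)


# Curated examples, e.g. taken from reviewer-corrected documents.
memory = [
    {"text": "Invoice number INV-1 total 100.00 EUR",
     "fields": {"invoice_number": "INV-1", "total": "100.00"}},
    {"text": "Purchase order PO-7 for 3 laptops",
     "fields": {"po_number": "PO-7"}},
    {"text": "Invoice number INV-2 total 250.00 USD",
     "fields": {"invoice_number": "INV-2", "total": "250.00"}},
]

new_document = "Invoice number INV-9 total 80.00 EUR"
selected = select_examples(memory, new_document)
prompt = build_prompt(memory, new_document)
print(prompt)
```

Because the examples are selected at request time, adding a corrected document to the memory changes future prompts without retraining any model.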

Next Steps