Projects

Under Review

This article is currently under review. Some content may be incomplete or inaccurate.

Projects are the central configuration unit in DocAI Fabric. Each project defines how a specific type of document flow is processed, including what document types to expect, what data to extract, and how the processing pipeline runs.

Project Structure

A project contains:

Document Classes: The types of documents this project handles
Extraction Fields: What data to extract from each document class
Workflow Configuration: The processing pipeline settings
Settings: OCR, classification, and extraction parameters
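To show how these four parts relate, here is a minimal Python sketch that models a project as plain data. The class and attribute names (`Project`, `DocumentClass`, `ExtractionField`, `workflow`, `settings`) are hypothetical and chosen for illustration. They are not the DocAI Fabric API or its storage format. The per-field shape (`name`, `type`, `required`, `description`) mirrors the Class Fields JSON example later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionField:
    name: str
    type: str                 # one of the field types, e.g. "text" or "currency"
    required: bool = False
    description: str = ""


@dataclass
class DocumentClass:
    class_name: str
    fields: list[ExtractionField] = field(default_factory=list)


@dataclass
class Project:
    # Hypothetical model of a project, not the product's schema.
    name: str
    description: str
    document_classes: list[DocumentClass] = field(default_factory=list)
    workflow: dict[str, bool] = field(default_factory=dict)    # which stages run
    settings: dict[str, object] = field(default_factory=dict)  # OCR, classification, extraction parameters


ap = Project(
    name="Accounts Payable",
    description="Mixed supplier scans from the AP mailroom",
    document_classes=[
        DocumentClass("Invoice", [ExtractionField("Invoice Number", "text", required=True)]),
        DocumentClass("Credit Note"),
        DocumentClass("Purchase Order"),
    ],
    workflow={"split": True, "classification": True, "extraction": True},
    settings={"confidence_threshold": 0.7},
)
```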

Creating a Project

Navigate to the main dashboard
Click New Project
Enter a project name and description
Configure your first document class

When to Create a New Project vs. Add to an Existing One

A project represents a single document processing pipeline. The key question is: do these documents arrive together or separately?

Use the Same Project When

Different document types appear mixed together in the same uploaded files or batches (e.g., invoices and receipts scanned in one PDF)
The documents share the same processing workflow (same review process, same export destination, same team)
You need the system to split and classify documents automatically from combined uploads

Example

An accounts payable team receives mixed scans containing invoices, credit notes, and purchase orders. These all belong in one project with three document classes, with splitting and classification enabled.

Create a Separate Project When

The document types come from completely different sources or workflows and will never appear in the same upload
Different teams or departments handle them independently
They have different review processes, export destinations, or quality requirements
You want independent settings (confidence thresholds, OCR options, etc.) for each document flow

Example

One team processes bank statements, another processes tax invoices. These are separate document flows with different reviewers and different downstream systems. Each should be its own project.

Rule of Thumb

If documents arrive in the same envelope (physical or digital), they belong in the same project. If they arrive through different channels, separate projects are cleaner.
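The criteria above can be expressed as a simple predicate. This Python sketch models each document flow by its intake channel (the "envelope"), its review team, its export destination, and whether it needs independent settings. `DocumentFlow` and `share_project` are hypothetical names used only for illustration. In practice the judgment involves more nuance than four attributes can capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentFlow:
    """One incoming stream of documents (hypothetical model, not a DocAI Fabric API)."""
    intake_channel: str               # the "envelope": upload folder, scanner, mailbox
    review_team: str
    export_destination: str
    needs_own_settings: bool = False  # e.g. its own confidence threshold or OCR options


def share_project(a: DocumentFlow, b: DocumentFlow) -> bool:
    """Apply the rule of thumb: same envelope and same workflow -> same project."""
    same_envelope = a.intake_channel == b.intake_channel
    same_workflow = (a.review_team, a.export_destination) == (b.review_team, b.export_destination)
    wants_independent_settings = a.needs_own_settings or b.needs_own_settings
    return same_envelope and same_workflow and not wants_independent_settings


invoices = DocumentFlow("ap-scanner", "AP team", "ERP")
credit_notes = DocumentFlow("ap-scanner", "AP team", "ERP")
bank_statements = DocumentFlow("treasury-inbox", "Treasury", "Reconciliation tool")
tax_invoices = DocumentFlow("tax-portal", "Tax team", "Tax system")

print(share_project(invoices, credit_notes))         # True
print(share_project(bank_statements, tax_invoices))  # False
```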

Document Classes

Document classes define the categories of documents your project processes. For example, an accounts payable project might have:

Invoice
Purchase Order
Receipt
Credit Note

What Makes a Distinct Document Type?

A document type is typically defined by two things:

The set of fields to extract: If two documents require different extraction fields, they are different document types.
The set of validation rules to apply: If different business rules apply (different required fields, different format validations, different cross-field checks), they are different types.

Same fields + same rules = same type. Even visually different documents are the same class if they share the same fields and validation logic. Variations within a type can be captured as field values rather than separate classes.

Example

A "Standard Invoice" and a "Proforma Invoice" both have invoice number, vendor, date, line items, and total. They follow the same validation rules. Instead of two classes, use one "Invoice" class and add a field like "Invoice Subtype" to distinguish them.

Different fields or different rules = different type. If two documents need fundamentally different data extracted or different validation logic, they should be separate classes.

Example

An "Invoice" needs vendor, line items, and totals. A "Packing Slip" needs item descriptions, quantities, and tracking numbers. Different fields → different document classes.
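The rule reduces to a comparison of two sets. This Python sketch represents a document type only by its field names and validation rules. `TypeProfile` and the rule identifier are hypothetical. It implements the strict form of the rule, and the Quick Reference below relaxes it for variants that only add optional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeProfile:
    """A document type reduced to what defines it: fields to extract and rules to apply."""
    fields: frozenset[str]
    rules: frozenset[str]  # hypothetical rule identifiers


def same_document_class(a: TypeProfile, b: TypeProfile) -> bool:
    # Layout and issuer play no part: only the field set and the rule set matter.
    return a.fields == b.fields and a.rules == b.rules


invoice_fields = frozenset({"Invoice Number", "Vendor", "Date", "Line Items", "Total"})
invoice_rules = frozenset({"total_matches_line_items"})

standard_invoice = TypeProfile(invoice_fields, invoice_rules)
proforma_invoice = TypeProfile(invoice_fields, invoice_rules)
packing_slip = TypeProfile(
    frozenset({"Item Description", "Quantity", "Tracking Number"}),
    frozenset(),
)

print(same_document_class(standard_invoice, proforma_invoice))  # True
print(same_document_class(standard_invoice, packing_slip))      # False
```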

Quick Reference

Scenario	Recommendation
Same fields, same rules, different layout	One class: the AI handles layout variation
Same fields, same rules, different issuer	One class: add an "Issuer" field
Same core fields, a few extra fields for some variants	One class: make the extra fields optional
Fundamentally different fields	Separate classes
Same fields but different validation rules	Separate classes
Regional variants (US invoice vs. EU invoice)	Usually one class, unless field sets diverge significantly

Avoid Over-Classification

Creating too many classes increases classification errors and adds configuration overhead. Start with broader categories and split only when the field sets truly diverge.

Too granular: "Vendor A Invoice", "Vendor B Invoice", "Vendor C Invoice"
Better: "Invoice" with a "Vendor Name" extraction field
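One way to find over-classification in an existing configuration is to group classes whose field sets are identical. The following Python sketch represents each class as a set of field names. This is a hypothetical simplification that ignores validation rules, so treat each group as a candidate for merging, not a verdict.

```python
from collections import defaultdict


def merge_candidates(classes: dict[str, frozenset[str]]) -> list[list[str]]:
    """Return groups of class names whose extraction field sets are identical.

    Each group with more than one member could be collapsed into one class, with
    whatever distinguished them (e.g. the vendor) moved into an extraction field.
    """
    by_fields: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, fields in classes.items():
        by_fields[fields].append(name)
    return [sorted(names) for names in by_fields.values() if len(names) > 1]


invoice_fields = frozenset({"Invoice Number", "Date", "Total"})
classes = {
    "Vendor A Invoice": invoice_fields,
    "Vendor B Invoice": invoice_fields,
    "Vendor C Invoice": invoice_fields,
    "Packing Slip": frozenset({"Item Description", "Quantity", "Tracking Number"}),
}

print(merge_candidates(classes))
# [['Vendor A Invoice', 'Vendor B Invoice', 'Vendor C Invoice']]
```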

Class Fields

Each document class has its own set of extraction fields:

```json
{
  "class_name": "Invoice",
  "fields": [
    {
      "name": "Invoice Number",
      "type": "text",
      "required": true,
      "description": "Unique invoice identifier"
    },
    {
      "name": "Total Amount",
      "type": "number",
      "required": true,
      "description": "Total amount due"
    }
  ]
}
```

Field Types

Type	Description	Example
`text`	Free-form text	Invoice numbers, names
`number`	Numeric values, including decimals	Measurements, rates, percentages
`integer`	Whole numbers	Quantities, counts
`date`	Date values	Invoice dates, due dates
`currency`	Monetary amounts	Totals, subtotals
`boolean`	True/false values	Flags, checkboxes
`email`	Email addresses	Contact emails
`phone`	Phone numbers	Contact numbers
`address`	Physical addresses	Billing/shipping addresses
`list`	Multiple values / repeating groups	Line items, table rows
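Before saving a class definition like the JSON example above, it can help to sanity-check it. Here is a minimal Python sketch that parses a definition and reports unknown field types, duplicate names, non-boolean `required` values, and empty descriptions. These checks are illustrative and belong to the sketch, not to DocAI Fabric's own validation. `check_class_definition` is a hypothetical name, and the `"money"` type in the sample is a deliberate error.

```python
import json

# Allowed values, taken from the Field Types table.
FIELD_TYPES = {
    "text", "number", "integer", "date", "currency",
    "boolean", "email", "phone", "address", "list",
}


def check_class_definition(raw: str) -> list[str]:
    """Return problems found in a class definition; an empty list means none were found."""
    spec = json.loads(raw)
    problems: list[str] = []
    seen: set[str] = set()
    for fld in spec.get("fields", []):
        name = fld.get("name")
        if not name:
            problems.append("field without a name")
            continue
        if name in seen:
            problems.append(f"duplicate field name: {name}")
        seen.add(name)
        if fld.get("type") not in FIELD_TYPES:
            problems.append(f"{name}: unknown type {fld.get('type')!r}")
        if not isinstance(fld.get("required", False), bool):
            problems.append(f"{name}: 'required' must be true or false")
        if not fld.get("description"):
            problems.append(f"{name}: no description")
    return problems


definition = """
{
  "class_name": "Invoice",
  "fields": [
    {"name": "Invoice Number", "type": "text", "required": true,
     "description": "Unique invoice identifier"},
    {"name": "Total Amount", "type": "money", "required": true,
     "description": "Total amount due"}
  ]
}
"""
print(check_class_definition(definition))  # ["Total Amount: unknown type 'money'"]
```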

Writing Effective Field Descriptions

The field description is the single most important factor for extraction quality. It guides the AI on what to look for and where.

Good description practices:

Be specific: "The unique invoice identifier, usually starting with 'INV-'"
Include location hints: "Usually found in the top-right corner of the first page"
Mention expected format: "Date in format MM/DD/YYYY"
Clarify ambiguity: "The total amount due including tax, not the subtotal"
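Here is one field written in the same shape as the Class Fields example, with a description that applies all four practices. The wording, label names, and location are illustrative for a typical invoice, not taken from any real configuration. The short `vague` description is included for contrast.

```python
import json

# The description below combines the four practices; comments mark which is which.
total_amount = {
    "name": "Total Amount",
    "type": "currency",
    "required": True,
    "description": (
        "The total amount due including tax, not the subtotal. "  # clarifies ambiguity
        "Labelled 'Total' or 'Amount Due', "                      # is specific
        "usually at the bottom of the last page. "                # gives a location hint
        "Written with two decimal places, e.g. 1,250.00."         # states the expected format
    ),
}

# A description like this gives the model almost nothing to work with.
vague = {"name": "Total Amount", "type": "currency", "required": True, "description": "Total"}

print(json.dumps(total_amount, indent=2))
```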

Required vs Optional Fields

Required: Field must have a value; missing values are flagged for review
Optional: Field may be empty; no warning if not found

Use required fields for critical data that must always be present (e.g., invoice number, total amount). Use optional fields for data that may not exist on every document (e.g., PO number, discount).
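Here is a minimal Python sketch of the required-field check. It assumes a field counts as missing when its extracted value is absent, `None`, or blank text. The article does not specify how DocAI Fabric decides emptiness, and `FieldSpec` and `fields_needing_review` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    name: str
    required: bool


def fields_needing_review(specs: list[FieldSpec], extracted: dict[str, object]) -> list[str]:
    """Return required fields that came back empty; optional fields are never flagged."""
    def is_empty(value: object) -> bool:
        # Assumption: absent, None, or whitespace-only text all count as missing.
        return value is None or (isinstance(value, str) and not value.strip())

    return [s.name for s in specs if s.required and is_empty(extracted.get(s.name))]


specs = [
    FieldSpec("Invoice Number", True),
    FieldSpec("Total Amount", True),
    FieldSpec("PO Number", False),
    FieldSpec("Discount", False),
]
extracted = {"Invoice Number": "INV-1042", "Total Amount": "", "PO Number": None}

print(fields_needing_review(specs, extracted))  # ['Total Amount']
```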

Repeating Groups (Tables)

To extract tabular data with multiple rows, define a repeating group (a `list` field) with sub-fields. For a line-item table, for example:

Item Description (text)
Quantity (integer)
Unit Price (currency)
Line Total (currency)

Each row in the table becomes one entry in the group. Describe the table structure clearly so the AI knows what to look for.
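In code terms, a repeating group is a list of records with one record per table row. The following Python sketch uses a hypothetical `LineItem` type with the four sub-fields above. It also demonstrates a cross-field check like those mentioned under document types: Quantity times Unit Price should equal Line Total. This check is an illustration, not a built-in rule. Real invoices sometimes round line totals, so a tolerance may be needed.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    """One row of the table = one entry in the repeating group."""
    item_description: str
    quantity: int
    unit_price: Decimal
    line_total: Decimal


def inconsistent_rows(rows: list[LineItem]) -> list[int]:
    """Return indexes of rows where Quantity x Unit Price differs from Line Total."""
    # Decimal avoids binary floating-point error when comparing money values.
    return [i for i, r in enumerate(rows) if r.quantity * r.unit_price != r.line_total]


rows = [
    LineItem("A4 paper, box", 3, Decimal("12.50"), Decimal("37.50")),
    LineItem("Toner cartridge", 2, Decimal("80.00"), Decimal("150.00")),
]

print(inconsistent_rows(rows))  # [1]
```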

Extraction Confidence

Each extracted value includes a confidence score:

High (> 0.8): Value clearly found in document
Medium (0.5-0.8): Probable match, may need verification
Low (< 0.5): Uncertain extraction, should be reviewed

Values with low confidence are highlighted in the Transaction Viewer for easy review.
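The bands map directly onto a small lookup. This Python sketch assumes scores range from 0 to 1 and handles the boundaries as written above: exactly 0.8 and exactly 0.5 both count as Medium.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to High, Medium, or Low using the thresholds above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score must be between 0 and 1, got {score}")
    if score > 0.8:
        return "High"
    if score >= 0.5:
        return "Medium"  # 0.5 and 0.8 themselves fall in this band
    return "Low"


print([confidence_band(s) for s in (0.93, 0.8, 0.62, 0.5, 0.41)])
# ['High', 'Medium', 'Medium', 'Medium', 'Low']
```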

Choosing the Right Processing Mode

When creating a project, configure the processing stages to match your document flow.

Split + Classification + Extraction

Use when: Uploaded files may contain multiple documents of different types mixed together.

Split detects document boundaries within multi-page files
Classification identifies each document's type
Extraction pulls data based on the classified type

Example: A batch scanner produces PDFs containing a mix of invoices, receipts, and delivery notes.

Classification + Extraction (No Split)

Use when: Each uploaded file contains exactly one document, but documents may be of different types.

Each file is treated as one document (no boundary detection)
Classification determines the document type
Extraction uses the classified type's field definitions

Example: Users upload individual PDFs; each is one document, but could be an invoice, a contract, or a receipt.

Extraction Only (No Split, No Classification)

Use when: All documents are the same known type and each file contains exactly one document.

Each file is one document, all assigned to a single class
Extraction runs using the single configured document class's fields

Example: A process that only ever receives invoices, one per file.

Decision Flowchart

Ask yourself three questions to pick the right mode:

Can a single file contain multiple documents? → Enable Split
Do you process more than one type of document? → Enable Classification
What data do you need to extract? → Extraction is always enabled, since extracting data is the purpose of every project
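Since each of the first two questions corresponds to one stage, the choice fits in a short function. This is a Python sketch with hypothetical stage names. If the answer to the first question is yes and the second is no, the result is split plus extraction. This article does not describe that combination, so confirm it is supported before relying on it.

```python
def processing_stages(multiple_docs_per_file: bool, multiple_doc_types: bool) -> list[str]:
    """Turn the answers to the flowchart questions into the stages to enable, in pipeline order."""
    stages = []
    if multiple_docs_per_file:
        stages.append("split")           # question 1
    if multiple_doc_types:
        stages.append("classification")  # question 2
    stages.append("extraction")          # question 3: always enabled
    return stages


print(processing_stages(True, True))    # ['split', 'classification', 'extraction']
print(processing_stages(False, True))   # ['classification', 'extraction']
print(processing_stages(False, False))  # ['extraction']
```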

Project Settings

Project settings control the behavior of each processing step. Settings follow an inheritance chain:

Project Settings → Workflow Activity Config → Runtime Overrides

Each level overrides the one before it: set defaults at the project level, override them for a specific workflow activity, and override them again at runtime for an individual transaction.
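Python's `collections.ChainMap` is a good fit for modeling this kind of layered lookup. It searches its mappings in order and returns the first match. The setting keys below are hypothetical stand-ins for the settings in the Key Settings table, not the product's configuration keys.

```python
from collections import ChainMap

# Hypothetical keys mirroring the Key Settings table defaults.
project_settings = {
    "ocr_engine": "Azure AI Vision",
    "classification_model": "gpt-4.1-mini",
    "extraction_model": "gpt-4.1-mini",
    "confidence_threshold": 0.7,
}
activity_config = {"confidence_threshold": 0.85}   # set on one workflow activity
runtime_overrides = {"confidence_threshold": 0.9}  # passed with a single transaction

# ChainMap searches left to right, so the most specific layer goes first.
effective = ChainMap(runtime_overrides, activity_config, project_settings)

print(effective["confidence_threshold"])  # 0.9 (runtime override wins)
print(effective["extraction_model"])      # gpt-4.1-mini (falls back to the project)

# Without a runtime override, the activity value applies.
print(ChainMap({}, activity_config, project_settings)["confidence_threshold"])  # 0.85
```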

Key Settings

Setting	Description	Default
OCR Engine	Which OCR service to use	Azure AI Vision
Classification Model	GPT model for classification	gpt-4.1-mini
Extraction Model	GPT model for extraction	gpt-4.1-mini
Confidence Threshold	Minimum confidence for auto-approval	0.7
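The table does not specify how the threshold applies across an entire document. One plausible interpretation, which this Python sketch adopts as an assumption, is that a document is auto-approved only when every extracted field meets or exceeds the threshold. The default of 0.7 falls inside the Medium band (0.5 to 0.8), so under this interpretation some Medium values would still be auto-approved. `auto_approve` and `blocking_fields` are hypothetical names.

```python
def auto_approve(confidences: dict[str, float], threshold: float = 0.7) -> bool:
    """Assumed rule: approve only if there is at least one field and all meet the threshold."""
    return bool(confidences) and all(score >= threshold for score in confidences.values())


def blocking_fields(confidences: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Fields whose confidence falls below the threshold and would send the document to review."""
    return [name for name, score in confidences.items() if score < threshold]


doc = {"Invoice Number": 0.93, "Total Amount": 0.64, "Invoice Date": 0.71}

print(auto_approve(doc))     # False
print(blocking_fields(doc))  # ['Total Amount']
```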
