Projects

Under Review

This article is currently under review. Some content may be incomplete or inaccurate.

Projects are the central configuration unit in DocAI Fabric. Each project defines how a specific type of document flow is processed, including what document types to expect, what data to extract, and how the processing pipeline runs.

Project Structure

A project contains:

  • Document Classes: The types of documents this project handles
  • Extraction Fields: What data to extract from each document class
  • Workflow Configuration: The processing pipeline settings
  • Settings: OCR, classification, and extraction parameters

Creating a Project

  1. Navigate to the main dashboard
  2. Click New Project
  3. Enter a project name and description
  4. Configure your first document class

When to Create a New Project vs. Add to an Existing One

A project represents a single document processing pipeline. The key question is: do these documents arrive together or separately?

Use the Same Project When

  • Different document types appear mixed together in the same uploaded files or batches (e.g., invoices and receipts scanned in one PDF)
  • The documents share the same processing workflow (same review process, same export destination, same team)
  • You need the system to split and classify documents automatically from combined uploads

Example

An accounts payable team receives mixed scans containing invoices, credit notes, and purchase orders. These all belong in one project with three document classes, with splitting and classification enabled.

Create a Separate Project When

  • The document types come from completely different sources or workflows and will never appear in the same upload
  • Different teams or departments handle them independently
  • They have different review processes, export destinations, or quality requirements
  • You want independent settings (confidence thresholds, OCR options, etc.) for each document flow

Example

One team processes bank statements, another processes tax invoices. These are separate document flows with different reviewers and different downstream systems. Each should be its own project.

Rule of Thumb

If documents arrive in the same envelope (physical or digital), they belong in the same project. If they arrive through different channels, separate projects are cleaner.

Document Classes

Document classes define the categories of documents your project processes. For example, an accounts payable project might have:

  • Invoice
  • Purchase Order
  • Receipt
  • Credit Note

What Makes a Distinct Document Type?

A document type is typically defined by two things:

  1. The set of fields to extract: If two documents require different extraction fields, they are different document types.
  2. The set of validation rules to apply: If different business rules apply (different required fields, different format validations, different cross-field checks), they are different types.

Same fields + same rules = same type. Even visually different documents are the same class if they share the same fields and validation logic. Variations within a type can be captured as field values rather than separate classes.

Example

A "Standard Invoice" and a "Proforma Invoice" both have invoice number, vendor, date, line items, and total. They follow the same validation rules. Instead of two classes, use one "Invoice" class and add a field like "Invoice Subtype" to distinguish them.

Different fields or different rules = different type. If two documents need fundamentally different data extracted or different validation logic, they should be separate classes.

Example

An "Invoice" needs vendor, line items, and totals. A "Packing Slip" needs item descriptions, quantities, and tracking numbers. Different fields → different document classes.

Quick Reference

| Scenario | Recommendation |
| --- | --- |
| Same fields, same rules, different layout | One class: the AI handles layout variation |
| Same fields, same rules, different issuer | One class: add an "Issuer" field |
| Same core fields, a few extra fields for some variants | One class: make the extra fields optional |
| Fundamentally different fields | Separate classes |
| Same fields but different validation rules | Separate classes |
| Regional variants (US invoice vs. EU invoice) | Usually one class, unless field sets diverge significantly |

Avoid Over-Classification

Creating too many classes increases classification errors and adds configuration overhead. Start with broader categories and split only when the field sets truly diverge.

Too granular: "Vendor A Invoice", "Vendor B Invoice", "Vendor C Invoice"
Better: "Invoice" with a "Vendor Name" extraction field

Class Fields

Each document class has its own set of extraction fields:

{
  "class_name": "Invoice",
  "fields": [
    {
      "name": "Invoice Number",
      "type": "text",
      "required": true,
      "description": "Unique invoice identifier"
    },
    {
      "name": "Total Amount",
      "type": "number",
      "required": true,
      "description": "Total amount due"
    }
  ]
}

Field Types

| Type | Description | Example |
| --- | --- | --- |
| text | Free-form text | Invoice numbers, names |
| number | Numeric values | Amounts, quantities |
| integer | Whole numbers | Quantities, counts |
| date | Date values | Invoice dates, due dates |
| currency | Monetary amounts | Totals, subtotals |
| boolean | True/false values | Flags, checkboxes |
| email | Email addresses | Contact emails |
| phone | Phone numbers | Contact numbers |
| address | Physical addresses | Billing/shipping addresses |
| list | Multiple values / repeating groups | Line items, table rows |
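A class definition can be sanity-checked against the field types above before it is saved. The following is a minimal sketch: the `validate_class_definition` helper and the specific checks it performs are illustrative assumptions, not a DocAI Fabric API. Only the JSON shape and the list of type names come from this page.

```python
# Field types as listed in the Field Types table.
FIELD_TYPES = {
    "text", "number", "integer", "date", "currency",
    "boolean", "email", "phone", "address", "list",
}


def validate_class_definition(definition: dict) -> list:
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper: the checks below are illustrative, not product rules.
    """
    problems = []
    if not definition.get("class_name"):
        problems.append("class_name is missing or empty")

    seen_names = set()
    for index, field in enumerate(definition.get("fields", [])):
        name = field.get("name")
        if not name:
            problems.append(f"field {index}: name is missing")
        elif name in seen_names:
            problems.append(f"field {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)

        if field.get("type") not in FIELD_TYPES:
            problems.append(f"field {index}: unknown type {field.get('type')!r}")
        if not field.get("description"):
            problems.append(f"field {index}: description is missing")
    return problems


invoice = {
    "class_name": "Invoice",
    "fields": [
        {"name": "Invoice Number", "type": "text", "required": True,
         "description": "Unique invoice identifier"},
        {"name": "Total Amount", "type": "number", "required": True,
         "description": "Total amount due"},
    ],
}
print(validate_class_definition(invoice))  # []
```

Treating a missing description as a problem is a deliberate choice in this sketch, since descriptions drive extraction quality.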

Writing Effective Field Descriptions

The field description is the single most important factor for extraction quality. It guides the AI on what to look for and where.

Good description practices:

  • Be specific: "The unique invoice identifier, usually starting with 'INV-'"
  • Include location hints: "Usually found in the top-right corner of the first page"
  • Mention expected format: "Date in format MM/DD/YYYY"
  • Clarify ambiguity: "The total amount due including tax, not the subtotal"

Required vs. Optional Fields

  • Required: Field must have a value; missing values are flagged for review
  • Optional: Field may be empty; no warning if not found

Use required fields for critical data that must always be present (e.g., invoice number, total amount). Use optional fields for data that may not exist on every document (e.g., PO number, discount).
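The required/optional distinction can be sketched as a simple post-extraction check. This is an illustration only: `find_missing_required` is a hypothetical helper, and treating `None` and the empty string as "no value" is an assumption about how missing extractions are represented.

```python
def find_missing_required(fields: list, extracted: dict) -> list:
    """Return the names of required fields that have no extracted value.

    Assumption: a missing value is represented as an absent key, None, or "".
    """
    missing = []
    for field in fields:
        value = extracted.get(field["name"])
        if field.get("required", False) and value in (None, ""):
            missing.append(field["name"])
    return missing


fields = [
    {"name": "Invoice Number", "required": True},
    {"name": "Total Amount", "required": True},
    {"name": "PO Number", "required": False},
]
extracted = {"Invoice Number": "", "Total Amount": 120.5}

# "Invoice Number" is flagged; the absent optional "PO Number" is not.
print(find_missing_required(fields, extracted))  # ['Invoice Number']
```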

Repeating Groups (Tables)

For extracting tabular data with multiple rows, define a repeating group with sub-fields:

  • Item Description (text)
  • Quantity (integer)
  • Unit Price (currency)
  • Line Total (currency)

Each row in the table becomes one entry in the group. Describe the table structure clearly so the AI knows what to look for.
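As a rough illustration, a line-items group and a row-level consistency check might look like the sketch below. The `sub_fields` key, the shape of the extracted rows, and the `inconsistent_rows` helper are all assumptions made for this example; this page does not specify how repeating groups are serialized.

```python
from decimal import Decimal

# Hypothetical definition: the "sub_fields" key is an assumption for illustration.
LINE_ITEMS_FIELD = {
    "name": "Line Items",
    "type": "list",
    "required": True,
    "description": "Table of billed items, one row per item",
    "sub_fields": [
        {"name": "Item Description", "type": "text"},
        {"name": "Quantity", "type": "integer"},
        {"name": "Unit Price", "type": "currency"},
        {"name": "Line Total", "type": "currency"},
    ],
}


def inconsistent_rows(rows: list) -> list:
    """Return indexes of rows where Quantity * Unit Price != Line Total."""
    bad = []
    for index, row in enumerate(rows):
        # Decimal avoids binary floating-point rounding on monetary values.
        expected = row["Quantity"] * Decimal(row["Unit Price"])
        if expected != Decimal(row["Line Total"]):
            bad.append(index)
    return bad


# Assumed shape of extracted rows: one dict per table row.
rows = [
    {"Item Description": "Widget", "Quantity": 3,
     "Unit Price": "4.50", "Line Total": "13.50"},
    {"Item Description": "Gadget", "Quantity": 2,
     "Unit Price": "10.00", "Line Total": "25.00"},
]
print(inconsistent_rows(rows))  # [1]
```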

Extraction Confidence

Each extracted value includes a confidence score:

  • High (> 0.8): Value clearly found in document
  • Medium (0.5-0.8): Probable match, may need verification
  • Low (< 0.5): Uncertain extraction, should be reviewed

Values with low confidence are highlighted in the Transaction Viewer for easy review.
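The three bands can be expressed as a small function. This is a sketch assuming scores lie between 0 and 1; `confidence_band` is a hypothetical name, and the boundary values 0.5 and 0.8 are assigned to the medium band, following the ranges listed above.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score in [0, 1] to 'high', 'medium', or 'low'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score > 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


for score in (0.93, 0.8, 0.5, 0.31):
    print(score, confidence_band(score))
```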

Choosing the Right Processing Mode

When creating a project, configure the processing stages to match your document flow.

Split + Classification + Extraction

Use when: Uploaded files may contain multiple documents of different types mixed together.

  • Split detects document boundaries within multi-page files
  • Classification identifies each document's type
  • Extraction pulls data based on the classified type

Example: A batch scanner produces PDFs containing a mix of invoices, receipts, and delivery notes.

Classification + Extraction (No Split)

Use when: Each uploaded file contains exactly one document, but documents may be of different types.

  • Each file is treated as one document (no boundary detection)
  • Classification determines the document type
  • Extraction uses the classified type's field definitions

Example: Users upload individual PDFs; each is one document, but could be an invoice, a contract, or a receipt.

Extraction Only (No Split, No Classification)

Use when: All documents are the same known type and each file contains exactly one document.

  • Each file is one document, all assigned to a single class
  • Extraction runs using the single configured document class's fields

Example: A process that only ever receives invoices, one per file.

Decision Flowchart

Ask yourself three questions to pick the right mode:

  1. Can a single file contain multiple documents? → Enable Split
  2. Do you process more than one type of document? → Enable Classification
  3. What data do you need to extract? → Extraction is always enabled: this is the core value
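The questions above reduce to two yes/no answers, which can be sketched as follows. `ProcessingMode` and `choose_mode` are hypothetical names used for illustration, not part of the product's configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingMode:
    """Which stages are enabled; extraction is always on."""
    split: bool
    classification: bool
    extraction: bool = True


def choose_mode(multiple_documents_per_file: bool,
                multiple_document_types: bool) -> ProcessingMode:
    """Apply the flowchart: each 'yes' answer enables the matching stage."""
    return ProcessingMode(
        split=multiple_documents_per_file,
        classification=multiple_document_types,
    )


# Batch scanner output with mixed invoices, receipts, and delivery notes.
print(choose_mode(True, True))
# One invoice per file, never anything else.
print(choose_mode(False, False))
```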

Project Settings

Project settings control the behavior of each processing step. Settings follow an inheritance chain:

Project Settings → Workflow Activity Config → Runtime Overrides

Each level overrides the one before it, so you can set defaults at the project level and override them per workflow or per transaction.
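One way to picture the inheritance chain is as a layered dictionary merge in which later layers win. The sketch below is an assumption about the behavior, not the actual implementation: the setting key names are hypothetical, and the default values are the ones listed under Key Settings.

```python
from typing import Optional

# Project-level defaults (values from Key Settings; key names are hypothetical).
PROJECT_DEFAULTS = {
    "ocr_engine": "Azure AI Vision",
    "classification_model": "gpt-4.1-mini",
    "extraction_model": "gpt-4.1-mini",
    "confidence_threshold": 0.7,
}


def resolve_settings(project: dict,
                     activity_config: Optional[dict] = None,
                     runtime_overrides: Optional[dict] = None) -> dict:
    """Merge the three layers; later layers override earlier ones."""
    resolved = dict(project)  # copy so the project defaults stay untouched
    for layer in (activity_config, runtime_overrides):
        if layer:
            resolved.update(layer)
    return resolved


effective = resolve_settings(
    PROJECT_DEFAULTS,
    activity_config={"confidence_threshold": 0.8},
    runtime_overrides={"confidence_threshold": 0.9},
)
print(effective["confidence_threshold"])  # 0.9
print(effective["ocr_engine"])            # Azure AI Vision
```

Settings that no later layer mentions fall through from the project defaults unchanged.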

Key Settings

| Setting | Description | Default |
| --- | --- | --- |
| OCR Engine | Which OCR service to use | Azure AI Vision |
| Classification Model | GPT model for classification | gpt-4.1-mini |
| Extraction Model | GPT model for extraction | gpt-4.1-mini |
| Confidence Threshold | Minimum confidence for auto-approval | 0.7 |
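To illustrate how the Confidence Threshold might be applied, the sketch below lists the fields that would block auto-approval. It rests on two assumptions not stated on this page: that the threshold is checked per field, and that a score equal to the threshold passes. `fields_needing_review` is a hypothetical helper.

```python
def fields_needing_review(confidences: dict, threshold: float = 0.7) -> list:
    """Return the names of fields whose confidence is below the threshold.

    Assumption: a score equal to the threshold counts as passing.
    """
    return sorted(name for name, score in confidences.items() if score < threshold)


confidences = {"Invoice Number": 0.95, "Total Amount": 0.62, "Invoice Date": 0.7}
blocked = fields_needing_review(confidences)
print(blocked)      # ['Total Amount']
print(not blocked)  # False: the transaction would not be auto-approved
```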