Projects
This article is currently under review. Some content may be incomplete or inaccurate.
Projects are the central configuration unit in DocAI Fabric. Each project defines how a specific type of document flow is processed, including what document types to expect, what data to extract, and how the processing pipeline runs.
Project Structure
A project contains:
- Document Classes: The types of documents this project handles
- Extraction Fields: What data to extract from each document class
- Workflow Configuration: The processing pipeline settings
- Settings: OCR, classification, and extraction parameters
Creating a Project
1. Navigate to the main dashboard
2. Click New Project
3. Enter a project name and description
4. Configure your first document class
When to Create a New Project vs. Add to an Existing One
A project represents a single document processing pipeline. The key question is: do these documents arrive together or separately?
Use the Same Project When
- Different document types appear mixed together in the same uploaded files or batches (e.g., invoices and receipts scanned in one PDF)
- The documents share the same processing workflow (same review process, same export destination, same team)
- You need the system to split and classify documents automatically from combined uploads
Example: An accounts payable team receives mixed scans containing invoices, credit notes, and purchase orders. These all belong in one project with three document classes, with splitting and classification enabled.
Create a Separate Project When
- The document types come from completely different sources or workflows and will never appear in the same upload
- Different teams or departments handle them independently
- They have different review processes, export destinations, or quality requirements
- You want independent settings (confidence thresholds, OCR options, etc.) for each document flow
Example: One team processes bank statements, another processes tax invoices. These are separate document flows with different reviewers and different downstream systems. Each should be its own project.
Rule of Thumb
If documents arrive in the same envelope (physical or digital), they belong in the same project. If they arrive through different channels, separate projects are cleaner.
Document Classes
Document classes define the categories of documents your project processes. For example, an accounts payable project might have:
- Invoice
- Purchase Order
- Receipt
- Credit Note
What Makes a Distinct Document Type?
A document type is typically defined by two things:
- The set of fields to extract: If two documents require different extraction fields, they are different document types.
- The set of validation rules to apply: If different business rules apply (different required fields, different format validations, different cross-field checks), they are different types.
Same fields + same rules = same type. Even visually different documents are the same class if they share the same fields and validation logic. Variations within a type can be captured as field values rather than separate classes.
Example: A "Standard Invoice" and a "Proforma Invoice" both have invoice number, vendor, date, line items, and total. They follow the same validation rules. Instead of two classes, use one "Invoice" class and add a field like "Invoice Subtype" to distinguish them.
Different fields or different rules = different type. If two documents need fundamentally different data extracted or different validation logic, they should be separate classes.
Example: An "Invoice" needs vendor, line items, and totals. A "Packing Slip" needs item descriptions, quantities, and tracking numbers. Different fields → different document classes.
Quick Reference
| Scenario | Recommendation |
|---|---|
| Same fields, same rules, different layout | One class: the AI handles layout variation |
| Same fields, same rules, different issuer | One class: add an "Issuer" field |
| Same core fields, a few extra fields for some variants | One class: make the extra fields optional |
| Fundamentally different fields | Separate classes |
| Same fields but different validation rules | Separate classes |
| Regional variants (US invoice vs. EU invoice) | Usually one class, unless field sets diverge significantly |
Creating too many classes increases classification errors and adds configuration overhead. Start with broader categories and split only when the field sets truly diverge.
Too granular: "Vendor A Invoice", "Vendor B Invoice", "Vendor C Invoice"
Better: "Invoice" with a "Vendor Name" extraction field
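The "same fields + same rules = same type" test can be expressed as a small comparison. The sketch below is illustrative only: `DocumentSpec`, `same_class`, and the rule name `total_matches_line_items` are hypothetical names, not part of DocAI Fabric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentSpec:
    """Hypothetical description of a candidate document type."""
    name: str
    fields: frozenset
    rules: frozenset


def same_class(a: DocumentSpec, b: DocumentSpec) -> bool:
    """Return True when both candidates share the same fields and rules."""
    return a.fields == b.fields and a.rules == b.rules


invoice_fields = frozenset(
    {"Invoice Number", "Vendor", "Date", "Line Items", "Total"}
)
# Assumed validation rule name, for illustration only.
invoice_rules = frozenset({"total_matches_line_items"})

standard = DocumentSpec("Standard Invoice", invoice_fields, invoice_rules)
proforma = DocumentSpec("Proforma Invoice", invoice_fields, invoice_rules)
packing_slip = DocumentSpec(
    "Packing Slip",
    frozenset({"Item Descriptions", "Quantities", "Tracking Numbers"}),
    frozenset(),
)

print(same_class(standard, proforma))      # True: one "Invoice" class
print(same_class(standard, packing_slip))  # False: separate classes
```

This applies the strict form of the rule. The Quick Reference case of "same core fields, a few extra fields for some variants" still calls for judgment, since optional fields can absorb small differences.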
Class Fields
Each document class has its own set of extraction fields:
    {
      "class_name": "Invoice",
      "fields": [
        {
          "name": "Invoice Number",
          "type": "text",
          "required": true,
          "description": "Unique invoice identifier"
        },
        {
          "name": "Total Amount",
          "type": "number",
          "required": true,
          "description": "Total amount due"
        }
      ]
    }
Field Types
| Type | Description | Example |
|---|---|---|
| text | Free-form text | Invoice numbers, names |
| number | Numeric values | Amounts, quantities |
| integer | Whole numbers | Quantities, counts |
| date | Date values | Invoice dates, due dates |
| currency | Monetary amounts | Totals, subtotals |
| boolean | True/false values | Flags, checkboxes |
| email | Email addresses | Contact emails |
| phone | Phone numbers | Contact numbers |
| address | Physical addresses | Billing/shipping addresses |
| list | Multiple values / repeating groups | Line items, table rows |
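Before saving a class definition, it can help to check that every field uses one of the listed types. The following is a minimal sketch, assuming the definition has the JSON shape shown under Class Fields; `check_class_definition` is a hypothetical helper, not a DocAI Fabric API.

```python
import json

# Field types listed in the Field Types table.
FIELD_TYPES = {
    "text", "number", "integer", "date", "currency",
    "boolean", "email", "phone", "address", "list",
}


def check_class_definition(definition: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if not definition.get("class_name"):
        problems.append("class_name is missing")
    for index, field in enumerate(definition.get("fields", [])):
        label = field.get("name") or f"field #{index}"
        if field.get("type") not in FIELD_TYPES:
            problems.append(f"{label}: unknown type {field.get('type')!r}")
        if not isinstance(field.get("required"), bool):
            problems.append(f"{label}: 'required' must be true or false")
    return problems


invoice = json.loads("""
{
  "class_name": "Invoice",
  "fields": [
    {"name": "Invoice Number", "type": "text", "required": true,
     "description": "Unique invoice identifier"},
    {"name": "Total Amount", "type": "number", "required": true,
     "description": "Total amount due"}
  ]
}
""")

print(check_class_definition(invoice))  # []
```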
Writing Effective Field Descriptions
The field description is the single most important factor for extraction quality. It guides the AI on what to look for and where.
Good description practices:
- Be specific: "The unique invoice identifier, usually starting with 'INV-'"
- Include location hints: "Usually found in the top-right corner of the first page"
- Mention expected format: "Date in format MM/DD/YYYY"
- Clarify ambiguity: "The total amount due including tax, not the subtotal"
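As an illustration, the practices above can be combined in a single description. The sketch below reuses the field shape from the Class Fields example; the specific wording is only a suggestion, and the variable names are hypothetical.

```python
import json

# One description that is specific and includes a location hint.
invoice_number_field = {
    "name": "Invoice Number",
    "type": "text",
    "required": True,
    "description": (
        "The unique invoice identifier, usually starting with 'INV-'. "
        "Usually found in the top-right corner of the first page."
    ),
}

# One description that resolves a common ambiguity.
total_amount_field = {
    "name": "Total Amount",
    "type": "number",
    "required": True,
    "description": "The total amount due including tax, not the subtotal.",
}

print(json.dumps([invoice_number_field, total_amount_field], indent=2))
```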
Required vs. Optional Fields
- Required: Field must have a value; missing values are flagged for review
- Optional: Field may be empty; no warning if not found
Use required fields for critical data that must always be present (e.g., invoice number, total amount). Use optional fields for data that may not exist on every document (e.g., PO number, discount).
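The required/optional distinction can be sketched as a simple check over an extraction result. This is an assumption about how the flagging might work, not the product's implementation; `missing_required` and the sample values are hypothetical.

```python
def missing_required(fields: list, extracted: dict) -> list:
    """Return the names of required fields that have no extracted value."""
    missing = []
    for field in fields:
        value = extracted.get(field["name"])
        # Treat an absent key, None, or an empty string as "no value".
        if field.get("required") and (value is None or value == ""):
            missing.append(field["name"])
    return missing


fields = [
    {"name": "Invoice Number", "type": "text", "required": True},
    {"name": "Total Amount", "type": "number", "required": True},
    {"name": "PO Number", "type": "text", "required": False},
]

# Hypothetical extraction result: the total was not found.
extracted = {"Invoice Number": "INV-1042", "Total Amount": None}

print(missing_required(fields, extracted))  # ['Total Amount']
```

The missing optional "PO Number" produces no entry, matching the "no warning if not found" behavior described above.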
Repeating Groups (Tables)
For extracting tabular data with multiple rows, define a repeating group with sub-fields:
- Item Description (text)
- Quantity (integer)
- Unit Price (currency)
- Line Total (currency)
Each row in the table becomes one entry in the group. Describe the table structure clearly so the AI knows what to look for.
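A repeating group for invoice line items might look like the sketch below. The source does not show how sub-fields are declared, so the `sub_fields` key, the sample rows, and the `rows_match_definition` helper are all assumptions made for illustration.

```python
line_items_field = {
    "name": "Line Items",
    "type": "list",
    "required": True,
    "description": "Table of billed items, one row per product or service.",
    # Hypothetical key: the actual schema for sub-fields is not documented here.
    "sub_fields": [
        {"name": "Item Description", "type": "text"},
        {"name": "Quantity", "type": "integer"},
        {"name": "Unit Price", "type": "currency"},
        {"name": "Line Total", "type": "currency"},
    ],
}

# Each table row becomes one entry in the group (sample data).
extracted_rows = [
    {"Item Description": "Widget", "Quantity": 2,
     "Unit Price": 4.50, "Line Total": 9.00},
    {"Item Description": "Gadget", "Quantity": 1,
     "Unit Price": 12.00, "Line Total": 12.00},
]


def rows_match_definition(field: dict, rows: list) -> bool:
    """Check that every row has exactly the sub-fields the group defines."""
    expected = {sub["name"] for sub in field["sub_fields"]}
    return all(set(row) == expected for row in rows)


print(rows_match_definition(line_items_field, extracted_rows))  # True
```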
Extraction Confidence
Each extracted value includes a confidence score:
- High (> 0.8): Value clearly found in document
- Medium (0.5-0.8): Probable match, may need verification
- Low (< 0.5): Uncertain extraction, should be reviewed
Values with low confidence are highlighted in the Transaction Viewer for easy review.
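The three bands can be written as a small mapping function. This is a sketch that assumes scores lie between 0 and 1 and that the boundary values 0.5 and 0.8 fall in the Medium band, as the ranges above imply; `confidence_band` is a hypothetical helper.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the High/Medium/Low bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


for score in (0.95, 0.8, 0.5, 0.49):
    print(score, confidence_band(score))
# 0.95 high
# 0.8 medium
# 0.5 medium
# 0.49 low
```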
Choosing the Right Processing Mode
When creating a project, configure the processing stages to match your document flow.
Split + Classification + Extraction
Use when: Uploaded files may contain multiple documents of different types mixed together.
- Split detects document boundaries within multi-page files
- Classification identifies each document's type
- Extraction pulls data based on the classified type
Example: A batch scanner produces PDFs containing a mix of invoices, receipts, and delivery notes.
Classification + Extraction (No Split)
Use when: Each uploaded file contains exactly one document, but documents may be of different types.
- Each file is treated as one document (no boundary detection)
- Classification determines the document type
- Extraction uses the classified type's field definitions
Example: Users upload individual PDFs; each is one document, but could be an invoice, a contract, or a receipt.
Extraction Only (No Split, No Classification)
Use when: All documents are the same known type and each file contains exactly one document.
- Each file is one document, all assigned to a single class
- Extraction runs using the single configured document class's fields
Example: A process that only ever receives invoices, one per file.
Decision Flowchart
Ask yourself three questions to pick the right mode:
- Can a single file contain multiple documents? → Enable Split
- Do you process more than one type of document? → Enable Classification
- What data do you need to extract? → Extraction is always enabled, since it is the stage that produces the project's output data
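The questions above reduce to two yes/no answers. The sketch below shows that mapping; `processing_stages` and the stage names as strings are hypothetical and do not correspond to a documented API.

```python
def processing_stages(multiple_docs_per_file: bool,
                      multiple_document_types: bool) -> list:
    """Return the stages to enable, in pipeline order."""
    stages = []
    if multiple_docs_per_file:
        stages.append("split")
    if multiple_document_types:
        stages.append("classification")
    # Extraction is always enabled.
    stages.append("extraction")
    return stages


print(processing_stages(True, True))    # ['split', 'classification', 'extraction']
print(processing_stages(False, True))   # ['classification', 'extraction']
print(processing_stages(False, False))  # ['extraction']
```

Only the three combinations printed here are described in this article. Split without Classification is not covered, so treat that combination as unconfirmed.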
Project Settings
Project settings control the behavior of each processing step. Settings follow an inheritance chain:
Project Settings → Workflow Activity Config → Runtime Overrides
Later levels take precedence over earlier ones, so you can set defaults at the project level and override them per workflow or per transaction.
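The inheritance chain behaves like a layered dictionary merge. The sketch below assumes that overrides replace individual settings rather than whole groups; the setting keys and `resolve_settings` are hypothetical names, and the default values are the ones listed under Key Settings.

```python
from typing import Optional


def resolve_settings(project: dict,
                     workflow: Optional[dict] = None,
                     runtime: Optional[dict] = None) -> dict:
    """Merge settings so that later layers override earlier ones."""
    resolved = dict(project)  # copy so the project defaults stay untouched
    for layer in (workflow, runtime):
        if layer:
            resolved.update(layer)
    return resolved


# Hypothetical keys; values are the documented defaults.
project_defaults = {
    "ocr_engine": "Azure AI Vision",
    "classification_model": "gpt-4.1-mini",
    "extraction_model": "gpt-4.1-mini",
    "confidence_threshold": 0.7,
}
workflow_config = {"confidence_threshold": 0.85}
runtime_overrides = {"confidence_threshold": 0.9}

resolved = resolve_settings(project_defaults, workflow_config, runtime_overrides)
print(resolved["confidence_threshold"])  # 0.9
print(resolved["ocr_engine"])            # Azure AI Vision
```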
Key Settings
| Setting | Description | Default |
|---|---|---|
| OCR Engine | Which OCR service to use | Azure AI Vision |
| Classification Model | GPT model for classification | gpt-4.1-mini |
| Extraction Model | GPT model for extraction | gpt-4.1-mini |
| Confidence Threshold | Minimum confidence for auto-approval | 0.7 |