Workflows

Under Review

This article is currently under review. Some content may be incomplete or inaccurate.

Workflows define the processing pipeline for a project. They determine which activities run and in what order.

Workflow Structure

A workflow is an ordered list of activities, each with a type and an enabled flag:

{
  "activities": [
    { "type": "import", "enabled": true },
    { "type": "data_transform", "enabled": true },
    { "type": "split", "enabled": true },
    { "type": "classify", "enabled": true },
    { "type": "extract", "enabled": true }
  ]
}
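To make the structure concrete, here is a minimal Python sketch that parses the workflow JSON above into typed records while keeping the activities in their listed order. The `Activity` class and the `parse_workflow` function are hypothetical illustrations, not part of the product's API.

```python
import json
from dataclasses import dataclass

# Hypothetical in-memory model of one workflow entry; not a product API.
@dataclass(frozen=True)
class Activity:
    type: str
    enabled: bool

def parse_workflow(text: str) -> list[Activity]:
    """Parse a workflow JSON document, keeping activities in their listed order."""
    data = json.loads(text)
    return [Activity(type=a["type"], enabled=bool(a["enabled"])) for a in data["activities"]]

WORKFLOW_JSON = """
{
  "activities": [
    { "type": "import", "enabled": true },
    { "type": "data_transform", "enabled": true },
    { "type": "split", "enabled": true },
    { "type": "classify", "enabled": true },
    { "type": "extract", "enabled": true }
  ]
}
"""

activities = parse_workflow(WORKFLOW_JSON)
print([a.type for a in activities])
# ['import', 'data_transform', 'split', 'classify', 'extract']
```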

Activity Types

| Type | Purpose | Input | Output |
|------|---------|-------|--------|
| import | Ingest uploaded files | Raw files | Registered documents |
| data_transform | Convert to processable format | Documents | Page images + OCR |
| split | Detect document boundaries | Pages | Document groupings |
| classify | Categorize documents | Document text | Classification labels |
| extract | Pull structured data | Document text + schema | Field values |
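The activity types table can be read as a pipeline in which each stage consumes what the previous one produces. The sketch below encodes the table as a Python dictionary and checks that a list of activity types follows that order. The names `ACTIVITY_TYPES` and `validate_order` are hypothetical. The rule that activities must appear in table order is also an assumption made for illustration, and the platform may enforce ordering differently or not at all.

```python
# The activity types table as data: type -> (input, output).
# Assumption: table row order is the required run order, since each
# stage consumes what the previous stage produces.
ACTIVITY_TYPES = {
    "import": ("Raw files", "Registered documents"),
    "data_transform": ("Documents", "Page images + OCR"),
    "split": ("Pages", "Document groupings"),
    "classify": ("Document text", "Classification labels"),
    "extract": ("Document text + schema", "Field values"),
}
ORDER = list(ACTIVITY_TYPES)  # dicts preserve insertion order (Python 3.7+)

def validate_order(types: list[str]) -> None:
    """Raise ValueError for unknown, repeated, or out-of-order activity types."""
    last_index = -1
    for t in types:
        if t not in ACTIVITY_TYPES:
            raise ValueError(f"unknown activity type: {t!r}")
        index = ORDER.index(t)
        if index == last_index:
            raise ValueError(f"duplicate activity type: {t!r}")
        if index < last_index:
            # Also catches non-adjacent repeats, reported as out of order.
            raise ValueError(f"{t!r} must come before {ORDER[last_index]!r}")
        last_index = index

validate_order(["import", "data_transform", "extract"])  # passes: a subsequence of the order
try:
    validate_order(["classify", "split"])
except ValueError as err:
    print(err)  # 'split' must come before 'classify'
```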

Configuring Workflows

Enabling/Disabling Activities

Toggle any activity on or off in the project settings. For example, if your documents are already single-page and pre-classified, you can disable the split and classify activities:

{
  "activities": [
    { "type": "import", "enabled": true },
    { "type": "data_transform", "enabled": true },
    { "type": "split", "enabled": false },
    { "type": "classify", "enabled": false },
    { "type": "extract", "enabled": true }
  ]
}
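Only enabled activities run. Here is a minimal sketch of that filtering step, assuming the workflow is available as the JSON shown above. `enabled_activity_types` is a hypothetical helper, not a documented function.

```python
import json

# The example workflow above, with split and classify disabled.
WORKFLOW_JSON = """
{
  "activities": [
    { "type": "import", "enabled": true },
    { "type": "data_transform", "enabled": true },
    { "type": "split", "enabled": false },
    { "type": "classify", "enabled": false },
    { "type": "extract", "enabled": true }
  ]
}
"""

def enabled_activity_types(workflow: dict) -> list[str]:
    """Return the types of enabled activities, in the order they are listed."""
    return [a["type"] for a in workflow["activities"] if a["enabled"]]

workflow = json.loads(WORKFLOW_JSON)
print(enabled_activity_types(workflow))
# ['import', 'data_transform', 'extract']
```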

Activity Configuration

Each activity can be configured independently:

Classification Config

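| Setting | Description | Default |
|---------|-------------|---------|
| model | Azure OpenAI model to use | gpt-4.1-mini |
| temperature | Sampling temperature (0-1); lower values give more deterministic output | 0.0 |
| confidence_threshold | Minimum confidence score | 0.7 |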

Extraction Config

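| Setting | Description | Default |
|---------|-------------|---------|
| model | Azure OpenAI model to use | gpt-4.1-mini |
| temperature | Sampling temperature (0-1); lower values give more deterministic output | 0.0 |
| include_coordinates | Return bounding boxes for extracted values | true |
| confidence_threshold | Minimum confidence score | 0.7 |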
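One way to represent these settings in client code is a pair of Python dataclasses whose defaults mirror the tables. This sketch rests on two assumptions that the tables do not state: `confidence_threshold` is on a 0-1 scale, and out-of-range values should be rejected. The class and helper names are hypothetical.

```python
from dataclasses import dataclass

def _check_unit_range(name: str, value: float) -> None:
    """Reject values outside 0-1 (assumed range for both settings below)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")

@dataclass
class ClassificationConfig:
    model: str = "gpt-4.1-mini"
    temperature: float = 0.0
    confidence_threshold: float = 0.7

    def __post_init__(self) -> None:
        _check_unit_range("temperature", self.temperature)
        _check_unit_range("confidence_threshold", self.confidence_threshold)

@dataclass
class ExtractionConfig:
    model: str = "gpt-4.1-mini"
    temperature: float = 0.0
    include_coordinates: bool = True
    confidence_threshold: float = 0.7

    def __post_init__(self) -> None:
        _check_unit_range("temperature", self.temperature)
        _check_unit_range("confidence_threshold", self.confidence_threshold)

print(ExtractionConfig(confidence_threshold=0.85))
# ExtractionConfig(model='gpt-4.1-mini', temperature=0.0, include_coordinates=True, confidence_threshold=0.85)
```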

Settings Inheritance

Configuration follows an inheritance chain in which each level overrides the values set by the level before it:

Project Defaults → Workflow Activity Config → Runtime Overrides (per transaction)

This lets you set sensible defaults at the project level while allowing per-transaction customization when needed.
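The inheritance chain behaves like a layered dictionary merge in which each later layer overrides the keys it sets. This sketch assumes each layer is a flat mapping of setting names to values. `resolve_settings` is a hypothetical name, and the real engine may merge nested settings differently.

```python
from typing import Optional

def resolve_settings(project_defaults: dict, activity_config: dict,
                     runtime_overrides: Optional[dict] = None) -> dict:
    """Shallow-merge the three layers; a key set in a later layer wins."""
    return {**project_defaults, **activity_config, **(runtime_overrides or {})}

project_defaults = {"model": "gpt-4.1-mini", "temperature": 0.0, "confidence_threshold": 0.7}
extract_config = {"confidence_threshold": 0.8}  # workflow activity config
per_transaction = {"temperature": 0.2}          # runtime override for one transaction

print(resolve_settings(project_defaults, extract_config, per_transaction))
# {'model': 'gpt-4.1-mini', 'temperature': 0.2, 'confidence_threshold': 0.8}
```

The merge is shallow. A nested value in a later layer would therefore replace the earlier value entirely rather than being merged into it.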

Workflow Execution

The workflow engine:

  1. Picks up transactions from the Redis queue
  2. Runs each enabled activity in sequence
  3. Tracks progress and status per activity
  4. Handles retries for transient failures
  5. Reports completion or failure
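The execution steps above can be sketched as a small worker loop. This is an illustration, not the engine's implementation. It uses Python's in-process `queue.Queue` in place of the Redis queue, treats a hypothetical `TransientError` as the only retryable failure, and retries with exponential backoff. The per-activity handler functions are placeholders.

```python
import queue
import time
from typing import Callable

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a timeout."""

Handler = Callable[[str], None]

def run_transaction(txn_id: str, activities: list[dict], handlers: dict[str, Handler],
                    max_retries: int = 3, backoff_s: float = 0.0) -> dict[str, str]:
    """Run enabled activities in sequence and return each one's final status."""
    enabled = [a["type"] for a in activities if a["enabled"]]
    status = dict.fromkeys(enabled, "pending")
    for name in enabled:
        status[name] = "running"
        for attempt in range(max_retries + 1):
            try:
                handlers[name](txn_id)
            except TransientError:
                if attempt == max_retries:
                    status[name] = "failed"
                    return status  # retries exhausted: report failure
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
            except Exception:
                status[name] = "failed"
                return status  # non-transient error: fail without retrying
            else:
                status[name] = "completed"
                break
    return status

def drain_queue(q: queue.Queue, activities: list[dict],
                handlers: dict[str, Handler]) -> dict[str, dict[str, str]]:
    """Process every queued transaction ID (stand-in for a Redis consumer)."""
    results = {}
    while True:
        try:
            txn_id = q.get_nowait()
        except queue.Empty:
            return results
        results[txn_id] = run_transaction(txn_id, activities, handlers)

# Usage: extract fails once with a transient error, then succeeds on retry.
calls = {"extract": 0}

def flaky_extract(txn_id: str) -> None:
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise TransientError("model endpoint timed out")

activities = [
    {"type": "import", "enabled": True},
    {"type": "data_transform", "enabled": True},
    {"type": "split", "enabled": False},
    {"type": "extract", "enabled": True},
]
handlers = {"import": lambda t: None, "data_transform": lambda t: None, "extract": flaky_extract}

q = queue.Queue()
q.put("txn-1")
print(drain_queue(q, activities, handlers))
# {'txn-1': {'import': 'completed', 'data_transform': 'completed', 'extract': 'completed'}}
```

In this sketch, a failure stops the transaction and leaves any later activities `pending`.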

Monitoring

View workflow progress in the transaction detail view. Each activity shows:

  • Status: Pending, running, completed, or failed
  • Duration: How long the activity took
  • Errors: Detailed error messages if something went wrong
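The per-activity information in the transaction detail view maps naturally onto a small record type. The following sketch is hypothetical: `ActivityRun`, `Status`, and the summary format are not part of the product. It derives the duration from start and finish timestamps and appends the error message when one is present.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class ActivityRun:
    activity: str
    status: Status = Status.PENDING
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error: Optional[str] = None

    @property
    def duration(self) -> Optional[timedelta]:
        """Elapsed time, or None if the activity has not both started and finished."""
        if self.started_at is None or self.finished_at is None:
            return None
        return self.finished_at - self.started_at

    def summary(self) -> str:
        """One-line status, duration, and optional error message."""
        dur = f"{self.duration.total_seconds():.1f}s" if self.duration is not None else "-"
        line = f"{self.activity}: {self.status.value} ({dur})"
        if self.error:
            line += f" - {self.error}"
        return line

t0 = datetime(2024, 1, 1, 12, 0, 0)
run = ActivityRun("extract", Status.FAILED, t0, t0 + timedelta(seconds=4.5), "example error")
print(run.summary())
# extract: failed (4.5s) - example error
```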