Webhooks: Get Results Without Polling
Webhooks let your application be notified the moment a transaction's results are ready, instead of polling GET /transactions/{id} in a loop. When the workflow finishes (or fails), DocAI Fabric POSTs a JSON payload to a URL you configure, including direct download links to the export files.
Use a webhook when:
- You're processing many transactions and don't want to poll each one.
- Latency matters: you want results pushed as soon as they exist.
- Your receiver is reachable from the public internet (webhooks cannot be delivered to private hosts; see Security).
If you can't accept inbound HTTP requests, stick with polling.
How It Works
The webhook is delivered by a Notification activity placed in your project's workflow. Add it once in the workflow designer; every transaction that reaches the node fires a POST to your endpoint.
Import → Convert → OCR → Classify → Extract → Export → Notification ▶ POST https://your-app.example.com/hooks/docai
- New projects created in DocAI Fabric ship with a Notification activity already wired in at the end of the workflow; you only need to fill in the webhook URL to activate it.
- Place the Notification activity after Export so the payload's export list is non-empty.
- A single workflow can have multiple Notification nodes (e.g. one after classification for an early "ready for review" ping, another at the end for "results ready").
Two trigger events
| Event | Fires when | Payload |
|---|---|---|
| reached (success) | Workflow execution reaches the Notification node | Includes the export manifest with download URLs |
| failed | The workflow aborts before reaching the node | Includes an error field; exports is empty |
Both are independently toggleable via the activity's events setting. By default, both are enabled.
Configuring the Notification Activity
Open your project in Project Settings → Workflow, click the Notification node (or drag one in from the activity palette and place it after Export), and fill in the configuration:
| Field | Default | Purpose |
|---|---|---|
| webhook_url | (required) | Endpoint that receives the POST. Must resolve to a public host. |
| events | ["reached", "failed"] | Which triggers fire this webhook. |
| event_name | null | Overrides the event field on the success payload. Defaults to transaction.step_reached. |
| http_method | POST | POST or PUT. |
| custom_headers | {} | Extra HTTP headers (e.g. Authorization: Bearer ... for your receiver). |
| hmac_secret | null | If set, the body is signed with HMAC-SHA256 (see Verifying the Signature). |
| timeout_seconds | 15 | Per-attempt request timeout. |
| retry_max_attempts | 3 | Total delivery attempts (1 = no retry). |
| retry_backoff_seconds | 2.0 | Base delay between retries; doubles each attempt. |
| api_base_url | null | Base URL used to build absolute download links. Defaults to the deployment's public URL. |
| fail_workflow_on_error | false | If true, a delivery failure fails the workflow step (so the transaction shows failed). |
While developing, point webhook_url at a service like webhook.site or requestbin.com to inspect payloads. Switch to your own endpoint once the schema is wired up.
Webhook Payload
The receiver gets a JSON POST with this shape:
Success (transaction.step_reached)
{
  "event": "transaction.step_reached",
  "tenant_id": "your-tenant-id",
  "project_id": "your-project-id",
  "transaction_id": "550e8400-e29b-41d4-a716-446655440000",
  "transaction_name": "Invoice batch - 2026-05-19",
  "occurred_at": "2026-05-19T12:00:00+00:00",
  "exports": {
    "manifest_url": "https://app.docaifabric.com/transactions/550e.../exports",
    "total_files": 2,
    "total_size_bytes": 51200,
    "files": [
      {
        "profile_id": "default-json",
        "profile_name": "Transaction JSON",
        "output_format": "json",
        "filename": "550e..._data.json",
        "size_bytes": 4821,
        "download_url": "https://app.docaifabric.com/transactions/550e.../exports/download/550e..._data.json"
      },
      {
        "profile_id": "invoice-pdf",
        "profile_name": "Invoice PDF",
        "output_format": "pdf",
        "filename": "INV-12345.pdf",
        "size_bytes": 46379,
        "download_url": "https://app.docaifabric.com/transactions/550e.../exports/download/INV-12345.pdf"
      }
    ]
  }
}
Failure (transaction.failed)
{
  "event": "transaction.failed",
  "tenant_id": "your-tenant-id",
  "project_id": "your-project-id",
  "transaction_id": "550e8400-e29b-41d4-a716-446655440000",
  "transaction_name": "Invoice batch - 2026-05-19",
  "occurred_at": "2026-05-19T12:00:03+00:00",
  "exports": {},
  "error": "Extraction step failed: model returned invalid JSON"
}
Field reference
| Field | Description |
|---|---|
| event | transaction.step_reached (or your custom event_name) for success, transaction.failed on failure. Use this, not status, to branch. |
| tenant_id, project_id, transaction_id | Identifiers for the transaction. Use transaction_id to correlate with your original submission (or with the correlation_id you set when creating it). |
| transaction_name | Display name from the UI; may be null if you didn't set one. |
| occurred_at | ISO-8601 UTC timestamp when the notification was built. |
| exports.manifest_url | API URL to fetch the full export manifest (same as GET /transactions/{id}/exports). |
| exports.files[] | One entry per generated export file (JSON, PDF, XLSX, CSV, XML). Each has download_url pointing at the file. |
| error | Present only on transaction.failed. Short human-readable description of what failed. |
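The field reference above can be mirrored in a small typed wrapper so the rest of your receiver does not index into raw dictionaries. This is a minimal sketch: the names ExportFile, WebhookEvent, and parse_event are illustrative and not part of any DocAI Fabric SDK, and only the fields documented above are read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExportFile:
    filename: str
    output_format: str
    download_url: str
    size_bytes: int


@dataclass
class WebhookEvent:
    event: str
    transaction_id: str
    transaction_name: Optional[str]  # may be null in the payload
    error: Optional[str]             # present only on transaction.failed
    files: List[ExportFile] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        # Branch on the event field, not on any status value.
        return self.event == "transaction.failed"


def parse_event(payload: dict) -> WebhookEvent:
    """Convert a decoded webhook body into a WebhookEvent."""
    # On failure the exports object is empty, so "files" may be missing.
    exports = payload.get("exports") or {}
    files = [
        ExportFile(
            filename=f["filename"],
            output_format=f["output_format"],
            download_url=f["download_url"],
            size_bytes=f["size_bytes"],
        )
        for f in exports.get("files", [])
    ]
    return WebhookEvent(
        event=payload["event"],
        transaction_id=payload["transaction_id"],
        transaction_name=payload.get("transaction_name"),
        error=payload.get("error"),
        files=files,
    )
```

If you set a custom event_name, a success payload carries that value instead of transaction.step_reached, which is why the sketch only tests for the failure event.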
Downloading the Results
The download_url and manifest_url values point at API-key-authenticated endpoints; the webhook payload itself does not carry credentials. Your receiver must already hold a valid project API key and send it in the X-API-Key header when downloading.
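import logging

import requests

log = logging.getLogger(__name__)
API_KEY = "your-api-key"  # project API key allowed to download exports

def handle_webhook(payload):
    # Branch on the event field to separate failures from results.
    if payload["event"] == "transaction.failed":
        log.error("Transaction %s failed: %s",
                  payload["transaction_id"], payload.get("error"))
        return
    for file in payload["exports"]["files"]:
        # Only the JSON export is parsed here; other formats are skipped.
        if file["output_format"] != "json":
            continue
        resp = requests.get(
            file["download_url"],
            headers={"X-API-Key": API_KEY},
            timeout=30,  # illustrative value; avoids hanging forever
        )
        resp.raise_for_status()
        data = resp.json()
        for doc in data["documents"]:
            for field_id, field in doc["fields"].items():
                print(f"{field['name']}: {field['value']}")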
The downloaded JSON has the same schema as the polling-based flow (see JSON Results Schema).
Verifying the Signature
When hmac_secret is set on the Notification activity, every delivery includes:
X-DocAI-Signature: sha256=<hex digest>
The digest is computed over the exact raw request body with HMAC-SHA256 using your secret. Verify it before trusting the payload: anyone who knows the URL could otherwise forge requests.
The body is serialized compactly (separators=(",", ":")) with sorted keys, so the same bytes are signed and delivered.
Python (Flask)
import hashlib
import hmac

from flask import Flask, abort, request

app = Flask(__name__)
SECRET = b"your-shared-secret"

@app.post("/hooks/docai")
def receive():
    raw = request.get_data()  # important: raw bytes, before any JSON parsing
    sent = request.headers.get("X-DocAI-Signature", "")
    expected = "sha256=" + hmac.new(SECRET, raw, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not hmac.compare_digest(expected.encode(), sent.encode()):
        abort(401)
    payload = request.get_json()
    # ... handle payload
    return "", 200
Node.js (Express)
const crypto = require("crypto");
const express = require("express");

const app = express();
const SECRET = "your-shared-secret";

// Capture the raw body so the signature can be verified
app.use(express.json({
  verify: (req, _res, buf) => { req.rawBody = buf; }
}));

app.post("/hooks/docai", (req, res) => {
  const sent = Buffer.from(req.header("X-DocAI-Signature") || "");
  const expected = Buffer.from("sha256=" + crypto
    .createHmac("sha256", SECRET)
    .update(req.rawBody || Buffer.alloc(0)) // rawBody is unset when no JSON body was parsed
    .digest("hex"));
  // timingSafeEqual throws if the lengths differ, so check them first
  if (sent.length !== expected.length || !crypto.timingSafeEqual(expected, sent)) {
    return res.status(401).end();
  }
  // ... handle req.body
  res.status(200).end();
});
Re-serializing req.body with your own JSON encoder will produce different bytes and the signature check will fail. Always sign-check against the bytes that arrived on the wire.
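To exercise a receiver locally without waiting for a real delivery, you can sign a test body yourself. The sketch below assumes that json.dumps with sorted keys and compact separators is close enough to the serialization described above for test purposes; byte-level details such as non-ASCII escaping are not documented here, so treat the output as a test fixture, not as a reproduction of DocAI Fabric's bytes. The helper names sign_body and verify are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Tuple


def sign_body(secret: bytes, payload: dict) -> Tuple[bytes, str]:
    """Serialize a test payload and return (raw_body, signature_header_value)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, "sha256=" + digest


def verify(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check an X-DocAI-Signature value against the raw bytes that arrived."""
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Compare as bytes so unexpected non-ASCII header values cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Send the returned body unchanged as the request data, with the second value in the X-DocAI-Signature header. Any change to the bytes, even added whitespace, makes verification fail.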
Delivery Semantics
- DocAI Fabric expects a 2xx response within timeout_seconds. Anything else is treated as a failed attempt.
- 5xx and 429 responses are retried with exponential backoff (retry_backoff_seconds, doubling each attempt) until retry_max_attempts total attempts have been made.
- Other 4xx responses are terminal: they won't change on retry, so delivery gives up immediately.
- A failed delivery is logged on the server but does not affect the transaction unless fail_workflow_on_error is enabled.
- The same transaction may fire the webhook more than once in some failure modes (e.g. a Notification node placed mid-workflow before a step that later fails will fire both reached and failed). Make your receiver idempotent: key off transaction_id + event.
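To estimate how long a receiver outage can last before a delivery is abandoned, the retry rules above can be modelled as below. This is a sketch under one assumption that the text does not spell out: the first retry waits retry_backoff_seconds and each later retry waits twice the previous delay. The function names are illustrative.

```python
from typing import List


def is_retryable(status_code: int) -> bool:
    """5xx and 429 are retried; other 4xx responses are terminal."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(retry_max_attempts: int = 3,
                   retry_backoff_seconds: float = 2.0) -> List[float]:
    """Delays slept between consecutive attempts (one fewer than the attempts)."""
    retries = max(retry_max_attempts - 1, 0)
    return [retry_backoff_seconds * (2 ** i) for i in range(retries)]


def worst_case_seconds(timeout_seconds: float = 15,
                       retry_max_attempts: int = 3,
                       retry_backoff_seconds: float = 2.0) -> float:
    """Upper bound if every attempt runs into the per-attempt timeout."""
    delays = backoff_delays(retry_max_attempts, retry_backoff_seconds)
    return timeout_seconds * retry_max_attempts + sum(delays)
```

With the default settings this model gives two waits (2 s, then 4 s) and a bound of roughly 51 seconds, so an outage longer than that would need the polling fallback to recover the result.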
Recommended receiver pattern
- Verify the HMAC signature.
- Read transaction_id and event.
- Look up whether you've already processed this (transaction_id, event) pair; if so, return 200 immediately.
- Otherwise, download the export files using your API key, persist the results, then return 200.
- If anything fails, return a 5xx so DocAI Fabric retries.
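The steps above can be sketched as one framework-independent function that returns the status code to respond with. This is a minimal illustration: handle_delivery, the seen set, and the process callback are hypothetical names, and the in-memory set stands in for whatever durable store you use (for example a database table with a unique key on the pair).

```python
import hashlib
import hmac
import json


def handle_delivery(raw_body: bytes, signature: str, secret: bytes,
                    seen: set, process) -> int:
    """Return the HTTP status the receiver should send back."""
    # 1. Verify the HMAC signature over the raw bytes.
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return 401

    # 2. Read the idempotency key.
    payload = json.loads(raw_body)
    key = (payload["transaction_id"], payload["event"])

    # 3. Already handled: acknowledge without doing the work again.
    if key in seen:
        return 200

    # 4. Download and persist (delegated to the caller-supplied callback).
    try:
        process(payload)
    except Exception:
        # 5. A 5xx response asks DocAI Fabric to retry the delivery.
        return 500

    # Mark as done only after the work succeeded, so a retry can redo it.
    seen.add(key)
    return 200
```

Recording the key only after process succeeds means a crash mid-download leads to a retry instead of a silently lost result; the cost is that process itself should tolerate being run twice.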
Security
Public host requirement (SSRF guard)
Webhook URLs are validated before every delivery. DocAI Fabric rejects:
- Non-http/https schemes.
- Hosts that resolve to a private, loopback, link-local, multicast, reserved, or unspecified address.
This blocks accidental (or malicious) requests to internal infrastructure and cloud metadata endpoints (e.g. 169.254.169.254). If you're running on a private network, expose a public ingress (a tunnel like ngrok during development, or a load balancer in production).
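Before saving a webhook_url, you can run a rough pre-flight check from your own machine. The sketch below approximates the address categories listed above using Python's ipaddress module; it is not DocAI Fabric's validator, and DNS may answer differently from where you run it than from the service. The function names are illustrative.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """False for private, loopback, link-local, multicast, reserved, or unspecified."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def url_looks_deliverable(url: str) -> bool:
    """Check the scheme, then require every resolved address to be public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False  # the host does not resolve at all
    # sockaddr[0] is the address; drop any IPv6 scope suffix such as "%eth0".
    addresses = {info[4][0].split("%")[0] for info in infos}
    return all(is_public_ip(a) for a in addresses)
```

For example, a URL pointing at 169.254.169.254 or 127.0.0.1 fails this check, which matches the cases the guard is described as blocking.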
Defense-in-depth checklist
- Always set hmac_secret in production: it's the only way to be sure the request actually came from DocAI Fabric.
- Use HTTPS for webhook_url so the payload (and signature) can't be observed on the wire.
- Rotate the secret periodically; update both ends in the same change window.
- Restrict by IP at your edge if your deployment has a fixed egress IP range.
- Don't rely on the URL being secret: treat it as a public endpoint and use the signature as the only authentication.
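One way to make secret rotation less fragile is for the receiver to accept both the old and the new secret for a short overlap period. This is a common pattern and an assumption on the receiver side only; nothing above states that DocAI Fabric signs with more than one secret at a time. The name verify_any is hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def verify_any(secrets: Iterable[bytes], raw_body: bytes, header_value: str) -> bool:
    """Accept the delivery if any currently valid secret matches the signature."""
    sent = header_value.encode("utf-8")
    matched = False
    for secret in secrets:
        digest = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
        expected = ("sha256=" + digest).encode("utf-8")
        # Check every candidate instead of returning early on the first match.
        matched = hmac.compare_digest(expected, sent) or matched
    return matched
```

During the change window you would pass both secrets, for example verify_any([NEW_SECRET, OLD_SECRET], raw, header), and drop the old one once the Notification activity has been updated.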
End-to-End Example
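# 1. Submit a transaction normally (one-step API).
import base64

import requests

# Placeholders: replace with the values for your deployment and project.
BASE_URL = "https://your-docai-host.example.com"
TENANT = "your-tenant-id"
PROJECT = "your-project-id"
API_KEY = "your-api-key"

with open("invoice.pdf", "rb") as f:
    file_data = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{BASE_URL}/tenants/{TENANT}/projects/{PROJECT}/transactions/process",
    headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    json={
        "source_files": [{"filename": "invoice.pdf", "base64_data": file_data}],
        "correlation_id": "order-9981",
    },
)
resp.raise_for_status()  # surface a rejected submission immediately
# That's it: no polling. Your webhook receiver below will be called.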
# 2. Webhook receiver: verifies the signature, downloads the JSON, prints fields.
import hashlib
import hmac

import requests
from flask import Flask, abort, request

app = Flask(__name__)
SECRET = b"your-shared-secret"
API_KEY = "your-api-key"

@app.post("/hooks/docai")
def receive():
    raw = request.get_data()  # raw bytes, before any JSON parsing
    sent = request.headers.get("X-DocAI-Signature", "")
    expected = "sha256=" + hmac.new(SECRET, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sent.encode()):
        abort(401)
    p = request.get_json()
    if p["event"] == "transaction.failed":
        print(f"Transaction {p['transaction_id']} failed: {p.get('error')}")
        return "", 200
    json_files = [f for f in p["exports"]["files"] if f["output_format"] == "json"]
    for f in json_files:
        resp = requests.get(
            f["download_url"], headers={"X-API-Key": API_KEY}
        )
        resp.raise_for_status()  # an unhandled error becomes a 5xx, so delivery is retried
        data = resp.json()
        for doc in data["documents"]:
            print(f"\n{doc.get('document_type', 'Unknown')}:")
            for field_id, field in doc["fields"].items():
                print(f"  {field['name']}: {field['value']}")
    return "", 200
Polling vs. Webhooks: Quick Comparison
| | Polling | Webhooks |
|---|---|---|
| Setup | None, just call GET /transactions/{id} | Add a Notification activity to the workflow |
| Receiver | Anything that can make HTTP calls | Must accept inbound HTTPS on a public host |
| Latency | Up to your poll interval (or ?timeout=30 long-poll) | Immediate when the workflow reaches the node |
| Cost | Wasted requests on transactions still in progress | One delivery per transaction |
| Failure handling | Your code retries the GET | DocAI Fabric retries delivery |
| Best for | Small volumes, scripts, dev/testing | Production, high-volume, async pipelines |
You can use both: webhooks for the happy path, polling as a fallback if your receiver was offline.
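A polling fallback needs a way to decide which transactions to check. One simple approach, sketched here with hypothetical names and an arbitrary ten-minute grace period, is to record each submission and periodically list those that have had no webhook after the grace period has passed.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set


def overdue_transactions(submitted: Dict[str, datetime],
                         received: Set[str],
                         now: datetime,
                         grace: timedelta = timedelta(minutes=10)) -> List[str]:
    """Return IDs submitted more than `grace` ago that have had no webhook yet.

    submitted maps transaction_id to the submission time;
    received holds the transaction_ids for which a webhook has arrived.
    """
    return sorted(
        tid for tid, when in submitted.items()
        if tid not in received and now - when > grace
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    pending = {"tx-1": now - timedelta(minutes=30), "tx-2": now - timedelta(minutes=1)}
    # Only tx-1 is old enough to be worth polling.
    print(overdue_transactions(pending, received=set(), now=now))
```

Each returned ID can then be checked with GET /transactions/{id}, exactly as in the polling flow. Pick a grace period longer than your typical processing time so the fallback does not duplicate work the webhook is about to deliver.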
Next Steps
- Process Documents via API: Submitting documents and the export-file schema.
- Authentication: API key permissions for downloading export files.
- Datasets: Route transactions to Playground, Production, or Evaluation datasets.