Managed in AWS

Work in Progress

AWS deployment is under active development and is offered as an early-access program. The architecture described here is the target design; contact us to discuss timelines and join the program.

Deploy DocAI Fabric entirely inside your AWS account: application, storage, and AI services. You keep full control of your data and infrastructure; we handle the application lifecycle (provisioning, deployments, and updates) through a deployment role you grant to our CI/CD pipeline.

Unlike our Azure offering, no second cloud is involved: documents are processed by Amazon Textract (OCR) and Anthropic Claude on Amazon Bedrock (classification and extraction), all billed to your AWS account.

How It Works

Your AWS Account
├── DocAI Fabric resources
│   ├── ECS Fargate service        ─┐
│   ├── ECR repository              │
│   ├── ElastiCache (Redis)         │
│   ├── Document storage (EFS/S3)   │  Provisioned and managed
│   ├── Application Load Balancer   ├─ by our CI/CD pipeline
│   ├── Secrets Manager             │
│   ├── CloudWatch Logs             │
│   └── VPC + IAM roles            ─┘
└── AWS AI services (serverless, pay-per-use)
    ├── Amazon Bedrock - Anthropic Claude models
    └── Amazon Textract - OCR

You own the infrastructure and data. We deploy and update the application.

Our pipeline authenticates with a short-lived OIDC token against an IAM role you create and control: no long-lived AWS credentials ever leave your account, and every action the pipeline performs is visible in your CloudTrail.

What Gets Deployed

| Resource | Purpose | Sizing |
|---|---|---|
| ECS Fargate service | Application hosting | 0.5 vCPU / 1 GB, autoscaling 1-10 tasks |
| ECR repository | Docker images | N/A |
| ElastiCache (Redis) | Job queue & caching | cache.t4g.small, TLS + AUTH, private subnet |
| Document storage | EFS file system (S3 support in development) | Encrypted at rest, grows with usage |
| Application Load Balancer + ACM | HTTPS ingress & TLS certificate | N/A |
| Secrets Manager | Application secrets | N/A |
| CloudWatch Logs | Centralized logging | 30-day retention |
| VPC | Network isolation | 2 availability zones |
| IAM roles | Task execution & task roles | Least-privilege, created by the pipeline |

AI services require no provisioning. Bedrock and Textract are serverless and billed per use:

| AI service | Purpose | Model / API operation |
|---|---|---|
| Amazon Bedrock | Classification & field extraction | Anthropic Claude Haiku 4.5 |
| Amazon Bedrock | AI Copilot reasoning | Anthropic Claude Sonnet 4.6 |
| Amazon Textract | OCR | DetectDocumentText |

Estimated cost: roughly $80-120 per month for infrastructure, plus pay-per-use AI (Textract about $1.50 per 1,000 pages; Bedrock billed per token). Everything appears on your single AWS bill.
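As a rough illustration of how these figures combine, the sketch below adds the quoted infrastructure range to the Textract per-page rate for a given monthly page count. The function name `estimate_monthly_cost` is hypothetical, the rates are the approximate ones quoted above, and Bedrock per-token charges are left out because they depend on document length and model.

```shell
# Rough monthly estimate in USD from the approximate figures quoted above.
# Assumptions: infrastructure $80-120/month, Textract ~$1.50 per 1,000 pages.
# Bedrock per-token charges are NOT included.
estimate_monthly_cost() {
  local pages="$1"
  awk -v p="$pages" 'BEGIN {
    textract = p / 1000 * 1.50
    printf "textract=%.2f low=%.2f high=%.2f\n", textract, 80 + textract, 120 + textract
  }'
}
```

For example, `estimate_monthly_cost 20000` prints `textract=30.00 low=110.00 high=150.00`.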

Language coverage for OCR

Amazon Textract reads printed text in English, Spanish, French, German, Italian, and Portuguese (handwriting: English). If your documents are in other languages, talk to us: the application can alternatively connect to Azure Document Intelligence (150+ languages) while keeping everything else in AWS.


Prerequisites

Before starting, make sure you have:

  • OIDC subject value from us (we will provide the exact value for Step 3, e.g., repo:docaifabric/docaifabric:environment:customer-<YOUR_ID>)
  • AWS CLI installed (install guide) and authenticated (aws configure or SSO)
  • Administrator access to the target AWS account (you need to create IAM identity providers and roles)

Use a dedicated AWS account

We strongly recommend a dedicated member account in your AWS Organization, created just for DocAI Fabric. A dedicated account contains nothing else, so granting our pipeline administrative access to it is low-risk by design, and it aligns with the AWS best practice of one account per workload. Create one via AWS Organizations → Add an AWS account.

If you must deploy into a shared account, tell us: we will provide a scoped IAM policy and permissions boundary to use in Step 3 instead of AdministratorAccess.


Setup Guide

Step 1: Choose the Account and Region

Create (or pick) the AWS account and choose a region where both Amazon Bedrock (Anthropic Claude) and Amazon Textract are available, for example:

| Region | Location |
|---|---|
| us-east-1 | N. Virginia |
| us-west-2 | Oregon |
| eu-central-1 | Frankfurt |
| eu-west-1 | Ireland |
| ap-southeast-2 | Sydney |

Other regions may work via Bedrock cross-region inference: tell us your preferred region and we will confirm availability during onboarding.
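To see which Anthropic models Bedrock offers in a candidate region, something like the following may help. It is only a sketch: the function name `list_claude_models` is hypothetical, and the listing shows what the region offers, not whether your account has been granted access to those models.

```shell
# Print one Anthropic model ID per line for the given region.
# Assumes the AWS CLI is installed and authenticated.
list_claude_models() {
  local region="$1"
  aws bedrock list-foundation-models \
    --region "$region" \
    --by-provider anthropic \
    --query 'modelSummaries[].modelId' \
    --output text | tr '\t' '\n'
}
```

Usage: `list_claude_models eu-central-1`. An empty result suggests the region needs cross-region inference or a different choice.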

Note your AWS Account ID:

aws sts get-caller-identity --query Account --output tsv

Step 2: Create the GitHub OIDC Identity Provider

Register GitHub Actions as an OIDC identity provider in your account (once per account; you may already have it if you use GitHub Actions yourself):

aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

If the command reports that the provider already exists, that's fine: continue to Step 3. (AWS does not rely on the thumbprint for providers whose certificates chain to a publicly trusted certificate authority, as GitHub's do; the value is included because older AWS CLI versions require the parameter.)
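If you prefer to check before creating anything, a small helper along these lines can tell you whether the GitHub provider is already registered. The function name `github_oidc_provider_exists` is hypothetical; the `aws iam list-open-id-connect-providers` call is the standard one.

```shell
# Exit 0 if the GitHub Actions OIDC provider is already registered
# in the current account, non-zero otherwise.
github_oidc_provider_exists() {
  aws iam list-open-id-connect-providers \
    --query 'OpenIDConnectProviderList[].Arn' \
    --output text \
    | tr '\t' '\n' \
    | grep -q 'oidc-provider/token\.actions\.githubusercontent\.com$'
}
```

Usage: `github_oidc_provider_exists && echo "already registered, skip to Step 3"`.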

Step 3: Create the Deployment Role

Create an IAM role our pipeline assumes. The trust policy restricts it to our repository and your specific environment, so no other GitHub workflow can use it.

Save this as trust-policy.json, replacing <ACCOUNT_ID> and the subject value we provide:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "<SUBJECT_VALUE_WE_PROVIDE>"
        }
      }
    }
  ]
}
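To avoid hand-editing the placeholders, the substitution can be scripted. The helper below is a minimal sketch: `render_trust_policy` is a hypothetical name, and the subject argument must be the exact value we provide, not something you construct yourself.

```shell
# Print the trust policy with the account ID and subject filled in.
# Rejects account IDs that are not exactly 12 digits.
render_trust_policy() {
  local account_id="$1" subject="$2"
  case "$account_id" in
    ''|*[!0-9]*) echo "account ID must be 12 digits" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ] || [ -z "$subject" ]; then
    echo "usage: render_trust_policy <12-digit account ID> <subject>" >&2
    return 1
  fi
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "${subject}"
        }
      }
    }
  ]
}
EOF
}
```

Usage: `render_trust_policy "$(aws sts get-caller-identity --query Account --output text)" "<SUBJECT_VALUE_WE_PROVIDE>" > trust-policy.json`.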

Then create the role:

aws iam create-role \
  --role-name docaifabric-deploy \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name docaifabric-deploy \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

Note the role ARN (arn:aws:iam::<ACCOUNT_ID>:role/docaifabric-deploy); you'll share it with us.
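If you would rather read the ARN back from AWS than assemble it by hand, a helper like the one below prints the role ARN followed by its attached managed policies. The function name `show_deploy_role` is hypothetical; the role name defaults to the one used above.

```shell
# Print the deployment role's ARN, then the ARNs of its attached
# managed policies, so both can be checked before sharing the ARN.
show_deploy_role() {
  local role_name="${1:-docaifabric-deploy}"
  aws iam get-role \
    --role-name "$role_name" \
    --query 'Role.Arn' --output text
  aws iam list-attached-role-policies \
    --role-name "$role_name" \
    --query 'AttachedPolicies[].PolicyArn' --output text
}
```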

Shared account? Don't attach AdministratorAccess

In a dedicated account, AdministratorAccess on this role is the simple and safe default: the only resources that will ever exist there are the ones the pipeline creates. In a shared account, ask us for the scoped policy document (ECS, ECR, ElastiCache, EFS/S3, EC2/VPC, ELB, ACM, Secrets Manager, CloudWatch, Bedrock, Textract, and bounded IAM) and a permissions boundary instead.

Step 4: Enable Bedrock Model Access

Anthropic Claude models on Amazon Bedrock require a one-time access activation per account:

  1. Open the Amazon Bedrock console in your chosen region
  2. Go to Model access and request access for Anthropic Claude models (a short use-case form may be required; approval is typically immediate)

Amazon Textract requires no activation. The application itself authenticates to both services through its IAM task role, so there are no AI API keys to create or share.

Step 5: Decide on Networking

By default we create a new VPC (two availability zones, public subnets for the load balancer, private subnets for the application, Redis, and storage). Nothing to do: this is the recommended path.

If you need the application inside an existing VPC (e.g., to reach internal systems or comply with network policy), share instead:

  • VPC ID
  • Two or more private subnet IDs (application, Redis, storage) with outbound access to AWS service endpoints (NAT or VPC endpoints for Bedrock, Textract, and S3)
  • Two or more public subnet IDs (load balancer), or tell us if the application should be internal-only
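Before sending subnet IDs, it may be worth confirming that each set spans at least two availability zones, mirroring the two-zone layout of the default VPC. Treat that requirement as an assumption on our side of this sketch rather than a documented rule; the function name `count_distinct_azs` is hypothetical.

```shell
# Print how many distinct availability zones the given subnets cover.
# Usage: count_distinct_azs subnet-aaa subnet-bbb ...
count_distinct_azs() {
  aws ec2 describe-subnets \
    --subnet-ids "$@" \
    --query 'Subnets[].AvailabilityZone' \
    --output text \
    | tr '\t' '\n' | sort -u | grep -c .
}
```

A result below 2 for either the private or the public set is worth raising with us before deployment.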

Step 6: Choose Your Application Hostname

Decide the hostname users will open, e.g. docai.yourcompany.com. During deployment we'll send you two DNS records to create at your DNS provider:

  1. A validation record for the TLS certificate (AWS Certificate Manager)
  2. A CNAME from your hostname to the load balancer

Alternatively, if you delegate a subdomain to a Route 53 hosted zone in the account, we manage both records automatically.
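Once you have created the CNAME yourself, you can confirm it resolves to the load balancer name we sent. This is a minimal sketch that assumes `dig` is installed; `cname_points_to` is a hypothetical helper name, and the load balancer hostname in the usage line is a placeholder.

```shell
# Exit 0 if <host> has a CNAME record pointing at <expected target>.
# Trailing dots and letter case are ignored in the comparison.
cname_points_to() {
  local host="$1" expected="$2" actual
  actual="$(dig +short CNAME "$host" | head -n 1 | tr 'A-Z' 'a-z')"
  expected="$(printf '%s' "$expected" | tr 'A-Z' 'a-z')"
  [ -n "$actual" ] && [ "${actual%.}" = "${expected%.}" ]
}
```

Usage: `cname_points_to docai.yourcompany.com <load-balancer-dns-name> && echo "CNAME is live"`. DNS changes can take a while to propagate, so a failure right after creating the record is not necessarily an error.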


Information to Share With Us

| Item | Example |
|---|---|
| AWS Account ID | 123456789012 |
| AWS Region | eu-central-1 |
| Deployment role ARN | arn:aws:iam::123456789012:role/docaifabric-deploy |
| Bedrock model access | Confirmation that Anthropic Claude access is enabled (Step 4) |
| Networking | "create a new VPC" (default), or VPC + subnet IDs from Step 5 |
| Application hostname | docai.yourcompany.com |
| DNS preference | You create two records we send, or Route 53 hosted zone in the account |
| Document languages | So we can confirm Textract coverage (see the OCR language note above) |

No API keys or secrets need to be shared: the application reaches Bedrock and Textract through its IAM task role, and all application secrets live in Secrets Manager in your AWS account.


Audit and Observability

Everything runs and logs inside your account, so you control access and retention.

| What you get | Where it lives | What it captures |
|---|---|---|
| CloudWatch Logs | Your account | Application container logs, queryable with Logs Insights |
| CloudWatch metrics & alarms | Your account | CPU/memory, request counts, queue health |
| CloudTrail | Account-level (built in) | Every API action, including each deployment performed by our pipeline, visible to you in real time |
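To see the pipeline's activity for yourself, one option is to look up recent web-identity role assumptions in CloudTrail event history. The sketch below assumes those events are recorded in the region you pass in (STS sign-in events land in the region of the endpoint that served them, which may differ from your deployment region); `recent_pipeline_logins` is a hypothetical name.

```shell
# List the time and identity of recent AssumeRoleWithWebIdentity calls,
# which is how the pipeline obtains its short-lived credentials.
recent_pipeline_logins() {
  local region="$1"
  aws cloudtrail lookup-events \
    --region "$region" \
    --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRoleWithWebIdentity \
    --max-results 20 \
    --query 'Events[].[EventTime,Username]' \
    --output text
}
```

Usage: `recent_pipeline_logins us-east-1`. If nothing appears, try the deployment region as well.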

Default log retention is 30 days. If you have compliance requirements for longer retention, let us know: we can configure extended retention or export to an archive bucket when we deploy.

We recommend granting our team read-only log access via a cross-account IAM role scoped to CloudWatch Logs (the AWS analogue of Azure Lighthouse), so we can diagnose issues quickly without holding credentials or asking you to forward log excerpts. The role is limited to logs and metrics (no access to your documents, secrets, or other resources) and is revocable at any time. If your policy forbids cross-account access, we fall back to log excerpts you share manually, with slower support turnaround.


Next Steps