Version: 0.15 (unstable)

Guardian AWS Deployment Architecture

This document explains how the Guardian server is deployed on AWS and how the resources in infra/ fit together. It is a map for engineers who already know the application and want to understand the runtime topology, or for operators who need to know which Terraform file owns which AWS resource.

Companion docs:

TL;DR

The stack is a single Fargate service behind an Application Load Balancer, backed by RDS PostgreSQL, with secrets in AWS Secrets Manager and DNS managed through Route 53 and/or Cloudflare. The same Terraform configuration deploys both the dev and prod profiles: prod adds ECS service autoscaling, RDS Proxy, and RDS storage autoscaling on top of the same base topology.

Topology

Why this shape

  • One service, one image. Guardian is a single Rust binary that exposes HTTP and gRPC from the same process. There is no API gateway, no sidecar. The ALB does layer-7 routing on path so HTTPS clients and gRPC clients share port 443 on the public hostname.
  • State lives in Postgres. Authentication, account state, deltas, proposals, audit logs all persist in RDS. The server itself is stateless; scaling is "add more tasks".
  • Identity lives in Secrets Manager. ACK signing keys (Falcon + ECDSA) used to authenticate Guardian's responses are stored in Secrets Manager in prod and bootstrapped into the container's filesystem keystore at startup. In dev the server auto-generates ephemeral keys.
  • Stage profile toggles capacity, not topology. dev and prod deploy the same set of resource types; prod flips autoscaling on, sizes RDS up, and inserts RDS Proxy between ECS and RDS. There is no separate prod Terraform module.

Request flow

Health checks: the HTTP target group probes GET / (alb.tf:33); the gRPC target group calls /guardian.Guardian/GetPubkey and uses matcher 0, i.e. it expects gRPC status code 0 (OK) (alb.tf:55).
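The listener behaviour described in this document can be modelled in a few lines of Python. This is only an illustrative sketch, not the Terraform in alb.tf: the target labels and function names are hypothetical, and `fnmatchcase` merely approximates ALB path-pattern matching for a simple trailing `*`.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the HTTPS :443 listener: one path rule plus a default.
# Labels such as "grpc:50051" are illustrative, not real resource names.
GRPC_RULE = (10, "/guardian.Guardian/*", "grpc:50051")  # (priority, pattern, target)
DEFAULT_TARGET = "http:3000"


def route_https(path: str) -> str:
    """Return the target group an HTTPS request path would be forwarded to."""
    _priority, pattern, target = GRPC_RULE
    if fnmatchcase(path, pattern):  # case-sensitive match
        return target
    return DEFAULT_TARGET


def http_listener_action(has_acm_cert: bool) -> str:
    """Port 80 forwards when there is no certificate and redirects once one exists."""
    return "redirect-to-https" if has_acm_cert else "forward:http:3000"


if __name__ == "__main__":
    print(route_https("/guardian.Guardian/GetPubkey"))
    print(route_https("/"))
    print(http_listener_action(has_acm_cert=True))
```

Because the match is on the path prefix, any new gRPC service with a different package name would fall through to the HTTP target group unless another rule is added.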

Resource inventory

Mapping AWS resources to the Terraform files that own them:

| AWS resource | Terraform file | Notes |
|---|---|---|
| ECS cluster | ecs.tf:2 | Container Insights enabled, ECS Exec logging to CloudWatch. |
| ECS service guardian server | ecs.tf:152 | Fargate, public IP, two target group attachments when HTTPS is on. |
| ECS task definition | ecs.tf:32 | One container, ports 3000 + 50051, env + secret env from Secrets Manager. |
| ECS autoscaling target + policies | ecs_autoscaling.tf | CPU + memory target-tracking, only created when effective_server_autoscaling_enabled. |
| ALB | alb.tf:2 | Internet-facing, at least two subnets enforced as precondition. |
| HTTP target group (:3000) | alb.tf:20 | Health check GET /. |
| gRPC target group (:50051) | alb.tf:39 | Created only when an ACM cert is present. |
| HTTP listener :80 | alb.tf:61 | Forwards when no cert, redirects to HTTPS when cert is present. |
| HTTPS listener :443 | alb.tf:95 | TLS 1.3-1.2 policy; default action → HTTP target group. |
| gRPC listener rule | alb.tf:110 | Path /guardian.Guardian/* → gRPC target group, priority 10. |
| RDS Postgres instance | rds.tf:20 | Storage encrypted, backups retained per rds_backup_retention_days. |
| RDS subnet group | rds.tf:8 | Requires ≥2 subnets. |
| DATABASE_URL secret | rds.tf:43 | Always created; consumed by the server task. |
| RDS Proxy + credentials secret | rds.tf:48, rds.tf:70 | Prod-only via effective_rds_proxy_enabled. |
| RDS Proxy target / pool config | rds.tf:106, rds.tf:118 | 80% max connections, 50% max idle. |
| Operator public keys secret | operator_secrets.tf | Optional dashboard operator Falcon pubkey list. |
| ACK Falcon/ECDSA secrets (existing) | data.tf | Looked up via data in prod; created out-of-band by aws-deploy.sh bootstrap-ack-keys. |
| EVM allowed chains + RPC URLs secrets | data.tf | Optional; populated by deploy script from config/evm/chains.json. |
| ALB SG | security_groups.tf:2 | Ingress 80/443 from alb_ingress_cidrs. |
| Server SG | security_groups.tf:35 | Ingress 3000/50051 only from ALB SG; egress all. |
| RDS Proxy SG | security_groups.tf:67 | Prod-only; ingress 5432 from server SG. |
| Postgres SG | security_groups.tf:92 | Ingress 5432 from server SG, plus RDS Proxy SG in prod. |
| ECS task execution role | iam.tf:2 | Pulls images, reads DB / EVM secrets at task start. |
| ECS task runtime role | iam.tf:53 | App-level GetSecretValue for ACK + operator secrets; SSM channels for ECS Exec. |
| ACK secrets policy | iam.tf:70 | Gated on local.is_prod; dev never reads ACK secrets. |
| Operator pubkeys policy | iam.tf:93 | Created if user supplies an existing ARN or a managed list. |
| RDS Proxy role | iam.tf:136 | Reads the proxy's credentials secret. |
| CloudWatch log groups | logs.tf | server group and cluster (ECS Exec) group. |
| Route 53 alias | dns.tf:12 | Created when route53_zone_id is set. |
| Cloudflare CNAME | dns.tf:27 | Created when cloudflare_zone_id is set; can be proxied. |
| Variables / locals | variables.tf, data.tf | local.is_prod, effective_* locals derive the stage profile. |

ECR is not managed by Terraform — it is created and pushed to by scripts/aws-deploy.sh before terraform apply.

Stage profiles

Both stages run from the same Terraform; deployment_stage flips a small set of effective_* locals.

Concrete defaults are in infra/README.md and docs/SERVER_AWS_DEPLOY.md.
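As a rough mental model of how one stage variable fans out into the effective_* locals, consider the Python sketch below. It assumes each flag simply follows is_prod; the real locals may also honour explicit overrides, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageProfile:
    """Illustrative stand-in for local.is_prod and the effective_* locals."""
    is_prod: bool
    effective_server_autoscaling_enabled: bool
    effective_rds_proxy_enabled: bool


def derive_profile(deployment_stage: str) -> StageProfile:
    """Derive capacity flags from the stage; the resource types stay the same."""
    if deployment_stage not in ("dev", "prod"):
        raise ValueError(f"unknown deployment_stage: {deployment_stage!r}")
    is_prod = deployment_stage == "prod"
    # Assumption: every flag defaults to is_prod with no per-flag override.
    return StageProfile(
        is_prod=is_prod,
        effective_server_autoscaling_enabled=is_prod,
        effective_rds_proxy_enabled=is_prod,
    )


if __name__ == "__main__":
    print(derive_profile("dev"))
    print(derive_profile("prod"))
```

Keeping the derivation in one place is what lets dev and prod share a single configuration: resources gate on a flag rather than on the stage name.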

Identity and secrets

Five categories of secret participate in a deploy:

  1. DATABASE_URL — written by Terraform from RDS connection details (rds.tf:55). Server task reads it via the execution role at task start; injected as the DATABASE_URL env var (ecs.tf:121).
  2. RDS Proxy credentials (prod) — separate JSON secret consumed by the proxy's IAM role (rds.tf:60, iam.tf:155).
  3. ACK signing keys (prod) — Falcon + ECDSA secret keys for Guardian's own response signing. Created out-of-band by aws-deploy.sh bootstrap-ack-keys, referenced via data blocks, read by the runtime role (iam.tf:70) and imported into the filesystem keystore at process start.
  4. Operator public keys — Falcon public keys allowed to authenticate to the dashboard. Either Terraform-managed (from a variable list) or an existing ARN; either way exposed as GUARDIAN_OPERATOR_PUBLIC_KEYS_SECRET_ID to the task (ecs.tf:100).
  5. EVM allowed chains + RPC URLs — Secrets Manager entries optionally populated from config/evm/chains.json, exposed to the task as GUARDIAN_EVM_ALLOWED_CHAIN_IDS and GUARDIAN_EVM_RPC_URLS (ecs.tf:125).

The IAM split is deliberate:

  • The execution role only reads secrets the ECS agent needs before the container starts (DB URL, EVM secrets surfaced as env).
  • The task role owns secret reads the application performs at runtime (ACK keys, operator pubkeys) and SSM channels for ECS Exec.
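The split between roles can be summarised as a small lookup table. The Python sketch below restates the five secret categories from this section; the short labels and the helper name are illustrative only, and the optional secrets (operator keys, EVM entries) are treated as if they were always configured.

```python
from typing import NamedTuple


class SecretRead(NamedTuple):
    secret: str      # illustrative label, not the real secret name
    reader: str      # IAM role that performs the read
    prod_only: bool  # True when the read only happens in the prod profile


SECRET_READS = [
    SecretRead("DATABASE_URL", "execution_role", prod_only=False),
    SecretRead("evm_chains_and_rpc_urls", "execution_role", prod_only=False),
    SecretRead("ack_signing_keys", "task_role", prod_only=True),
    SecretRead("operator_public_keys", "task_role", prod_only=False),
    SecretRead("rds_proxy_credentials", "rds_proxy_role", prod_only=True),
]


def readable_by(role: str, is_prod: bool) -> list[str]:
    """List the secrets a role reads in the given stage."""
    return [
        r.secret
        for r in SECRET_READS
        if r.reader == role and (is_prod or not r.prod_only)
    ]


if __name__ == "__main__":
    for role in ("execution_role", "task_role", "rds_proxy_role"):
        print(role, readable_by(role, is_prod=True))
```

A table like this is a convenient checklist when auditing the IAM policies: any GetSecretValue grant that does not appear in it deserves a second look.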

Networking

Unless vpc_id / subnet_ids are set, the stack uses the default VPC and picks subnets via the lookups in data.tf. Public-subnet selection for the ALB sorts subnets lexicographically, which can surface a private subnet ahead of a public one if subnet naming differs across AZs; pin subnet_ids explicitly when you hit AZ surprises. RDS Proxy is also not supported in every AZ (e.g. us-east-1e / use1-az3 in us-east-1), so set rds_proxy_subnet_ids explicitly when the selected subnets include such an AZ.
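The subnet-ordering pitfall is easy to reproduce in isolation. The Python sketch below assumes the sort key is the subnet ID and that the first two results are taken; both are assumptions for illustration, and the subnet records are made up.

```python
from typing import Optional, Sequence

# Made-up subnets: the private one happens to sort first by ID.
SUBNETS = [
    {"id": "subnet-0c3", "az": "us-east-1c", "public": True},
    {"id": "subnet-0a1", "az": "us-east-1a", "public": False},
    {"id": "subnet-0b2", "az": "us-east-1b", "public": True},
]


def pick_alb_subnets(
    subnets: Sequence[dict],
    pinned_ids: Optional[Sequence[str]] = None,
    count: int = 2,
) -> list[str]:
    """Return pinned IDs when given, otherwise the first `count` IDs in sorted order."""
    if pinned_ids:
        return list(pinned_ids)
    return sorted(s["id"] for s in subnets)[:count]


if __name__ == "__main__":
    # Default selection includes the private subnet-0a1.
    print(pick_alb_subnets(SUBNETS))
    # Pinning the equivalent of subnet_ids avoids the surprise.
    print(pick_alb_subnets(SUBNETS, pinned_ids=["subnet-0b2", "subnet-0c3"]))
```

The sort has no notion of route tables, which is why an explicit list is the reliable fix.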

Security-group chain:

internet → alb SG (80/443) → server SG (3000, 50051)
server SG ─┬─→ postgres SG (5432)                          (dev path)
           └─→ rds_proxy SG (5432) → postgres SG (5432)    (prod path)

Egress is wide-open from the server SG so the task can pull images, talk to Secrets Manager, talk to Miden RPC, and emit CloudWatch logs.
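The security-group chain can be checked mechanically by writing the ingress rules as (source, destination, port) triples. The Python sketch below is a simplified model based on the descriptions in this document: "internet" stands in for alb_ingress_cidrs, and the group labels are shorthand rather than real resource names.

```python
def ingress_rules(is_prod: bool) -> set[tuple[str, str, int]]:
    """Ingress rules as (source, destination, port); labels are shorthand."""
    rules = {
        ("internet", "alb", 80),
        ("internet", "alb", 443),
        ("alb", "server", 3000),
        ("alb", "server", 50051),
        ("server", "postgres", 5432),
    }
    if is_prod:
        # The proxy SG only exists in prod and sits between server and Postgres.
        rules |= {
            ("server", "rds_proxy", 5432),
            ("rds_proxy", "postgres", 5432),
        }
    return rules


def is_allowed(src: str, dst: str, port: int, is_prod: bool) -> bool:
    """True when a single hop from src to dst on port is permitted."""
    return (src, dst, port) in ingress_rules(is_prod)


if __name__ == "__main__":
    print(is_allowed("internet", "server", 3000, is_prod=True))
    print(is_allowed("alb", "server", 50051, is_prod=True))
    print(is_allowed("rds_proxy", "postgres", 5432, is_prod=False))
```

The useful property to verify is the negative one: nothing outside the ALB SG reaches the server ports, and nothing outside the server or proxy SGs reaches 5432.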

Deploy lifecycle

State is kept locally per stack+stage at infra/terraform.<stack>.<stage>.tfstate. There is no remote backend configured; the deploy script is the source of truth for which state file is in use.
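To make the naming scheme concrete, the Python sketch below builds the state path for a given stack and stage. The helper name and the "guardian" stack value are hypothetical; how the deploy script itself resolves the path is not shown here.

```python
from pathlib import PurePosixPath


def state_file(stack: str, stage: str) -> PurePosixPath:
    """Build infra/terraform.<stack>.<stage>.tfstate for one stack and stage."""
    if stage not in ("dev", "prod"):
        raise ValueError(f"unknown stage: {stage!r}")
    if not stack or "/" in stack:
        raise ValueError(f"invalid stack name: {stack!r}")
    return PurePosixPath("infra") / f"terraform.{stack}.{stage}.tfstate"


if __name__ == "__main__":
    # "guardian" is an example stack name, not necessarily the real one.
    print(state_file("guardian", "dev"))
    print(state_file("guardian", "prod"))
```

Since each stack and stage pair maps to its own file, a dev apply cannot overwrite prod state as long as the right pair is selected.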

Observability surface

Today: CloudWatch container logs (logs.tf), Container Insights metrics (ecs.tf:6), and ECS Exec session logging (ecs.tf:10). There are no Terraform-managed dashboards, alarms, or tracing exporters yet — that remains an open production-hardening gap.

Things that are deliberately not here

  • No remote Terraform backend. State files are local; the deploy script treats them as authoritative. Switch to a remote backend with state locking (for example S3 with DynamoDB locking) before multiple operators apply concurrently.
  • No WAF, no Shield Advanced. The ALB is reachable from alb_ingress_cidrs, default 0.0.0.0/0.
  • No RDS read replica, no automated DR drill. Backups are configured via rds_backup_retention_days (default 7) but there is no rehearsed, automated DR path.
  • No customer-managed KMS keys for Secrets Manager. Secrets use the default AWS managed key (aws/secretsmanager). Rotation is manual.