Proxy on-prem Terraform deployment (AWS)

This guide explains how to deploy Espresso AI's Proxy Service on your AWS infrastructure with Terraform.

You can deploy either:

  1. In a dedicated VPC that Terraform creates.

  2. In an existing VPC that you provide.

Prerequisites

  • Access to an AWS account with IAM permissions for VPC, EKS, IAM roles, EC2/load balancers, Route53 (if used), and Secrets Manager (if used).

  • In the Espresso AI dashboard, go to Proxy Onboarding and:

    • Enter your AWS account ID so we can grant ECR access for the Proxy image.

    • Copy your customer name.

    • Generate an API key for Espresso API authentication.

What this module creates

  • A dedicated VPC (optional). If you skip it, the module uses your existing VPC and subnets.

  • EKS cluster and node group.

  • Karpenter for node autoscaling.

  • AWS Load Balancer Controller.

  • Proxy deployment, service, and HPA in Kubernetes.

  • Optional Route53 record.

  • Optional managed API key flow via AWS Secrets Manager + External Secrets.

Example usage

Dedicated VPC + managed secret + DNS
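
A minimal sketch of this scenario, built from the arguments documented in the reference below. The module source URL, account ID, image URI, certificate ARN, hosted zone ID, CIDR ranges, and hostnames are all hypothetical placeholders. The sketch also assumes that each *_config argument is an object whose omitted attributes fall back to their documented defaults.

```hcl
# Hypothetical root-module variable that carries the Espresso API key.
variable "proxy_api_key_value" {
  type      = string
  sensitive = true
}

module "espresso_proxy" {
  # Placeholder source: substitute the module location provided by Espresso AI.
  source = "git::https://example.com/espresso-ai/proxy-terraform.git//aws?ref=v0.2.0"

  region   = "us-east-1"
  customer = "acme" # customer name copied from Proxy Onboarding

  # Dedicated VPC mode: vpc_config is required.
  create_dedicated_vpc = true
  vpc_config = {
    cidr                 = "10.42.0.0/16"
    public_subnet_cidrs  = ["10.42.0.0/20", "10.42.16.0/20"]
    private_subnet_cidrs = ["10.42.128.0/20", "10.42.144.0/20"]
    availability_zones   = ["us-east-1a", "us-east-1b"] # one AZ per subnet pair
  }

  eks_config = {
    # Required because public endpoint access defaults to true.
    cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"]
  }

  proxy_config = {
    image               = "123456789012.dkr.ecr.us-east-1.amazonaws.com/proxy:latest" # placeholder URI
    proxy_host          = "proxy.example.com"
    api_key_secret_mode = "MANAGED_AWS_SECRETS_MANAGER"
  }

  # Required in managed secret mode.
  proxy_api_key_value = var.proxy_api_key_value

  alb_config = {
    certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/00000000-0000-0000-0000-000000000000"
    ingress_host    = "proxy.example.com"
  }

  dns_config = {
    create_record = true
    zone_id       = "Z0000000000000000000"
    record_name   = "proxy.example.com"
  }
}
```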

Existing VPC + bring-your-own Kubernetes secret
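
A comparable sketch for this scenario, under the same assumptions: the source URL, VPC and subnet IDs, image URI, certificate ARN, and hostname are hypothetical placeholders, and the object shape of each *_config argument is assumed rather than confirmed. In this mode the Kubernetes secret named by api_key_secret_name is expected to already exist and to contain the key ESPRESSO_AI_API_KEY.

```hcl
module "espresso_proxy" {
  # Placeholder source: substitute the module location provided by Espresso AI.
  source = "git::https://example.com/espresso-ai/proxy-terraform.git//aws?ref=v0.2.0"

  region   = "us-east-1"
  customer = "acme"

  # Existing VPC mode: existing_vpc_config is required.
  create_dedicated_vpc = false
  existing_vpc_config = {
    vpc_id             = "vpc-0123456789abcdef0"
    private_subnet_ids = ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"]
    # public_subnet_ids is optional and defaults to [].
  }

  eks_config = {
    cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"]
  }

  proxy_config = {
    image               = "123456789012.dkr.ecr.us-east-1.amazonaws.com/proxy:latest" # placeholder URI
    proxy_host          = "proxy.internal.example.com"
    api_key_secret_mode = "BYO_K8S_SECRET" # the default, shown for clarity
    api_key_secret_name = "espresso-ai"    # the default secret name
  }

  alb_config = {
    certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/00000000-0000-0000-0000-000000000000"
    scheme          = "internal"
  }
}
```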

Argument reference

Top-level arguments

  • region: Required. AWS region for deployment.

  • customer: Required. Customer identifier used in naming and API_URL suffixing.

  • create_dedicated_vpc: Optional. Creates dedicated VPC (true) or uses existing VPC (false). Default: true.

  • vpc_config: Optional/conditional. Required when create_dedicated_vpc = true.

  • existing_vpc_config: Optional/conditional. Required when create_dedicated_vpc = false.

  • eks_config: Optional. EKS cluster and node group settings.

  • karpenter_config: Optional. Karpenter NodePool tuning.

  • proxy_config: Required. Proxy runtime configuration.

  • proxy_api_key_value: Optional/conditional, sensitive. Required when proxy_config.api_key_secret_mode = MANAGED_AWS_SECRETS_MANAGER.

  • alb_config: Optional. ALB ingress configuration.

  • dns_config: Optional. Route53 alias record configuration.

  • autoscaling_config: Optional. Proxy HPA configuration.

  • tags: Optional. Additional AWS tags. Default: {}.

vpc_config

  • vpc_name: Optional. Default: espresso-ai-proxy-vpc.

  • cidr: Required in dedicated VPC mode.

  • public_subnet_cidrs: Required in dedicated VPC mode.

  • private_subnet_cidrs: Required in dedicated VPC mode.

  • availability_zones: Required in dedicated VPC mode and must align with subnet counts.

existing_vpc_config

  • vpc_id: Required in existing VPC mode.

  • private_subnet_ids: Required in existing VPC mode.

  • public_subnet_ids: Optional. Default: [].

eks_config

  • cluster_name: Optional. Default: espresso-ai-proxy.

  • cluster_version: Optional. Default: 1.35.

  • bootstrap_self_managed_addons: Optional. Default: false.

  • cluster_endpoint_public_access: Optional. Default: true.

  • cluster_endpoint_private_access: Optional. Default: true.

  • cluster_endpoint_public_access_cidrs: Required when public endpoint access is enabled.

  • create_cloudwatch_log_group: Optional. Default: false.

  • cloudwatch_log_group_retention_in_days: Optional. Default: 90.

  • instance_types: Optional. Default: ["c8i.2xlarge", "c8i.4xlarge"].

  • node_group_min_size: Optional. Default: 2.

  • node_group_desired_size: Optional. Default: 2.

  • node_group_max_size: Optional. Default: 10.

karpenter_config

  • instance_types: Optional. Default: ["c8i.2xlarge", "c8i.4xlarge"].

  • capacity_types: Optional. Default: ["on-demand"].

  • cpu_limit: Optional. Default: 64.

  • memory_limit: Optional. Default: 256Gi.

  • node_cap: Optional. Default: 10.

proxy_config

  • image: Required. Proxy container image URI.

  • replicas: Optional. Default: 2.

  • proxy_host: Required. Non-empty value injected as PROXY_HOST.

  • otel_exporter_otlp_endpoint: Optional. Injected as OTEL_EXPORTER_OTLP_ENDPOINT. Default: https://metrics.espressocomputing.com:443.

  • api_key_secret_name: Optional. Kubernetes secret name for API key injection. Default: espresso-ai.

  • Note: the key inside the Kubernetes secret is fixed to ESPRESSO_AI_API_KEY and is not configurable.

  • api_key_secret_mode: Optional. BYO_K8S_SECRET or MANAGED_AWS_SECRETS_MANAGER. Default: BYO_K8S_SECRET.

  • api_key_aws_secret_name: Optional. AWS Secrets Manager secret name used in managed mode. Default: /espresso-ai/proxy/api-key.

  • api_url: Optional. Base URL. Default: https://api.espressocomputing.com:25831.

  • env_vars: Optional. Map of environment variable key/value pairs. Currently supported keys:

    key                | type | definition
    -------------------|------|-----------
    EXCLUDE_QUERY_TEXT | bool | Default: false. Whether to exclude query text on requests to Espresso AI's API. Note: Enabling this will limit supported functionality.
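
As an illustration, env_vars might be set as shown in this fragment. It assumes the map accepts string values (the bool is written as "true"), and the image URI and hostname are placeholders.

```hcl
module "espresso_proxy" {
  # source and the other required arguments are omitted from this fragment.

  proxy_config = {
    image      = "123456789012.dkr.ecr.us-east-1.amazonaws.com/proxy:latest" # placeholder URI
    proxy_host = "proxy.example.com"

    env_vars = {
      # Assumption: values are passed as strings and parsed by the Proxy.
      EXCLUDE_QUERY_TEXT = "true"
    }
  }
}
```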

alb_config

  • enable_ingress: Optional. Enables ALB ingress. Default: true.

  • certificate_arn: Required when ingress is enabled.

  • ingress_host: Optional. Host rule.

  • scheme: Optional. internet-facing or internal. Default: internet-facing.

dns_config

  • create_record: Optional. Creates Route53 alias. Default: false.

  • zone_id: Required when create_record = true.

  • record_name: Optional. Falls back to ingress host if omitted.

autoscaling_config

  • min_replicas: Optional. Default: 2.

  • max_replicas: Optional. Default: 10.

  • target_cpu_utilization: Optional. Default: 70.
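
The scaling arguments above could be overridden together, as in this fragment. The values are illustrative rather than recommendations, and the sketch assumes that attributes left out of each object keep their documented defaults.

```hcl
module "espresso_proxy" {
  # source and the other required arguments are omitted from this fragment.

  # Node-level scaling handled by Karpenter.
  karpenter_config = {
    capacity_types = ["on-demand", "spot"]
    cpu_limit      = 128
    memory_limit   = "512Gi"
    node_cap       = 20
  }

  # Pod-level scaling handled by the Proxy HPA.
  autoscaling_config = {
    min_replicas           = 3
    max_replicas           = 20
    target_cpu_utilization = 60
  }
}
```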

Secret modes

  • BYO_K8S_SECRET (default): The Proxy reads the API key from an existing Kubernetes secret (api_key_secret_name) under the fixed key ESPRESSO_AI_API_KEY.

  • MANAGED_AWS_SECRETS_MANAGER: The module provisions an AWS Secrets Manager secret, IRSA, and the External Secrets Operator, then syncs the value into the Kubernetes secret. This mode requires proxy_api_key_value.
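
For BYO_K8S_SECRET mode, one way to create the expected secret is with the Terraform kubernetes provider, sketched below. This assumes a kubernetes provider already configured against the cluster, a module instance named espresso_proxy, and that the secret belongs in the namespace exported as proxy_namespace. The variable name espresso_api_key is hypothetical.

```hcl
# Hypothetical variable holding the Espresso API key.
variable "espresso_api_key" {
  type      = string
  sensitive = true
}

resource "kubernetes_secret_v1" "espresso_ai" {
  metadata {
    name      = "espresso-ai" # must match proxy_config.api_key_secret_name
    namespace = module.espresso_proxy.proxy_namespace
  }

  # The provider base64-encodes these values when it writes the secret.
  data = {
    ESPRESSO_AI_API_KEY = var.espresso_api_key
  }
}
```

A secret created this way is also stored in Terraform state, so the state backend should be access-controlled and encrypted.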

Outputs

The module exports:

  • vpc_id

  • public_subnet_ids

  • private_subnet_ids

  • eks_cluster_name

  • eks_cluster_endpoint

  • eks_cluster_security_group_id

  • proxy_namespace

  • proxy_service_name

  • proxy_service_load_balancer_hostname

  • proxy_ingress_load_balancer_hostname

  • proxy_hpa_name

  • proxy_dns_fqdn
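
These outputs can be re-exported from a root module, for example to find the endpoint after an apply. The sketch assumes the module instance is named espresso_proxy. The root-level output names are arbitrary.

```hcl
output "proxy_ingress_hostname" {
  description = "ALB hostname fronting the Proxy ingress."
  value       = module.espresso_proxy.proxy_ingress_load_balancer_hostname
}

output "proxy_dns_fqdn" {
  description = "Route53 record for the Proxy, if one was created."
  value       = module.espresso_proxy.proxy_dns_fqdn
}

output "eks_cluster_name" {
  description = "Name of the EKS cluster running the Proxy."
  value       = module.espresso_proxy.eks_cluster_name
}
```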

How to deploy

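From the directory that contains your module configuration, run terraform init, review the output of terraform plan, and then run terraform apply. Deployment typically takes around 20-30 minutes.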

Best practices

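  • Supply sensitive variables such as proxy_api_key_value through TF_VAR_-prefixed environment variables or a .tfvars file that is excluded from version control, rather than hardcoding them in configuration.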

Version migrations

The v0.1.0 → v0.2.0 change is a path-only refactor: the AWS configuration moved from the repo root into an aws/ subdirectory, so the module source URL now needs the //aws subdirectory suffix. No resource addresses changed, so existing state continues to apply cleanly.
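
The source change might look like the following sketch. The repository URL is a hypothetical placeholder. Only the //aws suffix and the version ref reflect the change described above.

```hcl
# Before (v0.1.0): the module lived at the repository root.
# module "espresso_proxy" {
#   source = "git::https://example.com/espresso-ai/proxy-terraform.git?ref=v0.1.0"
#   ...
# }

# After (v0.2.0): point the source at the aws/ subdirectory.
module "espresso_proxy" {
  source = "git::https://example.com/espresso-ai/proxy-terraform.git//aws?ref=v0.2.0"

  # All other arguments stay exactly as they were.
}
```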

After updating the module source, re-run terraform init. terraform plan should then report zero changes. If it shows any resource being destroyed, recreated, or replaced, stop and investigate before running apply.
