# Proxy on-prem Terraform deployment (AWS)

This guide explains how to deploy Espresso AI's Proxy Service on your AWS infrastructure with Terraform.

You can deploy either:

1. In a dedicated VPC that Terraform creates.
2. In an existing VPC that you provide.

## Prerequisites

* Access to an AWS account with IAM permissions for VPC, EKS, IAM roles, EC2/load balancers, Route53 (if used), and Secrets Manager (if used).
* In the [Espresso AI dashboard](https://dashboard.espressocomputing.com/), go to `Proxy Onboarding` and:
  * Enter your AWS account ID so we can grant ECR access for the Proxy image.
  * Copy your customer name.
  * Copy Espresso AI's AWS account ID. This is needed for the ECR URL.
  * Generate an API key for Espresso API authentication.

## What this module creates

* VPC (optional) or uses your existing VPC/subnets.
* EKS cluster and node group.
* Karpenter for node autoscaling.
* AWS Load Balancer Controller.
* Proxy deployment, service, and HPA in Kubernetes.
* Optional Route53 record.
* Optional managed API key flow via AWS Secrets Manager + External Secrets.

## Example usage

### Dedicated VPC + managed secret + DNS

```hcl
variable "proxy_api_key_value" {
  description = "Managed proxy API key value for Secrets Manager sync."
  type        = string
  sensitive   = true
}

module "proxy_on_prem" {
  source = "github.com/espressocomputing/espresso-ai-proxy-tf//aws?ref=v0.4.0"

  region   = "us-east-1"
  customer = "<Value from Espresso AI dashboard>"

  create_dedicated_vpc = true
  vpc_config = {
    cidr                 = "10.80.0.0/16"
    public_subnet_cidrs  = ["10.80.0.0/20", "10.80.16.0/20"]
    private_subnet_cidrs = ["10.80.32.0/20", "10.80.48.0/20"]
    availability_zones   = ["us-east-1a", "us-east-1b"]
  }

  eks_config = {
    cluster_endpoint_public_access       = true
    cluster_endpoint_public_access_cidrs = ["203.0.113.10/32"]
  }

  proxy_config = {
    repository          = "<Espresso AI's AWS Account ID>.dkr.ecr.us-east-1.amazonaws.com/proxy"
    image               = "0.1-dev-91e316fa12478ad0ae77aa320ff60e6ab63627131a914bb2f5c26ef2579b99b4"
    proxy_host          = "proxy.customer.example.com"
    api_key_secret_mode = "MANAGED_AWS_SECRETS_MANAGER"
  }

  proxy_api_key_value = var.proxy_api_key_value

  alb_config = {
    certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/11111111-2222-3333-4444-555555555555"
    ingress_host    = "proxy.customer.example.com"
  }

  dns_config = {
    create_record = true
    zone_id       = "Z123EXAMPLE456"
    record_name   = "proxy.customer.example.com"
  }
}
```

### Existing VPC + bring-your-own Kubernetes secret

```hcl
module "proxy_on_prem" {
  source = "github.com/espressocomputing/espresso-ai-proxy-tf//aws?ref=v0.4.0"

  region   = "us-east-1"
  customer = "<Value from Espresso AI dashboard>"

  create_dedicated_vpc = false
  existing_vpc_config = {
    vpc_id             = "vpc-0123456789abcdef0"
    private_subnet_ids = ["subnet-01aaaa", "subnet-02bbbb"]
    public_subnet_ids  = ["subnet-03cccc", "subnet-04dddd"]
  }

  eks_config = {
    cluster_endpoint_public_access       = true
    cluster_endpoint_public_access_cidrs = ["203.0.113.10/32"]
  }

  proxy_config = {
    repository          = "<Espresso AI's AWS Account ID>.dkr.ecr.us-east-1.amazonaws.com/proxy"
    image               = "0.1-dev-91e316fa12478ad0ae77aa320ff60e6ab63627131a914bb2f5c26ef2579b99b4"
    proxy_host          = "proxy.customer.example.com"
    api_key_secret_mode = "BYO_K8S_SECRET"
    api_key_secret_name = "espresso-ai"
  }

  alb_config = {
    certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/11111111-2222-3333-4444-555555555555"
    ingress_host    = "proxy.customer.example.com"
  }
}
```

## Argument reference

### Top-level arguments

* `region`: Required. AWS region for deployment.
* `customer`: Required. Customer identifier used in naming and `API_URL` suffixing.
* `create_dedicated_vpc`: Optional. Creates dedicated VPC (`true`) or uses existing VPC (`false`). Default: `true`.
* `vpc_config`: Optional/conditional. Required when `create_dedicated_vpc = true`.
* `existing_vpc_config`: Optional/conditional. Required when `create_dedicated_vpc = false`.
* `eks_config`: Optional. EKS cluster and node group settings.
* `karpenter_config`: Optional. Karpenter NodePool tuning.
* `proxy_config`: Required. Proxy runtime configuration.
* `proxy_api_key_value`: Optional/conditional, sensitive. Required when `proxy_config.api_key_secret_mode = MANAGED_AWS_SECRETS_MANAGER`.
* `alb_config`: Optional. ALB ingress configuration.
* `dns_config`: Optional. Route53 alias record configuration.
* `autoscaling_config`: Optional. Proxy HPA configuration.
* `tags`: Optional. Additional AWS tags. Default: `{}`.

### `vpc_config`

* `vpc_name`: Optional. Default: `espresso-ai-proxy-vpc`.
* `cidr`: Required in dedicated VPC mode.
* `public_subnet_cidrs`: Required in dedicated VPC mode.
* `private_subnet_cidrs`: Required in dedicated VPC mode.
* `availability_zones`: Required in dedicated VPC mode and must align with subnet counts.
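
If you would rather derive the subnet ranges than hard-code them, Terraform's built-in `cidrsubnet` function can split the VPC CIDR. The sketch below reproduces the `/20` ranges from the dedicated VPC example. The local names (`vpc_cidr`, `public_subnet_cidrs`, `private_subnet_cidrs`) are illustrative and not part of the module.

```hcl
locals {
  vpc_cidr = "10.80.0.0/16"

  # Adding 4 bits to a /16 prefix yields /20 blocks; the last argument
  # selects which block (0, 1, 2, ...) to return.
  # Blocks 0 and 1: 10.80.0.0/20 and 10.80.16.0/20
  public_subnet_cidrs = [for i in [0, 1] : cidrsubnet(local.vpc_cidr, 4, i)]

  # Blocks 2 and 3: 10.80.32.0/20 and 10.80.48.0/20
  private_subnet_cidrs = [for i in [2, 3] : cidrsubnet(local.vpc_cidr, 4, i)]
}
```

These locals can then be passed into `vpc_config`, for example `public_subnet_cidrs = local.public_subnet_cidrs`. The sketch keeps one public and one private subnet per availability zone, matching the two zones in the dedicated VPC example.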

### `existing_vpc_config`

* `vpc_id`: Required in existing VPC mode.
* `private_subnet_ids`: Required in existing VPC mode.
* `public_subnet_ids`: Optional. Default: `[]`.
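
Instead of pasting subnet IDs by hand, you could look them up with the AWS provider's `aws_subnets` data source. This is a sketch only: it assumes your private subnets carry a `Tier = "private"` tag, which is a hypothetical tagging scheme and not something the module requires.

```hcl
# Hypothetical lookup: adjust the tag filter to match your own subnet tags.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = ["vpc-0123456789abcdef0"]
  }

  tags = {
    Tier = "private"
  }
}

locals {
  # List of subnet IDs returned by the lookup above.
  private_subnet_ids = data.aws_subnets.private.ids
}
```

You would then set `private_subnet_ids = local.private_subnet_ids` inside `existing_vpc_config`.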

### `eks_config`

* `cluster_name`: Optional. Default: `espresso-ai-proxy`.
* `cluster_version`: Optional. Default: `1.35`.
* `bootstrap_self_managed_addons`: Optional. Default: `false`.
* `cluster_endpoint_public_access`: Optional. Default: `true`.
* `cluster_endpoint_private_access`: Optional. Default: `true`.
* `cluster_endpoint_public_access_cidrs`: Required when public endpoint access is enabled.
* `create_cloudwatch_log_group`: Optional. Default: `false`.
* `cloudwatch_log_group_retention_in_days`: Optional. Default: `90`.
* `instance_types`: Optional. Default: `["c8i.2xlarge", "c8i.4xlarge"]`.
* `node_group_min_size`: Optional. Default: `2`.
* `node_group_desired_size`: Optional. Default: `2`.
* `node_group_max_size`: Optional. Default: `10`.
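
To illustrate how these settings combine, the fragment below sketches a private-only API endpoint with a smaller node group. It uses only the arguments listed above. Whether a private-only endpoint works for you depends on Terraform being able to reach the cluster from inside the VPC, which is an assumption here.

```hcl
eks_config = {
  # Private-only Kubernetes API endpoint. Assumption: Terraform runs from a
  # network that can reach the VPC (VPN, peering, or an in-VPC runner).
  cluster_endpoint_public_access  = false
  cluster_endpoint_private_access = true

  # Smaller baseline node group than the defaults listed above.
  instance_types          = ["c8i.2xlarge"]
  node_group_min_size     = 2
  node_group_desired_size = 2
  node_group_max_size     = 4
}
```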

### `karpenter_config`

* `instance_types`: Optional. Default: `["c8i.2xlarge", "c8i.4xlarge"]`.
* `capacity_types`: Optional. Default: `["on-demand"]`.
* `cpu_limit`: Optional. Default: `64`.
* `memory_limit`: Optional. Default: `256Gi`.
* `node_cap`: Optional. Default: `10`.
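
As a sketch of how these limits might be raised for a larger deployment, the fragment below doubles the default CPU, memory, and node caps. The `"spot"` capacity type is an assumption based on upstream Karpenter's capacity types; confirm that the module accepts it before relying on it.

```hcl
karpenter_config = {
  instance_types = ["c8i.2xlarge", "c8i.4xlarge"]

  # Assumption: "spot" is accepted here, as it is in upstream Karpenter.
  capacity_types = ["on-demand", "spot"]

  # Twice the documented defaults (64 CPUs, 256Gi, 10 nodes).
  cpu_limit    = 128
  memory_limit = "512Gi"
  node_cap     = 20
}
```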

### `proxy_config`

* `image`: Required. Proxy container image in Espresso AI's ECR. The examples above pass the image tag here and the ECR repository URL in `repository`.
* `replicas`: Optional. Default: `2`.
* `proxy_host`: Required. Non-empty value injected as `PROXY_HOST`.
* `otel_collector`: Optional. OTEL Collector sidecar configuration. See [`otel_collector`](#otel_collector) below.
* `api_key_secret_name`: Optional. Kubernetes secret name for API key injection. Default: `espresso-ai`.
* API key secret key name is fixed to `ESPRESSO_AI_API_KEY` and is not configurable.
* `api_key_secret_mode`: Optional. `BYO_K8S_SECRET` or `MANAGED_AWS_SECRETS_MANAGER`. Default: `BYO_K8S_SECRET`.
* `api_key_aws_secret_name`: Optional. AWS Secrets Manager secret name used in managed mode. Default: `/espresso-ai/proxy/api-key`.
* `api_url`: Optional. Base URL. Default: `https://api.espressocomputing.com:25831`.
* `env_vars`: Optional. Map of environment variable key/value pairs. Currently supported keys:

  | key                  | type   | definition                                                                                                                                  |
  | -------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------- |
  | `EXCLUDE_QUERY_TEXT` | `bool` | Default: `false`. Whether to exclude query text on requests to Espresso AI's API. *Note: Enabling this will limit supported functionality.* |
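
For illustration, the fragment below sets `EXCLUDE_QUERY_TEXT` through `env_vars`. The value is written as a string on the assumption that `env_vars` is a map of strings; if the module declares a different type, adjust accordingly.

```hcl
proxy_config = {
  repository = "<Espresso AI's AWS Account ID>.dkr.ecr.us-east-1.amazonaws.com/proxy"
  image      = "0.1-dev-..."
  proxy_host = "proxy.customer.example.com"

  env_vars = {
    # Assumption: values are strings. Enabling this limits supported functionality.
    EXCLUDE_QUERY_TEXT = "true"
  }
}
```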

### `otel_collector`

Nested object under `proxy_config`. When `enabled = true`, the module deploys an OpenTelemetry Collector sidecar in the proxy pod and renders a ConfigMap with its pipeline. The proxy is automatically pointed at `http://localhost:4318`; the sidecar forwards to your own OTLP backend.

* `enabled`: Optional. Default: `true`. Set to `false` to disable the sidecar; the proxy will then emit OTLP directly to `otel_exporter_otlp_endpoint`.
* `image`: Optional. Full image reference for the collector container. Default: `otel/opentelemetry-collector-contrib:0.152.0`.
* `customer_endpoint`: Optional. OTLP endpoint for the customer's own observability backend. Leave empty (default) to disable the customer exporter; the Espresso pipeline still runs.
* `customer_protocol`: Optional. `grpc` (renders the `otlp/customer` exporter) or `http` (renders `otlphttp/customer`). Default: `grpc`.
* `customer_signals`: Optional. Signals to mirror to the customer exporter. Any subset of `traces`, `metrics`, `logs`. Default: all three.
* `customer_auth_secret_name`: Optional. Existing Kubernetes Secret in the proxy namespace whose value is mounted as `CUSTOMER_OTLP_AUTH` and sent as the customer exporter's `Authorization` header. Leave empty for unauthenticated endpoints.
* `customer_auth_secret_key`: Optional. Key within `customer_auth_secret_name`. Default: `authorization`.
* `customer_tls_insecure`: Optional. Disable TLS verification on the customer exporter. Default: `false`.

Example — also mirror traces and metrics (not logs) to the customer's own OTLP backend with bearer-token auth:

```hcl
proxy_config = {
  repository = "<Espresso AI's AWS Account ID>.dkr.ecr.us-east-1.amazonaws.com/proxy"
  image      = "0.1-dev-..."
  proxy_host = "proxy.customer.example.com"

  otel_collector = {
    customer_endpoint         = "https://otlp.observability.customer.example.com:4317"
    customer_protocol         = "grpc"
    customer_signals          = ["traces", "metrics"]
    customer_auth_secret_name = "customer-otlp-auth"
  }
}
```

The `customer-otlp-auth` Secret must exist in the `proxy` namespace and contain an `authorization` key whose value is the full header (e.g. `Bearer eyJ...`).

For the full list of metrics, spans, and resource attributes the proxy emits — useful for building dashboards and alerts against the customer exporter — see [Proxy telemetry reference](/snowflake-optimizer/proxy-onboarding/proxy-telemetry-reference.md).

### `alb_config`

* `enable_ingress`: Optional. Enables ALB ingress. Default: `true`.
* `certificate_arn`: Required when ingress is enabled.
* `ingress_host`: Optional. Host rule.
* `scheme`: Optional. `internet-facing` or `internal`. Default: `internet-facing`.

### `dns_config`

* `create_record`: Optional. Creates Route53 alias. Default: `false`.
* `zone_id`: Required when `create_record = true`.
* `record_name`: Optional. Falls back to ingress host if omitted.
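
The fallback described above can be sketched as follows. The fragment omits `record_name`, so the record name is expected to come from the ingress host; this assumes "ingress host" refers to `alb_config.ingress_host`.

```hcl
alb_config = {
  certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/11111111-2222-3333-4444-555555555555"
  ingress_host    = "proxy.customer.example.com"
}

dns_config = {
  create_record = true
  zone_id       = "Z123EXAMPLE456"
  # record_name is omitted, so it should fall back to the ingress host
  # (assumed to be alb_config.ingress_host).
}
```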

### `autoscaling_config`

* `min_replicas`: Optional. Default: `2`.
* `max_replicas`: Optional. Default: `10`.
* `target_cpu_utilization`: Optional. Default: `70`.
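
The fragment below is a sketch of a more aggressive scaling policy than the defaults. It assumes the module creates a standard Kubernetes HorizontalPodAutoscaler that targets average CPU utilization.

```hcl
autoscaling_config = {
  min_replicas           = 3
  max_replicas           = 20
  target_cpu_utilization = 60 # percent of requested CPU, averaged across pods
}
```

Under that assumption, the standard HPA rule applies: desired replicas = ceil(current replicas × current utilization / target utilization). For example, 4 replicas averaging 90% CPU against a target of 60 would scale to ceil(4 × 90 / 60) = 6 replicas.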

## Secret modes

* `BYO_K8S_SECRET` (default): The Proxy reads the API key from an existing Kubernetes secret (`api_key_secret_name`) using the fixed key `ESPRESSO_AI_API_KEY`.
* `MANAGED_AWS_SECRETS_MANAGER`: The module provisions an AWS Secrets Manager secret, IRSA, and the External Secrets Operator, then syncs the value into the Kubernetes secret.
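
For `BYO_K8S_SECRET` mode, one way to create the secret is with the Kubernetes provider's `kubernetes_secret_v1` resource. This is a sketch under several assumptions: a `kubernetes` provider is already configured against the cluster, the variable name `espresso_api_key` is hypothetical, and the secret belongs in the namespace reported by the module's `proxy_namespace` output.

```hcl
variable "espresso_api_key" {
  description = "API key generated in the Espresso AI dashboard (hypothetical variable name)."
  type        = string
  sensitive   = true
}

resource "kubernetes_secret_v1" "espresso_ai" {
  metadata {
    # Must match proxy_config.api_key_secret_name (default: espresso-ai).
    name = "espresso-ai"

    # Assumption: the Proxy reads the secret from its own namespace.
    namespace = module.proxy_on_prem.proxy_namespace
  }

  data = {
    # The key name is fixed by the module and is not configurable.
    ESPRESSO_AI_API_KEY = var.espresso_api_key
  }
}
```

Because the namespace comes from a module output, Terraform creates the secret only after the module has created the namespace. Depending on how the module orders its resources, you may prefer to create the secret in a separate apply or by other means.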

## Outputs

The module exports:

* `vpc_id`
* `public_subnet_ids`
* `private_subnet_ids`
* `eks_cluster_name`
* `eks_cluster_endpoint`
* `eks_cluster_security_group_id`
* `proxy_namespace`
* `proxy_service_name`
* `proxy_service_load_balancer_hostname`
* `proxy_ingress_load_balancer_hostname`
* `proxy_hpa_name`
* `proxy_dns_fqdn`
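
To surface these values from your root configuration, you can re-export them with `output` blocks. The output names on the left (`proxy_endpoint`, `proxy_cluster_name`) are illustrative choices, not module conventions.

```hcl
output "proxy_endpoint" {
  description = "Hostname of the load balancer in front of the Proxy ingress."
  value       = module.proxy_on_prem.proxy_ingress_load_balancer_hostname
}

output "proxy_cluster_name" {
  description = "Name of the EKS cluster running the Proxy."
  value       = module.proxy_on_prem.eks_cluster_name
}
```

After `terraform apply`, `terraform output proxy_endpoint` prints the hostname.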

## How to deploy

Deployment typically takes around 20-30 minutes.

```bash
terraform init
terraform plan
terraform apply
```

## Best practices

* Supply sensitive variables such as `proxy_api_key_value` through `TF_VAR_`-prefixed environment variables or a `.tfvars` file that is excluded from version control.
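
As a sketch of the `.tfvars` approach, the file below sets the sensitive variable from the dedicated VPC example. The file name `secrets.auto.tfvars` is a hypothetical choice; Terraform loads any file ending in `.auto.tfvars` automatically.

```hcl
# secrets.auto.tfvars (hypothetical file name; keep it out of version control)
proxy_api_key_value = "<your API key>"
```

Add the file to `.gitignore` so the key is never committed.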

## Version migrations

The v0.1.0 → v0.2.0 change is a path-only refactor: the AWS configuration moved from the repo root into an `aws/` subdirectory, so the source URL needs `//aws`. No resource addresses changed, so existing state continues to apply cleanly.

```hcl
module "proxy_on_prem" {
  source = "github.com/espressocomputing/espresso-ai-proxy-tf//aws?ref=v0.2.0"
  #                                                       ^^^^^ new
  ...
}
```

```bash
terraform init -upgrade
terraform plan
terraform apply
```

`plan` should report zero changes. If it shows any resource being destroyed, recreated, or replaced, stop and investigate before running `apply`.

