Julio Rodriguez

Migrating Production Workloads to Kubernetes on AWS EKS

Context

At Ubiquo, our production services ran on static EC2 instances with fixed capacity, serving clients across multiple Central American countries. Traffic spikes meant degraded performance or manual intervention — and we were paying for peak-capacity instances that sat idle most of the day.

I led the migration of these workloads to AWS EKS, designing the cluster architecture, autoscaling strategy, networking layer, and deployment pipelines from scratch.

The Problem I Solved

The existing infrastructure had critical limitations:

  • No horizontal scaling: Fixed EC2 instances couldn’t respond to traffic spikes — peak hours caused degraded response times
  • No self-healing: Crashed processes required manual SSH and restart, often during off-hours
  • Resource waste: Instances provisioned for peak capacity ran at ~15-25% utilization during off-peak
  • Slow deployments: Releasing new versions required SSH access and manual restarts across multiple servers
  • No isolation: Multiple services sharing instances caused noisy-neighbor issues

My Approach

Cluster Architecture

I designed the EKS cluster across 3 Availability Zones with:

  • Managed node groups with instance diversity for cost optimization
  • Karpenter for intelligent node provisioning — selecting the right instance type based on pending pod requirements instead of fixed node group sizes
  • Namespace isolation per environment and product, with resource quotas to prevent runaway workloads
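To make the namespace isolation concrete, here is an illustrative sketch of a per-namespace ResourceQuota (the namespace name and limit values are hypothetical examples, not the production figures): it caps the aggregate CPU, memory, and pod count that workloads in the namespace can request, so a single runaway deployment cannot starve its neighbors.

```yaml
# Sketch only — namespace name and limits are hypothetical examples
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: product-a-production
spec:
  hard:
    requests.cpu: "20"       # total CPU all pods may request
    requests.memory: 40Gi    # total memory all pods may request
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"              # hard cap on pod count in the namespace
```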

Networking Layer

I implemented a two-layer networking stack for production-grade traffic management:

Internet → AWS NLB (Layer 4) → Nginx Ingress (Layer 7) → Service A / Service B / … / Service N

Why this architecture:

  • AWS NLB at Layer 4 provides high throughput, low latency, and static IPs for firewall requirements
  • Nginx Ingress Controller handles all Layer 7 routing (host-based, path-based), TLS, and rate limiting
  • This eliminates the need for one ALB per service, centralizing routing configuration in Kubernetes
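A hedged sketch of how this two-layer stack is typically wired: the ingress-nginx controller's Service carries the AWS load balancer annotation that provisions the NLB, and each application then declares an ordinary Ingress for host/path routing. Hostnames, service names, and the rate-limit value below are hypothetical examples.

```yaml
# Sketch: Service annotation that puts an NLB in front of ingress-nginx
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: https
---
# Sketch: per-service routing stays in Kubernetes as a plain Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-a
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"   # L7 rate limiting
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /service-a
            pathType: Prefix
            backend:
              service:
                name: service-a
                port:
                  number: 80
```

Because every route lives in an Ingress object, adding a new service is a manifest change in Git rather than a new ALB in the AWS console.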

Dynamic Autoscaling — The Biggest Win

I implemented a three-layer autoscaling strategy that replaced all static capacity:

SQS Queue Depth      → KEDA      → Queue Processors (0 to 30 pods)
CPU / Memory Metrics → HPA       → API Services (2 to 20 pods)
Pending pods         → Karpenter → Right-sized nodes (provisions optimal EC2)

1. HPA for API services — scales pods based on CPU/memory thresholds:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

2. KEDA for queue processors — scales pods based on SQS queue depth:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/ACCOUNT/queue-name
        queueLength: "5"
        awsRegion: us-east-1

KEDA’s scale-to-zero capability was a game changer: queue processors with no messages consume zero resources, compared to the always-on EC2 instances we had before.

3. Karpenter for nodes — automatically provisions optimal instance types when pods need capacity, and consolidates workloads to terminate underutilized nodes.
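As an illustrative sketch of the Karpenter layer (using the v1 `NodePool` API; instance constraints, limits, and timings below are hypothetical examples, not the production configuration), a NodePool declares which capacity Karpenter may provision and when it may consolidate:

```yaml
# Sketch only — requirements, limits, and consolidation settings are examples
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]   # allow Spot for cost savings
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "200"                            # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # drain and remove underused nodes
```

The `disruption` block is what drives the consolidation behavior described above: Karpenter replaces underutilized nodes with smaller or fewer instances once pods can be repacked elsewhere.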

Zero-Downtime Deployments

All workloads use rolling updates with strict safety guarantees:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

These rolling updates are combined with readiness/liveness probes and PodDisruptionBudgets for critical services.
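A minimal sketch of the PodDisruptionBudget side of this (the name and label selector are hypothetical): it guarantees a floor of available replicas during voluntary disruptions such as node drains triggered by Karpenter consolidation.

```yaml
# Sketch: keep at least one replica available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-service
```

The readiness probe, in turn, is what makes `maxUnavailable: 0` meaningful: the rollout only proceeds once the new pod reports ready, so traffic never lands on a replica that is still starting up.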

Observability

I deployed Prometheus + Grafana for full cluster visibility:

  • Pod resource utilization, HPA/KEDA scaling events, ingress metrics
  • Alerting on pod restart loops, OOMKills, HPA at max replicas, and queue SLA breaches
  • Centralized logging with correlation IDs for distributed tracing
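As a hedged sketch of the alerting layer (this assumes kube-state-metrics and the Prometheus Operator's `PrometheusRule` CRD are installed; the threshold and duration are example values), the "HPA stuck at max replicas" alert might look like:

```yaml
# Sketch — assumes kube-state-metrics metric names and Prometheus Operator CRDs
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: autoscaling-alerts
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: HPAAtMaxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at max replicas for 15m"
```

An HPA sitting at its ceiling for a sustained period usually means the ceiling is too low or the service needs attention, which is why it warrants its own alert rather than being inferred from latency.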

Migration Strategy

I executed the migration in 4 phases to minimize risk:

  1. Internal tools — validated pipelines, autoscaling, and monitoring with non-critical services
  2. Queue processors — moved SQS consumers to KEDA, immediately seeing cost reduction from scale-to-zero
  3. API services — migrated customer-facing APIs with parallel traffic validation before cutover
  4. Critical processes — migrated core platform with dedicated resource quotas and PodDisruptionBudgets

Results

  • Zero downtime during the entire migration — all phases completed with rolling deployments
  • Eliminated idle compute costs — KEDA’s scale-to-zero for async processors removed always-on instances that ran at under 20% utilization
  • Auto-scaling from 2 to 20+ replicas — services now respond to demand in seconds, handling traffic spikes without degradation
  • Deployment time reduced from ~30min (SSH + manual) to ~3min (automated rolling updates via CI pipeline)
  • Self-healing infrastructure — automatic restarts, rescheduling on node failures, and multi-AZ distribution eliminated manual incident response for common failures
  • Right-sized nodes — Karpenter’s intelligent provisioning replaced over-provisioned fixed instances

Key Takeaways

  1. KEDA + HPA complement each other — use HPA for CPU/memory-based APIs and KEDA for queue-driven workers
  2. Karpenter over Cluster Autoscaler — optimal instance selection and workload consolidation provide better cost efficiency
  3. NLB + Nginx Ingress — AWS handles L4 reliability while Nginx handles L7 flexibility
  4. Migrate in phases — start with non-critical workloads to build confidence before touching production APIs
  5. Resource requests are essential — without accurate requests, the scheduler can’t pack pods efficiently
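Takeaway 5 in practice: a sketch of explicit requests and limits on a container (values are hypothetical). Accurate requests are what the scheduler bin-packs against and what Karpenter uses to size nodes, so omitting them undermines every layer above.

```yaml
# Sketch: container-level resources block — values are illustrative
resources:
  requests:
    cpu: 250m        # what the scheduler and Karpenter plan around
    memory: 256Mi
  limits:
    memory: 512Mi    # hard memory cap; exceeding it triggers an OOMKill
    # CPU limit intentionally omitted here to avoid throttling — a common
    # practice, though teams differ on whether to set one
```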

Tools & Technologies

  • AWS EKS — Managed Kubernetes control plane
  • Karpenter — Intelligent node provisioning and consolidation
  • KEDA — Event-driven pod autoscaling (SQS, cron, custom metrics)
  • HPA — Resource-based pod autoscaling (CPU, memory)
  • Nginx Ingress Controller — L7 routing, TLS, rate limiting
  • AWS NLB — L4 load balancing with static IPs
  • Prometheus + Grafana — Metrics collection and dashboards
  • Helm — Templated Kubernetes manifests
  • GitLab CI — Automated build and deployment pipelines
  • AWS SQS and ActiveMQ — Message queuing for async workloads