Julio Rodriguez

Migrating Production Workloads to Kubernetes on AWS EKS

Context

At Ubiquo, our production services ran on static EC2 instances with fixed capacity, serving clients across multiple Central American countries. Traffic spikes meant degraded performance or manual intervention — and we were paying for peak-capacity instances that sat idle most of the day.

I led the migration of these workloads to AWS EKS, designing the cluster architecture, autoscaling strategy, networking layer, and deployment pipelines from scratch.

The Problem I Solved

The existing infrastructure had critical limitations:

  • No horizontal scaling: Fixed EC2 instances couldn’t respond to traffic spikes — peak hours caused degraded response times
  • No self-healing: Crashed processes required manual SSH and restart, often during off-hours
  • Resource waste: Instances provisioned for peak capacity ran at ~15-25% utilization during off-peak
  • Slow deployments: Releasing new versions required SSH access and manual restarts across multiple servers
  • No isolation: Multiple services sharing instances caused noisy-neighbor issues

My Approach

Cluster Architecture

I designed the EKS cluster across 3 Availability Zones with:

  • Managed node groups with instance diversity for cost optimization
  • Karpenter for intelligent node provisioning — selecting the right instance type based on pending pod requirements instead of fixed node group sizes
  • Namespace isolation per environment and product, with resource quotas to prevent runaway workloads
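To make the namespace isolation concrete, here is an illustrative sketch of a per-namespace ResourceQuota (the namespace name and limit values are hypothetical examples, not the production figures): it caps the aggregate CPU, memory, and pod count that workloads in the namespace can request, so a single runaway deployment cannot starve its neighbors.

```yaml
# Sketch only — namespace name and limits are hypothetical examples
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: product-a-production
spec:
  hard:
    requests.cpu: "20"       # total CPU all pods may request
    requests.memory: 40Gi    # total memory all pods may request
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"              # hard cap on pod count in the namespace
```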

Networking Layer

I implemented a two-layer networking stack for production-grade traffic management:

Internet → AWS NLB (Layer 4) → Nginx Ingress (Layer 7) → Service A / Service B / … / Service N

Why this architecture:

  • AWS NLB at Layer 4 provides high throughput, low latency, and static IPs for firewall requirements
  • Nginx Ingress Controller handles all Layer 7 routing (host-based, path-based), TLS, and rate limiting
  • This eliminates the need for one ALB per service, centralizing routing configuration in Kubernetes
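A hedged sketch of how this two-layer stack is typically wired: the ingress-nginx controller's Service carries the AWS load balancer annotation that provisions the NLB, and each application then declares an ordinary Ingress for host/path routing. Hostnames, service names, and the rate-limit value below are hypothetical examples.

```yaml
# Sketch: Service annotation that puts an NLB in front of ingress-nginx
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: https
---
# Sketch: per-service routing stays in Kubernetes as a plain Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-a
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"   # L7 rate limiting
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /service-a
            pathType: Prefix
            backend:
              service:
                name: service-a
                port:
                  number: 80
```

Because every route lives in an Ingress object, adding a new service is a manifest change in Git rather than a new ALB in the AWS console.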

Dynamic Autoscaling — The Biggest Win

I implemented a three-layer autoscaling strategy that replaced all static capacity:

SQS Queue Depth      → KEDA      → Queue Processors (0 to 30 pods)
CPU / Memory Metrics → HPA       → API Services (2 to 20 pods)
Pending pods         → Karpenter → Right-sized nodes (provisions optimal EC2)

1. HPA for API services — scales pods based on CPU/memory thresholds:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

2. KEDA for queue processors — scales pods based on SQS queue depth:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/ACCOUNT/queue-name
        queueLength: "5"
        awsRegion: us-east-1

KEDA’s scale-to-zero capability was a game changer: queue processors with no messages consume zero resources, compared to the always-on EC2 instances we had before.

3. Karpenter for nodes — automatically provisions optimal instance types when pods need capacity, and consolidates workloads to terminate underutilized nodes.
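As an illustrative sketch of the Karpenter layer (using the v1 `NodePool` API; instance constraints, limits, and timings below are hypothetical examples, not the production configuration), a NodePool declares which capacity Karpenter may provision and when it may consolidate:

```yaml
# Sketch only — requirements, limits, and consolidation settings are examples
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]   # allow Spot for cost savings
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "200"                            # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # drain and remove underused nodes
```

The `disruption` block is what drives the consolidation behavior described above: Karpenter replaces underutilized nodes with smaller or fewer instances once pods can be repacked elsewhere.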

Zero-Downtime Deployments

All workloads use rolling updates with strict safety guarantees:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

These rolling updates are combined with readiness/liveness probes and PodDisruptionBudgets for critical services.
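A minimal sketch of the PodDisruptionBudget side of this (the name and label selector are hypothetical): it guarantees a floor of available replicas during voluntary disruptions such as node drains triggered by Karpenter consolidation.

```yaml
# Sketch: keep at least one replica available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-service
```

The readiness probe, in turn, is what makes `maxUnavailable: 0` meaningful: the rollout only proceeds once the new pod reports ready, so traffic never lands on a replica that is still starting up.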

Observability

I deployed Prometheus + Grafana for full cluster visibility:

  • Pod resource utilization, HPA/KEDA scaling events, ingress metrics
  • Alerting on pod restart loops, OOMKills, HPA at max replicas, and queue SLA breaches
  • Centralized logging with correlation IDs for distributed tracing
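As a hedged sketch of the alerting layer (this assumes kube-state-metrics and the Prometheus Operator's `PrometheusRule` CRD are installed; the threshold and duration are example values), the "HPA stuck at max replicas" alert might look like:

```yaml
# Sketch — assumes kube-state-metrics metric names and Prometheus Operator CRDs
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: autoscaling-alerts
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: HPAAtMaxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at max replicas for 15m"
```

An HPA sitting at its ceiling for a sustained period usually means the ceiling is too low or the service needs attention, which is why it warrants its own alert rather than being inferred from latency.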

Migration Strategy

I executed the migration in 4 phases to minimize risk:

  1. Internal tools — validated pipelines, autoscaling, and monitoring with non-critical services
  2. Queue processors — moved SQS consumers to KEDA, immediately seeing cost reduction from scale-to-zero
  3. API services — migrated customer-facing APIs with parallel traffic validation before cutover
  4. Critical processes — migrated core platform with dedicated resource quotas and PodDisruptionBudgets

Results

  • Zero downtime during the entire migration — all phases completed with rolling deployments
  • Eliminated idle compute costs — KEDA’s scale-to-zero for async processors removed always-on instances that ran at under 20% utilization
  • Auto-scaling from 2 to 20+ replicas — services now respond to demand in seconds, handling traffic spikes without degradation
  • Deployment time reduced from ~30min (SSH + manual) to ~3min (automated rolling updates via CI pipeline)
  • Self-healing infrastructure — automatic restarts, rescheduling on node failures, and multi-AZ distribution eliminated manual incident response for common failures
  • Right-sized nodes — Karpenter’s intelligent provisioning replaced over-provisioned fixed instances

Key Takeaways

  1. KEDA + HPA complement each other — use HPA for CPU/memory-based APIs and KEDA for queue-driven workers
  2. Karpenter over Cluster Autoscaler — optimal instance selection and workload consolidation provide better cost efficiency
  3. NLB + Nginx Ingress — AWS handles L4 reliability while Nginx handles L7 flexibility
  4. Migrate in phases — start with non-critical workloads to build confidence before touching production APIs
  5. Resource requests are essential — without accurate requests, the scheduler can’t pack pods efficiently
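Takeaway 5 in practice: a sketch of explicit requests and limits on a container (values are hypothetical). Accurate requests are what the scheduler bin-packs against and what Karpenter uses to size nodes, so omitting them undermines every layer above.

```yaml
# Sketch: container-level resources block — values are illustrative
resources:
  requests:
    cpu: 250m        # what the scheduler and Karpenter plan around
    memory: 256Mi
  limits:
    memory: 512Mi    # hard memory cap; exceeding it triggers an OOMKill
    # CPU limit intentionally omitted here to avoid throttling — a common
    # practice, though teams differ on whether to set one
```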

Tools & Technologies

  • AWS EKS — Managed Kubernetes control plane
  • Karpenter — Intelligent node provisioning and consolidation
  • KEDA — Event-driven pod autoscaling (SQS, cron, custom metrics)
  • HPA — Resource-based pod autoscaling (CPU, memory)
  • Nginx Ingress Controller — L7 routing, TLS, rate limiting
  • AWS NLB — L4 load balancing with static IPs
  • Prometheus + Grafana — Metrics collection and dashboards
  • Helm — Templated Kubernetes manifests
  • GitLab CI — Automated build and deployment pipelines
  • AWS SQS and ActiveMQ — Message queuing for async workloads