AI Infrastructure Engineer

Ade
Daramola

I architect the foundational layers that make AI systems reliable at enterprise scale. Not prototypes — production systems running on AWS.

// 17+ years in production infrastructure · Cloud · Kubernetes · LLM systems · GitOps

See My Work · GitHub ↗ · Get in Touch
The Story

Infrastructure
first. Always.

I started in 2007 managing Oracle databases for defense clients — environments where a bad query plan or a failed backup had real, sometimes irreversible consequences. That's not where most cloud engineers start, and it shaped how I think about systems in ways that are hard to unlearn.

From there I spent close to a decade at Viper Technology building and operating enterprise AWS infrastructure for government and commercial workloads. VPCs, IAM, EC2 fleets, RDS clusters at scale. I wasn't reading about these things — I was the person on call when they broke.

At Zolon Tech I moved into senior infrastructure architecture. Multi-account AWS landing zones, EKS clusters, GitOps delivery pipelines, zero-downtime cloud migrations. The work became more complex and the stakes were higher, but the discipline was the same: build it so it doesn't break, and when it does, fix it faster than anyone notices.

The AI infrastructure work I do now isn't a career change — it's the same foundation applied to a harder problem. LLM systems have all the failure modes of distributed systems, plus a whole new class of problems that most engineers haven't seen yet. Production scars help.

17+
Years in Production
3
AI Infra Projects Shipped
4
Industry Certifications
AWS
Primary Cloud Platform

Featured Projects

Built to solve
problems that exist

Project 01
GitOps Sentinel
AIOps · Kubernetes Remediation
The real-world problem: In 2021, a misconfigured BGP announcement at Facebook took down Instagram, WhatsApp, and Facebook itself for six hours. In 2023, a bad Kubernetes config change at a fintech startup cascaded into a 14-hour incident. The pattern is the same every time — a config change hits production, Prometheus fires alerts at 3am, the on-call engineer is asleep, and by the time anyone is online the blast radius has grown. The question isn't whether this will happen to your cluster. It's whether your system can respond before a human has to.

GitOps Sentinel is an autonomous remediation platform for Kubernetes clusters. When Alertmanager fires a webhook, Sentinel doesn't just page on-call — it ingests the signal through an HMAC-validated API Gateway, deduplicates via DynamoDB, bundles Prometheus metrics and k8s events into S3, and routes the incident through a Step Functions multi-agent pipeline: Classifier → Root Cause → Action Planner → Confidence Scorer. Every remediation is a Git commit. Argo CD detects the merged PR and syncs the cluster. The cluster only changes through the same reviewed write path a human engineer would use.
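The ingestion gate is simple enough to sketch. A minimal illustration of the HMAC validation and deduplication steps described above, using only the standard library (function names and label choices are mine, not Sentinel's actual code):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Reject any webhook payload whose HMAC-SHA256 signature doesn't match."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so the check doesn't leak timing
    return hmac.compare_digest(expected, signature_header)

def dedup_key(alert: dict) -> str:
    """Stable fingerprint of an alert, suitable for a DynamoDB
    conditional put: the same firing alert always hashes the same way."""
    ident = (
        alert["labels"]["alertname"],
        alert["labels"].get("namespace", ""),
        alert["labels"].get("pod", ""),
    )
    return hashlib.sha256("|".join(ident).encode()).hexdigest()
```

The point of the fingerprint is that re-fired alerts collapse to one incident instead of one pipeline run per webhook.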

The design comes from a real operational insight: not all incidents are equal, and the action shouldn't be either. Sentinel gates every remediation on a deterministic confidence score with three explicit routes — ≥80 auto-applies the PR, 40–79 opens a PR for engineer review, and <40 escalates to on-call with no automated change. Routine incidents (OOMKilled pods, replica drift, known bad image tags) resolve in seconds. Novel or high-risk scenarios escalate before anything touches the cluster. Five minutes after every remediation, an Outcome Validator queries Prometheus to confirm the fix held — and if it didn't, it opens an automatic revert PR. The cluster is never worse off than it was before Sentinel ran.
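The three-route gate is deterministic by design, which makes it small enough to express in a few lines. A hypothetical sketch of the thresholds described above:

```python
def route_remediation(confidence: int) -> str:
    """Map a deterministic confidence score (0-100) to one of the
    three explicit remediation routes."""
    if confidence >= 80:
        return "auto_apply"   # merge the remediation PR automatically
    if confidence >= 40:
        return "review_pr"    # open a PR and wait for an engineer
    return "escalate"         # page on-call; no automated change
```

Keeping the router this dumb is deliberate: the LLM agents propose, but a plain threshold decides what is allowed to touch the cluster.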

This is the kind of system that pays for itself the first time it silently fixes something at 4am that would have been a two-hour incident.

Python · AWS Lambda · Step Functions · EventBridge · Terraform · Argo CD · Kubernetes / EKS · Amazon Bedrock · API Gateway · DynamoDB · S3 · Prometheus · Grafana · OPA Gatekeeper
Project 02
Stratum
Production RAG · Citation-Grounded Generation
The real-world problem: Most RAG systems shipped to production are demos with auth bolted on. They embed documents, do top-k similarity, dump chunks into a prompt, and call it retrieval. Then a user asks something the system doesn't actually know, and the LLM confidently invents an answer with no source attribution — and the team only learns about it after a customer escalation. Pure dense retrieval misses keyword-exact queries. Pure BM25 misses semantic ones. Without reranking, the wrong chunks dominate the context window. Without citation enforcement at generation time, there is no audit trail. The pattern is a system that works on the demo dataset and fails the moment real documents and real queries hit it.

Stratum is a domain-specific RAG engine designed around the assumption that retrieval quality and answer attribution are the actual problems — not the LLM. Queries flow through a hybrid retriever that runs BM25 sparse search and dense ANN in parallel, fuses results with Reciprocal Rank Fusion (RRF, K=60), expands matched child chunks back to their parent context, and reranks with a cross-encoder (ms-marco-MiniLM) before any chunk reaches the LLM. The fusion is parameter-free by design — robust to miscalibrated retrievers in a way that learned weights aren't, until you have enough labeled data to justify training them.

Generation is where most RAG systems quietly fail. Stratum's generator builds a context block with explicit [src N] markers, calls Claude with a citation-enforcing prompt, and parses and validates every citation marker before returning. If the model produces an answer without grounding it in the retrieved sources, the response is rejected at the generator layer — not surfaced to the user as a "best effort."

The infrastructure is dual-backend by design. Local development runs against in-process Chroma with zero Docker. Production runs against Weaviate 1.27 with HNSW tuning and gRPC, deployed on AWS via Terraform: ALB fronting EC2 instances for the FastAPI service and Streamlit UI, a separate node for Weaviate on a 20GB EBS volume, and S3 for raw document storage. Evaluation runs weekly via DeepEval against a golden dataset using a local Ollama judge — zero API cost on the eval gate.

Python · FastAPI · Weaviate · Chroma · Terraform · Claude API · BM25 + Dense Hybrid Retrieval · Cross-Encoder Reranking · BGE / OpenAI Embeddings · EC2 · ALB · S3 · Streamlit · DeepEval
Project 03
LLM Specialization Platform
Fine-Tuning Infrastructure · Domain Adaptation
The real-world problem: A general-purpose frontier model is a fine starting point and a poor finishing point for any serious vertical. Legal, medical, financial, and engineering teams hit the same wall: GPT-4 or Claude understands the language of the domain but doesn't reliably produce outputs that match the domain's structure, tone, or compliance requirements. The standard answer — fine-tune your own model — sounds simple and isn't. Teams end up writing one-off training scripts in notebooks, lose track of which dataset version produced which checkpoint, can't reproduce results six weeks later, and discover at deployment time that their LoRA adapter was trained at a different precision than the inference runtime expects. The infrastructure problem is real: how do you make domain specialization a repeatable engineering process instead of a research project?

The LLM Specialization Platform is the training and evaluation backbone for turning general-purpose foundation models into domain-specific specialists. The pipeline is built around the HuggingFace stack — transformers, PEFT for parameter-efficient fine-tuning (LoRA / QLoRA), TRL for preference optimization (DPO / RLHF), and Accelerate for multi-GPU orchestration — with bitsandbytes quantization so a 13B-parameter base model can be specialized on a single A100 instead of a fleet.

The infrastructure decisions matter more than the choice of optimizer. Every training run is configuration-driven via Pydantic and OmegaConf — no notebooks, no hardcoded paths, no "it worked on my machine." Datasets are versioned and pulled from S3 via boto3 and s3fs. Experiments are tracked end-to-end through Weights & Biases and MLflow, so six weeks later you can answer the question that always gets asked at deployment time: which dataset version, which hyperparameters, and which base model produced this adapter?

Output quality is enforced at inference time, not just at eval. The platform integrates Outlines and Instructor for constrained decoding, so specialized models produce schema-valid JSON, function calls, or structured reports by construction — not by hoping the model behaves. Evaluation runs through scikit-learn, ROUGE, and JSON-schema validators against held-out golden sets, with results logged to the same experiment tracker as training. The whole thing is containerized, reproducible, and pinned to exact dependency versions — the same discipline I apply to production cloud infrastructure, applied to model training.

Python · PyTorch · HuggingFace Transformers · PEFT · LoRA / QLoRA · TRL · DPO / RLHF · Accelerate · bitsandbytes · Outlines · Instructor · Weights & Biases · MLflow · AWS S3 · boto3 · s3fs · Docker · pytest

Career Arc

From Oracle DBA
to AI infrastructure

Nov 2020 – Present
Lead Cloud Engineer
Zolon Tech Inc.

At Zolon I moved from building infrastructure into designing it. The shift mattered. My mandate wasn't to keep the lights on — it was to define the architecture that other people would build on top of. That meant designing a multi-account AWS landing zone from scratch: the VPC topology, Transit Gateway configuration, IAM role hierarchy, and Service Control Policies that would become the security and network model for production, staging, and development environments across the organization. Get this wrong and you're doing remediation work for years.

I led the containerization strategy — standing up EKS clusters, writing the Helm charts, implementing OPA Gatekeeper policies that enforced security posture at the admission controller level before workloads ever ran. I also inherited several legacy environments and ran zero-downtime cloud migrations using Terraform and CloudFormation, working directly with client stakeholders to sequence cutovers that couldn't afford a maintenance window.

On the delivery side, I standardized how the engineering teams shipped code — CI/CD pipelines via Jenkins and GitHub Actions, deployment patterns documented and enforced, release cycle time reduced across multiple workstreams. And beyond the technical work, I spent a meaningful chunk of this role translating between business requirements and cloud architecture: running design sessions, writing solution patterns, and making the decisions visible to non-technical stakeholders. That kind of work doesn't show up in a Terraform plan, but it's the difference between infrastructure that gets adopted and infrastructure that gets worked around.

Nov 2012 – Nov 2020
AWS Cloud Engineer
Viper Technology Services — Patuxent River, MD

Eight years is a long time to spend inside the same problem set, and I used it. At Viper I designed and operated enterprise AWS infrastructure for large-scale government and commercial clients — the kind of accounts where the blast radius of a mistake is measured in users and dollars, not test cases. VPCs, IAM architecture, EC2 fleets, RDS clusters, S3 at production scale. I wasn't experimenting in a sandbox; I was the person accountable for whether these systems stayed up.

This is where I built the automation foundation that I still use today. Terraform for infrastructure-as-code, Ansible for configuration management, Jenkins for CI/CD, Python for everything else. By the time I left, those practices were proven enough to carry directly into the AI infrastructure work I do now. That's what a decade of production work gives you: patterns that have actually been tested.

2007 – 2012
Oracle DBA / Systems Analyst
CACI International Inc. & Apptis, Inc.

This is where the discipline was formed. Managing Oracle environments for defense and enterprise clients means operating in contexts where data integrity is non-negotiable and downtime is not an acceptable outcome. Performance tuning, backup and recovery, complex analytical queries across relational systems serving real operational workflows. Starting here meant I never had the luxury of treating infrastructure as abstract — it was always connected to something that mattered. That stays with you.

Technical Skills

Tools earned
in production

AI & Agentic Systems
LangGraph · LangChain · RAG pipelines · Multi-agent orchestration · LLM routing · pgvector · Amazon Bedrock · OpenAI API
Cloud (AWS)
EKS · ECS Fargate · Lambda · Step Functions · EventBridge · RDS · CloudFront · API Gateway · IAM · Secrets Manager
Infrastructure & GitOps
Terraform · Kubernetes · Argo CD · Helm · OPA Gatekeeper · Ansible · Docker · Jenkins
Observability
Prometheus · Grafana · Alertmanager · AWS X-Ray · CloudWatch · Structured telemetry
Backend & Data
Python · FastAPI · PostgreSQL · Vector DBs · SSE streaming · REST APIs · SQL · Bash
Platform Engineering
Event-driven architecture · CI/CD · GitHub Actions · MLOps · Security-first IAM · Cost-aware architecture · TDD / pytest

Certifications

Validated
across domains

🟢
NVIDIA-Certified Professional — Agentic AI
NVIDIA · Advanced
🟢
NVIDIA-Certified Professional — Generative AI (LLMs)
NVIDIA · Advanced
🟠
AWS Certified Solutions Architect — Associate
Amazon Web Services
🟣
HashiCorp Certified: Terraform Associate
HashiCorp
Let's Talk

Looking for AI
infrastructure

I'm looking for AI Infrastructure Engineering roles where production reliability, cloud architecture, and LLM systems come together. If you're building something serious — not a prototype, an actual production system — and you need an engineer who has done this at scale, I'd like to hear about it.