AI Engineer

Ade
Daramola

I build the AI systems that have to actually work in production — fine-tuned models, retrieval pipelines, autonomous remediation. Not prototypes. Production systems running on AWS.

// 17+ years shipping production systems · LLM specialization · RAG · Agentic AI · AWS · GitOps

See My Work · GitHub ↗ · Get in Touch
The Story

Production
discipline. Always.

I started in 2007 managing Oracle databases for defense clients — environments where a bad query plan or a failed backup had real, sometimes irreversible consequences. That's not where most cloud engineers start, and it shaped how I think about systems in ways that are hard to unlearn.

From there I spent close to a decade at Viper Technology building and operating enterprise AWS infrastructure for government and commercial workloads. VPCs, IAM, EC2 fleets, RDS clusters at scale. I wasn't reading about these things — I was the person on call when they broke.

At Zolon Tech I moved into senior infrastructure architecture. Multi-account AWS landing zones, EKS clusters, GitOps delivery pipelines, zero-downtime cloud migrations. The work became more complex and the stakes were higher, but the discipline was the same: build it so it doesn't break, and when it does, fix it faster than anyone notices.

The AI engineering work I do now isn't a career change — it's the same foundation applied to a harder problem. LLM systems have all the failure modes of distributed systems, plus a whole new class of problems that most engineers haven't seen yet. Production scars help.

17+
Years in Production
4
AI Infra Projects Shipped
4
Industry Certifications
AWS
Primary Cloud Platform

Featured Projects

Built to solve
problems that exist

The Through-Line

Four projects. One stack.

These aren't four side projects — they're four layers of the same production AI system. From the cluster that runs the workloads, to the gateway that routes between models, to the RAG engine that grounds answers in documents, to the fine-tuning pipeline that specializes those models for a domain. Each project is a working layer of the same architecture.

Layer 04 · Model
LLM Specialization Platform
Turn a general-purpose model into a domain specialist. QLoRA SFT + DPO, constrained decoding, GGUF export.
Layer 03 · Application
Stratum
Ground the model's answers in your documents. Hybrid retrieval, cross-encoder reranking, citation-validated generation.
Layer 02 · Gateway
Multi-LLM Platform
Cost-aware routing across Claude, OpenAI, and Bedrock. Two-layer semantic cache. DynamoDB auth and rate limiting.
Layer 01 · Operations
GitOps Sentinel
Keep the production cluster running without paging a human at 3am. Confidence-gated remediation through GitOps.

Specialize the model. Ground its answers in your data. Route intelligently across providers. Keep the cluster running without waking anyone up. That's the full lifecycle of a production AI deployment, and the same engineering discipline runs through all four: Terraform-deployed, version-pinned, container-reproducible, tested before deploy.

Project 01
GitOps Sentinel
AIOps · Kubernetes Remediation
The real-world problem: In 2021, a misconfigured BGP announcement at Facebook took down Instagram, WhatsApp, and Facebook itself for six hours. In 2023, a bad Kubernetes config change at a fintech startup cascaded into a 14-hour incident. The pattern is the same every time — a config change hits production, Prometheus fires alerts at 3am, the on-call engineer is asleep, and by the time anyone is online the blast radius has grown. The question isn't whether this will happen to your cluster. It's whether your system can respond before a human has to.

GitOps Sentinel is an autonomous remediation platform for Kubernetes clusters. When Alertmanager fires a webhook, Sentinel doesn't just page on-call — it ingests the signal through an HMAC-validated API Gateway, deduplicates via DynamoDB, bundles Prometheus metrics and k8s events into S3, and routes the incident through a Step Functions multi-agent pipeline: Classifier → Root Cause → Action Planner → Confidence Scorer. Every remediation is a Git commit. Argo CD detects the merged PR and syncs the cluster. The cluster only changes through the same reviewed write path a human engineer would use.
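A minimal sketch of the webhook validation step described above, assuming a hex-encoded SHA-256 signature header (the header name and encoding are illustrative, not Sentinel's actual contract):

```python
import hashlib
import hmac


def verify_webhook(body: bytes, signature_header: str, secret: bytes) -> bool:
    """Validate an Alertmanager webhook body against its HMAC signature before
    any ingestion or deduplication happens. Constant-time comparison avoids
    leaking signature bytes through timing."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```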

The design comes from a real operational insight: not all incidents are equal, and the action shouldn't be either. Sentinel gates every remediation on a deterministic confidence score with three explicit routes — ≥80 auto-applies the PR, 40–79 opens a PR for engineer review, and <40 escalates to on-call with no automated change. Routine incidents (OOMKilled pods, replica drift, known bad image tags) resolve in seconds. Novel or high-risk scenarios escalate before anything touches the cluster. Five minutes after every remediation, an Outcome Validator queries Prometheus to confirm the fix held — and if it didn't, it opens an automatic revert PR. The cluster is never worse off than it was before Sentinel ran.
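A minimal sketch of the three-route confidence gate, with hypothetical names (`RemediationPlan`, `route_remediation`); in the real pipeline the score comes from the Confidence Scorer agent upstream, and this only illustrates the thresholds described above:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPLY = "auto_apply"        # >= 80: merge the remediation PR automatically
    REVIEW = "open_pr_for_review"    # 40-79: open a PR, wait for an engineer
    ESCALATE = "escalate_to_oncall"  # < 40: page on-call, make no automated change


@dataclass
class RemediationPlan:
    incident_id: str
    confidence: int   # deterministic score from the Confidence Scorer
    patch: str        # Git diff proposed by the Action Planner


def route_remediation(plan: RemediationPlan) -> Route:
    """Gate every remediation on the confidence score."""
    if plan.confidence >= 80:
        return Route.AUTO_APPLY
    if plan.confidence >= 40:
        return Route.REVIEW
    return Route.ESCALATE


if __name__ == "__main__":
    plan = RemediationPlan(incident_id="oomkilled-checkout-7f2", confidence=86, patch="...")
    print(route_remediation(plan))  # Route.AUTO_APPLY
```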

This is the kind of system that pays for itself the first time it silently fixes something at 4am that would have been a two-hour incident.

Python · AWS Lambda · Step Functions · EventBridge · Terraform · Argo CD · Kubernetes / EKS · Amazon Bedrock · API Gateway · DynamoDB · S3 · Prometheus · Grafana · OPA Gatekeeper
Project 02
Multi-LLM Platform
AI Gateway · Cost-Aware Routing
The real-world problem: Every team shipping production AI eventually asks the same question: why am I paying GPT-4o prices for a request that Claude Haiku could handle? And then: what happens when my primary provider goes down at 2am? And then: how do I prove to finance that our LLM spend is justified? The standard answer — call one model, hardcode the endpoint — works until it doesn't. Vendor lock-in, no fallback, no cost visibility, no way to route intelligently across providers. Building a production AI gateway from scratch is the kind of infrastructure work that takes a team weeks to get right, and most teams skip it until the problem is already expensive.

The Multi-LLM Platform is a production-grade AI gateway that routes requests across Anthropic Claude, OpenAI, and AWS Bedrock with a cost-aware complexity scoring engine. Every request is scored on token volume, code detection, and reasoning keywords — simple queries go to low-tier models (Nova Micro, Haiku), complex ones to mid or high-tier (Sonnet, GPT-4o, Opus). A client can pin a specific model via model_preference; the router falls back gracefully if that provider is unhealthy. The result is a platform that routes 60%+ of traffic to cheap models automatically, with a cost target of $0.004–$0.008 per average request.
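A minimal sketch of the complexity scoring idea, assuming illustrative heuristics and thresholds (the production engine's exact weights, keyword list, and tier map are not reproduced here):

```python
import re

# Hypothetical reasoning-keyword list for illustration only.
REASONING_KEYWORDS = {"explain", "compare", "analyze", "derive", "prove", "trade-off"}


def complexity_score(prompt: str) -> float:
    """Score a request on rough token volume, code detection, and reasoning keywords."""
    tokens = len(prompt.split())                               # cheap token estimate
    has_code = bool(re.search(r"```|\bdef\b|\bclass\b|;\s*$", prompt, re.M))
    reasoning_hits = sum(1 for w in REASONING_KEYWORDS if w in prompt.lower())

    score = min(tokens / 500, 1.0) * 0.4      # volume contributes up to 0.4
    score += 0.3 if has_code else 0.0         # code pushes toward bigger models
    score += min(reasoning_hits * 0.15, 0.3)  # reasoning keywords, capped at 0.3
    return score


def pick_tier(prompt: str, model_preference: str | None = None) -> str:
    """Route to a tier unless the client pinned a model via model_preference."""
    if model_preference:
        return model_preference
    score = complexity_score(prompt)
    if score < 0.35:
        return "low"   # e.g. Nova Micro, Haiku
    if score < 0.7:
        return "mid"   # e.g. Sonnet
    return "high"      # e.g. GPT-4o, Opus
```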

The caching layer is two-tier: Redis exact-match (sub-millisecond) sits in front of a pgvector semantic cache (cosine threshold 0.92). Cache hits bypass the LLM entirely — a semantic hit on a prior equivalent question is served from the cache and the prompt embedding gets promoted to Redis. At 35%+ hit rates (the conservative target), the cache pays for itself against LLM API costs within weeks. Auth is DynamoDB-backed per-key with sliding-window rate limiting. A health-checker Lambda runs every five minutes on EventBridge and marks unhealthy providers, so the router never wastes a request on a down endpoint.
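A minimal sketch of the two-tier lookup, assuming a local Redis, a psycopg connection string, and a pgvector table named `cache` (all illustrative); only the threshold and the promote-on-hit behavior come from the description above:

```python
import hashlib

import psycopg  # assumed table: cache(prompt text, embedding vector(1536), response text)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SIM_THRESHOLD = 0.92  # cosine similarity gate


def exact_key(prompt: str) -> str:
    return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()


def lookup(prompt: str, embedding: list[float]) -> str | None:
    """Layer 1: Redis exact match. Layer 2: pgvector semantic match; on a hit,
    promote the answer to Redis so the next identical prompt is sub-millisecond."""
    hit = r.get(exact_key(prompt))
    if hit:
        return hit

    vec = "[" + ",".join(map(str, embedding)) + "]"  # pgvector text literal
    with psycopg.connect("dbname=cache") as conn:
        row = conn.execute(
            """
            SELECT response, 1 - (embedding <=> %s::vector) AS similarity
            FROM cache
            ORDER BY embedding <=> %s::vector
            LIMIT 1
            """,
            (vec, vec),
        ).fetchone()

    if row and row[1] >= SIM_THRESHOLD:
        r.set(exact_key(prompt), row[0], ex=3600)  # promote to the exact-match layer
        return row[0]
    return None
```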

Observability is first-class. Every request emits CloudWatch EMF metrics via Lambda stdout — RequestCount, InputTokens, OutputTokens, LatencyMs, CacheHit, EstimatedCostUSD — dimensioned by provider, model, and tier. X-Ray traces cover auth, cache lookup, routing, provider call, and cache write. Terraform manages all infrastructure: networking, Aurora Serverless v2, ElastiCache Serverless, Lambda, API Gateway v2. OIDC-federated GitHub Actions deploys with no long-lived credentials.
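A minimal sketch of emitting one EMF record from a Lambda handler; the namespace and dimension names here are illustrative, but the metric names follow the list above and the stdout-only mechanism is how EMF works (no PutMetricData call needed):

```python
import json
import time


def emit_emf(provider: str, model: str, tier: str, latency_ms: float,
             input_tokens: int, output_tokens: int,
             cache_hit: bool, cost_usd: float) -> None:
    """Print one CloudWatch EMF record to stdout; the Lambda log group picks it
    up and CloudWatch extracts the metrics automatically."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MultiLLMPlatform",  # illustrative namespace
                "Dimensions": [["Provider", "Model", "Tier"]],
                "Metrics": [
                    {"Name": "RequestCount", "Unit": "Count"},
                    {"Name": "InputTokens", "Unit": "Count"},
                    {"Name": "OutputTokens", "Unit": "Count"},
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                    {"Name": "CacheHit", "Unit": "Count"},
                    {"Name": "EstimatedCostUSD", "Unit": "None"},
                ],
            }],
        },
        "Provider": provider, "Model": model, "Tier": tier,
        "RequestCount": 1,
        "InputTokens": input_tokens, "OutputTokens": output_tokens,
        "LatencyMs": latency_ms, "CacheHit": int(cache_hit),
        "EstimatedCostUSD": cost_usd,
    }
    print(json.dumps(record))
```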

Python · FastAPI · AWS Lambda · API Gateway v2 · Terraform · Redis · pgvector · Amazon Bedrock · Anthropic Claude · OpenAI · DynamoDB · Secrets Manager · CloudWatch EMF · X-Ray · Aurora Serverless v2 · GitHub Actions OIDC
Project 03
Stratum
Production RAG · Citation-Grounded Generation
The real-world problem: Most RAG systems shipped to production are demos with auth bolted on. They embed documents, do top-k similarity, dump chunks into a prompt, and call it retrieval. Then a user asks something the system doesn't actually know, and the LLM confidently invents an answer with no source attribution — and the team only learns about it after a customer escalation. Pure dense retrieval misses keyword-exact queries. Pure BM25 misses semantic ones. Without reranking, the wrong chunks dominate the context window. Without citation enforcement at generation time, there is no audit trail. The pattern is a system that works on the demo dataset and fails the moment real documents and real queries hit it.

Stratum is a domain-specific RAG engine designed around the assumption that retrieval quality and answer attribution are the actual problems — not the LLM. Queries flow through a hybrid retriever that runs BM25 sparse search and dense ANN in parallel, fuses results with Reciprocal Rank Fusion (RRF, K=60), expands matched child chunks back to their parent context, and reranks with a cross-encoder (ms-marco-MiniLM) before any chunk reaches the LLM. The fusion is parameter-free by design — robust to miscalibrated retrievers in a way that learned weights aren't, until you have enough labeled data to justify training them.
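A minimal sketch of the RRF fusion step with K=60, the one parameter the description fixes; the document IDs are made up for illustration:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked chunk-id lists from BM25 and dense retrieval with RRF.
    Each document scores sum(1 / (k + rank)) across the lists it appears in."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: BM25 and dense retrieval disagree; RRF rewards documents both found.
bm25 = ["doc-7", "doc-2", "doc-9"]
dense = ["doc-2", "doc-4", "doc-7"]
print(reciprocal_rank_fusion([bm25, dense]))  # doc-2 and doc-7 rise to the top
```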

Generation is where most RAG systems quietly fail. Stratum's generator builds a context block with explicit [src N] markers, calls Claude with a citation-enforcing prompt, and parses and validates every citation marker before returning. If the model produces an answer without grounding it in the retrieved sources, the response is rejected at the generator layer — not surfaced to the user as a "best effort."
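A minimal sketch of the citation validation step, assuming the [src N] marker format described above; the error type and function name are illustrative:

```python
import re

CITATION = re.compile(r"\[src (\d+)\]")


class UngroundedAnswerError(ValueError):
    """Raised when the model's answer is not grounded in the retrieved sources."""


def validate_citations(answer: str, num_sources: int) -> list[int]:
    """Parse every [src N] marker and reject answers that cite nothing, or that
    cite sources that were never in the context block."""
    cited = [int(m) for m in CITATION.findall(answer)]
    if not cited:
        raise UngroundedAnswerError("answer contains no [src N] citations")
    out_of_range = [n for n in cited if n < 1 or n > num_sources]
    if out_of_range:
        raise UngroundedAnswerError(f"answer cites nonexistent sources: {out_of_range}")
    return sorted(set(cited))
```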

The infrastructure is dual-backend by design. Local development runs against in-process Chroma with zero Docker. Production runs against Weaviate 1.27 with HNSW tuning and gRPC, deployed on AWS via Terraform: ALB fronting EC2 instances for the FastAPI service and Streamlit UI, a separate node for Weaviate on a 20GB EBS volume, and S3 for raw document storage. Evaluation runs weekly via DeepEval against a golden dataset using a local Ollama judge — zero API cost on the eval gate.

Python · FastAPI · Weaviate · Chroma · Terraform · Claude API · BM25 + Dense Hybrid Retrieval · Cross-Encoder Reranking · BGE / OpenAI Embeddings · EC2 · ALB · S3 · Streamlit · DeepEval
Project 04
LLM Specialization Platform
Fine-Tuning Infrastructure · Domain Adaptation
The real-world problem: Most fine-tuning projects treat training as the deliverable. Production ML fails for different reasons: the tokenizer drifts between training and serving, the GGUF you ship scores differently than the adapter you evaluated, the model fires on null inputs it should abstain from, and evaluation metrics pass against a mock that diverges from the real stack. Teams end up with training scripts in notebooks, no reproducibility, and no way to answer at deployment time: which dataset version, which hyperparameters, and which base model produced this adapter? The infrastructure problem is the same one that's bitten every serious ML team: how do you make domain specialization a repeatable engineering process instead of a one-time research project?

The LLM Specialization Platform is an end-to-end pipeline for fine-tuning, evaluating, and exporting specialized language models — built adversarially against the failure modes that sink production ML. The initial task is structured JSON extraction from unstructured text using Qwen2.5-7B-Instruct. The pipeline runs Phase 0 (tokenizer audit + baseline) → Phase 1 (QLoRA SFT) → Phase 2 (DPO from best SFT checkpoint) → Phase 3 (evaluation: raw + constrained decoding + metrics.json) → Phase 4 (adapter → merged BF16 → GGUF Q8_0 + Q4_K_M → re-verify). Swapping tasks requires a new dataset, schema, and config — no code changes.
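A minimal sketch of the Phase 1 QLoRA setup using HuggingFace Transformers and PEFT; the base model comes from the description, but the rank, alpha, dropout, and target modules here are typical defaults, not the platform's actual training config:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)

# Low-rank adapters on the attention projections; illustrative hyperparameters.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only adapter weights train; the 4-bit base stays frozen
```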

The results are real and measured on a frozen 375-example test set after training on 2,998 examples. DPO improved extraction recall by 15 points over SFT (0.650 vs 0.498). Null accuracy — the model's ability to correctly abstain when there's nothing to extract — is perfect across all artifacts including both GGUF quantizations. Constrained decoding recovers schema validity to 71.8% on the DPO model; production deployment uses Outlines. The CI gate emits a versioned metrics.json with pass/fail flags designed to be consumed by downstream GitOps pipelines — including Sentinel.
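A minimal sketch of a downstream consumer of that gate; only the pass/fail-flag idea comes from the description, and the field names in metrics.json here are assumptions:

```python
import json
import sys

# Hypothetical gate names for illustration.
REQUIRED_GATES = ["extraction_recall", "null_accuracy", "schema_validity"]


def check_gate(path: str = "metrics.json") -> None:
    """Fail the pipeline if any required metric did not pass its threshold."""
    with open(path) as f:
        metrics = json.load(f)
    failures = [g for g in REQUIRED_GATES if not metrics.get(g, {}).get("passed", False)]
    if failures:
        print(f"CI gate failed: {failures}")
        sys.exit(1)
    print(f"CI gate passed for run {metrics.get('run_id', 'unknown')}")


if __name__ == "__main__":
    check_gate()
```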

The infrastructure discipline matters as much as the training decisions. Every run is reproducible: manifests capture git commit, lockfile hash, dataset hash, and full hardware fingerprint. Validation gates raise ValueError — they don't warn silently and let bad datasets reach training. Field-level F1 is computed on positive examples only — including null cases inflates it for any over-abstaining model, masking real extraction quality, and this pipeline doesn't do that.
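A minimal sketch of the run manifest idea, capturing the four provenance facts named above; the field names and output path are illustrative, not the platform's exact manifest schema:

```python
import hashlib
import json
import platform
import subprocess
from pathlib import Path


def file_sha256(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(dataset_path: str, lockfile_path: str,
                   out: str = "run_manifest.json") -> dict:
    """Record what produced this run: git commit, lockfile hash, dataset hash,
    and a hardware fingerprint, so any adapter can be traced back at deploy time."""
    manifest = {
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        "lockfile_sha256": file_sha256(lockfile_path),
        "dataset_sha256": file_sha256(dataset_path),
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "python": platform.python_version(),
        },
    }
    Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest
```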

Python · PyTorch · HuggingFace Transformers · PEFT · QLoRA · DPO (TRL) · Qwen2.5-7B-Instruct · Outlines (constrained decoding) · bitsandbytes · Accelerate · llama.cpp · GGUF export · Weights & Biases · MLflow · DeepEval · Ollama judge · Docker · pytest · GitHub Actions

Career Arc

From Oracle DBA
to AI engineering

Nov 2020 – Present
Lead Cloud Engineer
Zolon Tech Inc.

At Zolon I moved from building infrastructure into designing it. The shift mattered. My mandate wasn't to keep the lights on — it was to define the architecture that other people would build on top of. That meant designing a multi-account AWS landing zone from scratch: the VPC topology, Transit Gateway configuration, IAM role hierarchy, and Service Control Policies that would become the security and network model for production, staging, and development environments across the organization. Get this wrong and you're doing remediation work for years.

I led the containerization strategy — standing up EKS clusters, writing the Helm charts, implementing OPA Gatekeeper policies that enforced security posture at the admission controller level before workloads ever ran. I also inherited several legacy environments and ran zero-downtime cloud migrations using Terraform and CloudFormation, working directly with client stakeholders to sequence cutovers that couldn't afford a maintenance window.

On the delivery side, I standardized how the engineering teams shipped code — CI/CD pipelines via Jenkins and GitHub Actions, deployment patterns documented and enforced, release cycle time reduced across multiple workstreams. And beyond the technical work, I spent a meaningful chunk of this role translating between business requirements and cloud architecture: running design sessions, writing solution patterns, and making the decisions visible to non-technical stakeholders. That kind of work doesn't show up in a Terraform plan, but it's the difference between infrastructure that gets adopted and infrastructure that gets worked around.

Nov 2012 – Nov 2020
AWS Cloud Engineer
Viper Technology Services — Patuxent River, MD

Eight years is a long time to spend inside the same problem set, and I used it. At Viper I designed and operated enterprise AWS infrastructure for large-scale government and commercial clients — the kind of accounts where the blast radius of a mistake is measured in users and dollars, not test cases. VPCs, IAM architecture, EC2 fleets, RDS clusters, S3 at production scale. I wasn't experimenting in a sandbox; I was the person accountable for whether these systems stayed up.

This is where I built the automation foundation that I still use today. Terraform for infrastructure-as-code, Ansible for configuration management, Jenkins for CI/CD, Python for everything else. By the time I left, the practices I'd built at Viper were standard enough that I carried them directly into the AI engineering work I do now. That's what a decade of production work gives you: patterns that are actually tested.

2007 – 2012
Oracle DBA / Systems Analyst
CACI International Inc. & Apptis, Inc.

This is where the discipline was formed. Managing Oracle environments for defense and enterprise clients means operating in contexts where data integrity is non-negotiable and downtime is not an acceptable outcome. Performance tuning, backup and recovery, complex analytical queries across relational systems serving real operational workflows. Starting here meant I never had the luxury of treating infrastructure as abstract — it was always connected to something that mattered. That stays with you.

Technical Skills

Tools earned
in production

AI & Agentic Systems
LangGraph · LangChain · RAG pipelines · Multi-agent orchestration · LLM routing · pgvector · Amazon Bedrock · OpenAI API · Semantic caching · Constrained decoding
Cloud (AWS)
EKS · ECS Fargate · Lambda · Step Functions · EventBridge · RDS · CloudFront · API Gateway · IAM · Secrets Manager
Infrastructure & GitOps
Terraform · Kubernetes · Argo CD · Helm · OPA Gatekeeper · Ansible · Docker · Jenkins
Observability
Prometheus · Grafana · Alertmanager · AWS X-Ray · CloudWatch · Structured telemetry
Backend & Data
Python · FastAPI · PostgreSQL · Vector DBs · SSE streaming · REST APIs · SQL · Bash
Platform Engineering
Event-driven architecture · CI/CD · GitHub Actions · MLOps · QLoRA · DPO fine-tuning · Security-first IAM · Cost-aware architecture · TDD / pytest

Certifications

Validated
across domains

🟢
NVIDIA-Certified Professional — Agentic AI
NVIDIA · Advanced
🟢
NVIDIA-Certified Professional — Generative AI (LLMs)
NVIDIA · Advanced
🟠
AWS Certified Solutions Architect — Associate
Amazon Web Services
🟣
HashiCorp Certified: Terraform Associate
HashiCorp
Let's Talk

Looking for AI
engineering

I'm looking for AI Engineering roles where production reliability, model specialization, and LLM systems come together. If you're building something serious — not a prototype, an actual production system — and you need an engineer who has shipped this at scale, I'd like to hear about it.