AI Engineer

Ade
Daramola

I build the AI systems that have to actually work in production — fine-tuned models, retrieval pipelines, autonomous remediation. Not prototypes. Production systems running on AWS.

// 17+ years shipping production systems · LLM specialization · RAG · Agentic AI · AWS · GitOps

See My Work · GitHub ↗ · Get in Touch
The Story

Production
discipline. Always.

I started in 2007 managing Oracle databases for defense clients — environments where a bad query plan or a failed backup had real, sometimes irreversible consequences. That's not where most cloud engineers start, and it shaped how I think about systems in ways that are hard to unlearn.

From there I spent close to a decade at Viper Technology building and operating enterprise AWS infrastructure for government and commercial workloads. VPCs, IAM, EC2 fleets, RDS clusters at scale. I wasn't reading about these things — I was the person on call when they broke.

At Zolon Tech I moved into senior infrastructure architecture. Multi-account AWS landing zones, EKS clusters, GitOps delivery pipelines, zero-downtime cloud migrations. The work became more complex and the stakes were higher, but the discipline was the same: build it so it doesn't break, and when it does, fix it faster than anyone notices.

The AI engineering work I do now isn't a career change — it's the same foundation applied to a harder problem. LLM systems have all the failure modes of distributed systems, plus a whole new class of problems that most engineers haven't seen yet. Production scars help.

17+
Years in Production
4
AI Infra Projects Shipped
4
Industry Certifications
AWS
Primary Cloud Platform

Featured Projects

Built to solve
problems that exist

The Through-Line

Four projects. One stack.

These aren't four side projects — they're four layers of the same production AI system. From the cluster that runs the workloads, to the gateway that routes between models, to the RAG engine that grounds answers in documents, to the fine-tuning pipeline that specializes those models for a domain. Each project is a working layer of the same architecture.

Layer 04 · Model
LLM Specialization Platform
Turn a general-purpose model into a domain specialist. QLoRA SFT + DPO, constrained decoding, GGUF export.
Layer 03 · Application
Stratum
Ground the model's answers in your documents. Hybrid retrieval, cross-encoder reranking, citation-validated generation.
Layer 02 · Gateway
Multi-LLM Platform
Cost-aware routing across Claude, OpenAI, and Bedrock. Two-layer semantic cache. DynamoDB auth and rate limiting.
Layer 01 · Operations
GitOps Sentinel
Keep the production cluster running without paging a human at 3am. Confidence-gated remediation through GitOps.

Specialize the model. Ground its answers in your data. Route intelligently across providers. Keep the cluster running without waking anyone up. That's the full lifecycle of a production AI deployment, and the same engineering discipline runs through all four: Terraform-deployed, version-pinned, container-reproducible, tested before deploy.

Project 01
GitOps Sentinel
AIOps · Kubernetes Remediation
The real-world problem: In 2021, a misconfigured BGP announcement at Facebook took down Instagram, WhatsApp, and Facebook itself for six hours. In 2023, a bad Kubernetes config change at a fintech startup cascaded into a 14-hour incident. The pattern is the same every time — a config change hits production, Prometheus fires alerts at 3am, the on-call engineer is asleep, and by the time anyone is online the blast radius has grown. The question isn't whether this will happen to your cluster. It's whether your system can respond before a human has to.

GitOps Sentinel is an autonomous remediation platform for Kubernetes clusters. When Alertmanager fires a webhook, Sentinel doesn't just page on-call — it ingests the signal through an HMAC-validated API Gateway, deduplicates via DynamoDB, bundles Prometheus metrics and k8s events into S3, and routes the incident through a Step Functions multi-agent pipeline: Classifier → Root Cause → Action Planner → Confidence Scorer. Every remediation is a Git commit. Argo CD detects the merged PR and syncs the cluster. The cluster only changes through the same reviewed write path a human engineer would use.
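A minimal sketch of the webhook validation step described above, assuming a hex-encoded SHA-256 signature header (the header name and encoding are illustrative, not Sentinel's actual contract):

```python
import hashlib
import hmac


def verify_webhook(body: bytes, signature_header: str, secret: bytes) -> bool:
    """Validate an Alertmanager webhook body against its HMAC signature before
    any ingestion or deduplication happens. Constant-time comparison avoids
    leaking signature bytes through timing."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```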

The design comes from a real operational insight: not all incidents are equal, and the action shouldn't be either. Sentinel gates every remediation on a deterministic confidence score with three explicit routes — ≥80 auto-applies the PR, 40–79 opens a PR for engineer review, and <40 escalates to on-call with no automated change. Routine incidents (OOMKilled pods, replica drift, known bad image tags) resolve in seconds. Novel or high-risk scenarios escalate before anything touches the cluster. Five minutes after every remediation, an Outcome Validator queries Prometheus to confirm the fix held — and if it didn't, it opens an automatic revert PR. The cluster is never worse off than it was before Sentinel ran.
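A minimal sketch of the three-route confidence gate, with hypothetical names (`RemediationPlan`, `route_remediation`); in the real pipeline the score comes from the Confidence Scorer agent upstream, and this only illustrates the thresholds described above:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPLY = "auto_apply"        # >= 80: merge the remediation PR automatically
    REVIEW = "open_pr_for_review"    # 40-79: open a PR, wait for an engineer
    ESCALATE = "escalate_to_oncall"  # < 40: page on-call, make no automated change


@dataclass
class RemediationPlan:
    incident_id: str
    confidence: int   # deterministic score from the Confidence Scorer
    patch: str        # Git diff proposed by the Action Planner


def route_remediation(plan: RemediationPlan) -> Route:
    """Gate every remediation on the confidence score."""
    if plan.confidence >= 80:
        return Route.AUTO_APPLY
    if plan.confidence >= 40:
        return Route.REVIEW
    return Route.ESCALATE


if __name__ == "__main__":
    plan = RemediationPlan(incident_id="oomkilled-checkout-7f2", confidence=86, patch="...")
    print(route_remediation(plan))  # Route.AUTO_APPLY
```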

This is the kind of system that pays for itself the first time it silently fixes something at 4am that would have been a two-hour incident.

Python · AWS Lambda · Step Functions · EventBridge · Terraform · Argo CD · Kubernetes / EKS · Amazon Bedrock · API Gateway · DynamoDB · S3 · Prometheus · Grafana · OPA Gatekeeper
Project 02
Multi-LLM Platform
AI Gateway · Cost-Aware Routing
The real-world problem: Every team shipping production AI eventually asks the same question: why am I paying GPT-4o prices for a request that Claude Haiku could handle? And then: what happens when my primary provider goes down at 2am? And then: how do I prove to finance that our LLM spend is justified? The standard answer — call one model, hardcode the endpoint — works until it doesn't. Vendor lock-in, no fallback, no cost visibility, no way to route intelligently across providers. Building a production AI gateway from scratch is the kind of infrastructure work that takes a team weeks to get right, and most teams skip it until the problem is already expensive.

The Multi-LLM Platform is a production-grade AI gateway that routes requests across Anthropic Claude, OpenAI, and AWS Bedrock with a cost-aware complexity scoring engine. Every request is scored on token volume, code detection, and reasoning keywords — simple queries go to low-tier models (Nova Micro, Haiku), complex ones to mid or high-tier (Sonnet, GPT-4o, Opus). A client can pin a specific model via model_preference; the router falls back gracefully if that provider is unhealthy. The result is a platform that routes 60%+ of traffic to cheap models automatically, with a cost target of $0.004–$0.008 per average request.
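A minimal sketch of the complexity scoring idea, assuming illustrative heuristics and thresholds (the production engine's exact weights, keyword list, and tier map are not reproduced here):

```python
import re

# Hypothetical reasoning-keyword list for illustration only.
REASONING_KEYWORDS = {"explain", "compare", "analyze", "derive", "prove", "trade-off"}


def complexity_score(prompt: str) -> float:
    """Score a request on rough token volume, code detection, and reasoning keywords."""
    tokens = len(prompt.split())                               # cheap token estimate
    has_code = bool(re.search(r"```|\bdef\b|\bclass\b|;\s*$", prompt, re.M))
    reasoning_hits = sum(1 for w in REASONING_KEYWORDS if w in prompt.lower())

    score = min(tokens / 500, 1.0) * 0.4      # volume contributes up to 0.4
    score += 0.3 if has_code else 0.0         # code pushes toward bigger models
    score += min(reasoning_hits * 0.15, 0.3)  # reasoning keywords, capped at 0.3
    return score


def pick_tier(prompt: str, model_preference: str | None = None) -> str:
    """Route to a tier unless the client pinned a model via model_preference."""
    if model_preference:
        return model_preference
    score = complexity_score(prompt)
    if score < 0.35:
        return "low"   # e.g. Nova Micro, Haiku
    if score < 0.7:
        return "mid"   # e.g. Sonnet
    return "high"      # e.g. GPT-4o, Opus
```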

The caching layer is two-tier: Redis exact-match (sub-millisecond) sits in front of a pgvector semantic cache (cosine threshold 0.92). Cache hits bypass the LLM entirely — a semantic hit on a prior equivalent question is served from the cache and the prompt embedding gets promoted to Redis. At 35%+ hit rates (the conservative target), the cache pays for itself against LLM API costs within weeks. Auth is DynamoDB-backed per-key with sliding-window rate limiting. A health-checker Lambda runs every five minutes on EventBridge and marks unhealthy providers, so the router never wastes a request on a down endpoint.
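A minimal sketch of the two-tier lookup, assuming a local Redis, a psycopg connection string, and a pgvector table named `cache` (all illustrative); only the threshold and the promote-on-hit behavior come from the description above:

```python
import hashlib

import psycopg  # assumed table: cache(prompt text, embedding vector(1536), response text)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SIM_THRESHOLD = 0.92  # cosine similarity gate


def exact_key(prompt: str) -> str:
    return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()


def lookup(prompt: str, embedding: list[float]) -> str | None:
    """Layer 1: Redis exact match. Layer 2: pgvector semantic match; on a hit,
    promote the answer to Redis so the next identical prompt is sub-millisecond."""
    hit = r.get(exact_key(prompt))
    if hit:
        return hit

    vec = "[" + ",".join(map(str, embedding)) + "]"  # pgvector text literal
    with psycopg.connect("dbname=cache") as conn:
        row = conn.execute(
            """
            SELECT response, 1 - (embedding <=> %s::vector) AS similarity
            FROM cache
            ORDER BY embedding <=> %s::vector
            LIMIT 1
            """,
            (vec, vec),
        ).fetchone()

    if row and row[1] >= SIM_THRESHOLD:
        r.set(exact_key(prompt), row[0], ex=3600)  # promote to the exact-match layer
        return row[0]
    return None
```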

Observability is first-class. Every request emits CloudWatch EMF metrics via Lambda stdout — RequestCount, InputTokens, OutputTokens, LatencyMs, CacheHit, EstimatedCostUSD — dimensioned by provider, model, and tier. X-Ray traces cover auth, cache lookup, routing, provider call, and cache write. Terraform manages all infrastructure: networking, Aurora Serverless v2, ElastiCache Serverless, Lambda, API Gateway v2. OIDC-federated GitHub Actions deploys with no long-lived credentials.
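A minimal sketch of emitting one EMF record from a Lambda handler; the namespace and dimension names here are illustrative, but the metric names follow the list above and the stdout-only mechanism is how EMF works (no PutMetricData call needed):

```python
import json
import time


def emit_emf(provider: str, model: str, tier: str, latency_ms: float,
             input_tokens: int, output_tokens: int,
             cache_hit: bool, cost_usd: float) -> None:
    """Print one CloudWatch EMF record to stdout; the Lambda log group picks it
    up and CloudWatch extracts the metrics automatically."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MultiLLMPlatform",  # illustrative namespace
                "Dimensions": [["Provider", "Model", "Tier"]],
                "Metrics": [
                    {"Name": "RequestCount", "Unit": "Count"},
                    {"Name": "InputTokens", "Unit": "Count"},
                    {"Name": "OutputTokens", "Unit": "Count"},
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                    {"Name": "CacheHit", "Unit": "Count"},
                    {"Name": "EstimatedCostUSD", "Unit": "None"},
                ],
            }],
        },
        "Provider": provider, "Model": model, "Tier": tier,
        "RequestCount": 1,
        "InputTokens": input_tokens, "OutputTokens": output_tokens,
        "LatencyMs": latency_ms, "CacheHit": int(cache_hit),
        "EstimatedCostUSD": cost_usd,
    }
    print(json.dumps(record))
```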

Python · FastAPI · AWS Lambda · API Gateway v2 · Terraform · Redis · pgvector · Amazon Bedrock · Anthropic Claude · OpenAI · DynamoDB · Secrets Manager · CloudWatch EMF · X-Ray · Aurora Serverless v2 · GitHub Actions OIDC
Project 03
Stratum
Production RAG · Citation-Grounded Generation
The real-world problem: Most RAG systems shipped to production are demos with auth bolted on. They embed documents, do top-k similarity, dump chunks into a prompt, and call it retrieval. Then a user asks something the system doesn't actually know, and the LLM confidently invents an answer with no source attribution — and the team only learns about it after a customer escalation. Pure dense retrieval misses keyword-exact queries. Pure BM25 misses semantic ones. Without reranking, the wrong chunks dominate the context window. Without citation enforcement at generation time, there is no audit trail. The pattern is a system that works on the demo dataset and fails the moment real documents and real queries hit it.

Stratum is a domain-specific RAG engine designed around the assumption that retrieval quality and answer attribution are the actual problems — not the LLM. Queries flow through a hybrid retriever that runs BM25 sparse search and dense ANN in parallel, fuses results with Reciprocal Rank Fusion (RRF, K=60), expands matched child chunks back to their parent context, and reranks with a cross-encoder (ms-marco-MiniLM) before any chunk reaches the LLM. The fusion is parameter-free by design — robust to miscalibrated retrievers in a way that learned weights aren't, until you have enough labeled data to justify training them.
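A minimal sketch of the RRF fusion step with K=60, the one parameter the description fixes; the document IDs are made up for illustration:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked chunk-id lists from BM25 and dense retrieval with RRF.
    Each document scores sum(1 / (k + rank)) across the lists it appears in."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: BM25 and dense retrieval disagree; RRF rewards documents both found.
bm25 = ["doc-7", "doc-2", "doc-9"]
dense = ["doc-2", "doc-4", "doc-7"]
print(reciprocal_rank_fusion([bm25, dense]))  # doc-2 and doc-7 rise to the top
```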

Generation is where most RAG systems quietly fail. Stratum's generator builds a context block with explicit [src N] markers, calls Claude with a citation-enforcing prompt, and parses and validates every citation marker before returning. If the model produces an answer without grounding it in the retrieved sources, the response is rejected at the generator layer — not surfaced to the user as a "best effort."
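A minimal sketch of the citation validation step, assuming the [src N] marker format described above; the error type and function name are illustrative:

```python
import re

CITATION = re.compile(r"\[src (\d+)\]")


class UngroundedAnswerError(ValueError):
    """Raised when the model's answer is not grounded in the retrieved sources."""


def validate_citations(answer: str, num_sources: int) -> list[int]:
    """Parse every [src N] marker and reject answers that cite nothing, or that
    cite sources that were never in the context block."""
    cited = [int(m) for m in CITATION.findall(answer)]
    if not cited:
        raise UngroundedAnswerError("answer contains no [src N] citations")
    out_of_range = [n for n in cited if n < 1 or n > num_sources]
    if out_of_range:
        raise UngroundedAnswerError(f"answer cites nonexistent sources: {out_of_range}")
    return sorted(set(cited))
```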

The infrastructure is dual-backend by design. Local development runs against in-process Chroma with zero Docker. Production runs against Weaviate 1.27 with HNSW tuning and gRPC, deployed on AWS via Terraform: ALB fronting EC2 instances for the FastAPI service and Streamlit UI, a separate node for Weaviate on a 20GB EBS volume, and S3 for raw document storage. Evaluation runs weekly via DeepEval against a golden dataset using a local Ollama judge — zero API cost on the eval gate.

Python · FastAPI · Weaviate · Chroma · Terraform · Claude API · BM25 + Dense Hybrid Retrieval · Cross-Encoder Reranking · BGE / OpenAI Embeddings · EC2 · ALB · S3 · Streamlit · DeepEval
Project 04
LLM Specialization Platform
Fine-Tuning Infrastructure · Domain Adaptation
The real-world problem: Most fine-tuning projects treat training as the deliverable. Production ML fails for different reasons: the tokenizer drifts between training and serving, the GGUF you ship scores differently than the adapter you evaluated, the model fires on null inputs it should abstain from, and evaluation metrics pass against a mock that diverges from the real stack. Teams end up with training scripts in notebooks, no reproducibility, and no way to answer at deployment time: which dataset version, which hyperparameters, and which base model produced this adapter? The infrastructure problem is the same one that's bitten every serious ML team: how do you make domain specialization a repeatable engineering process instead of a one-time research project?

The LLM Specialization Platform is an end-to-end pipeline for fine-tuning, evaluating, and exporting specialized language models — built adversarially against the failure modes that sink production ML. The initial task is structured JSON extraction from unstructured text using Qwen2.5-7B-Instruct. The pipeline runs Phase 0 (tokenizer audit + baseline) → Phase 1 (QLoRA SFT) → Phase 2 (DPO from best SFT checkpoint) → Phase 3 (evaluation: raw + constrained decoding + metrics.json) → Phase 4 (adapter → merged BF16 → GGUF Q8_0 + Q4_K_M → re-verify). Swapping tasks requires a new dataset, schema, and config — no code changes.
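A minimal sketch of the Phase 1 QLoRA setup using HuggingFace Transformers and PEFT; the base model comes from the description, but the rank, alpha, dropout, and target modules here are typical defaults, not the platform's actual training config:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)

# Low-rank adapters on the attention projections; illustrative hyperparameters.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only adapter weights train; the 4-bit base stays frozen
```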

The results are real and measured on a frozen 375-example test set after training on 2,998 examples. DPO improved extraction recall by 15 points over SFT (0.650 vs 0.498). Null accuracy — the model's ability to correctly abstain when there's nothing to extract — is perfect across all artifacts including both GGUF quantizations. Constrained decoding recovers schema validity to 71.8% on the DPO model; production deployment uses Outlines. The CI gate emits a versioned metrics.json with pass/fail flags designed to be consumed by downstream GitOps pipelines — including Sentinel.
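A minimal sketch of a downstream consumer of that gate; only the pass/fail-flag idea comes from the description, and the field names in metrics.json here are assumptions:

```python
import json
import sys

# Hypothetical gate names for illustration.
REQUIRED_GATES = ["extraction_recall", "null_accuracy", "schema_validity"]


def check_gate(path: str = "metrics.json") -> None:
    """Fail the pipeline if any required metric did not pass its threshold."""
    with open(path) as f:
        metrics = json.load(f)
    failures = [g for g in REQUIRED_GATES if not metrics.get(g, {}).get("passed", False)]
    if failures:
        print(f"CI gate failed: {failures}")
        sys.exit(1)
    print(f"CI gate passed for run {metrics.get('run_id', 'unknown')}")


if __name__ == "__main__":
    check_gate()
```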

The infrastructure discipline matters as much as the training decisions. Every run is reproducible: manifests capture git commit, lockfile hash, dataset hash, and full hardware fingerprint. Validation gates raise ValueError — they don't warn silently and let bad datasets reach training. Field-level F1 is computed on positive examples only — including null cases inflates it for any over-abstaining model, masking real extraction quality, and this pipeline doesn't do that.
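A minimal sketch of the run manifest idea, capturing the four provenance facts named above; the field names and output path are illustrative, not the platform's exact manifest schema:

```python
import hashlib
import json
import platform
import subprocess
from pathlib import Path


def file_sha256(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(dataset_path: str, lockfile_path: str,
                   out: str = "run_manifest.json") -> dict:
    """Record what produced this run: git commit, lockfile hash, dataset hash,
    and a hardware fingerprint, so any adapter can be traced back at deploy time."""
    manifest = {
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        "lockfile_sha256": file_sha256(lockfile_path),
        "dataset_sha256": file_sha256(dataset_path),
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "python": platform.python_version(),
        },
    }
    Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest
```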

Python · PyTorch · HuggingFace Transformers · PEFT · QLoRA · DPO (TRL) · Qwen2.5-7B-Instruct · Outlines (constrained decoding) · bitsandbytes · Accelerate · llama.cpp · GGUF export · Weights & Biases · MLflow · DeepEval · Ollama judge · Docker · pytest · GitHub Actions

Career Arc

From Oracle DBA
to AI engineering

Nov 2020 – Present
Lead Cloud Engineer
Zolon Tech Inc.

At Zolon I moved from building infrastructure into designing it. The shift mattered. My mandate wasn't to keep the lights on — it was to define the architecture that other people would build on top of. That meant designing a multi-account AWS landing zone from scratch: the VPC topology, Transit Gateway configuration, IAM role hierarchy, and Service Control Policies that would become the security and network model for production, staging, and development environments across the organization. Get this wrong and you're doing remediation work for years.

I led the containerization strategy — standing up EKS clusters, writing the Helm charts, implementing OPA Gatekeeper policies that enforced security posture at the admission controller level before workloads ever ran. I also inherited several legacy environments and ran zero-downtime cloud migrations using Terraform and CloudFormation, working directly with client stakeholders to sequence cutovers that couldn't afford a maintenance window.

On the delivery side, I standardized how the engineering teams shipped code — CI/CD pipelines via Jenkins and GitHub Actions, deployment patterns documented and enforced, release cycle time reduced across multiple workstreams. And beyond the technical work, I spent a meaningful chunk of this role translating between business requirements and cloud architecture: running design sessions, writing solution patterns, and making the decisions visible to non-technical stakeholders. That kind of work doesn't show up in a Terraform plan, but it's the difference between infrastructure that gets adopted and infrastructure that gets worked around.

Nov 2012 – Nov 2020
AWS Cloud Engineer
Viper Technology Services — Patuxent River, MD

Eight years is a long time to spend inside the same problem set, and I used it. At Viper I designed and operated enterprise AWS infrastructure for large-scale government and commercial clients — the kind of accounts where the blast radius of a mistake is measured in users and dollars, not test cases. VPCs, IAM architecture, EC2 fleets, RDS clusters, S3 at production scale. I wasn't experimenting in a sandbox; I was the person accountable for whether these systems stayed up.

This is where I built the automation foundation that I still use today. Terraform for infrastructure-as-code, Ansible for configuration management, Jenkins for CI/CD, Python for everything else. By the time I left, the practices I'd built at Viper were standard enough that I carried them directly into the AI engineering work I do now. That's what a decade of production work gives you: patterns that are actually tested.

2007 – 2012
Oracle DBA / Systems Analyst
CACI International Inc. & Apptis, Inc.

This is where the discipline was formed. Managing Oracle environments for defense and enterprise clients means operating in contexts where data integrity is non-negotiable and downtime is not an acceptable outcome. Performance tuning, backup and recovery, complex analytical queries across relational systems serving real operational workflows. Starting here meant I never had the luxury of treating infrastructure as abstract — it was always connected to something that mattered. That stays with you.

Technical Skills

Tools earned
in production

AI & Agentic Systems
LangGraph · LangChain · RAG pipelines · Multi-agent orchestration · LLM routing · pgvector · Amazon Bedrock · OpenAI API · Semantic caching · Constrained decoding
Cloud (AWS)
EKS · ECS Fargate · Lambda · Step Functions · EventBridge · RDS · CloudFront · API Gateway · IAM · Secrets Manager
Infrastructure & GitOps
Terraform · Kubernetes · Argo CD · Helm · OPA Gatekeeper · Ansible · Docker · Jenkins
Observability
Prometheus · Grafana · Alertmanager · AWS X-Ray · CloudWatch · Structured telemetry
Backend & Data
Python · FastAPI · PostgreSQL · Vector DBs · SSE streaming · REST APIs · SQL · Bash
Platform Engineering
Event-driven architecture · CI/CD · GitHub Actions · MLOps · QLoRA · DPO fine-tuning · Security-first IAM · Cost-aware architecture · TDD / pytest

Certifications

Validated
across domains

🟢
NVIDIA-Certified Professional — Agentic AI
NVIDIA · Advanced
🟢
NVIDIA-Certified Professional — Generative AI (LLMs)
NVIDIA · Advanced
🟠
AWS Certified Solutions Architect — Associate
Amazon Web Services
🟣
HashiCorp Certified: Terraform Associate
HashiCorp
Let's Talk

Looking for AI
engineering

I'm looking for AI Engineering roles where production reliability, model specialization, and LLM systems come together. If you're building something serious — not a prototype, an actual production system — and you need an engineer who has shipped this at scale, I'd like to hear about it.