I’m Khalid, a current Shopify engineering intern. I like building reliable systems that make other engineers faster. Datadog feels like the best place to do that at real scale.
🐶 Why Datadog?
- It’s the nervous system for modern software. The idea of one platform tying together APM, logs, traces, infra, RUM, databases, and security signals is exactly the kind of hard, useful engineering I want to work on.
- Real problems I care about:
  - Universal Service Monitoring (eBPF) for near-zero-friction service discovery and telemetry
  - Observability Pipelines to transform and route data for reliability and cost control
  - Kubernetes observability (Cluster Agent, Live Containers, workload inventory, autoscaling signals)
  - Security that meets engineers where they work (Cloud SIEM, Application Security, CSPM)
  - AI that’s actually helpful (Bits surfacing traces, logs, metrics, SLOs, and runbooks to answer “why did p95 spike?”). I’m at Shopify now, where the org is AI-native across the stack, so I care about AI that shortens the on-call loop instead of adding noise.
- Also, a small thing I like about Datadog: you call the culture “Pup Culture,” and Dash often turns into a hallway track for deep dives. It’s a small detail, but it shows you care about the craft and the people.
- The scale and constraints are real: multi-tenant ingestion, high-cardinality data, query latency, alert correctness, data quality, and cost. Those trade-offs are exactly the engineering puzzles I like.
- I’ve also chatted with a few Datadog engineers (Western University alumni and folks I met at AWS Summit Toronto). The consistent themes were ownership, fast feedback, and shipping things customers feel quickly. That’s my kind of environment.
🛠 What I bring
- Kinaxis (ML infrastructure, Azure Kubernetes): fixed hyperparameter training issues across hundreds of pods (cut training time ~9%); built a metrics and accuracy tool over customer datasets (Delta Tables, Spark, GPUs) saving 100+ engineering hours per month; CI/CD improvements that trimmed 40+ hours per month. Hands-on with systems that must be observable, cost-aware, and boringly reliable.
- Interac (payments, Java on Kubernetes/Azure): built a batch API to automate encryption retries across distributed services, reducing bulk payment failures by 47%; shipped 22+ new API fields on a service called millions of times daily; introduced stacked PRs and dev containers to speed up the loop.
- Canadian Tire (high-throughput API): shipped a customer-facing feature computing loyalty points for 5M+ users with sub-second latency over 300+ GB datasets; reduced prod server load by 11% via batching; Terraform for safer deploys.
- Projects: FLOW serves real-time inference to 200+ concurrent users with sub-second latency; I’ve also built distributed systems (Kubernetes, Docker), worked across AWS and Azure, and I’m comfortable instrumenting everything.
🎯 Where I want to help
- Make Bits even more useful by stitching in richer context from traces, error fingerprints, SLOs, deploy markers, and infra changes.
- Evolve Observability Pipelines so teams can tame high-cardinality data without losing the signals that matter.
- Push the Kubernetes experience toward “just works”: better auto-tagging, service maps, and out-of-the-box SLOs.
- Keep closing the gap between observability and security so engineers catch risky behavior as naturally as they catch performance regressions.
I want to build the tools I wish I’d had on call: precise, fast, no fluff. Datadog feels like the right place to learn from people who have made that real.
Here’s a past internship evaluation: 📝 https://khalidzabalawi.ca/evaluation.pdf
GitHub: 🐙 https://github.com/HikaruSadashi/
More builds not on my resume: 🧪 https://www.khalidzabalawi.ca/showcase/
Last note: I’ve done my fair share of 3 a.m. dashboards and “why is p95 red again?” hunts. I want to help make those moments shorter and rarer, for me and for everyone else using Datadog.
Thanks for reading!