I’m Khalid, a current Shopify engineering intern. I like building reliable systems that make other engineers faster. Datadog feels like the best place to do that at real scale.

🐶 Why Datadog?

  • It’s the nervous system for modern software. The idea of one platform tying together APM, logs, traces, infra, RUM, databases, and security signals is exactly the kind of hard, useful engineering I want to work on.
  • Real problems I care about:
    • Universal Service Monitoring (eBPF) for near-zero‑friction service discovery and telemetry
    • Observability Pipelines to transform/route data for reliability and cost control
    • Kubernetes observability (Cluster Agent, Live Containers, workload inventory, autoscaling signals)
    • Security that meets engineers where they work (Cloud SIEM, Application Security, CSPM)
    • AI that’s actually helpful (Bits surfacing traces, logs, metrics, SLOs, and runbooks to answer “why did p95 spike?”). I’m at Shopify now, where the org is AI-native across the stack, so I care about AI that shortens the on-call loop instead of adding noise.

Also, a small thing I like about Datadog: you call the culture “Pup Culture,” and Dash often turns into a hallway track for deep dives. It’s small, but it shows you care about the craft and the people.

  • The scale and constraints are real: multi-tenant ingestion, high-cardinality data, query latency, alert correctness, data quality, and cost. Those trade-offs are the engineering puzzles I like.
  • I’ve also chatted with a few Datadog engineers (Western University alumni and folks I met at AWS Summit Toronto). The consistent themes were ownership, fast feedback, and shipping things that customers feel quickly. That’s my kind of environment.

🛠 What I bring

  • Kinaxis (ML infrastructure, Azure Kubernetes): fixed hyperparameter training issues across hundreds of pods (cut training time ~9%); built a metrics and accuracy tool over customer datasets (Delta Tables, Spark, GPUs) saving 100+ eng hours per month; CI/CD that trimmed 40+ hours per month. Hands-on with systems that must be observable, cost-aware, and boringly reliable.
  • Interac (payments, Java on Kubernetes/Azure): built a batch API to automate encryption retries across distributed services, reducing bulk payment failures by 47%; shipped 22+ new API fields on a service called millions of times daily; introduced stacked PRs and dev containers to speed up the development loop.
  • Canadian Tire (high-throughput API): shipped a customer-facing feature computing loyalty points for 5M+ users with sub-second latency over 300+ GB datasets; reduced prod server load by 11% via batching; Terraform for safer deploys.
  • Projects: FLOW serves real-time inference to 200+ concurrent users with sub-second latency; I’ve also built distributed systems (Kubernetes, Docker), worked across AWS and Azure, and I’m comfortable instrumenting everything.

🎯 Where I want to help

  • Make Bits even more useful by stitching in richer context from traces, error fingerprints, SLOs, deploy markers, and infra changes.
  • Evolve Observability Pipelines so teams can tame high-cardinality data without losing the signals that matter.
  • Push the Kubernetes experience toward “just works”: better auto-tagging, service maps, and out-of-the-box SLOs.
  • Keep closing the gap between observability and security so engineers catch risky behavior as naturally as they catch performance regressions.

I want to build the tools I wish I’d had on call: precise, fast, no fluff. Datadog feels like the right place to learn from people who have made that real.

Here’s a past internship evaluation: 📝 https://khalidzabalawi.ca/evaluation.pdf

GitHub: 🐙 https://github.com/HikaruSadashi/

More builds not on my resume: 🧪 https://www.khalidzabalawi.ca/showcase/

Last note: I’ve done my fair share of 3 a.m. dashboards and “why is p95 red again?” hunts. I want to help make those moments shorter and rarer, for me and for everyone else using Datadog.

Thanks for reading!