I’m Khalid, a current Shopify engineering intern. I like building reliable systems that make other engineers faster. Datadog feels like the best place to do that at real scale.

🐶 Why Datadog?

  • It’s the nervous system for modern software. The idea of one platform tying together APM, logs, traces, infra, RUM, databases, and security signals is exactly the kind of hard, useful engineering I want to work on.
  • Real problems I care about:
    • Universal Service Monitoring (eBPF) for near-zero‑friction service discovery and telemetry
    • Observability Pipelines to transform/route data for reliability and cost control
    • Kubernetes observability (Cluster Agent, Live Containers, workload inventory, autoscaling signals)
    • Security that meets engineers where they work (Cloud SIEM, Application Security, CSPM)
    • AI that’s actually helpful (Bits surfacing traces, logs, metrics, SLOs, and runbooks to answer “why did p95 spike?”). I’m at Shopify now, where the org is AI-native across the stack, so I care about AI that shortens the on-call loop instead of adding noise.

Also, a small thing I like about Datadog: you call the culture “Pup Culture,” and Dash often turns into a hallway track for deep dives. It’s small, but it shows you care about the craft and the people.

  • The scale and constraints are real: multi-tenant ingestion, high-cardinality data, query latency, alert correctness, data quality, and cost. Those trade-offs are the engineering puzzles I like.
  • I’ve also chatted with a few Datadog engineers (Western University alumni and folks I met at AWS Summit Toronto). The consistent themes were ownership, fast feedback, and shipping things that customers feel quickly. That’s my kind of environment.

🛠 What I bring

  • Kinaxis (ML infrastructure, Azure Kubernetes): fixed hyperparameter training issues across hundreds of pods (cut training time ~9%); built a metrics and accuracy tool over customer datasets (Delta Tables, Spark, GPUs) saving 100+ eng hours per month; CI/CD that trimmed 40+ hours per month. Hands-on with systems that must be observable, cost-aware, and boringly reliable.
  • Interac (payments, Java on Kubernetes/Azure): built a batch API to automate encryption retries across distributed services, reducing bulk payment failures by 47%; shipped 22+ new API fields on a service called millions of times daily; introduced stacked PRs and dev containers to speed up the development loop.
  • Canadian Tire (high-throughput API): shipped a customer-facing feature computing loyalty points for 5M+ users with sub-second latency over 300+ GB datasets; reduced prod server load by 11% via batching; Terraform for safer deploys.
  • Projects: FLOW serves real-time inference to 200+ concurrent users with sub-second latency; I’ve also built distributed systems (Kubernetes, Docker), worked across AWS and Azure, and I’m comfortable instrumenting everything.

🎯 Where I want to help

  • Make Bits even more useful by stitching in richer context from traces, error fingerprints, SLOs, deploy markers, and infra changes.
  • Evolve Observability Pipelines so teams can tame high-cardinality data without losing the signals that matter.
  • Push the Kubernetes experience toward “just works”: better auto-tagging, service maps, and out-of-the-box SLOs.
  • Keep closing the gap between observability and security so engineers catch risky behavior as naturally as they catch performance regressions.

I want to build the tools I wish I’d had on call: precise, fast, no fluff. Datadog feels like the right place to learn from people who have made that real.

Here’s a past internship evaluation: 📝 https://khalidzabalawi.ca/evaluation.pdf

GitHub: 🐙 https://github.com/HikaruSadashi/

More builds not on my resume: 🧪 https://www.khalidzabalawi.ca/showcase/

Last note: I’ve done my fair share of 3 a.m. dashboards and “why is p95 red again?” hunts. I want to help make those moments shorter and rarer, for me and for everyone else using Datadog.

Thanks for reading!