projects

open source work — building in public from May 2026

gpt.cpp inference

GPT-2 124M forward pass implemented from scratch in C++ using Eigen. No training, no PyTorch: just raw matrix math, pretrained weights, and top-k sampling. pybind11 bindings expose the C++ forward pass to Python, so that tokenization can be handled on the Python side.
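
For readers unfamiliar with top-k sampling, here is a minimal sketch of the idea. It is not taken from the gpt.cpp repository: the function name `sample_top_k` is hypothetical, and it uses `std::vector<float>` rather than Eigen types so that it compiles with the standard library alone. It keeps the k largest logits, applies a softmax over just those, and draws one token index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (an illustration, not the project's code): returns the
// index of a token sampled from the k highest logits.
int sample_top_k(const std::vector<float>& logits, std::size_t k, std::mt19937& rng) {
    assert(!logits.empty() && k > 0);
    k = std::min(k, logits.size());

    // Indices 0..n-1, partially sorted so the k largest logits come first.
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });

    // Unnormalized softmax over the k survivors; subtracting the max logit
    // keeps exp() from overflowing.
    const float max_logit = logits[idx[0]];
    std::vector<double> weights(k);
    for (std::size_t i = 0; i < k; ++i) {
        weights[i] = std::exp(static_cast<double>(logits[idx[i]] - max_logit));
    }

    // std::discrete_distribution normalizes the weights internally.
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return idx[dist(rng)];
}
```

Called with the final-position logits and a seeded `std::mt19937`, this returns one token id per call. With `k = 1` it reduces to greedy decoding.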

C++ Python Eigen pybind11
github ↗

cuda-kernels gpu

CUDA kernels written from scratch: vector add, naive matmul, and tiled matmul with shared memory. Each kernel profiled with Nsight Systems and Nsight Compute on an RTX 3050. Includes a roofline analysis and a counter-intuitive L1 cache finding from the tiled matmul.
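
As a rough illustration of what a roofline analysis computes, here is a small host-side C++ sketch. It is an idealized model, not the repository's analysis: the function names are hypothetical, the intensity formula assumes every operand load that is not reused from a tile goes to DRAM, and the peak and bandwidth figures in the usage note are round placeholders rather than RTX 3050 specifications or measurements.

```cpp
#include <algorithm>
#include <cassert>

// Roofline model: attainable throughput is capped by compute or by memory.
// Units: peak in GFLOP/s, bandwidth in GB/s, intensity in FLOP/byte.
double roofline_gflops(double peak_gflops, double bandwidth_gbs, double intensity) {
    assert(peak_gflops > 0 && bandwidth_gbs > 0 && intensity >= 0);
    return std::min(peak_gflops, bandwidth_gbs * intensity);
}

// Idealized FP32 matmul arithmetic intensity when a tile x tile block of each
// operand is reused from on-chip memory (tile = 1 models the naive kernel).
// Per output element: 2N FLOPs and 2N / tile loads of 4 bytes each,
// which simplifies to tile / 4 FLOP/byte. Output writes are ignored.
double matmul_intensity(int tile) {
    assert(tile >= 1);
    return tile / 4.0;
}
```

With placeholder values of 1000 GFLOP/s peak and 100 GB/s bandwidth, `roofline_gflops(1000.0, 100.0, matmul_intensity(1))` gives a bound of 25 GFLOP/s for the naive kernel, and a 16-wide tile raises it to 400 GFLOP/s. The model ignores cache hits, so profiled kernels can land off these bounds.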

CUDA C++ Nsight
github ↗