Aug 18, 2025
LLM with a Thousand Faces
We’ve come a long way from single-task LLMs (for tasks like translation or classification) to general-purpose models capable of performing a wide range of tasks.
Giving these models the ability to call too...
Jul 24, 2025
Understanding Mixed-Precision Training
Several clever innovations have made it feasible to train large language models (LLMs) with hundreds of billions of parameters, some even reaching 600B. However, there’s also increasing pressure ...
Jul 24, 2025
Understanding How Floating Point Representations Work
The impressive emergent capabilities of LLMs have largely been observed as a result of scaling them to massive sizes, sometimes with hundreds of billions of parameters (e.g., 470B or 600B). Thes...
Jul 08, 2025
Distributed Training for Dummies
After enjoying the Ultra Scale Playbook, I really wanted to understand how different types of parallelism are implemented programmatically. The accompanying picotron library was quite useful, bu...
Jun 21, 2025
Notes on Post-Training Quantization
Post-training quantization (PTQ) reduces model size and speeds up inference by converting high-precision weights and/or activations to low-bit integers — with minimal or no retraining.
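A minimal sketch of that core idea, assuming a symmetric per-tensor int8 scheme (the scheme and the helper names quantize_int8/dequantize are illustrative, not taken from the post):

    import numpy as np

    def quantize_int8(weights: np.ndarray):
        """Symmetric per-tensor quantization: map float weights onto the int8 range [-127, 127]."""
        scale = np.abs(weights).max() / 127.0  # single scale factor for the whole tensor
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        """Recover approximate float weights from the int8 values at inference time."""
        return q.astype(np.float32) * scale

    # Round-trip a small fp32 weight matrix and check the quantization error.
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print("max abs error:", np.abs(w - w_hat).max())

Practical PTQ methods build on this with calibration data, per-channel scales, and outlier handling.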
Mar 31, 2024
Exploring Simulacra's Generative Agents
Tutorial on Toy Simulacra
Apr 08, 2021
Contact tracing as a personalization framework
The Covid-19 pandemic has forced policy experts to make challenging trade-offs between public health and economic activity.
As a result, countries have gone through a series of lockdowns, experienc...