I work on reverse engineering artificial neural networks into human-understandable algorithms.
I'm one of the co-founders of Anthropic, an AI lab focused on the safety of large models. Previously, I led interpretability research at OpenAI, worked at Google Brain, and co-founded Distill, a scientific journal focused on outstanding communication.
My blog should not be taken to reflect the views of any organization I'm affiliated with.
Twitter  -  Google Scholar