January Meetup – The Mechanistic Interpretability Agenda
Astral Codex Ten recently posted an explainer of Anthropic’s paper “Towards Monosemanticity: Decomposing Language Models with Dictionary Learning.” Since we haven’t had a discussion meetup focused specifically on AI in a while, this seems like as good a time as any! Since the content in the ACX explainer and the