Research by Harvard students on catastrophic risks from advanced AI.
Managing risks from advanced artificial intelligence is one of the most important problems of our time.¹ We are a community of technical and policy researchers at Harvard working to reduce these risks and to steer the trajectory of AI development for the better.
We run a semester-long introductory technical reading group on AI safety research, covering topics like neural network interpretability,¹ learning from human feedback,² goal misgeneralization in reinforcement learning agents,³ eliciting latent knowledge,⁴ and evaluating dangerous capabilities in models.⁵
We also run an introductory AI policy reading group, where we discuss core strategic issues posed by the development of transformative AI systems.
Join our mailing list →
Our members have worked with: [organization logos]
Note: Use of organizational logos does not imply current affiliation with or endorsement by these organizations.