Policy Papers

This page is intended for students, researchers, and practitioners who are interested in learning about work from public policy, law, corporate governance, economics, and related fields that is relevant to reducing risks from advanced AI. The papers here represent a broad overview of what we see as the central challenges in and approaches to the governance of advanced AI.

Overviews and Surveys

The Role of Cooperation in Responsible AI Development (Askell et al., 2019).

This paper outlines one of the central issues in the governance of advanced AI: the risk of a “race to the bottom” where, because AI developers are unable to coordinate with each other, they are each incentivized to quickly develop more capable models while cutting corners on safety.

AI Policy Levers: A Review of the U.S. Government’s Tools to Shape AI Research, Development, and Deployment (Fischer et al., 2021).

Fischer et al. give a high-level overview of the many options available to the U.S. federal government for shaping the development of AI.

AI Chips: What They Are and Why They Matter (Khan & Mann, 2020).

The specialized chips used to train cutting-edge AI systems are one of the inputs to AI development most amenable to regulatory intervention. In this paper, researchers from the Center for Security and Emerging Technology explain the basic dynamics that govern these chips’ development and production, and discuss their strategic importance.

Towards Best Practices in AGI Safety and Governance (Schuett et al., 2023).

Researchers from the Centre for the Governance of AI survey experts at top AI labs and other relevant institutions. They find a remarkable degree of consensus among their respondents about the safety and governance practices that top labs should implement.

12 Tentative Ideas for U.S. AI Policy (Muehlhauser, 2023).

Licensing, Auditing and Standards

Governments often ensure that risky technologies are developed safely by creating regulatory regimes that ensure that approved projects obey best practices for safety. Many researchers and practitioners advocate for a similar approach to AI regulation. This approach would reduce risks from the wide deployment of dangerous systems and incentivize further research into safety and reliability from developers. Much work remains to be done, however, in working out the details of just what such a regulatory regime would look like, as well as how it would be realistically enforced.

Auditing Large Language Models: A Three-Layered Approach (Mökander et al., 2023).

This paper provides a blueprint for an auditing regime for contemporary foundation models. The authors also discuss the unique challenges that AI poses, cover some of the limitations of their proposed approach, and argue that the safe development of AI requires more than just a licensing regime.

Towards Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (Brundage et al., 2020).

For a licensing and auditing regime to be effective, it needs to be enforceable. This paper describes some ways in which AI developers’ claims might be verified by a third party.

Nuclear Arms Control Verification and Lessons for AI Treaties (Baker, 2023).

Many researchers believe that the effective governance of nuclear weapons and nuclear power poses very similar challenges to the effective governance of advanced AI. Here, Mauricio Baker discusses in depth the lessons the AI governance community can draw from the challenges of verifying states’ claims about their nuclear development.

Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring (Shavit, 2023).

Yonadav Shavit gives a detailed description of a possible verification scheme here, based on privacy-preserving monitoring of the usage of specialized chips.

Misuse and Conflict

In addition to the risk of catastrophic AI accidents, another source of AI-related risk is conflict precipitated by the development of advanced AI. AI might radically reshape the balance of power and the security landscape, leading to significant geopolitical instability.

How does the offense-defense balance scale? (Garfinkel & Dafoe, 2019).

The “offense-defense balance” refers to the relative ease of attacking another power and defending against an attack. Some researchers worry that advanced AI might tilt this balance in favor of offense, leading to a more conflict-prone world. This paper discusses that possibility.

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (Brundage et al., 2018).

This paper advocates for increased concern about the potential catastrophic misuse of powerful AI systems. The authors describe several concrete threat models, and they make high-level recommendations to policymakers and AI developers aimed at reducing misuse risks.

Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted? (Anderljung & Hazell, 2023).

The authors discuss the tradeoffs inherent in restrictions to AI capabilities aimed at curbing misuse risks. They describe some domains in which significant capabilities restrictions might be worthwhile.

Structural Risk

Thinking About Risks From AI: Accidents, Misuse and Structure (Zwetsloot & Dafoe, 2019).

This blog post develops a rough taxonomy for classifying different risks associated with AI development. Most significantly, it develops the notion of “structural risk.”

The Windfall Clause: Distributing the Benefits of AI (O’Keefe et al., 2020).

A central structural risk posed by advanced AI development is that of extreme economic concentration and correspondingly extreme inequality. If a few AI developers rapidly develop the ability to automate large swathes of the economy, they may quickly acquire much more economic power than private entities have had in the past. This paper outlines a proposal for an ex ante commitment by AI developers to avoid such extreme concentration of wealth.

Algorithmic Black Swans (Kolt, 2023).

This article introduces the concept of an “algorithmic black swan” — a catastrophic tail outcome caused by AI development. Noam Kolt argues that existing institutions and efforts to regulate AI systematically neglect these outcomes, and argues that future efforts should actively take them into account.