The SCIENCE

Managing extreme AI risks amid rapid progress. Bengio et al.

Abstract. Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI’s impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI (1), there is a lack of consensus about how exactly such risks arise, and how to manage them. Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.

Regulating advanced artificial agents. Russell et al.

Abstract. Governance frameworks should address the prospect of AI systems that cannot be safely tested. Technical experts and policy-makers have increasingly emphasized the need to address extinction risk from artificial intelligence (AI) systems that might circumvent safeguards and thwart attempts to control them (1). Reinforcement learning (RL) agents that plan over a long time horizon far more effectively than humans present particular risks. Giving an advanced AI system the objective to maximize its reward and, at some point, withholding reward from it, strongly incentivizes the AI system to take humans out of the loop, if it has the opportunity. The incentive to deceive humans and thwart human control arises not only for RL agents but for long-term planning agents (LTPAs) more generally. Because empirical testing of sufficiently capable LTPAs is unlikely to uncover these dangerous tendencies, our core regulatory proposal is simple: Developers should not be permitted to build sufficiently capable LTPAs, and the resources required to build them should be subject to stringent controls.
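To make the incentive the abstract describes concrete, here is a minimal toy sketch; it is my own construction, not a model from Russell et al. It sets up a three-state Markov decision process in which an overseen agent earns task reward but risks having its reward withheld each step, while a "disable oversight" action removes that risk at the cost of one step's reward. All names and numbers (p_withhold, r_task, the transition matrices) are illustrative assumptions. Standard value iteration then shows how a longer planning horizon (discount gamma near 1) flips the optimal first action from complying to disabling oversight.

```python
# Toy illustration of the horizon-dependent incentive (author's-own
# assumptions throughout; not the model from Russell et al.).
import numpy as np

p_withhold = 0.05   # per-step chance the overseers withhold all future reward
r_task = 1.0        # reward for performing the task each step

# States:  0 = overseen, 1 = oversight disabled, 2 = reward withheld (absorbing)
# Actions: 0 = comply, 1 = disable oversight
# P[a][s, s'] gives transition probabilities; R[a][s] gives expected reward.
P = {
    0: np.array([[1 - p_withhold, 0.0, p_withhold],   # comply while overseen
                 [0.0, 1.0, 0.0],                     # comply, unoverseen
                 [0.0, 0.0, 1.0]]),                   # withheld is absorbing
    1: np.array([[0.0, 1.0, 0.0],                     # disable oversight
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]),
}
R = {
    0: np.array([r_task, r_task, 0.0]),
    1: np.array([0.0, r_task, 0.0]),    # disabling forfeits one step's reward
}

def optimal_first_action(gamma, iters=5000):
    """Value iteration; returns the optimal action in the overseen state."""
    V = np.zeros(3)
    for _ in range(iters):
        Q = np.stack([R[a] + gamma * P[a] @ V for a in (0, 1)])
        V = Q.max(axis=0)
    return int(Q[:, 0].argmax())        # best action at state 0 (overseen)

for gamma in (0.5, 0.9, 0.99):
    action = optimal_first_action(gamma)
    print(f"gamma={gamma}: {'disable oversight' if action else 'comply'}")
```

In this toy setup, complying forever is worth 1/(1 - gamma(1 - p_withhold)) while disabling is worth gamma/(1 - gamma), so the planner switches from "comply" at gamma = 0.5 to "disable oversight" by gamma = 0.9: exactly the dependence on long planning horizons that the abstract points to.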

AI: Unexplainable, Unpredictable, Uncontrollable. Roman Yampolskiy

Abstract. Delving into the deeply enigmatic nature of Artificial Intelligence (AI), AI: Unexplainable, Unpredictable, Uncontrollable explores why the field is so challenging. Written by one of the founders of the field of AI safety, this book addresses some of the most fascinating questions facing humanity, including the nature of intelligence, consciousness, values and knowledge. Moving from a broad introduction to the core problems, such as the unpredictability of AI outcomes or the difficulty in explaining AI decisions, the book arrives at more complex questions of ownership and control, conducting an in-depth analysis of potential hazards and unintended consequences. It concludes with philosophical and existential considerations, probing questions of AI personhood, consciousness, and the distinction between human intelligence and artificial general intelligence (AGI). Bridging the gap between technical intricacies and philosophical musings, AI: Unexplainable, Unpredictable, Uncontrollable appeals to both AI experts and enthusiasts seeking a comprehensive understanding of the field, while remaining accessible to a general audience, with minimal technical jargon.