The Shattered Leviathan

Addressing Irrational Actor Failure Modes in AGI Governance Frameworks

Apr 20, 2026

In 1995 a group whose membership included scientists, engineers, and physicians released sarin on the Tokyo subway. In 1982 an unidentified individual killed seven people by lacing a common over-the-counter medication with cyanide.

The motives behind these acts may not have been money, power, or concessions; the perpetrators may instead have been trying to damage the institutions we depend on.

Many proposed governance frameworks for artificial general intelligence (AGI) assume that threats will arise from rational actors. History warns of the risk of irrational actors, and this paper explores the risk they pose during AGI emergence.

Read the full paper on SSRN

The proposal AGI, Governments, and Free Societies1 builds upon the “Narrow Corridor” framework of Acemoglu and Robinson2 to outline an approach for avoiding catastrophic risks through hybrid institutional design and adaptive regulation. The framework addresses the dual risks of power concentration leading to authoritarianism (the “despotic Leviathan”)3 and the weakening of the state through rapid AGI diffusion (the “absent Leviathan”).4 It depends, however, upon the rationality of actors and their responsiveness to power, legitimacy, and stability. Missing from the analysis are non-rational actors (e.g., apocalyptic cults or nihilistic accelerationists) who seek to destroy, rather than control or use, the Leviathan. This paper argues that irrational actors introduce a third exit from the narrow corridor: the ‘shattered Leviathan’. We build upon Bullock, Hammond, and Krier’s work by proposing additional interventions and analyzing the new failure modes this proposal exposes.

I. The Framework

The original work by Bullock, Hammond, and Krier5 (the “BHK Framework”) argues that “AGI poses distinct risks of pushing societies toward either a ‘despotic Leviathan’ through enhanced state surveillance and control, or an ‘absent Leviathan’ through the erosion of state legitimacy relative to AGI-empowered non-state actors.”6 The particular failure modes addressed are (1) surveillance and control enablement leading to power concentration and an increased risk of authoritarian practices, and (2) non-state actor access to advanced capabilities eroding state legitimacy and governability. To address these risks, Bullock, Hammond, and Krier recommend a range of actions designed to enable institutions to capture the benefits of AGI while guarding against both failure modes.

II. The Narrow Corridor

AGI, Governments, and Free Societies7 posits that there is a narrow corridor of liberty during the emergence of AGI, and that “a governance framework emphasizing robust technical safeguards, hybrid institutional designs that maintain meaningful human oversight, and adaptive regulatory mechanisms”8 protects against the likely kill chains that end in either the ‘despotic Leviathan’ or the ‘absent Leviathan’. These end states are the consequences of the tensions “between the need for a strong state to provide public goods and enforce rules and the desire to limit state power to protect individual freedoms, both state and societal actors continually seek to expand their power, creating constant pressure to outgrow the corridor.”9 The principal actors in this scenario act rationally to expand power or maximize utility. The analysis fails to account for actors with orthogonal goals such as destruction, chaos, or widespread panic. This third class of actor is not motivated to grow power; it acts instead to damage the institutions involved in the power struggle and thereby shatter the metaphorical Leviathan.

a. Irrational Actors

These irrational actors are not “stupid” or “unintelligent”; they simply proceed from a different set of axioms and motivations than other actors. In many cases, they can be both quite intelligent and effective in pursuing their goals. This brief considers several historical examples to motivate defenses against risks arising from individuals or groups that prioritize destructive outcomes over power, stability, or self-preservation. None of these historical cases involved AGI, but the advanced capabilities and uplift potential of emerging technology will make such attacks more dangerous. Although rare, these ‘black swan’ events require scrutiny when proposing future AGI governance frameworks.

i. Solar Temple – Internal Destruction

During the 1980s and 1990s a religious movement called the Order of the Solar Temple (OTS) gained followers, reaching 442 members at its peak in January 1989.10 The OTS was a continuation of several other occult-apocalyptic organizations and had members in France, Switzerland, Canada, Martinique, the United States, and Spain.11 Despite its occult character, the group had a number of members capable of complex, rational efforts including firearm trafficking, financial instrument manipulation, and resource management.12 As legal investigations and internal strife13 grew, the leadership of OTS shifted the group’s focus toward an “exit” from these difficulties.

OTS leadership ultimately determined that the entire organization would “transit” to the star Sirius.14 What was meant by transit, however, was a combination of ritual murders and suicides meant to endow members with “solar bodies”.15 In total, 74 members of the organization died over the course of the murder/suicide effort.16 These deaths demonstrate that a group including capable, rational persons can have underlying incentives that are misaligned with the assumptions of the Narrow Corridor (life, liberty, economic security, and the like). These values ultimately mattered less to OTS than escape to an extrasolar existence.

ii. Aum Shinrikyo – Widespread Chaos

Aum Shinrikyo (Aum), the Japanese doomsday cult responsible for the 1995 Tokyo subway attack, offers a direct parallel to the technical failure modes discussed in AGI, Governments, and Free Societies. At its height, Aum had more than 10,000 global members and assets exceeding US$1 billion.17 Like OTS, Aum was led and populated by competent, intelligent persons capable of resource management, coordination, and complex chemical, biological, radiological, and nuclear (CBRN) weapons development.

Aum likely justified its attacks through religious doctrine,18 a motivation that does not fully align with the Narrow Corridor motivational model. A number of incidents were ultimately connected with Aum: an ineffective anthrax attack,19 a sarin attack against civilians in Matsumoto leading to 8 deaths and at least 500 other casualties,20 an attempt to steal military technical documents,21 several VX attacks involving 100–200 grams of the toxin in total used against three people (one of whom died),22 and the Tokyo subway sarin attack that killed 13, seriously injured 54, and affected at least 980 more.23 Not only were these actions motivated by nihilism or doomsday thinking, but the uplift potential of AGI could have made their results far more deadly.

iii. The Tylenol Killer – Inscrutable Motives

In late 1982, seven people died from cyanide poisoning tied to the consumption of adulterated Tylenol.24 Despite several investigations, no one has been convicted of this crime, and accordingly no motive has been firmly established. Federal investigators did seem confident that the attacks were perpetrated by an individual who was convicted of extorting the manufacturer25 and who also threatened to assassinate the president as well as “murder more innocent people with cyanide-laced Tylenol.”26 Whether or not the true motive was extortion, this attack inspired multiple copycats resulting in deaths.27, 28, 29, 30, 31 The original act and the subsequent acts might have been more deadly given the capability uplifts in deployment and planning possible with AGI. Widespread random poisoning of consumer products could lead to substantial erosion of public trust. Finally, this sort of irrational act (random, inscrutable attacks) seems particularly high risk when considering the possibility of “AI-induced psychosis.”32, 33 Not only might actors with inscrutable motives carry out random attacks, but malicious actors may also introduce epistemic threats into the information ecosystem, leading to growth in this sort of attack. If the state fails to act, the risk of an ‘absent Leviathan’ grows substantially. If the state does act, the necessary increase in surveillance dramatically increases the risk of a ‘despotic Leviathan’ emerging.

III. Failure Modes

These actors represent three potential failure modes in an AGI future: “Kamikaze” actors, alignment hijacking, and information poisoning.

a. Kamikaze Actors

These actors, like the members of the Order of the Solar Temple, are not affected by deterrence strategies. Their lives are less valuable to them than their goals. The only effective countermeasure is preventing the action from occurring. These actors pose extreme risk because of the attacker/defender asymmetry problem: the bad actors only need to succeed once to be effective, but the defenders need to be perfect to prevent the associated societal damage. As a result of the uplift potential of AGI, these marginal actors would pose an increased threat due to decreased time to action, improved development and logistical capabilities, and advanced attack planning support. Because they are not seeking power, techniques aimed at diverting power centralization are ineffective; they seek only to harm the governing systems.
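The attacker/defender asymmetry can be made concrete with a small worked calculation. The sketch below is illustrative only: it assumes each attempt is independent and is stopped with the same probability, and both the function name `prob_any_success` and the numbers used are hypothetical, not drawn from the paper or its sources.

```python
def prob_any_success(p_stop: float, attempts: int) -> float:
    """Probability that at least one attack gets through.

    Assumes (for illustration) that every attempt is independent and that
    defenders stop each one with the same probability `p_stop`.
    """
    if not 0.0 <= p_stop <= 1.0:
        raise ValueError("p_stop must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Defenders must win every time: p_stop ** attempts.
    # Attackers need only one failure of the defense.
    return 1.0 - p_stop ** attempts


if __name__ == "__main__":
    # Hypothetical numbers: a defense that stops 99% of attempts.
    for n in (1, 10, 100):
        print(n, round(prob_any_success(0.99, n), 3))
```

Under these assumptions, a defense that stops 99% of attempts still lets at least one attack through roughly 63% of the time once 100 attempts are made. Anything that lowers the cost of an attempt, such as decreased time to action, raises the number of attempts and pushes that figure toward certainty.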

b. Alignment Hijacking

Well-placed individuals with irrational underlying motivations could cause widespread alignment problems if allowed access to the development, enhancement, or oversight apparatuses that are proposed in the “BHK Framework”. Individuals like the members of Aum Shinrikyo could cause great damage to the system at large by shifting AGI alignment towards bad or catastrophic outcomes.

c. Information Poisoning

If AI/AGI capabilities were harnessed not to control information, but to completely erode societal agreement on truth (epistemic fragmentation), chaos and AI-induced psychosis could run rampant. Rogue actors like the Tylenol Killer could cause widespread societal damage not through direct bad action, but through poisoning our shared information ecosystem and decreasing the relevance of actual truth to the public at large.

IV. Security Implications of Responses

The three cases above illustrate the risks posed by irrational actors. These actors’ motivations are unlikely to be affected by either deterrence or inclusion, as suggested in the “BHK Framework.” Actors like the Order of the Solar Temple have underlying motivations that are immune to deterrence, actors like Aum Shinrikyo are unlikely to respond to societal inclusion (inclusion may actually exacerbate the risk, since many Aum members were respected members of society), and actors like the Tylenol killer are sufficiently inscrutable that neither intervention is likely to be effective. Three classes of safety interventions are likely to be needed to combat these risks: (1) radical prompt transparency, (2) control of information and censored outputs, and (3) invasive ideological vetting of individual participants in the oversight process.

a. Radical Prompt Transparency

Many of the risks discussed above would require CBRN capability exposure, a risk that may be adequately managed through other safety controls. The underlying planning and logistics, however, may require substantial prompt oversight to combat. Many of the actions needed could be easily justified or explained and may not even appear harmful on the surface (e.g., returned consumer goods as an attack vector, or wind patterns and HVAC design for optimal chemical dispersion). Prompt visibility would therefore need to be coupled with oversight (human or automated) able to detect some sort of “nihilistic intent” and surface those risks. This starts to resemble a highly intrusive surveillance state in which communications are closely monitored for risk.

b. Information Controls

The risk of psychologically destabilizing information is under active study,34 but managing it may require some amount of output screening. To combat it effectively, it may become necessary to evaluate the truth of claims generated by AI systems, leading to mass information control. This exposes society to great risk of authoritarian influence.

c. Ideological Oversight

The Aum Shinrikyo organization had large numbers of members who were relatively well integrated into society. To combat the risk of infiltration of the participatory regulatory mechanisms themselves, participants would need to be carefully vetted. This also carries great totalitarian risk due to the gradual development of a “single political / societal narrative” requirement for participation.

In reacting to irrational actors to keep the narrow corridor stable, we run a great risk of shifting aggressively toward a ‘despotic Leviathan’ end state. Failing to act, however, exposes society to ever-growing risks and mistrust of institutions, and to an ‘absent Leviathan’ end state. Irrational actors pose a unique threat to societal stability, and both action and failure to act pose great stability risks.

V. Potential Defenses

Potential defenses against irrational actors could include the development of “safe” governance compute frameworks and restorable air-gapped logic, Know Your Customer (KYC) / proof-of-personhood requirements for social participation, and the introduction of red teaming based on pure malice.

To prevent alignment hijacking or long-horizon information poisoning, AI developers could begin developing air-gapped governance logic that cannot be altered, and use it periodically to validate or restore “in the wild” systems so that alignment and safety measures remain manageable in the long term.
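One way to picture the validate-or-restore step described above is as an integrity check against a protected reference copy. The sketch below is a simplified illustration, not a design from the paper: it assumes the governance logic can be represented as a byte string, that a trusted SHA-256 digest of it is stored offline, and that a mismatch should trigger a rollback. The names `digest` and `validate_or_restore` are hypothetical, and real AGI systems may not reduce to a byte-for-byte comparison.

```python
import hashlib
from typing import Tuple


def digest(policy: bytes) -> str:
    """SHA-256 fingerprint of a governance policy blob."""
    return hashlib.sha256(policy).hexdigest()


def validate_or_restore(
    deployed: bytes, golden: bytes, trusted_digest: str
) -> Tuple[bytes, bool]:
    """Return (policy_to_run, was_restored).

    `golden` stands in for the air-gapped reference copy and
    `trusted_digest` for a fingerprint recorded offline (both assumptions).
    """
    # First confirm the reference copy itself has not been altered.
    if digest(golden) != trusted_digest:
        raise RuntimeError("reference copy failed its integrity check")
    # If the deployed logic still matches, leave it in place.
    if digest(deployed) == trusted_digest:
        return deployed, False
    # Otherwise roll back to the reference copy.
    return golden, True


if __name__ == "__main__":
    golden = b"governance-logic-v1"
    trusted = digest(golden)
    print(validate_or_restore(b"governance-logic-v1", golden, trusted)[1])
    print(validate_or_restore(b"tampered-logic", golden, trusted)[1])
```

Running the check on a schedule would correspond to the periodic validation described above; how the reference copy is kept isolated and who controls the trusted digest are the harder governance questions this sketch leaves open.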

Continuing to drive towards KYC regulations provides a defense against this sort of rogue actor, and requiring proof-of-personhood can reduce the risk of information poisoning.

Malice-based red teaming should be introduced as part of standard safety practice. In addition to monitoring for misuse, misalignment, hijacking, and jailbreaking, oversight bodies should begin testing how these systems respond to potentially dangerous prompts with inscrutable or irrational motivations. This can both spur further research on how to identify and prevent these actions and increase the resilience of deployed systems.

VI. Conclusion

Irrational actors pose a unique risk to AGI-enabled societies. The ‘absent’ and ‘despotic’ Leviathans used to illustrate potential failure modes in BHK’s work need to be supplemented by a ‘shattered Leviathan’ to more fully represent the threat landscape, and additional interventions should be considered to prevent irrational actors from exploiting the capabilities and social structures that emerge during AGI development.


Justin Bullock, Samuel Hammond, & Seb Krier, AGI, Governments, and Free Societies (Mar. 13, 2025), https://arxiv.org/pdf/2503.05710.

Daron Acemoglu & James A. Robinson, The Narrow Corridor: States, Societies, and the Fate of Liberty (2019).

Bullock et al., supra note 1, at 30.

Id. at 31.

Id.

Id., abstract.

Id. at 12.

Jean-François Mayer, Our Terrestrial Journey is Coming to an End: The Last Voyage of the Solar Temple, 2 Nova Religio 172, 177 (Elijah Siegler trans., 1999).

Id. at 177.

Massimo Introvigne, Ordeal by Fire: The Tragedy of the Solar Temple 31-32 (James R. Lewis ed., 2006); Henrik Bogdan, The Order of the Solar Temple 289 (James R. Lewis & Jesper Aa. Petersen eds., 2d ed. 2014).

John Walliss, Crises of Charismatic Authority and Millenarian Violence: The Case of the Order of the Solar Temple 112 (James R. Lewis ed., 2006).

Roger C. Michaud, Bureau du coroner, Ordre du Temple solaire: rapport d’investigation du coroner au sujet des décès survenus à Morin Heights et en relation avec ceux survenus à Cheiry et à Salvan 35 (1996) (Can.).

Marc Labelle, The Ordre du Temple Solaire and the Quest for the Absolute Sun, in The Order of the Solar Temple: Prophet of the Apocalypse 149, 162 (James R. Lewis ed., 2006).

Order of the Solar Temple, Wikipedia, https://en.wikipedia.org/wiki/Order_of_the_Solar_Temple (last visited Apr. 9, 2026).

Ian Reader, Religious Violence in Contemporary Japan: The Case of Aum Shinrikyō 163–64 (2000).

Daniel A. Metraux, Religious Terrorism in Japan: The Fatal Appeal of Aum Shinrikyo, 35 Asian Surv. 1140, 1153 (1995).

Hiroshi Takahashi, Bacillus anthracis Bioterrorism Incident, Kameido, Tokyo, 1993, 10 Emerging Infectious Diseases 117, 117–20 (2004).

Kyle B. Olson, Aum Shinrikyo: Once and Future Threat?, 5 Emerging Infectious Diseases 413, 413–16 (1999).

David E. Kaplan & Andrew Marshall, The Cult at the End of the World (1996).

Pamela Zurer, Japanese Cult Used VX to Slay Member, Chem. & Eng’g News, Aug. 31, 1998, at 7.

Haruki Murakami, Underground (Alfred Birnbaum & Philip Gabriel trans., Vintage Int’l 2001) (2000).

Kori Rumore, The Tylenol Murders: 40 Years Ago, an Infamous Chicago-Area Crime Took These 7 Lives, Chi. Trib. (Sept. 30, 2022), https://www.chicagotribune.com/2022/09/30/the-tylenol-murders-40-years-ago-an-infamous-chicago-area-crime-took-these-7-lives/ (last visited Apr. 11, 2026).

Feds Convinced Lewis Was Tylenol Killer, WCVB Boston (Feb. 5, 2009, 5:29 PM), https://www.wcvb.com/article/feds-convinced-lewis-was-tylenol-killer/8028045.

Food & Drug Admin., Tamper-Evident Packaging Requirements for Over-the-Counter Human Drug Products (Nov. 4, 1998), https://web.archive.org/web/20170201023317/http://www.fda.gov/ohrms/dockets/98fr/110498a.txt (last visited Apr. 11, 2026).

Robert Hanley, 2D Tainted Bottle of Tylenol Found by Investigators, N.Y. Times (Feb. 14, 1986), https://www.nytimes.com/1986/02/14/nyregion/2d-tainted-bottle-of-tylenol-found-by-investigators.html (last visited Apr. 11, 2026).

Ray Hanania & Tom Seibel, Tylenol Legacy: Fear, New Safety: 10 Years Later, Case Still Unsolved, Chi. Sun-Times, Sept. 30, 1992, archived at https://web.archive.org/web/20160307103507/https://www.highbeam.com/doc/1P2-8061375.html (last visited Apr. 11, 2026).

Man Guilty of Killing Two in Sudafed Tampering, N.Y. Times (Apr. 4, 1993), https://www.nytimes.com/1993/04/04/us/man-guilty-of-killing-two-in-sudafed-tampering.html (last visited Apr. 11, 2026).

Police Find 2 More Containers of Tainted Excedrin, L.A. Times (June 19, 1986), https://www.latimes.com/archives/la-xpm-1986-06-19-mn-12346-story.html (last visited Apr. 11, 2026).

Elina Treyger, Joseph Matveyenko & Lynsay Ayer, Manipulating Minds: Security Implications of AI-Induced Psychosis, RAND (Dec. 8, 2025), https://www.rand.org/pubs/research_reports/RRA4435-1.html (last visited Apr. 11, 2026).

Jane Dalton, Jaswant Singh Chail: The ‘Star Wars’-Obsessed Assassin Who Planned to Kill the Queen with a Crossbow, The Independent (July 5, 2023), https://www.independent.co.uk/news/uk/crime/queen-crossbow-windsor-jaswant-singh-chail-b2369814.html (last visited Apr. 11, 2026); Kevin Roose, How AI Chatbots Are Spreading New Life into Old Conspiracy Theories, N.Y. Times (June 13, 2025), https://www.nytimes.com/2025/06/13/technology/chatgpt-ai-chatbots-conspiracies.html (last visited Apr. 11, 2026); Maria Dinzeo, OpenAI, Microsoft Face Lawsuit in Case Alleging ChatGPT Drove Man to Commit Murder-Suicide, The Recorder (Dec. 12, 2025), https://www.law.com/therecorder/2025/12/12/openai-microsoft-face-lawsuit-in-case-alleging-chatgpt-drove-man-to-commit-murder-suicide-/ (last visited Apr. 11, 2026).

Manipulating Minds, supra note 32.
