Automating the Modelling of Transformative Artificial Intelligence Risks

An Epistemic Framework for Leveraging Frontier AI Systems to Upscale Conditional Policy Assessments in Bayesian Networks on a Narrow Path towards Existential Safety

Author

Valentin Jakob Meyer

Published

May 26, 2025

Abstract

The rapid development of artificial intelligence poses existential risks that current governance structures struggle to address. This thesis diagnoses a critical coordination failure: while billions flow into AI safety research, efforts remain fragmented across technical, policy, and strategic communities operating with incompatible frameworks. I present AMTAIR (Automating Transformative AI Risk Modeling), a computational system that extracts formal probabilistic models from natural-language arguments about AI risk. The approach uses frontier language models to transform unstructured text into Bayesian networks through a two-stage pipeline. First, arguments are parsed into hierarchical causal structures (ArgDown). Then, probability distributions are extracted and integrated into those structures (BayesDown). The resulting models enable systematic comparison across worldviews, evaluation of policy interventions, and integration with prediction markets for live updating. I demonstrate feasibility by extracting complex models such as Carlsmith’s power-seeking AI argument, reducing weeks of manual effort to minutes of computation. The implementation handles real-world complexity through a modular architecture, progressive visualization, and design choices that balance automation with human oversight. While extraction remains imperfect and validation is preliminary, the system provides practical value by making implicit assumptions explicit and enabling evidence-based policy evaluation. This work contributes both theoretical frameworks for understanding coordination failures and practical tools for addressing them, offering a path toward more effective governance of transformative AI development.
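The first stage of the pipeline, turning a hierarchical argument outline into a causal graph, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a simplified outline syntax (one bracketed node per line, an optional `+` marker, and deeper indentation marking a cause of the node above), assumes the outline already exists (in AMTAIR it would be produced from source text by a language model), and the names `OUTLINE`, `LINE_RE`, and `parse_outline`, as well as the node labels, are hypothetical. The actual ArgDown grammar and AMTAIR code may differ.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical, simplified ArgDown-style outline (illustrative; not the exact AMTAIR grammar).
# Each line holds one bracketed node; a deeper-indented node is read as a cause of the
# nearest less-indented node above it.
OUTLINE = """\
[Catastrophe]
 + [Misaligned_Power_Seeking]
  + [APS_Systems]
  + [Deployment]
 + [Failed_Correction]
"""

# Leading whitespace (captured), an optional "+" marker, then a [Node_Name] (captured).
LINE_RE = re.compile(r"^(\s*)(?:\+\s*)?\[([^\]]+)\]")


def parse_outline(text: str) -> dict[str, list[str]]:
    """Map each node to the list of its parent (cause) nodes."""
    parents: dict[str, list[str]] = {}
    stack: list[tuple[int, str]] = []  # (indent, node) along the current branch
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or non-node lines
        indent, node = len(match.group(1)), match.group(2)
        parents.setdefault(node, [])
        # Drop siblings and deeper nodes so the stack top is this node's effect.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            effect = stack[-1][1]
            if node not in parents[effect]:
                parents[effect].append(node)
        stack.append((indent, node))
    return parents


dag = parse_outline(OUTLINE)
# graphlib expects node -> predecessors, which is exactly node -> parents;
# static_order() raises graphlib.CycleError if the structure is not a DAG.
order = list(TopologicalSorter(dag).static_order())
print(dag)
print(order)  # every cause appears before the node it influences
```

Running the result through `graphlib.TopologicalSorter` is a cheap check that the extracted structure is acyclic, which a Bayesian network requires; the second stage would then attach a conditional probability table to each node.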

Glossary

  • Argument mapping: A method for visually representing the structure of arguments
  • BayesDown: An extension of ArgDown that incorporates probabilistic information
  • Bayesian network: A probabilistic graphical model that represents a set of variables and their conditional dependencies as a directed acyclic graph
  • Conditional probability: The probability of an event given that another event has occurred
  • Directed Acyclic Graph (DAG): A graph with directed edges and no cycles
  • Existential risk: Risk of permanent curtailment of humanity’s potential
  • Mesa-optimization: The situation in which a learned model is itself an optimizer, pursuing an objective that may differ from the training objective of the process that produced it
  • Power-seeking AI: AI systems with instrumental incentives to acquire resources and power
  • Prediction market: A market where participants trade contracts that resolve based on future events
  • d-separation: A criterion for identifying conditional independence relationships in Bayesian networks
  • Monte Carlo sampling: A computational technique using random sampling to obtain numerical results
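Several of these terms (Bayesian network, conditional probability, Monte Carlo sampling) fit together in one small computation. The sketch below is a minimal, hypothetical example: the node names `A`, `B`, `C` and all probabilities are invented for illustration and are not taken from AMTAIR or any cited model. It stores one conditional probability table, computes a marginal exactly by enumeration, and estimates the same marginal by forward sampling.

```python
import itertools
import random

# Illustrative three-node network A -> C <- B; all numbers are made up for the example.
P_A = 0.6  # P(A = True)
P_B = 0.5  # P(B = True)
# Conditional probability table (CPT): P(C = True | A, B) for each parent combination.
CPT_C = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.05,
    (False, False): 0.0,
}


def exact_p_c() -> float:
    """Exact marginal P(C = True): sum P(A) * P(B) * P(C = True | A, B) over parent states."""
    total = 0.0
    for a, b in itertools.product([True, False], repeat=2):
        p_a = P_A if a else 1 - P_A
        p_b = P_B if b else 1 - P_B
        total += p_a * p_b * CPT_C[(a, b)]
    return total


def monte_carlo_p_c(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(C = True) via forward (ancestral) sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Sample the parents first, then the child from its CPT row.
        a = rng.random() < P_A
        b = rng.random() < P_B
        c = rng.random() < CPT_C[(a, b)]
        hits += int(c)
    return hits / n


print(exact_p_c())
print(monte_carlo_p_c(100_000))
```

With these numbers the exact marginal is 0.6 × 0.5 × 0.4 + 0.6 × 0.5 × 0.1 + 0.4 × 0.5 × 0.05 = 0.16, and the sampled estimate should land close to it. Because `A` and `B` share only the child `C`, they are d-separated while `C` is unobserved; observing `C` (a collider) makes them dependent.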

List of Abbreviations

AI - Artificial Intelligence
AGI - Artificial General Intelligence
AMTAIR - Automating Transformative AI Risk Modeling
API - Application Programming Interface
APS - Advanced capability, Agentic planning, Strategic awareness (AI systems)
BN - Bayesian Network
CPT - Conditional Probability Table
DAG - Directed Acyclic Graph
LLM - Large Language Model
ML - Machine Learning
MTAIR - Modeling Transformative AI Risks
NLP - Natural Language Processing
P&E - Philosophy & Economics
PDF - Portable Document Format
TAI - Transformative Artificial Intelligence