2 Preface
This thesis represents the culmination of interdisciplinary research at the intersection of AI safety, formal epistemology, and computational social science. The work emerged from recognizing a fundamental challenge in AI governance: while investment in AI safety research has grown exponentially, coordination between different stakeholder communities remains fragmented, potentially increasing existential risk through misaligned efforts.
The journey from initial concept to working implementation involved iterative refinement based on feedback from advisors, domain experts, and potential users. What began as a technical exercise in automated extraction evolved into a broader framework for enhancing epistemic security in one of humanity’s most critical coordination challenges. The AMTAIR project—Automating Transformative AI Risk Modeling—represents an attempt to build computational bridges between communities that, despite shared concerns about AI risk, often struggle to communicate effectively due to incompatible frameworks, terminologies, and implicit assumptions.
I hope this work contributes to building the intellectual and technical infrastructure necessary for humanity to navigate the transition to transformative AI safely. The tools and frameworks presented here are offered in the spirit of collaborative problem-solving, recognizing that the challenges we face require unprecedented cooperation across disciplines, institutions, and worldviews.
Acknowledgments
This thesis represents not just an intellectual journey but a deeply personal one, made possible only through the support, patience, and contributions of many remarkable people.
My supervisor, Dr. Timo Speith, provided the perfect balance of intellectual freedom and thoughtful guidance. His ability to see both the philosophical forest and the technical trees helped me navigate moments when I was lost in one or the other. His questions pushed this work beyond what I thought possible and his experience and intuition helped me steer clear of too many rabbit holes to count.
I owe my deepest gratitude to my wife, Mina Deol, whose unwavering support made these months of intensive research possible. Her patience during countless late nights, her encouragement during moments of doubt, and her willingness to listen to endless iterations of esoteric Bayesian network speculations went far beyond what any partner should reasonably endure. This work exists because she created the space for it to flourish.
I owe a particular debt to Johannes Meyer and Jelena Meyer for their meticulous work in creating independent manual extractions of the Carlsmith and Bucknall papers. Their contribution went far beyond mere validation; it provided peace of mind. As lead developer, I had harbored persistent concerns that my own intuitions might have inadvertently shaped the system’s behavior through architectural choices, prompt engineering, or source selection. Their independent extractions—showing both convergence in structure and expected variance in probabilities—allowed me to release these anxieties and trust that the system captures something real about how arguments work, not just my own biases about how they should work. Their intellectual generosity and attention to detail exemplify the collaborative spirit that makes progress possible.
Coleman Snell deserves special recognition for his partnership in developing the AMTAIR vision. Our conversations—ranging from technical implementation details to grand strategic questions about AI governance—shaped this project in fundamental ways. His ability to oscillate between pragmatic engineering concerns and ambitious long-term thinking kept the work both grounded and aspirational.
The MTAIR team’s pioneering manual approach provided both inspiration and a benchmark. David Manheim, Sam Clarke, Aryeh Englander and their collaborators demonstrated that formal modeling of AI risk arguments was possible, even if arduously manual. Their work posed the challenge this thesis attempts to answer: could we preserve their rigor while achieving scale through automation?
My family—my parents and sister—provided the foundation that made this journey possible. Their love, support, and occasional bewilderment at my career choices (“you’re teaching computers to argue about robot apocalypse? – seems all too real now”) kept me grounded in what matters. They reminded me that behind all the technical complexity lie fundamentally human concerns about our shared future.
The broader AI safety community created the intellectual raw material without which this work would be impossible. Every researcher who wrestled their intuitions into prose, every attempt to quantify the unquantifiable, every blog post and paper and comment thread contributed to the corpus that AMTAIR learns to formalize. I thank them for taking these risks seriously and for the courage to reason publicly about them.
Finally, I acknowledge the peculiar historical moment that made this work both possible and necessary. We live in an era where the tools to build potentially transformative AI exist alongside deep uncertainty about how to do so safely. This thesis represents one small attempt to build infrastructure for navigating that uncertainty collectively.
Any errors, oversights, or failures of imagination remain entirely my own responsibility. The contributions of others made this work possible; its limitations reflect only my own constraints.