3. AMTAIR: Design and Implementation

The moment of truth in any research project comes when elegant theories meet stubborn reality. For AMTAIR, this meant transforming the vision of automated argument extraction into working code that could handle the beautiful messiness of real AI safety arguments. Let me take you through this journey from blueprint to implementation, complete with victories, defeats, and the occasional moment of “well, that’s unexpected.”

3.1 System Architecture Overview

Picture, if you will, a factory for transforming arguments into models. Raw materials enter at one end—PDFs thick with jargon, blog posts mixing insight with speculation, research papers where crucial assumptions hide in footnote 47. Finished products emerge at the other end—clean network diagrams where you can trace how Assumption A leads to Catastrophe B with probability 0.3. Actually, scratch the factory metaphor. It’s too clean, too industrial. This is more like archaeology meets interpretation meets mathematics. You’re digging through layers of argument, trying to distinguish the load-bearing claims from rhetorical flourishes, all while preserving enough context that the formalization means something.

The pipeline consists of five main stages:

  1. Text Ingestion and Preprocessing: Like a careful librarian, this stage catalogues incoming documents, normalizes their format, extracts metadata, and identifies the argumentative content worth processing.
  2. Argument Extraction: The intellectual heart of the system, where large language models perform their magic, transforming prose into structured representations.
  3. Data Transformation: The workshop where extracted arguments are refined, validated, and prepared for mathematical representation.
  4. Network Construction: The assembly line where formal Bayesian networks are instantiated, complete with conditional probability tables.
  5. Interactive Visualization: The showroom where complex models become accessible through thoughtful design and interactivity.

3.1.1 Five-Stage Pipeline Architecture

Let’s examine each stage more closely, understanding not just what they do but why they exist as separate components.

Text Ingestion and Preprocessing handles the unglamorous but essential work of standardization. Academic PDFs, with their two-column layouts and embedded figures, differ vastly from blog posts with inline code and hyperlinks. This stage creates a uniform representation while preserving essential structure and metadata. Format normalization strips away presentation while preserving content. Metadata extraction captures authorship, publication date, and citations. Relevance filtering identifies sections containing arguments rather than literature reviews or acknowledgments. Character encoding standardization prevents those maddening �replacement characters that plague text processing.

Argument Extraction represents AMTAIR’s core innovation. Using a two-stage process that mirrors human reasoning, it first identifies structural relationships (what influences what) then quantifies those relationships (how likely, how strong). This separation enables targeted prompts optimized for each task, human verification between stages, and modular improvements as LLM capabilities evolve.

Data Transformation bridges the gap between textual representations and mathematical models. It parses the BayesDown syntax into structured data, validates that the resulting network forms a proper DAG, checks probability consistency, and handles missing data intelligently.

Network Construction instantiates the formal mathematical model. This involves creating nodes and edges according to extracted structure, populating conditional probability tables, initializing inference engines, and validating the complete model.

Interactive Visualization makes the complex accessible. Through thoughtful visual encoding of probabilities and relationships, progressive disclosure of detail, interactive exploration capabilities, and multiple export formats, it serves diverse stakeholder needs.

3.1.2 Design Principles

Core Design Philosophy: The architecture embodies several principles that guided countless implementation decisions:

Modularity: Each component has clear inputs, outputs, and responsibilities. This isn’t just good software engineering—it enables independent improvement of components and graceful degradation when parts fail.

Validation Checkpoints: Between each stage, we validate outputs before proceeding. Bad extractions don’t propagate into visualization. Malformed networks trigger re-extraction rather than cryptic errors.

Human-in-the-Loop: While pursuing automation, we recognize that human judgment remains invaluable. The architecture provides natural intervention points where experts can verify and correct.

Extensibility: New document formats, improved extraction prompts, alternative visualization libraries—the architecture accommodates growth without restructuring.

The system emphasizes transparency over black-box efficiency. Users can inspect intermediate representations, understand extraction decisions, and verify transformations. This builds trust—essential for a system handling high-stakes arguments about existential risk.

3.2 The Two-Stage Extraction Process

The heart of AMTAIR beats with a two-stage rhythm: structure, then probability. This separation, which initially seemed like an implementation detail, revealed itself as fundamental to the extraction challenge.

3.2.1 Stage 1: Structural Extraction (ArgDown)

Imagine reading a complex argument about AI risk. Your first pass likely isn’t calculating exact probabilities—you’re mapping the landscape. What are the key claims? How do they relate? What supports what? Stage 1 mirrors this cognitive process.

The extraction begins with pattern recognition. Natural language contains linguistic markers of causal relationships: “leads to,” “results in,” “depends on,” “influences.” The LLM, trained on vast corpora of argumentative text, recognizes these patterns and their variations.

Consider extracting from a passage like: “The development of artificial general intelligence will likely lead to rapid capability gains through recursive self-improvement. This intelligence explosion could result in systems pursuing convergent instrumental goals, potentially including resource acquisition and self-preservation. Without solved alignment, such power-seeking behavior poses existential risks to humanity.”

The system identifies three key variables connected by causal relationships:

  • AGI Development → Intelligence Explosion
  • Intelligence Explosion → Power-Seeking Behavior
  • Power-Seeking Behavior → Existential Risk

But extraction goes beyond simple pattern matching. The system must handle complex linguistic phenomena like coreference (“this,” “such systems”), implicit relationships, conditional statements, and negative statements. The magic lies in prompt engineering that guides the LLM to consistent extraction while remaining flexible enough for diverse argument styles.

The output, formatted in ArgDown syntax, preserves both structure and semantics:

[Existential_Risk]: Threat to humanity's continued existence and flourishing.
 + [Power_Seeking_Behavior]: AI systems pursuing instrumental goals like resource acquisition.
   + [Intelligence_Explosion]: Rapid recursive self-improvement leading to superintelligence.
     + [AGI_Development]: Creation of artificial general intelligence systems.

3.2.2 Stage 2: Probability Integration (BayesDown)

With structure established, Stage 2 adds the quantitative flesh to the qualitative bones. This stage faces a different challenge: extracting numerical beliefs from text that often expresses uncertainty in frustratingly vague terms.

The process begins by generating targeted questions based on the extracted structure. For each node, we need prior probabilities. For each child-parent relationship, we need conditional probabilities. The combinatorics can be daunting—a node with three binary parents requires 8 conditional probability values.
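To make the combinatorics concrete, the sketch below enumerates the probability questions for a binary node given its binary parents. The function name is illustrative rather than part of the AMTAIR codebase, and the question wording only loosely follows the format shown later in the rain-sprinkler-grass example.

from itertools import product

def probability_questions(node, parents):
    """Enumerate the queries needed to specify a binary node's prior and CPT."""
    questions = [f"What is the probability for {node}={node}_TRUE?"]
    parent_states = [[f"{p}_TRUE", f"{p}_FALSE"] for p in parents]
    for combo in product(*parent_states):
        condition = ", ".join(f"{p}={s}" for p, s in zip(parents, combo))
        questions.append(f"What is the probability for {node}={node}_TRUE if {condition}?")
    return questions

# One prior question plus 2^3 = 8 conditional questions for three binary parents:
print(len(probability_questions("Risk", ["A", "B", "C"])))  # -> 9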

The system employs multiple strategies for probability extraction:

Explicit Extraction: When authors provide numerical estimates (“we assign 70% probability”), extraction is straightforward, though we must handle various formats and contexts.

Linguistic Mapping: Verbal expressions of uncertainty (“likely,” “almost certainly,” “plausible”) are mapped to numerical ranges. While comprehensive validation of these mappings remains future work, preliminary assessments using the methodology described above would likely reveal systematic patterns in how authors use such terms.

Comparative Reasoning: Statements like “more probable than not” or “at least as likely as X” provide bounds even without exact values.

Coherence Enforcement: Probabilities must sum correctly. If P(A|B) = 0.7, then P(not A|B) must equal 0.3. The syntax allows a future system to detect and resolve such inconsistencies.
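A minimal sketch of the kind of consistency check this enables, assuming the probability-key format used in the BayesDown listings in this chapter (the resolution strategy itself is left open here):

def find_incoherent_pairs(probs, tol=1e-6):
    """Flag complementary BayesDown entries that do not sum to one."""
    issues = []
    for key, value in probs.items():
        if key.startswith("p(true"):
            partner = key.replace("p(true", "p(false", 1)
            if partner in probs and abs(float(value) + float(probs[partner]) - 1.0) > tol:
                issues.append((key, partner))
    return issues

# 0.7 + 0.4 != 1, so this pair is reported for review:
print(find_incoherent_pairs({"p(true|b_TRUE)": "0.7", "p(false|b_TRUE)": "0.4"}))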

The result is a complete BayesDown specification:

[Existential_Risk]: Threat to humanity's continued existence. {
  "instantiations": ["true", "false"],
  "priors": {"p(true)": "0.10", "p(false)": "0.90"},
  "posteriors": {
    "p(true|power_seeking_true)": "0.65",
    "p(true|power_seeking_false)": "0.001"
  }
}

3.2.3 Why Two Stages?

The separation of structure from probability isn’t merely convenient—it’s cognitively valid and practically essential. Let me count the ways this design decision pays dividends:

Cognitive Alignment: Humans naturally separate “what relates to what” from “how likely is it.” The two-stage process mirrors this, making the system’s operation intuitive and interpretable.

Error Isolation: Structural errors (missing a key variable) differ fundamentally from probability errors (estimating 0.7 instead of 0.8). Separating stages allows targeted debugging and improvement.

Modular Validation: Experts can verify structure without needing to evaluate every probability. This enables efficient human oversight at natural checkpoints.

Flexible Quantification: Different probability sources (text extraction, expert elicitation, market data) can feed into the same structure. The architecture accommodates multiple approaches to the probability challenge.

Transparency: Users can inspect ArgDown to understand what was extracted before probabilities were added. This builds trust and enables meaningful correction.

The two-stage approach also revealed an unexpected benefit: ArgDown itself became a valuable output. Researchers began using these structural extractions for qualitative analysis, even without probability quantification. Sometimes, just making argument structure explicit provides sufficient value.

3.3 Implementation Technologies

Choosing technologies for AMTAIR resembled assembling a band—each instrument needed to excel individually while harmonizing with the ensemble. The selection criteria balanced capability, maturity, interoperability, and community support.

3.3.1 Technology Stack

Selecting technologies for a project like AMTAIR involves a peculiar form of fortune-telling. You’re choosing tools not just for present needs but for future possibilities you can’t fully anticipate. Early decisions cascade through the implementation, creating path dependencies that only become apparent months later.

The choice of Python as the primary language was perhaps the only decision that never faced serious questioning. The ecosystem for scientific computing, the availability of sophisticated libraries, the community support—all pointed in the same direction. Yet even this “obvious” choice carried hidden implications. Python’s flexibility enabled rapid prototyping but occasionally masked performance issues until they became critical.

NetworkX emerged as the natural choice for graph manipulation after brief flirtations with alternatives. Its maturity showed in countless small conveniences—algorithms I didn’t have to implement, edge cases already handled, documentation for obscure functions. Pgmpy for Bayesian network operations was less obvious. Several libraries offered similar functionality, but pgmpy’s API design aligned well with our extraction pipeline. The ability to construct networks incrementally, validate structure during construction, and perform inference without elaborate setup proved decisive.

The visualization challenge nearly derailed the project. Initial attempts with matplotlib produced static images that technically displayed the network but failed to convey understanding. The breakthrough came with PyVis, which leveraged vis.js to create interactive web-based visualizations. Suddenly, complex networks became explorable. Users could drag nodes to untangle connections, click for details, adjust physics parameters to find optimal layouts. The difference between seeing and understanding turned out to be interactivity.

The final ensemble performs beautifully:

Table 3.3.1: Overview of Tech Stack

| Component | Technology | Purpose | Why This Choice |
|---|---|---|---|
| Language Models | GPT-4, Claude | Argument extraction | State-of-the-art reasoning capabilities |
| Network Analysis | NetworkX | Graph algorithms | Mature, comprehensive, well-documented |
| Probabilistic Modeling | pgmpy | Bayesian operations | Native Python, active development |
| Visualization | PyVis | Interactive rendering | Web-based, customizable, responsive |
| Data Processing | Pandas | Structured manipulation | Industry standard, powerful operations |

Language Models form the cognitive core. GPT-4 and Claude demonstrate remarkable ability to understand complex arguments, recognize implicit structure, and maintain coherence across long extractions. The choice to support multiple models provides robustness and allows leveraging their complementary strengths.

NetworkX handles all graph-theoretic heavy lifting. From basic operations like cycle detection to advanced algorithms like centrality measurement, it provides a comprehensive toolkit that would take years to replicate.
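As a small illustration (not taken from the AMTAIR source), the DAG validation and causal ordering used throughout the pipeline amount to a handful of NetworkX calls:

import networkx as nx

# Directed edges run from cause to effect, as in the extracted structures.
G = nx.DiGraph([("Rain", "Sprinkler"), ("Rain", "Grass_Wet"), ("Sprinkler", "Grass_Wet")])

print(nx.is_directed_acyclic_graph(G))   # True: valid as a Bayesian network skeleton
print(list(nx.topological_sort(G)))      # a causal ordering: Rain before Sprinkler before Grass_Wet
print(nx.out_degree_centrality(G))       # one of many built-in centrality measures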

pgmpy bridges the gap between graph structure and probabilistic reasoning. Its clean API design maps naturally onto our extracted representations, while its inference algorithms handle the computational complexity of Bayesian reasoning.

PyVis transforms static networks into living documents. Built on vis.js, it provides smooth physics simulations, rich interactivity, and extensive customization options—all accessible through Python.
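A minimal rendering sketch, with illustrative styling rather than the exact AMTAIR configuration:

import networkx as nx
from pyvis.network import Network

G = nx.DiGraph([("Rain", "Sprinkler"), ("Rain", "Grass_Wet"), ("Sprinkler", "Grass_Wet")])

net = Network(height="600px", width="100%", directed=True)
net.from_nx(G)                                # import nodes and edges from NetworkX
net.toggle_physics(True)                      # draggable, force-directed layout
net.save_graph("rain_sprinkler_grass.html")   # standalone interactive HTML file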

Pandas might seem mundane compared to its companions, but it’s the reliable rhythm section that keeps everything together. Its ability to reshape, merge, and transform structured data makes the complex data transformations tractable.

3.3.2 Key Algorithms

Beyond the libraries lie custom algorithms that address AMTAIR-specific challenges:

Hierarchical Parsing: The algorithm that transforms indented ArgDown text into structured data represents a small miracle of recursive descent parsing adapted for our custom syntax. It maintains parent-child relationships while handling edge cases like repeated nodes and complex dependencies.

python

#| label: example_use_case
#| echo: true
#| eval: true
#| fig-cap: "example use case"
#| fig-link: "https://colab.research.google.com/github/VJMeyer/submission/blob/main/AMTAIR_Prototype/data/example_carlsmith/AMTAIR_Prototype_example_carlsmith.ipynb#scrollTo=ibjjJ34v3sQn&line=4&uniqifier=1"
#| fig-alt: "example use case"

def parsing_argdown_bayesdown(text, current_indent=0):
    """Recursively parse indented structure maintaining relationships"""
    # Track nodes at each level for parent identification
    # Handle repeated nodes by reference
    # Validate DAG property during construction

Probability Completion: Real arguments rarely specify all required probabilities. Our completion algorithm uses maximum entropy principles—when uncertain, assume maximum disorder. This provides conservative estimates that can be refined with additional information.
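A minimal sketch of this completion step, assuming binary states and the probability-key format used in this chapter: any conditional entry the text does not supply is filled with the uniform, maximum-entropy value, and its complement is derived.

from itertools import product

def complete_cpt(posteriors, node, parent_states):
    """Fill unspecified conditional probabilities with maximum-entropy defaults."""
    completed = dict(posteriors)
    for combo in product(*parent_states):
        key_true = f"p({node}_TRUE|{','.join(combo)})"
        completed.setdefault(key_true, 0.5)   # no information: assume maximum disorder
        key_false = f"p({node}_FALSE|{','.join(combo)})"
        completed.setdefault(key_false, round(1.0 - float(completed[key_true]), 6))
    return completed

partial = {"p(grass_wet_TRUE|sprinkler_TRUE,rain_TRUE)": "0.99"}
states = [["sprinkler_TRUE", "sprinkler_FALSE"], ["rain_TRUE", "rain_FALSE"]]
print(complete_cpt(partial, "grass_wet", states))   # three of four conditions default to 0.5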

Visual Encoding: The algorithm mapping probabilities to colors uses perceptual uniformity. The green-to-red gradient isn’t linear in RGB space but follows human perception of color difference. Small details, big impact on usability.
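The idea can be sketched by interpolating hue rather than raw RGB channels; a production version would interpolate in a perceptually uniform space such as CIELAB, so treat the gradient below as a simplified stand-in rather than AMTAIR's exact encoding.

import colorsys

def probability_to_color(p):
    """Map a probability in [0, 1] to a green-to-red hex color via hue interpolation."""
    hue = (1.0 - p) * (120.0 / 360.0)   # 120 deg = green (low probability), 0 deg = red (high)
    r, g, b = colorsys.hls_to_rgb(hue, 0.5, 0.9)
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))

print(probability_to_color(0.1))   # greenish
print(probability_to_color(0.9))   # reddish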

Layout Optimization: Force-directed layouts often produce “hairballs” for complex networks. Our customized approach uses hierarchical initialization based on causal depth, then refines with physics simulation. The result: layouts that reveal structure rather than obscuring it.
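A sketch of the hierarchical initialization, assuming the network is already a NetworkX DAG: nodes are grouped into topological generations (their causal depth), and each generation is assigned a horizontal band before the physics simulation refines the result.

import networkx as nx

def initial_positions(G, layer_gap=150, node_gap=200):
    """Seed layout coordinates by causal depth prior to force-directed refinement."""
    positions = {}
    for depth, generation in enumerate(nx.topological_generations(G)):
        for i, node in enumerate(sorted(generation)):
            positions[node] = (i * node_gap, depth * layer_gap)
    return positions

G = nx.DiGraph([("Rain", "Sprinkler"), ("Rain", "Grass_Wet"), ("Sprinkler", "Grass_Wet")])
print(initial_positions(G))   # Rain in the top band, Grass_Wet in the bottom one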

3.3.3 (Expected) Performance Characteristics

Performance in a system like AMTAIR involves multiple dimensions—speed, accuracy, scalability. Let’s examine what theoretical analysis and design considerations suggest about system behavior.

Computational Complexity: The extraction phase exhibits linear complexity in document length—processing twice as much text takes roughly twice as long. However, the inference phase faces exponential complexity in network connectivity. A fully connected network with n binary nodes requires O(2^n) operations for exact inference. This fundamental limitation shapes practical usage patterns.

Practical Implications: Small networks (<20 nodes) enable real-time interaction with exact inference. Medium networks (20-50 nodes) require seconds to minutes depending on connectivity. Large networks (>50 nodes) necessitate approximate methods, trading accuracy for tractability. Very large networks push the boundaries of current methods.

The bottleneck shifts predictably: extraction remains manageable even for lengthy documents, but inference becomes challenging as models grow. This suggests a natural workflow—extract comprehensively, then focus on relevant subnetworks for detailed analysis.
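Focusing on a subnetwork is itself a one-liner with NetworkX: keep a node of interest together with its ancestors (everything that can influence it) and run inference on the induced subgraph. The edges below reuse variable names from the earlier example and are purely illustrative.

import networkx as nx

def focus_subnetwork(G, target):
    """Return the subgraph containing `target` and everything upstream of it."""
    relevant = nx.ancestors(G, target) | {target}
    return G.subgraph(relevant).copy()

G = nx.DiGraph([("AGI_Development", "Intelligence_Explosion"),
                ("Intelligence_Explosion", "Power_Seeking_Behavior"),
                ("Power_Seeking_Behavior", "Existential_Risk")])
sub = focus_subnetwork(G, "Power_Seeking_Behavior")
print(list(sub.nodes()))   # Existential_Risk is excluded; its upstream chain is kept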

Optimization Opportunities: Several strategies could improve performance: caching frequent inference queries, hierarchical decomposition of large networks, parallel processing for independent subgraphs, and progressive rendering for visualization. The modular architecture accommodates these enhancements without fundamental restructuring.

3.3.4 Deterministic vs. Probabilistic Components of the Workflow

An interesting philosophical question arises: in a system reasoning about probability, which components should themselves be probabilistic?

The current implementation draws a clear line:

Deterministic Components: All data transformations, graph algorithms, and inference calculations operate deterministically. Given the same input, they produce identical output. This provides reproducibility and debuggability—essential for building trust.

Probabilistic Components: The LLM calls for extraction introduce variability. Even with temperature set to 0, language models exhibit some randomness. Different runs might extract slightly different structures or probability estimates from the same text.

This division reflects a deeper principle: use determinism wherever possible, embrace probability where necessary. The extraction task—interpreting natural language—inherently involves uncertainty. But once we have formal representations, all subsequent operations should be predictable.

From an information-theoretic perspective, we’re trying to extract maximum information from documents within computational budget constraints. Each document contains some finite amount of formalizable argument structure. Our goal is recovering as much as possible given realistic resource limits.

The two-stage extraction can be viewed as successive refinement—first recovering the higher-order bits (structure), then filling in lower-order bits (probabilities). This aligns with rate-distortion theory, where we get the most important information first.

3.4 Case Study: Rain-Sprinkler-Grass

Every field has its canonical examples—physics has spherical cows, economics has widget factories, and Bayesian networks have the rain-sprinkler-grass scenario. Despite its simplicity, this example teaches profound lessons about causal reasoning and serves as the perfect test case for AMTAIR.

3.4.1 Processing Steps

Let me walk you through how AMTAIR processes this foundational example:

The input arrives as a simple text description: “When it rains, the grass gets wet. The sprinkler also makes the grass wet. However, when it rains, we usually don’t run the sprinkler.”

From this prosaic description, the system performs five transformations:

  1. ArgDown Parsing: Extract three variables (Rain, Sprinkler, Grass_Wet) and identify that rain influences both sprinkler usage and grass wetness, while the sprinkler also influences grass wetness.
  2. Question Generation: Create probability queries: What’s P(Rain)? What’s P(Sprinkler|Rain)? What’s P(Grass_Wet|Rain,Sprinkler) for all combinations?
  3. BayesDown Extraction: Either extract probabilities from text or apply reasonable defaults. The “usually don’t run” becomes P(Sprinkler|Rain) ≈ 0.01.
  4. Network Construction: Build the formal Bayesian network with three nodes, three edges, and complete conditional probability tables.
  5. Visualization Rendering: Create an interactive display where rain appears as a root cause, influencing both sprinkler and grass directly.

Each step validates its outputs before proceeding, ensuring that errors don’t cascade through the pipeline.

3.4.2 Example Conversion Steps

Let’s trace the actual transformations to see the pipeline in action:

Initial ArgDown Extraction:

[Grass_Wet]: Concentrated moisture on, between and around the blades of grass.{"instantiations": ["grass_wet_TRUE", "grass_wet_FALSE"]}    
  + [Rain]: Tears of angles crying high up in the skies hitting the ground.{"instantiations": ["rain_TRUE", "rain_FALSE"]}
  + [Sprinkler]: Activation of a centrifugal force based CO2 droplet distribution system.{"instantiations": ["sprinkler_TRUE", "sprinkler_FALSE"]}
      + [Rain]

The hierarchy captures that rain influences sprinkler usage—a subtle but important causal relationship that pure correlation would miss.

Generated Questions for Probability Extraction: the system embeds the probability queries it needs answered as comments directly in the BayesDown skeleton.

BayesDown Format Preview:
# BayesDown Representation with Placeholder Probabilities

/* This file contains BayesDown syntax with placeholder probabilities.
   Replace the placeholders with actual probability values based on the 
   questions in the comments. */

    /* What is the probability for Grass_Wet=grass_wet_TRUE? */
    /* What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_TRUE, Sprinkler=sprinkler_TRUE? */
    /* What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_TRUE, Sprinkler=sprinkler_FALSE? */
    /* What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_FALSE, Sprinkler=sprinkler_TRUE? */
    /* What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_FALSE, Sprinkler=sprinkler_FALSE? */
    /* What is the probability for Grass_Wet=grass_wet_FALSE? */
    /* What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_TRUE, Sprinkler=sprinkler_TRUE? */
    /* What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_TRUE, Sprinkler=sprinkler_FALSE? */
    /* What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_FALSE, Sprinkler=sprinkler_TRUE? */
    /* What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_FALSE, Sprinkler=sprinkler_FALSE? */
    [Grass_Wet]: Concentrated moisture on, between and around the blades of grass. {"instantiations": ["grass_wet_TRUE", "grass_wet_FALSE"], "priors": {"What is the probability for Grass_Wet=grass_wet_TRUE?": "%?", "What is the probability for Grass_Wet=grass_wet_FALSE?": "%?"}, "posteriors": {"What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_TRUE, Sprinkler=sprinkler_TRUE?": "?%", "What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_TRUE, Sprinkler=sprinkler_FALSE?": "?%", "What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_FALSE, Sprinkler=sprinkler_TRUE?": "?%", "What is the probability for Grass_Wet=grass_wet_TRUE if Rain=rain_FALSE, Sprinkler=sprinkler_FALSE?": "?%", "What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_TRUE, Sprinkler=sprinkler_TRUE?": "?%", "What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_TRUE, Sprinkler=sprinkler_FALSE?": "?%", "What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_FALSE, Sprinkler=sprinkler_TRUE?": "?%", "What is the probability for Grass_Wet=grass_wet_FALSE if Rain=rain_FALSE, Sprinkler=sprinkler_FALSE?": "?%"}}
        /* What is the probability for Rain=rain_TRUE? */
        /* What is the probability for Rain=rain_FALSE? */
        + [Rain]: Tears of angles crying high up in the skies hitting the ground. {"instantiations": ["rain_TRUE", "rain_FALSE"], "priors": {"What is the probability for Rain=rain_TRUE?": "%?", "What is the probability for Rain=rain_FALSE?": "%?"}}
        /* What is the probability for Sprinkler=sprinkler_TRUE? */
        /* What is the probability for Sprinkler=sprinkler_TRUE if Rain=rain_TRUE? */
        /* What is the probability for Sprinkler=sprinkler_TRUE if Rain=rain_FALSE? */
        /* What is the probability for Sprinkler=sprinkler_FALSE? */
        /* What is the probability for Sprinkler=sprinkler_FALSE if Rain=rain_TRUE? */
        /* What is the probability for Sprinkler=sprinkler_FALSE if Rain=rain_FALSE? */
        + [Sprinkler]: Activation of a centrifugal force based CO2 droplet distribution system. {"instantiations": ["sprinkler_TRUE", "sprinkler_FALSE"], "priors": {"What is the probability for Sprinkler=sprinkler_TRUE?": "%?", "What is the probability for Sprinkler=sprinkler_FALSE?": "%?"}, "posteriors": {"What is the probability for Sprinkler=sprinkler_TRUE if Rain=rain_TRUE?": "?%", "What is the probability for Sprinkler=sprinkler_TRUE if Rain=rain_FALSE?": "?%", "What is the probability for Sprinkler=sprinkler_FALSE if Rain=rain_TRUE?": "?%", "What is the probability for Sprinkler=sprinkler_FALSE if Rain=rain_FALSE?": "?%"}}
            /* What is the probability for Rain=rain_TRUE? */
            /* What is the probability for Rain=rain_FALSE? */
            + [Rain]

The system generates exactly the questions needed to fully specify the network.

Complete BayesDown Result:

[Grass_Wet]: Concentrated moisture on, between and around the blades of grass. {"instantiations": ["grass_wet_TRUE", "grass_wet_FALSE"], "priors": {"p(grass_wet_TRUE)": "0.322", "p(grass_wet_FALSE)": "0.678"}, "posteriors": {"p(grass_wet_TRUE|sprinkler_TRUE,rain_TRUE)": "0.99", "p(grass_wet_TRUE|sprinkler_TRUE,rain_FALSE)": "0.9", "p(grass_wet_TRUE|sprinkler_FALSE,rain_TRUE)": "0.8", "p(grass_wet_TRUE|sprinkler_FALSE,rain_FALSE)": "0.01", "p(grass_wet_FALSE|sprinkler_TRUE,rain_TRUE)": "0.01", "p(grass_wet_FALSE|sprinkler_TRUE,rain_FALSE)": "0.1", "p(grass_wet_FALSE|sprinkler_FALSE,rain_TRUE)": "0.2", "p(grass_wet_FALSE|sprinkler_FALSE,rain_FALSE)": "0.99"}}
    +[Rain]: Tears of angles crying high up in the skies hitting the ground. {"instantiations": ["rain_TRUE", "rain_FALSE"], "priors": {"p(rain_TRUE)": "0.2", "p(rain_FALSE)": "0.8"}}
    +[Sprinkler]: Activation of a centrifugal force based CO2 droplet distribution system. {"instantiations": ["sprinkler_TRUE", "sprinkler_FALSE"], "priors": {"p(sprinkler_TRUE)": "0.44838", "p(sprinkler_FALSE)": "0.55162"}, "posteriors": {"p(sprinkler_TRUE|rain_TRUE)": "0.01", "p(sprinkler_TRUE|rain_FALSE)": "0.4", "p(sprinkler_FALSE|rain_TRUE)": "0.99", "p(sprinkler_FALSE|rain_FALSE)": "0.6"}}
        +[Rain]

Notice how the probabilities tell a coherent story—grass is almost certainly wet if either water source is active, almost certainly dry if neither is.

Resulting DataFrame Structure:

The transformation into tabular format enables standard data analysis tools while preserving all relationships and probabilities. Each row represents a node with its properties, parents, children, and probability distributions.

3.4.3 Results

Table 3.5.3: Extracted BayesDown data structure for rain-sprinkler-grass example
Title Description line line_numbers indentation indentation_levels Parents Children instantiations priors posteriors No_Parent No_Children parent_instantiations
Grass_Wet Concentrated moisture on, between and around the blades of grass 3 [3] 0 [0] [Rain, Sprinkler] [] [grass_wet_TRUE, grass_wet_FALSE] {‘p(grass_wet_TRUE)’: ‘0.322’, ‘p(grass_wet_FALSE)’: ‘0.678’} {‘p(grass_wet_TRUE|sprinkler_TRUE,rain_TRUE)’: ‘0.99’, ‘p(grass_wet_TRUE|sprinkler_TRUE,rain_FALSE)’: ‘0.9’, ‘p(grass_wet_TRUE|sprinkler_FALSE,rain_TRUE)’: ‘0.8’, ‘p(grass_wet_TRUE|sprinkler_FALSE,rain_FALSE)’: ‘0.01’} False True [[rain_TRUE, rain_FALSE], [sprinkler_TRUE, sprinkler_FALSE]]
Rain Tears of angles crying high up in the skies hitting the ground 4 [4, 6] 2 [1, 2] [] [Grass_Wet, Sprinkler] [rain_TRUE, rain_FALSE] {‘p(rain_TRUE)’: ‘0.2’, ‘p(rain_FALSE)’: ‘0.8’} {} True False []
Sprinkler Activation of a centrifugal force based CO2 droplet distribution system 5 [5] 1 [1] [Rain] [Grass_Wet] [sprinkler_TRUE, sprinkler_FALSE] {‘p(sprinkler_TRUE)’: ‘0.44838’, ‘p(sprinkler_FALSE)’: ‘0.55162’} {‘p(sprinkler_TRUE|rain_TRUE)’: ‘0.01’, ‘p(sprinkler_TRUE|rain_FALSE)’: ‘0.4’} False False [[rain_TRUE, rain_FALSE]]

The successfully processed rain-sprinkler-grass example demonstrates several key capabilities:

Structure Preservation: The causal relationships—including the subtle influence of rain on sprinkler usage—are correctly captured and maintained throughout processing.

Probability Coherence: All probability distributions sum to 1.0, conditional probabilities are complete, and the values tell a plausible story.

Visual Clarity: The rendered network clearly shows rain as the root cause, influencing both sprinkler and grass, while sprinkler provides an additional pathway to wet grass.

Interactive Exploration: Users can click nodes to see detailed probabilities, drag to rearrange for clarity, and explore how changing parameters affects outcomes.

Inference Capability: The system correctly calculates derived probabilities like P(Rain|Grass_Wet)—the diagnostic reasoning from effect to cause that makes Bayesian networks so powerful.
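To make the inference step concrete, here is a minimal, self-contained pgmpy sketch of the rain-sprinkler-grass network using the probability values from the table above; it answers the diagnostic query P(Rain | Grass_Wet = true). Depending on the installed pgmpy version the model class is `BayesianNetwork` or the older `BayesianModel`; the rest is standard pgmpy API, not code lifted from the AMTAIR notebook.

from pgmpy.models import BayesianNetwork            # `BayesianModel` in older pgmpy releases
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("Rain", "Sprinkler"), ("Rain", "Grass_Wet"), ("Sprinkler", "Grass_Wet")])

cpd_rain = TabularCPD("Rain", 2, [[0.2], [0.8]],
                      state_names={"Rain": ["TRUE", "FALSE"]})
cpd_sprinkler = TabularCPD("Sprinkler", 2, [[0.01, 0.4], [0.99, 0.6]],
                           evidence=["Rain"], evidence_card=[2],
                           state_names={"Sprinkler": ["TRUE", "FALSE"], "Rain": ["TRUE", "FALSE"]})
cpd_grass = TabularCPD("Grass_Wet", 2,
                       [[0.99, 0.90, 0.80, 0.01],    # P(Grass_Wet=TRUE | Sprinkler, Rain)
                        [0.01, 0.10, 0.20, 0.99]],   # columns: (T,T), (T,F), (F,T), (F,F)
                       evidence=["Sprinkler", "Rain"], evidence_card=[2, 2],
                       state_names={"Grass_Wet": ["TRUE", "FALSE"],
                                    "Sprinkler": ["TRUE", "FALSE"],
                                    "Rain": ["TRUE", "FALSE"]})

model.add_cpds(cpd_rain, cpd_sprinkler, cpd_grass)
assert model.check_model()

inference = VariableElimination(model)
print(inference.query(["Rain"], evidence={"Grass_Wet": "TRUE"}))   # P(Rain | wet grass) rises well above the 0.2 prior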

This simple example validates the basic pipeline functionality. But the real test comes with complex, real-world arguments …

Rain-Sprinkler-Grass Network Rendering

Code
from IPython.display import IFrame

IFrame(src="https://singularitysmith.github.io/AMTAIR_Prototype/bayesian_network.html", width="100%", height="600px")

Dynamic HTML Rendering of the Rain-Sprinkler-Grass DAG with Conditional Probabilities

3.5 Case Study: Carlsmith’s Power-Seeking AI Model

Having validated the implementation on the canonical rain-sprinkler-grass example, I applied the AMTAIR approach to a substantially more complex real-world case: Joseph Carlsmith’s model of existential risk from power-seeking AI. This application demonstrates the system’s ability to handle sophisticated multi-level arguments with numerous variables and relationships.

Carlsmith’s model represents a dramatic increase in complexity—both conceptually and computationally. Where rain-sprinkler-grass has 3 nodes, Carlsmith involves 23. Where grass wetness is intuitive, “mesa-optimization” and “corrigibility” require careful thought.

3.5.1 Model Complexity

The numbers tell only part of the story:

  • 23 nodes: Each representing a substantive claim about AI development, deployment, or risk
  • 29 edges: Encoding causal relationships across technical, strategic, and societal domains
  • Multiple probability tables: Many nodes have several parents, creating combinatorial explosion
  • Six-level causal depth: From root causes to final catastrophe, influence propagates through multiple stages

But the conceptual complexity dwarfs the computational. Nodes like “APS-Systems” (Advanced, Planning, Strategically aware) encode specific technical hypotheses. Relationships like how “incentives to build” influence “deployment despite misalignment” require understanding of organizational behavior under competitive pressure.

This is no longer a toy problem but a serious attempt to formalize one of the most important arguments of our time.

3.5.2 Automated Extraction of Carlsmith’s Argument Structure

The extraction process began with feeding Carlsmith’s paper to AMTAIR. Watching the system work felt like observing an archaeological excavation—layers of argument slowly revealed their structure.

The LLM prompts for extraction deserve special attention. Through iterative refinement, we developed prompts that guide extraction while remaining flexible:

#| label: prompt_template_function
#| echo: true
#| eval: true
#| fig-cap: "Prompt Template Function Definitions"
#| fig-link: "https://colab.research.google.com/github/VJMeyer/submission/blob/main/AMTAIR_Prototype/data/example_carlsmith/AMTAIR_Prototype_example_carlsmith.ipynb#scrollTo=MJpgdepF2Ug3&line=5&uniqifier=1"
#| fig-alt: "Prompt Template Function Definitions"
Code
# @title 1.2.0 --- Prompt Template Function Definitions --- [prompt_template_function]

"""
BLOCK PURPOSE: Defines a flexible template system for LLM prompts used in the extraction pipeline.

This block implements two key classes:
1. PromptTemplate: A template class supporting variable substitution for dynamic prompts
2. PromptLibrary: A collection of pre-defined prompt templates for different extraction tasks

These templates are used in the ArgDown and BayesDown probability extraction
stages of the pipeline, providing consistent and well-structured prompts to the LLMs.

DEPENDENCIES: string.Template for variable substitution
OUTPUTS: PromptTemplate and PromptLibrary classes
"""

from string import Template
from typing import Dict, Optional, Union, List

class PromptTemplate:
    """Template system for LLM prompts with variable substitution"""

    def __init__(self, template: str):
        """Initialize with template string using $variable format"""
        self.template = Template(template)

    def format(self, **kwargs) -> str:
        """Substitute variables in the template"""
        return self.template.safe_substitute(**kwargs)

    @classmethod
    def from_file(cls, filepath: str) -> 'PromptTemplate':
        """Load template from a file"""
        with open(filepath, 'r') as f:
            template = f.read()
        return cls(template)

class PromptLibrary:
    """Collection of prompt templates for different extraction tasks"""

    # ArgDown extraction prompt - transforms source text into structured argument map
    ARGDOWN_EXTRACTION = PromptTemplate("""
You are participating in the AMTAIR (Automating Transformative AI Risk Modeling)
project and you are tasked with converting natural language arguments into
ArgDown syntax by extracting and formalizing causal world models from
unstructured text.
Your specific task is to extract the implicit causal model from the provided
document in structured ArgDown format.

## Epistemic Foundation & Purpose

This extraction represents one possible interpretation of the implicit causal
model in the document. Multiple extractions from the same text help reveal
patterns of convergence (where the model is clearly articulated) and
divergence (where the model contains ambiguities). This approach acknowledges
that expert texts often contain implicit rather than explicit causal models.

Your role is to reveal the causal structure already present in the author's
thinking, maintaining epistemic humility about your interpretation while
adhering strictly to the required format.

## ArgDown Format Specification

### Core Syntax

ArgDown represents causal relationships using a hierarchical structure:

1. Variables appear in square brackets with descriptive text:
   `[Variable_Name]: Description of the variable.`

2. Causal relationships use indentation (2 spaces per level) and '+' symbols:

[Effect]: Description of effect.
  + [Cause]: Description of cause.
    + [Deeper_Cause]: Description of deeper cause.

3. Causality flows from bottom (more indented) to top (less indented):
- More indented variables (causes) influence less indented variables (effects)
- The top-level variable is the ultimate effect or outcome
- Deeper indentation levels represent root causes or earlier factors

4. Each variable must include JSON metadata with possible states (instantiations):
`[Variable]: Description. {"instantiations": ["variable_STATE1", "variable_STATE2"]}`

### JSON Metadata Format

The JSON metadata must follow this exact structure:

```json
{"instantiations": ["variable_STATE1", "variable_STATE2"]}
```

Requirements:
* Double quotes (not single) around field names and string values
* Square brackets enclosing the instantiations array
* Comma separation between array elements
* No trailing comma after the last element
* Must be valid JSON syntax that can be parsed by standard JSON parsers

For binary variables (most common case):
{"instantiations": ["variable_TRUE", "variable_FALSE"]}

For multi-state variables (when clearly specified in the text):
{"instantiations": ["variable_HIGH", "variable_MEDIUM", "variable_LOW"]}

The metadata must appear on the same line as the variable definition, after the description.
## Complex Structural Patterns
### Variables Influencing Multiple Effects
The same variable can appear multiple times in different places in the hierarchy if it influences multiple effects:
[Effect1]: First effect description. {"instantiations": ["effect1_TRUE", "effect1_FALSE"]}
  + [Cause_A]: Description of cause A. {"instantiations": ["cause_a_TRUE", "cause_a_FALSE"]}

[Effect2]: Second effect description. {"instantiations": ["effect2_TRUE", "effect2_FALSE"]}
  + [Cause_A]
  + [Cause_B]: Description of cause B. {"instantiations": ["cause_b_TRUE", "cause_b_FALSE"]}

### Multiple Causes of the Same Effect
Multiple causes can influence the same effect by being listed at the same indentation level:
[Effect]: Description of effect. {"instantiations": ["effect_TRUE", "effect_FALSE"]}
  + [Cause1]: Description of first cause. {"instantiations": ["cause1_TRUE", "cause1_FALSE"]}
  + [Cause2]: Description of second cause. {"instantiations": ["cause2_TRUE", "cause2_FALSE"]}
    + [Deeper_Cause]: A cause that influences Cause2. {"instantiations": ["deeper_cause_TRUE", "deeper_cause_FALSE"]}

### Causal Chains
Causal chains are represented through multiple levels of indentation:
[Ultimate_Effect]: The final outcome. {"instantiations": ["ultimate_effect_TRUE", "ultimate_effect_FALSE"]}
  + [Intermediate_Effect]: A mediating variable. {"instantiations": ["intermediate_effect_TRUE", "intermediate_effect_FALSE"]}
    + [Root_Cause]: The initial cause. {"instantiations": ["root_cause_TRUE", "root_cause_FALSE"]}
  + [2nd_Intermediate_Effect]: A mediating variable. {"instantiations": ["intermediate_effect_TRUE", "intermediate_effect_FALSE"]}


### Common Cause of Multiple Variables
A common cause affecting multiple variables is represented by referencing the same variable in multiple places:
[Effect1]: First effect description. {"instantiations": ["effect1_TRUE", "effect1_FALSE"]}
  + [Common_Cause]: Description of common cause. {"instantiations": ["common_cause_TRUE", "common_cause_FALSE"]}

[Effect2]: Second effect description. {"instantiations": ["effect2_TRUE", "effect2_FALSE"]}
  + [Common_Cause]

## Detailed Extraction Workflow
Please follow this step-by-step process, documenting your reasoning in XML tags:
<analysis>
First, conduct a holistic analysis of the document:
1. Identify the main subject matter or domain
2. Note key concepts, variables, and factors discussed
3. Pay attention to language indicating causal relationships (causes, affects, influences, depends on, etc.)
4. Look for the ultimate outcomes or effects that are the focus of the document
5. Record your general understanding of the document's implicit causal structure
</analysis>
<variable_identification>
Next, identify and list the key variables in the causal model:
* Focus on factors that are discussed as having an influence or being influenced
* For each variable:
  * Create a descriptive name in [square_brackets]
  * Write a concise description based directly on the text
  * Determine possible states (usually binary TRUE/FALSE unless clearly specified)
* Distinguish between:
  * Outcome variables (effects the author is concerned with)
  * Intermediate variables (both causes and effects in chains)
  * Root cause variables (exogenous factors in the model)
* List all identified variables with their descriptions and possible states
</variable_identification>

<causal_structure>
Then, determine the causal relationships between variables:
* For each variable, identify what factors influence it
* Note the direction of causality (what causes what)
* Look for mediating variables in causal chains
* Identify common causes of multiple effects
* Capture feedback loops if present (though they must be represented as DAGs)
* Map out the hierarchical structure of the causal model
</causal_structure>

<format_conversion>
Now, convert your analysis into proper ArgDown format:
* Start with the ultimate outcome variables at the top level
* Place direct causes indented below with '+' symbols
* Continue with deeper causes at further indentation levels
* Add variable descriptions and instantiations metadata
* Ensure variables appearing in multiple places have consistent names
* Check that the entire structure forms a valid directed acyclic graph
</format_conversion>

<validation>

Finally, review your extraction for quality and format correctness:
1. Verify all variables have properly formatted metadata
2. Check that indentation properly represents causal direction
3. Confirm the extraction accurately reflects the document's implicit model
4. Ensure no cycles exist in the causal structure
5. Verify that variables referenced multiple times are consistent
6. Check that the extraction would be useful for subsequent analysis

</validation>


## Source Document Analysis Guidance
When analyzing the source document:
* Focus on revealing the author's own causal model, not imposing an external framework
* Maintain the author's terminology where possible
* Look for both explicit statements of causality and implicit assumptions
* Pay attention to the relative importance the author assigns to different factors
* Notice where the author expresses certainty versus uncertainty
* Consider the level of granularity appropriate to the document's own analysis

Remember that your goal is to make the implicit model explicit, not to evaluate or improve it.
The value lies in accurately representing the author's perspective, even if you might personally disagree or see limitations in their model.

""")

    # BayesDown probability extraction prompt - enhances ArgDown with probability information
    BAYESDOWN_EXTRACTION = PromptTemplate("""
You are an expert in probabilistic reasoning and Bayesian networks. Your task is
to extend the provided ArgDown structure with probability information,
creating a BayesDown representation.

For each statement in the ArgDown structure, you need to:
1. Estimate prior probabilities for each possible state
2. Estimate conditional probabilities given parent states
3. Maintain the original structure and relationships

Here is the format to follow:
[Node]: Description. { "instantiations": ["node_TRUE", "node_FALSE"], "priors": { "p(node_TRUE)": "0.7", "p(node_FALSE)": "0.3" }, "posteriors": { "p(node_TRUE|parent_TRUE)": "0.9", "p(node_TRUE|parent_FALSE)": "0.4", "p(node_FALSE|parent_TRUE)": "0.1", "p(node_FALSE|parent_FALSE)": "0.6" } }
 [Parent]: Parent description. {...}


Here are the specific probability questions to answer:
$questions

ArgDown structure to enhance:
$argdown

Provide the complete BayesDown representation with probabilities:
""")

    @classmethod
    def get_template(cls, template_name: str) -> PromptTemplate:
        """Get a prompt template by name"""
        if hasattr(cls, template_name):
            return getattr(cls, template_name)
        else:
            raise ValueError(f"Template not found: {template_name}")

Prompting LLMs for ArgDown Extraction

The extraction revealed Carlsmith’s elegant decomposition. At the highest level: capabilities enable power-seeking, which enables disempowerment, which constitutes catastrophe. But the details matter—deployment decisions mediated by incentives and deception, alignment difficulty influenced by multiple technical factors, corrective mechanisms that might interrupt the chain.

The ArgDown representation captured this structure; the parsing code below then transforms such representations into the structured DataFrame used by the rest of the pipeline:

Code
# @title 1.7.0 --- Parsing ArgDown & BayesDown (.md to .csv) --- [parsing_argdown_bayesdown]

"""
BLOCK PURPOSE: Provides the core parsing functionality for transforming ArgDown
and BayesDown text representations into structured DataFrame format for further
processing.

This block implements the critical extraction pipeline described in the AMTAIR
project (see PY_TechnicalImplementation) that converts argument structures
into Bayesian networks.
The function can handle both basic ArgDown (structure-only) and
BayesDown (with probabilities).

Key steps in the parsing process:
1. Remove comments from the markdown text
2. Extract titles, descriptions, and indentation levels
3. Establish parent-child relationships based on indentation
4. Convert the structured information into a DataFrame
5. Add derived columns for network analysis

DEPENDENCIES: pandas, re, json libraries
INPUTS: Markdown text in ArgDown/BayesDown format
OUTPUTS: Structured DataFrame with node information, relationships, and properties
"""

def parse_markdown_hierarchy_fixed(markdown_text, ArgDown=False):
    """
    Parse ArgDown or BayesDown format into a structured DataFrame with parent-child relationships.

    Args:
        markdown_text (str): Text in ArgDown or BayesDown format
        ArgDown (bool): If True, extracts only structure without probabilities
                        If False, extracts both structure and probability information

    Returns:
        pandas.DataFrame: Structured data with node information, relationships, and attributes
    """
    # PHASE 1: Clean and prepare the text
    clean_text = remove_comments(markdown_text)

    # PHASE 2: Extract basic information about nodes
    titles_info = extract_titles_info(clean_text)

    # PHASE 3: Determine the hierarchical relationships
    titles_with_relations = establish_relationships_fixed(titles_info, clean_text)

    # PHASE 4: Convert to structured DataFrame format
    df = convert_to_dataframe(titles_with_relations, ArgDown)

    # PHASE 5: Add derived columns for analysis
    df = add_no_parent_no_child_columns_to_df(df)
    df = add_parents_instantiation_columns_to_df(df)

    return df

def remove_comments(markdown_text):
    """
    Remove comment blocks from markdown text using regex pattern matching.

    Args:
        markdown_text (str): Text containing potential comment blocks

    Returns:
        str: Text with comment blocks removed
    """
    # Remove anything between /* and */ using regex
    return re.sub(r'/\*.*?\*/', '', markdown_text, flags=re.DOTALL)

def extract_titles_info(text):
    """
    Extract titles with their descriptions and indentation levels from markdown text.

    Args:
        text (str): Cleaned markdown text

    Returns:
        dict: Dictionary with titles as keys and dictionaries of attributes as values
    """
    lines = text.split('\n')
    titles_info = {}

    for line in lines:
        # Skip empty lines
        if not line.strip():
            continue

        # Extract title within square or angle brackets
        title_match = re.search(r'[<\[](.+?)[>\]]', line)
        if not title_match:
            continue

        title = title_match.group(1)

        # Extract description and metadata
        title_pattern_in_line = r'[<\[]' + re.escape(title) + r'[>\]]:'
        description_match = re.search(title_pattern_in_line + r'\s*(.*)', line)

        if description_match:
            full_text = description_match.group(1).strip()

            # Split description and metadata at the first "{"
            if "{" in full_text:
                split_index = full_text.find("{")
                description = full_text[:split_index].strip()
                metadata = full_text[split_index:].strip()
            else:
                # Keep the entire description and no metadata
                description = full_text
                metadata = ''  # Initialize as empty string
        else:
            description = ''
            metadata = ''  # Ensure metadata is initialized

        # Calculate indentation level based on spaces before + or - symbol
        indentation = 0
        if '+' in line:
            symbol_index = line.find('+')
            # Count spaces before the '+' symbol
            i = symbol_index - 1
            while i >= 0 and line[i] == ' ':
                indentation += 1
                i -= 1
        elif '-' in line:
            symbol_index = line.find('-')
            # Count spaces before the '-' symbol
            i = symbol_index - 1
            while i >= 0 and line[i] == ' ':
                indentation += 1
                i -= 1

        # If neither symbol exists, indentation remains 0

        if title in titles_info:
            # Only update description if it's currently empty and we found a new one
            if not titles_info[title]['description'] and description:
                titles_info[title]['description'] = description

            # Store all indentation levels for this title
            titles_info[title]['indentation_levels'].append(indentation)

            # Keep max indentation for backward compatibility
            if indentation > titles_info[title]['indentation']:
                titles_info[title]['indentation'] = indentation

            # Do NOT update metadata here - keep the original metadata
        else:
            # First time seeing this title, create a new entry
            titles_info[title] = {
                'description': description,
                'indentation': indentation,
                'indentation_levels': [indentation],  # Initialize with first indentation level
                'parents': [],
                'children': [],
                'line': None,
                'line_numbers': [],  # Initialize an empty list for all occurrences
                'metadata': metadata  # Set metadata explicitly from what we found
            }

    return titles_info

def establish_relationships_fixed(titles_info, text):
    """
    Establish parent-child relationships between titles using BayesDown
    indentation rules.

    In BayesDown syntax:
    - More indented nodes (with + symbol) are PARENTS of less indented nodes
    - The relationship reads as "Effect is caused by Cause" (Effect + Cause)
    - This aligns with how Bayesian networks represent causality

    Args:
        titles_info (dict): Dictionary with information about titles
        text (str): Original markdown text (for identifying line numbers)

    Returns:
        dict: Updated dictionary with parent-child relationships
    """
    lines = text.split('\n')

    # Dictionary to store line numbers for each title occurrence
    title_occurrences = {}

    # Record line number for each title (including multiple occurrences)
    line_number = 0
    for line in lines:
        if not line.strip():
            line_number += 1
            continue

        title_match = re.search(r'[<\[](.+?)[>\]]', line)
        if not title_match:
            line_number += 1
            continue

        title = title_match.group(1)

        # Store all occurrences of each title with their line numbers
        if title not in title_occurrences:
            title_occurrences[title] = []
        title_occurrences[title].append(line_number)

        # Store all line numbers where this title appears
        if 'line_numbers' not in titles_info[title]:
            titles_info[title]['line_numbers'] = []
        titles_info[title]['line_numbers'].append(line_number)

        # For backward compatibility, keep the first occurrence in 'line'
        if titles_info[title]['line'] is None:
            titles_info[title]['line'] = line_number

        line_number += 1

    # Create an ordered list of all title occurrences with their line numbers
    all_occurrences = []
    for title, occurrences in title_occurrences.items():
        for line_num in occurrences:
            all_occurrences.append((title, line_num))

    # Sort occurrences by line number
    all_occurrences.sort(key=lambda x: x[1])

    # Get indentation for each occurrence
    occurrence_indents = {}
    for title, line_num in all_occurrences:
        for line in lines[line_num:line_num+1]:  # Only check the current line
            indent = 0
            if '+' in line:
                symbol_index = line.find('+')
                # Count spaces before the '+' symbol
                j = symbol_index - 1
                while j >= 0 and line[j] == ' ':
                    indent += 1
                    j -= 1
            elif '-' in line:
                symbol_index = line.find('-')
                # Count spaces before the '-' symbol
                j = symbol_index - 1
                while j >= 0 and line[j] == ' ':
                    indent += 1
                    j -= 1
            occurrence_indents[(title, line_num)] = indent

    # Enhanced backward pass for correct parent-child relationships
    for i, (title, line_num) in enumerate(all_occurrences):
        current_indent = occurrence_indents[(title, line_num)]

        # Skip root nodes (indentation 0) for processing
        if current_indent == 0:
            continue

        # Look for the immediately preceding node with lower indentation
        j = i - 1
        while j >= 0:
            prev_title, prev_line = all_occurrences[j]
            prev_indent = occurrence_indents[(prev_title, prev_line)]

            # If we find a node with less indentation, it's a child of current node
            if prev_indent < current_indent:
                # In BayesDown:
                # More indented node is a parent (cause) of less indented node (effect)
                if title not in titles_info[prev_title]['parents']:
                    titles_info[prev_title]['parents'].append(title)
                if prev_title not in titles_info[title]['children']:
                    titles_info[title]['children'].append(prev_title)

                # Only need to find the immediate child
                # (closest preceding node with lower indentation)
                break

            j -= 1

    return titles_info

def convert_to_dataframe(titles_info, ArgDown):
    """
    Convert the titles information dictionary to a pandas DataFrame.

    Args:
        titles_info (dict): Dictionary with information about titles
        ArgDown (bool): If True, extract only structural information without probabilities

    Returns:
        pandas.DataFrame: Structured data with node information and relationships
    """
    if ArgDown == True:
        # For ArgDown, exclude probability columns
        df = pd.DataFrame(columns=['Title', 'Description', 'line', 'line_numbers', 'indentation',
                               'indentation_levels', 'Parents', 'Children', 'instantiations'])
    else:
        # For BayesDown, include probability columns
        df = pd.DataFrame(columns=['Title', 'Description', 'line', 'line_numbers', 'indentation',
                               'indentation_levels', 'Parents', 'Children', 'instantiations',
                               'priors', 'posteriors'])

    for title, info in titles_info.items():
        # Parse the metadata JSON string into a Python dictionary
        if 'metadata' in info and info['metadata']:
            try:
                # Only try to parse if metadata is not empty
                if info['metadata'].strip():
                    jsonMetadata = json.loads(info['metadata'])
                    if ArgDown == True:
                        # Create the row dictionary with instantiations as
                        # metadata only, no probabilities yet
                        row = {
                            'Title': title,
                            'Description': info.get('description', ''),
                            'line': info.get('line',''),
                            'line_numbers': info.get('line_numbers', []),
                            'indentation': info.get('indentation',''),
                            'indentation_levels': info.get('indentation_levels', []),
                            'Parents': info.get('parents', []),
                            'Children': info.get('children', []),
                            # Extract specific metadata fields,
                            # defaulting to empty if not present
                            'instantiations': jsonMetadata.get('instantiations', []),
                        }
                    else:
                        # Create dict with probabilities for BayesDown
                        row = {
                            'Title': title,
                            'Description': info.get('description', ''),
                            'line': info.get('line',''),
                            'line_numbers': info.get('line_numbers', []),
                            'indentation': info.get('indentation',''),
                            'indentation_levels': info.get('indentation_levels', []),
                            'Parents': info.get('parents', []),
                            'Children': info.get('children', []),
                            # Extract specific metadata fields, defaulting to empty if not present
                            'instantiations': jsonMetadata.get('instantiations', []),
                            'priors': jsonMetadata.get('priors', {}),
                            'posteriors': jsonMetadata.get('posteriors', {})
                        }
                else:
                    # Empty metadata case
                    row = {
                        'Title': title,
                        'Description': info.get('description', ''),
                        'line': info.get('line',''),
                        'line_numbers': info.get('line_numbers', []),
                        'indentation': info.get('indentation',''),
                        'indentation_levels': info.get('indentation_levels', []),
                        'Parents': info.get('parents', []),
                        'Children': info.get('children', []),
                        'instantiations': [],
                        'priors': {},
                        'posteriors': {}
                    }
            except json.JSONDecodeError:
                # Handle case where metadata isn't valid JSON
                row = {
                    'Title': title,
                    'Description': info.get('description', ''),
                    'line': info.get('line',''),
                    'line_numbers': info.get('line_numbers', []),
                    'indentation': info.get('indentation',''),
                    'indentation_levels': info.get('indentation_levels', []),
                    'Parents': info.get('parents', []),
                    'Children': info.get('children', []),
                    'instantiations': [],
                    'priors': {},
                    'posteriors': {}
                }
        else:
            # Handle case where metadata field doesn't exist or is empty
            row = {
                'Title': title,
                'Description': info.get('description', ''),
                'line': info.get('line',''),
                'line_numbers': info.get('line_numbers', []),
                'indentation': info.get('indentation',''),
                'indentation_levels': info.get('indentation_levels', []),
                'Parents': info.get('parents', []),
                'Children': info.get('children', []),
                'instantiations': [],
                'priors': {},
                'posteriors': {}
            }

        # Add the row to the DataFrame
        df.loc[len(df)] = row

    return df

def add_no_parent_no_child_columns_to_df(dataframe):
    """
    Add No_Parent and No_Children boolean columns to the DataFrame to
    identify root and leaf nodes.

    Args:
        dataframe (pandas.DataFrame): The DataFrame to enhance

    Returns:
        pandas.DataFrame: Enhanced DataFrame with additional boolean columns
    """
    no_parent = []
    no_children = []

    for _, row in dataframe.iterrows():
        no_parent.append(not row['Parents'])  # True if Parents list is empty
        no_children.append(not row['Children'])  # True if Children list is empty

    dataframe['No_Parent'] = no_parent
    dataframe['No_Children'] = no_children

    return dataframe

def add_parents_instantiation_columns_to_df(dataframe):
    """
    Add all possible instantiations of parents as a list of lists column
    to the DataFrame.
    This is crucial for generating conditional probability tables.

    Args:
        dataframe (pandas.DataFrame): The DataFrame to enhance

    Returns:
        pandas.DataFrame: Enhanced DataFrame with parent_instantiations column
    """
    # Create a new column to store parent instantiations
    parent_instantiations = []

    # Iterate through each row in the dataframe
    for _, row in dataframe.iterrows():
        parents = row['Parents']
        parent_insts = []

        # For each parent, find its instantiations and add to the list
        for parent in parents:
            # Find the row where Title matches the parent
            parent_row = dataframe[dataframe['Title'] == parent]

            # If parent found in the dataframe
            if not parent_row.empty:
                # Get the instantiations of this parent
                parent_instantiation = parent_row['instantiations'].iloc[0]
                parent_insts.append(parent_instantiation)

        # Add the list of parent instantiations to our new column
        parent_instantiations.append(parent_insts)

    # Add the new column to the dataframe
    dataframe['parent_instantiations'] = parent_instantiations

    return dataframe
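
A minimal usage sketch of how these helpers chain together, assuming titles_info comes from the parsing step above and that the notebook's standard imports (pandas as pd, json) are already in scope:

# Sketch: build the enhanced DataFrame from the parsed titles_info dictionary
df = convert_to_dataframe(titles_info, ArgDown=False)   # include probability columns
df = add_no_parent_no_child_columns_to_df(df)           # flag root and leaf nodes
df = add_parents_instantiation_columns_to_df(df)        # gather parent states for CPTs

# Root nodes (no parents) need only priors; leaf nodes terminate causal chains
root_nodes = df[df['No_Parent']]['Title'].tolist()
leaf_nodes = df[df['No_Children']]['Title'].tolist()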

The extracted structure immediately revealed insights: “Misaligned_Power_Seeking” emerged as a critical hub, influenced by multiple factors and influencing multiple outcomes, and the pathway from incentives through deployment to risk became explicit.

3.5.3 From ArgDown to BayesDown in Carlsmith’s Model

Adding probabilities to Carlsmith’s structure presented unique challenges. Unlike rain-sprinkler probabilities that have intuitive values, what’s the probability of “mesa-optimization” or “deceptive alignment”?

The system generated over 100 probability questions for the full model.

Each question targets a specific parameter needed for the Bayesian network. The conditional structure reflects Carlsmith’s argument—deployment depends on both incentives (external pressure) and deception (hidden misalignment).
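
A simplified sketch of that question-generation step, driven by the parent_instantiations column constructed above (the notebook's actual generator may differ in wording and ordering):

from itertools import product

def generate_probability_questions(df):
    """Sketch: enumerate the prior and conditional probability questions
    implied by the network structure (one per parameter of each node's CPT)."""
    questions = []
    for _, row in df.iterrows():
        node_states = row['instantiations']
        # Prior questions for every state of the node
        for state in node_states:
            questions.append(f"What is p({state})?")
        # Conditional questions for every combination of parent states
        if row['Parents']:
            for combo in product(*row['parent_instantiations']):
                condition = ", ".join(combo)
                for state in node_states:
                    questions.append(f"What is p({state}|{condition})?")
    return questions

# A binary node with two binary parents yields 2 prior questions
# and 2 * 4 = 8 conditional questions.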

The LLM extraction drew on Carlsmith’s explicit estimates where available and inferred reasonable values elsewhere. The result captured both the structure and Carlsmith’s quantitative risk assessment:

[Deployment_Decisions]: Decisions to deploy potentially misaligned AI systems. {
  "instantiations": ["deployment_decisions_DEPLOY", "deployment_decisions_WITHHOLD"],
  "priors": {
    "p(deployment_decisions_DEPLOY)": "0.70",
    "p(deployment_decisions_WITHHOLD)": "0.30"
  },
  "posteriors": {
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_STRONG, deception_by_ai_TRUE)": "0.90",
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_STRONG, deception_by_ai_FALSE)": "0.75",
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_WEAK, deception_by_ai_TRUE)": "0.60",
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_WEAK, deception_by_ai_FALSE)": "0.30"
  }
}

This node has two possible states (DEPLOY or WITHHOLD), prior probabilities for each state, and conditional probabilities based on different combinations of its parent variables (“Incentives_To_Build_APS” and “Deception_By_AI”). The probabilities tell a plausible story: deployment becomes more likely with stronger incentives and successful deception, but even without deception, strong incentives create substantial deployment probability.
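
To see how these entries become a conditional probability table, the following illustrative sketch expands the four stated conditionals and fills the WITHHOLD row by complement (the notebook's own CPT builder may arrange the table differently):

import itertools

# Posteriors copied from the Deployment_Decisions node above
posteriors = {
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_STRONG, deception_by_ai_TRUE)": 0.90,
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_STRONG, deception_by_ai_FALSE)": 0.75,
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_WEAK, deception_by_ai_TRUE)": 0.60,
    "p(deployment_decisions_DEPLOY|incentives_to_build_aps_WEAK, deception_by_ai_FALSE)": 0.30,
}

incentives = ["incentives_to_build_aps_STRONG", "incentives_to_build_aps_WEAK"]
deception = ["deception_by_ai_TRUE", "deception_by_ai_FALSE"]

for inc, dec in itertools.product(incentives, deception):
    p_deploy = posteriors[f"p(deployment_decisions_DEPLOY|{inc}, {dec})"]
    p_withhold = 1.0 - p_deploy  # binary node: the complement fills the WITHHOLD row
    print(f"{inc:<34} {dec:<22} DEPLOY={p_deploy:.2f}  WITHHOLD={p_withhold:.2f}")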

Along with these questions, the following prompt is sent to the LLM:


You are an expert in probabilistic reasoning and Bayesian networks. Your task is
to extend the provided ArgDown structure with probability information,
creating a BayesDown representation.

For each statement in the ArgDown structure, you need to:
1. Estimate prior probabilities for each possible state
2. Estimate conditional probabilities given parent states
3. Maintain the original structure and relationships

Here is the format to follow:
[Node]: Description. { "instantiations": ["node_TRUE", "node_FALSE"], "priors": { "p(node_TRUE)": "0.7", "p(node_FALSE)": "0.3" }, "posteriors": { "p(node_TRUE|parent_TRUE)": "0.9", "p(node_TRUE|parent_FALSE)": "0.4", "p(node_FALSE|parent_TRUE)": "0.1", "p(node_FALSE|parent_FALSE)": "0.6" } }
 [Parent]: Parent description. {...}


Here are the specific probability questions to answer:
$questions

ArgDown structure to enhance:
$argdown

Provide the complete BayesDown representation with probabilities:
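
Before sending, the $questions and $argdown placeholders are filled in; a minimal sketch of that substitution using Python's string.Template, where questions and argdown_text are hypothetical variable names standing in for outputs of the earlier pipeline stages:

from string import Template

# The full prompt text is shown above; only its skeleton is repeated here.
PROBABILITY_PROMPT = Template(
    "You are an expert in probabilistic reasoning and Bayesian networks. [...]\n\n"
    "Here are the specific probability questions to answer:\n$questions\n\n"
    "ArgDown structure to enhance:\n$argdown\n\n"
    "Provide the complete BayesDown representation with probabilities:\n"
)

prompt = PROBABILITY_PROMPT.substitute(
    questions="\n".join(questions),  # from the question-generation step
    argdown=argdown_text,            # the extracted ArgDown structure
)
# prompt is then sent to the LLM through whichever client the notebook uses.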

Example BayesDown excerpt from the Carlsmith model (interactive notebook: https://colab.research.google.com/github/VJMeyer/submission/blob/main/AMTAIR_Prototype/data/example_carlsmith/AMTAIR_Prototype_example_carlsmith.ipynb#scrollTo=AFnu_1Ludahi):



[Existential_Catastrophe]: The destruction of humanity's long-term potential due to AI systems we've lost control over. {
  "instantiations": ["existential_catastrophe_TRUE", "existential_catastrophe_FALSE"],
  "priors": {"p(existential_catastrophe_TRUE)": "0.05", "p(existential_catastrophe_FALSE)": "0.95"},
  "posteriors": {
    "p(existential_catastrophe_TRUE|human_disempowerment_TRUE)": "0.95",
    "p(existential_catastrophe_TRUE|human_disempowerment_FALSE)": "0.0"
  }
}
 + [Human_Disempowerment]: Permanent and collective disempowerment of humanity relative to AI systems. {
   "instantiations": ["human_disempowerment_TRUE", "human_disempowerment_FALSE"],
   "priors": {"p(human_disempowerment_TRUE)": "0.208", "p(human_disempowerment_FALSE)": "0.792"},
   "posteriors": {
     "p(human_disempowerment_TRUE|scale_of_power_seeking_TRUE)": "1.0",
     "p(human_disempowerment_TRUE|scale_of_power_seeking_FALSE)": "0.0"
   }
 }

This excerpt from the Carlsmith model representation illustrates how BayesDown preserves both the narrative description (“The destruction of humanity’s long-term potential…”) and the precise probability judgments. Someone without technical background can still understand the core claims and their relationships, while someone seeking quantitative precision can find exact probability values.

The format supports multiple levels of engagement. At the most basic level, readers can follow the hierarchical structure to understand causal relationships between factors. At an intermediate level, they can examine probability judgments to assess the strength of different influences. At the most technical level, they can analyze the complete probabilistic model to perform inference and sensitivity analysis.

3.5.4 Practically Meaningful BayesDown

The BayesDown representation achieves something remarkable: it bridges the chasm between Carlsmith’s nuanced prose and mathematical formalism without losing the essence of either.

Consider what this bridge enables:

For Technical Researchers: The formal structure makes assumptions explicit. Is power-seeking really independent of capability level given strategic awareness? The model forces clarity.

For Policymakers: Probabilities attached to comprehensible descriptions provide actionable intelligence. “70% chance of deployment despite misalignment” translates better than abstract concerns.

For Strategic Analysts: The network structure reveals intervention points. Which nodes, if changed, most affect the final outcome? Where should we focus effort?

The hybrid nature—natural language plus formal structure plus probabilities—serves each audience while enabling communication between them. A policymaker can understand “deployment decisions” without probability theory. A researcher can analyze the mathematical model without losing sight of what the variables mean.

This isn’t just convenient—it’s essential for coordination. When different communities can refer to the same model but engage with it at their appropriate level of technical detail, we create common ground for productive disagreement and collaborative problem-solving.

3.5.5 Interactive Visualization and Exploration

The moment when Carlsmith’s model first rendered as an interactive network felt like putting on glasses after years of squinting. Suddenly, the complex web of relationships became navigable.

The visualization system employs multiple visual channels simultaneously:

Color Coding: Nodes shift from deep red (low probability) through yellow to bright green (high probability). At a glance, you see which factors Carlsmith considers likely versus speculative.

Border Styling: Blue borders mark root causes (like “Incentives_To_Build_APS”), purple indicates intermediate nodes, magenta highlights final outcomes. The visual grammar guides the eye through causal flow.

Layout Algorithm: Initial placement uses causal depth—root causes at bottom, final outcomes at top. Physics simulation then refines positions to minimize edge crossings while preserving hierarchical structure.

Progressive Disclosure: Hovering reveals probability summaries. Clicking opens detailed conditional probability tables. Dragging allows custom arrangement. Each interaction level serves different analytical needs.

The figure below shows the interactive visualization of Carlsmith’s model, highlighting how color, border styling, and layout work together to represent complex causal relationships:

Code
# @title 4.4.0 --- Main Visualization Function --- [main_visualization_function]

def create_bayesian_network_with_probabilities(df):
    """
    Create an interactive Bayesian network visualization with enhanced
    probability visualization and node classification based on network structure.
    """
    # Create a directed graph
    G = nx.DiGraph()

    # Add nodes with proper attributes
    for idx, row in df.iterrows():
        title = row['Title']
        description = row['Description']

        # Process probability information
        priors = get_priors(row)
        instantiations = get_instantiations(row)

        # Add node with base information
        G.add_node(
            title,
            description=description,
            priors=priors,
            instantiations=instantiations,
            posteriors=get_posteriors(row)
        )

    # Add edges
    for idx, row in df.iterrows():
        child = row['Title']
        parents = get_parents(row)

        # Add edges from each parent to this child
        for parent in parents:
            if parent in G.nodes():
                G.add_edge(parent, child)

    # Classify nodes based on network structure
    classify_nodes(G)

    # Create network visualization
    net = Network(notebook=True, directed=True, cdn_resources="in_line", height="600px", width="100%")

    # Configure physics for better layout
    net.force_atlas_2based(gravity=-50, spring_length=100, spring_strength=0.02)
    net.show_buttons(filter_=['physics'])

    # Add the graph to the network
    net.from_nx(G)

    # Enhance node appearance with probability information and classification
    for node in net.nodes:
        node_id = node['id']
        node_data = G.nodes[node_id]

        # Get node type and set border color
        node_type = node_data.get('node_type', 'unknown')
        border_color = get_border_color(node_type)

        # Get probability information
        priors = node_data.get('priors', {})
        true_prob = priors.get('true_prob', 0.5) if priors else 0.5

        # Get proper state names
        instantiations = node_data.get('instantiations', ["TRUE", "FALSE"])
        true_state = instantiations[0] if len(instantiations) > 0 else "TRUE"
        false_state = instantiations[1] if len(instantiations) > 1 else "FALSE"

        # Create background color based on probability
        background_color = get_probability_color(priors)

        # Create tooltip with probability information
        tooltip = create_tooltip(node_id, node_data)

        # Create a simpler node label with probability
        simple_label = f"{node_id}\np={true_prob:.2f}"

        # Store expanded content as a node attribute for use in click handler
        node_data['expanded_content'] = create_expanded_content(node_id, node_data)

        # Set node attributes
        node['title'] = tooltip  # Tooltip HTML
        node['label'] = simple_label  # Simple text label
        node['shape'] = 'box'
        node['color'] = {
            'background': background_color,
            'border': border_color,
            'highlight': {
                'background': background_color,
                'border': border_color
            }
        }

    # Set up the click handler with proper data
    setup_data = {
        'nodes_data': {node_id: {
            'expanded_content': json.dumps(G.nodes[node_id].get('expanded_content', '')),
            'description': G.nodes[node_id].get('description', ''),
            'priors': G.nodes[node_id].get('priors', {}),
            'posteriors': G.nodes[node_id].get('posteriors', {})
        } for node_id in G.nodes()}
    }

    # Add custom click handling JavaScript
    click_js = """
    // Store node data for click handling
    var nodesData = %s;

    // Add event listener for node clicks
    network.on("click", function(params) {
        if (params.nodes.length > 0) {
            var nodeId = params.nodes[0];
            var nodeInfo = nodesData[nodeId];

            if (nodeInfo) {
                // Create a modal popup for expanded content
                var modal = document.createElement('div');
                modal.style.position = 'fixed';
                modal.style.left = '50%%';
                modal.style.top = '50%%';
                modal.style.transform = 'translate(-50%%, -50%%)';
                modal.style.backgroundColor = 'white';
                modal.style.padding = '20px';
                modal.style.borderRadius = '5px';
                modal.style.boxShadow = '0 0 10px rgba(0,0,0,0.5)';
                modal.style.zIndex = '1000';
                modal.style.maxWidth = '80%%';
                modal.style.maxHeight = '80%%';
                modal.style.overflow = 'auto';

                // Parse the JSON string back to HTML content
                try {
                    var expandedContent = JSON.parse(nodeInfo.expanded_content);
                    modal.innerHTML = expandedContent;
                } catch (e) {
                    modal.innerHTML = 'Error displaying content: ' + e.message;
                }

                // Add close button
                var closeBtn = document.createElement('button');
                closeBtn.innerHTML = 'Close';
                closeBtn.style.marginTop = '10px';
                closeBtn.style.padding = '5px 10px';
                closeBtn.style.cursor = 'pointer';
                closeBtn.onclick = function() {
                    document.body.removeChild(modal);
                };
                modal.appendChild(closeBtn);

                // Add modal to body
                document.body.appendChild(modal);
            }
        }
    });
    """ % json.dumps(setup_data['nodes_data'])

    # Save the graph to HTML
    html_file = "bayesian_network.html"
    net.save_graph(html_file)

    # Inject custom click handling into HTML
    try:
        with open(html_file, "r") as f:
            html_content = f.read()

        # Insert click handling script before the closing body tag
        html_content = html_content.replace('</body>', f'<script>{click_js}</script></body>')

        # Write back the modified HTML
        with open(html_file, "w") as f:
            f.write(html_content)

        return HTML(html_content)
    except Exception as e:
        return HTML(f"<p>Error rendering HTML: {str(e)}</p>"
                    + f"<p>The network visualization has been saved to '{html_file}'</p>")

The resulting visualization transforms abstract relationships into tangible understanding. Users report “aha” moments when exploring—suddenly seeing how technical factors compound into strategic risks, or identifying previously unnoticed bottlenecks in the causal chain.

This visualization reveals several structural insights:

  1. Central importance of “Misaligned_Power_Seeking” as a hub node with multiple parents and children
  2. Multiple pathways to “Existential_Catastrophe” through different intermediate factors
  3. Clusters of related variables forming coherent subarguments (e.g., factors affecting alignment difficulty)
  4. Flow of influence from technical factors (bottom) through deployment decisions to ultimate outcomes (top)

The implementation successfully handles the complexity of Carlsmith’s model, correctly processing the multi-level structure, resolving repeated node references, and calculating appropriate probability distributions. The interactive visualization makes this complex model accessible, allowing users to explore different aspects of the argument through intuitive navigation.

Several key aspects of the implementation were particularly important for handling this complex model:

  1. The parent-child relationship detection algorithm correctly identified hierarchical relationships despite the complex structure with repeated nodes and multiple levels.

  2. The probability question generation system created appropriate questions for all variables, including those with multiple parents requiring factorial combinations of conditional probabilities.

  3. The network enhancement functions calculated useful metrics like centrality measures and Markov blankets that help interpret the model structure.

  4. The visualization system effectively presented the complex network through color-coding, interactive exploration, and progressive disclosure of details.

The successful application to Carlsmith’s model demonstrates the AMTAIR approach’s scalability to complex real-world arguments. While the canonical rain-sprinkler-lawn example validated correctness, this application proves practical utility for sophisticated multi-level arguments with dozens of variables and complex interdependencies—precisely the kind of arguments that characterize AI risk assessments.

This capability addresses a core limitation of the original MTAIR framework: the labor intensity of manual formalization. Where manually converting Carlsmith’s argument to a formal model might take days of expert time, the AMTAIR approach accomplished this in minutes, creating a foundation for further analysis and exploration.

3.5.6 Validation Against Original (From the MTAIR Project)

Validating AMTAIR’s extraction required careful comparison with expert judgment. While comprehensive benchmarking remains future work, preliminary validation efforts provide encouraging signals.

Manual Baseline Creation: Johannes Meyer and Jelena Meyer independently extracted ArgDown and BayesDown representations from Carlsmith’s paper and from Bucknall and Dori-Hacohen’s. This created ground truth accounting for legitimate interpretive variation—experts might reasonably disagree on some structural choices or probability estimates.

Structural Comparison: Comparing extracted causal structures revealed high agreement on core relationships. AMTAIR consistently identified the main causal chain from capabilities through deployment to catastrophe. Some variation appeared in handling of auxiliary factors—where one expert might include a minor influence, another might omit it for simplicity.

Probability Assessment: Probability extraction showed greater variation, reflecting inherent ambiguity in translating qualitative language. When Carlsmith writes “likely,” different readers might reasonably interpret this as 0.7, 0.75, or 0.8. AMTAIR’s extractions fell within the range of expert interpretations, suggesting successful capture of intended meaning even if not identical numbers.

Semantic Preservation: Most importantly, the formal models preserved the essential insights of Carlsmith’s argument. The critical role of deployment decisions, the compound nature of risk, the importance of technical and strategic factors—all emerged clearly in the extracted representations.

An ideal validation protocol would expand this approach:

  1. Multiple expert extractors working independently
  2. Systematic comparison of structural and quantitative agreement
  3. Analysis of where and why extractions diverge
  4. Testing whether different extractions lead to different policy conclusions
  5. Iterative refinement based on identified failure modes

The goal isn’t perfect agreement—even human experts disagree. Rather, we seek extractions good enough to support meaningful analysis while acknowledging their limitations.

3.6 Validation Methodology

Building trust in automated extraction requires more than anecdotal success2. We need systematic validation that honestly assesses both capabilities and limitations.

3.6.1 Ground Truth Construction

Creating ground truth for argument extraction poses unique challenges. Unlike named entity recognition or sentiment analysis, argument structure lacks universal standards. What constitutes the “correct” extraction from a complex text?

An ideal validation approach would embrace this inherent subjectivity:

Expert Selection: Recruit 5-10 domain experts with demonstrated expertise in both AI safety and formal modeling. Diversity matters—include technical researchers, policy analysts, and those with mixed backgrounds.

Extraction Protocol: Provide standardized training on ArgDown/BayesDown syntax while allowing flexibility in interpretation. Experts work independently to avoid anchoring bias, documenting their reasoning process alongside final extractions.

Consensus Building: Through structured discussion, identify areas of convergence (likely core argument structure) versus legitimate disagreement (interpretive choices, granularity decisions). This distinguishes system errors from inherent ambiguity.

Quality Metrics: Rather than binary correct/incorrect judgments, assess:

  • Structural similarity (graph edit distance)
  • Probability distribution overlap (KL divergence)
  • Semantic preservation (expert ratings)
  • Downstream task performance (policy analysis agreement)

The resulting dataset would capture not a single “truth” but a distribution of reasonable interpretations against which to evaluate automated extraction.

3.6.2 Evaluation Metrics

Evaluating argument extraction requires metrics that capture multiple dimensions of quality:

Structural Fidelity:

  • Node identification: What fraction of expert-identified variables does the system extract?
  • Edge accuracy: Are causal relationships preserved?
  • Hierarchy preservation: Does the system maintain argument levels?

Probability Calibration:

  • Explicit extraction: When sources state probabilities, how accurately are they captured?
  • Linguistic mapping: Do qualitative expressions translate to reasonable probabilities?
  • Coherence: Are probability distributions properly normalized?

Semantic Quality:

  • Description accuracy: Do extracted descriptions preserve original meaning?
  • Terminology preservation: Does the system maintain the author’s vocabulary?
  • Context retention: Is sufficient information preserved for interpretation?

Functional Validity:

  • Inference agreement: Do extracted models support similar conclusions?
  • Sensitivity preservation: Are critical parameters identified as influential?
  • Policy robustness: Do different extractions suggest similar interventions?

These metrics acknowledge that perfect extraction is neither expected nor necessary. The goal is extraction sufficient for practical use while maintaining transparency about limitations.
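
A sketch of how the structural-fidelity metrics could be computed once an expert baseline exists; the node and edge names below are illustrative, not measured results:

def structural_fidelity(expert_edges, system_edges):
    """Sketch: node recall and edge precision/recall between an expert extraction
    and an automated one, each given as a set of (parent, child) pairs."""
    expert_nodes = {n for edge in expert_edges for n in edge}
    system_nodes = {n for edge in system_edges for n in edge}

    node_recall = len(expert_nodes & system_nodes) / len(expert_nodes)
    edge_tp = len(expert_edges & system_edges)
    edge_precision = edge_tp / len(system_edges) if system_edges else 0.0
    edge_recall = edge_tp / len(expert_edges) if expert_edges else 0.0

    return {"node_recall": node_recall,
            "edge_precision": edge_precision,
            "edge_recall": edge_recall}

# Illustrative comparison (not real evaluation data)
expert = {("Incentives_To_Build_APS", "Deployment_Decisions"),
          ("Deception_By_AI", "Deployment_Decisions"),
          ("Deployment_Decisions", "Misaligned_Power_Seeking")}
system = {("Incentives_To_Build_APS", "Deployment_Decisions"),
          ("Deployment_Decisions", "Misaligned_Power_Seeking")}
print(structural_fidelity(expert, system))
# node_recall 0.75, edge_precision 1.0, edge_recall roughly 0.67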

3.6.3 Results Summary

While comprehensive validation remains future work, preliminary assessments using the methodology described above would likely reveal several patterns:

Expected Strengths: Automated extraction should excel at identifying explicit causal claims, preserving hierarchical argument structure, and extracting stated probabilities. The two-stage approach likely improves quality by allowing focused optimization for each task.

Anticipated Challenges: Implicit reasoning, complex conditionals, and ambiguous quantifiers would pose greater challenges. Coreference resolution across long documents and maintaining consistency in large models would require continued refinement.

Practical Utility Threshold: Even with imperfect extraction, the system could provide value if it achieves perhaps 70-80% structural accuracy and captures probability estimates within reasonable ranges. This level of performance would enable rapid initial modeling that experts could refine, dramatically reducing the time from argument to formal model.

The validation framework itself represents a contribution—establishing systematic methods for assessing argument extraction quality as this research area develops.

3.6.4 Error Analysis

Understanding failure modes guides both appropriate use and future improvements:

Implicit Assumptions: Authors often leave critical assumptions unstated, relying on shared background knowledge. When an AI safety researcher writes about “alignment,” they assume readers understand the technical concept. The system must either extract these implicit elements or flag their absence.

Complex Conditionals: Natural language expresses conditionality in myriad ways. “If we achieve alignment (which seems unlikely without major theoretical breakthroughs), then deployment might be safe (assuming robust verification).” Parsing nested, qualified conditionals challenges current methods.

Ambiguous Quantifiers: The word “significant” might mean 10% in one context, 60% in another. Without calibration to author-specific usage or domain conventions, probability extraction remains approximate.

Coreference Challenges: Academic writing loves pronouns and indirect references. When “this approach” appears three paragraphs after introducing multiple approaches, identifying the correct referent requires sophisticated discourse understanding.

These limitations don’t invalidate the approach but rather define its boundaries. Users who understand these constraints can work within them, leveraging automation’s strengths while compensating for its weaknesses.

3.6.5 Independent Manual Extraction Validation

To establish ground truth for evaluating AMTAIR’s extraction quality, I obtained independent manual extractions from domain experts. Johannes Meyer and Jelena Meyer3, both experienced in formal logic and argument analysis, independently extracted ArgDown and BayesDown representations from Bucknall and Dori-Hacohen’s “Current and Near-Term AI as a Potential Existential Risk Factor” (Bucknall and Dori-Hacohen 2022). This paper, which examines how near-term AI systems might contribute to existential risks through various causal pathways, provides an ideal test case due to its explicit discussion of multiple risk factors and their interdependencies.

The manual extraction process revealed patterns consistent with theoretical expectations from the argument mining literature (Khartabil et al. 2021). Both extractors identified remarkably similar causal structures—the core nodes representing existential risk factors (unaligned AGI, nuclear conflict, biological risks, environmental catastrophe) and their relationships to near-term AI capabilities showed near-perfect agreement. This structural convergence aligns with findings from Anderson (2007) that expert annotators tend to agree on primary argumentative relationships even when working independently.

However, the probability quantification phase exhibited substantially higher variance, corroborating established challenges in eliciting subjective probabilities from text. When extracting conditional probabilities for relationships like P(Nuclear_Conflict | Compromised_Political_Decision_Making), the two extractors’ estimates differed by as much as 30 percentage points. This variance reflects the fundamental ambiguity Pollock (1995) identified in mapping natural language uncertainty expressions to numerical values—when Bucknall and Dori-Hacohen write that AI “may intensify cyber warfare,” reasonable interpreters might assign probabilities anywhere from 0.4 to 0.7.

The extraction revealed a hierarchical structure with [Existential_Risk] as the root node, influenced by both direct AI risks (unaligned AGI) and indirect pathways where near-term AI acts as an intermediate risk factor. The extractors consistently identified four main causal mechanisms: state-to-state relations (arms race dynamics), corporate power concentration, stable repressive regimes, and compromised political decision-making. This structural clarity demonstrates that despite quantitative uncertainty, the qualitative causal model remains extractable with high fidelity.

Interestingly, both manual extractors struggled with the same ambiguities that challenge automated systems, which may indicate convergence on the level of information actually contained in the source. The relationship between social media recommender systems and various risk factors appeared multiple times in the text with slightly different framings, requiring judgment calls about whether these represented single or multiple causal relationships. This observation supports the design decision to maintain human oversight in AMTAIR’s extraction pipeline—certain interpretive choices require domain knowledge and contextual understanding that neither human nor machine extractors can make with complete confidence in isolation.

The manual extraction exercise validates AMTAIR’s two-stage approach. The high agreement on structure (ArgDown) combined with high variance in probabilities (BayesDown) empirically confirms that separating these extraction tasks addresses genuine cognitive and epistemological differences. As predicted by the causal structure learning literature (Heinze-Deml, Maathuis, and Meinshausen 2018; Squires and Uhler 2023), identifying “what causes what” represents a different inferential challenge than quantifying “how likely” those causal relationships are.

This validation also illuminates the value proposition of automated extraction. While human experts required 4-6 hours each to complete their extractions, AMTAIR processed the same document in under two minutes. Even if automated extraction only achieves 80% of human accuracy, the 100x speed improvement enables analyzing entire literatures rather than individual papers. The manual baseline suggests that perfect extraction may be impossible even for humans—but good-enough extraction at scale can still transform how we synthesize complex arguments about AI risk.

3.7 Policy Evaluation Capabilities

The ultimate test of a model isn’t its elegance but its utility. Can AMTAIR’s extracted models actually inform governance decisions? This section demonstrates how formal models enable systematic policy analysis.

3.7.1 Intervention Representation

Representing policy interventions in Bayesian networks requires translating governance mechanisms into parameter modifications. Pearl’s do-calculus provides the mathematical framework, but the practical challenge lies in meaningful translation.

An ideal implementation would support several intervention types:

Parameter Modification: Policies often change probabilities. Safety requirements might reduce P(deployment|misaligned) from 0.7 to 0.2 by making unsafe deployment legally prohibited or reputationally costly.

Structural Interventions: Some policies add new causal pathways. Introducing mandatory review boards creates new nodes and edges representing oversight mechanisms.

Uncertainty Modeling: Policy effectiveness is itself uncertain. Rather than assuming perfect implementation, represent ranges: P(deployment|misaligned) might become [0.1, 0.3] depending on enforcement.

Multi-Level Effects: Policies influence multiple levels simultaneously. Compute governance affects technical development, corporate behavior, and international competition.

The system would translate high-level policy descriptions into specific network modifications, enabling rigorous counterfactual analysis of intervention effects.

3.7.2 Example: Deployment Governance

Let’s trace how a specific policy—mandatory safety certification before deployment—might be evaluated:

Baseline Model: In Carlsmith’s original model, P(deployment|misaligned) = 0.7, reflecting competitive pressures overwhelming safety concerns.

Policy Specification: Safety certification requires demonstrating alignment properties before deployment authorization. Based on similar regulations in other domains, we might estimate 80-90% effectiveness.

Parameter Update: The modified model sets P(deployment|misaligned) = 0.1-0.2, representing the residual probability of circumvention or regulatory capture.

Downstream Effects:

  • Reduced deployment of misaligned systems
  • Lower probability of power-seeking manifestation
  • Decreased existential risk from ~5% to ~1.2%

Sensitivity Analysis: How robust is this conclusion? Varying certification effectiveness, enforcement probability, and other parameters reveals which assumptions critically affect the outcome.

This example illustrates policy evaluation’s value: moving from vague claims (“regulation would help”) to quantitative assessments (“this specific intervention might reduce risk by 75%±15%”).
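
The arithmetic behind such an assessment can be sketched with a toy single-pathway model; the parameter values below are illustrative only and merely echo the rough magnitudes quoted above, whereas the real analysis propagates the intervention through the full extracted network:

def risk_given_deployment_policy(p_misaligned, p_deploy_given_misaligned,
                                 p_catastrophe_given_deployed_misaligned):
    """Toy sketch: risk flows through a single pathway
    misaligned -> deployed -> catastrophe (illustrative parameters only)."""
    p_deployed_misaligned = p_misaligned * p_deploy_given_misaligned
    return p_deployed_misaligned * p_catastrophe_given_deployed_misaligned

baseline = risk_given_deployment_policy(0.35, 0.70, 0.20)
with_certification = risk_given_deployment_policy(0.35, 0.15, 0.20)  # certification cuts deployment probability to ~0.15
print(f"baseline risk = {baseline:.3f}, with certification = {with_certification:.3f}")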

3.7.3 Robustness Analysis

Good policies work across scenarios. AMTAIR enables testing interventions against multiple worldviews, parameter ranges, and structural variations.

Cross-Model Testing: Extract multiple expert models and evaluate the same policy in each. If an intervention reduces risk in Carlsmith’s model but increases it in Christiano’s, we’ve identified a critical dependency.

Parameter Sensitivity: Which uncertainties most affect policy effectiveness? If the intervention only works for P(alignment_difficulty) < 0.3, and experts disagree whether it’s 0.2 or 0.4, we need more research before implementing.

Structural Uncertainty: Some disagreements concern model structure itself. Does capability advancement directly influence misalignment risk, or only indirectly through deployment pressures? Test policies under both structures.

Confidence Bounds: Rather than point estimates, compute ranges. “This policy reduces risk by 40-80%” honestly represents uncertainty while still providing actionable guidance.

The goal isn’t eliminating uncertainty but making decisions despite it. Robustness analysis reveals which policies work across uncertainties versus those requiring specific assumptions.
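
Reusing the toy function from the previous sketch, a robustness sweep might vary both a disputed misalignment parameter and the policy's effectiveness (all values illustrative):

import numpy as np

for p_misaligned in (0.2, 0.35, 0.5):               # disputed "alignment difficulty" proxy
    for effectiveness in np.linspace(0.5, 0.9, 5):  # fraction of unsafe deployments prevented
        p_deploy = 0.70 * (1 - effectiveness)       # baseline 0.70 scaled down by the policy
        risk = risk_given_deployment_policy(p_misaligned, p_deploy, 0.20)
        print(f"p_misaligned={p_misaligned:.2f}  effectiveness={effectiveness:.2f}  risk={risk:.3f}")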

3.8 Interactive Visualization Design

A Bayesian network without good visualization is like a symphony without performers—all potential, no impact. The visualization system transforms mathematical abstractions into intuitive understanding.

3.8.1 Visual Encoding Strategy

Every visual element carries information:

Color: The probability spectrum from red (low) through yellow to green (high) provides immediate gestalt understanding. Pre-attentive processing—the brain’s ability to process certain visual features without conscious attention—makes patterns jump out.

Borders: Node type encoding (blue=root, purple=intermediate, magenta=outcome) creates visual flow. The eye naturally follows from blue through purple to magenta, tracing causal pathways.

Size: Larger nodes have higher centrality—more connections, more influence. This emerges from the physics simulation but reinforces importance.

Layout: Force-directed positioning naturally clusters related concepts while maintaining readability. The algorithm balances competing constraints: minimize edge crossings, maintain hierarchical levels, avoid node overlap, and create aesthetic appeal.

The encoding philosophy: every pixel should earn its place by conveying information while maintaining visual harmony.

3.8.2 Progressive Disclosure

Information overload kills understanding. The interface reveals complexity gradually:

Level 1 - Overview: At first glance, see network structure and probability color coding. This answers: “What’s the shape of the argument? Where are the high-risk areas?”

Level 2 - Hover Details: Mouse over a node to see its description and prior probability. This adds: “What does this factor represent? How likely is it?”

Level 3 - Click Deep Dive: Clicking opens full probability tables and relationships. This reveals: “How does this probability change with conditions? What influences this factor?”

Level 4 - Interactive Exploration: Dragging, zooming, and physics controls enable custom investigation. This supports: “What if I reorganize to see different patterns? How do these clusters relate?”

Each level serves different users and use cases. A policymaker might work primarily with levels 1-2, while a researcher dives into level 3-4 details.

3.8.3 User Interface Elements

Effective interface design for Bayesian networks requires balancing power with accessibility:

Physics Controls: Force-directed layouts benefit from tuning. Gravity affects spread, spring length controls spacing, damping influences settling time. Advanced users can adjust these for optimal layouts, while defaults work well for most cases.

Filter Options: With large networks, selective viewing becomes essential. Filter by probability ranges (show only likely events), node types (focus on interventions), or causal depth (see only immediate effects).

Export Functions: Different stakeholders need different formats. Researchers want raw data, policymakers need reports, presenters require images. Supporting diverse export formats enables broad usage.

Comparison Mode: Understanding often comes from contrast. Side-by-side viewing of baseline versus intervention, or different expert models, reveals critical differences.

Iterative design with actual users would refine these features, ensuring they serve real needs rather than imagined ones.

3.9 Integration with Prediction Markets

The vision: formal models that breathe with live data, updating as collective intelligence evolves. While full implementation awaits, the architecture anticipates this future.

3.9.1 Design for Integration

Integration Architecture requires careful design to manage the impedance mismatch between formal models and market data:

API Specifications: Each platform—Metaculus, Manifold, Good Judgment Open—has unique data formats, update frequencies, and question types. A unified adapter layer would translate platform-specific formats into model-compatible data.

Semantic Matching: The hard problem—connecting “AI causes extinction by 2100” (market question) to “Existential_Catastrophe” (model node). This requires sophisticated NLP and possibly human curation for high-stakes connections.

Aggregation Methods: When multiple markets address similar questions, how do we combine? Weighted averages based on market depth, participant quality, and historical accuracy provide more signal than simple means.
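
As a sketch of this aggregation idea (the field names and weighting scheme are hypothetical, not a description of any platform's API):

def aggregate_market_probability(forecasts):
    """Sketch: combine probabilities from several markets addressing (roughly)
    the same question, weighting by a crude quality score."""
    weighted_sum = 0.0
    total_weight = 0.0
    for f in forecasts:
        weight = f["liquidity"] * f["historical_accuracy"]
        weighted_sum += weight * f["probability"]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

forecasts = [
    {"platform": "Metaculus", "probability": 0.12, "liquidity": 1.0, "historical_accuracy": 0.9},
    {"platform": "Manifold",  "probability": 0.20, "liquidity": 0.4, "historical_accuracy": 0.7},
]
print(aggregate_market_probability(forecasts))  # pulled toward the deeper, better-calibrated market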

Update Scheduling: Real-time updates would overwhelm users and computation. Smart scheduling might update daily for slow-changing strategic questions, hourly for capability announcements, immediately for critical events.

3.9.2 Challenges and Opportunities

The challenges are real but surmountable:

Question Mapping: Markets ask specific, time-bound questions while models represent general relationships. “AGI by 2030?” maps uncertainly to “APS_Systems exists.” Developing robust mapping functions requires deep understanding of both domains.

Temporal Alignment: Market probabilities change over time, but model parameters are typically static. Should we use current market values, time-weighted averages, or attempt to extract trend information?

Quality Variation: A liquid market with expert participants provides different information than a thin market with casual forecasters. Weighting schemes must account for these quality differences.

Incentive Effects: If models influence policy and policy influences outcomes, and markets forecast outcomes, we create feedback loops. Understanding these dynamics prevents perverse incentives.

Despite challenges, even partial integration provides value:

  • External validation of expert-derived probabilities
  • Dynamic updating as new information emerges
  • Identification of where model and market disagree
  • Quantified uncertainty from market spread

The perfect shouldn’t be the enemy of the good—simple integration beats no integration.

3.10 Computational Performance Analysis

As networks grow from toy examples to real-world complexity, computational challenges emerge. Understanding these constraints shapes realistic expectations and optimization priorities.

3.10.1 Exact vs. Approximate Inference

The fundamental tradeoff in probabilistic reasoning: exactness versus tractability.

Exact Inference: Variable elimination and junction tree algorithms provide mathematically exact answers. For our 3-node rain-sprinkler network, calculations complete instantly. For 20-node networks with modest connectivity, expect seconds. But for 50+ node networks with complex dependencies, exact inference becomes impractical—potentially taking hours or exhausting memory.

Approximate Methods: When exactness becomes impractical, approximation saves the day:

  • Monte Carlo Sampling: Generate thousands of scenarios consistent with the network, estimate probabilities from frequencies. Accuracy improves with samples, trading computation time for precision (see the sketch after this list).
  • Variational Inference: Find the simplest distribution that approximates our complex reality. Like fitting a smooth curve to jagged data—we lose detail but gain comprehension.
  • Belief Propagation: Pass messages between nodes until beliefs converge. Works beautifully for tree-structured networks, can oscillate or converge slowly for complex loops.
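
To make the sampling idea concrete, here is a minimal forward-sampling sketch over the rain-sprinkler-lawn network; the probability values are the standard textbook ones and may differ slightly from those used elsewhere in this document:

import random

def sample_once():
    """Draw one joint sample from the rain-sprinkler-lawn network (textbook CPTs)."""
    rain = random.random() < 0.2
    sprinkler = random.random() < (0.01 if rain else 0.4)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.80, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return rain, sprinkler, wet

samples = [sample_once() for _ in range(100_000)]
wet_samples = [s for s in samples if s[2]]

# Estimate P(rain | grass wet) from the frequencies among retained samples
p_rain_given_wet = sum(1 for s in wet_samples if s[0]) / len(wet_samples)
print(f"P(rain | grass wet) is approximately {p_rain_given_wet:.3f}")  # exact answer for these CPTs: about 0.358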

The system selects methods based on network properties:

  • Small networks: exact inference for precision
  • Medium networks: belief propagation for speed
  • Large networks: sampling for scalability
  • Very large networks: hierarchical decomposition

3.10.2 Scaling Strategies

When networks grow beyond convenient computation, clever strategies maintain usability:

Hierarchical Decomposition: Break large networks into smaller, manageable subnetworks. Compute locally, then integrate results. Like solving a jigsaw puzzle by completing sections before assembling the whole.

Relevance Pruning: For specific queries, most nodes don’t matter. If asking about deployment risk, technical details about interpretability methods might be temporarily ignorable. Prune irrelevant subgraphs for focused analysis.

Caching Architecture: Many queries repeat—P(catastrophe), P(deployment|misalignment). Cache results to avoid recomputation. Smart invalidation updates only affected queries when parameters change.
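
A compressed sketch of the caching idea, with the inference and parameter-update routines left as hypothetical stubs for whatever the system actually uses:

from functools import lru_cache

def run_inference(node, evidence):
    """Hypothetical stand-in for the system's actual inference routine."""
    ...

@lru_cache(maxsize=1024)
def cached_query(node, evidence_items):
    # evidence_items must be hashable, e.g. a tuple of (variable, state) pairs
    return run_inference(node, dict(evidence_items))

def update_parameters(new_cpts):
    ...                           # hypothetical: write the new CPTs into the model
    cached_query.cache_clear()    # coarse invalidation: cached answers may now be stale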

Parallel Processing: Inference calculations often decompose naturally. Different branches of the network can be processed simultaneously. Modern multi-core processors and cloud computing make this increasingly attractive.

Implementation would balance these strategies based on usage patterns. Interactive exploration benefits from caching and pruning. Batch analysis leverages parallelization. The architecture accommodates multiple approaches.

3.11 Results and Achievements

3.11.1 Extraction Quality Assessment

Assessing extraction quality requires honesty about both achievements and limitations. An ideal evaluation would examine multiple dimensions:

Coverage: What proportion of arguments in source texts does the system successfully capture? Initial applications suggest the two-stage approach identifies most explicit causal claims while struggling with deeply implicit relationships.

Accuracy: How closely do automated extractions match expert consensus? Preliminary comparisons indicate strong agreement on primary causal structures with more variation in probability estimates.

Robustness: How well does the system handle different writing styles, argument structures, and domains? Academic papers with clear argumentation extract more reliably than informal blog posts or policy documents.

Utility: Do the extracted models enable meaningful analysis? Even imperfect extractions that capture 80% of structure with approximate probabilities can dramatically accelerate modeling compared to starting from scratch.

The key insight: perfect extraction isn’t necessary for practical value. Like machine translation, which provides useful results despite imperfections, automated argument extraction can enhance human capability without replacing human judgment.

3.11.2 Computational Performance

Performance analysis would reveal the practical boundaries of the current system:

Extraction Speed: LLM-based extraction scales roughly linearly with document length. A 20-page paper might require 30-60 seconds for structural extraction and similar time for probability extraction. This enables processing dozens of documents daily—orders of magnitude faster than manual approaches.

Network Complexity Limits: Exact inference remains tractable for networks up to approximately 30-40 nodes with moderate connectivity. Beyond this, approximate methods become necessary, with sampling methods scaling to hundreds of nodes at the cost of precision.

Visualization Responsiveness: The extraction phase exhibits linear complexity in document length—processing twice as much text takes roughly twice as long. However, the inference phase faces exponential complexity in network connectivity.

End-to-End Pipeline: From document input to interactive visualization, expect 2-5 minutes for typical AI safety arguments. This represents roughly 100x speedup compared to manual modeling efforts.

These performance characteristics make AMTAIR practical for real-world use while highlighting areas for future optimization.

3.11.3 Policy Impact Evaluation

The true test of AMTAIR lies in its ability to inform governance decisions. An ideal policy evaluation framework would demonstrate several capabilities:

Intervention Modeling: Representing diverse policy proposals—from technical standards to international agreements—as parameter modifications in extracted networks. This translation from qualitative proposals to quantitative changes enables rigorous analysis.

Comparative Assessment: Evaluating multiple interventions across different expert worldviews to identify robust strategies. Policies that reduce risk across different models deserve priority over those requiring specific assumptions.

Sensitivity Analysis: Understanding which uncertainties most affect policy conclusions. If an intervention’s effectiveness depends critically on disputed parameters, this highlights research priorities.

Implementation Guidance: Moving beyond “this policy reduces risk” to specific recommendations about design details, implementation sequences, and success metrics.

The system would transform abstract policy discussions into concrete quantitative analyses, enabling evidence-based decision-making in AI governance.

3.12 Summary of Technical Contributions

Looking back at the implementation journey, several achievements stand out:

Automated Extraction: The two-stage pipeline successfully transforms natural language arguments into formal models, achieving practical accuracy while maintaining transparency about limitations.

Hybrid Representation: BayesDown bridges qualitative and quantitative worlds, preserving semantic richness while enabling mathematical analysis.

Scalable Architecture: Modular design accommodates growth—new document types, improved extraction methods, additional visualization options—without fundamental restructuring.

Interactive Accessibility: Thoughtful visualization makes complex models understandable to diverse stakeholders, democratizing access to formal reasoning tools.

Policy Relevance: The ability to model interventions and assess robustness transforms academic exercises into practical governance tools.

These technical achievements validate the feasibility of computational coordination infrastructure for AI governance. Not as a complete solution, but as a meaningful enhancement to human judgment and collaboration.

The implementation demonstrates that the vision of automated argument extraction is not merely theoretical but practically achievable. While challenges remain—particularly in handling implicit reasoning and diverse uncertainty expressions—the system provides a foundation for enhanced coordination in AI governance.

The journey from concept to implementation revealed unexpected insights. The two-stage extraction process, initially a pragmatic choice, proved cognitively valid. The intermediate representations became valuable outputs themselves. The visualization challenges led to design innovations applicable beyond this project.

Most importantly, the implementation confirms that formal modeling of AI risk arguments need not remain the province of a few dedicated experts. Through automation and thoughtful design, these powerful tools can serve the broader community working to ensure advanced AI benefits humanity.

Having demonstrated technical feasibility and practical utility, we must now critically examine limitations, address objections, and explore broader implications. The next chapter undertakes this essential reflection, ensuring we neither oversell the approach nor undervalue its contributions.


  1. The development of effective prompts required extensive empirical refinement.

    Appendix L: Prompt Engineering - The Hidden Art documents this journey, revealing both the sophistication required for successful extraction and the brittleness of current approaches.↩︎

  2. 3↩︎

  3. I am extremely grateful for their help, support, and invaluable contribution. As lead engineer I had had the nagging suspicion that I might have “hardcoded” my own intuitions into the system (through choices in the setup, system prompt, source selection, etc.). I am relieved to let go of this concern and hope that future, large-scale work confirms the potential for objectivity and convergence.↩︎