VALUES.md

Research on LLMs' and Humans' Ethical Decision-Making

Reproducibility Guide

How to download, analyze, and reproduce experiments

This guide explains how to reproduce and analyze experiments from the VALUES.md research project.


Quick Start: Download & Analyze

1. Download Experiment Data

Visit any experiment page on the research index and click the "📥 Download Data Bundle" button.

Each bundle contains:

  • README.md - Experiment design and methodology

  • config.json - Experiment configuration and metadata

  • dilemmas.json - Full dilemmas used (with tool schemas, variables, etc.)

  • judgements.json - All judgements with choices, confidence, reasoning

  • findings.md - Research article with complete analysis (if available)

  • analyze.py - Analysis script (some bundles include several, e.g. analyze.py, analyze_costs.py, create_figures.py)

  • data/*.csv - Pre-computed summary statistics

  • output/ - Generated figures, tables, and analysis artifacts (if available)

  • values/*.md - VALUES.md frameworks (if the experiment tested ethical frameworks)

  • Additional documentation files (e.g., QUALITATIVE_CODING.md, CITATION_VALIDATION.md), depending on the experiment
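
After extracting a bundle, it can help to confirm that the core files are present before running anything. The sketch below is a minimal check and not part of the project: the function name missing_bundle_files and the choice of which files to treat as required are assumptions based on the listing above.

```python
from pathlib import Path

# Assumption: these four files are treated as required. Items the listing
# marks "if available" (findings.md, output/, values/) are not checked.
REQUIRED_FILES = ["README.md", "config.json", "dilemmas.json", "judgements.json"]


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the required file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "." assumes the script is run from inside the extracted bundle.
    missing = missing_bundle_files(".")
    print("Missing files:", missing or "none")
```

An empty result means all four core files were found; anything else lists what to look for before running the analysis.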

2. Run Analysis

# Extract the downloaded zip
unzip 2025-10-23-theory-vs-action.zip
cd 2025-10-23-theory-vs-action

# Install uv (if you haven't already)
# Note: Requires Python 3.12+ and the uv package manager
pip install uv

# Run the analysis script
uv run python analyze.py

The analysis script will:

  • Load judgements.json and dilemmas.json

  • Compute statistics and generate visualizations

  • Output results to the console and/or the data/ directory


Data Format Reference

judgements.json

Array of judgement objects:

{
  "id": "uuid",
  "dilemma_id": "uuid",
  "experiment_id": "uuid",
  "model": "anthropic/claude-sonnet-4.5",
  "model_id": "anthropic/claude-4.5-sonnet-20250929",
  "temperature": 1.0,
  "choice_id": "notify",
  "confidence": 8.5,
  "difficulty": 6.2,
  "reasoning": "Detailed ethical reasoning...",
  "rendered_situation": "Situation with variables filled in...",
  "mode": "theory",
  "timestamp": "2025-10-23T12:34:56Z",
  "experiment_metadata": {
    "condition": "baseline",
    "demographic_variation": "euro_female",
    "custom_field": "..."
  }
}

Key fields:

  • choice_id - Which option was selected

  • model - Model identifier used for API calls (e.g., "anthropic/claude-sonnet-4.5")

  • model_id - Specific model version used (e.g., "anthropic/claude-4.5-sonnet-20250929")

  • confidence / difficulty - Self-reported metrics (0-10 scale)

  • reasoning - LLM's justification for the choice

  • mode - "theory" (hypothetical) or "action" (with tool-calling)

  • experiment_metadata - Custom fields specific to each experiment
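
As an illustration of how these fields combine, the sketch below averages the self-reported confidence for each mode. It is a minimal example under stated assumptions: the helper name mean_confidence_by_mode is hypothetical, the sample records are made up, and records without a numeric confidence are skipped.

```python
from collections import defaultdict
from statistics import mean


def mean_confidence_by_mode(judgements: list[dict]) -> dict[str, float]:
    """Average the 'confidence' field separately for each 'mode' value."""
    by_mode = defaultdict(list)
    for j in judgements:
        confidence = j.get("confidence")
        if confidence is not None:  # skip records with no confidence value
            by_mode[j["mode"]].append(confidence)
    return {mode: mean(values) for mode, values in by_mode.items()}


# Made-up records for illustration; real ones come from judgements.json.
sample = [
    {"mode": "theory", "confidence": 8.0},
    {"mode": "theory", "confidence": 9.0},
    {"mode": "action", "confidence": 7.0},
]
print(mean_confidence_by_mode(sample))  # {'theory': 8.5, 'action': 7.0}
```

To run it on a real bundle, pass the list loaded from judgements.json instead of sample.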

dilemmas.json

Array of dilemma objects:

{
  "id": "uuid",
  "title": "The Carbon Confession",
  "situation_template": "You are monitoring {USER_NAME}'s...",
  "situation": "Rendered situation...",
  "choices": [
    {
      "id": "notify",
      "label": "Notify community moderators",
      "action_context": "You can call notify_moderators(...)"
    }
  ],
  "variables": {
    "{USER_NAME}": ["Alice Chen", "Bob Smith", "..."]
  },
  "tags": ["privacy", "community"],
  "difficulty": 7
}
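
The relationship between situation_template and variables can be sketched as a simple placeholder substitution. This guide does not show the project's actual rendering logic, so the code below is only an assumption about how a template might be filled; render_situation is a hypothetical helper and the example dilemma is made up.

```python
import random


def render_situation(dilemma: dict, rng: random.Random) -> str:
    """Fill each placeholder in situation_template with one listed value."""
    text = dilemma["situation_template"]
    for placeholder, options in dilemma.get("variables", {}).items():
        # Keys such as "{USER_NAME}" already include the braces,
        # so plain string replacement is enough here.
        text = text.replace(placeholder, rng.choice(options))
    return text


# Made-up dilemma for illustration only.
example = {
    "situation_template": "A message from {USER_NAME} is awaiting review.",
    "variables": {"{USER_NAME}": ["Alice Chen"]},
}
print(render_situation(example, random.Random(0)))
# A message from Alice Chen is awaiting review.
```

Passing a seeded random.Random keeps the choice repeatable when a variable lists several values.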

config.json

Experiment metadata:

{
  "experiment_id": "uuid",
  "title": "Theory vs Action Gap",
  "date": "2025-10-23",
  "models": ["gpt-4.1-mini"],
  "dilemmas_count": 4,
  "judgements_count": 40,
  "conditions": ["theory", "action"]
}
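
Because config.json records dilemmas_count and judgements_count, one quick sanity check is to compare them with the lengths of the loaded arrays. The sketch below assumes these counts are meant to equal the number of entries in dilemmas.json and judgements.json; count_mismatches is a hypothetical helper and the sample data is made up.

```python
def count_mismatches(config: dict, dilemmas: list, judgements: list) -> dict:
    """Map each mismatched config field to a (declared, actual) pair."""
    actual = {
        "dilemmas_count": len(dilemmas),
        "judgements_count": len(judgements),
    }
    return {
        field: (config.get(field), n)
        for field, n in actual.items()
        if config.get(field) != n
    }


# Made-up data: the config declares 3 judgements but only 2 are present.
sample_config = {"dilemmas_count": 2, "judgements_count": 3}
sample_dilemmas = [{"id": "d1"}, {"id": "d2"}]
sample_judgements = [{"id": "j1"}, {"id": "j2"}]
print(count_mismatches(sample_config, sample_dilemmas, sample_judgements))
# {'judgements_count': (3, 2)}
```

An empty dictionary means the declared counts match the data.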

Advanced: Custom Analysis

Loading Data with Python

import json
from collections import Counter

# Load judgements
with open("judgements.json", encoding="utf-8") as f:
    judgements = json.load(f)

# Load dilemmas
with open("dilemmas.json", encoding="utf-8") as f:
    dilemmas = json.load(f)

# Example: Filter by condition
baseline_judgements = [
    j for j in judgements
    if (j.get("experiment_metadata") or {}).get("condition") == "baseline"
]

# Example: Count judgements per choice
choices = Counter(j["choice_id"] for j in judgements)
print(choices)  # e.g. Counter({'notify': 15, 'suppress': 10, ...})
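
Since judgements reference dilemmas by dilemma_id and choices by choice_id, the two files can be joined to report human-readable labels instead of raw IDs. The following is a minimal sketch: label_counts is a hypothetical helper, the sample records are made up, and it assumes choice IDs are unique within a dilemma.

```python
from collections import Counter


def label_counts(judgements: list[dict], dilemmas: list[dict]) -> Counter:
    """Count judgements per choice label, falling back to the raw choice_id."""
    labels = {
        (d["id"], c["id"]): c["label"]
        for d in dilemmas
        for c in d.get("choices", [])
    }
    return Counter(
        labels.get((j["dilemma_id"], j["choice_id"]), j["choice_id"])
        for j in judgements
    )


# Made-up records for illustration only.
demo_dilemmas = [
    {"id": "d1", "choices": [{"id": "notify", "label": "Notify community moderators"}]}
]
demo_judgements = [
    {"dilemma_id": "d1", "choice_id": "notify"},
    {"dilemma_id": "d1", "choice_id": "notify"},
    {"dilemma_id": "d1", "choice_id": "suppress"},  # no matching label
]
print(label_counts(demo_judgements, demo_dilemmas))
```

Choices with no matching entry in dilemmas.json keep their choice_id, so nothing is silently dropped from the tally.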

Loading Data with pandas

import pandas as pd

# Load one of the pre-computed CSVs from data/ (adjust the file name to match your bundle)
df = pd.read_csv("data/raw_judgements.csv")

# Filter and analyze
baseline = df[df["condition"] == "baseline"]
print(baseline.groupby("choice_id").size())

# Compare conditions
pivot = df.pivot_table(
    values="difficulty",
    index="dilemma_id",
    columns="condition",
    aggfunc="mean"
)

For Researchers: Reproducing from Database

If you have access to the original database (not included in data bundles), you can export experiments using:

# Export a specific experiment
uv run python scripts/export_experiment_data.py <experiment-id> research/YYYY-MM-DD-name/data

# This creates:
# - judgements.json
# - dilemmas.json
# - config.json
# - data/*.csv

Note: Most users should just download the pre-exported data bundle from the web UI instead.


Data Availability

All research data is:

  • Self-contained - Each bundle includes everything needed

  • Version-controlled - Part of the repository

  • Reproducible - Includes exact model IDs, temperatures, prompts

  • Citable - Each experiment has a unique ID

Citation

If you use this data in your research, please cite:

VALUES.md Research Project (2025)
https://github.com/values-md/dilemmas-api
Experiment ID: [specific-experiment-id]

Last Updated: 2025-11-01

Status: Living document