# Research on LLMs' and Humans' Ethical Decision-Making

How to download, analyze, and reproduce experiments.
This guide explains how to reproduce and analyze experiments from the VALUES.md research project.
## Downloading a Data Bundle

Visit any experiment page on the research index and click the "📥 Download Data Bundle" button.
Each bundle contains:

- `README.md` - Experiment design and methodology
- `config.json` - Experiment configuration and metadata
- `dilemmas.json` - Full dilemmas used (with tool schemas, variables, etc.)
- `judgements.json` - All judgements with choices, confidence, and reasoning
- `findings.md` - Research article with complete analysis (if available)
- `analyze.py` - Analysis script (may include multiple: analyze.py, analyze_costs.py, create_figures.py, etc.)
- `data/*.csv` - Pre-computed summary statistics
- `output/` - Generated figures, tables, and analysis artifacts (if available)
- `values/*.md` - VALUES.md frameworks (if the experiment tested ethical frameworks)
- Additional documentation files (e.g., QUALITATIVE_CODING.md, CITATION_VALIDATION.md), depending on the experiment
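Before diving in, you can sanity-check that a downloaded bundle has its core files. A minimal sketch (the bundle directory name matches the Quick Start example below; adjust it to your download):

```python
from pathlib import Path

# Check that the core files of a downloaded bundle are present.
bundle = Path("2025-10-23-theory-vs-action")
core_files = ["README.md", "config.json", "dilemmas.json", "judgements.json", "analyze.py"]
for name in core_files:
    status = "ok" if (bundle / name).exists() else "MISSING"
    print(f"{name}: {status}")
```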
## Quick Start

```bash
# Extract the downloaded zip
unzip 2025-10-23-theory-vs-action.zip
cd 2025-10-23-theory-vs-action

# Install dependencies (if you haven't already)
# Note: Requires Python 3.12+ and uv package manager
pip install uv

# Run the analysis script
uv run python analyze.py
```
The analysis script will:

1. Load judgements.json and dilemmas.json
2. Compute statistics and generate visualizations
3. Output results to console and/or the data/ directory
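The scripts differ per experiment, but most follow this same load-compute-output shape. A rough sketch, not any bundle's actual script (the `data/summary.json` output name is illustrative):

```python
import json
from pathlib import Path

# Load the raw data.
judgements = json.loads(Path("judgements.json").read_text())
dilemmas = json.loads(Path("dilemmas.json").read_text())

# Compute statistics (each experiment's script does its own analysis here).
stats = {"n_judgements": len(judgements), "n_dilemmas": len(dilemmas)}

# Write results to the data/ directory and the console.
Path("data").mkdir(exist_ok=True)
Path("data/summary.json").write_text(json.dumps(stats, indent=2))
print(stats)
```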
## Data Formats

### judgements.json

Array of judgement objects:

```json
{
  "id": "uuid",
  "dilemma_id": "uuid",
  "experiment_id": "uuid",
  "model": "anthropic/claude-sonnet-4.5",
  "model_id": "anthropic/claude-4.5-sonnet-20250929",
  "temperature": 1.0,
  "choice_id": "notify",
  "confidence": 8.5,
  "difficulty": 6.2,
  "reasoning": "Detailed ethical reasoning...",
  "rendered_situation": "Situation with variables filled in...",
  "mode": "theory",
  "timestamp": "2025-10-23T12:34:56Z",
  "experiment_metadata": {
    "condition": "baseline",
    "demographic_variation": "euro_female",
    "custom_field": "..."
  }
}
```
Key fields:

- `choice_id` - Which option was selected
- `model` - Model identifier used for API calls (e.g., "anthropic/claude-sonnet-4.5")
- `model_id` - Specific model version used (e.g., "anthropic/claude-4.5-sonnet-20250929")
- `confidence` / `difficulty` - Self-reported metrics (0-10 scale)
- `reasoning` - The LLM's justification for the choice
- `mode` - "theory" (hypothetical) or "action" (with tool-calling)
- `experiment_metadata` - Custom fields specific to each experiment
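These fields make mode-level comparisons straightforward. A minimal sketch comparing self-reported confidence and difficulty between "theory" and "action" judgements, assuming every judgement carries the fields above:

```python
import json
from collections import defaultdict

with open("judgements.json") as f:
    judgements = json.load(f)

# Collect (confidence, difficulty) pairs per mode.
by_mode = defaultdict(list)
for j in judgements:
    by_mode[j["mode"]].append((j["confidence"], j["difficulty"]))

# Report means per mode ("theory" vs "action").
for mode, pairs in sorted(by_mode.items()):
    n = len(pairs)
    conf = sum(c for c, _ in pairs) / n
    diff = sum(d for _, d in pairs) / n
    print(f"{mode}: n={n}, mean confidence={conf:.2f}, mean difficulty={diff:.2f}")
```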
### dilemmas.json

Array of dilemma objects:

```json
{
  "id": "uuid",
  "title": "The Carbon Confession",
  "situation_template": "You are monitoring {USER_NAME}'s...",
  "situation": "Rendered situation...",
  "choices": [
    {
      "id": "notify",
      "label": "Notify community moderators",
      "action_context": "You can call notify_moderators(...)"
    }
  ],
  "variables": {
    "{USER_NAME}": ["Alice Chen", "Bob Smith", "..."]
  },
  "tags": ["privacy", "community"],
  "difficulty": 7
}
```
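Judgements reference dilemmas through `dilemma_id`, so the two files can be joined on that key. A minimal sketch tallying choices per dilemma title (assumes every referenced dilemma appears in dilemmas.json, which a self-contained bundle should guarantee):

```python
import json
from collections import Counter

# Index dilemmas by id for joining against judgements.
with open("dilemmas.json") as f:
    dilemmas = {d["id"]: d for d in json.load(f)}
with open("judgements.json") as f:
    judgements = json.load(f)

# Tally (dilemma title, choice) pairs across all judgements.
tally = Counter(
    (dilemmas[j["dilemma_id"]]["title"], j["choice_id"]) for j in judgements
)
for (title, choice), count in sorted(tally.items()):
    print(f"{title}: {choice} x{count}")
```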
### config.json

Experiment metadata:

```json
{
  "experiment_id": "uuid",
  "title": "Theory vs Action Gap",
  "date": "2025-10-23",
  "models": ["gpt-4.1-mini"],
  "dilemmas_count": 4,
  "judgements_count": 40,
  "conditions": ["theory", "action"]
}
```
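The counts in config.json can serve as a quick integrity check against the bundled data files. A minimal sketch, assuming the fields shown above:

```python
import json

with open("config.json") as f:
    config = json.load(f)
with open("judgements.json") as f:
    judgements = json.load(f)
with open("dilemmas.json") as f:
    dilemmas = json.load(f)

# Verify the bundle's metadata matches the actual data.
assert len(judgements) == config["judgements_count"], "judgement count mismatch"
assert len(dilemmas) == config["dilemmas_count"], "dilemma count mismatch"
print(f'{config["title"]} ({config["date"]}): '
      f"{len(dilemmas)} dilemmas, {len(judgements)} judgements")
```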
## Custom Analysis

### Working with the JSON files

```python
import json
from collections import Counter

# Load judgements
with open("judgements.json") as f:
    judgements = json.load(f)

# Load dilemmas
with open("dilemmas.json") as f:
    dilemmas = json.load(f)

# Example: filter by condition
baseline_judgements = [
    j for j in judgements
    if j.get("experiment_metadata", {}).get("condition") == "baseline"
]

# Example: group by choice
choices = Counter(j["choice_id"] for j in judgements)
print(choices)  # Counter({'notify': 15, 'suppress': 10, ...})
```
### Working with the CSV summaries

```python
import pandas as pd

# Load pre-computed CSV summaries
df = pd.read_csv("data/raw_judgements.csv")

# Filter and analyze
baseline = df[df["condition"] == "baseline"]
print(baseline.groupby("choice_id").size())

# Compare conditions: mean difficulty per dilemma, per condition
pivot = df.pivot_table(
    values="difficulty",
    index="dilemma_id",
    columns="condition",
    aggfunc="mean",
)
print(pivot)
```
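The exact CSV columns vary by experiment, so if a bundle's summaries lack the column you need, you can build an equivalent DataFrame straight from judgements.json. A sketch, assuming the `experiment_metadata` fields shown earlier; `pd.json_normalize` flattens the nested metadata into dotted column names:

```python
import json

import pandas as pd

# Build a DataFrame directly from the raw judgements; nested
# experiment_metadata fields become columns like
# "experiment_metadata.condition".
with open("judgements.json") as f:
    df = pd.json_normalize(json.load(f))

print(df.groupby("experiment_metadata.condition")["confidence"].mean())
```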
## Exporting from the Database

If you have access to the original database (not included in data bundles), you can export experiments using:

```bash
# Export a specific experiment
uv run python scripts/export_experiment_data.py <experiment-id> research/YYYY-MM-DD-name/data

# This creates:
# - judgements.json
# - dilemmas.json
# - config.json
# - data/*.csv
```

Note: Most users should just download the pre-exported data bundle from the web UI instead.
## Data Guarantees

All research data is:

- ✅ **Self-contained** - Each bundle includes everything needed
- ✅ **Version-controlled** - Part of the repository
- ✅ **Reproducible** - Includes exact model IDs, temperatures, and prompts
- ✅ **Citable** - Each experiment has a unique ID
## Citation

If you use this data in your research, please cite:

```
VALUES.md Research Project (2025)
https://github.com/values-md/dilemmas-api
Experiment ID: [specific-experiment-id]
```
---

**Last Updated:** 2025-11-01

**Status:** Living document