The Refined Counterfactual Prisoner's Dilemma: An Attempt to Explode Decision-Theoretic Consequentialism
AI Alignment Forum, archived Mar 17, 2026
by Chris_Leong
11th Mar 2026
I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom.
Kendiukhov quotes Scott Garrabrant:
My take is that the concept of expected utility maximization is a mistake. [...] As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently. [...] Von Neumann did not notice this mistake because he was too busy inventing the entire field. The point where we discover updatelessness is the point where we are supposed to realize that all of utility theory is wrong. I think we failed to notice.
Apparently "stopping caring about the possible worlds where that observation went differently" is known as (decision-theoretic) consequentialism.
I was thinking this through and realised that the (potential) disadvantage of not caring about worlds where the observation went differently can be cleanly illustrated by the following thought experiment:
The Refined Counterfactual Prisoner's Dilemma: Omega, a perfect predictor, flips a coin. Later on, Omega explains the scenario, including the result of the coin flip and details that are yet to come, and asks you for $1. It turns out that before Omega came to speak to you, it made a prediction about what you would have chosen if the coin had come up the other way. If it predicted that you wouldn't have paid, the scenario finishes with Omega inflicting $1 million worth of damage on you.
(I'll list the order of steps more explicitly at the end)
This attempts to explode consequentialism by constructing a situation where, by refusing to give up a trivial amount of value, you symmetrically burn a lot of value in the other counterfactual case. If you don't care about the other world, you'd press such a button if it could exist, and because you'd press it in both counterfactuals, you end up worse off regardless of which way the coin comes up.
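To make the "worse off regardless" claim concrete, here is a minimal sketch that tabulates the payoff of each possible policy in each branch. It assumes Omega's prediction is simply whatever the policy does in the other branch (the perfect-predictor assumption); the names `payoff`, `payoff_table`, `COST` and `DAMAGE` are illustrative and not from the post.

```python
from itertools import product

COST = 1            # the $1 Omega asks for
DAMAGE = 1_000_000  # inflicted if Omega predicted refusal in the other branch
COINS = ("heads", "tails")


def other(coin):
    """Return the counterfactual coin result."""
    return "tails" if coin == "heads" else "heads"


def payoff(policy, coin):
    """Payoff in the branch where `coin` came up.

    `policy` maps each coin result to True (pay) or False (refuse).
    Assumption: a perfect predictor's prediction about the other branch
    equals what the policy actually does there.
    """
    total = 0
    if policy[coin]:
        total -= COST
    if not policy[other(coin)]:
        total -= DAMAGE
    return total


def payoff_table():
    """Map (pays_on_heads, pays_on_tails) -> (payoff_if_heads, payoff_if_tails)."""
    table = {}
    for pays_heads, pays_tails in product((True, False), repeat=2):
        policy = {"heads": pays_heads, "tails": pays_tails}
        table[(pays_heads, pays_tails)] = tuple(payoff(policy, c) for c in COINS)
    return table


if __name__ == "__main__":
    for (ph, pt), (h, t) in payoff_table().items():
        print(f"pay on heads={ph!s:5} pay on tails={pt!s:5} -> heads: {h:>10,} tails: {t:>10,}")
```

Within any single branch, switching from paying to refusing always saves exactly $1, since the damage depends only on the other branch. Yet the policy that refuses everywhere loses $1 million in both branches, while the policy that pays everywhere loses $1 in both.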
Now you might be skeptical about the existence of such a button because you're doubtful about the possibility of perfect predictors, but if your doubt were assuaged, then this thought experiment would bite. In fact, I would argue that it would be quite surprising if a proposed decision theory were to fail for perfect predictors without having deeper issues.
Additional information
This is an improved version of a thought experiment that was independently discovered by Cousin_It and me:
The Original Counterfactual Prisoner's Dilemma: Omega, a perfect predictor, flips a coin and tells you how it came up. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads. In this case it was heads and it makes its prediction before you decide.
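For comparison, the payoffs of this original version can be sketched the same way. Because the setup is symmetric, the payoff depends only on what you do in the branch you are in and what you would have done in the other one. The function name `original_payoff` and the constants are illustrative, and the perfect-predictor assumption is again modelled by passing in the counterfactual choice directly.

```python
ASK = 100        # what Omega asks for in either branch
REWARD = 10_000  # paid if Omega predicts you would have paid in the other branch


def original_payoff(pays_here, pays_in_other_branch):
    """Payoff in the actual branch of the original version.

    Assumption: Omega's prediction equals `pays_in_other_branch`.
    """
    total = -ASK if pays_here else 0
    if pays_in_other_branch:
        total += REWARD
    return total


if __name__ == "__main__":
    for here in (True, False):
        for there in (True, False):
            print(f"pay here={here!s:5} pay there={there!s:5} -> {original_payoff(here, there):>7,}")
```

Paying in both branches nets $9,900 whichever way the coin lands, while refusing in both nets $0, even though refusing looks $100 better when each branch is considered in isolation.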
The changes I've made for this version may seem trivial, but if you want a thought experiment to spread, small details like this matter. The original version was just a symmetric version of counterfactual-mugging, but this was less helpful in explaining it than I originally hoped.
The Refined Counterfactual Prisoner's Dilemma Event Sequence:
To make this as clear as possible, here's the envisioned temporal ordering:
1. Omega, a perfect predictor, flips a coin.
2. Omega predicts what you would have chosen in the other counterfactual, writes it down on a piece of paper, and puts it into an envelope, which stays sealed until the end of the scenario.
3. Omega explains the scenario, including the result of the coin flip and details that are yet to come.
4. Omega asks you to decide whether you'll pay it $1.
5. You make your decision.
6. Omega opens the envelope. If the paper with its prediction says that you weren't going to pay, then it inflicts $1 million worth of damage on you.
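The ordering above can be walked through as a short simulation. This is only a sketch under stated assumptions: the agent is a deterministic function from the coin result to a pay/refuse choice, and perfect prediction is modelled by calling that same function on the counterfactual coin result before the agent decides. The names `run_scenario`, `always_pay` and `never_pay` are hypothetical.

```python
import random


def always_pay(coin):
    return True


def never_pay(coin):
    return False


def run_scenario(agent, rng=None):
    """Run one pass of the event sequence and return (coin, paid, loss)."""
    if rng is None:
        rng = random.Random()

    # Omega flips a coin.
    coin = rng.choice(["heads", "tails"])
    counterfactual = "tails" if coin == "heads" else "heads"

    # Omega seals its prediction about the other counterfactual.
    # Assumption: a perfect predictor is modelled by evaluating the agent itself.
    envelope_says_paid = agent(counterfactual)

    # Omega explains the scenario and asks for $1; the agent decides.
    paid = agent(coin)

    # Omega opens the envelope; the agent's actual choice cannot change its contents.
    loss = (1 if paid else 0) + (0 if envelope_says_paid else 1_000_000)
    return coin, paid, loss


if __name__ == "__main__":
    for agent in (always_pay, never_pay):
        coin, paid, loss = run_scenario(agent, random.Random(0))
        print(f"{agent.__name__}: coin={coin}, paid={paid}, loss=${loss:,}")
```

The envelope is filled before the decision is made, so nothing the agent does in the actual branch alters it. The loss still depends on the agent's disposition as a whole.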
2 comments
Dacyn (5d):
"The changes I’ve made for this version may seem trivial"
Well, in one version you are being extorted for money, whereas in the other version you are merely being bribed. If you buy Eliezer's theory that you should pay up for bribes but not for extortions (because paying up for bribes increases the probability that people will try to bribe you, which is good, but paying up for extortion increases the probability that people will try to extort you, which is bad), then the difference matters.
Chris_Leong (5d):
Good point.
Assume no-one will ever know, that you can't disincentivise the actor and that they won't ever do anything like this again.