Consequentialist-Recommendation Consequentialism
by paulfchristiano
An act consequentialist evaluates possible acts by the goodness of their consequences. In some situations this leads to bad consequences. For example, I may decline to trust a consequentialist because I am (justifiably) concerned that they will betray my trust whenever it is in their interest. This outcome is widely considered unsatisfactory, and is often taken to imply that a person should not willingly become an act consequentialist.
Rule consequentialism is an alternative to act consequentialism. In this framework, a rule is good to the extent that embracing it has good consequences, and an act is prohibited if it violates a good rule. For example, the rule “respect promises” has good consequences; so some rule consequentialists might embrace it, and I might expect such consequentialists to be trustworthy. But in fact it’s hard to say anything about what a rule consequentialist would do, because it’s hard to say what counts as a “rule” or what embracing a rule entails. For some notions of “rule,” rule consequentialism degenerates to act consequentialism. For every other careful specification I’ve seen, the resulting theory behaves extremely badly (i.e. adopting it reliably leads to terrible consequences).
Continuing in this line, we could define an infinite family of flavors of consequentialism. The difference between these theories is their construction of counterfactuals. The act consequentialist reasons: “If I lie in this case, there will be no bad consequences.” The rule consequentialist reasons: “If I lie in this case, it is because I am not following the rule ‘be honest.’ If I don’t/didn’t embrace the rule ‘be honest,’ there will be/would have been bad consequences.”
Pretty much all of these forms of consequentialism are self-defeating in a certain sense. If I ask the act consequentialist: “What do you recommend that I do?” they will not make the recommendation of act consequentialism. Instead, they will consider the realities of my psychology and what I am likely to follow through on, the effects of their recommendation on other people who will overhear it (who might anticipate that I’ll act in accordance with their recommendation), and so on. A similar situation obtains with respect to rule consequentialism, or “motivation consequentialism.” Moreover, this applies not only to the act consequentialist’s utterances, but to their own choice of moral theory.
There is an (essentially) unique form of consequentialism that is not self-defeating in this sense. Namely, consider T-consequentialism, the theory that recommends X such that “T-consequentialism recommends X” has optimal consequences. I call this consequentialist-recommendation consequentialism. Less pithily but more accurately, we might call it (“preceded by its quotation and then parenthesized and suffixed with ‘-recommendation consequentialism’” preceded by its quotation and then parenthesized and suffixed with “-recommendation consequentialism”)-recommendation consequentialism. For now I’ll just call it T-consequentialism.
So for example, when I am considering whether to betray a friend’s confidence, I reason: “My friend knows that I am a T-consequentialist. If T-consequentialism recommends betraying confidence in this case, my friend would not have trusted me, and the outcome would have been worse. So T-consequentialism recommends being trustworthy,” and following the recommendations of T-consequentialism I would be trustworthy. If I am considering devoting 99% vs. 90% of my spare funds to do-gooding, I might reason “If T-consequentialism recommended devoting 99% of my spare funds to do-gooding I would not have embraced T-consequentialism (nor act consequentialism). So T-consequentialism will make less demanding recommendations.” ETA: note that “T-consequentialism recommends” is a fact about a certain decision procedure. It is a logical rather than empirical or moral fact—what recommendation do I obtain if I follow this decision procedure?
This is uniquely not self-defeating: if T-consequentialism recommended anything other than X, then the consequences would be worse (by definition). If we are willing to entertain logical uncertainty, T-consequentialism is well-defined, because while we are trying to figure out whether T-consequentialism recommends X, we can (by definition) still conceive of the world in which T recommends any given X. Once we can no longer conceive of the world in which T recommends X, we know that T does not recommend X. This well-definedness is also a unique virtue of consequentialist-recommendation consequentialism. Act and rule consequentialism, by comparison, run into the standard problem of computing counterfactuals: given that I am actually going to do X, I might be unable to imagine doing Y != X, and so I cannot evaluate the consequences of doing Y except by appealing to some ill-defined notion of “closest possible world.”
Mathematically formalizing T-consequentialism is a bit tricky, and is essentially still an open problem (though many consequences of such T-consequentialist theories are understood, provided that any such theories exist, and as far as I know no moral theories other than act consequentialism have a rigorous mathematical interpretation). T-consequentialism and the above discussion of it are very heavily informed by “updateless decision theory”, a semi-rigorous decision theory developed in the community around LessWrong.
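To convey the flavor of the fixed point without claiming any formalization, here is a toy sketch in Python, with invented names throughout. The finite candidate set and the consequences_of oracle simply assume away the hard part: evaluating the world in which a given recommendation holds as a logical fact.

    # Toy sketch only: `candidates` is a finite set of possible recommendations,
    # and consequences_of(X) is a stand-in oracle scoring the world in which
    # "T-consequentialism recommends X" holds as a logical fact.
    def t_recommendation(candidates, consequences_of):
        # Pick the X whose being-recommended has the best consequences. By
        # construction, no other recommendation would have done better, which
        # is the sense in which the theory is not self-defeating.
        return max(candidates, key=consequences_of)

    # Hypothetical example: a friend who can check what T recommends only
    # confides in agents the theory tells to keep confidences.
    scores = {"keep the confidence": 10.0, "betray the confidence": 3.0}
    print(t_recommendation(scores, scores.get))  # -> keep the confidence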
I’m generally suspicious of arguments of the form “consequentialism leads to bad consequences.” That seems awfully, y’know, consequentialist.
A consequentialist who recognizes the value of being the kind of person other people can trust should act accordingly. That doesn’t mean not being consequentialist, it just means not being nearsighted.
I think I agree with Patrick’s post (which is a better statement of the same ideas), but the important caveat is “if you have a good decision theory.” Neither rule nor act consequentialism is compatible with reasonable decision theory. What I’m proposing is basically just UDT-consequentialism, but in the same language as rule/act consequentialism.
A couple points:
1) I’m not sure how “T-consequentialism recommends X” can have evaluable consequences. There is (abstractly speaking) already a theory for every possible combination of recommendations. How exactly is calling one of them “T-consequentialism” supposed to have empirical side effects?
2) Is there a reason why being non-self-effacing is considered preferable to getting better results?
1) We can unpack “T-consequentialism” as “recommend whatever X would have the best consequences, as a recommendation of T-consequentialism” (defined by diagonalization; see the schematic rendering after point 2). That’s a well-defined protocol, so “T-consequentialism recommends X” is a logical statement which can have logical consequences.
2) Certainly when debating what moral theory we ought to adopt, you will have a hard time arguing for a self-effacing theory (since I can immediately argue you out of it). It’s meant more as a meta-argument, showing that no other moral theory will be able to stand up to attack. But the formalization of “getting better results” is also part of what is up for debate, so alternative desiderata (like reflective consistency) can help shed light on that question.
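Schematically (this is just my gloss, not the missing formalization): the diagonalization asks for a procedure T satisfying

\[
T(s) \;=\; \arg\max_{X} \; U\big(\text{the world in which ``} T(s) = X \text{'' holds as a logical fact}\big)
\]

where s is the situation, X ranges over candidate recommendations, and U measures the goodness of outcomes. T appears on both sides of its own definition; the diagonal construction is what licenses that self-reference.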
1) Your answer here doesn’t seem to address my complaint. How does a recommendation of T-consequentialism have any consequences? T-consequentialism doesn’t have any direct physical manifestations.
2) Sorry. By self-effacing I mean a theory that one would not necessarily recommend to other people. This does not mean that I could be argued out of it. Also, I claim that, whatever formulation you use for “getting better results,” given some simple (and mostly correct) assumptions, a proper implementation of act consequentialism uniquely beats out every other theory.
1) If I act on the recommendations of T-consequentialism (or think about them), then the recommendations of T-consequentialism have consequences.
2) Oh. It seems like the same argument that suggests “I wouldn’t recommend act consequentialism to others” also implies “I wouldn’t recommend act consequentialism to myself,” unless we have a model in which internal dynamics (unlike external dynamics) are subject to perfect consequentialist self-control, are perfectly opaque to the rest of the world, etc.
I agree that adopting act consequentialism now wins (on the act consequentialist account), but think that (a) you wouldn’t use act consequentialism in the future, given realistic models of psychology and human interaction [or given the unrealistic ability to make binding commitments], and (b) the act-consequentialist account of “getting better results” begs the question, by adopting a certain framework for evaluating the counterfactuals. I agree that modifying reality so that everything stays the same but you used act-consequentialism would improve results (modulo the caveats in (a)), but that doesn’t seem like the right way to construct the counterfactual, and I don’t see how to justify that construction. Do you have some other theorem in mind?
1) You cannot act directly on the recommendations of T-consequentialism, you can only act on what you believe them to be. This cannot be altered by changing what its actual recommendations are.
But I think I see what you are getting at. It seems like you are assuming that there is a gigantic, universally accessible database of what T-consequentialism recommends in every possible situation and we are optimizing over that. On the other hand, I think that in order to produce reasonable results you would need to already have a lot of people who have decided to do whatever this database recommends. For example, if a substantial number of people have decided to be anti-T-consequentialists (i.e., have decided to, as best they can, do the exact opposite of what T-consequentialism recommends), you end up with a theory that you really shouldn’t follow.
On the other hand, I think that realistically (once you require that T-consequentialism only recommend simple courses of action, to avoid all recommendations being “do such and such in order to develop safe fusion power / Friendly GAI / a near-linear-time SAT algorithm / etc.”) T-consequentialism would recommend things that spark productive debate rather than things that are actually good ideas.
I feel like T-Consequentialism is too close to the following theory, which seems a little silly to me:
Terminology-Consequentialism: The moral theory that produces the best results when referred to as “Terminology-Consequentialism”
In some sense this is the optimal definition of “Terminology-Consequentialism”. On the other hand, this just seems silly to me.
2) OK. I see your point that it might be considered problematic that the theory is such that I cannot even count on future-me following it. On the other hand, this does not seem like good enough grounds to declare the moral theory wrong. That seems to me like saying “humans are physically incapable of running anywhere near c, so we should redefine the speed of light to be 20mph.”
I agree that specifying exactly what you do is not the correct way to construct counterfactuals. Decision theory should not conclude that the optimal course of action for someone deathly afraid of heights is to collect the treasure at the other end of a rickety bridge, or suggest that you write down a 10-page proof of the Riemann Hypothesis right now (which is probably something you are physically capable of doing). As I see it, the correct way to create counterfactuals is to see what happens if you cause the part of my brain that does high-level planning to run a different algorithm (which is of course required to be simple and computationally inexpensive). Then again, I suppose that this does produce different results from what is generally considered to be act consequentialism.
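To make that concrete, here is a toy rendering (all names and situations invented): surgery on the planning algorithm changes everything downstream of that algorithm, including another party’s prediction of it, whereas act-consequentialist surgery on a single act holds the prediction fixed.

    # Toy rendering of "algorithm surgery" counterfactuals; the predictor and
    # situations are invented for illustration.
    def predictor(algorithm):
        # Another party simulates the agent's planning algorithm and extends
        # trust based on what that algorithm would do when tempted.
        return "trusts" if algorithm("tempted") == "keep promise" else "distrusts"

    def outcome(algorithm):
        # The agent's act and the predictor's stance both flow from the same
        # algorithm, so swapping the algorithm counterfactually changes both.
        return (algorithm("tempted"), predictor(algorithm))

    honest = lambda situation: "keep promise"
    exploiter = lambda situation: "break promise"

    print(outcome(honest))     # ('keep promise', 'trusts')
    print(outcome(exploiter))  # ('break promise', 'distrusts')
    # Surgery on the single act while holding the prediction fixed (the usual
    # act-consequentialist counterfactual) would credit the impossible pair
    # ('break promise', 'trusts').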
One can contrive some artificial scenarios where (e.g. because powerful aliens hate T-Consequentialists) the best consequences would come from T-Consequentialism making recommendations so terrible and stupid that no one would find T-Consequentialism appealing, no? (Not sure it’s a problem.)
We can also contrive scenarios where act consequentialism would do the same (in which powerful aliens create simulations of you). After all, if you are following T-consequentialism, this is essentially saying “The aliens simulate what you would do, and then if you do something good they kill a kitten,” which is a problem for everyone (to the extent it’s a problem for anyone).
This may be especially problematic for T-consequentialism, but it’s not obvious to me whether these situations are fundamentally different. The real issue is probably responsiveness to blackmail, which already comes up in the case “if T-consequentialism recommended that I did something more extreme, I wouldn’t do it,” and is one of the subtleties in the definition that I don’t really know how to reason about.
Also, note that T-consequentialism makes recommendations on a case-by-case basis, so even if most people followed anti-T-consequentialism, unless they were specifically thinking about what T-consequentialism recommended for you, they wouldn’t necessarily mess things up for you.
The weirdness I had in mind arises exactly from the fact that we’re not holding it constant who is and isn’t a follower of T-consequentialism. Because we’re not holding this constant, in some (artificial) situations the best consequences will come from T-consequentialism being rejected, so in these situations T-consequentialism gives recommendations whose ‘job’ isn’t to be accepted, but to be rejected. In the T-consequentialists-hating-aliens scenario, the best recommendations for T-consequentialism to give are whatever recommendations will make humans conclude ‘T-consequentialism is stupid’ and not become T-consequentialists.
Brief historical note: the standard reference for “motivation consequentialism” is Adams (1976), who called it “motive utilitarianism”:
http://www.jstor.org/discover/10.2307/2025783
There is also “desire utilitarianism,” aka “desirism”:
http://atheistethicist.blogspot.com/2011/12/basic-review-of-desirism.html
I don’t recommend motive consequentialism, but global consequentialism, which roughly holds that the right X is the X that leads to the best outcome, for any category of evaluands X (acts, rules, motives, etc.). This is similar, but not identical, to your T-consequentialism. Indeed I have written a PhD thesis on this topic, which to the best of my knowledge is the most in-depth defence and exploration of the topic of applying consequentialism to everything. If anyone wants a copy, let me know via email.
What does it mean for Z to lead to an outcome? Presumably we consider counterfactuals in which Z obtains. But how do you construct such counterfactuals? In order to determine whether it is right to break a promise, I consider the “nearest possible world” in which I break that promise, but it is underdetermined whether I break all of my promises in that world or just that one. Here I am advocating that the “nearest possible world” is the one in which T-consequentialism recommends I break my promise.
It seems like this is a core question—independently of what I call “right,” the actions I take are quite sensitive to how I define counterfactuals. At least that’s how it looks to me. Do you think this is not the case? (Perhaps I should read your thesis, since this is all coming from a talk you gave on it which was presumably less detailed.) I said you seemed to endorse motive consequentialism because I remember you advocating surgery on motivations as the appropriate way to construct counterfactuals when the question came up.
“What does it mean for Z to lead to an outcome?”
I’m not sure, but it is something that is very fundamental and faces pretty much everyone in ethics and decision theory — even questions as basic as whether I should have a sandwich for lunch.
Indeed, it also comes up in all standard physics questions (at least those that non-physicists ask). Will this wineglass break if I drop it on the pavement? Well, we need to select a completely described world in which that happens in order for physical theories to answer this. That is, supposing I don’t in fact knock it off the table, we need to select a relevant counterfactual world. So this is nothing unique to my theory, or to consequentialism, or even to ethics. We can’t do any reasoning about practically influencing the world without solving or somehow bypassing this issue.
One analogy I find useful is to compare physical theories to ethical axiologies. Both need a theory of counterfactuals (or an alternative bypassing method) to apply them: the axiologies are in no less tenable a position than the physical theories in this sense.
I do think you should at least browse my thesis. The most relevant chapter for you is chapter 3 (?) on global consequentialism. I’d also read the opening and closing statements of the other chapters and dive into the main text only if there is something you disagree with or would like to see spelled out.
I’ve read chapter 3 / skimmed the rest of your thesis, and agree with what is said. It seems like the issue of global vs. local consequentialism is orthogonal to the issue of how to construct counterfactuals. Sorry to be a bit simplistic about it and incorrectly describe your view, I’ll edit the post (and should have earlier).
For many questions of interest (e.g. “what action is best for me to take?”) the construction of the counterfactual seems to be an important issue that can give different answers (for example, “if I take action X, do others anticipate that I take action X?”). So it doesn’t seem like a minor issue.
I am comfortable constructing counterfactuals for consequentialist judgments, because those judgments are under the control of the consequentialist theory itself. (Even in this case I think there are open questions, but the situation is relatively clear.) I don’t know how to construct counterfactuals in general and I’m not convinced it is always a meaningful exercise.
I’m not sure it’s correct to characterize all forms of consequentialism except T-consequentialism as being “self-defeating in a strong formal sense”. Act consequentialism, for instance, doesn’t imply that act consequentialism is false, only that consequentialist agents will in certain circumstances say that it is. I don’t exactly know what you mean by “strong formal sense”, but as far as I can see there is nothing self-defeating about such a theory, in any relevant sense. Of course, we might still want a theory to lack this property, though not because possessing it would make the theory self-defeating, but because it would make it implausible on other grounds. Speaking for myself, however, I don’t see any convincing argument for thinking that having this property constitutes a reason for rejecting a theory.
It’s not just that an act consequentialist sometimes says that act consequentialism is false; if they are offered an opportunity to become a different sort of consequentialist, they would immediately take it (e.g. to one-box, etc.). So in a mature world, you wouldn’t expect to see many act consequentialists around.