Consequentialist-Recommendation Consequentialism

by paulfchristiano

An act consequentialist evaluates possible acts by the goodness of their consequences. In some situations this leads to bad consequences. For example, I may decline to trust a consequentialist because I am (justifiably) concerned that they will betray my trust whenever it is in their interest. This outcome is widely considered unsatisfactory, and is often taken to imply that a person should not willingly become an act consequentialist.

Rule consequentialism is an alternative to act consequentialism. In this framework, a rule is good in accordance with the goodness of the consequences of embracing it, and an act is prohibited if it violates a good rule. For example, the rule “respect promises” has good consequences; so some rule consequentialists might embrace it, and I might expect such consequentialists to be trustworthy. But in fact it’s hard to say anything about what a rule consequentialist would do, because it’s hard to say what counts as a “rule” or what embracing a rule entails. For some notions of “rule,” rule consequentialism degenerates to act consequentialism. For every other careful specification I’ve seen, the resulting theory has extremely bad behavior (i.e. adopting it reliably leads to terrible consequences).

Continuing in this line, we could define an infinite family of flavors of consequentialism. The difference between these theories is their construction of counterfactuals. The act consequentialist reasons: “If I lie in this case, there will be no bad consequences.” The rule consequentialist reasons: “If I lie in this case, it is because I am not following the rule ‘be honest.’ If I don’t/didn’t embrace the rule ‘be honest,’ there will be/would have been bad consequences.”

Pretty much all of these forms of consequentialism are self-defeating in a certain sense. If I ask the act consequentialist: “What do you recommend that I do?” they will not simply pass along the recommendation that act consequentialism itself makes. Instead, they will consider the realities of my psychology and what I am likely to follow through on, the effects of their recommendation on other people who will overhear it (who might anticipate that I’ll act in accordance with their recommendation), and so on. A similar situation obtains with respect to rule consequentialism, or “motivation consequentialism.” Moreover, this applies not only to the act consequentialist’s utterances, but to their own choice of moral theory.

There is an (essentially) unique form of consequentialism that is not self-defeating in this sense. Namely, consider T-consequentialism, the theory that recommends X such that “T-consequentialism recommends X” has optimal consequences. I call this consequentialist-recommendation consequentialism. Less pithily but more accurately, we might call it (“preceded by its quotation and then parenthesized and suffixed with ‘-recommendation consequentialism’” preceded by its quotation and then parenthesized and suffixed with ‘-recommendation consequentialism’)-recommendation consequentialism. For now I’ll just call it T-consequentialism.

So for example, when I am considering whether to betray a friend’s confidence, I reason: “My friend knows that I am a T-consequentialist. If T-consequentialism recommended betraying confidence in this case, my friend would not have trusted me, and the outcome would have been worse. So T-consequentialism recommends being trustworthy,” and following the recommendations of T-consequentialism I would be trustworthy. If I am considering devoting 99% vs. 90% of my spare funds to do-gooding, I might reason: “If T-consequentialism recommended devoting 99% of my spare funds to do-gooding, I would not have embraced T-consequentialism (nor act consequentialism). So T-consequentialism will make less demanding recommendations.” ETA: note that “T-consequentialism recommends” is a fact about a certain decision procedure. It is a logical rather than empirical or moral fact: what recommendation do I obtain if I follow this decision procedure?
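To make the contrast in the confidence example concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the payoff numbers, the function names, and above all the friend model, in which the friend knows what the theory recommends and confides only if it recommends keeping the confidence. It is not a formalization of T-consequentialism, and it sidesteps logical uncertainty entirely by simply enumerating the two possible recommendations.

```python
# Toy model of the confidence example. All numbers and names are
# hypothetical, chosen only so that betrayal pays off locally.
TRUST_VALUE = 5   # value created when the friend confides in me
BETRAY_GAIN = 3   # extra value from betraying a confidence already shared
NO_TRUST = 0      # value when the friend never confides

ACTS = ("keep", "betray")


def outcome(friend_confides: bool, agent_betrays: bool) -> int:
    """Goodness of the world given what the friend and the agent do."""
    if not friend_confides:
        return NO_TRUST
    return TRUST_VALUE + (BETRAY_GAIN if agent_betrays else 0)


def act_consequentialist_choice(friend_confided: bool) -> str:
    """Hold the friend's past decision fixed and vary only the act."""
    return max(ACTS, key=lambda act: outcome(friend_confided, act == "betray"))


def friend_confides_given(recommendation: str) -> bool:
    """Assumed friend model: confide only if the theory says 'keep'."""
    return recommendation == "keep"


def t_consequentialist_recommendation() -> str:
    """Pick the recommendation X for which 'T recommends X' is best."""
    def value(recommendation: str) -> int:
        confides = friend_confides_given(recommendation)
        betrays = confides and recommendation == "betray"
        return outcome(confides, betrays)

    return max(ACTS, key=value)


if __name__ == "__main__":
    print("act consequentialist:", act_consequentialist_choice(True))
    print("T-consequentialist:", t_consequentialist_recommendation())
```

The only difference between the two procedures is the counterfactual they evaluate. The first varies the act while holding the friend's trust fixed. The second varies the recommendation itself and lets the friend's trust vary with it.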

This is uniquely not self-defeating: if T-consequentialism recommended anything other than X then the consequences would be worse (by definition). If we are willing to entertain logical uncertainty, T-consequentialism is well-defined, because while we are trying to figure out whether T-consequentialism recommends X, we can (by definition) conceive of the world in which T recommends X, for any candidate X. Once we can no longer conceive of the world in which T recommends X, we know that T does not recommend X. This well-definedness is also a unique virtue of consequentialist-recommendation consequentialism. Act and rule consequentialism, by comparison, run into the standard problem of computing counterfactuals: given that I am actually going to do X, I might be unable to imagine doing Y ≠ X, and so I cannot evaluate the consequences of doing Y except by appealing to some ill-defined notion of “closest possible world.”

Mathematically formalizing T-consequentialism is a bit tricky, and is essentially still an open problem (though many consequences of such T-consequentialist theories are understood, provided that any such theories exist, and as far as I know no moral theories other than act consequentialism have a rigorous mathematical interpretation). T-consequentialism and the above discussion of it are very heavily informed by “updateless decision theory”, a semi-rigorous decision theory developed in the community around LessWrong.