Against moral advocacy

by paulfchristiano

Sometimes people talk about changing long-term social values as an altruistic intervention; for example, trying to make people care more about animals, or God, or other people, or ancestors, etc., in the hopes that these changes might propagate forward (say because altruistic people work to create a more altruistic world) and eventually have a direct effect on how society uses available resources. I think this is unlikely to be a reasonable goal, not necessarily because it is not possible (though it does seem far-fetched), but because even if it were possible it would not be particularly desirable. I wanted to spend a post outlining my reasoning.

Disclaimer: this is a bit of an odd post. The impatient reader is recommended to skip it. 



First, some clarifications:

  • I expect that improving technology and the continuation of natural selection will push society towards equilibria where important values are enshrined and carefully maintained. I’m focusing on such scenarios, and imagining the very long-run values of society, which will determine the behavior of society in the very long run. Contemporary social values won’t directly determine long-run values, and indeed a huge change in contemporary values may correspond to a relatively modest change in long-run values, but we might nevertheless think those small effects on the very long run are quite important.
  • Changing people’s values might be instrumentally useful in the short-term; I am considering changing people’s values in the long-term. For example, suppose I am trying to convince people to be more altruistic. I might be doing this for two quite different reasons: (1) in the short-term I can see various reasons that more altruistic people might do more useful things, or (2) I expect social values to be “sticky,” perhaps because more altruistic people will work to create a more altruistic world, and so I hope that increasing the altruism of existing people will make the entire future more altruistic. I think (1) is sensible though in practice it doesn’t look very highly-leveraged. In this post I want to talk about (2).
  • I’m interested in the values that people and groups ultimately implement, rather than the values they currently seem to exhibit. For these purposes I don’t so much care how others feel about object-level disputes (e.g. importance of animal suffering, theism, moral realism, importance of scriptural sanctions against sodomy, etc.), except insofar as it is evidence about their dispositions. Instead, I care about the kinds of people, organizations, and cultural institutions that others would choose to create (if they were free to choose), the kinds of people their descendants would choose to succeed them, and so on, and finally about the values ultimately endorsed by the result of such a process of evolution. Trying to encourage others to engage in this process of reflection is expected to change their behavior and their apparent values, in a way that is positive both according to me and them.
  • I have many preferences which I would call non-altruistic. For example, I would rather that I have a cookie than that you have one. I care more about myself and those around me than about others. However, what I am interested in with this blog, and what I think is the lowest-hanging fruit for large-scale social coordination, is the subset of my values that I would call altruistic. In this post I’m going to sometimes use “values/preferences” and “altruistic values/preferences” interchangeably. On the other hand, I do mean to include making other people altruistic as a candidate intervention (one that I don’t think is reasonable).
  • The position I’m arguing for here is an empirical one, which depends on contingent facts about the world and about moral psychology. There are many possible worlds in which it wouldn’t hold, and empirical facts I could learn that would convince me that it is wrong.

The arguments

Here are the arguments which make me skeptical about influencing long-run social values:

  1. Most important for me are the decision-theoretic / pragmatic considerations in favor of “being nice.” The world is full of people with different values working towards their own ends; each of them can choose to use their resources to increase the total size of the pie or to increase their share of the pie. All of them would significantly prefer a world in which resources were used to increase the size of the pie, and this leads to a number of compelling justifications for each individual to cooperate.
  2. My preferences are loosely defined by reference to what I would judge good upon reflection. I think that most humans endorse a similar principle of reflection when pressed, and that only a relatively small number would choose to “lock in” their current values at the expense of further reflection; this consideration tends to make me supportive of futures determined by others’ values.
  3. I care much more strongly about those aspects of my preferences which are non-idiosyncratic. I am willing to mostly discard judgments that depend on my current blood sugar, and to a lesser but still significant extent to discount judgments that depend on quirks of my life history or biology. If I discovered that other people had reflected on it and thought that an outcome was particularly good, I would suspect that this outcome was indeed good according to my own considered preferences.
  4. In practice humans seem to endorse a relatively narrow spectrum of altruistic goals. Most people agree broadly about which things are good and which are bad (with a few notable exceptions) and where there is disagreement about what is good, it seems to be possible to pursue many competing notions of goodness with only modest losses in efficiency.

I think that these considerations together suggest that changing social values is at least an order of magnitude less important than we might otherwise expect. I think that convincing an additional 10% of the world to share my values would increase the value of the future by less than 1% via its effect on far future values (it may have much larger effects for better or worse by changing people’s behavior in the near term, as described above). Moreover, I would strongly consider pursuing an intervention that reduced the risk of extinction by 0.1% instead, for decision-theoretic reasons.

Why does it matter?

There are a number of interventions which try to directly change people’s values, perhaps by promoting particular political or philosophical views, by convincing people not to eat meat or animal products, by improving mindfulness or promoting various religious agendas, and so on. Other projects might not try to change people’s values, but might try to change the social balance between different values by increasing the influence of people with particular values. I think the vast majority of these proposals are unreasonable prima facie (as with most charitable activities), but some of them might warrant further investigation if we thought that changing social values was likely to be important.

More importantly to me, many of these issues come up when evaluating the significance of various disasters which we might try to mitigate. For example, if contemporary human civilization was destroyed but humanity survived, it would potentially have a huge impact on the structure of society. If humans managed to kill each other completely, but failed to scour the Earth entirely, it would result in an even more radically different future (supposing that the earth was still healthy enough to eventually cough up some successors). How bad are those changes? Should we resist the destruction and wholesale replacement of our civilization more than we would resist a normal catastrophe that left civilization intact?

This post describes 4 broad arguments against influencing social values. For most applications of interest a proper subset of these arguments will apply, but it still seems worth laying them all out in one place.

The arguments

1. In favor of “being nice”

Broadly, there are two sorts of arguments in favor of being nice, even towards people who have completely alien values, both of which I find very compelling.

The first reason is that cooperation can lead to significant gains from trade, even when direct opportunities for cooperation are not available. I am glad to work on a cause which most people believe to be good, rather than trying to distort the values of the future at their expense. This helps when seeking approval or accommodation for projects, when trying to convince others to help out, and in a variety of less salient cases. I think this is a large effect, because much of the impact that altruistic folks can hope to have comes from the rest of the world being basically on board with their project and supportive.

The second reason is that by increasing the size of the pie we create a world which is better for people on average, and from behind the veil of ignorance we should expect some of those gains to accrue to us—even if we tell ex post that they won’t. This is a much more technical point about decision theory, which I don’t want to dwell on too much (see e.g. Gary Drescher’s Good & Real for a good exposition of it). The basic intuition is already found in the prisoner’s dilemma: if we have an opportunity to impose a large cost on a confederate for our own gain (who has a similar opportunity), should we do it? What if the confederate is a perfect copy of ourselves, created X seconds ago and leading an independent life since then? How large does X have to be before we defect? What if the confederate does not have a similar opportunity, or if we can see the confederate’s choice before we make our own? Consideration of such scenarios tends to put pressure on simplistic accounts of decision theory, and working through the associated mathematics and seeing coherent alternatives has led me to take them very seriously. I would often cooperate on the prisoner’s dilemma without a realistic hope of reciprocation, and I think the same reasoning can be applied (perhaps even more strongly) at the level of groups of people.
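The prisoner’s dilemma intuition above can be made concrete with a small sketch. The payoff numbers below are hypothetical (they are not from this post) and follow the conventional ordering in which defecting against a cooperator pays most. The sketch only contrasts two idealized cases: a confederate whose choice is fixed independently of ours, and a perfect copy whose choice always mirrors ours. It says nothing about the intermediate cases (how large X can get), which are the hard part.

```python
# Hypothetical payoffs to "me", indexed by (my_move, their_move).
# "C" = cooperate, "D" = defect. These numbers are illustrative assumptions.
PAYOFF = {
    ("C", "C"): 3,  # both cooperate
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # both defect
}

MOVES = ("C", "D")


def best_reply_to_fixed(their_move):
    """Best move when the confederate's choice is fixed independently of mine."""
    return max(MOVES, key=lambda mine: PAYOFF[(mine, their_move)])


def best_choice_against_copy():
    """Best move when the confederate is a perfect copy who mirrors my choice."""
    return max(MOVES, key=lambda mine: PAYOFF[(mine, mine)])


if __name__ == "__main__":
    for theirs in MOVES:
        print(f"confederate fixed at {theirs}: best reply is {best_reply_to_fixed(theirs)}")
    print(f"confederate is a perfect copy: best choice is {best_choice_against_copy()}")
```

Against a fixed choice, defection comes out ahead whatever the confederate does; against a perfect copy, the only reachable outcomes are mutual cooperation and mutual defection, and cooperation wins. The decision-theoretic question is which of these two pictures better describes a given real situation.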

2. Optimism about reflection

My preferences are defined (loosely) in terms of what I would consider good upon reflection. I think that most people endorse a similar principle in practice, and that only a small fraction of people would relinquish the opportunity to reflect under normal conditions. I think that if civilization prospers and creates any value at all, then it will be because people were able to keep this option open. This is again a messy empirical question, but the basic justification is that our values are sufficiently complex that any attempt to pin them down in the year 2100 is simply unlikely to capture most of what we care about. If the future has to be run by a totalitarian government based on principles written down precisely in the year 2100 I don’t expect the future to be very good, and so I don’t much care about influencing what principles people would choose to write down in that situation. (Of course a totalitarian government doesn’t seem like a likely endpoint for our civilization—more like a rocky intermediate step at worst—but there are more realistic analogous situations.)

So in the very long run I expect people (and society collectively) to have engaged in considerable reflection about what we prefer and why. I think that my own altruistic preferences differ from others’ largely in that I have taken more time to clarify them and have been considerably more explicit about this process; I also expect my moral views to continue to change in the future. So in the very long run I expect social values to accord relatively well with my own values.

Moreover, I think that contemporary moral differences are not particularly important either as a determinant of preferences-on-reflection or as an indicator of those preferences. Most of these judgments are based on relatively little reflection and, at least anecdotally, they seem to be pretty contingent. Even if I would prefer that I, rather than others, own the future, I don’t care much about the difference between two people, only one of whom shares my superficial values. I don’t think that two people who vote Democrat are much more likely to agree with each other upon extensive reflection than two arbitrary people (I think the differences between Democrats and Republicans are quite contingent), and I think the situation is basically the same with respect to philosophical arguments at the level of depth to which they have currently been explored. So if a Democrat said that they wanted to convince people to become Democrats so that the future would be full of people who shared their values, I would be highly skeptical. And again, I think a similar situation obtains for most modern issues.

3. I care directly about others’ values

To the extent that my values would differ from others’ values upon reflection, I find myself strongly inclined to give some weight to others’ preferences. There are basically two angles that lead to this conclusion:

  1. I feel inclined to reject altruistic preferences that are contingent. To the extent that I would prefer something different if I had been raised slightly differently, I feel very strongly inclined to honor my counterfactual preferences. I think these are the same intuitions that motivate moral realists (though I think I am best described as a dedicated anti-realist).
  2. Relatedly, I feel inclined to honor existing people’s preferences. This is closely related to section (1) above, and I feel like it would be double-counting to consider both decision-theoretic arguments in favor of niceness and terminal values for niceness (that is, I expect that I have intuitions directly about what I ought to do, rather than about what states of the world are valuable). But these intuitions still seem to bolster the view that I ought to be nice by providing a number of independent lines of justification each of which could stand on its own. (This also gives me some reason to care about the welfare of existing people, rather than focusing exclusively on much more numerous future people; I’ll discuss this issue in future posts.)

I find these intuitions relatively compelling, and they further whittle away at the gains from influencing social values. These intuitions are also closely related to the intuitions that move me to care about aggregate welfare, rather than being concerned only with my own welfare. Again, it is easy to double-count here, but I think that these intuitions are an important force shaping my values and make it much less likely that I should try to convince other people to share my own values.

4. Convergence of values

Although there are some significant philosophical disagreements amongst existing people, they do not seem to endorse too “wide” a spread of different values. Perhaps some people would prefer worlds with larger populations while others would prefer worlds with smaller, happier populations. Perhaps some think that almost all value is in complex human brains while others think that insects have morally relevant experiences. But even accounting for many of these differences, it seems to me that the space of all human values is not very broad.

To formalize the divergence between two sets of values V1 and V2, we could ask the following hypothetical question: to what extent is it possible to create a world that is good according to both V1 and V2? Formally, let Wi be a world optimized according to Vi, and for any world W let pi be such that Vi is indifferent between W and { a probability pi of Wi and a probability (1 – pi) of a valueless outcome, 0 }. Then given a set of values V1, V2, …, Vn we can ask what the maximum possible value of p1 + p2 + … + pn is over all feasible worlds W. My vague guess is that if we performed this exercise with the considered values of all living humans, we would get something like 0.1 to 0.5 per person (that is, after dividing the sum by n). This formalization is a bit sketchy, and the conclusion is little more than a restatement of my intuitions, but I hope that using this kind of approach for thinking about value divergence can at least help improve communication.
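A toy version of this calculation can be sketched in code. Everything concrete here is an assumption made for illustration: worlds are reduced to a single number (the share of resources devoted to the first value system), each value system is a simple expected-utility function with diminishing returns, and the valueless outcome is assigned utility 0, so that pi is just Vi(W) / Vi(Wi). Real values are obviously nothing like this simple.

```python
import math


def indifference_probability(value_fn, world, optimal_world):
    """p such that value_fn is indifferent between `world` and a lottery giving
    `optimal_world` with probability p and a zero-value outcome otherwise.
    Assumes value_fn is an expected-utility function with the zero outcome at 0."""
    best = value_fn(optimal_world)
    if best <= 0:
        raise ValueError("optimal world must have positive value")
    return value_fn(world) / best


def best_compromise(value_fns, feasible_worlds):
    """Return the feasible world maximizing p1 + ... + pn, and the average p."""
    # W_i: the feasible world each value system likes best.
    optimal = [max(feasible_worlds, key=v) for v in value_fns]

    def total_p(world):
        return sum(
            indifference_probability(v, world, w_opt)
            for v, w_opt in zip(value_fns, optimal)
        )

    best_world = max(feasible_worlds, key=total_p)
    return best_world, total_p(best_world) / len(value_fns)


# Hypothetical example: a world is the share `a` of resources given to V1;
# V2 gets the rest. Both value systems have diminishing returns (square root).
value_fns = [lambda a: math.sqrt(a), lambda a: math.sqrt(1 - a)]
feasible_worlds = [i / 10 for i in range(11)]

world, average_p = best_compromise(value_fns, feasible_worlds)
print(f"best compromise share: {world}, average p per value system: {average_p:.3f}")
```

In this toy case the even split is the best compromise and each value system gets roughly 0.7 of its optimum, which is high because the two made-up value systems have strongly diminishing returns. With linear returns the same code would give an average of 0.5, and value systems that actively oppose each other would do worse still.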

Other cases


One way in which different people’s values differ is in the extent to which they care about themselves vs. other people. I don’t know what happens to this distinction in the long run (I don’t know what happens to the notion of identity, or altruistic inclinations, upon extensive reflection) but if some people would choose to monopolize a significant fraction of social resources for their own benefit, it could potentially significantly decrease the whole value of the universe.

I tend to think this is not a significant concern for a number of reasons. First is that argument 1 applies very well in this setting and recommends cooperation. Second is that ordinary self-interest seems to be relatively satiable, so that most people would be disinclined to monopolize a billion stars for their own advantage. Third is that because most people’s self-interest seems to be relatively satiable and short-sighted, I expect the self-interested parts of people’s motivations to mostly trade vast resources in the distant future for modest resources soon, and to mostly trade the possibility of astronomical resources for a certainty of more modest resources. Fourth is that arguments 3 & 4 from above apply, though to a lesser extent than for altruistic values: most people’s idea of a good time is not optimal by my lights, but it is still pretty good. There seems to be a good chance that sufficiently advanced self-interest requires the existence of large prosperous worlds that I would consider valuable. (Though in the modern world massive expenditures for selfish goals don’t seem to produce much value.)

Each of these considerations is relatively weak, but I think on average each reduces the significance of selfishness by a factor of about 2 in expectation, so the total effect is to massively reduce the badness of selfishness in expectation. On balance I feel unenthusiastic about trying to make the distant future more altruistic (though if there were really great opportunities to do so, or significant considerations I haven’t yet considered, this could easily be revised).
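To spell out the arithmetic behind “massively reduce”: if the four considerations are treated as independent, so that their factors simply multiply (a simplifying assumption, not something argued for above), the combined reduction is 2 to the fourth power. A minimal sketch:

```python
# Assumption: the considerations act independently, so their factors multiply.
FACTOR_PER_CONSIDERATION = 2.0  # "a factor of about 2" from the text
NUM_CONSIDERATIONS = 4          # the four reasons listed above


def combined_reduction(factor, count):
    """Total reduction factor when `count` independent factors are multiplied."""
    return factor ** count


total = combined_reduction(FACTOR_PER_CONSIDERATION, NUM_CONSIDERATIONS)
print(f"combined reduction: {total:g}x, remaining significance: {1 / total:.4f}")
```

Under that assumption the badness of selfishness shrinks by a factor of about 16, leaving roughly 6% of the original figure. If the considerations overlap, the true reduction would be smaller.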

Stranger things than these

This discussion has focused on the difference between different people’s values in modern society. Such differences might be relatively small compared to the differences between completely different populations.

  1. Different human societies might have wildly different values, but I think the arguments above still apply almost entirely wholesale (the only exception being that there are no opportunities for gains from trade). So if modern civilization is destroyed and eventually successfully rebuilt, I think we should treat that as recovering most of Earth’s altruistic potential (though I would certainly hate for it to happen).
  2. Different intelligent species might have even more fundamentally different values. Now arguments 1 and 3 apply in force, but arguments 2 and 4 are substantially weakened. These arguments can be salvaged to some extent by looking at other moral intuitions, particularly metamoral intuitions regarding parsimony, and seeing what aspects of human cognition human morality depends most sensitively on. I would guess that if humanity is destroyed and replaced by e.g. great apes, this would modestly reduce Earth’s potential according to my values. But this scenario makes the decision-theoretic argument in 1 particularly potent, and I think it would be an error to treat such a collapse as a huge cost (again, setting aside the massive loss of human life, the death of me and everyone I love, and so on). The situation is pretty similar though more extreme if we consider intelligent life from elsewhere in the universe. This is a bit of a weird topic, but I do hope to spend a little more time on it in a future post, explaining in more detail why I think we should be nice to aliens we encounter even if their lives are completely morally worthless by our accounting.
  3. Automation that humans fail to control might pursue values which no humans share. If most available resources were controlled by such automations, I would consider that a very significant moral loss, destroying most of society’s potential. Arguments 2-4 do not seem to apply very reliably in this case, the gains-from-trade argument does not apply, and it seems superficially erroneous to apply the decision-theoretic argument in 1 to this case. However, the assertion that the decision-theoretic arguments don’t apply is a bit sensitive to the nature of the automations (and particularly where their values do come from) and to my uncertainty about the strength and conclusions of the decision-theoretic arguments. (Clarifying these points is closely related to my moral intuitions about this kind of “corner case.”) So I could imagine changing my position on this point, and I would certainly guess that an outcome where uncontrolled automations push society in arbitrary directions is much better than an outcome in which Earth is scoured.