Against moral advocacy
by paulfchristiano
Sometimes people talk about changing long-term social values as an altruistic intervention; for example, trying to make people care more about animals, or God, or other people, or ancestors, etc., in the hopes that these changes might propagate forward (say because altruistic people work to create a more altruistic world) and eventually have a direct effect on how society uses available resources. I think this is unlikely to be a reasonable goal, not necessarily because it is not possible (though it does seem far-fetched), but because even if it were possible it would not be particularly desirable. I wanted to spend a post outlining my reasoning.
Disclaimer: this is a bit of an odd post. The impatient reader is recommended to skip it.
Introduction
Clarifications
First, some clarifications:
- I expect that improving tech and the continuation of natural selection will push society towards equilibria where important values are enshrined and carefully maintained. I’m focusing on such scenarios, and imagining the very long-run values of society, which will determine the behavior of society in the very long run. Contemporary social values won’t directly determine long-run values, and indeed a huge change in contemporary values may correspond to a relatively modest change in long-run values, but we might nevertheless think those small effects on the very long run are quite important.
- Changing people’s values might be instrumentally useful in the short-term; I am considering changing people’s values in the long-term. For example, suppose I am trying to convince people to be more altruistic. I might be doing this for two quite different reasons: (1) in the short-term I can see various reasons that more altruistic people might do more useful things, or (2) I expect social values to be “sticky,” perhaps because more altruistic people will work to create a more altruistic world, and so I hope that increasing the altruism of existing people will make the entire future more altruistic. I think (1) is sensible, though in practice it doesn’t look very highly leveraged. In this post I want to talk about (2).
- I’m interested in the values that people and groups ultimately implement, rather than the values they currently seem to exhibit. For these purposes I don’t so much care how others feel about object-level disputes (e.g. the importance of animal suffering, theism, moral realism, scriptural sanctions against sodomy, etc.), except insofar as it is evidence about their dispositions. Instead, I care about the kind of people and organizations and cultural institutions that others would choose to create (if they were free to choose), the kind of people their descendants would choose to succeed them, and so on, and finally the values ultimately endorsed by the result of such a process of evolution. Encouraging others to engage in this process of reflection should change their behavior and their apparent values in a way that is positive both according to me and according to them.
- I have many preferences which I would call non-altruistic. For example, I would prefer that I have a cookie than that you have a cookie. I care more about myself and those around me than about others. However, what I am interested in with this blog, and what I think is the lowest-hanging fruit for large-scale social coordination, is the subset of my values that I would call altruistic. In this post I’m going to sometimes use “values/preferences” and “altruistic values/preferences” interchangeably. On the other hand, I do mean to include making other people altruistic as a candidate intervention (one that I don’t think is reasonable).
- The position I’m arguing for here is an empirical one, which depends on contingent facts about the world and about moral psychology. There are many possible worlds in which it wouldn’t hold, and empirical facts I could learn that would convince me that it is wrong.
The arguments
Here are the arguments which make me skeptical about influencing long-run social values:
- Most important for me are the decision-theoretic / pragmatic considerations in favor of “being nice.” The world is full of people with different values working towards their own ends; each of them can choose to use their resources to increase the total size of the pie or to increase their share of the pie. All of them would significantly prefer a world in which resources were used to increase the size of the pie, and this leads to a number of compelling justifications for each individual to cooperate.
- My preferences are loosely defined by reference to what I would judge good upon reflection. I think that most humans endorse a similar principle of reflection when pressed, and that only a relatively small number would choose to “lock in” their current values at the expense of further reflection; this consideration tends to make me supportive of futures determined by others’ values.
- I care much more strongly about those aspects of my preferences which are non-idiosyncratic. I am willing to mostly discard judgments that depend on my current blood sugar, and to a lesser but still significant extent to discount judgments that depend on quirks of my life history or biology. If I discovered that other people had reflected carefully and concluded that an outcome was particularly good, I would suspect that this outcome was indeed good according to my own considered preferences.
- In practice humans seem to endorse a relatively narrow spectrum of altruistic goals. Most people agree broadly about which things are good and which are bad (with a few notable exceptions) and where there is disagreement about what is good, it seems to be possible to pursue many competing notions of goodness with only modest losses in efficiency.
I think that these considerations together suggest that changing social values is at least an order of magnitude less important than we might otherwise expect. I think that convincing an additional 10% of the world to share my values would increase the value of the future by less than 1% via its effect on far future values (it may have much larger effects for better or worse by changing people’s behavior in the near term, as described above). Moreover, I would strongly consider pursuing an intervention that reduced the risk of extinction by 0.1% instead, for decision-theoretic reasons.
Why does it matter?
There are a number of interventions which try to directly change people’s values, perhaps by promoting particular political or philosophical views, by convincing people not to eat meat or animal products, by improving mindfulness or promoting various religious agendas, and so on. Other projects might not try to change people’s values, but might try to change the social balance between different values by increasing the influence of people with particular values. I think the vast majority of these proposals are unreasonable prima facie (as with most charitable activities), but some of them might warrant further investigation if we thought that changing social values was likely to be important.
More importantly to me, many of these issues come up when evaluating the significance of various disasters which we might try to mitigate. For example, if contemporary human civilization were destroyed but humanity survived, it would potentially have a huge impact on the structure of society. If humans managed to kill each other completely, but failed to scour the Earth entirely, it would result in an even more radically different future (supposing that the Earth was still healthy enough to eventually cough up some successors). How bad are those changes? Should we resist the destruction and wholesale replacement of our civilization more than we would resist a normal catastrophe that left civilization intact?
This post describes four broad arguments against influencing social values. For most applications of interest a proper subset of these arguments will apply, but it still seems worth laying them all out in one place.
The arguments
1. In favor of “being nice”
Broadly, there are two sorts of arguments in favor of being nice, even towards people who have completely alien values, both of which I find very compelling.
The first reason is that cooperation can lead to significant gains from trade, even when direct opportunities for cooperation are not available. I am glad to work on a cause which most people believe to be good, rather than trying to distort the values of the future at their expense. This helps when seeking approval or accommodation for projects, when trying to convince others to help out, and in a variety of less salient cases. I think this is a large effect, because much of the impact that altruistic folks can hope to have comes from the rest of the world being basically on board with their project and supportive.
The second reason is that by increasing the size of the pie we create a world which is better for people on average, and from behind the veil of ignorance we should expect some of those gains to accrue to us—even if we can tell ex post that they won’t. This is a much more technical point about decision theory, which I don’t want to dwell on too much (see e.g. Gary Drescher’s Good & Real for a good exposition of it). The basic intuition is already found in the prisoner’s dilemma: if we have an opportunity to impose a large cost on a confederate (who has a similar opportunity) for our own gain, should we do it? What if the confederate is a perfect copy of ourselves, created X seconds ago and leading an independent life since then? How large does X have to be before we defect? What if the confederate does not have a similar opportunity, or if we can see the confederate’s choice before we make our own? Consideration of such scenarios tends to put pressure on simplistic accounts of decision theory, and working through the associated mathematics and seeing coherent alternatives has led me to take them very seriously. I would often cooperate on the prisoner’s dilemma without a realistic hope of reciprocation, and I think the same reasoning can be applied (perhaps even more strongly) at the level of groups of people.
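To make the intuition concrete, here is a minimal sketch of the standard prisoner’s dilemma payoffs (the specific numbers are illustrative; only their ordering matters), showing why defection dominates against an independent opponent, while cooperation is clearly better against a perfect copy whose choice mirrors our own:

```python
# Standard prisoner's dilemma payoffs: (my payoff, their payoff).
# Illustrative numbers; only the ordering T > R > P > S matters.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),   # reward R
    ("cooperate", "defect"):    (0, 5),   # sucker S vs. temptation T
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),   # punishment P
}

def my_payoff(my_move, their_move):
    return PAYOFFS[(my_move, their_move)][0]

# Against an independent opponent, defecting dominates move-by-move.
for their_move in ("cooperate", "defect"):
    assert my_payoff("defect", their_move) > my_payoff("cooperate", their_move)

# Against a perfect copy who necessarily makes the same choice I do,
# only the diagonal outcomes are reachable, and cooperation wins.
assert my_payoff("cooperate", "cooperate") > my_payoff("defect", "defect")
```

The interesting cases in the post are the ones between these two extremes, where the “opponent” is merely similar to us or can observe our choice.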
2. Optimism about reflection
My preferences are defined (loosely) in terms of what I would consider good upon reflection. I think that most people endorse a similar principle in practice, and that only a small fraction of people would relinquish the opportunity to reflect under normal conditions. I think that if civilization prospers and creates any value at all, then it will be because people were able to keep this option open. This is again a messy empirical question, but the basic justification is that our values are sufficiently complex that any attempt to pin them down in the year 2100 is simply unlikely to capture most of what we care about. If the future has to be run by a totalitarian government based on principles written down precisely in the year 2100 I don’t expect the future to be very good, and so I don’t much care about influencing what principles people would choose to write down in that situation. (Of course a totalitarian government doesn’t seem like a likely endpoint for our civilization—more like a rocky intermediate step at worst—but there are more realistic analogous situations.)
So in the very long run I expect people (and society collectively) to have engaged in considerable reflection about what we prefer and why. I think that my own altruistic preferences differ from others largely in that I have taken more time to clarify them and have been considerably more explicit about this process; I also expect my moral views to continue to change in the future. So in the very long-run I expect social values to accord relatively well with my own values.
Moreover I think that contemporary moral differences are not particularly important either as a determinant of preferences-on-reflection or as an indicator of those preferences. Most of these judgments are based on relatively little reflection and, at least anecdotally, they seem to be pretty contingent. Even if I would prefer that I own the future rather than others, I don’t care much about the difference between two other people, only one of whom shares my superficial values. I don’t think that two people who vote Democrat are much more likely to agree with each other upon extensive reflection than two arbitrary people—I think the differences between Democrats and Republicans are quite contingent—and I think the situation is basically the same with respect to philosophical arguments at the level of depth that they have currently been explored. So if a Democrat said that they wanted to convince people to become Democrats so that the future would be full of people who shared their values, I would be highly skeptical. And again, I think a similar situation obtains for most modern issues.
3. I care directly about others’ values
To the extent that my values would differ from others’ values upon reflection, I find myself strongly inclined to give some weight to others’ preferences. There are basically two angles that lead to this conclusion:
- I feel inclined to reject altruistic preferences that are contingent. To the extent that I would prefer something different if I had been raised slightly differently, I feel very strongly inclined to honor my counterfactual preferences. I think these are the same intuitions that motivate moral realists (though I think I am best described as a dedicated anti-realist).
- Relatedly, I feel inclined to honor existing people’s preferences. This is closely related to section (1) above, and I feel like it would be double-counting to consider both decision-theoretic arguments in favor of niceness and terminal values for niceness (that is, I expect that I have intuitions directly about what I ought to do, rather than about what states of the world are valuable). But these intuitions still seem to bolster the view that I ought to be nice by providing a number of independent lines of justification each of which could stand on its own. (This also gives me some reason to care about the welfare of existing people, rather than focusing exclusively on much more numerous future people; I’ll discuss this issue in future posts.)
I find these intuitions relatively compelling, and they further whittle away at the gains from influencing social values. These intuitions are also closely related to the intuitions that move me to care about aggregate welfare, rather than being concerned only with my own welfare. Again, it is easy to double-count here, but I think that these intuitions are an important force shaping my values and make it much less likely that I should try to convince other people to share my own values.
4. Convergence of values
Although there are some significant philosophical disagreements amongst existing people, they do not seem to endorse too “wide” a spread of different values. Perhaps some people would prefer worlds with larger populations while others would prefer worlds with smaller, happier populations. Perhaps some think that almost all value is in complex human brains while others think that insects have morally relevant experiences. But even accounting for many of these differences, it seems to me that the space of all human values is not very broad.
To formalize the divergence between two sets of values V1 and V2, we could ask the following hypothetical question: to what extent is it possible to create a world that was good according to both V1 and V2? Formally, let Wi be a world optimized according to Vi, and for any world W let pi be such that Vi is indifferent between W and { a probability pi of Wi and a probability (1 – pi) of 0 }. Then given a set of values V1, V2, …, Vn we can ask what the maximum possible value of p1 + p2 + … + pn is over all feasible worlds W. My vague guess is that if we performed this exercise with the considered values of all living humans, we would get something like 0.1 to 0.5 (per person). This formalization is a bit sketchy, and the conclusion is little more than a restatement of my intuitions, but I hope that using this kind of approach for thinking about value divergence can at least help improve communication.
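As a toy illustration of this exercise: if each Vi is an expected-utility maximizer normalized so that the “0” outcome is worth 0, then pi is just Vi(W) divided by Vi(Wi), and the “per person” number above is the average of the pi. The sketch below uses made-up worlds and utility numbers purely for illustration:

```python
# Toy sketch of the divergence measure above. Assumes each value system
# can be summarized as a utility function over a finite list of candidate
# worlds, normalized so the "0" outcome is worth 0 to everyone.
# All world names and numbers are hypothetical.
candidate_worlds = ["hedonium", "diverse_flourishing", "nature_preserve"]

utilities = {
    "V1": {"hedonium": 10.0, "diverse_flourishing": 6.0, "nature_preserve": 1.0},
    "V2": {"hedonium": 0.0, "diverse_flourishing": 7.0, "nature_preserve": 9.0},
}

def p(values, world):
    """p_i: the probability of V_i's favorite world (vs. nothing) that V_i
    finds exactly as good as `world`, i.e. utility normalized by the max."""
    best = max(utilities[values].values())
    return utilities[values][world] / best

def convergence(worlds, value_systems):
    """Best achievable average of the p_i's over all candidate worlds."""
    return max(
        sum(p(v, w) for v in value_systems) / len(value_systems)
        for w in worlds
    )

print(convergence(candidate_worlds, utilities))
# "diverse_flourishing" is the best compromise here: (6/10 + 7/9) / 2 ~= 0.69
```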
Other cases
Selfishness
One way in which different people’s values differ is in the extent to which they care about themselves vs. other people. I don’t know what happens to this distinction in the long run (I don’t know what happens to the notion of identity, or to altruistic inclinations, upon extensive reflection), but if some people would choose to monopolize a significant fraction of social resources for their own benefit, it could significantly decrease the total value of the universe.
I tend to think this is not a significant concern for a number of reasons. First is that argument 1 applies very well in this setting and recommends cooperation. Second is that ordinary self-interest seems to be relatively satiable, so that most people would be disinclined to monopolize a billion stars for their own advantage. Third is that because most people’s self-interest seems to be relatively satiable and short-sighted, I expect the self-interested parts of people’s motivations to mostly trade vast resources in the distant future for modest resources soon, and to mostly trade the possibility of astronomical resources for a certainty of more modest resources. Fourth is that arguments 3 & 4 from above apply, though to a lesser extent than for altruistic values—most people’s idea of a good time is not optimal by my lights, but it is still pretty good. There seems to be a good chance that sufficiently advanced self-interest requires the existence of large prosperous worlds that I would consider valuable. (Though in the modern world massive expenditures for selfish goals don’t seem to produce much value.)
Each of these considerations is relatively weak, but I think on average each reduces the significance of selfishness by a factor of about 2 in expectation, so the total effect is to massively reduce the badness of selfishness in expectation. On balance I feel unenthusiastic about trying to make the distant future more altruistic (though if there were really great opportunities to do so, or significant considerations I haven’t yet considered, this could easily be revised).
Stranger things than these
This discussion has focused on the difference between different people’s values in modern society. Such differences might be relatively small compared to the differences between completely different populations.
- Different human societies might have wildly different values, but I think the arguments above still apply almost entirely wholesale (the only exception being that there are no opportunities for gains from trade). So if modern civilization is destroyed and eventually successfully rebuilt, I think we should treat that as recovering most of Earth’s altruistic potential (though I would certainly hate for it to happen).
- Different intelligent species might have even more fundamentally different values. Now arguments 1 and 3 apply in force, but arguments 2 and 4 are substantially weakened. These arguments can be salvaged to some extent by looking at other moral intuitions, particularly metamoral intuitions regarding parsimony, and seeing what aspects of human cognition human morality depends most sensitively on. I would guess that if humanity is destroyed and replaced by e.g. great apes, this would modestly reduce Earth’s potential according to my values. But this scenario makes the decision-theoretic argument in 1 particularly potent, and I think it would be an error to treat such a collapse as a huge cost (again, setting aside the massive loss of human life, the death of me and everyone I love, and so on). The situation is pretty similar though more extreme if we consider intelligent life from elsewhere in the universe. This is a bit of a weird topic, but I do hope to spend a little more time on it in a future post, explaining in more detail why I think we should be nice to aliens we encounter even if their lives are completely morally worthless by our accounting.
- Automation that humans fail to control might pursue values which no humans share. If most available resources were controlled by such automations, I would consider that a very significant moral loss, destroying most of society’s potential. Arguments 2-4 do not seem to very reliably apply in this case, the gains-from-trade argument does not apply, and it seems superficially erroneous to apply the decision-theoretic argument in 1 to this case. However, the assertion that the decision-theoretic arguments don’t apply is a bit sensitive to the nature of the automation (and particularly where their values do come from) and to my uncertainty about the strength and conclusions of the decision-theoretic arguments. (Clarifying these points is closely related to my moral intuitions about this kind of “corner case.”) So I could imagine changing my position on this point, and I would certainly guess that an outcome where uncontrolled automations push society in arbitrary directions is much better than an outcome in which Earth is scoured.
What sense of ‘written down precisely’ do you have in mind in argument 2?
Anything that precludes further reflection is the important category. I was imagining scenarios where people build effective consequentialist systems that follow some explicitly represented goal, like an organization’s mission statement or an algorithmically represented utility function.
Interesting post, Paul!
1. Being nice
a. Yes, making compromises can help with getting more supporters, and this is always a tradeoff that groups make, between how strongly to hold to principles vs. how much to appease others. Given diminishing marginal value of additional donations, supporters, etc., it’s not obvious you need to compromise very much for what you work on individually. The overall movement is more likely to succeed in the long term if it’s mainstream, but I would lean against compromising away so much because according to my values, that would represent a big loss. Anyway, it’s not clear if the “door in the face” effect is weaker or stronger than the “foot in the door” effect — sometimes promoting radical values helps move the debate more in your direction than if you compromised.
b. I would cooperate on Prisoner’s dilemma with an AI, but most people in the world right now are not such agents. People like you are more in that direction, but most of the world, and even many powerful people, are not thinking in Prisoner’s-dilemma terms right now. Anyway, why do we need to solve the Prisoner’s dilemma timelessly? Why can’t we do it the way politics does it: Form coalitions, have debates, and then make _explicit_ compromises upon negotiation? We can’t do this for runaway AIs in the basement — hence the game-theoretic part of Eliezer’s motivation for CEV. But we can do it for many power struggles, including those over allocation of (altruist and other) resources today.
2. Reflection
I think you underestimate the potentially extreme path-dependence of “values upon reflection.” What determines the outcome of reflection? It’s mostly the initial conditions that are being reflected upon. Sure, there are some pretty stable features of our minds due to biological dispositions, cooperative tendencies, etc. But I think there are major butterfly effects in future values too.
I think one reason for the illusion of convergence is cultural homogenization. If you mix red and blue dyes and swirl them, it looks like “it all converges to the same color.” But that’s not to say that if you added more red, the result wouldn’t be more red. Values spreading is like adding more red.
I also disagree with this: “I think that if civilization prospers and creates any value at all, then it will be because people were able to keep this option open. This is again a messy empirical question, but the basic justification is that our values are sufficiently complex that any attempt to pin them down in the year 2100 is simply unlikely to capture most of what we care about.” My own values are short-sighted enough that I could capture much of their value by picking something now rather than waiting to see what I thought later. This is because I only value my future opinions on some topics, and on other topics, I prefer my current opinions over whatever my future self might gravitate toward.
This is another big source of disagreement between us, because I think it’s not at all obvious we’ll have a civilization in which people debate and reflect upon their values in a democratic fashion. It’s also very possible people will (try to) lock in something in the year 2100, and that thing will be highly dependent on values in the world today. In fact, in my opinion, long-term values not locked in may be basically worthless because there’s so much entropy between me and them. In my opinion, the “lock in values at 2100” scenario is where _most of the expected value_ comes from. 🙂
3. Caring directly
I don’t share the intuition very much, especially when I imagine applying it to very alien values. What would you do with the baby eaters from “Three Worlds Collide”? The pebble sorters? Paperclip maximizers? People who think homosexuality, inter-racial marriage, birth control, flag burning are wrong? People who think others should burn forever for disrespecting God? All sorts of other hypothetical and actual minds that grossly disagree with you, including suffering maximizers? Perhaps you would claim that you have hope that suffering minimizers are more common in the multiverse than suffering maximizers. But what if it were otherwise? Would you then really incline toward the idea that torture is good?
4. Convergence
I think values are much more divergent than you suggest. 🙂
There are lots of ways to _force_ them to converge, but it’s not clear this is meaningful beyond a “history is written by the winners” kind of approach. Values do sometimes converge in practice, but this is different from suggesting that there’s any particular improvement going on. For example, our world today is (arguably) more like ancient Athens than ancient Sparta, and as a result, we have a certain set of values. If society were more like Sparta today, we would have a different set of values. We might somewhat converge on them, but the sense in which Spartan values are better than Athenian values would just come down to historical trajectories. Similarly, it may be that economic, political, and technological forces of the future push values in certain directions that look not too different, but why should we favor them over other possibilities?
Thus, I don’t see future values as necessarily privileged above present-day values. They’re different but not “better” in a sense that I care about now. And it’s clear there’s enormous disagreement about values today.
Eventually, it’s plausible that a singleton will win the light cone, and this could enforce homogeneity of values. This doesn’t say much about whether we agree with what it enforces.
I think small differences in neural wiring can lead to big differences in dispositions. Some people want to create as many minds as possible; others want to create as few as possible. Some want to maximize worship of a divine spirit; others want to extinguish this impulse. Some want criminals to be punished; others want them to be cared for. There are billions of egoists all of whose (terminal) values sharply diverge, though you may be right that most of them can be satisfied without linear maximization of copies of themselves. There are many more differences like these.
I think there are many people (myself included) who would regard the average-case world as many orders of magnitude worse than their preferred world. People like you who hold an ensemble of values and want to account for those of other people aren’t in this category, but there are many others unlike you. A hard-core hedonistic utilitarian should regard a hedonium-filled universe as at least, say, ~100 times better than the average-case universe (unless there are at least 1% hedonistic utilitarians in the population).
Keep in mind, as I pointed out above, I’m not suggesting there needs to be competition in the end. It would be great if the hedonistic utilitarians made a bargain with the other parties to get some hedonium in exchange for the others getting some of what they want. But this doesn’t mean they can’t also push to expand their own sides as well.
Politics is a nice general example. A lot of what political parties and interest groups do amounts to values spreading: Promoting themselves to expand their support base. Then when it comes to battles over policy, the groups compromise. It seems plausible a similar model would continue into the future, and if so, values spreading would be an obvious approach in that case as well. Are you proposing that this political approach is itself broken and in need of improvement through better consideration of game theory and more cosmopolitan reflection on each other’s values?
Perhaps you would suggest that politics is too “zero sum” and should be replaced by trades for compromise — e.g., pro-life and pro-choice groups agreeing to each direct resources to another cause they both think is good? This sounds promising, but it’s not well established. In practice, most political teams defect on Prisoner’s Dilemma. If you want to change this system, that’s admirable, but readers should realize that this is decidedly not the mainstream approach.
As far as: “All of them would significantly prefer a world in which resources were used to increase the size of the pie, and this leads to a number of compelling justifications for each individual to cooperate.”
Negative and negative-leaning utilitarians prefer a smaller pie. 🙂 Their preferred outcomes are not just not aligned but actually _negatively_ correlated with those of most other value systems.
The three categories you discuss in the end — different human societies, different intelligent species, and automation — don’t apply to altruistic values spreading, so although I don’t agree with parts of what you say there, they aren’t relevant to the main discussion.
“I would certainly guess that an outcome where uncontrolled automations push society in arbitrary directions is much better than an outcome in which Earth is scoured.”
Oh dear. 😦
Anyway, thanks again for producing such rich intellectual fodder through your writings!
– For most causes, I imagine that “how many supporters you have” is not the only or even main way that you need the world to cooperate. You also want to influence policy-makers, academics, business-folk, funders, etc., and this is easier where they are basically on board with your project.
– You don’t have to have an instance of the prisoner’s dilemma to “solve.” The case is intended to illustrate intuitions about cooperation.
– I think there are possible routes that close off further reflection, and perhaps you have gone down one of them, but these seem to be relatively rare. Most people would be willing to say “yeah, I would prefer it if I thought about these considerations more.”
– It seems like you have relatively rare values that can be explicitly written down at the moment, since they are satisfied by a barren universe. For most values a barren universe is quite bad, and then the difficulty of precisely codifying values is much more extreme.
– Yes, negative utilitarians have some of the few preferences that are anti-correlated with others’. I agree that argument 4 won’t apply to you.
– Suffering minimizers are almost certainly more common than suffering maximizers. Suffering’s evolutionary function is to be a thing that you minimize. I agree that there are various sorts of pathological minds it is harder to sympathize with. Certainly in the case of ‘Three Worlds Collide’ and similar thought experiments I am inclined to have sympathy for the others’ values.
– I am claiming that political advocacy is worse than it would otherwise appear because of its zero-sumness. But you don’t even need decision-theory for that one, you can just say “mostly people agree on values and disagree about empirical questions, and half of them are wrong.” I don’t think politics is “broken,” but I do think these kinds of considerations modestly undermine the value of advocacy.
– I think we all agree that long-run people will bargain more effectively, the question is whether you ought to refrain from burning resources while increasing your own influence (in particular because from a prior position this increases the probability of others doing the same, but all of the points from this post bear on the question).
Thanks for the replies!
– Sure. By “supporters” I meant to include influential people. The fact remains you can find some of them with extreme values, and you might be more persuasive as an individual if you have extreme values than moderate ones. This is an empirical and context-specific question.
– I find it hard to believe that if I choose to cooperate, many more people would do so. Maybe there are a few rational altruists like us out there whom my decision would timelessly influence. I don’t think it would have significant implications for society at large.
– I prefer to think more on some questions (“Do I want to weight by brain size?”) but not others (“Do I also care about the intrinsic value of truth?”). In practice, I do muse about both kinds for fun, but I hope my future self doesn’t come to be swayed by musings on the latter type.
– Yes. 🙂
– Yeah.
– 😦
– The relevant political disagreements to this discussion are about values, like abortion, stem-cell research, same-sex marriage, flag burning, racial discrimination, etc. Factual disagreements play little role in these cases.
Also, I don’t see that “half of people being wrong” undermines advocacy. A sizeable minority of people are wrong in thinking that global warming is not real. This doesn’t undermine global-warming advocacy, because we know those people are wrong. (Sure, there are still many harder questions after that point, but at least some of the factual issues are obvious.)
– Won’t you be “burning resources” regardless of what you do? Maybe you meant “burning the cosmic commons” by not working to reduce extinction risk?
Until cooperation becomes easier and more robust, maybe you should still promote your own pet cause.
In your formalisation of the compromise, it seems to me that you need a couple of changes (which you are probably aware of). The sum should presumably be an average, and you should presumably weight each component by credence in it, or by the proportion of people who support it or something like that. Also, the 0 should presumably be ‘the value of a universe consisting of nothing but empty space’ or the like, as moral theories don’t always come with inbuilt ratio scales. Even this might not be enough as some theories don’t have a defined value at all for that universe (e.g. average utilitarianism).
Indeed, since you are basically doing a form of normalisation based on the range of value (or the positive part thereof), you would presumably do better by doing normalisation based on variance instead (basically using the z-scores according to each theory). Owen, Will, and I have discussed this extensively and think that variance typically works better.
Good points.
I think in this case there are good reasons to use 0 = “everyone dies today,” since that quantity is decision-relevant. It seems like using variance is tricky because of the choice of distribution over which to take the variance (though I remember talking to Owen about this and you guys having some slick solutions, it still seemed pretty fraught). I guess what we really care about is the relative value of a 1% boost in influence vs. a 1% boost in survival probability, and we’d want to think about the cleanest abstractions that let us reason about that productively.
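For readers who want the two normalisation schemes in this exchange side by side, here is a rough sketch with made-up utility numbers, using a uniform distribution over a handful of candidate worlds as the reference distribution (which is exactly the contested choice mentioned above):

```python
import statistics

# Hypothetical utilities of a few candidate worlds under two moral theories
# (world names and numbers are made up for illustration).
worlds = ["w1", "w2", "w3", "w4"]
theory_a = {"w1": 0.0, "w2": 2.0, "w3": 5.0, "w4": 10.0}
theory_b = {"w1": 9.0, "w2": 8.0, "w3": 4.0, "w4": 0.0}

def range_normalize(u):
    """Rescale so the worst world scores 0 and the best scores 1 (range-based)."""
    lo, hi = min(u.values()), max(u.values())
    return {w: (v - lo) / (hi - lo) for w, v in u.items()}

def variance_normalize(u):
    """Rescale to z-scores: subtract the mean and divide by the standard
    deviation, here taken under a uniform distribution over the worlds."""
    mean = statistics.mean(u.values())
    sd = statistics.pstdev(u.values())
    return {w: (v - mean) / sd for w, v in u.items()}

for name, u in [("theory_a", theory_a), ("theory_b", theory_b)]:
    print(name, "range:", range_normalize(u))
    print(name, "z-scores:", variance_normalize(u))
```

Aggregation would then proceed by maximizing a (possibly credence-weighted) sum of the normalized scores across theories; the two schemes can recommend different compromises, particularly when some theory’s utilities are heavily skewed.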
An interesting consideration when thinking about other counterfactual species’ values is that our level of altruism is correlated with facts of nature such as "which gender moves out for reproduction", "how frequently big game hunting is necessary in a society", "how much certainty males have about who is whose parent", and "how frequently childcare must be done by non-parents". Much of how we behave today in iterated dilemmas has been shaped by natural selection in proportion to these natural facts, and were the natural facts different, such as for a species descended from chimps, bonobos, or orangutans, this could have deep long-term consequences for societal values.
I hope you are right, but in a world where claims such as "conscious beings no less able to suffer than a baby being eaten alive is problematic regardless of whether it's natural" and "a moral all-powerful being would not create an infinite amount of torture-qualia for every human that breaks with its standards" are controversial, and where I don't know if norms such as "we should try hard to be correct about things" and "we should give a lot of weight to what we would have thought was best if we were impartial in regards to time/species/etc." are held by a majority, I am not consoled that you necessarily are.
You write in the comment section that "Certainly in the case of 'Three Worlds Collide' and similar thought experiments I am inclined to have sympathy for the others' values". Would you act to stop the baby-eaters from continuing to cause these vast quantities and intensities of suffering?