Guesswork, feedback, and impact

by paulfchristiano

The way we get most complicated things done might be described as trial and error: we have some model of how a plan will lead to our desired goal, we try to implement the plan and discover that some aspect of our model was wrong, and then we refine the model and try again.

For example, if you write a large program it will have bugs in it (unless you have written very many programs before). If you try to run a business, your initial plans will probably fail (though a simple business plan might stick around as many details of the business are tweaked). If you try to build a machine, it won’t work unless you have quite a bit of relevant experience (e.g. building a similar machine before).

Unless a plan’s success rests on very simple arguments—for example, comparisons with similar plans that have worked before—it is likely to get thwarted by some unanticipated detail. (If there are implicitly N things that could go wrong with a plan, most of which you may not have thought of, then each one needs to go wrong with probability of at most around 1/N for the whole thing to hold together. That’s a lot of confidence to demand, in complicated domains where many things might go wrong.) However, if we can try a plan and implicitly ask Nature “What were we wrong about? How will we fail now?” then the situation is changed. We can determine where our model of the world is wrong, patch that particular error, and repeat. Even if our model was wrong in many places, and even if we can never hope to build a complete model, at least we can eventually get a model which is right in the relevant ways.
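The arithmetic behind this parenthetical is easy to check numerically. A minimal sketch, assuming the N failure modes are independent (a simplification for illustration; the function name is hypothetical):

```python
# If a plan has N independent ways to fail, each occurring with probability p,
# then the chance that everything holds together is (1 - p)**N.

def plan_success_probability(n_failure_modes: int, p_each: float) -> float:
    """Probability that none of the N independent failure modes occurs."""
    return (1 - p_each) ** n_failure_modes

# With N = 20 possible failure points:
# - if each goes wrong with probability 1/N = 0.05, success is about 0.36
# - if each goes wrong with probability 0.2, success is about 0.01
for p in (0.05, 0.2):
    print(f"p = {p}: success = {plan_success_probability(20, p):.3f}")
```

Even at the 1/N threshold the plan succeeds only about a third of the time; with slightly less confidence per component, success becomes vanishingly unlikely.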

Unfortunately, if we want to have a positive impact on the world, we almost never get to test all of the relevant aspects of our world model. I think it’s useful to split up plans into two parts:

  1. Trying to achieve some observable goals, where we can make many attempts and improve each time.
  2. Hoping that achieving these goals will lead to a positive impact.

The trouble is that step (2) is normally based on our assumptions about the world; because we can’t see all of the effects our actions have on the world (and even when we can see them in some sense, we normally can’t see their goodness) we don’t have an opportunity to refine those assumptions. I think this is particularly a problem for would-be “effective” altruists, who tend to be willing to accept unseen (but rationally supported) benefits and to be disdainful of traditional philanthropists’ insistence that they see the fruits of their work. (Someone who helps kids they can see, for example, is loading more and more of their objective into step (1), and reducing step (2) to a smaller and smaller set of assumptions about what visible changes actually correspond to improvements. But even this minimal set of assumptions is still often problematic!)

So I think the upshot is to choose plans for which the arguments supporting step (2) are as simple as possible. Arguments without many moving parts, particularly those substantiated by a direct appeal to historical regularity, may hold up even if you never get to check them. Conversely, load as much of the difficult work as possible into step (1).

Good examples include plans with observable goals like “stop someone useful from dying,” “make society richer,” or “make people measurably smarter.” Although those changes ultimately lead to a positive impact in a rather complicated way, a priori they have a roughly neutral expectation, and strong historical regularities seem to push the expected impacts pretty far towards the positive side. Bad examples include plans with endpoints like “raise awareness of X” or “reduce outputs of industry Y;” in these cases there is typically some relatively complicated model relating observed impacts to positive outcomes, and the components of this model are hard or impossible to verify.

There may be some natural cases, like averting the risk of catastrophe, for which step (2) necessarily bears quite a lot of weight. Those are hard problems to deal with. I think they probably tend to be under-addressed (at least by efforts which have any real chance of having a positive impact), and so may be worth working on. But it’s important to keep in mind, if you do choose to work on them, that many folks will by default assume (not unjustifiably) that you will fail to do any good.