How does evidence relate to theory: Bayesianism

Let’s assume the following:

  1. There is no way to be absolutely certain that any hypothesis is true.
  2. A scientific hypothesis is not confirmed by way of a straightforward induction nor falsified as the falsificationist suggests.
  3. Insofar as we know anything, it is always (i) relative to the information that we have and (ii) influenced by the incoming information.
  4. What scientists want to know is the probability of a hypothesis given (new) evidence.
  5. The best way to express the probability that a hypothesis is true given some evidence is through probability theory, specifically through the use of Bayes’ theorem.

0.1 Some preliminaries

The goal of this section is to introduce some key terms necessary for expressing Bayes’ theorem. To facilitate this, it is helpful to have a working example. Suppose there is a rare disease, VX, that afflicts 1% of the population, and suppose there is a perfectly accurate test for it: everyone who has VX tests positive, and everyone who does not have it tests negative.

This information can be represented using a probability tree (see Figure 1).




Figure 1: Probability tree of contracting a rare disease

First, let’s define what are known as unconditional probabilities or prior probabilities. These are probabilities that do not depend upon (are not conditioned by) something else being the case. In our example, there are three prior probabilities.

  1. The probability that an individual has the VX disease. This probability is .01, or 1%.
  2. The probability that an individual tests positive for VX, independent of whether or not s/he has VX. To determine this: (.01 × 1) + (.99 × 0) = .01, or 1%.
  3. The probability that an individual tests negative for VX, independent of whether or not s/he has VX. To determine this: (.01 × 0) + (.99 × 1) = .99, or 99%.
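The three prior probabilities above can be checked with a short calculation. The sketch below (plain Python; the variable names are mine) applies the law of total probability to the two branches of the tree in Figure 1:

```python
# Values read off the probability tree (Figure 1):
# 1% of individuals have VX, and the test is perfectly accurate.
p_vx = 0.01                # P(h): prior probability of having VX
p_pos_given_vx = 1.0       # P(+|VX): everyone with VX tests positive
p_pos_given_no_vx = 0.0    # P(+|not-VX): no false positives

# Law of total probability: sum over both branches of the tree.
p_pos = p_vx * p_pos_given_vx + (1 - p_vx) * p_pos_given_no_vx
p_neg = p_vx * (1 - p_pos_given_vx) + (1 - p_vx) * (1 - p_pos_given_no_vx)

print(p_pos)  # 0.01
print(p_neg)  # 0.99
```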

This information can be put in the language of hypothesis and evidence / data / observation. First, suppose we have the hypothesis that we have VX. Call this hypothesis h. The prior probability of h is the probability of h independent of (or prior to) any test for VX.

Definition 1 (prior probability of h) the probability of h before we take into consideration some evidence e. Let P(h) stand for “the probability of the hypothesis (h)” independent of evidence e.

Second, with this hypothesis in mind, we can characterize the result of the VX test as evidence. Call either a positive or a negative test e. The prior probability of e is the probability of e independent of whether or not we have the VX disease.

Definition 2 (prior probability of e) the probability of e before we take into consideration some hypothesis (theory) h. Let P(e) stand for “the probability of an observation (e)” independent of a hypothesis h.

What we have, then, are two different unconditional probabilities: P(h) and P(e). Note, however, that incoming information influences the probability of the hypothesis and that scientists want to know the probability of a hypothesis given some piece of evidence. In other words, given some incoming evidence e, we should adjust the probability of h in light of e.

Definition 3 (conditional probability) The conditional probability of A given B is the probability of A on the condition that B is the case. It is written P(A|B).

There are two important conditional probabilities in our example.

  1. the probability of an observation, evidence, or data e on the condition that some hypothesis h is true (that is, the likelihood of an observation given a hypothesis).

Definition 4 (posterior probability of e - P(e|h)) on the assumption (condition) that the hypothesis h is true, the probability of the evidence e. That is, the likelihood of some observation e given (in light of) the hypothesis h. Let P(e|h) stand for the probability of e assuming h is true.

  2. the probability of a hypothesis h on the condition that some observation e is true (that is, the likelihood of a hypothesis given an observation).

Definition 5 (posterior probability of h - P(h|e)) on the assumption (condition) that some observation e is the case, the probability of the hypothesis h. That is, the likelihood of the hypothesis given (in light of) the evidence e. Let P(h|e) stand for the probability of h given e.

The posterior probability of h given e, that is, P(h|e), is what we want to know. Bayes’ theorem tells us how to calculate it.

0.2 Bayes’ Theorem

At the core of the Bayesian approach to science is Bayes’ theorem. What is the theorem? Let’s consider three different, increasingly precise articulations of Bayes’ theorem.

Definition 6 (Super simple Bayes’ theorem) Belief (hypothesis) + new evidence = new and improved belief (hypothesis)

Definition 7 (Bayes’ theorem in English) The probability of a hypothesis given some new evidence is equal to the probability of the evidence given the truth of the hypothesis times the probability of the hypothesis independent of the evidence divided by the probability of the evidence independent of the hypothesis.

Definition 8 (Bayes’ theorem)

  1. P(h|e) = P(e|h)P(h) / P(e)
  2. P(h|e) = P(e|h)P(h) / [P(e|h)P(h) + P(e|not-h)P(not-h)]
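The expanded form of the theorem (Definition 8, version 2) can be written as a short function. This is a sketch in Python; the function and parameter names are my own, not part of the text:

```python
def bayes(p_h, p_e_given_h, p_e_given_not_h):
    """Return the posterior P(h|e) using the expanded form of Bayes' theorem.

    p_h             -- prior probability of the hypothesis, P(h)
    p_e_given_h     -- likelihood of the evidence if h is true, P(e|h)
    p_e_given_not_h -- likelihood of the evidence if h is false, P(e|not-h)
    """
    # Denominator: P(e), by the law of total probability.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# The VX example: prior of 1%, perfectly accurate test.
print(bayes(0.01, 1.0, 0.0))  # 1.0
```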

0.3 Example uses of Bayes’ theorem

In what follows, some step-by-step examples are provided to show how Bayes’ theorem can be used to determine the posterior probability of some hypothesis given some evidence. In using Bayes’ theorem, the following step-by-step method will be followed:

  1. Write out Bayes’ theorem.
  2. Input the prior probability of the hypothesis P(h).
  3. Input the probability of the evidence given the hypothesis P(e|h).
  4. Input the prior probability of the evidence P(e).
  5. Calculate the posterior probability P(h|e).

Example 1 (Testing for VX)


Now take John. What is the probability that he has VX, P(h)? Looking at the probability tree, we would say it is 1%. Now suppose John is tested for VX and tests positive. What now is the probability that he has VX given that he has tested positive, P(h|e)?

The answer, in this case, is obvious, but it can be computed using Bayes’ theorem. First, let’s write out Bayes’ theorem:

P(VX|+) = P(+|VX)P(VX) / P(+)

Next, let’s input the values from our probability tree, beginning with the prior probability that John has VX: P(VX) = .01

P(VX|+) = P(+|VX)(.01) / P(+)

Next, input the probability that he tests positive given that he has VX: P(+|VX) = 1

P(VX|+) = (1)(.01) / P(+)

Next, we input the probability that he tests positive prior to determining whether he does or does not have VX. For this, we add the probability that he has VX and tests positive (.01 × 1) to the probability that he does not have VX and tests positive (.99 × 0): P(+) = (.01 × 1) + (.99 × 0) = .01

P(VX|+) = (1)(.01) / .01

Finally, we do the calculations and, as expected, we find that the probability that John has VX given a positive test is 100%.

P(VX|+) = 1

What we see from the above is that Bayes’ theorem can be used to determine the probability of the hypothesis that someone has the VX disease, P(h|e), given

  1. the probability that someone has the disease P(h)
  2. the probability that someone will test positive for the disease independent of any knowledge about whether someone in particular has the disease P(e)
  3. the probability that someone tests positive given that they have the disease P(e|h)

What it does is allow us to update the probability of a hypothesis given new information. Let’s consider a more complicated and realistic example. We cannot expect every test to be 100% accurate.

Example 2 (Cancer screening)

Suppose a woman between 40-50 goes to have a mammogram. Her doctor meets with her to tell her that her test has come back positive. Since the test is not 100 percent reliable, she does not know if, in fact, she has cancer. It could be a false positive.

What is the probability that she actually has breast cancer?

[Probability tree: P(cancer) = .01; P(+|cancer) = .90; P(+|no cancer) = .10]

We can now use the diagram and Bayes’ theorem to determine the probability that an individual has cancer given a positive test. First, we take the probability that an individual who has cancer tests positive: P(+|cancer) = .90. This is multiplied by the probability that an individual has cancer independent of whether they test positive or negative: P(cancer) = .01.

P(cancer|+) = P(+|cancer)P(cancer) / P(+)

P(cancer|+) = (.90)(.01) / P(+)

Next, we determine the likelihood that someone will test positive for cancer independent of whether they have cancer or not: P(+) = (.99 × .10) + (.01 × .90).

P(cancer|+) = (.90)(.01) / [(.99 × .10) + (.01 × .90)]

Then using Bayes’ theorem we can calculate the probability that one has cancer given a positive test.

P(cancer|+) = .009 / (.099 + .009)

P(cancer|+) = .009 / .108

P(cancer|+) ≈ 8.3%
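The mammogram arithmetic above can be reproduced in a few lines of Python. The numbers come from the example’s probability tree; the variable names are mine:

```python
p_cancer = 0.01               # P(h): prior probability of cancer
p_pos_given_cancer = 0.90     # P(+|cancer): test sensitivity
p_pos_given_no_cancer = 0.10  # P(+|no cancer): false-positive rate

# P(+): total probability of a positive test, over both branches of the tree.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# Bayes' theorem: P(cancer|+) = P(+|cancer)P(cancer) / P(+)
posterior = p_pos_given_cancer * p_cancer / p_pos

print(round(posterior, 3))  # 0.083
```

Despite the positive test, the posterior is only about 8.3%, because the disease is rare and false positives among the healthy 99% swamp the true positives.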

Discussion 1 Suppose you are working for a large corporation that drug tests employees. To keep things simple, let’s suppose that the drug is cocaine. If any individual fails the drug test, they are immediately fired and the employer informs the police.

Now suppose someone tests positive for cocaine. What is the probability that they are on cocaine given the positive test?

0.4 Bayesianism and Science: Some Key Points

Bayes’ theorem has a wide variety of applications. What role does it play in the philosophy of science?

Note 1 (Bayes’ theorem and the confirmation of a scientific theory.)

Bayes’ theorem can be used to formulate a theory of incremental confirmation. Namely, we can say that some evidence e confirms h if and only if P(h|e) > P(h). That is, if the posterior probability of a hypothesis is greater than its prior probability, then the hypothesis has been confirmed by the evidence.

This does not mean that the hypothesis is true or likely, or that we should believe the theory, when P(h|e) > P(h). Notice that in the case of cancer screening, even though the probability that you have cancer given a positive test is greater than the probability that you have cancer independent of a test (8.3% > 1%), it is still more likely that you don’t have cancer than that you do.
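The distinction can be put numerically, using the 8.3% figure from the screening example (a sketch; the variable names are mine):

```python
prior = 0.01       # P(h): prior probability of cancer
posterior = 0.083  # P(h|e): probability of cancer given a positive test

confirmed = posterior > prior  # e incrementally confirms h ...
probable = posterior > 0.5     # ... but h is still probably false

print(confirmed, probable)  # True False
```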

Note 2 (Bayes’ theorem largely corresponds to how we think observation and evidence influence the probability of a scientific theory)

First, if e is likely whether h is the case or not, e won’t (strongly) support a hypothesis h. That is, if e is true under a variety of competing hypotheses, then e won’t increase the probability of h being true.

For example, suppose a hypothesis h1 and two competing hypotheses h2 and h3. If, on the condition that each hypothesis is true, it is the case that the sky is blue (that is, P(e|h) = 100% for each), then e does not change the likelihood of h1. In short, evidence e that supports a variety of competing hypotheses equally, or that is true whether h is the case or not, won’t be strongly supportive (see Figure 2).




Figure 2: Cancer tree
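This first point can be illustrated with Bayes’ theorem directly (a hypothetical sketch with made-up numbers): if e is certain whether or not h holds, the posterior equals the prior.

```python
def bayes(p_h, p_e_given_h, p_e_given_not_h):
    # Posterior P(h|e) via the expanded form of Bayes' theorem.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# "The sky is blue" is certain under h1 and under its rivals alike.
prior = 0.25                         # made-up prior for h1
posterior = bayes(prior, 1.0, 1.0)   # P(e|h) = P(e|not-h) = 1

print(posterior == prior)  # True
```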

Second, if e is extremely unlikely unless h is true and it turns out that e is the case, then e will significantly increase the likelihood of h. The idea here is that if a hypothesis h makes a novel (or unusual) prediction – one that is not likely given other hypotheses – and this prediction is confirmed, then the probability of h significantly increases.
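Conversely (again a sketch with made-up numbers): a hypothesis with a modest prior gets a large boost from a successful novel prediction, because the evidence is almost impossible on rival hypotheses.

```python
def bayes(p_h, p_e_given_h, p_e_given_not_h):
    # Posterior P(h|e) via the expanded form of Bayes' theorem.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

prior = 0.05                            # made-up, modest prior for h
posterior = bayes(prior, 0.95, 0.001)   # e is a novel prediction of h

print(round(posterior, 2))  # 0.98
```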




Note 3 (Bayes’ theorem and ad hoc modifications.)

Bayesianism is capable of accounting for problems associated with ad hoc modifications to theories. Suppose h runs into conflict with some observation e. A proponent of the theory might, however, modify the theory in an ad hoc way in order to preserve it.

However, the Bayesian can explain why ad hoc modifications fail to make a theory better than the original merely by making it less susceptible to falsification.

  1. Consider the theory h and assign it some prior probability P(h).
  2. Next, imagine that the theory runs into conflict with some observation e. The probability of the theory is then lowered: P(h|e) < P(h).
  3. An ad hoc modification is then added to h so that the theory no longer conflicts with e. That is, instead of h we have h&a, where a is the ad hoc modification.
  4. According to the Bayesian, there are at least two reasons why P(h&a) < P(h):
    1. the probability of a theory plus an extra assumption (the ad hoc modification) is never greater than the probability of the theory itself. For example, the probability that John is going to the store and will buy some milk is less than the probability that he is going to the store.
    2. Many ad hoc modifications are implausible, and so the prior probability of these modifications being true is usually low.
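The first reason is just the conjunction rule of probability: P(h & a) = P(h) × P(a|h) ≤ P(h). A toy illustration of the store-and-milk example (the numbers are made up):

```python
p_store = 0.8             # made-up: P(John goes to the store)
p_milk_given_store = 0.6  # made-up: P(he buys milk | he goes to the store)

# Conjunction rule: P(store & milk) = P(store) * P(milk | store)
p_store_and_milk = p_store * p_milk_given_store

print(round(p_store_and_milk, 2))   # 0.48
print(p_store_and_milk <= p_store)  # True
```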

0.5 Problems with Bayesianism

Objection 1 (How do we know the prior probability of the hypothesis?)

The use of Bayes’ theorem requires that we know the (i) prior probability of the hypothesis, (ii) the prior probability of the evidence, and (iii) the conditional probability of the evidence given the hypothesis. How is the prior probability determined before applying Bayes’ theorem? For example:

  1. How do we determine the prior probability of having cancer?
  2. How do we determine the prior probability of drug use?
  3. How do we determine the prior probability of some scientific theory?
  4. Suppose we want to know the probability that an athlete is on performance-enhancing drugs (PEDs) given a positive test for a PED. The use of Bayes’ theorem seems to require that we already know how many individuals use PEDs. But how do we know this kind of information without testing individuals, and how do we know how reliable our test is without knowing how many individuals are on PEDs?

Response 1 (All hypotheses have equal prior probability) One response is to take an objectivist position. Namely, all hypotheses are equal until the evidence makes them more or less probable. Call this the objectivist position.

Imagine two boxers: Ryan and Frank. To determine the likelihood of one boxer beating the other, we begin by simply supposing that each has an equal chance. So the likelihood that Ryan will beat Frank is 50% and Frank beating Ryan is 50%. From there, we look at features in the world, using Bayes’ theorem, to adjust the likelihoods, e.g. injury, training, etc.

Objection 2 (Probability of scientific hypotheses) The objectivist approach might work in simple cases where there are two options, but it cannot work in science, where there are potentially an infinite number of hypotheses. For if there are an infinite number of hypotheses, then the probability of each is 0, and Bayes’ theorem won’t work if the prior probability of a hypothesis is 0 (no matter the evidence).

Response 2 (The prior probability of a hypothesis is determined subjectively) In contrast to an objectivist view of probability, let’s consider the subjectivist view. The subjectivist accepts Bayes’ theorem but interprets the prior probability of h in terms of the degree of confidence that people have in h being true.

There are, however, some important caveats to this claim.

So, in the case of two boxers: Ryan and Frank, the probability that Ryan will beat Frank is whatever individuals would be willing to bet. From there, we look at features in the world, using Bayes’ theorem, to adjust the likelihoods, e.g. injury, training, etc.

Objection 3 (This makes probability rest on subjective considerations.) The problem seems to be that individuals might have different prior probabilities for the same hypothesis. For example, I might say that the prior probability of astrology being true is 99% and the prior probability of it not being true is 1%, while you might say that the prior probability of astrology being true is 1% and of it being false 99%.

Response 3 (Initial subjectivity is fine, probabilities converge.) Consider a hypothesis that you would not take to be very probable. Let’s suppose that I strongly believe that a certain man in an alley has psychic powers. I think the hypothesis that he can predict the future is 99.9% probable and you think it is closer to .1%. With this hypothesis in hand, we can subject him to various tests and this incoming information will (hopefully) lead us to assign the same probability to his ability to predict the future.

For example, suppose you and I go to test the hypothesis. You ask him to guess what number you are thinking of from 1-100, and he answers correctly; then from 1-10,000, and he answers correctly. You are slightly more convinced. There are alternative explanations, but these seem less and less likely the more he guesses correctly. Finally, you ask him to guess tomorrow’s lottery numbers. He guesses correctly. This prediction is staggering, and with each unbelievable prediction, your repeated use of Bayes’ theorem would lead you to change the probability of the man having psychic powers from .1% to 99.9%.
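The convergence story can be simulated by applying Bayes’ theorem repeatedly. This is a hypothetical sketch: I assume each correct guess is certain if the man is psychic (P(e|h) = 1) and has a 1-in-100 chance otherwise; the starting priors are the .999 and .001 from the example.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    # One application of Bayes' theorem: the old posterior becomes the new prior.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

mine, yours = 0.999, 0.001   # wildly different starting priors
for _ in range(5):           # five correct guesses in a row
    mine = update(mine, 1.0, 0.01)
    yours = update(yours, 1.0, 0.01)

# Both observers end up nearly certain, despite starting far apart.
print(round(mine, 4), round(yours, 4))
```

Each round of evidence multiplies the odds in favor of the hypothesis by the same factor for both observers, so their posteriors are driven together regardless of where they started.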

In short, the initial probability of a hypothesis is unimportant. What is important is that the hypothesis is allowed to adjust to incoming information.