**Bayes’ Theorem**

The solution to this problem is Bayes’ theorem. The theorem’s conclusion takes into account our background knowledge *and* the evidence we have for the hypothesis we make. Whereas before we were jumping to the conclusion that the probability of a miracle is low based solely on our prior knowledge of how the world works, we will now also take into account the actual evidence for the miracle. Quantitatively speaking, the theorem makes us express our premises in degrees rather than absolutes by forcing us to label them numerically as probabilities. This is important because most claims in life, especially about historical events, can only be discussed in terms of probabilities; we already say things like “most likely” or “more likely” all the time. So, to those who object to doing math in history: think again, because every day we already speak in terms of probabilities. Moreover, the theorem forces you to consider alternative hypotheses, reducing confirmation bias. This formalized and systematic approach to evaluating your hypothesis allows for a clarity unrivaled by other methods. I offer two quotes below that explain its history and importance.

In simple terms, Bayes’s Theorem is a logical formula that deals with cases of empirical ambiguity, calculating how confident we can be in any particular conclusion, given what we know at the time. The theorem was discovered in the late eighteenth century and has since been formally proved, mathematically and logically, so we now know its conclusions are always necessarily true if its premises are true (probabilities). [Richard Carrier]

Bayes’s theorem is at the heart of everything from genetics to Google, from health insurance to hedge funds. It is a central relationship for thinking concretely about uncertainty, and–given quantitative data, which is sadly not always a given–for using mathematics as a tool for thinking clearly about the world. [Chris Wiggins, Scientific American]

**P(h|e,b) = P(h|b) × P(e|h,b) / [ P(h|b) × P(e|h,b) + P(~h|b) × P(e|~h,b) ]**

For more than one alternative hypothesis, the theorem generalizes to:

**P(h1|e,b) = P(h1|b) × P(e|h1,b) / [ P(h1|b) × P(e|h1,b) + P(h2|b) × P(e|h2,b) + P(h3|b) × P(e|h3,b) + … ]**

The terms are named as follows:

- *P(h|e,b)*: the epistemic probability, or posterior probability
- *P(e|h,b)*: the expected probability, or explanatory probability
- *P(h|b)*: the prior probability, or intrinsic probability
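As a sketch of how the theorem is applied numerically, the following Python function (the names are mine, not from the text) computes a posterior from a prior and the two likelihoods:

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for a hypothesis h versus its negation ~h.

    prior_h         : P(h|b), the prior (intrinsic) probability
    p_e_given_h     : P(e|h,b), how expected the evidence is if h is true
    p_e_given_not_h : P(e|~h,b), how expected the evidence is if h is false
    Returns P(h|e,b), the posterior (epistemic) probability.
    """
    numerator = prior_h * p_e_given_h
    # P(~h|b) = 1 - P(h|b), since h and ~h exhaust the possibilities
    denominator = numerator + (1 - prior_h) * p_e_given_not_h
    return numerator / denominator

# Evidence that is equally expected on both hypotheses leaves the
# prior unchanged:
print(posterior(0.1, 0.5, 0.5))  # 0.1
```

Note that when the evidence is nine times more expected on h than on ~h, a 50/50 prior becomes a 0.9 posterior: `posterior(0.5, 0.9, 0.1)` returns `0.9`.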

**Prior Probability**

The prior probability is based on our background knowledge, that is, on *all known* information bearing on your hypothesis. This leads us to the concept of reference classes. A reference class can be thought of as a category of claims that all address a similar scenario. This information can be used (referenced) to assist us in finding how typical our explanation is; in other words, it estimates our priors.

As an example from “Proving History” by Richard Carrier, a hypothesis you may promote to explain the evidence in the Gospels, the empty tomb, etc., is that Jesus Christ was raised from the dead by a supernatural agent. How do you derive a prior probability based on your background knowledge? Well, what you can do is look for similar scenarios that were believed to have occurred in the past. For instance, Romulus, Asclepius, Zalmoxis, Inanna, Lazarus, many saints in Matthew, and the Moabite of 2 Kings have all been purported to have been raised from the dead by a supernatural agent. So our reference class is all persons purported to be raised from the dead by a supernatural agent. We have at least ten of them from antiquity and probably more. Since prior probability is based only on background knowledge and not conditioned on the evidence, we can assume that each one of the persons claimed to have been raised by a supernatural agent is equivalent. That is, there’s no more reason to believe one story over another, since all equally contradict our background knowledge. If that is the case, then classical probability theory says we can divide the sample space into equivalent pieces such that they sum to one. So the prior probability would be 1/10, or 0.1, that Jesus Christ rose from the dead by a supernatural agent.
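The reference-class arithmetic above is just an even division of the sample space. A minimal sketch, using the count of “at least ten” figures given in the text:

```python
# The text counts at least ten persons from antiquity purported to have
# been raised from the dead by a supernatural agent (Romulus, Asclepius,
# Zalmoxis, Inanna, Lazarus, the saints in Matthew, the Moabite of
# 2 Kings, and others).
class_size = 10

# With no evidence yet considered, each claim in the reference class is
# treated as equally probable, so the sample space divides evenly:
prior = 1 / class_size
print(prior)  # 0.1
```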

Note that alternative hypotheses are also contained in our background knowledge *b*, and all are viable hypotheses that can explain the evidence. If incorporated, these can have the effect of lowering the posterior probability, but creating reference classes for multiple hypotheses can be challenging, although the principle is the same as in the single-hypothesis case.
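The multiple-hypothesis form of the theorem normalizes each hypothesis against all of its rivals. A hedged sketch, with priors and likelihoods invented purely for illustration:

```python
# Hypothetical priors P(h|b) and likelihoods P(e|h,b) for three rival
# explanations of the same evidence (numbers invented for illustration;
# priors over an exhaustive set must sum to 1):
hypotheses = {
    "h1": (0.1, 0.2),
    "h2": (0.6, 0.3),
    "h3": (0.3, 0.4),
}

# The denominator is the total probability of the evidence across
# all hypotheses:
total = sum(prior * like for prior, like in hypotheses.values())

posteriors = {
    h: prior * like / total for h, (prior, like) in hypotheses.items()
}

# Posteriors over an exhaustive set of hypotheses sum to 1
# (up to floating point):
print(sum(posteriors.values()))
```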

**Epistemic Probabilities**

Epistemic probabilities are probabilities of *beliefs*: the probability that a claimed event actually happened versus someone making it up (or being mistaken). Physical probabilities, by contrast, are probabilities (relative frequencies) of events occurring. An example might be the probability of a person chosen at random having a myocardial bridge in their heart, which is pretty small, the incidence of occurrence being 3%. But the probability that you *believe* someone has a myocardial bridge can be quite high, since it’s based, or conditioned, on the evidence at hand, say a recent angiogram. Moreover, epistemic probabilities often measure events that occur just once, like historical claims, whereas physical probabilities are often statistical averages of repeated phenomena. So you can’t empirically derive epistemic probabilities by repeating an experiment (say, by taking the long-term average of flipping a coin, resulting in a relative frequency, or probability, of 0.5); instead you must rely on thought experiments by deriving a reference class. The former method is known as the frequentist approach, while the latter is known as the Bayesian approach. It’s best to think of these methods as different approaches designed for different kinds of problems rather than as rivals. See the quote below, emphasizing the fidelity of the Bayesian method.

The next post will discuss the consequent probability and eventually compute an epistemic probability of our hypothesis; we’ll stick with the miraculous hypothesis that Jesus was raised from the dead by a supernatural agent in order to explain a wide range of claims found in the Gospels and Epistles.
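The myocardial-bridge example can be made concrete with Bayes’ theorem. The 3% incidence is from the text; the angiogram accuracy figures below are invented for illustration only:

```python
prior = 0.03           # incidence of a myocardial bridge (from the text)
sensitivity = 0.95     # P(positive angiogram | bridge)    -- assumed value
false_positive = 0.02  # P(positive angiogram | no bridge) -- assumed value

# Epistemic probability of a bridge after seeing a positive angiogram:
posterior = (prior * sensitivity) / (
    prior * sensitivity + (1 - prior) * false_positive
)

# The evidence raises a small physical prior (3%) to a much higher
# degree of belief, without ever repeating the "experiment":
print(round(posterior, 3))
```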

The specification of the prior is often the most subjective aspect of Bayesian probability theory, and it is one of the reasons statisticians held Bayesian inference in contempt. But closer examination of traditional statistical methods reveals that they all have their hidden assumptions and tricks built into them. Indeed, one of the advantages of Bayesian probability theory is that one’s assumptions are made up front, and any element of subjectivity in the reasoning process is directly exposed. [Olshausen]