Bayes' theorem is simply a more holistic way of looking at the world, one which is more in keeping with reality than the competing frequentist approach. A Bayesian (a person who subscribes to the logic of Bayes' theorem) looks at the totality of the data, whereas a frequentist is concerned with just a specific slice of the data, such as a single test or a discrete dataset. Frequentist hypothesis testing is where we get P-values from. Frequentists are concerned with *just the data from the current study*. Bayesians are concerned with the totality of the data, and they do meta-analyses, combining data from as many sources as they can. (But alas, they are still reluctant frequentists, because they insist on combining only frequentist datasets and shun attempts to incorporate more amorphous data such as "what is the likelihood of something like this based on common sense?")

Consider a trial of orange juice (OJ) for the treatment of sepsis. Suppose that 300 patients are enrolled and orange juice reduces sepsis mortality from 50% to 20% with P<0.001. The frequentist says "if the null hypothesis is true and there is no effect of orange juice in sepsis, the probability of finding a difference as great or greater than what was found between orange juice and placebo is less than 0.001; thus we reject the null hypothesis." The frequentist, on the basis of this trial, believes that orange juice is a thaumaturgical cure for sepsis. But the frequentist is wrong.

The Bayesian steps back from the trees and observes the entire forest. S/he has several other perspectives on these data:

- There are thousands of trials going on at any given time. By chance alone, one of them is going to show an effect that is large and statistically improbable (P<0.001) but is nonetheless due to chance alone, a roll of the dice. When everybody's gambling in Vegas, somebody's going to win.
- It is highly unlikely that *anything* improves sepsis mortality by 30%; this would be unprecedented.
- It is highly unlikely that *orange juice* would have an effect on sepsis mortality; there is no biological plausibility for this.
- There is little precedent for *any* therapy of *any* kind improving mortality by 30% *in any disease*.

With this very low "prior probability" (pre-test probability), the Bayesian can incorporate other data, such as the frequentist data from a clinical trial or data from other observations, into his worldview and come up with a "posterior probability" of the efficacy of orange juice for sepsis. There are ways to do this formally, but basically Bayesian thinkers will update their probability estimates iteratively as evidence accumulates. Suppose a Bayesian starts with a prior probability that orange juice is nonsensical and will have no effect in sepsis, but then several laboratory studies identify an anti-inflammatory molecule in orange juice that affects cytokines, and in rat models orange juice modulates the immune response to sepsis. Then a few small trials show that in rat models of peritonitis, orange juice saves rat lives. The good Bayesian would say, "Well, it sounded far-fetched, but these data suggest there may be something to this orange juice stuff." The obvious problem, as you may have identified already, is that this is all very difficult to quantify formally; it's nebulous and seems like guesswork. It is indeed thorny, but worth paying attention to nonetheless: low priors make you circumspect about unexpected positive results, and high priors can make you rationally dismiss unexpected negative results.
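This iterative updating can be sketched numerically in odds form. Everything here is illustrative: the starting prior and the likelihood ratios assigned to each piece of evidence are assumptions invented for the example, not values from any real study.

```python
# Iterative Bayesian updating sketched with odds and likelihood ratios.
# All numbers are illustrative assumptions, not from any real study.
prior = 0.001                      # "orange juice for sepsis sounds far-fetched"
odds = prior / (1 - prior)         # convert probability to odds

# Hypothetical likelihood ratios for each new piece of evidence:
evidence = [
    ("anti-inflammatory molecule identified", 2.0),
    ("modulates immune response in rat sepsis models", 3.0),
    ("small rat peritonitis trials show survival benefit", 4.0),
]
for finding, lr in evidence:
    odds *= lr                     # posterior odds = prior odds x LR
    prob = odds / (1 + odds)       # convert back to probability
    print(f"after {finding!r}: P = {prob:.1%}")
```

Even after three supportive findings, the posterior remains only a few percent: the evidence has moved the Bayesian from "nonsensical" to "there may be something to this," which is exactly the behavior described above.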

Now we segue to clinical medicine. Just as the orange juice trial was a test of a scientific hypothesis, medical laboratory tests are tests of clinical hypotheses. And as with the orange juice trial, if laboratory tests are considered in isolation (the trial was positive, therefore orange juice works; the XYZ disease test was positive, therefore disease XYZ is present), without consideration of prior probabilities, egregious errors are liable to be made.

Suppose that we are ordering a pregnancy test that has a high sensitivity and specificity (basic knowledge of these is presumed in this discussion). Suppose that the test is positive. The patient is pregnant, right?

Well, that's how the frequentist approaches the problem: test positive, pregnant; test negative, not pregnant. But the Bayesian is not satisfied. First, what if I told you this person is a male? Is he pregnant? Heavens, no! Men don't get pregnant; the prior probability of pregnancy is zero, so he's not pregnant. (He probably has testicular cancer.) What if I told you that the person was a woman, but that she has had a hysterectomy? Is she pregnant? What if the test is negative, but I told you that she has missed two menstrual periods and had multiple sexual encounters in the last two months? The posterior probability of disease depends not only on the test result, but also on the prior probability of disease before we even consider the test result.

And so it is with so many tests in medicine - exercise treadmill tests, V/Q scans, d-dimers, "screening CT scans", executive physicals - every test we do must be viewed in the context of the prior probability. This is also why, when we get a potassium level of 8.0 back in a stable patient without renal failure and without EKG changes, we repeat the test. We don't believe it: the prior probability is low, so it must be a false positive - a sample drawn from a vein downstream of a potassium replacement infusion, a hemolyzed sample, or some other error or anomaly.

To determine how prior probability influences the posterior probability after incorporating test results, please play with the calculator on the right sidebar of the blog. Note that if the prior probability is 50% and sensitivity equals specificity, the posterior probabilities are in fact the sensitivity and 1-specificity of the test. So, if the sensitivity and specificity are both 90%, the posterior probability of disease is 90% with a positive test and 10% with a negative test. (Note that instead of positive predictive value and negative predictive value, the calculator reports the posterior probabilities, which are the PPV and 1-NPV.)

On the other hand, if the prior probability is 10%, the posterior probabilities are 50% if the test is positive and about 1% if the test is negative. Likewise, if the prior probability is 90%, the posterior probabilities are about 99% for a positive test and 50% for a negative test. Playing with the calculator is very instructive. You will learn that if the prior probability of disease is low, a negative test is helpful; conversely, if the prior probability of disease is high, a positive test is helpful. At extremes of prior probability, testing is not useful: you already know that the person more than likely does or does not have the disease, and the test can be uninformative in terms of pushing you across a decision threshold.
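The arithmetic behind the calculator is just Bayes' theorem applied to the 2x2 table. Here is a minimal sketch in Python that reproduces the numbers above; the function name `posterior` is my own, not anything from the calculator:

```python
def posterior(prior, sens, spec, test_positive):
    """Posterior probability of disease given a test result,
    via Bayes' theorem on probabilities (not odds)."""
    if test_positive:
        true_pos = prior * sens                 # diseased and test positive
        false_pos = (1 - prior) * (1 - spec)    # healthy but test positive
        return true_pos / (true_pos + false_pos)
    else:
        false_neg = prior * (1 - sens)          # diseased but test negative
        true_neg = (1 - prior) * spec           # healthy and test negative
        return false_neg / (false_neg + true_neg)

# Reproduce the worked numbers with sensitivity = specificity = 90%:
for prior in (0.50, 0.10, 0.90):
    pos = posterior(prior, 0.9, 0.9, True)
    neg = posterior(prior, 0.9, 0.9, False)
    print(f"prior {prior:.0%}: positive -> {pos:.1%}, negative -> {neg:.1%}")
```

With a 50% prior this prints 90% and 10%; with a 10% prior, 50% and about 1.2%; with a 90% prior, about 98.8% and 50%, matching the paragraph above.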

If you want to do Bayes' theorem on the fly without my calculator, you have to convert priors to odds and convert sensitivities and specificities to likelihood ratios - then the product of the prior odds and the likelihood ratio equals the posterior odds. This can be easily done, but I won't belabor it here - you can use the calculator. [If you want to do it, here's an example: Prior probability of pulmonary embolism (PE) thought to be low, let's say 10%. Odds of PE are thus 1:9 (10 total chances: 1 that it is PE, 9 that it is not). Let's say the sensitivity and specificity of our test are both 90%. The likelihood ratio (LR) for a positive test is sensitivity/(1-specificity) [9/1]; that for a negative test is (1-sensitivity)/specificity [1/9]. If the test is positive, multiply the prior odds by the positive LR and this gives you the posterior odds. So, 1/9 * 9/1 = 9/9 posterior odds - converted to probability, that's 50%, as in the examples in the preceding paragraph. If the test is negative, 1/9 * 1/9 = 1/81. Total chances 82: 1 that there is PE, 81 that there is not, so the probability of PE is 1/82 = 1.2%. Confirm it with the calculator.]
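The odds-times-LR recipe in the bracketed example can be written out directly. A small sketch (again, `posterior_odds` is my own name for it):

```python
def posterior_odds(prior_prob, sens, spec, test_positive):
    """Bayes on the fly: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)   # 10% prior -> odds of 1/9
    if test_positive:
        lr = sens / (1 - spec)                   # LR+ = 0.9/0.1 = 9
    else:
        lr = (1 - sens) / spec                   # LR- = 0.1/0.9 = 1/9
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)           # odds back to probability

# The worked PE example: 10% prior, sensitivity = specificity = 90%
print(posterior_odds(0.10, 0.9, 0.9, True))      # positive test: ~0.5
print(posterior_odds(0.10, 0.9, 0.9, False))     # negative test: ~0.012 (1/82)
```

Note that this gives exactly the same answers as working directly with the 2x2 probabilities; the odds form is just easier to do in your head.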

A good test is one with an associated LR of close to 10 or higher, i.e., sensitivity and specificity both on the order of 90%. A sensitivity and specificity of 80% corresponds to an LR of about half that (0.8/0.2 = 4); such a test is useful, but its utility is becoming marginal. Much below an LR of 4, the test is almost worthless.

So remember, if you order a troponin in a 32-year-old woman with altered mental status (why would you do that anyway?) and it's positive, it's more likely that she has sepsis than a myocardial infarction. If you order a d-dimer in a patient with a low modified Wells score and it's negative, you're done: there is no PE. If it's positive, it tells you nothing, and you need to pursue things further. (That's right, the only useful d-dimer result is a negative one.) If you order a BNP level in a patient with swollen legs, orthopnea, JVD, and Kerley B lines on chest X-ray and it's negative, it is still likely that the patient has congestive heart failure. You cannot use testing in isolation for diagnosis; you will get burned time and again if you do. You need a history (and physical) and an exposure narrative for the patient so you can begin to order the diagnostic possibilities on the basis of their prior probabilities. Failure to do this is feloniously widespread, and it leads to countless errors in diagnosis and therapeutics.

The test results modify the probability of what we already think (the prior probability), they do not become the answer to the question in isolation. This is the essence of Bayes' Theorem - thinking prior to testing.

Here is another real world analogy for sensitivity and specificity and prior probability. When we are afield hunting varmints (groundhogs) at long range (out to 1500 yards) we search for them using binoculars. The probability of finding them in any area of ground at any given time is low. The novice, not recognizing this, is apt to call anything of the approximate shape, size, and color a "possible groundhog." The expert knows that most of these things will be rocks, dirt clumps, cow patties, sticks and the like. S/he knows to wait to see a very specific sign: movement. Or to use a spotting scope for a closer look.
