Improving patient care using clinical guidelines and judgement

Charles Manski 27 October 2017



Guidance to clinicians on patient care has increasingly become institutionalised through clinical practice guidelines (CPGs). Dictionaries define a guideline as a suggestion for behaviour, but clinicians have strong incentives to comply with these guidelines when they are issued, making adherence to them almost compulsory. A patient's health insurance plan may require adherence as a condition for reimbursement of the cost of treatment. Adherence may also be used as evidence of due diligence to defend a malpractice claim.

The medical literature contains many commentaries exhorting clinicians to adhere to guidelines. They argue that CPG developers have better knowledge of treatment response than clinicians do. As the Institute of Medicine (2011, p.26) states: "Trustworthy CPGs have the potential to reduce inappropriate practice variation."

Statements like this demonstrate the widespread belief that adherence to guidelines is socially preferable to decentralised clinical decision-making. Yet there is no welfare analysis that supports this belief. There are two reasons why patient care adhering to guidelines may differ from the care that clinicians provide:

  • Guideline developers may differ from clinicians in their ability to predict how decisions affect patient outcomes; or
  • Guideline developers and clinicians may differ in how they evaluate patient outcomes.

Welfare comparison requires consideration of both factors. In recent work (Manski 2017), I consider how limited ability to assess patient risk of illness, and to predict treatment response, may affect the welfare achieved by adherence to guidelines and by decentralised clinical practice.

Optimal personalised care assuming rational expectations

To provide a baseline, I consider an idealised setting studied by medical economists such as Phelps and Mushlin (1988). These studies assume that a clinician makes accurate probabilistic risk assessments and predictions of treatment response conditional on all observed patient covariates; that is, the clinician has rational expectations. The studies also assume that the objective is to maximise a patient's expected utility.

In this setting, analysis of optimal personalised care shows that adherence to a CPG cannot outperform decentralised practice, and may perform less well. If a CPG conditions its recommendations on all the patient covariates that clinicians observe, it can do no better than reproduce clinical decisions. If the CPG makes recommendations conditional on only a subset of the clinically observable covariates, as is typically the case, adhering to the CPG may yield lower welfare because the guideline personalises care less finely than clinicians can. Thus, if clinicians have rational expectations, there is no informational argument for adhering to CPGs.

The inferiority of adhering to CPGs holds because the problem of optimising care has a simple solution. Patients should be divided into groups having the same observed covariates. All patients in a group should be given the care that yields the highest within-group expected utility. Maximum expected utility cannot fall, and may rise, as more patient covariates are observed.
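The grouping rule can be made concrete with a small numerical sketch. The population, the two binary covariates x1 and x2, the treatments "A" and "B", and all expected utilities below are hypothetical numbers chosen for illustration, not clinical data. The illustrative function `optimal_welfare` gives every group the treatment with the highest within-group expected utility; a clinician who observes both covariates and a CPG that conditions only on x1 are then simply two different groupings of the same patients.

```python
# Hypothetical illustration: all shares and expected utilities are assumptions.
# Each cell (x1, x2) -> (population share, expected utility of each treatment).
CELLS = {
    (0, 0): (0.25, {"A": 0.70, "B": 0.60}),
    (0, 1): (0.25, {"A": 0.40, "B": 0.65}),
    (1, 0): (0.25, {"A": 0.80, "B": 0.50}),
    (1, 1): (0.25, {"A": 0.55, "B": 0.60}),
}
TREATMENTS = ("A", "B")


def optimal_welfare(group_of):
    """Population expected utility when each group receives the treatment with
    the highest within-group expected utility. `group_of` maps a cell to the
    covariates the decision-maker conditions on."""
    groups = {}
    for cell, (share, eu) in CELLS.items():
        groups.setdefault(group_of(cell), []).append((share, eu))
    total = 0.0
    for members in groups.values():
        # Share-weighted expected utility of giving the whole group treatment t.
        total += max(sum(s * u[t] for s, u in members) for t in TREATMENTS)
    return total


clinician = optimal_welfare(lambda cell: cell)      # conditions on (x1, x2)
guideline = optimal_welfare(lambda cell: cell[0])   # conditions on x1 only
uniform = optimal_welfare(lambda cell: ())          # one rule for everyone
print(clinician, guideline, uniform)
```

With these numbers the three welfare levels are 0.6875, 0.65 and 0.6125, up to floating-point rounding. Coarsening the grouping can only merge cells, and a merged group must accept a single compromise treatment, so welfare can fall but never rise.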

Treatment with imperfect clinical judgment

If it were reasonable to suppose that clinicians had rational expectations, there would be no utilitarian argument for developing CPGs. Empirical psychological research, however, has concluded that evidence-based predictions consistently outperform clinical judgment, even when clinical judgment uses additional covariates as predictors.

An influential review article by Dawes et al. (1989, p.1668) distinguished statistical prediction from clinical judgment:

"In the clinical method the decision-maker combines or processes information in his or her head. In the actuarial or statistical method the human judge is eliminated and conclusions rest solely on empirically established relations between data and the condition or event of interest."

Comparing the two methods, they cautioned against using clinical judgment to predict disease risk or treatment response, even when a clinician observes patient covariates not used in available statistical predictions (p.1670):

"Might the clinician attain superiority if given an informational edge? ... The research addressing this question has yielded consistent results … Even when given an information edge, the clinical judge still fails to surpass the actuarial method; in fact, access to additional information often does nothing to close the gap between the two methods."

Psychological research challenged the realism of assuming that clinicians have rational expectations, but it did not per se imply that adherence to CPGs would yield greater welfare than decision-making using clinical judgment. One issue has been that the psychological literature has not addressed all welfare-relevant aspects of clinical decisions. Psychologists have studied the accuracy of risk assessments made by statistical predictors and by clinicians, but they have not done similar studies of the accuracy of evaluations of patient preferences over health outcomes. Also, psychological research has seldom examined the accuracy of probabilistic risk assessments; it has been more common to assess the accuracy of point predictions. Study of the logical relationship between probabilistic and point predictions shows that data on the latter at most yield wide bounds on the former.
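A simple case shows why point predictions reveal little about probabilities. Suppose, as a hypothetical reporting rule rather than one taken from the psychological studies, that a predictor with subjective probability p that a patient will fall ill reports the point prediction "ill" whenever p is at least some threshold t. Observing the report then locates p only relative to t, and if t itself is uncertain the bounds widen further. The function name `probability_bounds` is illustrative.

```python
def probability_bounds(predicts_ill, t_lo=0.0, t_hi=1.0):
    """Bounds on a predictor's probability p of illness given only a point
    prediction, assuming the hypothetical rule 'report ill iff p >= t' with an
    unknown threshold t somewhere in [t_lo, t_hi]."""
    if not 0.0 <= t_lo <= t_hi <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= t_lo <= t_hi <= 1")
    if predicts_ill:
        return (t_lo, 1.0)  # p >= t >= t_lo
    return (0.0, t_hi)      # p < t <= t_hi (upper end not attained)


# Threshold known to be 0.5 (report the more likely outcome):
print(probability_bounds(True, 0.5, 0.5))   # (0.5, 1.0)
# Nothing known about the threshold: the report is uninformative.
print(probability_bounds(False))            # (0.0, 1.0)
```

Even under the most favourable assumption, a known threshold of 0.5, a single point prediction leaves an interval of width one half for the underlying probability.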

Given these and other issues, we cannot conclude that imperfect clinical judgment makes adherence to CPGs superior to decentralised decision-making. The findings of psychological research only imply that comparing welfare requires a delicate choice between alternative second-best systems for patient care. Adherence to CPGs may be inferior to the extent that CPGs condition on fewer patient covariates than clinicians do, but it may be superior to the extent that imperfect clinical judgment yields sub-optimal decisions. The precise trade-off depends on the context.
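The trade-off can be illustrated with a hypothetical four-cell population. Here the clinician observes both covariates but, under an assumed error model, systematically overstates the expected utility of treatment "B" by an amount `bias`; the guideline conditions only on x1 but uses accurate expected utilities. All numbers and function names are illustrative assumptions, not estimates.

```python
# Hypothetical population: cell (x1, x2) -> (share, true expected utilities).
CELLS = {
    (0, 0): (0.25, {"A": 0.70, "B": 0.60}),
    (0, 1): (0.25, {"A": 0.40, "B": 0.65}),
    (1, 0): (0.25, {"A": 0.80, "B": 0.50}),
    (1, 1): (0.25, {"A": 0.55, "B": 0.60}),
}
TREATMENTS = ("A", "B")


def clinician_welfare(bias):
    """Clinician sees (x1, x2) but perceives B's expected utility as `bias`
    higher than it truly is (assumed error model). Welfare uses true values."""
    total = 0.0
    for share, eu in CELLS.values():
        perceived = {"A": eu["A"], "B": eu["B"] + bias}
        choice = max(TREATMENTS, key=perceived.get)  # ties go to "A"
        total += share * eu[choice]
    return total


def guideline_welfare():
    """Guideline conditions on x1 only but uses accurate expected utilities."""
    total = 0.0
    for x1 in (0, 1):
        members = [v for k, v in CELLS.items() if k[0] == x1]
        total += max(sum(s * u[t] for s, u in members) for t in TREATMENTS)
    return total


for bias in (0.05, 0.20, 0.35):
    print(bias, round(clinician_welfare(bias), 4), round(guideline_welfare(), 4))
```

With these numbers the clinician beats the guideline (0.6875 or 0.6625 against 0.65) while the bias stays below 0.30, and falls behind (0.5875) once it exceeds 0.30, when the distorted judgment also reverses the choice for patients with x1 = 1 and x2 = 0.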

Questionable methodological practices in evidence-based medicine

The psychological literature has questioned the judgment of clinicians, but it has not questioned the accuracy of the predictions used in evidence-based guideline development. Predictions may be evidence-based without using evidence effectively. Questionable methodological practices have long afflicted research on health outcomes and may have affected guideline development too. This further complicates the comparison between adherence to guidelines and decentralised practice.

One questionable practice is the extrapolation of findings from randomised trials to clinical decisions. Guideline developers use trial data to predict treatment response whenever such data are available. Trials are appealing because, given sufficient sample size and complete observation of outcomes, they deliver credible findings about treatment response in the study population. Extrapolating from these findings, however, can be difficult. Wishful extrapolation commonly assumes that treatment response in practice is the same as in trials. This may not be true. Study populations commonly differ from patient populations, experimental treatments differ from treatments used in practice, and the surrogate outcomes measured in trials differ from outcomes of health interest.
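One way to see what wishful extrapolation assumes is to drop the assumption and ask what the trial still implies. Suppose, hypothetically, that a trial credibly reveals the success rate among patients who resemble its participants, that these patients make up a known share of the clinical population, and that nothing is known about outcomes for the rest. The sketch below, with an illustrative function name, computes the resulting worst-case bounds on the population success rate. It addresses only the first of the differences noted above, between study and patient populations.

```python
def population_success_bounds(trial_rate, share_like_trial):
    """Worst-case bounds on the success rate in the clinical population,
    assuming the trial rate holds for the share of patients who resemble trial
    participants and nothing is known about the remaining patients."""
    for value in (trial_rate, share_like_trial):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates and shares must lie in [0, 1]")
    known_part = share_like_trial * trial_rate
    # Unrepresented patients could all fail (lower bound) or all succeed (upper).
    return (known_part, known_part + (1.0 - share_like_trial))


low, high = population_success_bounds(trial_rate=0.6, share_like_trial=0.7)
print(f"wishful extrapolation: 0.60; bounds: [{low:.2f}, {high:.2f}]")
```

With these illustrative inputs the bounds are [0.42, 0.72]. The trial narrows the answer but does not pin it down, and the width of the interval equals the share of patients the trial does not represent.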

Using hypothesis testing to compare treatments is also questionable. A common procedure when comparing two treatments is to view one as the status quo and the other as an innovation. The usual null hypothesis is that the innovation is no better than the status quo, and the alternative is that the innovation is better. If the null hypothesis is not rejected, guidelines recommend that the status quo be used in practice. If the null is rejected, the innovation becomes the treatment of choice. The convention has been to fix the probability of rejecting the null hypothesis when it is correct (Type I error) and to choose sample size to fix the probability of rejecting the alternative hypothesis when it is correct (Type II error).
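The convention can be written down for the textbook case of a one-sided two-sample z-test with a known common outcome standard deviation, a simplifying assumption. The sketch uses Python's standard-library `statistics.NormalDist`; the function names are illustrative. Fixing the Type I error at 5% and the Type II error at 20% for a chosen effect size determines the number of patients per arm.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def sample_size_per_arm(delta, sigma, alpha=0.05, beta=0.20):
    """Patients per arm so that a one-sided two-sample z-test (known, common
    sigma) has Type I error alpha and Type II error at most beta when the
    innovation is better by delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def type_ii_error(n_per_arm, delta, sigma, alpha=0.05):
    """Probability of not rejecting 'innovation no better than status quo'
    when the innovation is in fact better by delta."""
    se = sigma * sqrt(2 / n_per_arm)  # standard error of the difference in means
    return Z.cdf(Z.inv_cdf(1 - alpha) - delta / se)


n = sample_size_per_arm(delta=0.5, sigma=1.0)
print(n, round(type_ii_error(n, delta=0.5, sigma=1.0), 3))
```

With an effect size of half a standard deviation this gives 50 patients per arm and a Type II error of about 0.196, so the design tolerates roughly four times the probability of wrongly retaining the status quo as of wrongly adopting the innovation, the asymmetry that Manski and Tetenov (2016) question.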

Manski and Tetenov (2016) observed that hypothesis testing may yield unsatisfactory results for clinical decisions for several reasons. These include:

  • Use of conventional error probabilities: It has been standard to fix the probability of Type I error at 5% and of Type II error at 10-20%, but the theory of hypothesis testing gives no rationale for using these error probabilities. There is no reason why a clinician concerned with patient welfare should make treatment choices that have a much greater probability of Type II than Type I error.
  • Inattention to magnitudes of losses when errors occur: A clinician should care about more than error probabilities. He or she should care about the magnitudes of the losses to patient welfare should errors occur. A given error probability should be less acceptable when the welfare difference between treatments is larger, but the theory of hypothesis testing does not take this into account.
  • Limitation to settings with two treatments: A clinician often chooses among several treatments, and many clinical trials compare more than two treatments. Yet the standard theory of hypothesis testing only contemplates choice between two treatments. Statisticians have struggled to extend it to deal with a comparison of multiple treatments.
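The concern with magnitudes of losses can be made precise through expected welfare loss, or regret: the probability of choosing the inferior treatment multiplied by the welfare difference between the treatments. In the spirit of Manski and Tetenov (2016), who judge trial designs by maximum regret, the simplified sketch below uses a normal approximation with a known standard error `se` and illustrative function names. It compares the conventional 5% one-sided test with the rule that adopts whichever treatment performed better in the trial, which corresponds to alpha = 0.5.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def regret(delta, se, alpha):
    """Expected welfare loss of 'adopt the innovation iff a one-sided z-test at
    level alpha rejects', when the innovation's true advantage is delta and its
    estimate is normal with standard error se. alpha = 0.5 reduces to 'adopt
    whichever treatment did better in the trial'."""
    z_crit = Z.inv_cdf(1 - alpha)
    p_adopt = 1 - Z.cdf(z_crit - delta / se)
    if delta > 0:
        return delta * (1 - p_adopt)  # loss from wrongly keeping the status quo
    return abs(delta) * p_adopt       # loss from wrongly adopting (0 if delta == 0)


def max_regret(se, alpha):
    """Maximum regret over a grid of true effects delta in [-5 se, 5 se]."""
    grid = [k * se / 100 for k in range(-500, 501)]
    return max(regret(d, se, alpha) for d in grid)


se = 0.2
print("5% test:", round(max_regret(se, 0.05), 3))
print("empirical success rule:", round(max_regret(se, 0.5), 3))
```

With se = 0.2, the conventional test's maximum regret (about 0.17) is roughly five times that of the empirical success rule (about 0.034), because the test's asymmetric error probabilities leave large losses when the innovation is moderately better.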

Doing better

Evidence-based research can inform patient care more effectively than it does at present. Studies should quantify how identification problems and statistical imprecision jointly affect the feasibility of making credible predictions of health outcomes. Identification is usually the dominant problem.

Recognising that knowledge of treatment response is incomplete, I recommend formal consideration of patient care as a problem of decision-making under uncertainty. There is no optimal way to make decisions under uncertainty, but there are reasonable ways with well-understood welfare properties. These include maximisation of subjective expected welfare, the maximin criterion, and the minimax-regret criterion.
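To show how these criteria can disagree, the sketch below takes a hypothetical choice between a status quo "A" whose welfare is known to be 0.5 and an innovation "B" whose welfare is only known to lie in [0.3, 0.9], for instance because of an identification problem. It assumes that any combination of values within the bounds is possible, and it represents a subjective prior by its mean, parameterised as a weight on each treatment's upper bound; both are simplifying assumptions, and the function name `choose` is illustrative.

```python
def choose(bounds, criterion, prior_weight=0.5):
    """Choose one treatment given bounds: treatment -> (lower, upper) welfare.
    Assumes every combination of values within the bounds is possible.
    prior_weight places the subjective mean at lower + weight * (upper - lower)."""
    if criterion == "expected_welfare":
        mean = {t: lo + prior_weight * (hi - lo) for t, (lo, hi) in bounds.items()}
        return max(mean, key=mean.get)
    if criterion == "maximin":
        # Best worst case: compare lower bounds.
        return max(bounds, key=lambda t: bounds[t][0])
    if criterion == "minimax_regret":
        def worst_regret(t):
            # Worst state: t at its lower bound, the best rival at its upper bound.
            rival_highs = [hi for s, (_, hi) in bounds.items() if s != t]
            return max([0.0] + [hi - bounds[t][0] for hi in rival_highs])
        return min(bounds, key=worst_regret)
    raise ValueError(f"unknown criterion: {criterion}")


bounds = {"A": (0.5, 0.5), "B": (0.3, 0.9)}
for criterion in ("expected_welfare", "maximin", "minimax_regret"):
    print(criterion, choose(bounds, criterion))
```

Here maximin keeps the status quo because the innovation might be worse, while minimax regret (worst regret 0.2 for "B" against 0.4 for "A") and expected welfare with a midpoint prior adopt the innovation. Each choice is defensible; they differ in how they treat what is unknown.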

There is precedent for verbal recognition of uncertainty in the literature on medical decision-making. For example, the Institute of Medicine (2011, p.33) called attention to this assertion by the Evidence-Based Medicine Working Group:

"[C]linicians must accept uncertainty and the notion that clinical decisions are often made with scant knowledge of their true impact."

Verbal recognition of uncertainty, however, has not led guideline developers to examine patient care formally as a problem of decision-making under uncertainty. I find this surprising. Medical research makes much use of biological science, technology, and quantitative statistical methods. Why then should CPG development acknowledge uncertainty only verbally? Formal analysis of patient care under uncertainty has much to contribute to guideline development and to decision-making by clinicians.


Dawes, R, D Faust, and P Meehl (1989), "Clinical Versus Actuarial Judgment," Science, 243: 1668-1674.

Institute of Medicine (2011), Clinical Practice Guidelines We Can Trust, Washington, DC: National Academies Press.

Manski, C (2017), "Improving Clinical Guidelines and Decisions under Uncertainty," National Bureau of Economic Research Working Paper No 23915.

Manski, C and A Tetenov (2016), "Sufficient Trial Size to Inform Clinical Practice," Proceedings of the National Academy of Sciences, 113: 10518-10523.

Phelps, C and A Mushlin (1988), "Focusing Technology Assessment using Medical Decision Theory," Medical Decision Making, 8: 279-289.




Board of Trustees Professor in Economics, Northwestern University