Establishing a Ground Truth

A typical problem when benchmarking a clinical system is establishing a ground truth. Consider a clinical decision support system that helps a physician interpret radiology images (e.g., by recognizing tumors). The only intuitive way to evaluate the performance of such a system is to test it on a set of labeled images, or in other words, to ask it questions we already know the answers to. However, creating this ground truth requires us to rely on the very process we are trying to improve, namely the “manual” analysis by a physician.

This scenario can be observed frequently wherever we try to recognize patterns. Another example is research-oriented knowledge extraction from patient records, where we attempt to recognize adverse drug effects or develop best practices. A note might indicate a negative impact on the patient's health, yet not be recognized as such by the medical expert who acts as referee for the benchmark, due to its complexity, illegibility or counterintuitive nature.

There are methods to dampen the effect, such as increasing the number and competence of the referees or implementing a round of reconsideration of results (e.g., the physicians can be confronted another time with images that were recorded as false positives during the automated tumor detection), but those methods are often expensive and time-consuming or, in the worst case, simply not available.
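The two dampening methods mentioned above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the function names (consensus_label, review_queue), the image IDs, and the label strings are hypothetical and not taken from any real system. The sketch takes a majority vote over several referees and collects every image where the system disagrees with the consensus, or where the referees reach no consensus at all, for a second look.

```python
from collections import Counter


def consensus_label(votes):
    """Return the majority label among referee votes, or None on a tie."""
    counts = Counter(votes).most_common()
    # A tie for first place means the referees reached no consensus.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def review_queue(referee_votes, system_output):
    """List image IDs that should go back to the physicians.

    An image is queued when the referees are tied or when the system's
    output differs from the referees' majority label.
    """
    queue = []
    for image_id, votes in referee_votes.items():
        consensus = consensus_label(votes)
        if consensus is None or consensus != system_output[image_id]:
            queue.append(image_id)
    return queue


# Hypothetical example data: three images, labeled by two or three referees.
referee_votes = {
    "img_1": ["tumor", "tumor", "no_tumor"],
    "img_2": ["no_tumor", "no_tumor", "no_tumor"],
    "img_3": ["tumor", "no_tumor"],
}
system_output = {"img_1": "tumor", "img_2": "tumor", "img_3": "no_tumor"}

print(review_queue(referee_votes, system_output))  # ['img_2', 'img_3']
```

Here img_2 is queued because the system reports a tumor that all referees missed or ruled out, which is exactly the kind of apparent false positive that deserves reconsideration. Whether such a review is affordable is of course a separate question.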

Therefore we have to keep in mind that when dealing with a highly complicated and often intuition-driven field such as medicine, we constantly have to account for possible human errors, and that there is always the possibility that we have done our job well enough to outperform the quality of the human decision. Or, to put it less optimistically: sometimes even great results can become a problem.

The Search for Evidence

An interesting but also discomforting thought came up during a recent discussion regarding Biomedical Informatics: our very computational idea of evidence-based medicine is, with its maybe ten years of presence, still a fairly new concept to the field. For decades, the procedures applied to patients were largely based on experience, intuition and a somewhat vague and very encapsulated “it worked quite well last time” approach rather than on globally proven facts.

We have only just started to imagine and implement systems for collecting health care information and delivering it back to health care professionals in forms that empower them to make optimal decisions. Keeping in mind how young this concept of Health Information Exchange is, we can safely assume that there is still room for massive improvement.