From a recent talk about the progress of health information exchange (HIE) in CT through the HITE-CT initiative:
The average person in CT sees nine physicians on a regular basis. As a family doctor, I try to limit that and to explain to my patients that this really is just nine times the trouble you can get yourself into.
A typical problem when benchmarking a clinical system is establishing a ground truth. Let us assume a clinical decision support system that assists a physician in interpreting radiology images (e.g., by recognizing tumors). The only intuitive method we have to evaluate the performance of the system is to test it on a set of labeled images, or in other words, by asking questions we already know the answers to. However, creating this ground truth requires us to rely on an analysis through the very process we attempt to improve, namely the "manual" analysis by a physician.
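As a purely illustrative sketch (the function and label conventions are my own, not from any particular system), benchmarking against such a labeled set usually boils down to comparing the system's output with the physician-provided labels and reporting sensitivity and specificity:

```python
def benchmark(reference, predicted):
    """Compare automated tumor calls against physician-provided labels.

    Both arguments are lists of booleans (True = tumor present).
    Note: 'reference' is itself a human judgment, so any metric
    computed here is only as trustworthy as the physician's reading.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # tumor found
    tn = sum(1 for r, p in pairs if not r and not p)  # healthy passed
    fp = sum(1 for r, p in pairs if not r and p)      # false alarm
    fn = sum(1 for r, p in pairs if r and not p)      # tumor missed
    return {
        "sensitivity": tp / (tp + fn),  # fraction of labeled tumors detected
        "specificity": tn / (tn + fp),  # fraction of labeled healthy images cleared
    }
```

The crucial caveat, as discussed above, is that a "false positive" here may in fact be a physician error rather than a system error.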
This scenario can be observed frequently wherever we try to recognize patterns. Another example is research-oriented knowledge extraction from patient records, where we would attempt to recognize adverse drug effects or develop best practices. A note might indicate a negative impact on the patient's health, yet not be recognized as such by the medical expert serving as referee for the benchmark, due to its complexity, illegibility, or counterintuitive nature.
There are methods to dampen the effect, such as increasing the number and competence of the referees or implementing a round of reconsideration of results (e.g., the physicians can be confronted a second time with images that were recorded as false positives during the automated tumor detection), but those methods are often expensive and time consuming or, in the worst case, simply not available.
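A cheap variant of the multi-referee idea can be sketched as a majority vote, with cases lacking a clear majority routed into the reconsideration round (the function name and the convention of returning `None` for unresolved cases are my own assumptions):

```python
from collections import Counter

def consensus_label(votes):
    """Build a reference label from several referees' independent labels.

    Returns the majority label, or None when no strict majority exists,
    signaling that the case should go back for another review round.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    if n * 2 > len(votes):  # strict majority
        return label
    return None  # disagreement: flag for reconsideration
```

This only dampens, rather than removes, the problem: if all referees share the same blind spot, the majority vote confidently encodes the same error.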
Therefore, we have to keep in mind that when dealing with a highly complicated and often intuition-driven field such as medicine, we must constantly account for possible human error, and that there is always the possibility that we have done our job well enough to outperform the quality of the human decision. Or, to put it less optimistically: sometimes even great results can become a problem.
Today the UConn CSE department hosted a guest lecture by Dr. Vladimir Vapnik, who presented Learning Using Privileged Information (LUPI), an extension of his popular support vector machine (SVM) method. While the underlying math is rather uninviting, the concept itself is stunningly convincing. It is based on the idea that the training phase of a learning problem can be supplemented and improved with additional data from a privileged information space. Intuitively, this is similar to a teacher giving a student subjective explanations on top of the textbook examples.
The surprising point is that this information does not necessarily make any sense in the actual problem space. A provocative example Vapnik used during his talk was an experiment in which poems describing the graphical representation of the digits 8 and 5 were used to improve the pattern recognition of those digits. Although these informal descriptions appear to be nonsense when one looks at the graphically oriented problem, the LUPI extension of SVM is actually able to leverage them to improve the training phase of the recognition task.
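To make the setup concrete: this is emphatically *not* Vapnik's SVM+ formulation, but a toy teacher/student sketch of the same contract — privileged features `x_star` exist only at training time, and a "teacher" judgment made in the privileged space is used to weight the training examples of a "student" that sees only the ordinary features. All names and the nearest-centroid models are my own simplifications:

```python
def centroid(rows):
    # component-wise mean of a list of equal-length vectors
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dist(a, b):
    # Euclidean distance
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def train_student(x, x_star, y):
    """x: ordinary features; x_star: privileged features (training only); y: labels."""
    classes = sorted(set(y))
    # teacher: class centroids in the *privileged* space
    star_cent = {c: centroid([s for s, yi in zip(x_star, y) if yi == c])
                 for c in classes}
    # down-weight examples that look confusing in the privileged space
    weights = []
    for s, yi in zip(x_star, y):
        d_own = dist(s, star_cent[yi])
        d_other = min(dist(s, star_cent[c]) for c in classes if c != yi)
        weights.append(1.0 if d_own < d_other else 0.25)
    # student: weighted centroids in the ordinary feature space only
    cents = {}
    for c in classes:
        pts = [(xi, w) for xi, yi, w in zip(x, y, weights) if yi == c]
        total = sum(w for _, w in pts)
        cents[c] = [sum(xi[j] * w for xi, w in pts) / total
                    for j in range(len(x[0]))]
    return cents

def predict(cents, xi):
    # classification uses ordinary features only; x_star is gone at test time
    return min(cents, key=lambda c: dist(cents[c], xi))
```

The point the sketch tries to preserve is exactly the one from the digit-and-poems example: the privileged space can be of an entirely different nature than the problem space, because it only shapes how the training examples are weighted, and is never consulted at prediction time.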
Details and several other examples related to biomedical informatics and statistics can be found in the related publication. I recommend at least skimming through it.