Maskery, S.M., Hu, H., Hooke, J., Shriver, C.D., Liebman, M.N., A Bayesian Derived Network of Breast Pathology Co-Occurrence
Abstract
In this paper we present the validation and verification of a machine-learning-based Bayesian
network of breast pathology co-occurrence. The present/not present occurrences of 29 common
breast pathologies from 1631 pathology reports were used to build the network. All pathology
reports were developed by a single pathologist. The resulting network has 25 diagnosis nodes
interconnected by 40 arcs. Each arc represents a predicted co-occurrence or null co-occurrence.
Model verification involved assessing the robustness of the original network structure after
random exclusion of 25%, 50%, and 75% of the pathology report dataset. The structure of the
network appears stable as random removal of 75% of the records in the original dataset leaves
81% of the original network intact. Model validation was primarily assessed by review of the
breast pathology literature for each arc in the network. Almost all network-identified co-occurrences
(95%) have been published in the breast pathology literature or were verified by
expert opinion. In conclusion, the Bayesian network of breast pathology co-occurrence
presented here is both robust with respect to incomplete data and validated by consistency with
the breast pathology literature and by expert opinion. Further, the ability to use a specific
pathology observation to predict multiple co-occurring pathologies enables intuitive exploration of
pathology co-occurrence patterns, which may have broader application in
both the clinical breast pathology community and the breast cancer research community.
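The verification step described above (re-learning the structure after random exclusion of records and counting how many original arcs survive) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are synthetic, the labels "A" to "D" are placeholders rather than real pathologies, and `learn_edges` is a simple pairwise lift threshold standing in for Bayesian network structure learning, not the method used to build the network reported here.

```python
import random
from itertools import combinations


def learn_edges(records, min_lift=2.0, min_support=5):
    """Stand-in structure learner (NOT Bayesian network learning).

    Links two findings when they co-occur at least `min_lift` times more
    often than expected under independence, and at least `min_support` times.
    Each record is a set of the findings marked present in one report.
    """
    n = len(records)
    if n == 0:
        return set()
    names = sorted({name for rec in records for name in rec})
    count = {a: sum(1 for rec in records if a in rec) for a in names}
    edges = set()
    for a, b in combinations(names, 2):
        both = sum(1 for rec in records if a in rec and b in rec)
        expected = count[a] * count[b] / n
        if both >= min_support and both >= min_lift * expected:
            edges.add((a, b))
    return edges


def retained_fraction(records, drop_fraction, seed=0):
    """Fraction of the full-data edges still found after randomly
    excluding `drop_fraction` of the records."""
    rng = random.Random(seed)
    keep = round(len(records) * (1 - drop_fraction))
    subset = rng.sample(records, keep)
    full = learn_edges(records)
    if not full:
        return 0.0  # nothing to retain when the full data yields no edges
    return len(full & learn_edges(subset)) / len(full)


# Synthetic, hypothetical records: "A" and "B" always appear together,
# while "C" and "D" occur independently of everything else.
records = []
for i in range(200):
    rec = set()
    if i % 4 == 0:
        rec |= {"A", "B"}
    if i % 3 == 0:
        rec.add("C")
    if i % 5 == 0:
        rec.add("D")
    records.append(rec)

for drop in (0.25, 0.50, 0.75):
    print(f"excluded {drop:.0%}: retained {retained_fraction(records, drop):.0%}")
```

Averaging `retained_fraction` over several seeds would give a steadier estimate than a single random exclusion.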
Corresponding author: Susan Maskery, Windber Research Institute, 620 7th Street, Windber, PA 15963 s.maskery@wriwindber.org
Journal of Biomedical Informatics (2007), doi: 10.1016/j.jbi.2007.12.005