Open Data for Digital Pathology
Digital pathology is growing and evolving rapidly, with new innovations and capabilities appearing weekly. Nowhere is this more evident than on the exhibit floor and in the presentations at Tri-Con, AACR, USCAP, or last month’s ECDP 2019. New technologies, products, and vendors are announced at each of these meetings, all aimed at delivering on the promise of better and faster (whether cheaper remains an open question) diagnosis and research based on sections of normal and diseased human tissue.
Rapid expansion of technologies and products inevitably results in new data types (images, annotations, analytics), all in different data formats. This is not necessarily bad: in a field undergoing rapid innovation, new data types often go hand in hand with new capabilities. However, this dynamic, changing data environment makes it hard to create standards that anticipate and support all new technologies and products. Until digital pathology becomes a completely solved and stable enterprise, data exchange between different applications and environments will be a constant, critical component of success in this domain. How do we enable data exchange in a rapidly evolving, rapidly growing market and domain like digital pathology?
Standards for WSI Data
The DICOM Working Group 26 has spent several years developing the DICOM Supplement 145 specification for WSI data [1]. The specification and the approach of this effort follow the successful development of the DICOM data standard that is heavily used in radiology. First released in 2010, DICOM Supplement 145 provides a specification for image data, metadata and annotations for WSI data. Several years on, academic and commercial entities are working on implementations of this standard (e.g. [2,3]; see https://www.orthanc-server.com).
As has happened in radiology, several implementations are likely to appear from both academic and commercial entities. Although their appearance will be a very important step for the community, the current DICOM WSI representations do not address many of the innovations in WSI data acquisition and analysis that have appeared in the last few years. These include, but aren’t limited to, multiplexed imaging [4–6] and single cell profiling [7]. These new technologies are used primarily in research applications, but their power and utility mean they will likely be used, at least as companion clinical diagnostics, in the near future. In addition, the explosion of machine learning (ML) tools using conventional and deep neural networks is creating models and analytic metadata that should be recorded alongside WSI data. As the data structures generated in WSI become richer and more complex, standardised data formats must evolve at least as rapidly.
Open, Adaptable, Exchangeable Image Data Formats
Since 2002, the OME Consortium has built, released, and supported open file formats for imaging in life sciences and biomedical research. OME is an open consortium and includes both grant-funded academic labs and commercial partners like Glencoe Software. Our philosophy has always been to create specifications through open discussion, then develop reference implementations in open source software, provide real examples of the format, and publish documentation that shows how to use and extend the format to support integration into new applications and imaging modalities. For example, OME and LOCI released the public specification of the OME-TIFF format for 3D fluorescence microscopy in 2005, but the open and adaptable nature of this format meant that it could evolve to support many different imaging modalities, including several that weren’t even known at the time it was originally developed. In 2006, OME expanded the metadata structures in OME-TIFF to support high content screening in the academic and biopharmaceutical communities. In 2011, OME-TIFF was further extended to support fluorescence lifetime and other modalities that require up to seven different dimensions in a single file format.
In 2017, OME, with contributions from several members of the community and Glencoe Software’s Melissa Linkert, began work on a multi-resolution, tiled file format suitable for WSI data. Digital imaging of single tissue sections was evolving rapidly, and we assumed that multiplexed data would soon be in routine use. Moreover, it seemed clear that 3D pathology would soon emerge, where advances in 3D imaging in the life sciences would be applied to clinical specimens [8], so a WSI file format must anticipate support for these datasets. The resulting file specification was discussed and published online. Full implementations, with documentation and examples, were built by OME and Glencoe Software and were released with Bio-Formats 6.0 in early 2019.
This extension to OME-TIFF includes support for multi-resolution, tiled, multi-dimensional data in WSI imaging. This means the same file format can be used for brightfield, fluorescence, 3D and other emerging imaging modalities. In all cases, OME-TIFF’s extensible metadata structures allow metadata about the sample and/or case, slide processing, image acquisition and analytics to be included alongside the binary data (the pixels) in a single file. The release includes open source reference implementations of both readers and writers. With these tools, any other software can integrate Bio-Formats and immediately read and write this new version of OME-TIFF. For example, Pete Bankhead’s QuPath WSI analysis application [9] has integrated Bio-Formats to provide full support for this file format. Manuel Stritt’s Orbit machine learning application has similar functionality. The key point: open source, documented implementations of data formats like OME-TIFF allow rapid adoption and validation of open file formats, which is an important capability for any data transport format.
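To make "multi-resolution, tiled" concrete, here is a minimal sketch of the geometry such a format implies: a list of progressively smaller resolution levels, each cut into fixed-size tiles, so that a viewer only reads the tiles overlapping the region on screen. This is an illustration only, not code from Bio-Formats or the OME-TIFF specification. The function names, the 512-pixel tile size and the downsampling factor of 2 are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    index: int    # 0 is the full-resolution image
    width: int    # level width in pixels
    height: int   # level height in pixels
    tiles_x: int  # number of tile columns
    tiles_y: int  # number of tile rows


def pyramid_levels(width, height, tile=512, scale=2):
    """List levels from full size down to the first one that fits in one tile.

    `tile` and `scale` are illustrative defaults, not values mandated by any format.
    """
    levels = []
    w, h, index = width, height, 0
    while True:
        # Integer ceiling division: partial tiles at the edges still count.
        tiles_x = (w + tile - 1) // tile
        tiles_y = (h + tile - 1) // tile
        levels.append(Level(index, w, h, tiles_x, tiles_y))
        if w <= tile and h <= tile:
            break
        w = max(1, w // scale)
        h = max(1, h // scale)
        index += 1
    return levels


def tiles_for_region(level, x, y, w, h, tile=512):
    """Return (column, row) indices of the tiles overlapping a pixel region."""
    first_col, first_row = x // tile, y // tile
    last_col = min((x + w - 1) // tile, level.tiles_x - 1)
    last_row = min((y + h - 1) // tile, level.tiles_y - 1)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

Under these assumptions, a 4096 x 2048 image yields four levels, and a 100 x 100 pixel region that straddles a tile corner at full resolution touches only four of the 32 tiles at that level, which is what keeps panning and zooming over very large slides responsive.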
Glencoe Software has also adopted this updated OME-TIFF by including Bio-Formats 6.0 in the software it delivers to its customers. This means that Glencoe Software’s customers can either continue to use the proprietary file formats recorded by their WSI scanners or, if desired, convert to OME-TIFF and access the files with Glencoe’s OMERO Plus and PathViewer, or use Bio-Formats to access the data from third-party software. This is true regardless of the type of WSI data they record: brightfield, fluorescence, multiplexed and others are all supported. The flexible annotation system built into OME-TIFF means that complex information related to case studies, slide processing and analytic outputs can all be included in a single format. This kind of flexibility, often referred to as interoperability, is a key advantage of using open, supported reference implementations developed by and with the community.
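As a rough illustration of how image metadata and free-form annotations can travel together in one OME-XML block, the sketch below builds a small fragment with Python's standard library. The element and attribute names (Image, Pixels, Channel, TiffData, StructuredAnnotations, MapAnnotation) follow the OME 2016-06 schema as we understand it, but the fragment is not validated against that schema here. The image name, channel names, annotation namespace and key-value pairs are hypothetical placeholders, and real files would normally be written through Bio-Formats rather than by hand.

```python
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"
ET.register_namespace("", OME_NS)


def q(tag):
    """Qualify a tag name with the OME namespace."""
    return f"{{{OME_NS}}}{tag}"


def build_ome_xml(size_x, size_y, channel_names, key_values):
    """Build an illustrative OME-XML string: one image plus one key-value annotation."""
    ome = ET.Element(q("OME"))

    image = ET.SubElement(ome, q("Image"), ID="Image:0", Name="example-slide")
    pixels = ET.SubElement(
        image, q("Pixels"),
        ID="Pixels:0", DimensionOrder="XYCZT", Type="uint8",
        SizeX=str(size_x), SizeY=str(size_y),
        SizeC=str(len(channel_names)), SizeZ="1", SizeT="1",
    )
    for i, name in enumerate(channel_names):
        ET.SubElement(pixels, q("Channel"),
                      ID=f"Channel:0:{i}", Name=name, SamplesPerPixel="1")
    # In an OME-TIFF, TiffData elements map image planes to TIFF IFDs.
    ET.SubElement(pixels, q("TiffData"))
    # Link the image to the annotation defined further down.
    ET.SubElement(image, q("AnnotationRef"), ID="Annotation:0")

    annotations = ET.SubElement(ome, q("StructuredAnnotations"))
    # The Namespace value is a hypothetical placeholder.
    map_annotation = ET.SubElement(
        annotations, q("MapAnnotation"),
        ID="Annotation:0", Namespace="example.org/slide-processing",
    )
    value = ET.SubElement(map_annotation, q("Value"))
    for key, text in key_values.items():
        ET.SubElement(value, q("M"), K=key).text = text

    return ET.tostring(ome, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical multiplexed slide with three channels and two processing notes.
    print(build_ome_xml(
        4096, 2048,
        ["DAPI", "CD3", "CD8"],
        {"stain": "multiplexed IF", "scanner": "example-scanner"},
    ))
```

The design point is that the pixel description and the key-value annotation live in the same document, so any reader that understands the schema can recover both without a side-car file.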
Future WSI Formats and Implementations
OME and Glencoe Software have substantial experience with digital pathology partners and use cases from academia, biotech, pharma and clinical research. In all these domains, we have seen the need for solutions that, on the one hand, make data as open and accessible as possible and, on the other, are supported by a wide variety of analysis and processing tools. Just like many other rapidly innovating imaging domains, digital pathology requires a combination of defined format specifications, public example files, and open reference software implementations that provide the means for interoperability between innovative imaging tools and datasets. Our work has focussed on TIFF-based implementations because TIFF is an available, broadly supported open specification and implementation for imaging data. We have recently started broadening the range of container formats we support. This opens the possibility of delivering open implementations of DICOM WSI for the community to test, validate and, if useful, adopt. We look forward to discussing and defining such implementations in collaboration with the wider WSI imaging community.
References
1. Singh R, Chubb L, Pantanowitz L, Parwani A. Standardization in digital pathology: Supplement 145 of the DICOM standards. J Pathol Inform. 2011;2: 23. doi:10.4103/2153-3539.80719
2. Jodogne S. The Orthanc Ecosystem for Medical Imaging. J Digit Imaging. 2018;31: 341–352. doi:10.1007/s10278-018-0082-y
3. Herrmann MD, Clunie DA, Fedorov A, Doyle SW, Pieper S, Klepeis V, et al. Implementing the DICOM Standard for Digital Pathology. J Pathol Inform. 2018;9: 37. doi:10.4103/jpi.jpi_42_18
4. Lin J-R, Izar B, Wang S, Yapp C, Mei S, Shah PM, et al. Highly multiplexed immunofluorescence imaging of human tissues and tumors using t-CyCIF and conventional optical microscopes. Elife. 2018;7. doi:10.7554/eLife.31657
5. Goltsev Y, Samusik N, Kennedy-Darling J, Bhate S, Hale M, Vazquez G, et al. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell. 2018;174: 968–981.e15. doi:10.1016/j.cell.2018.07.010
6. Gerdes MJ, Sevinsky CJ, Sood A, Adak S, Bello MO, Bordwell A, et al. Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc Natl Acad Sci U S A. 2013;110: 11982–11987. doi:10.1073/pnas.1300136110
7. Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58: 610–620. doi:10.1016/j.molcel.2015.04.005
8. Glaser AK, Reder NP, Chen Y, McCarty EF, Yin C, Wei L, et al. Light-sheet microscopy for slide-free non-destructive pathology of large clinical specimens. Nat Biomed Eng. 2017;1. doi:10.1038/s41551-017-0084
9. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. 2017;7: 16878. doi:10.1038/s41598-017-17204-5
About the Authors:
Jason Swedlow (1,2), Chris Allan (1), Sebastien Besson (2), Jean-Marie Burel (2), Josh Moore (2)
(1) Glencoe Software, Seattle, WA, USA
(2) Open Microscopy Environment, University of Dundee, Dundee, UK