Criteria for the translation of radiomics into clinically useful tests

ABSTRACT

Computer-extracted tumour characteristics have been incorporated into medical imaging computer-aided diagnosis (CAD) algorithms for decades. With the advent of radiomics, an extension of CAD involving high-throughput computer-extracted quantitative characterization of healthy or pathological structures and processes as captured by medical imaging, interest in such computer-extracted measurements has increased substantially. However, despite the thousands of radiomic studies, the number of settings in which radiomics has been successfully translated into a clinically useful tool or has obtained FDA clearance is comparatively small. This relative dearth might be attributable to factors such as the varying imaging and radiomic feature extraction protocols used from study to study, the numerous potential pitfalls in the analysis of radiomic data, and the lack of studies showing that acting upon a radiomic-based tool leads to a favourable benefit–risk balance for the patient. Several guidelines on specific aspects of radiomic data acquisition and analysis are already available, although a similar roadmap for the overall process of translating radiomics into tools that can be used in clinical care is needed. Herein, we provide 16 criteria for the effective execution of this process in the hopes that they will guide the development of more clinically useful radiomic tests in the future.

KEY POINTS

* Despite tens of thousands of radiomic studies, the number of settings in which radiomics is used to guide clinical decision-making is limited, in part owing to a lack of standardization of the radiomic measurement extraction processes and the lack of evidence demonstrating adequate clinical validity and utility.
* Processes to acquire and process source images and extract radiomic measurements should be established and harmonized.
* A radiomic model should be tested on external data not used for its development or, if no such dataset is available, tested using proper internal validation techniques.
* Model outputs should be shown to guide disease management decisions in a way that leads to a favourable risk–benefit balance for patients.
* Clinical performance should be assessed periodically in its intended clinical setting (task and population) after model lockdown.
* A list of 16 criteria for the optimal development of a radiomic test has been compiled herein and should hopefully guide the implementation of future radiomic analyses.
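The key point on internal validation can be illustrated with a minimal sketch. It is not taken from the article: the toy data, the nearest-centroid classifier, the mean-difference feature ranking and every function name below are assumptions chosen only to keep the example self-contained. What it aims to show is one common reading of "proper" internal validation, namely that every model-building step, including feature selection, is repeated inside each cross-validation fold so that held-out cases never influence the model that is tested on them.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Hypothetical data: feature 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
            row[0] += 10.0 * label  # the only informative feature
            X.append(row)
            y.append(label)
    return X, y


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def select_features(X, y, n_keep):
    """Rank features by absolute difference in class means (training data only)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(statistics.mean(pos) - statistics.mean(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def fit_centroids(X, y, features):
    """Nearest-centroid model: per-class mean of each selected feature."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids, features, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_accuracy(X, y, k=5, n_keep=2, seed=0):
    """Estimate accuracy with k-fold cross-validation of the whole pipeline."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Feature selection and fitting see ONLY the training folds.
        features = select_features(X_train, y_train, n_keep)
        centroids = fit_centroids(X_train, y_train, features)
        correct += sum(predict(centroids, features, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Calling `cross_validated_accuracy(*make_toy_data())` returns the fraction of held-out cases classified correctly. Selecting features once on the full dataset before splitting would leak information from the held-out cases and tend to inflate that estimate. With real patient data, folds would typically also need to be split at the patient level and to contain both outcome classes, which this sketch does not enforce.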




REFERENCES

1. Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: images are more than pictures, they are data. _Radiology_ 278, 563–577 (2016).
2. Giger, M. L. Update on the potential of computer-aided diagnosis for breast cancer. _Fut. Oncol._ 6, 1–4 (2010).
3. Doi, K. Computer-aided diagnosis in medical imaging: historical review, current status, and future potential. _Comput. Med. Imaging Graph._ 31, 198–211 (2007).
4. Lambin, P. et al. Radiomics: extracting more information from medical images using advanced feature analysis. _Eur. J. Cancer_ 48, 441–446 (2012).
5. FDA-NIH Biomarker Working Group. _BEST (Biomarkers, EndpointS, and other Tools) Resource_ (Food and Drug Administration and National Institutes of Health, 2016).
6. FDA. _Artificial Intelligence and Machine Learning (AI/ML)-Enabled Devices_ https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (2022).
7. Fornacon-Wood, I. M. et al. Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform. _Eur. Radiol._ 30, 6241–6250 (2020).
8. Radiomics. _Radiomics Quality Score – RQS 2.0_ https://www.radiomics.world/rqs2 (2022).
9. Zwanenburg, A. et al. The image biomarker standardization initiative: standardized quantitative radiomics for high throughput image-based phenotyping. _Radiology_ 295, 328–338 (2020).
10. Kumar, V. et al. Radiomics: the process and the challenges. _Magn. Reson. Imaging_ 30, 1234–1248 (2012).
11. Fournier, L. et al. Incorporating radiomics into clinical trials: expert consensus endorsed by the European society of radiology on considerations for data-driven compared to biologically driven quantitative biomarkers. _Eur. Radiol._ 31, 6001–6012 (2021).
12. McShane, L. M. et al. Criteria for the use of omics-based predictors in clinical trials: explanation and elaboration. _BMC Med._ 11, 220 (2013).
13. Jiang, Y., Edwards, A. V. & Newstead, G. M. Artificial intelligence applied to breast MRI for improved diagnosis. _Radiology_ 298, 39–46 (2021).
14. Data Science Institute, American College of Radiology. _FDA Cleared AI Algorithms_ https://www.acrdsi.org/DSI-Services/FDA-Cleared-AI-Algorithms (2022).
15. Clark, G. M. Prognostic factors versus predictive factors: examples from a clinical trial of erlotinib. _Mol. Oncol._ 1, 406–412 (2008).
16. Aerts, H. J. W. L. et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. _Nat. Commun._ 5, 4006 (2014).
17. Li, H. et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. _NPJ Breast Cancer_ 2, 16012 (2016).
18. Li, H. et al. MRI radiomics signatures for predicting the risk of breast cancer recurrence as given by research versions of gene assays of MammaPrint, Oncotype DX, and PAM50. _Radiology_ 281, 382–391 (2016).
19. Cha, K. H. et al. Bladder cancer treatment response assessment in CT using radiomics with deep learning. _Sci. Rep._ 7, 8738 (2017).
20. Drukker, K. et al. Most-enhancing tumor volume by MRI radiomics predicts recurrence-free survival “Early On” in neoadjuvant treatment of breast cancer. _Cancer Imaging_ 18, 12 (2018).
21. Huang, E. P., Lin, F. I. & Shankar, L. K. Beyond correlations, sensitivities, and specificities: a roadmap for demonstrating utility of advanced imaging in oncology treatment and clinical trial design. _Acad. Radiol._ 24, 1036–1049 (2017).
22. Subramanian, J. & Simon, R. What should physicians look for in evaluating prognostic gene-expression signatures? _Nat. Rev. Clin. Oncol._ 7, 327–334 (2010).
23. Shafiq-Ul-Hassan, M. et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. _Med. Phys._ 44, 1050–1062 (2017).
24. Berenguer, R. et al. Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. _Radiology_ 288, 407–415 (2018).
25. American College of Radiology. _ACR Appropriateness Criteria_ https://www.acr.org/Clinical-Resources/ACR-Appropriateness-Criteria (2022).
26. Society of Nuclear Medicine and Medical Imaging. _Procedure Standards_ https://www.snmmi.org/ClinicalPractice/content.aspx?ItemNumber=6414 (2022).
27. European Association of Nuclear Medicine. _Guidelines_ https://www.eanm.org/publications/guidelines/ (2022).
28. QIBA Wiki. _Profiles_ http://qibawiki.rsna.org/index.php/Profiles (2022).
29. Fass, L. Imaging and cancer: a review. _Mol. Oncol._ 2, 115–152 (2008).
30. Zhao, B. et al. Exploring intra- and inter-reader variability in unidimensional, bidimensional, and volumetric measurements of solid tumors on CT scans reconstructed at different slice intervals. _Eur. J. Radiol._ 82, 959–968 (2013).
31. O’Connor, J. P. B., Jackson, A., Parker, G. J. M., Roberts, C. & Jayson, G. C. Dynamic contrast-enhanced MRI in clinical trials of anti-vascular therapies. _Nat. Rev. Clin. Oncol._ 9, 167–177 (2012).
32. Tudorica, L. A. et al. QIN: a feasible high spatiotemporal resolution breast DCE-MRI protocol for clinical settings. _Magn. Reson. Imaging_ 30, 1257–1267 (2012).
33. Nardone, V. et al. Delta radiomics: a systematic review. _Radiol. Med._ 126, 1571–1583 (2021).
34. Pinker, K., Riedl, C. & Weber, W. A. Evaluating tumor response with FDG-PET: updates on PERCIST, comparison with EORTC criteria and clues to future development. _Eur. J. Nucl. Med. Mol. Imaging_ 44, 55–66 (2017).
35. Mackin, D. et al. Harmonizing the pixel size in retrospective computed tomography radiomics studies. _PLoS ONE_ 12, e0178524 (2017).
36. Madabhushi, A., Udupa, J. K. & Souza, A. Generalized scale: theory, algorithms, and application to image inhomogeneity correction. _Comput. Vis. Image Underst._ 101, 100–121 (2006).
37. Madabhushi, A. & Udupa, J. K. New methods of MR image intensity standardization via generalized scale. _Med. Phys._ 33, 3426–3434 (2006).
38. Whitney, H. M. et al. Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. _J. Med. Imaging_ 7, 012707 (2020).
39. Duron, L. et al. Gray-level discretization impacts reproducible MRI radiomics texture features. _PLoS ONE_ 14, e0213459 (2019).
40. Larue, R. T. H. M. et al. Influence of gray level discretization on radiomic feature stability for different CT scanners, tube currents, and slice thicknesses: a comprehensive phantom study. _Acta Oncol._ 56, 1544–1553 (2017).
41. Leijenaar, R. T. et al. The effect of SUV discretization in quantitative FDG-PET radiomics: the need for standardized methodology in tumor texture analysis. _Sci. Rep._ 5, 11075 (2015).
42. Willemink, M. J. et al. Preparing medical imaging data for machine learning. _Radiology_ 295, 4–15 (2020).
43. Mali, S. A. et al. Making radiomics more reproducible across scanner and imaging protocol variations: a review of harmonization methods. _J. Per. Med._ 11, 842 (2021).
44. Lin, Y. et al. Deep learning for fully automated tumor segmentation and extraction of magnetic resonance radiomics features in cervical cancer. _Eur. Radiol._ 30, 1297–1305 (2020).
45. Parmar, C., Grossman, P., Bussink, J., Lambin, P. & Aerts, H. J. W. L. Machine learning methods for quantitative radiomic biomarkers. _Sci. Rep._ 5, 13087 (2015).
46. Primakov, S. P. et al. Automated detection and segmentation of non-small cell lung cancer computed tomography images. _Nat. Commun._ 13, 3423 (2022).
47. Gilhuijs, K. G. A., Giger, M. L. & Bick, U. Automated analysis of breast lesions in three dimensions using dynamic magnetic resonance imaging. _Med. Phys._ 25, 1647–1654 (1998).
48. Chen, W., Giger, M. L., Lan, L. & Bick, U. Computerized interpretation of breast MRI: investigation of enhancement-variance dynamics. _Med. Phys._ 31, 1076–1082 (2004).
49. Chen, W., Giger, M. L., Bick, U. & Newstead, G. Automatic identification and classification of characteristic kinetic curves of breast lesions on DCE-MRI. _Med. Phys._ 33, 2878–2887 (2006).
50. Chen, W., Giger, M. L., Li, H., Bick, U. & Newstead, G. Volumetric texture analysis of breast lesions on contrast-enhanced magnetic resonance images. _Magn. Reson. Med._ 58, 562–571 (2007).
51. van Timmeren, J. E. et al. Test-retest data for radiomics feature stability analysis: generalizable or study-specific? _Tomography_ 2, 361–365 (2016).
52. Afshar, P., Mohammadi, A., Plataniotis, K. N., Oikonomou, A. & Benali, H. From hand-crafted to deep learning-based cancer radiomics: challenges and opportunities. _IEEE Signal. Process. Mag._ 36, 132–160 (2019).
53. Sahiner, B. et al. Deep learning in medical imaging and radiation therapy. _Med. Phys._ 46, e1–e36 (2019).
54. Li, Z., Wang, Y., Yu, J., Guo, Y. & Cao, W. Deep learning based radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. _Sci. Rep._ 7, 1–11 (2017).
55. Antropova, N., Huynh, B. Q. & Giger, M. L. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. _Med. Phys._ 44, 5162–5171 (2017).
56. International Organization for Standardization. _Guidance for the Use of Repeatability, Reproducibility, and Trueness Estimates in Measurement Uncertainty Evaluation_ https://www.iso.org/obp/ui/#iso:std:iso:21748:ed-2:v1:en (2020).
57. Drukker, K., Pesce, L. & Giger, M. L. Repeatability in computer-aided diagnosis: application to breast cancer diagnosis on sonography. _Med. Phys._ 37, 2659–2669 (2010).
58. Kessler, L. G. et al. The emerging science of quantitative imaging biomarkers terminology and definitions for scientific studies and regulatory submissions. _Stat. Methods Med. Res._ 24, 9–26 (2015).
59. Raunig, D. L. et al. Quantitative imaging biomarkers: a review of statistical methods for technical performance assessment. _Stat. Methods Med. Res._ 24, 27–67 (2015).
60. Huang, E. P. et al. Multiparametric quantitative imaging in risk prediction: recommendations for data acquisition, technical performance assessment, and model development and validation. _Acad. Radiol._ https://doi.org/10.1016/j.acra.2022.09.018 (2022).
61. McHugh, D. J. et al. Image contrast, image preprocessing, and T1-mapping affect MRI radiomic feature repeatability in patients with colorectal cancer liver metastases. _Cancers_ 13, 240 (2021).
62. Jha, A. K. et al. Repeatability and reproducibility study of radiomic features on a phantom and human cohort. _Sci. Rep._ 11, 2055 (2021).
63. Bissoto, A., Perez, F., Valle, E. & Avila, S. Skin lesion synthesis with generative adversarial networks. _OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis. OR 2.0 First International Workshop, CARE Fifth International Workshop, CLIP Seventh International Workshop, ISIC Third International Workshop_. Springer Lecture Notes in Computer Science (Springer, 2019).
64. Sullivan, D. C. et al. Metrology standards for quantitative imaging biomarkers. _Radiology_ 277, 813–825 (2015).
65. Hackstadt, A. J. & Hess, A. M. Filtering for increased power for microarray data analysis. _BMC Bioinformatics_ 10, 11 (2009).
66. Luo, J. et al. A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. _Pharmacogenomics J._ 10, 278–291 (2010).
67. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical bayes methods. _Biostatistics_ 8, 118–127 (2007).
68. Orlhac, F. et al. A post-reconstruction harmonization method for multicenter radiomic studies in PET. _J. Nucl. Med._ 59, 1321–1328 (2018).
69. Parker, H. S. & Leek, J. T. The practical effect of batch on genomic prediction. _Stat. Appl. Genet. Mol. Biol._ 11, 10 (2012).
70. Robinson, K., Li, H., Lan, L., Schacht, D. & Giger, M. Radiomics robustness assessment and classification evaluation: a two-stage method demonstrated on multivendor FFDM. _Med. Phys._ 46, 2145–2156 (2019).
71. _The Cancer Imaging Archive_ http://cancerimagingarchive.net (2020).
72. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. _J. Digital Imaging_ 26, 1045–1057 (2013).
73. Zhu, Y. et al. Deciphering genomic underpinnings of quantitative MRI-based radiomic phenotypes of invasive breast carcinoma. _Sci. Rep._ 5, 17787 (2015).
74. Riley, R. D. et al. Minimum sample size for developing a multivariable prediction model: part II – binary and time-to-event outcomes. _Stat. Med._ 38, 1276–1296 (2018).
75. Riley, R. D. et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. _Stat. Med._ 40, 4230–4251 (2021).
76. Riley, R. D. et al. Minimum sample size calculations for external validation of a clinical prediction model with a time-to-event outcome. _Stat. Med._ 41, 1280–1295 (2022).
77. Cho, J., Lee, K., Shin, E., Choy, G. & Do, S. How much data is needed to train a medical image deep learning system to achieve necessary high accuracy? Preprint at https://doi.org/10.48550/arXiv.1511.06348 (2015).
78. Whitney, H., Li, H., Ji, Y., Liu, P. & Giger, M. L. Comparison of breast MRI tumor classification using human-engineered radiomics, transfer learning from deep convolutional neural networks, and fusion methods. _Proc. IEEE_ 108, 163–177 (2020).
79. Hastie, T., Tibshirani, R. & Friedman, J. _The Elements of Statistical Learning: Data Mining, Inference and Prediction_ 2nd edn (Springer, 2009).
80. Deist, T. M. et al. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: an empirical comparison of classifiers. _Med. Phys._ 45, 3449–3459 (2018).
81. Haykin, S. _Neural Networks: A Comprehensive Foundation_ (Prentice Hall, 1994).
82. Ben-Dor, A. et al. Tissue classification with gene expression profiles. _J. Comput. Biol._ 7, 559–583 (2000).
83. Dudoit, S., Fridlyand, J. & Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. _J. Am. Stat. Assoc._ 97, 77–87 (2002).
84. Heinze, G., Wallisch, C. & Dunkler, D. Variable selection – a review and recommendations for the practicing statistician. _Biom. J._ 60, 431–449 (2018).
85. Tibshirani, R. Regression shrinkage and selection via the LASSO. _J. R. Stat. Soc. Ser. B_ 58, 267–288 (1996).
86. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. _Radiology_ 143, 29–36 (1982).
87. Harrell, F. E. Jr., Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the yield of medical tests. _J. Am. Med. Assoc._ 247, 2543–2546 (1982).
88. Hosmer, D. W. & Lemeshow, S. Goodness of fit tests for the multiple logistic regression model. _Commun. Stat. Theory Methods_ 9, 1043–1069 (1980).
89. Lemeshow, S. & Hosmer, D. A review of goodness of fit statistics for use in the development of logistic regression model. _Am. J. Epidemiol._ 115, 92–106 (1982).
90. van Calster, B. & Steyerberg, E. W. _Wiley StatsRef: Statistics Reference Online_ (John Wiley and Sons, Ltd., 2018).
91. Bröcker, J. & Smith, L. A. Increasing the reliability of reliability diagrams. _Weather Forecast._ 22, 651–661 (2007).
92. McLachlan, G. J. _Discriminant Analysis and Statistical Pattern Recognition_ (John Wiley and Sons, 2002).
93. Stone, M. Cross-validatory choice and assessment of statistical predictions. _J. R. Stat. Soc. Ser. B_ 36, 111–147 (1974).
94. Breiman, L. Bagging predictors. _Mach. Learn._ 24, 123–140 (1996).
95. Molinaro, A. M., Simon, R. & Pfeffer, R. M. Prediction error estimation: a comparison of resampling methods. _Bioinformatics_ 21, 3301–3307 (2005).
96. Dobbin, K. K. & Simon, R. M. Optimally splitting cases for training and testing high-dimensional classifiers. _BMC Med. Genomics_ 4, 31 (2011).
97. Sachs, M. C. & McShane, L. M. Issues in developing multivariable molecular signatures for guiding clinical care decisions. _J. Biopharm. Stat._ 26, 1098–1110 (2016).
98. Varma, S. & Simon, R. Bias in error estimation when using cross-validation for model selection. _BMC Bioinformatics_ 7, 91 (2006).
99. Salahuddin, Z., Woodruff, H. C., Chatterjee, A. & Lambin, P. Transparency of deep neural networks for medical image analysis: a review of interpretability methods. _Comput. Biol. Med._ 140, 105111 (2022).
100. Hilsenbeck, S. G., Clark, G. M. & McGuire, W. L. Why do so many prognostic factors fail to pan out? _Breast Cancer Res. Treat._ 22, 197–206 (1992).
101. Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. _Med. Decis. Mak._ 26, 565–574 (2006).
102. Wu, G. et al. Preoperative CT-based radiomics combined with intraoperative frozen section is predictive of invasive adenocarcinoma in pulmonary nodules: a multicenter study. _Eur. Radiol._ 30, 2680–2691 (2020).
103. Hayes, D. F. Defining clinical utility of tumor biomarker tests: a clinician’s viewpoint. _J. Clin. Oncol._ 39, 238–249 (2021).
104. Saha, A., Hosseinzadeh, M. & Huisman, H. End-to-end prostate cancer detection in bpMRI via 3D CNNs: effects of attention mechanisms, clinical priori and decoupled false positive reduction. _Med. Image Anal._ 73, 102155 (2021).
105. Hosseinzadeh, M. et al. Deep learning-assisted prostate cancer detection on bi-parametric MRI: minimum training data size requirements and effect of prior knowledge. _Eur. Radiol._ 32, 2224–2234 (2022).
106. Baughan, N. et al. _Sequestration of Imaging Studies in MIDRC: A Multi-institutional Data Commons._ _Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment_, vol. 12035 (SPIE, 2022).
107. Simon, R. M., Paik, S. & Hayes, D. F. Use of archived specimens in evaluation of prognostic and predictive biomarkers. _J. Natl Cancer Inst._ 101, 1446–1452 (2009).
108. Pappalardo, F., Gusso, G., Tshinanu, F. M. & Viceconti, M. In silico clinical trials: concepts and early adoptions. _Brief. Bioinforma._ 20, 1699–1708 (2019).
109. Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Board on Health Care Services, Board on Health Sciences Policy, Institute of Medicine. _Evolution of Translational Omics: Lessons Learned and the Path Forward_ (The National Academies Press, 2012).
110. Altman, D. G., McShane, L. M., Sauerbrei, W. & Taube, S. E. Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration. _PLoS Med._ 9, e1001216 (2012).
111. Equator Network. _Enhancing the Quality and Transparency of Health Research_ (EQUATOR) https://www.equator-network.org/ (2022).

ACKNOWLEDGEMENTS

P.L. acknowledges support for the publication of this work from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement CHAIMELEON No. 952172, EuCanImage No. 952103, IMI-OPTIMA No. 101034347 and ERC advanced grant (ERC-ADG-2015 No. 694812 – Hypoximmuno). P.K. acknowledges support for the publication of this work from NCI grant P50 CA228944.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, MD, USA: Erich P. Huang, Lisa M. McShane & Lalitha K. Shankar
* Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK: James P. B. O’Connor
* Department of Radiology, University of Chicago, Chicago, IL, USA: Maryellen L. Giger
* Department of Precision Medicine, Maastricht University, Maastricht, Netherlands: Philippe Lambin
* Department of Radiology, University of Washington, Seattle, WA, USA: Paul E. Kinahan
* Department of Diagnostic Radiology, University of Maryland, Baltimore, MD, USA: Eliot L. Siegel

CORRESPONDING AUTHOR

Correspondence to Erich P. Huang.

ETHICS DECLARATIONS

COMPETING INTERESTS

M.G. has acted as a scientific adviser of Quantitative Insights (now Qlarity Imaging), is the contact Principal Investigator for MIDRC (funded by NIBIB COVID-19 Contract 75N92020D00021), receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverain Medical, Mitsubishi and Toshiba, holds stocks in R2/Hologic, is a shareholder in Qview, and is a co-founder of and equity holder in Quantitative Insights (now Qlarity Imaging). P.L. is a co-founder, minority shareholder and member of the advisory board of Oncoradiomics, and is listed as a co-inventor on several licensed patents in radiomics. E.P.H., J.P.B.O.-C., L.M.M., P.E.K., E.L.S. and L.K.S. declare no competing interests.

PEER REVIEW

PEER REVIEW INFORMATION

_Nature Reviews Clinical Oncology_ thanks K. Bera, J.-E. Bibault, J. Tian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

ADDITIONAL INFORMATION

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


GLOSSARY

* Biomarker: A characteristic indicating non-pathological or pathological biological processes and/or an increased likelihood of a response to an exposure or intervention (ref. 5).
* Clinical utility: The degree to which acting upon the results of the radiomic test leads to a favourable benefit–risk balance for the patient.
* Clinical validity: The adequacy of the clinical performance of the radiomic test for its intended purpose.
* Deep learning: A class of machine learning based on neural networks.
* Model: A computational algorithm applied to extracted image features or voxel-level image data themselves.
* Model outputs: The result of a computational algorithm applied to the extracted image features or voxel-level data themselves; a quantity to be used in guiding clinical management.
* Model validation: Establishment of the ability of a model to predict an outcome of interest when applied to new data.
* Neural network: A type of computational algorithm based on the operation of biological neural systems in animals that feeds the input (in this context, feature measurements or voxel-level data) through a series of nodes that perform mathematical operations on the outputs of preceding nodes to produce an output. In a convolutional neural network, these mathematical operations involve applying convolutional kernels to the outputs of preceding nodes.
* Normalization: A process for adjusting the voxel intensity values of an image for differences resulting from variability in image acquisition and processing parameters.
* Omics: The study of related sets of biological molecules in a comprehensive fashion with examples including genomics, transcriptomics, proteomics, metabolomics and epigenomics (ref. 109). Radiomics naturally extends this definition to include quantification of radiological imaging features for the purposes of characterization and measurement of structure, function and interaction between biological molecules in a comprehensive and high-throughput manner.
* Overfitting: The process of fitting an overly complex model to noise in the data, thus producing a model that is only poorly predictive when applied to completely new data.
* Performance metric: A quantity indicating the ability of a model to predict an outcome of interest.
* Phantoms: Objects that are imaged to measure the technical performance of an imaging device.
* Radiomic features: Quantities computed from voxel-level image data.
* Radiomic test: A system comprising materials, methods and procedures for image acquisition, processing and feature extraction, and methods or criteria for interpretation of the image data for use in guiding clinical management.
* Technical artefacts: The effects of factors, such as imaging centre, device, operator or device-calibration settings, on the distribution of the feature measurements.
* Technical validity: The quality of the feature measurements in terms of their accuracy in assaying an underlying characteristic of interest or their variability when the feature extraction process is applied repeatedly to the same patient.
* Test lockdown: Full specification of all image acquisition, processing and feature extraction procedures, all aspects of the underlying model, and interpretations of the output.
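To make the glossary entries for normalization, radiomic features and technical artefacts more concrete, the sketch below computes three first-order features from a flat list of voxel intensities, before and after a simple z-score normalization. Everything in it is an illustrative assumption rather than the article's method: z-scoring is only one of many possible normalization schemes, the four-bin histogram is arbitrary, and the function names and toy "scans" are hypothetical.

```python
import math
import statistics


def zscore_normalize(voxels):
    """Rescale intensities to zero mean and unit standard deviation."""
    mean = statistics.fmean(voxels)
    sd = statistics.pstdev(voxels)
    if sd == 0:
        raise ValueError("constant region: z-score normalization is undefined")
    return [(v - mean) / sd for v in voxels]


def first_order_features(voxels, n_bins=4):
    """Compute a few first-order features from a flat list of voxel values."""
    lo, hi = min(voxels), max(voxels)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in voxels:
        # Clamp so that the maximum value falls in the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return {
        "mean": statistics.fmean(voxels),
        "variance": statistics.pvariance(voxels),
        "entropy": entropy,
    }


# Hypothetical region of interest imaged twice; the second "scanner" applies
# a different gain and offset, which acts as a technical artefact.
scan_a = [10.0, 20.0, 30.0, 40.0]
scan_b = [2.0 * v + 100.0 for v in scan_a]

raw_a = first_order_features(scan_a)
raw_b = first_order_features(scan_b)
norm_a = first_order_features(zscore_normalize(scan_a))
norm_b = first_order_features(zscore_normalize(scan_b))
```

In this toy case the raw mean and variance differ between the two scans even though the underlying region is the same, whereas the normalized features agree to within floating-point error. Real acquisition differences are rarely a pure gain and offset, so a linear rescaling of this kind should be read as a simplified illustration of why normalization is specified as part of a locked-down test, not as a sufficient harmonization method.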


ABOUT THIS ARTICLE

CITE THIS ARTICLE

Huang, E.P., O’Connor, J.P.B., McShane, L.M. _et al._ Criteria for the translation of radiomics into clinically useful tests. _Nat Rev Clin Oncol_ 20, 69–82 (2023). https://doi.org/10.1038/s41571-022-00707-0

* Accepted: 02 November 2022
* Published: 28 November 2022
* Issue Date: February 2023
* DOI: https://doi.org/10.1038/s41571-022-00707-0