General-purpose foundation models for increased autonomy in robot-assisted surgery


ABSTRACT

The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise when these models are trained on large collections of diverse and task-agnostic video demonstrations. Such models have shown impressive generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for several reasons: there is a lack of large-scale open-source data on which to train models; it is challenging to model the soft-body deformations that these robots manipulate during surgery because simulation cannot match the physical and visual complexity of biological tissue; and surgical robots risk harming patients when tested in clinical trials, and therefore require more extensive safety measures. This Perspective aims to provide a path towards increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision–language–action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models, and we provide four guiding actions towards increased autonomy in robot-assisted surgery.
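To make the notion of a vision–language–action model concrete, the sketch below shows a minimal RT-style policy in PyTorch: endoscope image patches and instruction tokens are embedded into a shared sequence, fused by a transformer encoder, and decoded into discretized action bins per degree of freedom. This is a minimal sketch under assumed choices; the module names, dimensions, tokenization and the seven-dimensional binned action space are illustrative assumptions, not the specific architecture discussed in this Perspective.

```python
# Minimal vision-language-action (VLA) policy sketch in PyTorch.
# Illustrative only: all dimensions and the discretized 7-DOF action
# space are assumptions, not the architecture proposed in the article.
import torch
import torch.nn as nn


class MiniVLAPolicy(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_heads=8,
                 n_layers=4, patch=16, img_size=224, instr_len=64,
                 action_dims=7, action_bins=256):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Vision: non-overlapping patches -> linear embedding (ViT-style).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        # Language: embedding of instruction token IDs.
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        # Learned positions for the fused image+text token sequence.
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + instr_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Action head: one categorical distribution per action dimension,
        # each over `action_bins` discretized values (RT-style binning).
        self.action_head = nn.Linear(d_model, action_dims * action_bins)
        self.action_dims, self.action_bins = action_dims, action_bins

    def forward(self, image, instruction_ids):
        # image: (B, 3, 224, 224); instruction_ids: (B, 64) token IDs.
        img_tokens = self.patch_embed(image).flatten(2).transpose(1, 2)
        txt_tokens = self.tok_embed(instruction_ids)
        tokens = torch.cat([img_tokens, txt_tokens], dim=1) + self.pos_embed
        fused = self.encoder(tokens)
        pooled = fused.mean(dim=1)  # pool the fused multimodal sequence
        logits = self.action_head(pooled)
        return logits.view(-1, self.action_dims, self.action_bins)


policy = MiniVLAPolicy()
image = torch.randn(1, 3, 224, 224)             # dummy endoscope frame
instruction = torch.randint(0, 32000, (1, 64))  # e.g. "grasp the needle"
action_logits = policy(image, instruction)      # (1, 7, 256)
action = action_logits.argmax(dim=-1)           # one bin index per DOF
```

The greedy argmax decoding is only for illustration: a deployed system would decode actions at control rate, and a surgical setting would additionally require the safety and uncertainty-quantification mechanisms that motivate this Perspective.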


ACKNOWLEDGEMENTS

This material is based on work supported by the National Science Foundation under grant numbers DGE 2139757, NSF/FRR 2144348, NIH R56EB033807 and ARPA-H AY1AX000023.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA: Samuel Schmidgall
* Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA: Ji Woong Kim & Axel Krieger
* Kahlert School of Computing, University of Utah, Salt Lake City, UT, USA: Alan Kuntz
* Department of Urology, Johns Hopkins Medical Institute, Baltimore, MD, USA: Ahmed Ezzat Ghazi

CORRESPONDING AUTHOR

Correspondence to Samuel Schmidgall.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

PEER REVIEW INFORMATION

_Nature Machine Intelligence_ thanks Francesco Stella and Zhen Li for their contribution to the peer review of this work.

PUBLISHER'S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

CITE THIS ARTICLE

Schmidgall, S., Kim, J. W., Kuntz, A. et al. General-purpose foundation models for increased autonomy in robot-assisted surgery. _Nat. Mach. Intell._ 6, 1275–1283 (2024). https://doi.org/10.1038/s42256-024-00917-4

* Received: 22 December 2023
* Accepted: 24 September 2024
* Published: 1 November 2024
* Issue date: November 2024