ABSTRACT
The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise when such models are trained on large, diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for several reasons: there is a lack of large-scale open-source data on which to train models; it is
challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue; and
surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This Perspective aims to provide a path towards increasing robot autonomy in
robot-assisted surgery through the development of a multi-modal, multi-task, vision–language–action model for surgical robots. Ultimately, we argue that surgical robots are uniquely
positioned to benefit from general-purpose models and provide four guiding actions towards increased autonomy in robot-assisted surgery.
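To make the input–output contract of such a model concrete, below is a minimal sketch of a vision–language–action policy in PyTorch: an endoscope frame and a tokenized instruction go in, a continuous robot action comes out. Every name and dimension here (SurgicalVLAPolicy, ACTION_DIM, the toy vocabulary, the small transformer trunk) is an illustrative assumption, not the architecture advocated in this Perspective; a real system would use pretrained foundation-model backbones and train on large demonstration corpora.

```python
# Illustrative sketch only: a minimal vision-language-action (VLA) policy
# interface for a surgical robot. All names and sizes are hypothetical.
import torch
import torch.nn as nn

ACTION_DIM = 7  # assumed action space: 6-DoF end-effector delta + jaw angle

class SurgicalVLAPolicy(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        # Vision encoder: ViT-style 16x16 patch embedding of a 224x224 frame.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Language encoder: embeddings for a toy instruction vocabulary.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        # Shared transformer trunk over concatenated image and text tokens.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # Action head: regress a continuous action from pooled features.
        self.action_head = nn.Linear(d_model, ACTION_DIM)

    def forward(self, image, instruction_ids):
        # image: (B, 3, 224, 224); instruction_ids: (B, T) token indices.
        img_tokens = self.patch_embed(image).flatten(2).transpose(1, 2)
        txt_tokens = self.token_embed(instruction_ids)
        tokens = torch.cat([img_tokens, txt_tokens], dim=1)
        features = self.trunk(tokens).mean(dim=1)  # mean-pool all tokens
        return self.action_head(features)

policy = SurgicalVLAPolicy()
frame = torch.randn(1, 3, 224, 224)        # stand-in endoscope image
command = torch.randint(0, 1000, (1, 8))   # stand-in tokenized instruction
action = policy(frame, command)            # -> (1, ACTION_DIM)
```

The multi-modal, multi-task character argued for above lives in the data and the shared trunk: one set of weights consumes both modalities and is trained across many procedures, rather than one task-specific policy per surgical subtask.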
ACKNOWLEDGEMENTS
This material is based on work supported by the National
Science Foundation under grant numbers DGE 2139757, NSF/FRR 2144348, NIH R56EB033807 and ARPA-H AY1AX000023.

AUTHOR INFORMATION
AUTHORS AND AFFILIATIONS
* Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA: Samuel Schmidgall
* Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA: Ji Woong Kim & Axel Krieger
* Kahlert School of Computing, University of Utah, Salt Lake City, UT, USA: Alan Kuntz
* Department of Urology, Johns Hopkins Medical Institute, Baltimore, MD, USA: Ahmed Ezzat Ghazi

CORRESPONDING AUTHOR
Correspondence to Samuel Schmidgall.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no competing interests.

PEER REVIEW INFORMATION
_Nature Machine Intelligence_ thanks Francesco Stella and Zhen Li for their contribution to the peer review of this work.

PUBLISHER'S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

CITE THIS ARTICLE
Schmidgall, S., Kim, J. W., Kuntz, A. _et al._ General-purpose foundation models for increased autonomy in robot-assisted surgery. _Nat. Mach. Intell._ 6, 1275–1283 (2024). https://doi.org/10.1038/s42256-024-00917-4
Received: 22 December 2023; Accepted: 24 September 2024; Published: 01 November 2024; Issue Date: November 2024