UCL Discovery Stage

Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding

Seenivasan, L; Islam, M; Xu, M; Lim, CM; Ren, H; (2023) Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding. International Journal of Computer Assisted Radiology and Surgery, 18, pp. 921-928. 10.1007/s11548-022-02800-2. Green open access

2211.15327.pdf - Accepted Version (645kB)

Abstract

PURPOSE: Surgical scene understanding, with tool-tissue interaction recognition and automatic report generation, can play an important role in intra-operative guidance, decision-making, and postoperative analysis in robotic surgery. However, domain shifts between different surgeries, arising from inter- and intra-patient variation and the appearance of novel instruments, degrade model prediction performance. Moreover, these tasks require output from multiple models, which can be computationally expensive and affect real-time performance.

METHODOLOGY: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift. The model consists of a shared feature extractor, a mesh-transformer branch for captioning, and a graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning to tackle intensity shift and the appearance of novel classes in the target domain. We design Laplacian of Gaussian-based curriculum learning into both the shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally.

RESULTS: The proposed MTL model, trained with the task-aware optimization and fine-tuning techniques, reported a balanced performance on both tasks in the target domain (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) and performed on par with single-task models in domain adaptation.

CONCLUSION: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.
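The abstract does not detail how the Laplacian of Gaussian (LoG) is used to drive curriculum learning, but the general idea can be sketched. Below is a minimal, illustrative example (not the paper's implementation) that scores each image by the magnitude of its LoG response and orders training samples by that score; the assumption that higher response (more visual structure) means "easier" is purely for illustration, and the names `log_kernel`, `log_difficulty`, and `curriculum_order` are hypothetical.

```python
import numpy as np

def log_kernel(size: int = 9, sigma: float = 1.4) -> np.ndarray:
    """Discrete Laplacian-of-Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = -(1.0 / (np.pi * sigma**4)) * (1 - r2 / (2 * sigma**2)) \
        * np.exp(-r2 / (2 * sigma**2))
    # Zero-mean so a perfectly flat region yields zero response.
    return k - k.mean()

def log_difficulty(image: np.ndarray, size: int = 9, sigma: float = 1.4) -> float:
    """Mean absolute LoG response: a rough proxy for edge/detail density."""
    k = log_kernel(size, sigma)
    h, w = image.shape
    out = np.zeros((h - size + 1, w - size + 1))
    for i in range(out.shape[0]):          # naive valid-mode convolution
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + size, j:j + size] * k)
    return float(np.abs(out).mean())

def curriculum_order(images) -> list:
    """Indices sorted from high LoG response to low (assumed easy-to-hard)."""
    scores = [log_difficulty(img) for img in images]
    return sorted(range(len(images)), key=lambda i: -scores[i])
```

For example, a flat grey frame scores (near) zero while a high-contrast checkerboard scores high, so the checkerboard would be scheduled first under this assumed easy-to-hard ordering. In practice one would replace the naive convolution loop with `scipy.ndimage.gaussian_laplace` and tune `sigma` to the scale of the surgical instruments.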

Type: Article
Title: Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding
Location: Germany
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11548-022-02800-2
Publisher version: https://doi.org/10.1007/s11548-022-02800-2
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: Curriculum learning, Domain generalization, Scene graph, Surgical scene understanding
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10164003
