Seenivasan, L; Islam, M; Xu, M; Lim, CM; Ren, H (2023) Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding. International Journal of Computer Assisted Radiology and Surgery, 18, pp. 921-928. doi:10.1007/s11548-022-02800-2.
Text: 2211.15327.pdf - Accepted Version (645kB)
Abstract
PURPOSE: Surgical scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and post-operative analysis in robotic surgery. However, domain shifts between different surgeries, arising from inter- and intra-patient variation and the appearance of novel instruments, degrade model prediction performance. Moreover, these tasks require output from multiple models, which can be computationally expensive and affect real-time performance. METHODOLOGY: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with the domain shift problem. The model consists of a shared feature extractor, a mesh-transformer branch for captioning and a graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning to tackle intensity shift and novel class appearance in the target domain. We design Laplacian-of-Gaussian-based curriculum learning into both the shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. RESULTS: The proposed MTL model trained using task-aware optimization and fine-tuning techniques reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on par with single-task models in domain adaptation. CONCLUSION: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.
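The task-aware asynchronous MTL optimization described in the abstract can be illustrated with a minimal toy sketch (all names, the scalar "model" and the quadratic losses are illustrative assumptions, not the authors' implementation): two task heads share a parameter, and the shared weight is fine-tuned asynchronously, with only the currently scheduled task's gradient updating it on each step.

```python
# Toy sketch of task-aware asynchronous multi-task optimization.
# A shared parameter w_shared feeds two task heads (captioning, interaction).
# On each step the schedule picks one task; that task's head is updated and
# the shared weight is fine-tuned by that task's gradient alone.

def task_loss(w_shared, w_head, target):
    """Simple quadratic loss standing in for a task-specific objective."""
    pred = w_shared * w_head
    return (pred - target) ** 2

def grad(f, x, eps=1e-6):
    """Numerical derivative of f at x via central finite differences."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def train(steps=2000, lr=0.01):
    w_shared = 1.0
    heads = {"caption": 1.0, "interaction": 1.0}
    targets = {"caption": 2.0, "interaction": 3.0}
    tasks = list(heads)
    for step in range(steps):
        task = tasks[step % len(tasks)]  # task-aware schedule: alternate tasks
        # Update the active task's head.
        f_head = lambda h: task_loss(w_shared, h, targets[task])
        heads[task] -= lr * grad(f_head, heads[task])
        # Asynchronously fine-tune the shared weight with this task's gradient only.
        f_shared = lambda w: task_loss(w, heads[task], targets[task])
        w_shared -= lr * grad(f_shared, w_shared)
    return w_shared, heads, targets

w_shared, heads, targets = train()
```

After training, both tasks fit their targets (w_shared * heads["caption"] close to 2.0 and w_shared * heads["interaction"] close to 3.0), showing how the alternating schedule lets each task converge without one task's gradient overwriting the other's shared-weight update within a step.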
Type: | Article |
---|---|
Title: | Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding |
Location: | Germany |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1007/s11548-022-02800-2 |
Publisher version: | https://doi.org/10.1007/s11548-022-02800-2 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
Keywords: | Curriculum learning, Domain generalization, Scene graph, Surgical scene understanding |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10164003 |