Sayed, Mohamed;
(2023)
Getting the Most Out of Casually Captured Video.
Doctoral thesis (Ph.D), UCL (University College London).
Text: FinalThesis.pdf - Accepted Version (144MB)
Abstract
When capturing images and video with a camera, there are many ways the capture can be ruined. The camera may be in the wrong place, the images may be blurry, or the subjects of interest may be out of frame. Not only do these errors result in footage of low aesthetic quality, but computers may not understand the world when looking at it through subpar glasses. While the camera operator shares responsibility for these errors, they are often involuntary. In this thesis, we aim to make the most of casual capture. Specifically, we explore the impact of relative camera motion on three computer vision tasks: object detection, active tracking, and reconstruction. We address them through a combination of proposed analysis, awareness, and control of relative camera pose. First, we tackle the problem of maintaining object detection accuracy under egomotion-induced blur. We explore the space of motion blur kernels brought on by different camera motions, and create specialized networks for them. We provide new insight into the effect of spatial label modifications under blur augmentation, and provide our own state-of-the-art solution for the problem. We then create the LookOut system, which actively orients a gimbaled camera to frame actors for cinematographic filmmaking. In LookOut, we develop robust tracking that is aware of occlusions brought on by the relative motion of actors and the camera, followed by innovations in aesthetically pleasing camera pose control, given noisy sensing and user direction. Finally, casual capture lacks the diverse views that accurate 3D reconstruction requires. To address this, we teach a depth estimation model to selectively attend to the most informative views of those available in casual video. Our insight is that relative-pose-informed visual feature matching allows for better depth estimation in plane-sweep-stereo cost volumes. We combine this with improved cost volume regularization to build a new state-of-the-art model, SimpleRecon, for accurate depth estimation and mesh reconstruction.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Getting the Most Out of Casually Captured Video |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
Keywords: | Computer Vision, Machine Learning, Robust Vision, Deep Learning, Interactive System |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10183235 |