Parsonson, Christopher W. F.;
(2023)
Computer Network Optimisation with Artificial Intelligence and Optics.
Doctoral thesis (Ph.D), UCL (University College London).
Text: cwfp_phd_thesis.pdf - Accepted Version (20MB)
Abstract
The last decade has seen a proliferation in data-intensive compute applications such as artificial intelligence (AI), genome sequencing, and the internet-of-things. The ever-growing throughput demand of these big-data jobs has coincided with a slowdown in the development of more powerful computer chips. Consequently, there has been a shift away from local computation with general-purpose CPUs towards remote pooling of specialised high-bandwidth processors in cloud data centres (DCs) and high-performance compute (HPC) clusters. Such computation relies on a computer network to facilitate data querying and parallel processing. The traditional Moore's Law approach of evaluating compute power and cost purely in terms of individual end points is therefore no longer appropriate. Instead, compute must now be thought of as a system of interconnected resources which can be orchestrated to perform a task. However, there has been a lack of development in next-generation computer networks, leading to the performance bottleneck of these systems moving away from the end point processors themselves and into the network connecting them.

Optical networking is a technology which can offer orders-of-magnitude improvement in computer network performance. For optical networks to be widely used in DCs and HPCs, several obstacles related to physical optical device characteristics and resource management must be overcome. In this thesis, we develop and evaluate novel AI approaches for addressing these challenges.

The first part of the thesis looks at optimising the physical plane's devices in an optical computer network. Concretely, three gradient-free AI signal control approaches (ant colony optimisation, a genetic algorithm, and particle swarm optimisation) are proposed to enable high-bandwidth, low-power optical switching technologies to operate on the sub-nanosecond timescales required to realise an optical circuit switched data centre network.
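To give a flavour of the gradient-free family these signal controllers belong to, the sketch below is a minimal particle swarm optimisation loop. The toy quadratic objective, dimensionality, and hyperparameters are illustrative placeholders, not the thesis's actual optical signal-control setup.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def pso_minimise(objective, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimise `objective` over `dim` dimensions with a basic particle swarm."""
    lo, hi = bounds
    # Initialise particle positions at random; velocities at zero.
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # each particle's best position
    pbest_val = [objective(p) for p in pos]
    gbest = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]
    gbest_val = objective(gbest)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
    return gbest, gbest_val

# Toy stand-in for a driving-signal objective: distance from a target setting of 0.
best, best_val = pso_minimise(lambda x: sum(v * v for v in x), dim=3)
```

Because the update uses only objective evaluations, never gradients, the same loop applies to non-differentiable, hardware-in-the-loop objectives such as measured optical switch response.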
The second part of the thesis considers the problem of optimising the orchestration plane's resource management methods used to control optical computer networks. A novel algorithm, retro branching, is proposed to improve the solve-time performance of the canonical branch-and-bound exact solver using a graph neural network (GNN) trained with reinforcement learning (RL). State-of-the-art RL-for-branching results are achieved, opening the possibility for branch-and-bound to be applied to large NP-hard discrete optimisation problems such as those found in computer network resource management. We also propose another algorithm, PAC-ML (partitioning for asynchronous computing with machine learning), which trains a GNN with RL to automatically decide how much to distribute deep learning jobs in an optical HPC architecture in order to meet user-defined run time requirements, minimise the blocking rate, and maximise system throughput under dynamic scenarios; it is the first of its kind to consider such a problem setting.

So far we have considered optimising the devices in the physical plane and the resource managers in the orchestration plane of the computer network. Both areas have received research attention in prior works. However, what has received little consideration is the underlying test bed in which physical and orchestration plane research and optimisation is typically conducted. Real DC and HPC environments are generally unavailable for research due to their proprietary nature and high deployment cost. Consequently, researchers rely on simulated computer networks during novel system development. The fidelity, reproducibility, and flexibility of these simulations are therefore at least as important as the development and optimisation of the physical and orchestration systems for which they are used.
Poor simulations will lead to the misguided development of network systems which do not perform as expected when deployed in real production environments. With this motivation, the third part of this thesis considers how to design and optimise the simulator used for computer network system research and development. A novel open-source traffic generation framework and library, TrafPy, is presented, along with a subsequent update to its generation algorithm that makes it scalable to computer networks with thousands of nodes.
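To illustrate the kind of flow-level traffic trace such a framework produces, the sketch below samples synthetic flows between end points. This is not TrafPy's actual API; the heavy-tailed log-normal size distribution, Poisson arrival process, and all parameters are illustrative assumptions commonly used for DC traffic.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_flows(n_flows, n_nodes, mean_rate=1000.0):
    """Sample a synthetic flow-level traffic trace for an n_nodes network.

    Flow sizes: log-normal (heavy-tailed, as in many measured DC workloads).
    Arrivals: Poisson process with `mean_rate` flows per second.
    All distribution choices here are placeholders, not TrafPy's defaults.
    """
    sizes = rng.lognormal(mean=10.0, sigma=2.0, size=n_flows)        # bytes
    interarrivals = rng.exponential(scale=1.0 / mean_rate, size=n_flows)
    arrival_times = np.cumsum(interarrivals)                         # seconds
    # Pick endpoints uniformly; the offset in [1, n_nodes-1] guarantees src != dst.
    src = rng.integers(0, n_nodes, size=n_flows)
    dst = (src + rng.integers(1, n_nodes, size=n_flows)) % n_nodes
    return [
        {"id": i, "src": int(src[i]), "dst": int(dst[i]),
         "size": float(sizes[i]), "t_arrival": float(arrival_times[i])}
        for i in range(n_flows)
    ]

flows = generate_flows(n_flows=10_000, n_nodes=64)
```

Replaying the same seeded trace against different network designs is what makes such generators useful for reproducible, like-for-like system comparisons.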
| Type | Thesis (Doctoral) |
|---|---|
| Qualification | Ph.D |
| Title | Computer Network Optimisation with Artificial Intelligence and Optics |
| Open access status | An open access version is available from UCL Discovery |
| Language | English |
| Additional information | Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author's request. |
| UCL classification | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
| URI | https://discovery-pp.ucl.ac.uk/id/eprint/10178675 |