UCL Discovery Stage

Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems

Huang, Y; Zhang, Z; Che, J; Yang, Z; Yang, Q; Wong, KK (2023) Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems. Science China Information Sciences, 66 (6), Article 162304. 10.1007/s11432-022-3542-6. Green open access

SCIS_Self_Attention_Reinforcement_learning_Framework_for_Beam__Combining_in_Millimeter_Wave_MIMO_System.pdf - Accepted Version

Abstract

Machine learning (ML) has been empowering all aspects of wireless communication system design. Among ML methods, reinforcement learning (RL)-based approaches have attracted considerable research attention because they interact with the environment directly and learn efficiently from the collected experience. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme does not require perfect channel state information (CSI) or precise user location information, both of which are generally difficult to obtain in practice, and it addresses the crucial challenge of computational complexity incurred by the very large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to adaptively learn the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, a transformer is incorporated into the DDPG to obtain global information about the input elements and capture the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify the superiority of the proposed self-attention DDPG over conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
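The objective named in the abstract, a sum-rate that depends on each user's serving beam set and combining weights, can be illustrated with a small sketch. This is not the paper's system model or its DDPG agent. It assumes a uniform linear array with a DFT codebook (the paper considers 3D beams), a generic SINR-based rate formula, and hypothetical helper names (`dft_beam`, `combine_beams`, `sum_rate`). It only shows the quantity that a learned policy would be rewarded for increasing.

```python
import cmath
import math

def dft_beam(n_antennas, index):
    """Unit-norm DFT beam for a uniform linear array (illustrative codebook, an assumption)."""
    return [cmath.exp(-2j * math.pi * index * n / n_antennas) / math.sqrt(n_antennas)
            for n in range(n_antennas)]

def combine_beams(codebook, beam_ids, weights):
    """Weighted sum of the selected codebook beams, normalised to unit power."""
    n = len(codebook[0])
    f = [sum(w * codebook[b][i] for b, w in zip(beam_ids, weights)) for i in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in f))
    return [x / norm for x in f]

def inner(h, f):
    """Hermitian inner product h^H f."""
    return sum(hc.conjugate() * fc for hc, fc in zip(h, f))

def sum_rate(channels, beamformers, noise_power):
    """Sum over users of log2(1 + SINR), treating other users' beams as interference."""
    total = 0.0
    for k, h in enumerate(channels):
        signal = abs(inner(h, beamformers[k])) ** 2
        interference = sum(abs(inner(h, f)) ** 2
                           for j, f in enumerate(beamformers) if j != k)
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total

# Toy scenario (all values are made up for illustration).
N = 8
codebook = [dft_beam(N, i) for i in range(N)]

# User 0 has two paths aligned with beams 0 and 1; user 1 has one path on beam 4.
h0 = [codebook[0][i] + 0.5 * codebook[1][i] for i in range(N)]
h1 = list(codebook[4])
channels = [h0, h1]

# Multi-beam combining for user 0 versus serving it with a single beam.
multi = [combine_beams(codebook, [0, 1], [1.0, 0.5]),
         combine_beams(codebook, [4], [1.0])]
single = [combine_beams(codebook, [0], [1.0]),
          combine_beams(codebook, [4], [1.0])]

rate_multi = sum_rate(channels, multi, noise_power=0.1)
rate_single = sum_rate(channels, single, noise_power=0.1)
print(rate_multi, rate_single)
```

In this toy case the two-beam combination collects power from both of user 0's paths, so it yields a higher sum-rate than the single-beam choice. In an RL formulation along the lines the abstract describes, the beam indices and weights would be the action and a rate of this kind would serve as the reward.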

Type: Article
Title: Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11432-022-3542-6
Publisher version: https://doi.org/10.1007/s11432-022-3542-6
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: reinforcement learning (RL), deep deterministic policy gradient (DDPG), self-attention, pre-coding/combining, millimeter-wave (mmWave), multi-input multi-output (MIMO)
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10171234
Downloads since deposit: 55