Improving Training of Deep Neural Network Sequence Models

Liza, Farhana Ferdousi (2019) Improving Training of Deep Neural Network Sequence Models. Doctor of Philosophy (PhD) thesis, University of Kent. (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:81637)

PDF
Language: English

Restricted to Repository staff only until August 2022.

Abstract

Sequence models, and language models in particular, are fundamental building blocks of downstream applications including speech recognition, speech synthesis, information retrieval, machine translation, and question answering systems. Neural network language models generalise better (i.e. they cope better with the data sparsity problem) than traditional N-gram models. However, neural network language models have several fundamental problems: training them is computationally inefficient, and analysing the trained models is difficult. This thesis presents techniques to reduce the computational complexity, together with an extensive analysis of the learned models.

To reduce the computational complexity, we have focused on the main computational bottleneck of neural training, which is the softmax operation. Among the different softmax approximation techniques, Noise Contrastive Estimation (NCE) is seen as a method that often does not work well with deep neural models for language modelling. We carried out a thorough investigation to find an appropriate and novel mechanism for integrating NCE with deep neural networks. We have also explained why the proposed hyperparameter settings could have an impact on the integration.
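
For readers unfamiliar with why the softmax is a bottleneck, the following is a minimal illustrative sketch, not the thesis's implementation. It contrasts a full softmax loss, which scores every vocabulary word, with the textbook NCE objective, which scores only the target word and k sampled noise words. The function names, the uniform noise distribution, the vocabulary size, and the choice of fixing the normalisation constant to 1 are all assumptions made for this example.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def full_softmax_loss(hidden, out_embed, target):
    # Negative log-likelihood under a full softmax.
    # Every one of the |V| output rows is scored, so cost grows with |V|.
    logits = out_embed @ hidden
    log_z = np.logaddexp.reduce(logits)  # log of the partition function
    return log_z - logits[target]

def nce_loss(hidden, out_embed, target, noise_probs, k, rng):
    # Textbook NCE objective (assumed form, normaliser fixed to 1):
    # classify the true word against k words drawn from a noise distribution.
    noise = rng.choice(len(noise_probs), size=k, p=noise_probs)
    s_target = out_embed[target] @ hidden   # one dot product
    s_noise = out_embed[noise] @ hidden     # k dot products
    delta_target = s_target - np.log(k * noise_probs[target])
    delta_noise = s_noise - np.log(k * noise_probs[noise])
    # log(1 - sigmoid(x)) equals log_sigmoid(-x).
    return -(log_sigmoid(delta_target) + log_sigmoid(-delta_noise).sum())

if __name__ == "__main__":
    # Hypothetical toy sizes, chosen only for illustration.
    rng = np.random.default_rng(0)
    vocab_size, dim, k = 10_000, 64, 25
    out_embed = rng.normal(scale=0.1, size=(vocab_size, dim))
    hidden = rng.normal(size=dim)
    noise_probs = np.full(vocab_size, 1.0 / vocab_size)  # uniform noise (assumption)
    print("full softmax loss:", full_softmax_loss(hidden, out_embed, 42))
    print("NCE loss:", nce_loss(hidden, out_embed, 42, noise_probs, k, rng))
```

In this sketch the NCE loss touches only k + 1 rows of the output matrix per prediction, whereas the full softmax touches all of them. The noise distribution and the value of k are among the hyperparameters that such an integration has to choose.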

Existing analysis techniques are not sufficient to explain the training and learned models. Established wisdom on learning theory cannot explain the generalisation of over-parametrised deep neural networks. Therefore, we have proposed methods and analysis techniques to understand the generalisation and explain the regularisation. Furthermore, we have explained the impact of the stacked layers in deep neural networks.

The presented techniques have made the neural language models more accurate and computationally efficient. The empirical analysis techniques have helped us understand the model learning and improved our understanding of the generalisation and regularisation. The conducted experiments were based on publicly available benchmark datasets and standard evaluation frameworks.

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Grześ, Marek
Thesis advisor: Freitas, Alex
Uncontrolled keywords: Deep Neural Networks, Deep Learning, Efficient Learning, Generalisation and Regularisation, Sequence Modelling, Language Modelling
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 10 Jun 2020 12:10 UTC
Last Modified: 16 Feb 2021 14:13 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/81637 (The current URI for this page, for reference purposes)