Liza, Farhana Ferdousi (2019) Improving Training of Deep Neural Network Sequence Models. Doctor of Philosophy (PhD) thesis, University of Kent. (KAR id:81637)
Full text: PDF, 3MB (Language: English)
Abstract
Sequence models, and language models in particular, are fundamental building blocks of downstream applications including speech recognition, speech synthesis, information retrieval, machine translation, and question answering systems. Neural network language models generalise better than traditional N-gram models (i.e. they cope more effectively with data sparsity). However, neural network language models have several fundamental problems: their training is computationally inefficient, and the trained models are difficult to analyse. In this thesis, we present techniques that reduce this computational complexity, together with an extensive analysis of the learned models.
To reduce the computational complexity, we focus on the main bottleneck of neural language model training: the softmax operation over the vocabulary. Among the available softmax approximation techniques, Noise Contrastive Estimation (NCE) is often regarded as one that does not work well with deep neural models for language modelling. We conducted a thorough investigation to find an appropriate and novel mechanism for integrating NCE with deep neural networks, and we explain why specific hyperparameter settings affect this integration.
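For orientation, the sketch below shows the standard per-example NCE loss that such approximation methods optimise: each true next word is contrasted against k words sampled from a noise distribution q, so the full softmax normaliser never has to be computed. This is a minimal illustration, not the thesis's own implementation; the function name, tensor shapes, and the PyTorch framing are assumptions made here.

```python
import torch
import torch.nn.functional as F

def nce_loss(scores_target, scores_noise, log_q_target, log_q_noise, k):
    """Standard per-batch NCE loss (illustrative sketch, not the thesis code).

    scores_target: (B,)   unnormalised model scores s(w, c) for the true next words
    scores_noise:  (B, k) unnormalised scores for the k noise words per context
    log_q_target:  (B,)   log-probability of the true words under the noise distribution q
    log_q_noise:   (B, k) log-probability of the noise words under q
    """
    log_k = torch.log(torch.tensor(float(k)))
    # Delta(w, c) = s(w, c) - log(k * q(w)); sigmoid(Delta) estimates
    # the probability that the word came from the data rather than the noise.
    delta_target = scores_target - (log_k + log_q_target)
    delta_noise = scores_noise - (log_k + log_q_noise)
    # Maximise log P(data | target) + sum over noise samples of log P(noise | sample).
    loss = -(F.logsigmoid(delta_target) + F.logsigmoid(-delta_noise).sum(dim=1))
    return loss.mean()
```

Because this binary data-versus-noise objective touches only k + 1 words per training example instead of the whole vocabulary, its cost is independent of vocabulary size, which is what makes NCE attractive as a softmax approximation.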
Existing analysis techniques are not sufficient to explain either the training process or the learned models, and established learning theory cannot explain the generalisation of over-parametrised deep neural networks. We therefore propose methods and analysis techniques to understand generalisation and to explain regularisation, and we also examine the impact of stacking layers in deep neural networks.
The presented techniques make neural language models more accurate and computationally efficient, and the empirical analyses improve our understanding of model learning, generalisation, and regularisation. All experiments were conducted on publicly available benchmark datasets using standard evaluation frameworks.
| Item Type: | Thesis (Doctor of Philosophy (PhD)) |
|---|---|
| Thesis advisor: | Grześ, Marek |
| Thesis advisor: | Freitas, Alex |
| Uncontrolled keywords: | Deep Neural Networks, Deep Learning, Efficient Learning, Generalisation and Regularisation, Sequence Modelling, Language Modelling |
| Divisions: | Division of Computing, Engineering and Mathematical Sciences > School of Computing |
| SWORD Depositor: | System Moodle |
| Depositing User: | System Moodle |
| Date Deposited: | 10 Jun 2020 12:10 UTC |
| Last Modified: | 05 Nov 2024 12:47 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/81637 (the current URI for this page, for reference purposes) |