
Data mining methods for the prediction of intestinal absorption using QSAR

Newby, Danielle Anne (2014) Data mining methods for the prediction of intestinal absorption using QSAR. Doctor of Philosophy (PhD) thesis, University of Kent, University of Greenwich. (KAR id:47600)

XML Word Processing Document (DOCX) (C&RT Models in Chapter 7_DNewby), English, 636kB
XML Word Processing Document (DOCX) (C&RT Models in Chapter 8_DNewby), English, 446kB
XML Word Processing Document (DOCX) (C&RT Models in Chapter 10_DNewby), English, 298kB
Microsoft Excel (Mol Descriptors selected by feature selection methods Chapter 8_DNewby), English, 14kB
Microsoft Excel (References for datasets 2-4_DNewby), English, 248kB
Microsoft Excel (Molecular descriptors for datasets 1-4), English, 13MB


Oral administration is the most common route of drug administration. With the growing cost of drug discovery, the development of Quantitative Structure-Activity Relationships (QSAR) as computational methods to predict oral absorption is highly desirable for reasons of cost-effectiveness. The aim of this research was to develop QSAR models that are highly accurate and interpretable for the prediction of oral absorption. The problems addressed in this investigation were datasets with unbalanced class distributions, feature selection, and the effects of solubility and permeability on oral absorption prediction. Firstly, oral absorption models were obtained by overcoming the problem of unbalanced class distributions in datasets using two techniques: under-sampling of compounds belonging to the majority class, and the use of different misclassification costs for different types of misclassification. Using these methods, models with higher accuracy were produced using regression and linear/non-linear classification techniques. Secondly, the use of several pre-processing feature selection methods in tandem with decision tree classification analysis, including misclassification costs, was found to produce models with better interpretability and higher predictive accuracy. These methods successfully selected the most important molecular descriptors and overcame the problem of unbalanced classes. Thirdly, the roles of solubility and permeability in oral absorption were investigated. This involved expansion of oral absorption datasets and collection of in vitro and aqueous solubility data. This work found that the inclusion of predicted and experimental solubility in permeability models can improve model accuracy. However, solubility was not as influential on oral absorption prediction as expected.
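The two remedies for unbalanced classes named in the abstract, under-sampling of the majority class and unequal misclassification costs, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the "high"/"low" absorption labels, and the cost values are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def undersample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Every class is randomly reduced to the size of the smallest class.
    (Hypothetical helper, not taken from the thesis.)
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


def min_cost_class(p_high, cost_fp=1.0, cost_fn=1.0):
    """Choose the class with the lower expected misclassification cost.

    p_high  : estimated probability that a compound is highly absorbed
    cost_fp : assumed cost of predicting "high" when the truth is "low"
    cost_fn : assumed cost of predicting "low" when the truth is "high"
    """
    expected_cost_if_high = (1.0 - p_high) * cost_fp
    expected_cost_if_low = p_high * cost_fn
    return "high" if expected_cost_if_high <= expected_cost_if_low else "low"


# Toy data: 8 highly absorbed compounds and 2 poorly absorbed ones.
labels = ["high"] * 8 + ["low"] * 2
balanced_idx = undersample_majority(labels)
balanced_labels = [labels[i] for i in balanced_idx]
```

With equal costs the decision rule reduces to a 0.5 probability threshold; raising `cost_fp` makes the rule more reluctant to predict the majority "high" class, which is the intended effect of cost-sensitive learning on an unbalanced dataset.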
Finally, predictive models of permeability and solubility were built to predict a provisional Biopharmaceutics Classification System (BCS) class using two multi-label classification techniques, binary relevance and classifier chain. The classifier chain method, which uses predicted solubility as a molecular descriptor for the permeability models, was shown to have higher predictive accuracy and hence gave better final provisional BCS prediction. Overall, this research has resulted in predictive and interpretable models that could be useful in a drug discovery context.
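The difference between binary relevance and a classifier chain can be shown with a small sketch. Everything here is an assumption for illustration: the base learner is a one-split decision stump rather than the thesis's models, the single descriptor and the toy data are invented, and labels are coded 1 = high, 0 = low. In binary relevance, solubility and permeability are predicted independently; in the chain, the predicted solubility label is appended to the descriptors before the permeability model is fitted.

```python
def fit_stump(X, y):
    """Fit a one-split decision stump on 0/1 labels.

    Returns (feature_index, threshold, label_if_above).
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                preds = [above if row[j] > t else 1 - above for row in X]
                errors = sum(p != yi for p, yi in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    return best[1:]


def predict_stump(stump, row):
    j, t, above = stump
    return above if row[j] > t else 1 - above


def fit_binary_relevance(X, sol, perm):
    # One independent model per label.
    return fit_stump(X, sol), fit_stump(X, perm)


def predict_binary_relevance(models, row):
    sol_model, perm_model = models
    return predict_stump(sol_model, row), predict_stump(perm_model, row)


def fit_chain(X, sol, perm):
    # The permeability model sees the predicted solubility as an extra descriptor.
    sol_model = fit_stump(X, sol)
    X_aug = [row + [predict_stump(sol_model, row)] for row in X]
    return sol_model, fit_stump(X_aug, perm)


def predict_chain(models, row):
    sol_model, perm_model = models
    s = predict_stump(sol_model, row)
    return s, predict_stump(perm_model, row + [s])


# Provisional BCS class from (solubility, permeability), 1 = high, 0 = low.
BCS = {(1, 1): "I", (0, 1): "II", (1, 0): "III", (0, 0): "IV"}

# Toy data with one hypothetical descriptor per compound.
X = [[1.0], [2.0], [3.0], [4.0]]
sol = [1, 1, 0, 0]
perm = [0, 0, 1, 1]

br_models = fit_binary_relevance(X, sol, perm)
chain_models = fit_chain(X, sol, perm)
bcs_new = BCS[predict_chain(chain_models, [1.5])]
```

On this toy data both approaches give the same answer; the chain only helps when the earlier label carries information about the later one that the descriptors alone do not capture.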

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Ghafourian, Taravat
Thesis advisor: Freitas, Alex A.
Uncontrolled keywords: Oral absorption, intestinal absorption, QSAR, data mining, classification, solubility, permeability, BCS
Subjects: R Medicine > RS Pharmacy and materia medica
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Faculties > Sciences > Medway School of Pharmacy
Date Deposited: 10 Mar 2015 01:00 UTC
Last Modified: 04 Feb 2020 04:06 UTC