Skip to main content
Kent Academic Repository

Improving the hierarchical classification of protein functions With swarm intelligence

Holden, Nicholas (2008) Improving the hierarchical classification of protein functions With swarm intelligence. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.94422) (KAR id:94422)

PDF (Optical Character Recognition (OCR) of this thesis enables read aloud functionality of the text.)
Language: English

Download this file
[thumbnail of Optical Character Recognition (OCR) of this thesis enables read aloud functionality of the text.]
Official URL:


This thesis investigates methods to improve the performance of hierarchical classification. In terms of this thesis hierarchical classification is a form of supervised learning, where the classes in a data set are arranged in a tree structure. As a base for our new methods we use the TDDC (top-down divide-and-conquer) approach for hierarchical classification, where each classifier is built only to discriminate between sibling classes.

Firstly, we propose a swarm intelligence technique which varies the types of classifiers used at each divide within the TDDC tree. Our technique, PSO/ACO-CS (Particle Swarm Optimisation/Ant Colony Optimisation Classifier Selection), finds combinations of classifiers to be used in the TDDC tree using the global search ability of PSO/ACO.

Secondly, we propose a technique that attempts to mitigate a major drawback of the TDDC approach. The drawback is that if at any point in the TDDC tree an example is misclassified it can never be correctly classified further down the TDDC tree. Our approach, PSO/ACO-RO (PSO/ACO-Recovery Optimisation) decides whether to redirect examples at a given classifier node using, again, the global search ability of PSO/ACO.

Thirdly, we propose an ensemble based technique, HEHRS (Hierarchical Ensembles of Hierarchical Rule Sets), which attempts to boost the accuracy at each classifier node in the TDDC tree by using information from classifiers (rule sets) in the rest of that tree. We use Particle Swarm Optimisation to weight the individual rules within each ensemble.

We evaluate these three new methods in hierarchical bioinformatics datasets that we have created for this research. These data sets represent the real world problem of protein function prediction.

We find through extensive experimentation that the three proposed methods improve upon the baseline TDDC method to varying degrees. Overall the HEHRS and PSO/ACO- CS-RO approaches are most effective, although they are associated with a higher computational cost.

Item Type: Thesis (Doctor of Philosophy (PhD))
DOI/Identification number: 10.22024/UniKent/01.02.94422
Additional information: This thesis has been digitised by EThOS, the British Library digitisation service, for purposes of preservation and dissemination. It was uploaded to KAR on 25 April 2022 in order to hold its content and record within University of Kent systems. It is available Open Access using a Creative Commons Attribution, Non-commercial, No Derivatives ( licence so that the thesis and its author, can benefit from opportunities for increased readership and citation. This was done in line with University of Kent policies ( If you feel that your rights are compromised by open access to this thesis, or if you would like more information about its availability, please contact us at and we will seriously consider your claim under the terms of our Take-Down Policy (
Subjects: Q Science > QA Mathematics (inc Computing science)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor: SWORD Copy
Depositing User: SWORD Copy
Date Deposited: 25 Nov 2022 16:13 UTC
Last Modified: 25 Nov 2022 16:13 UTC
Resource URI: (The current URI for this page, for reference purposes)

University of Kent Author Information

  • Depositors only (login required):

Total unique views for this document in KAR since July 2020. For more details click on the image.