Automatically evolving rule induction algorithms with grammar-based genetic programming

Lobo Pappa, Gisele (2007) Automatically evolving rule induction algorithms with grammar-based genetic programming. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.86357) (KAR id:86357)

PDF (445792.pdf) Language: English This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Download this file (PDF/32MB)
Official URL: https://doi.org/10.22024/UniKent/01.02.86357

Abstract

In the last 30 years, research in the field of rule induction algorithms produced a large number of algorithms. However, these algorithms are usually obtained from the combination of a basic rule induction algorithm (typically following the sequential covering approach) with new evaluation functions, pruning methods and stopping criteria for refining or producing rules, generating many "new" and more sophisticated sequential covering algorithms. We cannot deny that these attempts to improve the basic sequential covering approach have succeeded. Hence, if manually changing these major components of rule induction algorithms can result in new, significantly better ones, why not to automate this process to make it more cost-effective? This is the core idea of this work: to automate the process of designing rule induction algorithms by means of grammar-based genetic programming. Grammar-based Genetic Programming (GGP) is a special type of evolutionary algorithm used to automatically evolve computer programs. The most interesting feature of this type of algorithm is that it incorporates a grammar into its search mechanism, which expresses prior knowledge about the problem being solved. Since we have a lot of previous knowledge about how humans design rule induction algorithms, this type of algorithm is intuitively a suitable tool to automatically evolve rule induction algorithms. The grammar given to the proposed GGP system includes knowledge about how humans- design rule induction algorithms, and also presents some new elements which could work in rule induction algorithms, but to the best of our knowledge were not previously tested. The GG P system aims to evolve rule induction algorithms under two different frameworks, as follows. In the first framework, the GGP is used to evolve robust rule induction algorithms, i.e., algorithms which were designed to be applied to virtually any classification data set, like a manually-designed rule induction algorithm. In the second framework, the GGP is applied to evolve rule induction algorithms tailored to a specific application XVI domain, i.e., rule induction algorithms tailored to a single data set. Note that the latter framework is hardly feasible on a hard scale in the case of conventional, manually-designed algorithms, since the number of classification data sets greatly outnumbers the number of rule induction algorithms designers. However, it is clearly feasible on a large scale when using the proposed system, which automates the process of rule induction algorithm design and implementation. Overall, extensive computational experiments with 20 VCI data sets and 5 bioinformatics data sets showed that effective rule induction algorithms can be automatically generated using the GGP in both frameworks. Moreover, the automatically evolved rule induction algorithms were shown to be competitive with (and overall slightly better than) four well-known manually designed rule induction algorithms when comparing their predictive accuracies. The proposed GGP system was also compared to a grammar-based hillclimbing system, and experimental results showed that the GGP system is a more effective method to evolve rule induction algorithms than the grammar-based hillclimbing method. At last, a multi-objective version of the GGP (based on the concept of Pareto dominance) was also proposed, and experiments were performed to evolve robust rule induction algorithms which generate both accurate and simple models. The results showed that in most of the cases the GGP system can produce rule induction algorithms which are competitive in predictive accuracy to wellknown human-designed rule induction algorithms, but generate simpler classification modes - i.e., smaller rule sets, intuitively easier to be interpreted by the user.

Item Type:	Thesis (Doctor of Philosophy (PhD))
DOI/Identification number:	10.22024/UniKent/01.02.86357
Additional information:	This thesis has been digitised by EThOS, the British Library digitisation service, for purposes of preservation and dissemination. It was uploaded to KAR on 09 February 2021 in order to hold its content and record within University of Kent systems. It is available Open Access using a Creative Commons Attribution, Non-commercial, No Derivatives (https://creativecommons.org/licenses/by-nc-nd/4.0/) licence so that the thesis and its author, can benefit from opportunities for increased readership and citation. This was done in line with University of Kent policies (https://www.kent.ac.uk/is/strategy/docs/Kent%20Open%20Access%20policy.pdf). If you feel that your rights are compromised by open access to this thesis, or if you would like more information about its availability, please contact us at ResearchSupport@kent.ac.uk and we will seriously consider your claim under the terms of our Take-Down Policy (https://www.kent.ac.uk/is/regulations/library/kar-take-down-policy.html).
Uncontrolled keywords:	#ethos,
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor:	SWORD Copy
Depositing User:	SWORD Copy
Date Deposited:	29 Oct 2019 16:53 UTC
Last Modified:	15 Dec 2021 11:54 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/86357 (The current URI for this page, for reference purposes)

University of Kent Author Information

Depositors only (login required):

Altmetric

Total Views

Total unique views of this page since July 2020. For more details click on the image.