Romao, Wesley, Freitas, Alex A., Gimenes, Itana M. de S. (2004) Discovering interesting knowledge from a science & technology database with a genetic algorithm. Applied Soft Computing, 4 (2). pp. 121-137. ISSN 1568-4946. (doi:10.1016/j.asoc.2003.10.002) (KAR id:14173)
Language: English
Official URL: http://dx.doi.org/10.1016/j.asoc.2003.10.002 |
Abstract
Data mining consists of extracting interesting knowledge from data. This paper addresses the discovery of knowledge in the form of prediction IF-THEN rules, which are a popular form of knowledge representation in data mining. In this context, we propose a genetic algorithm (GA) designed specifically to discover interesting fuzzy prediction rules. The GA searches for prediction rules that are interesting in the sense of being new and surprising for the user. This is done by adapting a technique little exploited in the literature, which is based on user-defined general impressions (subjective knowledge). More precisely, a prediction rule is considered interesting (or surprising) to the extent that it represents knowledge that not only was previously unknown to the user but also contradicts his original beliefs. In addition, the use of fuzzy logic helps to improve the comprehensibility of the rules discovered by the GA, because the rules are expressed in linguistic terms that are natural for the user. A prototype was implemented and applied to a real-world science & technology database containing data about the scientific production of researchers. The GA implemented in this prototype was evaluated by comparing it with the J4.8 algorithm, a variant of the well-known C4.5 algorithm. Experiments were carried out to evaluate both the predictive accuracy and the degree of interestingness (or surprisingness) of the rules discovered by both algorithms. The predictive accuracy obtained by the proposed GA was similar to that obtained by J4.8, but the GA, in general, discovered rules with fewer conditions. In addition, the GA works with natural linguistic terms, which leads to the discovery of more comprehensible knowledge. The rules discovered by the proposed GA and the best rules discovered by J4.8 were shown in an interview to a user (a University Director), who evaluated the degree of interestingness (surprisingness) of the rules to him. In general, the user considered the rules discovered by the GA much more interesting than the rules discovered by J4.8.
Item Type: | Article |
---|---|
DOI/Identification number: | 10.1016/j.asoc.2003.10.002 |
Uncontrolled keywords: | genetic algorithms, data mining, classification |
Subjects: | Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Mark Wheadon |
Date Deposited: | 24 Nov 2008 18:02 UTC |
Last Modified: | 05 Nov 2024 09:48 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/14173 (The current URI for this page, for reference purposes) |