Interfacing knowledge discovery algorithms to large database management systems

Lavington, Simon H., Dewhurst, N., Wilkins, E., Freitas, Alex A. (1999) Interfacing knowledge discovery algorithms to large database management systems. Information and Software Technology, 41 (9). pp. 605-617. ISSN 0950-5849. (doi:10.1016/S0950-5849(99)00024-5) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:21713)

Official URL
http://dx.doi.org/10.1016/S0950-5849(99)00024-5

Abstract

The efficient mining of large, commercially credible databases requires a solution to at least two problems: (a) better integration between existing Knowledge Discovery algorithms and popular DBMS; (b) the ability to exploit opportunities for computational speedup such as data parallelism. Both problems need to be addressed in a generic manner, since the stated requirements of end-users cover a range of data mining paradigms, DBMS, and (parallel) platforms. In this paper we present a family of generic, set-based, primitive operations for Knowledge Discovery in Databases (KDD). We show how a number of well-known KDD classification metrics, drawn from paradigms such as Bayesian classifiers, Rule-Induction/Decision Tree algorithms, Instance-Based Learning methods, and Genetic Programming, can all be computed via our generic primitives. We then show how these primitives may be mapped into SQL and, where appropriate, optimised for good performance with respect to practical factors such as client-server communication overheads. We demonstrate how our primitives can support C4.5, a widely-used rule induction system. Performance evaluation figures are presented for commercially available parallel platforms, such as the IBM SP/2.

Item Type: Article
DOI/Identification number: 10.1016/S0950-5849(99)00024-5
Uncontrolled keywords: data mining; parallelism; KDD primitives; decision trees; client-server
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Mark Wheadon
Date Deposited: 02 Sep 2009 12:09 UTC
Last Modified: 16 Feb 2021 12:32 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/21713 (The current URI for this page, for reference purposes)
Freitas, Alex A.: https://orcid.org/0000-0001-9825-4700