Kent Academic Repository

Artificial Immune Systems for Web Content Mining: Focusing on the Discovery of Interesting Information

Secker, Andrew D. (2006) Artificial Immune Systems for Web Content Mining: Focusing on the Discovery of Interesting Information. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.14473) (KAR id:14473)

Abstract

This thesis explores how biological metaphors can be applied to web content mining and, more specifically, to the identification of interesting information in web documents. Web content mining is the use of content found on the web, most usually the text of web pages, for data mining tasks such as classification. Because web content is noisy and constantly changing, an adaptive system is required. The discovery of interesting information is an advance on basic text mining in that it aims to identify text that is novel, unexpected or surprising to a user, whilst still being relevant. This thesis investigates the application of Artificial Immune Systems (AIS) to the discovery of interesting information, as AIS are thought to provide the adaptability and learning this task requires. Two novel Artificial Immune Systems are described and tested. AISEC (Artificial Immune System for Interesting E-mail Classification) is a novel, immune-inspired system for the classification of e-mail. It is shown that AISEC achieves predictive accuracy comparable to that of a naïve Bayesian algorithm when continually classifying e-mail collected from a real user. This part of the work contributes to the understanding of how AIS behave in a continuous learning scenario. Building on the knowledge gained by testing AISEC, AISIID (Artificial Immune System for Interesting Information Discovery) is then described. A study in which users subjectively evaluate the results is undertaken, and AISIID is seen to discover pages that users rate as more interesting than those found by a comparative system. The results of this study also reveal that AISIID performs with subjective quality similar to that of the well-known search engine Google. This leads to a better understanding of the user's perception of interestingness and of possible inadequacies in the current understanding of interestingness with regard to text documents.

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Freitas, Alex A.
Thesis advisor: Timmis, Jon
DOI/Identification number: 10.22024/UniKent/01.02.14473
Additional information: This thesis has been digitised by EThOS, the British Library digitisation service, for purposes of preservation and dissemination. It was uploaded to KAR on 25 April 2022 in order to hold its content and record within University of Kent systems. It is available Open Access under a Creative Commons Attribution, Non-commercial, No Derivatives (https://creativecommons.org/licenses/by-nc-nd/4.0/) licence so that the thesis and its author can benefit from opportunities for increased readership and citation. This was done in line with University of Kent policies (https://www.kent.ac.uk/is/strategy/docs/Kent%20Open%20Access%20policy.pdf). If you feel that your rights are compromised by open access to this thesis, or if you would like more information about its availability, please contact us at ResearchSupport@kent.ac.uk and we will seriously consider your claim under the terms of our Take-Down Policy (https://www.kent.ac.uk/is/regulations/library/kar-take-down-policy.html).
Uncontrolled keywords: artificial immune system, web mining, interesting information, AISEC, AISIID
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Mark Wheadon
Date Deposited: 24 Nov 2008 18:04 UTC
Last Modified: 23 May 2023 10:34 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/14473 (The current URI for this page, for reference purposes)

University of Kent Author Information

Secker, Andrew D.
