Alterkawi, Laila (2022) Scaling Genetic Algorithms to Large Distributed Datasets. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.97569) (KAR id:97569)
PDF
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://doi.org/10.22024/UniKent/01.02.97569
Resource title: | Parallelism and partitioning in large-scale GAs using spark |
---|---|
Resource type: | Publication |
DOI: | 10.1145/3321707.3321775 |
KDR/KAR URL: | |
External URL: | https://doi.org/10.1145/3321707.3321775 |
Abstract
Analysing large-scale data brings promises of new levels of scientific discovery and economic value. However, the fact that such volumes of data are by their nature distributed, together with the need for new computational methods that remain effective in the face of significant changes in data complexity and size, has led to the need to develop large-scale data analytics. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches that centralise data in the main memory of a single node or that require remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines.
In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models. We adopt the two models to distribute BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. We investigate the effect of GA control parameters (population size and migration frequency). We study the accuracy, time performance and scalability of the proposed models. Our results show that our distributed genetic algorithm design provides a good tradeoff between accuracy and time.
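As a rough illustration of the island-style idea behind a model such as PDMD, the following minimal sketch evolves one small population per data partition and periodically migrates the best individual around a ring. It is not the thesis implementation: it uses neither BioHEL nor Spark, and the toy one-dimensional threshold classifier, the function names (`make_partitions`, `fitness`, `evolve_one_generation`, `island_ga`) and all parameter values are assumptions chosen only to keep the example self-contained. The `pop_size` and `migration_every` arguments stand in for the population size and migration frequency parameters mentioned above.

```python
import random

def make_partitions(n_partitions, rows_per_partition, rng):
    """Hypothetical toy dataset: label is 1 when x > 0.6; one list per 'node'."""
    parts = []
    for _ in range(n_partitions):
        xs = [rng.random() for _ in range(rows_per_partition)]
        parts.append([(x, int(x > 0.6)) for x in xs])
    return parts

def fitness(threshold, partition):
    """Accuracy of the rule 'predict 1 if x > threshold' on one local partition."""
    correct = sum(int(x > threshold) == y for x, y in partition)
    return correct / len(partition)

def evolve_one_generation(population, partition, rng):
    """Keep the better half (elitism), refill with mutated copies of survivors."""
    ranked = sorted(population, key=lambda t: fitness(t, partition), reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0, 0.05)))
                for _ in range(len(population) - len(survivors))]
    return survivors + children

def island_ga(partitions, pop_size=20, generations=30, migration_every=5, seed=0):
    """One island per partition; each island only reads its own partition while evolving."""
    rng = random.Random(seed)
    islands = [[rng.random() for _ in range(pop_size)] for _ in partitions]
    for gen in range(1, generations + 1):
        islands = [evolve_one_generation(pop, part, rng)
                   for pop, part in zip(islands, partitions)]
        if gen % migration_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [max(pop, key=lambda t: fitness(t, part))
                     for pop, part in zip(islands, partitions)]
            for i, (pop, part) in enumerate(zip(islands, partitions)):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j], part))
                pop[worst] = bests[i - 1]
    # Simplification: the final pick scores every candidate on all partitions.
    candidates = [t for pop in islands for t in pop]
    return max(candidates, key=lambda t: sum(fitness(t, p) for p in partitions))

if __name__ == "__main__":
    data_rng = random.Random(1)
    parts = make_partitions(4, 200, data_rng)
    best = island_ga(parts)
    print(best, sum(fitness(best, p) for p in parts) / len(parts))
```

In this sketch only individuals move between islands; the rows of each partition are never copied elsewhere during evolution, which mirrors the partitioned-data setting described in the abstract. A master-slave variant in the spirit of PDMS would instead keep a single population and ask every partition to score each individual before combining the partial results.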
We then extend the two models with automatic termination and population sizing to make the distributed genetic algorithm easier to use. After testing this strategy on both models, we show that the applied automation offers a promising improvement in the performance of the initially designed GA models.
Item Type: | Thesis (Doctor of Philosophy (PhD)) |
---|---|
Thesis advisor: | Migliavacca, Matteo |
DOI/Identification number: | 10.22024/UniKent/01.02.97569 |
Uncontrolled keywords: | Genetic Algorithms, Data partitioning, Spark, Classification |
Subjects: | Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
SWORD Depositor: | System Moodle |
Depositing User: | System Moodle |
Date Deposited: | 24 Oct 2022 14:10 UTC |
Last Modified: | 03 Jul 2023 15:44 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/97569 (The current URI for this page, for reference purposes) |