Scaling Genetic Algorithms to Large Distributed Datasets

Alterkawi, Laila (2022) Scaling Genetic Algorithms to Large Distributed Datasets. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.97569) (KAR id:97569)

PDF. Language: English. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://doi.org/10.22024/UniKent/01.02.97569

Related resource
Resource title:	Parallelism and partitioning in large-scale GAs using spark
Resource type:	Publication
DOI:	10.1145/3321707.3321775
External URL:	https://doi.org/10.1145/3321707.3321775

Abstract

Analysing large-scale data promises new levels of scientific discovery and economic value. However, data at this volume is distributed by nature, and computational methods must remain effective as data complexity and size change significantly; both factors have driven the development of large-scale data analytics. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches that centralise data in the main memory of a single node or that require remote access to shared or distributed memory. We focus instead on scenarios where data is partitioned across machines.

In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models. We adopt the two models to distribute BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. We investigate the effect of GA control parameters (population size and migration frequency). We study the accuracy, time performance and scalability of the proposed models. Our results show that our distributed genetic algorithm design provides a good tradeoff between accuracy and time.
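
As a rough illustration of the two general patterns named above (not the thesis's PDMS/PDMD implementation, which distributes BioHEL on Spark), the following minimal Python sketch contrasts a master-slave style GA with an island style GA over partitioned data. Everything in it is an assumption made for illustration: plain Python lists stand in for data partitions, individuals are single decision thresholds for a toy one-dimensional classifier, and the function names (correct_counts, master_slave, island) and parameters (migrate_every, sigma) are hypothetical.

```python
import random

def correct_counts(population, partition):
    """For each candidate threshold, count local examples classified correctly."""
    return [sum((x > t) == label for x, label in partition) for t in population]

def global_counts(population, partitions):
    """Evaluate on every partition ("map"), then sum the counts ("reduce")."""
    per_partition = [correct_counts(population, p) for p in partitions]
    return [sum(col) for col in zip(*per_partition)]

def next_generation(population, fitness, rng, sigma=0.1):
    """Keep the better half (elitism) and refill with mutated copies of it."""
    ranked = [t for _, t in sorted(zip(fitness, population), reverse=True)]
    elite = ranked[: len(ranked) // 2]
    children = [rng.choice(elite) + rng.gauss(0.0, sigma)
                for _ in range(len(population) - len(elite))]
    return elite + children  # best evaluated individual sits at index 0

def master_slave(partitions, pop_size=20, generations=30, seed=0):
    """One global population; only fitness counts travel between partitions."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = global_counts(population, partitions)
        population = next_generation(population, fitness, rng)
    fitness = global_counts(population, partitions)
    return max(zip(fitness, population))[1]

def island(partitions, pop_size=20, generations=30, migrate_every=5, seed=0):
    """One population per partition, evolved on local data, with ring migration."""
    rng = random.Random(seed)
    islands = [[rng.uniform(0.0, 1.0) for _ in range(pop_size)]
               for _ in partitions]
    for gen in range(1, generations + 1):
        for i, part in enumerate(partitions):
            local_fitness = correct_counts(islands[i], part)
            islands[i] = next_generation(islands[i], local_fitness, rng)
        if gen % migrate_every == 0:
            bests = [isl[0] for isl in islands]
            for i in range(len(islands)):
                # Overwrite the last child with the previous island's best.
                islands[i][-1] = bests[i - 1]
    # One final pass over all partitions picks among the island champions.
    candidates = [isl[0] for isl in islands]
    return max(candidates, key=lambda t: global_counts([t], partitions)[0])
```

In this sketch the master-slave variant touches every partition in every generation but moves only per-individual counts, whereas the island variant works on local data and exchanges individuals only every migrate_every generations. That is the kind of accuracy-versus-time trade-off that population size and migration frequency would influence, although the numbers a toy like this produces say nothing about the thesis's results.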

We then extend the two models with automatic termination and population sizing to make the distributed genetic algorithm easier to use. After testing this strategy on both models, we show that the applied automation offers a promising improvement in the performance of the initially designed GA models.

Item Type:	Thesis (Doctor of Philosophy (PhD))
Thesis advisor:	Migliavacca, Matteo
DOI/Identification number:	10.22024/UniKent/01.02.97569
Uncontrolled keywords:	Genetic Algorithms, Data partitioning, Spark, Classification
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor:	System Moodle
Depositing User:	System Moodle
Date Deposited:	24 Oct 2022 14:10 UTC
Last Modified:	20 May 2025 10:27 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/97569 (The current URI for this page, for reference purposes)

University of Kent Author Information

Alterkawi, Laila.

