Alterkawi, Laila, Migliavacca, Matteo (2019) Parallelism and partitioning in large-scale GAs using spark. In: GECCO '19 Proceedings of the Genetic and Evolutionary Computation Conference. pp. 736-744. ACM, New York. ISBN 978-1-4503-6111-8. (doi:10.1145/3321707.3321775) (KAR id:75345)
PDF, Author's Accepted Manuscript. Language: English
Official URL: https://dx.doi.org/10.1145/3321707.3321775
Abstract
Big Data promises new scientific discovery and economic value. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research effort has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches that are based on the centralisation of data in the main memory of a single node or that require remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines.
In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models; we compare their performance in large-scale classification problems. We implement two distributed versions of BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. In contrast to existing GAs based on MapReduce, Spark allows a more efficient implementation of parallel GAs thanks to its simple, efficient iterative processing of partitioned datasets.
We study the accuracy, efficiency and scalability of the proposed models. Our results show that PDMS provides the same accuracy as traditional BioHEL and exhibits good scalability up to 64 cores, while PDMD provides a substantial reduction of execution time at a minor loss of accuracy.
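The abstract does not say how PDMS and PDMD are implemented, so the following is only a minimal sketch of the two general ideas over partitioned data, written in plain Python rather than Spark. It assumes that a master-slave style model evaluates one shared population against every partition and sums the partial results, and that an island style model runs an independent GA on each partition using local data only. The toy threshold classifier, the data generator and all function names are hypothetical and are not taken from the paper or from BioHEL.

```python
from functools import reduce
import random


def make_partitions(n_partitions=4, rows_per_partition=50, seed=0):
    """Hypothetical toy data: each partition stands in for one worker's rows."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_partitions):
        rows = []
        for _ in range(rows_per_partition):
            x = rng.random()
            rows.append((x, int(x > 0.6)))  # toy rule: label is 1 iff x > 0.6
        partitions.append(rows)
    return partitions


def correct_counts(population, rows):
    """Worker step: per candidate threshold, count correctly classified local rows."""
    return [sum(int(x > t) == y for x, y in rows) for t in population]


def distributed_fitness(population, partitions):
    """Master-slave style: map over partitions, then sum the partial counts."""
    # In a real cluster each call below would run on the node holding `rows`.
    partials = [correct_counts(population, rows) for rows in partitions]
    totals = reduce(lambda a, b: [i + j for i, j in zip(a, b)], partials)
    n_rows = sum(len(rows) for rows in partitions)
    return [c / n_rows for c in totals]


def evolve(population, fitness_fn, generations, rng):
    """Minimal GA loop: keep the best half, refill with mutated copies."""
    for _ in range(generations):
        scored = sorted(zip(fitness_fn(population), population), reverse=True)
        elite = [t for _, t in scored[: len(population) // 2]]
        children = [min(1.0, max(0.0, t + rng.gauss(0, 0.05))) for t in elite]
        population = elite + children
    fitness = fitness_fn(population)
    best = max(range(len(population)), key=fitness.__getitem__)
    return population[best], fitness[best]


def master_slave(partitions, pop_size=8, generations=20, seed=1):
    """One global population; fitness is aggregated across all partitions."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    return evolve(population, lambda pop: distributed_fitness(pop, partitions),
                  generations, rng)


def islands(partitions, pop_size=8, generations=20, seed=1):
    """Island style: one independent GA per partition, using local rows only."""
    results = []
    for i, rows in enumerate(partitions):
        rng = random.Random(seed + i)
        population = [rng.random() for _ in range(pop_size)]
        local_fitness = lambda pop, rows=rows: [
            c / len(rows) for c in correct_counts(pop, rows)
        ]
        results.append(evolve(population, local_fitness, generations, rng))
    return results
```

In this sketch the aggregated fitness is identical to what a single node would compute on the full dataset, which is one plausible reading of why a master-slave style model can match single-node accuracy. The island variant needs no aggregation per generation, but each island sees only its own rows.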
Item Type: | Conference or workshop item (Proceeding) |
---|---|
DOI/Identification number: | 10.1145/3321707.3321775 |
Uncontrolled keywords: | Genetic Algorithms, Big Data, Spark, Distributed Learning Classifier System, Distributed Data Mining |
Subjects: | Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76 Computer software |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Matteo Migliavacca |
Date Deposited: | 15 Jul 2019 11:27 UTC |
Last Modified: | 05 Nov 2024 12:38 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/75345 (The current URI for this page, for reference purposes) |