Parallelism and partitioning in large-scale GAs using spark

Alterkawi, Laila, Migliavacca, Matteo (2019) Parallelism and partitioning in large-scale GAs using spark. In: GECCO '19 Proceedings of the Genetic and Evolutionary Computation Conference. pp. 736-744. ACM, New York. ISBN 978-1-4503-6111-8. (doi:10.1145/3321707.3321775)

PDF - Author's Accepted Manuscript
Official URL
https://dx.doi.org/10.1145/3321707.3321775

Abstract

Big Data promises new scientific discovery and economic value. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research effort has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches that are based on the centralisation of data in the main memory of a single node or that require remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines.

In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models; we compare their performance in large-scale classification problems. We implement two distributed versions of BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. In contrast to existing GAs based on MapReduce, Spark allows a more efficient implementation of parallel GAs thanks to its simple, efficient iterative processing of partitioned datasets.

We study the accuracy, efficiency and scalability of the proposed models. Our results show that PDMS provides the same accuracy as traditional BioHEL and exhibits good scalability up to 64 cores, while PDMD provides a substantial reduction of execution time at a minor loss of accuracy.
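The contrast between the two models in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses plain Python lists in place of Spark partitions and a one-parameter threshold classifier in place of BioHEL. All function names, the toy data, and the GA settings are assumptions made for illustration only. The master-slave style keeps one global population and sums per-partition fitness counts; the island style evolves a separate population on each partition's local rows. How island results are combined is not stated in the abstract, so the sketch simply returns one winner per partition.

```python
import random

def make_partitions(n_parts=4, rows_per_part=50, seed=0):
    """Toy partitioned dataset (assumption): the label is 1 when x > 0.5."""
    rng = random.Random(seed)
    parts = []
    for _ in range(n_parts):
        xs = [rng.random() for _ in range(rows_per_part)]
        parts.append([(x, int(x > 0.5)) for x in xs])
    return parts

def correct_count(threshold, rows):
    """Rows of one partition that the threshold classifies correctly."""
    return sum(int(x > threshold) == y for x, y in rows)

def evolve(population, fitness, rng, generations=20):
    """Tiny elitist GA: keep the best half, refill with mutated copies."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]
        children = [min(1.0, max(0.0, p + rng.gauss(0, 0.05))) for p in parents]
        population = parents + children
    return max(population, key=fitness)

def master_slave(parts, seed=1):
    """One global population; each partition scores it locally, counts are summed."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(10)]

    def global_fitness(t):
        # "Map" over partitions, then "reduce" the partial counts by summing.
        return sum(correct_count(t, rows) for rows in parts)

    return evolve(population, global_fitness, rng)

def islands(parts, seed=1):
    """One independent population per partition, each seeing only its local rows."""
    winners = []
    for i, rows in enumerate(parts):
        rng = random.Random(seed + i)
        population = [rng.random() for _ in range(10)]
        local_fitness = lambda t, rows=rows: correct_count(t, rows)
        winners.append(evolve(population, local_fitness, rng))
    return winners

def accuracy(threshold, parts):
    """Fraction of all rows, across every partition, classified correctly."""
    total = sum(len(rows) for rows in parts)
    return sum(correct_count(threshold, rows) for rows in parts) / total

if __name__ == "__main__":
    parts = make_partitions()
    print("master-slave accuracy:", accuracy(master_slave(parts), parts))
    print("island accuracies:", [accuracy(t, parts) for t in islands(parts)])
```

In this sketch the master-slave variant needs one aggregation across all partitions per fitness evaluation, while the island variant needs none during evolution. That trade-off is in keeping with the abstract's report that the island-based model reduces execution time at a minor loss of accuracy, though the toy says nothing about the magnitude of either effect.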

Item Type: Conference or workshop item (Proceeding)
DOI/Identification number: 10.1145/3321707.3321775
Uncontrolled keywords: Genetic Algorithms, Big Data, Spark, Distributed Learning Classifier System, Distributed Data Mining
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming > QA76.76 Computer software
Divisions: Faculties > Sciences > School of Computing > Data Science
Depositing User: Matteo Migliavacca
Date Deposited: 15 Jul 2019 11:27 UTC
Last Modified: 24 Jan 2020 04:11 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/75345 (The current URI for this page, for reference purposes)
Migliavacca, Matteo: https://orcid.org/0000-0002-5684-4865