Evaluating Scalable Distributed Erlang for Scalability and Reliability

Chechina, Natalia, MacKenzie, Kenneth, Thompson, Simon, Trinder, Phil, Boudeville, Olivier, Fordos, Viktoria, Hoch, Csaba, Ghaffari, Amir, Moro Hernandez, Mario (2017) Evaluating Scalable Distributed Erlang for Scalability and Reliability. IEEE Transactions on Parallel and Distributed Systems, 28 (8). pp. 2244-2257. ISSN 1045-9219. (doi:10.1109/TPDS.2017.2654246) (KAR id:60077)

PDF (Evaluating Scalable Distributed Erlang for Scalability and Reliability) Author's Accepted Manuscript Language: English
Official URL: http://dx.doi.org/10.1109/TPDS.2017.2654246

Abstract

Large-scale servers with hundreds of hosts and tens of thousands of cores are becoming common. To exploit these platforms, software must be both scalable and reliable, and distributed actor languages like Erlang are a proven technology in this area. While distributed Erlang conceptually supports the engineering of large-scale reliable systems, in practice it has some scalability limits that force developers to depart from the standard language mechanisms at scale. In earlier work we explored these scalability limitations and addressed them by providing a Scalable Distributed (SD) Erlang library that partitions the network of Erlang Virtual Machines (VMs) into scalable groups (s_groups). This paper presents the first systematic evaluation of SD Erlang s_groups and associated tools, and how they can be used. We present a comprehensive evaluation of the scalability and reliability of SD Erlang using three typical benchmarks and a case study. We demonstrate that s_groups improve the scalability of reliable and unreliable Erlang applications on up to 256 hosts (6144 cores). We show that SD Erlang preserves the class-leading distributed Erlang reliability model, but scales far better than the standard model. We present a novel, systematic, and tool-supported approach for refactoring distributed Erlang applications into SD Erlang. We outline the new and improved monitoring, debugging and deployment tools for large-scale SD Erlang applications. We demonstrate the scaling characteristics of key tools on systems comprising up to 10K Erlang VMs.

Item Type:	Article
DOI/Identification number:	10.1109/TPDS.2017.2654246
Projects:	RELEASE
Uncontrolled keywords:	Scalability, Servers, Software reliability, Benchmark testing, Monitoring, Reliability engineering, Erlang, Reliability, Actors
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming > QA76.76 Computer software
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Funders:	Commission européenne (https://ror.org/00k4n6c32)
Depositing User:	Simon Thompson
Date Deposited:	25 Jan 2017 15:13 UTC
Last Modified:	22 Jul 2025 08:57 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/60077 (The current URI for this page, for reference purposes)

University of Kent Author Information

Thompson, Simon.

Creator's ORCID:	https://orcid.org/0000-0002-2350-301X