The pragmatics of clone detection and elimination

Thompson, Simon, Li, Huiqing, Schumacher, Andreas (2017) The pragmatics of clone detection and elimination. The Art, Science, and Engineering of Programming, 1 (2). ISSN 2473-7321. (doi:10.22152/


The occurrence of similar code, or ‘code clones’, can make program code difficult to read, modify and maintain. This paper describes industrial case studies of clone detection and elimination, and were were performed in collaboration with engineers from Ericsson AB using the refactoring and clone detection tool Wrangler for Erlang. We use the studies to illustrate the complex set of decisions that have to be taken when performing clone elimination in practice; we also discuss how the studies have informed the design of the tool. However, the conclusions we draw are largely language-independent, and set out the pragmatics of clone detection and elimination in real-world projects as well as design principles for clone detection decision-support tools. Context. The context of this work is the fact that a software tool is designed to be used; the success of such a tool therefore depends on its suitability and usability in practice. The work proceeds by observing the use of a tool in particular case studies in detail, through a “partici- pant observer” approach, and drawing qualitative conclusions from these studies, rather than collecting and analysing quantitative data from a larger set of applications. Our conclusions help not only programmers but also the designers of software tools. Inquiry. Data collected in this way make two kinds of contribution. First, they provide the basis for deriving a set of questions that typically need to be answered by engineers in the process of removing clones from an application, and a set of heuristics that can be used to help answer these questions. Secondly, they provide feedback on existing features of software tools, as well as suggesting new features to be added to the tools. Approach. The work was undertaken by the tool designers and engineers from Ericsson AB, working to- gether on clone elimination for code from the company. Knowledge. The work led to a number of conclusions, at different levels of generality. At the top level, there is overwhelming evidence that the process of clone elimination cannot be entirely automated, and needs to include the input of engineers familiar with the domain in question. Furthermore, there is strong evidence that the automated tools are sensitive to a set of parameters, which will differ for different applications and programming styles, and that individual clones can be over- and under-identified: again, involving those with knowledge of the code and the domain is key to successful application. Grounding. The work is grounded in “participant observation” by the tool builders, who made detailed logs of the processes undertaken by the group. Importance. The work gives guidelines that assist an engineer in using clone detection and elimination in practice, as well as helping a tool developer to shape their tool building. Although the work was in the context of a particular tool and programming language, the authors would argue that the high-level knowledge gained applies equally well to other notions of clone, as well as other tools and programming languages.

Item Type: Article
DOI/Identification number: 10.22152/
Projects: Projects 215868 not found.
Uncontrolled keywords: similar code, clone detection, clone elimination, clone removal, refactoring, testing, program comprehension, generalisation, strategy, Erlang, Wrangler, ProTest project
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Faculties > Sciences > School of Computing > Programming Languages and Systems Group
Depositing User: Simon Thompson
Date Deposited: 03 Apr 2017 06:16 UTC
Last Modified: 09 Jul 2019 11:25 UTC
Resource URI: (The current URI for this page, for reference purposes)
  • Depositors only (login required):


Downloads per month over past year