Corba: Crowdsourcing to Obtain Requirements from Regulations and Breaches

Guo, Hui, Kafalı, Özgür, Jeukeng, Anne-Liz, Williams, Laurie, Singh, Munindar P. (2019) Corba: Crowdsourcing to Obtain Requirements from Regulations and Breaches. Empirical Software Engineering, 25, pp. 532-561. ISSN 1382-3256. (doi:10.1007/s10664-019-09753-2) (KAR id:75141)

PDF Author's Accepted Manuscript
Language: English


Context: Modern software systems are deployed in sociotechnical settings, combining social entities (humans and organizations) with technical entities (software and devices). In such settings, on top of the technical controls that implement a system's security features, regulations specify how users should behave in security-critical situations. No matter how carefully the software is designed and how well regulations are enforced, such systems are subject to breaches due to social factors (user misuse) and technical factors (vulnerabilities in software). Breach reports, often legally mandated, describe what went wrong during a breach and how the breach was remedied. However, breach reports are not formally investigated in current practice, so valuable lessons about past failures are lost.

Objective: Our research aim is to aid security analysts and software developers in obtaining a set of legal, security, and privacy requirements by developing a crowdsourcing methodology that extracts knowledge from regulations and breach reports.

Method: We present Çorba, a methodology that leverages human intelligence via crowdsourcing to extract requirements from textual artifacts in the form of regulatory norms. We evaluate Çorba on US healthcare regulations from the Health Insurance Portability and Accountability Act (HIPAA) and on breach reports published by the US Department of Health and Human Services (HHS). Following this methodology, we conducted a pilot study and a final study on the Amazon Mechanical Turk crowdsourcing platform.

Results: Çorba yields high-quality responses from crowd workers, which we analyze to identify requirements that complement the HIPAA regulations. We publish a curated dataset of the worker responses and the identified requirements.

Conclusions: The results show that the instructions and question formats presented to the crowd workers significantly affect the response quality regarding the identification of requirements. We observed significant improvement from the pilot to the final study after revising the instructions and question formats. Other factors, such as worker types, breach types, or report length, do not have a notable effect on the workers' performance. Moreover, we discuss other potential improvements, such as restructuring breach reports and highlighting text with automated methods.

Item Type: Article
DOI/Identification number: 10.1007/s10664-019-09753-2
Uncontrolled keywords: Regulatory norms, Sociotechnical systems, HIPAA
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming > QA76.76.E95 Expert Systems (Intelligent Knowledge Based Systems)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Özgür Kafalı
Date Deposited: 01 Jul 2019 07:50 UTC
Last Modified: 16 Feb 2021 14:05 UTC