Kent Academic Repository

Inferring causation from big data in the social sciences

Ghiara, Virginia (2019) Inferring causation from big data in the social sciences. Doctor of Philosophy (PhD) thesis, University of Kent. (KAR id:75419)


The emergence of big data has become a central theme in scientific and philosophical discussions. A prominent view in the literature is that big data can drastically change the way in which causal studies are conducted. My thesis aims to explore how big data can be used to establish causal relationships in the social sciences. The beginning of the thesis will focus on data-driven studies and will investigate some of the limitations that characterise this type of study. This analysis will lead me to identify three key challenges of big data for causal studies in the social sciences. The first challenge is how to overcome the limitations of data-driven causal studies. This challenge is motivated by the observation that data-driven causal methods, however sophisticated, can suffer from bias. The second challenge is how to understand the role of ethnographic, qualitative data in causal studies based on big data. This challenge appears vital in the social sciences, where some researchers remain hesitant about the use of data-driven methods and defend the importance of qualitative, 'thick' data. The third challenge is how to use big data, in the social sciences, to obtain evidence of causality that goes beyond correlations. This challenge is strongly associated with the idea that, in order to establish causation, both a correlation between the cause and the effect and a mechanism linking the cause and the effect need to be established. This idea, originally proposed by Russo and Williamson (2007) and known as the Russo-Williamson thesis, will be discussed in detail to provide a solution to the first challenge. I will argue that researchers should comply with this thesis to overcome the limitations of data-driven causal studies in the social sciences.
Next, I shall examine the discussions on mixed methods research to claim that qualitative ethnographic data can be used both to collect evidence of social mechanisms, and to help researchers to obtain a comprehensive understanding of the phenomenon under study. Finally, I shall argue that big data can be used, in specific circumstances, to collect evidence of entities and activities constituting causal mechanisms, and that big data might be used to identify sociomarkers, the social version of biomarkers, to trace causal processes that evolve over time.

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Williamson, Jon
Thesis advisor: Corfield, David
Uncontrolled keywords: Philosophy Causality Big data Mechanisms
Subjects: H Social Sciences > HA Statistics
Divisions: Divisions > Division of Arts and Humanities > School of Culture and Languages
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 18 Jul 2019 12:10 UTC
Last Modified: 11 Dec 2022 17:46 UTC

University of Kent Author Information

Ghiara, Virginia.
