Tension in big data using machine learning: Analysis and applications

Wang, Huamao, Yao, Yumei, Salhi, Said (2020) Tension in big data using machine learning: Analysis and applications. Technological Forecasting and Social Change, 158. Article Number 120175. ISSN 0040-1625. (doi:10.1016/j.techfore.2020.120175) (KAR id:82537)

PDF:	Author's Accepted Manuscript
Language:	English
Licence:	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://doi.org/10.1016/j.techfore.2020.120175

Abstract

The availability of machine learning techniques in popular programming languages, together with exponentially expanding big data from social media, news, surveys, and markets, provides exciting challenges and invaluable opportunities for organizations and individuals to explore implicit information for decision making. Nevertheless, users of machine learning often find that these sophisticated techniques create a high level of tension, caused by, among other factors, the choice of an appropriate size for the training data set. In this paper, we provide a systematic way of resolving such tensions by examining practical examples of predicting the popularity and sentiment of posts on Twitter and Facebook, blogs on Mashable, news on Google and Yahoo, the US house survey, and Bitcoin prices. The results show that, for big data, using around 20% of the full sample often leads to better prediction accuracy than using the full sample. This conclusion is consistent across a series of experiments. The managerial implication is that using more data is not necessarily best: users need to be cautious about this sensitivity to training set size, because the simplistic approach may easily lead to inferior solutions with potentially detrimental consequences.
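The kind of comparison the abstract describes, prediction accuracy as a function of how much of the training sample is used, can be sketched as follows. This is an illustration only and not the authors' experimental code: the synthetic data, the nearest-centroid classifier, and the names `make_data`, `fit_centroids`, `accuracy`, and `accuracy_by_fraction` are all assumptions made for the example. On this simple synthetic problem there is no reason to expect the 20% subsample to win; the sketch only shows how such an experiment might be set up.

```python
import random


def make_data(n, seed):
    """Synthetic two-class data (hypothetical): one noisy feature per row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)  # class means 0.0 and 2.0, unit variance
        rows.append((x, label))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': the mean feature value of each class."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in rows:
        sums[y] += x
        counts[y] += 1
    # Skip a class if the subsample happens to contain no example of it.
    return {c: sums[c] / counts[c] for c in sums if counts[c] > 0}


def accuracy(centroids, rows):
    """Share of rows whose nearest centroid matches the true label."""
    correct = 0
    for x, y in rows:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += pred == y
    return correct / len(rows)


def accuracy_by_fraction(train, test, fractions, seed=0):
    """Train on a random subsample of each size; score on one fixed test set."""
    rng = random.Random(seed)
    results = {}
    for frac in fractions:
        k = max(2, int(len(train) * frac))
        subset = rng.sample(train, k)
        results[frac] = accuracy(fit_centroids(subset), test)
    return results


if __name__ == "__main__":
    train = make_data(5000, seed=1)
    test = make_data(1000, seed=2)
    scores = accuracy_by_fraction(train, test, [0.05, 0.2, 0.5, 1.0])
    for frac, acc in scores.items():
        print(f"{frac:.0%} of training data -> accuracy {acc:.3f}")
```

Holding the test set fixed while varying only the training fraction keeps the scores comparable across sizes. In practice one would repeat the subsampling with several seeds and compare average accuracies before drawing any conclusion about the best sample size.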

Item Type:	Article
DOI/Identification number:	10.1016/j.techfore.2020.120175
Uncontrolled keywords:	Big data, Machine learning, Data size, Prediction accuracy, Social media
Subjects:	H Social Sciences
Institutional Unit:	Schools > Kent Business School
Former Institutional Unit:	Divisions > Kent Business School - Division > Department of Analytics, Operations and Systems
Depositing User:	Said Salhi
Date Deposited:	21 Aug 2020 10:23 UTC
Last Modified:	28 Apr 2026 09:13 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/82537 (The current URI for this page, for reference purposes)

University of Kent Author Information

Salhi, Said.

Creator's ORCID:	https://orcid.org/0000-0002-3384-5240