
Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews

Kavanagh, James, Greenhow, Keith, Jordanous, Anna (2023) Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews. In: 2023 IEEE 17th International Conference on Semantic Computing. pp. 235-238. IEEE (doi:10.1109/ICSC56153.2023.00046) (KAR id:99390)

PDF Author's Accepted Manuscript
Language: English
Official URL: https://doi.org/10.1109/ICSC56153.2023.00046

Abstract

With many options for text preprocessing techniques, choosing the most efficient methodology is important for both accuracy and computational expense. Online text often contains non-standard English, spelling errors, colloquialisms, emojis, slang and many other variations that affect current natural language processing tools, and there are no clear guidelines for preprocessing this type of text. In this work we analyse text preprocessing techniques using a dataset of online reviews scraped from the iTunes and Google Play stores. The objective is to measure how effectively different combinations of these techniques maximise the amount of sentiment detected across 438,157 reviews. Sentiment detection was performed by two state-of-the-art sentiment analysers (RoBERTa and VADER). Statistical analysis of the results suggests preprocessing strategies for maximising the sentiment detected within mental health app reviews and similar text formats.
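The abstract does not state how combinations were enumerated or how "detected sentiment" was counted. The following is a minimal sketch, not the authors' implementation, assuming that detected sentiment means the share of reviews with a non-zero sentiment score, and that every subset of preprocessing steps is tried. The dictionaries TOY_SPELLING, TOY_LEMMAS and TOY_LEXICON and all function names are hypothetical stand-ins for a real spell checker, lemmatiser and analyser such as VADER or RoBERTa.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical toy resources standing in for a real spell checker,
# lemmatiser and sentiment analyser (e.g. VADER or RoBERTa); not from the paper.
TOY_SPELLING: Dict[str, str] = {"luv": "love", "gr8": "great", "crashs": "crashes"}
TOY_LEMMAS: Dict[str, str] = {"crashes": "crash", "loved": "love", "helped": "help"}
TOY_LEXICON: Dict[str, float] = {"love": 1.0, "great": 1.0, "help": 0.5, "crash": -1.0}


def spell_check(tokens: List[str]) -> List[str]:
    """Replace known misspellings with their corrected form."""
    return [TOY_SPELLING.get(t, t) for t in tokens]


def lemmatise(tokens: List[str]) -> List[str]:
    """Map inflected forms to a base form (lemma)."""
    return [TOY_LEMMAS.get(t, t) for t in tokens]


# Steps are applied in this order whenever both are selected.
STEPS: Dict[str, Callable[[List[str]], List[str]]] = {
    "spell_check": spell_check,
    "lemmatise": lemmatise,
}


def score(tokens: List[str]) -> float:
    """Sum lexicon polarities; 0.0 is treated as no sentiment detected."""
    return sum(TOY_LEXICON.get(t, 0.0) for t in tokens)


def detected_share(reviews: List[str], pipeline: Tuple[str, ...]) -> float:
    """Fraction of reviews whose net score is non-zero after preprocessing."""
    hits = 0
    for review in reviews:
        tokens = review.lower().split()
        for name in pipeline:
            tokens = STEPS[name](tokens)
        if score(tokens) != 0.0:
            hits += 1
    return hits / len(reviews)


def evaluate_all(reviews: List[str]) -> Dict[Tuple[str, ...], float]:
    """Evaluate every subset of STEPS, from no preprocessing to all steps."""
    names = list(STEPS)
    results: Dict[Tuple[str, ...], float] = {}
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            results[combo] = detected_share(reviews, combo)
    return results


if __name__ == "__main__":
    reviews = ["luv this app", "it crashs a lot", "ok"]
    for combo, share in evaluate_all(reviews).items():
        print(combo or ("none",), round(share, 3))
```

Step order matters in this sketch: with both steps selected, spelling correction runs first, so "crashs" becomes "crashes" and then "crash". On the three toy reviews, spelling correction alone finds sentiment in one review, while the combined pipeline finds it in two.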

Item Type: Conference or workshop item (Paper)
DOI/Identification number: 10.1109/ICSC56153.2023.00046
Additional information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled keywords: NLP, Language parsing and understanding, Web text analysis, Sentiment analysis
Subjects: P Language and Literature > P Philology. Linguistics > P87 Communication. Mass Media
Q Science > Q Science (General) > Q335 Artificial intelligence
Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76 Computer software > QA76.76.I59 Interactive media, hypermedia
Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.9.H85 Human computer interaction
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4045 Electronic information resources
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Anna Jordanous
Date Deposited: 03 Jan 2023 19:14 UTC
Last Modified: 21 Sep 2023 09:00 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/99390 (The current URI for this page, for reference purposes)
Kavanagh, James: https://orcid.org/0000-0001-7822-4969
Greenhow, Keith: https://orcid.org/0000-0001-9263-5086
Jordanous, Anna: https://orcid.org/0000-0003-2076-8642
