Kavanagh, James, Greenhow, Keith, Jordanous, Anna (2023) Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews. In: 2023 IEEE 17th International Conference on Semantic Computing. pp. 235-238. IEEE (doi:10.1109/ICSC56153.2023.00046) (KAR id:99390)
PDF
Author's Accepted Manuscript
Language: English
Official URL: https://doi.org/10.1109/ICSC56153.2023.00046
Abstract
With many options for text preprocessing techniques, choosing the most efficient methodology is important for both accuracy and computational expense. Online text often contains non-standard English, spelling errors, colloquialisms, emojis, slang and many other variations that affect current natural language processing tools, and there are no clear guidelines for preprocessing this type of text. In this work we analyse text preprocessing techniques using a dataset of online reviews scraped from iTunes and the Google Play store. The objective is to measure the efficacy of different combinations of these techniques in maximising the amount of sentiment detected in a dataset of 438,157 reviews. Sentiment detection was performed by two state-of-the-art sentiment analysers (RoBERTa and VADER). Statistical analysis of the results suggests preprocessing strategies for maximising the sentiment detected within mental health app reviews and similar text formats.
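The comparison described in the abstract can be pictured with a small sketch: run every combination of preprocessing steps over a set of reviews and record the fraction of reviews in which any sentiment is detected. Everything below is an illustrative assumption rather than the authors' method. The lookup tables `SPELL_FIXES`, `LEMMAS` and `LEXICON`, the helper names, and the five sample reviews are hypothetical, and the toy lexicon scorer merely stands in for the RoBERTa and VADER analysers used in the paper.

```python
from itertools import combinations

# Hypothetical toy resources; the paper's actual tools and data are not shown here.
SPELL_FIXES = {"gret": "great", "awfull": "awful"}
LEMMAS = {"loved": "love", "crashes": "crash"}
LEXICON = {"great": 1.0, "love": 1.0, "awful": -1.0, "crash": -0.5}


def spell_check(tokens):
    """Replace known misspellings (toy stand-in for a real spell checker)."""
    return [SPELL_FIXES.get(t, t) for t in tokens]


def lemmatise(tokens):
    """Map inflected forms to a base form (toy stand-in for a real lemmatiser)."""
    return [LEMMAS.get(t, t) for t in tokens]


STEPS = {"spell_check": spell_check, "lemmatise": lemmatise}


def score(tokens):
    """Toy lexicon score; a stand-in for a real sentiment analyser."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def detected_fraction(reviews, step_names):
    """Fraction of reviews with a non-zero score after applying the given steps."""
    detected = 0
    for review in reviews:
        tokens = review.lower().split()
        for name in step_names:
            tokens = STEPS[name](tokens)
        if score(tokens) != 0.0:
            detected += 1
    return detected / len(reviews)


def compare_pipelines(reviews):
    """Evaluate every subset of preprocessing steps, applied in a fixed order."""
    names = list(STEPS)
    results = {}
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            results[combo] = detected_fraction(reviews, combo)
    return results


# Hypothetical sample reviews, not taken from the paper's dataset.
REVIEWS = ["gret app", "loved it", "it crashes", "awfull", "fine"]

if __name__ == "__main__":
    for combo, fraction in compare_pipelines(REVIEWS).items():
        print(combo or ("none",), fraction)
```

In this toy setting each step recovers sentiment that the other misses, so the combined pipeline detects the most. Whether that holds for real reviews is the empirical question the paper addresses.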
Item Type: | Conference or workshop item (Paper) |
---|---|
DOI/Identification number: | 10.1109/ICSC56153.2023.00046 |
Additional information: | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Uncontrolled keywords: | NLP, Language parsing and understanding, Web text analysis, Sentiment analysis |
Subjects: | P Language and Literature > P Philology. Linguistics > P87 Communication. Mass Media; Q Science > Q Science (General) > Q335 Artificial intelligence; Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76 Computer software > QA76.76.I59 Interactive media, hypermedia; Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.9.H85 Human computer interaction; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science; Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4045 Electronic information resources |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Anna Jordanous |
Date Deposited: | 03 Jan 2023 19:14 UTC |
Last Modified: | 21 Sep 2023 09:00 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/99390 (The current URI for this page, for reference purposes) |