Goes, Fabricio, Sawicki, Piotr, Grześ, Marek, Brown, Dan, Volpe, Marco (2023) Is GPT-4 good enough to evaluate jokes? In: International Conference on Computational Creativity (ICCC), Waterloo, Canada. (In press) (KAR id:101552)
Full text: PDF (Author's Accepted Manuscript, English, 212 kB)
Abstract
In this paper, we investigate the ability of large language models (LLMs), specifically GPT-4, to assess the funniness of jokes in comparison to human ratings. We use a dataset of jokes annotated with human ratings and explore different system descriptions in GPT-4 to imitate human judges with various types of humour. We propose a novel method to create a system description using many-shot prompting, providing numerous examples of jokes and their evaluation scores. Additionally, we examine the performance of different system descriptions when given varying amounts of instructions and examples on how to evaluate jokes. Our main contributions include a new method for creating a system description in LLMs to evaluate jokes and a comprehensive methodology to assess LLMs' ability to evaluate jokes using rankings rather than individual scores.
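As a rough illustration only, and not the authors' code, the sketch below shows what the two components described in the abstract could look like in practice: a system description assembled many-shot style from example jokes with human ratings, and a comparison of GPT-4's scores against human ratings by rank correlation rather than by raw score agreement. It assumes the pre-1.0 `openai` Python client (contemporary with the paper) and SciPy; the 1–5 rating scale, the prompt wording, and the helper names (`build_system_description`, `gpt4_score`, `rank_agreement`) are all illustrative assumptions.

```python
# Hypothetical sketch of the pipeline described in the abstract; the exact
# prompts, scale, and model settings in the paper may differ.
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment
from scipy.stats import spearmanr


def build_system_description(rated_examples):
    """Compose a many-shot system description from (joke, human_score) pairs."""
    lines = [
        "You are a human judge rating how funny jokes are on a scale of 1 to 5.",
        "Here are example jokes with the scores a judge like you gave:",
    ]
    for joke, score in rated_examples:
        lines.append(f'Joke: "{joke}" -> Score: {score}')
    lines.append("Rate the next joke the same way. Reply with a single number.")
    return "\n".join(lines)


def gpt4_score(system_description, joke):
    """Ask GPT-4 to score one joke under the many-shot system description."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,  # deterministic-ish scoring
        messages=[
            {"role": "system", "content": system_description},
            {"role": "user", "content": f'Joke: "{joke}"'},
        ],
    )
    # Assumes the model complies and replies with a bare number.
    return float(response.choices[0].message.content.strip())


def rank_agreement(human_scores, model_scores):
    """Compare rankings (not individual scores) via Spearman correlation."""
    rho, p_value = spearmanr(human_scores, model_scores)
    return rho, p_value
```

Comparing via `spearmanr` reflects the abstract's emphasis on rankings: it asks whether GPT-4 orders jokes the way human raters do, which is robust to the model using a systematically shifted or compressed part of the scale.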
Item Type: | Conference or workshop item (Poster) |
---|---|
Uncontrolled keywords: | Creativity; GPT-4; LLMs; NLP |
Subjects: | Q Science > Q Science (General) > Q335 Artificial intelligence |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Funders: | University of Kent (https://ror.org/00xkeyj56) |
Depositing User: | Piotr Sawicki |
Date Deposited: | 05 Jun 2023 17:28 UTC |
Last Modified: | 05 Nov 2024 13:07 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/101552 |