
Automatic Information Extraction from Electronic Documents using Machine Learning

Kamaleson, Nishanthan, Chu, Dominique, Otero, Fernando E.B. (2021) Automatic Information Extraction from Electronic Documents using Machine Learning. In: Lecture Notes in Computer Science. 41st SGAI International Conference on Artificial Intelligence, AI 2021, Cambridge, UK, December 14–16, 2021, Proceedings. 13101. pp. 183-194. Springer ISBN 978-3-030-91099-0. E-ISBN 978-3-030-91100-3. (doi:10.1007/978-3-030-91100-3_16) (KAR id:91696)

PDF Author's Accepted Manuscript
Language: English


Official URL: https://doi.org/10.1007/978-3-030-91100-3_16

Abstract

The digital processing of electronic documents is widely exploited across many domains to improve the efficiency of information extraction. However, paper documents are still widely used in practice. To process such documents, they are inspected manually to extract the values of interest. As this task is monotonous and time-consuming, it is prone to human error. In this paper, we present an efficient and robust system that automates this task by combining machine learning techniques: optical character recognition, object detection, and image processing. This not only speeds up the process but also improves the accuracy of the extracted information compared to a manual procedure.

Item Type: Conference or workshop item (Proceeding)
DOI/Identification number: 10.1007/978-3-030-91100-3_16
Uncontrolled keywords: OCR, Layout analysis, Image detection, Information extraction
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Fernando Otero
Date Deposited: 23 Nov 2021 11:00 UTC
Last Modified: 10 Feb 2022 11:44 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/91696 (The current URI for this page, for reference purposes)
Chu, Dominique: https://orcid.org/0000-0002-3706-2905
Otero, Fernando E.B.: https://orcid.org/0000-0003-2172-297X