Bennett Jr, F.G.;
(2008)
Reconstructing financial statements.
Presented at: DESI II: Second International Workshop on Supporting Search and Sensemaking for Electronically Stored Information in Discovery Proceedings, University College London, UK.
Abstract
This paper introduces a tool for the reconstruction and validation of categorized totals embedded in untrusted and unformatted text, such as OCR scans of financial statements. The tool is a spinoff of academic research into the funding of Japanese third-sector organizations, the annual reports of which are frequently published in the form of PDF files containing document images. A number of techniques at the string, line, and document level are used to resolve ambiguities and obtain the greatest possible recovery rate for the underlying data, while excluding the content of untrustworthy documents from the final sample. In a preliminary trial "in the wild", the tool has returned validated income totals for 47.9% of the documents in a heterogeneous set of 2205 annual reports.
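To make the idea of a "validated total" concrete, the following is a minimal Python sketch. It is not the tool described in the paper, whose implementation this record does not show. It assumes a simplified input in which each line ends with an amount and the total line begins with the label "Total"; the names `parse_line` and `validate_total` and the regular expression are hypothetical, chosen only for illustration.

```python
import re
from typing import List, Optional, Tuple

# Assumption: each useful line is "<label> <amount>", where the amount is a
# plain integer or uses comma thousands separators (e.g. "1,200").
AMOUNT_RE = re.compile(r"^(?P<label>.*?)\s*(?P<amount>\d{1,3}(?:,\d{3})+|\d+)\s*$")


def parse_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (label, amount) for a line ending in an amount, else None."""
    m = AMOUNT_RE.match(line.strip())
    if m is None:
        return None
    return m.group("label").strip(), int(m.group("amount").replace(",", ""))


def validate_total(lines: List[str], total_label: str = "Total") -> Optional[int]:
    """Return the stated total only if the preceding items sum to it.

    Returning None stands in for excluding an untrustworthy document.
    """
    items: List[int] = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines with no trailing amount
        label, amount = parsed
        if label.lower().startswith(total_label.lower()):
            return amount if sum(items) == amount else None
        items.append(amount)
    return None  # no total line found


if __name__ == "__main__":
    good = ["Membership fees 1,200", "Donations 300", "Total 1,500"]
    bad = ["Membership fees 1,200", "Donations 800", "Total 1,500"]
    print(validate_total(good))
    print(validate_total(bad))
```

In this sketch a document whose line items do not add up to the stated total yields `None` rather than a guessed figure, which mirrors the abstract's aim of keeping untrustworthy documents out of the final sample.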
Type: | Conference item (Presentation) |
---|---|
Title: | Reconstructing financial statements |
Event: | DESI II: Second International Workshop on Supporting Search and Sensemaking for Electronically Stored Information in Discovery Proceedings |
Location: | University College London, UK |
Dates: | June 25, 2008 |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | http://www.cs.ucl.ac.uk/staff/S.Attfield/desi/DESI... |
Language: | English |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/9137 |