Selecting textual analysis tools to classify sustainability information in corporate reporting

Maibaum, Frederik; Kriebel, Johannes; Foege, Johann Nils


Abstract

Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information in a text. This practice is problematic, as different NLP techniques extract different information from the same source. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries vary widely in quality, topic models outperform the other approaches that do not rely on large language models, and large language models show the strongest performance overall. Among large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have recently surpassed earlier approaches when using well-designed prompts and the most recent models.

Keywords
Sustainability; Natural language processing; Corporate reporting; Performance evaluation; ChatGPT



Publication type
Research article (journal)

Peer reviewed
Yes

Publication status
Published

Year
2024

Journal
Decision Support Systems

Volume
(in press)

Language
English

ISSN
0167-9236
