Please use this identifier to cite or link to this item:
https://hdl.handle.net/11147/14206
Title: Quote Detection: A New Task and Dataset for NLP
Authors: Tekir, S.; Güzel, A.; Tenekeci, S.; Haman, B.U.
Keywords: Computational linguistics; Natural language processing systems; Auto-regressive; Extractive summarizations; Fine tuning; Gain insight; News summarization; Performance; Qualitative analysis; Random fields; Sequence models; Random processes
Publisher: Association for Computational Linguistics
Abstract: Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, it may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: Conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics.
Description: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793
URI: https://hdl.handle.net/11147/14206
ISBN: 9781959429548
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.