Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14206
Title: Quote Detection: A New Task and Dataset for NLP
Authors: Tekir, S.
Güzel, A.
Tenekeci, S.
Haman, B.U.
Keywords: Computational linguistics
Natural language processing systems
Auto-regressive
Extractive summarizations
Fine tuning
Gain insight
News summarization
Performance
Qualitative analysis
Random fields
Sequence models
Random processes
Publisher: Association for Computational Linguistics
Abstract: Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing them may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics.
Description: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793
URI: https://hdl.handle.net/11147/14206
ISBN: 9781959429548
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.