Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14206
Title: Quote Detection: A New Task and Dataset for NLP
Authors: Tekir, S.
Güzel, A.
Tenekeci, S.
Haman, B.U.
Keywords: Computational linguistics
Natural language processing systems
Auto-regressive
Extractive summarization
Fine-tuning
Gain insight
News summarization
Performance
Qualitative analysis
Random fields
Sequence models
Random processes
Publisher: Association for Computational Linguistics
Abstract: Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing them may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random fields (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics.
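The abstract frames quote detection as span detection, with a CRF sequence labeler as one baseline. A minimal sketch of how such a span can be encoded as BIO tags over tokens (the sentence, span indices, and function name below are hypothetical illustrations, not from the paper):

```python
# Hypothetical sketch: encoding a quote span as BIO tags, the usual
# input/output representation for a CRF-style span-detection baseline.

def bio_tags(tokens, span):
    """Label tokens with B-/I-QUOTE inside the half-open (start, end) span, O elsewhere."""
    start, end = span
    tags = []
    for i, _ in enumerate(tokens):
        if i == start:
            tags.append("B-QUOTE")
        elif start < i < end:
            tags.append("I-QUOTE")
        else:
            tags.append("O")
    return tags

tokens = "He said : Be yourself ; everyone else is taken .".split()
tags = bio_tags(tokens, (3, 11))  # assumed quote span covers tokens 3..10
```

A tagger trained on such sequences predicts, for each token, whether it begins, continues, or lies outside a quote span.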
Description: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023
URI: https://hdl.handle.net/11147/14206
ISBN: 9781959429548
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.