Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14206
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tekir, S. | - |
dc.contributor.author | Güzel, A. | - |
dc.contributor.author | Tenekeci, S. | - |
dc.contributor.author | Haman, B.U. | - |
dc.date.accessioned | 2024-01-06T07:22:37Z | - |
dc.date.available | 2024-01-06T07:22:37Z | - |
dc.date.issued | 2023 | - |
dc.identifier.isbn | 9781959429548 | - |
dc.identifier.uri | https://hdl.handle.net/11147/14206 | - |
dc.description | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793 | en_US |
dc.description.abstract | Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, this may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random fields (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Association for Computational Linguistics | en_US |
dc.relation.ispartof | EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023 | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Computational linguistics | en_US |
dc.subject | Natural language processing systems | en_US |
dc.subject | Auto-regressive | en_US |
dc.subject | Extractive summarizations | en_US |
dc.subject | Fine tuning | en_US |
dc.subject | Gain insight | en_US |
dc.subject | News summarization | en_US |
dc.subject | Performance | en_US |
dc.subject | Qualitative analysis | en_US |
dc.subject | Random fields | en_US |
dc.subject | Sequence models | en_US |
dc.subject | Random processes | en_US |
dc.title | Quote Detection: A New Task and Dataset for NLP | en_US |
dc.type | Conference Object | en_US |
dc.institutionauthor | … | - |
dc.department | İzmir Institute of Technology | en_US |
dc.identifier.startpage | 21 | en_US |
dc.identifier.endpage | 27 | en_US |
dc.identifier.scopus | 2-s2.0-85175428867 | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.authorscopusid | 16234844500 | - |
dc.authorscopusid | 58675151700 | - |
dc.authorscopusid | 57340107000 | - |
dc.authorscopusid | 58675886200 | - |
item.openairetype | Conference Object | - |
item.cerifentitytype | Publications | - |
item.fulltext | No Fulltext | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.grantfulltext | none | - |
item.languageiso639-1 | en | - |
crisitem.author.dept | 03.04. Department of Computer Engineering | - |
Appears in Collections: | Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection |