Automatic Quote Detection From Literary Work

Güzel Altıntaş, Aybüke

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/13444

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Tekir, Selma	-
dc.contributor.author	Güzel Altıntaş, Aybüke	-
dc.date.accessioned	2023-04-28T08:57:38Z	-
dc.date.available	2023-04-28T08:57:38Z	-
dc.date.issued	2022-12	-
dc.identifier.uri	https://hdl.handle.net/11147/13444	-
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=r4I1HnmXxFQovUpyAyUmxD_By-wyuZ5iIR4eAYzKEKDF-XbBZ7f_cEs8cjN_jZcf	-
dc.description	Thesis (Master)--İzmir Institute of Technology, Computer Engineering, Izmir, 2022	en_US
dc.description	Includes bibliographical references (leaves 44-46)	en_US
dc.description	Text in English; Abstract: Turkish and English	en_US
dc.description.abstract	Literature inspires readers, and readers tend to share quotes from literary works. A reader underlines quotes in a book and shares them on social media or on an online platform used by book readers. In this study, a quote is defined as a span of written text that many readers find interesting and that can be used in different contexts. This study proposes a novel task in the field of Natural Language Processing: the Quote Detection Task. An original dataset was also built by web scraping the Goodreads and Project Gutenberg websites. The quotes come from Goodreads data sourced from Kaggle; only quotes voted for by 10 or more users were selected. These quotes were validated against the books on the Project Gutenberg website. The final dataset consists of 4554 rows, each containing a quote together with its span in the book. The span of a quote consists of the 10 sentences preceding the quote, the quote itself, and the 10 sentences following it. Quote detection is a span detection task that can be modeled with sequence labeling solutions and with neural extractive summarization systems from the literature. Two baselines were therefore run: the statistics-based Conditional Random Field (CRF), for the sequence tagging formulation, and Extractive Summarization as Text Matching (MatchSum). These baselines obtained ROUGE-1 scores of 27.24% and 40.54%, respectively.	en_US
dc.description.abstract	Edebiyat, okuyuculara ilham verir ve okuyucular bir edebi eserdeki özlü sözleri paylaşma eğilimindedirler. Okuyucular bu bölümlerin altını çizer, sosyal medyada ya da kitap okuyucularının kullandığı çevrimiçi bir platformda paylaşır. Bu çalışmadaki alıntı kelimesinin tanımı, yazılı bir metinde birçok okuyucu için ilginç olan bir aralıktır ve okuyucular alıntıyı farklı bağlamlarda kullanabilir. Bu çalışmada, doğal dil işleme alanında alıntı tespit etme görevi önerilmektedir. Bu çalışmada ayrıca, web kazıma yolu ile Goodreads ve Gutenberg web sitelerinden özgün bir veri kümesi derlenmiştir. Alıntılar Kaggle web sitesinden elde edilmiş Goodreads verisidir ve minimum kullanıcı tarafından oylanmış olan veriler seçilmiştir. Bu quote'lar Project Gutenberg web sitesindeki kitaplar ile valide edilmiştir. Final veriseti 4554 satırdan oluşmaktadır. Oluşturulan veri kümesi, alıntı ve alıntıların geçtikleri bağlamları içermektedir. Bir alıntı, alıntıdan önceki 10 cümle, alıntının kendisi ve alıntıdan sonraki 10 cümleden oluşur. Koşullu Rasgele Alanlar (KRA) ve Metin Eşleştirme olarak Çıkarımsal Özet (MatchSum), alıntı çıkarımı için iki farklı dayanak (baseline) olarak çalıştırıldı. Alıntı çıkarma görevi, literatürdeki doğal dil işleme görevlerinden dizi etiketleme görevi altında değerlendirilebilir. Bu dizi etiketleme problemi için, istatistik tabanlı KRA ilk dayanak (baseline) olarak çalıştırılmıştır. Metin Eşleştirme olarak Çıkarımsal Özet dayanağı, bu çalışmanın deneysel kısmı için seçilen ikinci dayanaktır. Bu dayanaklardan sırasıyla %27,24 ve %40,54 Rouge-1 skorları elde edilmiştir.	en_US
dc.format.extent	ix, 60 leaves	-
dc.language.iso	en	en_US
dc.publisher	01. Izmir Institute of Technology	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Natural language processing	en_US
dc.subject	Quote detection	en_US
dc.subject	Span detection	en_US
dc.title	Automatic Quote Detection From Literary Work	en_US
dc.title.alternative	Edebi eserlerden otomatik söz tespiti	en_US
dc.type	Master Thesis	en_US
dc.authorid	0000-0002-8994-7000	-
dc.department	Thesis (Master)--İzmir Institute of Technology, Computer Engineering	en_US
dc.relation.publicationcategory	Tez	en_US
dc.identifier.wosquality	N/A	-
dc.identifier.scopusquality	N/A	-
dc.identifier.yoktezid	780227	en_US
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
item.openairetype	Master Thesis	-
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
item.cerifentitytype	Publications	-
Appears in Collections:	Master Degree / Yüksek Lisans Tezleri

Files in This Item:

File	Description	Size	Format
10515028.pdf		1.58 MB	Adobe PDF
