Uzun metin belgesi sınıflandırması için yerel dikkat haritalamalarını kullanan transformatörler

Haman, Bekir Ufuk

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14501

Title:	Uzun metin belgesi sınıflandırması için yerel dikkat haritalamalarını kullanan transformatörler Transformers using local attention mappings for long text document classification
Authors:	Haman, Bekir Ufuk
Advisors:	Tekir, Selma
Keywords:	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Computer Engineering and Computer Science and Control
Abstract:	Transformer models are powerful and flexible encoder-decoder structures that have proven their success in many fields, including natural language processing. Although they are especially successful in working with textual input, classifying texts, answering questions, and producing text, they have difficulty processing long texts. Current leading transformer models such as BERT limit input lengths to 512 tokens. The most prominent reason for this limitation is that the self-attention operation, which forms the backbone of the transformer structure, requires high processing power. This processing power requirement, which increases quadratically with the input length, makes it impossible for transformers to process long texts. However, new transformer structures that use various local attention mapping methods have begun to be proposed to overcome the text length challenge. This study first proposes two alternative local attention mapping methods to make transformer models capable of processing long texts. In addition, it presents the 'Refined Patents' dataset consisting of 200,000 patent documents, specifically prepared for the long text document classification task. The proposed attention mapping methods, Term Frequency - Inverse Document Frequency (TF-IDF) and Point Mutual Information (PMI), create a sparse version of the self-attention matrix based on the occurrence statistics of words and word pairs. These methods were implemented based on the Longformer and Big Bird models, and tested on the Refined Patents dataset. Test results show that both proposed approaches are acceptable local attention mapping alternatives and can be used to enable long text processing in transformers. Transformatör modelleri doğal dil işleme dahil olmak üzere, pek çok alanda başarılarını kanıtlamış güçlü ve esnek kodlayıcı çözücü yapılarıdır. Özellikle metinsel girdilerle çalışmak, metinleri sınıflandırmak, soru cevaplamak, metin üretmek konusunda başarılı olsalar da uzun metinleri işlemekte zorlanırlar. BERT gibi mevcut önde gelen transformer modelleri, girdi uzunluklarını 512 kelime ile sınırlamıştır. Bu durumun en öne çıkan sebebi, transformatör yapısının bel kemiğini oluşturan öz dikkat operasyonunun yüksek işlem gücüne ihtiyaç duyuyor olmasıdır. Girdi uzunluğu ile karesel oranda artan bu işlem gücü ihtiyacı, transformerlar için uzun metinlerin işlenmesini imkansız hale getirmektedir. Ancak metin uzunluğu sorununun üstesinden gelmek için çeşitli yerel dikkat haritalandırma yöntemleri kullanan yeni transformatör yapıları önerilmeye başlanmıştır. Bu çalışma öncelikle transformatör modellerini uzun metinleri işleyebilir hale getirmek için iki alternatif lokal dikkat haritalandırması yöntemi önermektedir. Buna ek olarak, uzun metin sınıflandırma görevi için özel olarak hazırlamış ve 200.000 patent dokümanından oluşan 'Refined Patents' verisetini sunar. Önerilen dikkat haritalandırması yöntemleri, Terim Frekansı - Tersine Doküman Frekansı (TF-IDF) ve Noktasal Karşılıklı Bilgi (PMI), kelime ve kelime çiftlerinin görülme istatistiklerinden yola çıkarak öz dikkat matrisinin seyrek halini oluşturarak transformatör modellerinin uzun metinleri işleyebilmesine olanak sağlar. Bu yöntemler, türünün öncü örneklerinden Longformer ve Big Bird modelleri temel alınarak uygulanmış ve Refined Patents veriseti üzerinde üzerinde test edilmiştir. Test sonuçları önerilen iki yaklaşımın da kabul edilebilir lokal dikkat haritalandırması alternatifi olduklarını ve transformatörlerde uzun metin işlenmesini mümkün kılmak için kullanılabileceklerini göstermektedir
URI:	https://hdl.handle.net/11147/14501
Appears in Collections:	Master Degree / Yüksek Lisans Tezleri

Show full item record

CORE Recommender

Page view(s)

6

checked on May 6, 2024

Google Scholar^TM

Check

Page view(s)

Google ScholarTM

Google Scholar^TM