Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/15405
Full metadata record
DC Field | Value | Language
dc.contributor.author | Acar, A. | -
dc.contributor.author | Tekir, S. | -
dc.date.accessioned | 2025-02-25T20:01:08Z | -
dc.date.available | 2025-02-25T20:01:08Z | -
dc.date.issued | 2025 | -
dc.identifier.issn | 2375-4699 | -
dc.identifier.uri | https://doi.org/10.1145/3706105 | -
dc.identifier.uri | https://hdl.handle.net/11147/15405 | -
dc.description.abstract | Counterfactual statements are examples of causal reasoning as they describe events that did not happen and, optionally, those events' consequences if they happened. SemEval-2020 introduces the counterfactual detection (CFD) task and shares an English dataset. Since then, a set of datasets has been released in English, German, and Japanese as part of Amazon product reviews. This work releases the first Turkish corpus of counterfactuals (TRCD). The data collection process is driven by a clue phrase list of counterfactuals, mainly in the form of verb inflections in Turkish. We use clue phrase-based filtering to collect sentences from the Turkish National Corpus (TNC). On the other hand, half of the collection is subject to random word filtering to avoid selection bias due to clue phrases. After the human annotation process with an Inter Annotator Agreement of 0.65, we have 5000 sentences, of which 12.8% contain counterfactual statements. Furthermore, we provide a comprehensive baseline of transformer-based models by testing the effect of clue phrases, cross-lingual performance comparisons using the available CFD datasets, and zero-shot cross-lingual classification experiments using fine-tuning on the different combinations of the existing datasets. The results confirm that TRCD is compatible with the other CFD datasets. Moreover, fine-tuning a Turkish-specific model (BERTurk) performs better than the multilingual alternatives (mBERT and XLM-R). BERTurk is more robust to clue phrase masking. This result emphasizes the importance of a language-specific tokenizer for contextual understanding, especially for low-resource languages. Finally, our qualitative analysis gives insights into errors by different models. © 2025 Copyright held by the owner/author(s). | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computing Machinery | en_US
dc.relation.ispartof | ACM Transactions on Asian and Low-Resource Language Information Processing | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | Berturk | en_US
dc.subject | Corpus | en_US
dc.subject | Counterfactual Detection | en_US
dc.subject | Multilingual Transformers | en_US
dc.subject | Turkish | en_US
dc.title | Recognition of Counterfactual Statements in Turkish | en_US
dc.type | Article | en_US
dc.department | İzmir Institute of Technology | en_US
dc.identifier.volume | 24 | en_US
dc.identifier.issue | 1 | en_US
dc.identifier.scopus | 2-s2.0-85216341068 | -
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US
dc.identifier.doi | 10.1145/3706105 | -
dc.authorscopusid | 57349914100 | -
dc.authorscopusid | 16234844500 | -
dc.identifier.wosquality | Q4 | -
dc.identifier.scopusquality | Q2 | -
item.languageiso639-1 | en | -
item.fulltext | No Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.grantfulltext | none | -
item.openairetype | Article | -
crisitem.author.dept | 03.04. Department of Computer Engineering | -
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.