Recognition of Counterfactual Statements in Turkish

Acar, Ali; Tekir, Selma

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/15405

Full metadata record

DC Field	Value	Language
dc.contributor.author	Acar, Ali	-
dc.contributor.author	Tekir, Selma	-
dc.date.accessioned	2025-02-25T20:01:08Z	-
dc.date.available	2025-02-25T20:01:08Z	-
dc.date.issued	2025	-
dc.identifier.issn	2375-4699	-
dc.identifier.issn	2375-4702	-
dc.identifier.uri	https://doi.org/10.1145/3706105	-
dc.description.abstract	Counterfactual statements are examples of causal reasoning: they describe events that did not happen and, optionally, the consequences those events would have had. SemEval-2020 introduced the counterfactual detection (CFD) task and shared an English dataset. Since then, further datasets drawn from Amazon product reviews have been released in English, German, and Japanese. This work releases the first Turkish corpus of counterfactuals (TRCD). The data collection process is driven by a list of counterfactual clue phrases, mainly in the form of Turkish verb inflections. We use clue phrase-based filtering to collect sentences from the Turkish National Corpus (TNC). To avoid selection bias due to clue phrases, half of the collection is instead gathered by random word filtering. After human annotation with an inter-annotator agreement of 0.65, the corpus contains 5000 sentences, of which 12.8% contain counterfactual statements. Furthermore, we provide a comprehensive baseline of transformer-based models by testing the effect of clue phrases, comparing cross-lingual performance on the available CFD datasets, and running zero-shot cross-lingual classification experiments with fine-tuning on different combinations of the existing datasets. The results confirm that TRCD is compatible with the other CFD datasets. Moreover, fine-tuning a Turkish-specific model (BERTurk) performs better than the multilingual alternatives (mBERT and XLM-R), and BERTurk is more robust to clue phrase masking. This result emphasizes the importance of a language-specific tokenizer for contextual understanding, especially for low-resource languages. Finally, our qualitative analysis gives insights into the errors made by different models.	en_US
dc.language.iso	en	en_US
dc.publisher	Assoc Computing Machinery	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Turkish	en_US
dc.subject	Corpus	en_US
dc.subject	Counterfactual Detection	en_US
dc.subject	Multilingual Transformers	en_US
dc.subject	BERTurk	en_US
dc.title	Recognition of Counterfactual Statements in Turkish	en_US
dc.type	Article	en_US
dc.department	İzmir Institute of Technology	en_US
dc.identifier.volume	24	en_US
dc.identifier.issue	1	en_US
dc.identifier.wos	WOS:001416741200007	-
dc.identifier.scopus	2-s2.0-85216341068	-
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.identifier.doi	10.1145/3706105	-
dc.identifier.wosquality	Q4	-
dc.identifier.scopusquality	Q2	-
dc.description.woscitationindex	Science Citation Index Expanded	-
item.grantfulltext	none	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.fulltext	No Fulltext	-
item.cerifentitytype	Publications	-
item.languageiso639-1	en	-
item.openairetype	Article	-
crisitem.author.dept	01. Izmir Institute of Technology	-
crisitem.author.dept	03.04. Department of Computer Engineering	-
Appears in Collections:	Scopus Indexed Publications Collection; WoS Indexed Publications Collection


