Automating modern code review processes with code similarity measurement

Kartal,Y.; Akdeniz,E.K.; Özkan,K.

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14571

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kartal,Y.	-
dc.contributor.author	Akdeniz,E.K.	-
dc.contributor.author	Özkan,K.	-
dc.date.accessioned	2024-06-19T14:29:41Z	-
dc.date.available	2024-06-19T14:29:41Z	-
dc.date.issued	2024	-
dc.identifier.issn	0950-5849	-
dc.identifier.uri	https://doi.org/10.1016/j.infsof.2024.107490	-
dc.identifier.uri	https://hdl.handle.net/11147/14571	-
dc.description.abstract	Context: Modern code review is a critical component of software development processes, as it ensures security, detects errors early, and improves code quality. However, manual reviews can be time-consuming and unreliable. Automated code review can address these issues. Although deep-learning methods have been used to recommend code review comments, they are expensive to train and employ. Instead, information retrieval (IR)-based methods for automatic code review are showing promising results in efficiency, effectiveness, and flexibility. Objective: Our main objective is to determine the optimal combination of vectorization method and similarity measure that gives the best results in automatic code review, thereby improving the performance of IR-based methods. Method: Specifically, we investigate vectorization methods (Word2Vec, Doc2Vec, Code2Vec, and Transformer) that differ from those used in previous research (TF-IDF and Bag-of-Words), and similarity measures (Cosine, Euclidean, and Manhattan) to capture the semantic similarities between code texts. We evaluate the performance of these methods using standard metrics, such as BLEU, METEOR, and ROUGE-L, and include the run-time of the models in our results. Results: Our results demonstrate that the Transformer model outperforms the state-of-the-art method in all standard metrics and similarity measurements, achieving a 19.1% improvement in providing exact matches and a 6.2% improvement in recommending reviews closer to human reviews. Conclusion: Our findings suggest that the Transformer model is a highly effective and efficient approach for recommending code review comments that closely resemble those written by humans, providing valuable insight for developing more efficient and effective automated code review systems. © 2024 Elsevier B.V.	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier B.V.	en_US
dc.relation.ispartof	Information and Software Technology	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Code similarity	en_US
dc.subject	Information retrieval	en_US
dc.subject	Modern code review	en_US
dc.subject	Vectorization	en_US
dc.title	Automating modern code review processes with code similarity measurement	en_US
dc.type	Article	en_US
dc.department	Izmir Institute of Technology	en_US
dc.identifier.volume	173	en_US
dc.identifier.wos	WOS:001245336700001	-
dc.identifier.scopus	2-s2.0-85193900630	-
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.identifier.doi	10.1016/j.infsof.2024.107490	-
dc.authorscopusid	24490853600	-
dc.authorscopusid	58635463000	-
dc.authorscopusid	15081108900	-
dc.identifier.wosquality	N/A	-
dc.identifier.scopusquality	N/A	-
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
item.openairetype	Article	-
Appears in Collections:	Scopus Indexed Publications Collection; WoS Indexed Publications Collection
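
The IR-based approach summarized in the abstract (vectorize code, compare vectors with a similarity measure, reuse the review comment of the closest known code) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the token-count vectorizer is a toy stand-in and not one of the embeddings studied in the paper, and the `CORPUS` entries and function names are hypothetical. Only the three measures (Cosine, Euclidean, Manhattan) come from the record.

```python
import math
import re
from collections import Counter

# Hypothetical corpus of (code snippet, review comment) pairs, invented for illustration.
CORPUS = [
    ("for i in range(len(items)): print(items[i])",
     "Iterate over the items directly instead of indexing."),
    ("f = open(path); data = f.read()",
     "Use a with-statement so the file is always closed."),
]

def vectorize(code):
    """Toy stand-in vectorizer: sparse token counts (NOT the paper's embeddings)."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; higher means closer."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    """L2 distance between two sparse count vectors; lower means closer."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))

def manhattan_distance(a, b):
    """L1 distance between two sparse count vectors; lower means closer."""
    return sum(abs(a[t] - b[t]) for t in a.keys() | b.keys())

def recommend_review(query_code, corpus, measure="cosine"):
    """Return the review comment attached to the corpus snippet closest to the query."""
    query = vectorize(query_code)
    if measure == "cosine":
        score = lambda code: cosine_similarity(query, vectorize(code))
    elif measure == "euclidean":
        # Negate distances so that "larger score" always means "more similar".
        score = lambda code: -euclidean_distance(query, vectorize(code))
    elif measure == "manhattan":
        score = lambda code: -manhattan_distance(query, vectorize(code))
    else:
        raise ValueError(f"unknown measure: {measure}")
    _, best_comment = max(corpus, key=lambda pair: score(pair[0]))
    return best_comment

if __name__ == "__main__":
    new_code = "f = open(name); text = f.read()"
    for measure in ("cosine", "euclidean", "manhattan"):
        print(measure, "->", recommend_review(new_code, CORPUS, measure))
```

Swapping `vectorize` for a learned embedding would leave the retrieval step unchanged, which is one reason the choice of vectorizer and the choice of similarity measure can be evaluated as separate factors.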

Page view(s)

76

checked on Nov 18, 2024
