Reproducibility Assessment of Research Code Repositories

Akdeniz, Eyüp Kaan

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/13922

Title:	Reproducibility Assessment of Research Code Repositories
Other Titles:	Araştırma kod depolarının yeniden üretilebilirlik değerlendirmesi
Authors:	Akdeniz, Eyüp Kaan
Advisors:	Tekir, Selma
Keywords:	Natural language processing; Machine learning; Source codes; Code repositories
Publisher:	Izmir Institute of Technology
Abstract:	The growth in machine learning research has not been accompanied by a corresponding improvement in the reproducibility of its results. This thesis presents a novel, fully automated end-to-end system that evaluates the reproducibility of machine learning studies based on the content of the associated GitHub project's Readme file. The evaluation relies on a Readme template derived from an analysis of popular repositories; the template suggests a structure that promotes reproducibility. The system generates a reproducibility score for each Readme file assessed and employs two distinct models, one based on section classification and the other on hierarchical transformers. The experimental results indicate that the system based on section similarity outperforms the hierarchical transformer model. It is also more explainable, because its scores can be traced directly to the corresponding sections of the Readme file. The proposed framework provides a tool for improving the quality of code sharing and ultimately helps to increase reproducibility in machine learning research.
Description:	Thesis (Master)--İzmir Institute of Technology, Computer Engineering, Izmir, 2023. Includes bibliographical references (leaves 47-56). Text in English; abstract in Turkish and English.
URI:	https://hdl.handle.net/11147/13922
Appears in Collections:	Master Degree / Yüksek Lisans Tezleri

Files in This Item:

File	Description	Size	Format
10562893.pdf	Master Thesis	960.56 kB	Adobe PDF
