Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/6764
Full metadata record
DC Field | Value | Language
dc.contributor.author | Yousef, Malik | -
dc.contributor.author | Nigatu, Dawit | -
dc.contributor.author | Levy, Dalit | -
dc.contributor.author | Allmer, Jens | -
dc.contributor.author | Henkel, Werner | -
dc.date.accessioned | 2018-01-30T08:10:12Z | -
dc.date.available | 2018-01-30T08:10:12Z | -
dc.date.issued | 2017-12 | -
dc.identifier.citation | Yousef, M., Nigatu, D., Levy, D., Allmer, J., and Henkel, W. (2017). Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers. Eurasip Journal on Advances in Signal Processing, 2017(1). doi:10.1186/s13634-017-0506-8 | en_US
dc.identifier.issn | 1687-6180 | -
dc.identifier.uri | http://doi.org/10.1186/s13634-017-0506-8 | -
dc.identifier.uri | http://hdl.handle.net/11147/6764 | -
dc.description.abstract | Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. To train machine learning models, negative data is of importance yet hard to come by; therefore, we recently started to employ pre-miRNAs from one species as positive data versus another species’ pre-miRNAs as negative examples based on sequence motifs and k-mers. Here, we introduce the additional usage of information-theoretic (IT) features. Results: Pre-miRNAs from one species were used as positive and another species’ pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded a very high accuracy, which is as good as the previously suggested sequence motif and k-mer based method. However, for obtaining a high performance, a sufficiently large phylogenetic distance between the species and a sufficiently high number of pre-miRNAs in the training set are required. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 are IT features, 80% of the top 100 features are k-mers. The comparison of all three individual approaches (motifs, IT, and k-mers) shows that the distinction of species based on their pre-miRNA k-mers is sufficient. Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to have enough statistics for producing highly accurate results, future categorization into species can be effectively done using k-mers only. The biological reasoning for this is the existence of a codon bias between species which can, at least, be observed in exonic miRNAs. Future work in this direction will be the ab initio detection of pre-miRNA. In addition, prediction of pre-miRNA from RNA-seq can be done. | en_US
dc.description.sponsorship | Scientific and Technological Research Council of Turkey (113E326); Zefat Academic College; German Research Foundation (DFG) | en_US
dc.language.iso | en | en_US
dc.publisher | Springer Verlag | en_US
dc.relation | info:eu-repo/grantAgreement/TUBITAK/EEEAG/113E326 | en_US
dc.relation.ispartof | Eurasip Journal on Advances in Signal Processing | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | Information theory | en_US
dc.subject | MicroRNAs | en_US
dc.subject | Machine learning | en_US
dc.subject | Sequence motifs | en_US
dc.subject | RNA | en_US
dc.title | Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers | en_US
dc.type | Article | en_US
dc.authorid | TR107974 | en_US
dc.institutionauthor | Allmer, Jens | -
dc.department | İzmir Institute of Technology. Molecular Biology and Genetics | en_US
dc.identifier.volume | 2017 | en_US
dc.identifier.issue | 1 | en_US
dc.identifier.wos | WOS:000412913000001 | en_US
dc.identifier.scopus | 2-s2.0-85032857843 | en_US
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US
dc.identifier.doi | 10.1186/s13634-017-0506-8 | -
dc.relation.doi | 10.1186/s13634-017-0506-8 | en_US
dc.coverage.doi | 10.1186/s13634-017-0506-8 | en_US
dc.identifier.wosquality | Q3 | -
dc.identifier.scopusquality | Q2 | -
item.fulltext | With Fulltext | -
item.openairetype | Article | -
item.cerifentitytype | Publications | -
item.grantfulltext | open | -
item.languageiso639-1 | en | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
crisitem.author.dept | 04.03. Department of Molecular Biology and Genetics | -
Appears in Collections: Molecular Biology and Genetics / Moleküler Biyoloji ve Genetik
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Sürdürülebilir Yeşil Kampüs Koleksiyonu / Sustainable Green Campus Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File | Description | Size | Format
6764.pdf | Article | 1.64 MB | Adobe PDF

Scopus citations: 14 (checked on Apr 5, 2024)
Web of Science citations: 11 (checked on Mar 27, 2024)
Page views: 236 (checked on Apr 15, 2024)
Downloads: 124 (checked on Apr 15, 2024)


Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.