Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/5277
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Doğan, Tunca | - |
| dc.contributor.author | Karaçalı, Bilge | - |
| dc.date.accessioned | 2017-04-10T12:55:50Z | |
| dc.date.available | 2017-04-10T12:55:50Z | |
| dc.date.issued | 2013-09 | |
| dc.identifier.citation | Doğan, T., and Karaçalı, B. (2013). Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PLoS One, 8(9). doi:10.1371/journal.pone.0075458 | en_US |
| dc.identifier.issn | 1932-6203 | |
| dc.identifier.issn | 1932-6203 | - |
| dc.identifier.uri | http://doi.org/10.1371/journal.pone.0075458 | |
| dc.identifier.uri | http://hdl.handle.net/11147/5277 | |
| dc.description.abstract | Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. This suggests that the method can be useful in predicting novel domains on protein sequences. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Public Library of Science | en_US |
| dc.relation.ispartof | PLoS One | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | Sequence analysis | en_US |
| dc.subject | Proteins | en_US |
| dc.subject | Genome analysis | en_US |
| dc.subject | Genetic database | en_US |
| dc.subject | Receiver operating characteristic | en_US |
| dc.title | Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences | en_US |
| dc.type | Article | en_US |
| dc.authorid | TR11527 | en_US |
| dc.institutionauthor | Doğan, Tunca | - |
| dc.institutionauthor | Karaçalı, Bilge | - |
| dc.department | İzmir Institute of Technology. Electrical and Electronics Engineering | en_US |
| dc.identifier.volume | 8 | en_US |
| dc.identifier.issue | 9 | en_US |
| dc.identifier.wos | WOS:000326240100126 | en_US |
| dc.identifier.scopus | 2-s2.0-84884176982 | en_US |
| dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
| dc.identifier.doi | 10.1371/journal.pone.0075458 | - |
| dc.identifier.pmid | 24069417 | en_US |
| dc.relation.doi | 10.1371/journal.pone.0075458 | en_US |
| dc.coverage.doi | 10.1371/journal.pone.0075458 | en_US |
| dc.identifier.wosquality | Q2 | - |
| dc.identifier.scopusquality | Q1 | - |
| item.fulltext | With Fulltext | - |
| item.grantfulltext | open | - |
| item.languageiso639-1 | en | - |
| item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
| item.cerifentitytype | Publications | - |
| item.openairetype | Article | - |
| crisitem.author.dept | 03.05. Department of Electrical and Electronics Engineering | - |
Appears in Collections:
Electrical - Electronic Engineering / Elektrik - Elektronik Mühendisliği
PubMed İndeksli Yayınlar Koleksiyonu / PubMed Indexed Publications Collection
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| 5277.PDF | Article | 2.53 MB | Adobe PDF |

Scopus citations: 9 (checked on Nov 15, 2024)
Web of Science citations: 7 (checked on Nov 9, 2024)
Page views: 702 (checked on Nov 18, 2024)
Downloads: 210 (checked on Nov 18, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.