Evaluation of protein secondary structure prediction algorithms on a new advanced benchmark dataset
Researchers have been studying protein secondary structure prediction since the 1970s. However, the accuracy of state-of-the-art methods has reached only approximately 80-85%. One reason for this is the limitations of the datasets used for training and testing the algorithms. Databases of experimentally determined proteins that also contain information on the function, biochemical properties, and subcellular location of each protein would show directly how the algorithms perform on particular groups of proteins. Such databases would also allow users to assess the quality of the algorithms on those datasets and to decide which algorithm suits which type of protein. The objective of this thesis is to develop a new, advanced protein benchmark database containing functional and biochemical information for 64,872 experimentally determined proteins in the S2C database, which is derived from the Protein Data Bank (PDB). Using this database, seven available predictors are evaluated on different datasets defined by the function and subcellular localization of the proteins in the benchmark. Comparing the results on the proposed benchmark datasets with those on an existing dataset, RS126, shows that grouping proteins by function and subcellular localization has a great impact on the measured accuracies of existing algorithms.
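Evaluating a predictor separately on each protein group can be sketched as follows. The abstract does not name its accuracy measure, so this sketch assumes the common three-state per-residue accuracy (Q3) over helix (H), strand (E), and coil (C) labels, pooled over all residues in a group. The function names, group labels, and toy sequences are hypothetical illustrations, not data or code from the benchmark.

```python
from collections import defaultdict


def q3(observed: str, predicted: str) -> float:
    """Per-residue three-state (H/E/C) accuracy for a single protein.

    Assumption: Q3 is used as the accuracy measure; the source does not say.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def q3_by_group(records):
    """Pool residues within each group and return the group-level Q3.

    records: iterable of (group, observed, predicted) tuples, where `group`
    is a hypothetical label such as a subcellular localization or function.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, observed, predicted in records:
        if len(observed) != len(predicted):
            raise ValueError("observed and predicted strings must have equal length")
        correct[group] += sum(o == p for o, p in zip(observed, predicted))
        total[group] += len(observed)
    return {g: correct[g] / total[g] for g in total}


if __name__ == "__main__":
    # Toy records (invented for illustration only).
    toy = [
        ("cytoplasm", "HHHEEC", "HHHECC"),
        ("membrane", "HHHHCC", "HHHHCC"),
        ("cytoplasm", "CCEE", "CCEE"),
    ]
    print(q3_by_group(toy))  # {'cytoplasm': 0.9, 'membrane': 1.0}
```

Pooling residues (rather than averaging per-protein scores) weights long proteins more heavily; averaging `q3` over proteins in each group is an equally common alternative, and the two can rank predictors differently.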