Abstract:
Researchers in protein secondary structure prediction typically use three states of secondary structure: alpha helices (H), beta strands (E), and coils (C). Proteins are polymers of amino acids linked together into chains. Predicting protein secondary structure is a fundamental step in determining the final structure and function of a protein. In this work, we developed a prediction machine for protein secondary structure. Investigating the amino acid benchmark data sets, we observed that the data fall into two distinct groups of almost 50% each. In the three-state scheme, researchers classify any state that is not a helix or a strand as a coil. Hence, this work adopts a new way of looking at the data set, treating it as coil versus non-coil. For this type of data, Receiver Operating Characteristic (ROC) analysis is used to analyse and interpret the results of assessing the protein secondary structure classifier. ROC analysis gave results similar to those obtained using other, non-ROC classification methods. The ROC curves discriminated coil states from non-coil states with 72% prediction accuracy and a very small standard error.
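
As an illustration of the two-state view described in the abstract, the following minimal sketch reduces three-state labels (H, E, C) to coil versus non-coil and computes a ROC curve, its area (AUC), and a standard error for that area. It is not the authors' implementation. The function names, the toy labels, and the toy scores are all hypothetical, and the Hanley-McNeil approximation for the standard error is an assumed choice that the abstract does not specify.

```python
from math import sqrt

def to_binary(states):
    """Map three-state labels (H, E, C) to 1 for coil and 0 for non-coil."""
    return [1 if s == "C" else 0 for s in states]

def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank by descending score so that lowering the threshold walks down the list.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_standard_error(a, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of the AUC
    (an assumed choice; the abstract does not name its method)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return sqrt(var)

# Toy data, invented for illustration only (not from the benchmark data sets).
observed = list("CCHHEECCHC")
coil_score = [0.9, 0.8, 0.3, 0.6, 0.2, 0.4, 0.7, 0.5, 0.1, 0.65]

labels = to_binary(observed)
points = roc_points(labels, coil_score)
area = auc(points)
se = auc_standard_error(area, sum(labels), len(labels) - sum(labels))
print(f"AUC = {area:.3f}, SE = {se:.3f}")
```

Because the standard error shrinks as the number of coil and non-coil examples grows, a large benchmark set would be expected to give a small standard error under this approximation.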