Abstract:
The Infectious Disease Ontology (IDO) and its variants have been highly successful in providing a consistent terminology, hierarchy, and logical representation for the domain of infectious and parasitic diseases. ICD’s coverage of the domain is broad in terms of the types of infectious diseases, but it offers limited information about other aspects of infectious disease, and the scope of ICD-10 is therefore considered narrow.
The large number, size, and complexity of biomedical ontologies make it difficult to choose the most appropriate ontology for a given domain. Users need to compare the ontologies available for a single domain and select the one of highest quality. Reference datasets are essential tools for checking the quality of any knowledge source. Currently, there is no reference dataset for evaluating the quality of an ontology from the perspective of semantic similarity measures, and there is no well-defined reference dataset in the biomedical domain.
In this research, we propose an approach that aids the development of a methodology for infectious and parasitic diseases. It is based on comparing the concepts/classes of biomedical domain ontologies using the SemDist semantic similarity measure. The approach consists of four interrelated components: selecting a semantic similarity measure, building a reference dataset using the SemDist measure, evaluating the reference dataset, and comparing the reference dataset with two different ontologies. In the first part of this research, we assess the applicability of several semantic similarity measures and build a biomedical domain taxonomy/hierarchy for these measures to use. Several experiments were conducted to select the best of these measures. The experimental results validate the efficiency of the SemDist technique both within a single ontology and across ontologies, and demonstrate that, compared with the existing techniques, the SemDist measure gives the best overall correlation with experts’ ratings. The reference dataset is built using the infectious and parasitic diseases of the ICD-10 “V1.0” ontology and is named the Infectious and Parasitic DO-Reference dataset. We evaluate the approach against a human expert in the Human Disease Ontology by comparing the expert’s disease diagnoses with those of the reference dataset; the reference dataset showed good accuracy, 80.6%, compared with the documented physicians’ answers. We also evaluate the DOID ontology within the Unified Medical Language System (UMLS) framework; the results indicate that the accuracy of using the Infectious and Parasitic DO-Reference dataset at the lexical level and the conceptual level is 69 concepts (52.6) and 75%, respectively.
When we evaluate the SNOMED-CT ontology within the UMLS framework, the results indicate that the accuracy of using the Infectious and Parasitic DO-Reference dataset at the lexical level and the conceptual level is 81 concepts (62.8) and 86.3%, respectively. In addition, we use the “compare ontologies” feature in Protégé to ensure the accuracy of the results.