Abstract:
Storing data is one of the most critical issues in a shared storage environment, where keeping users’ data protected is an obvious challenge. The risks and security threats that may arise must be considered; they include handling privileged user access, ensuring privacy, and ensuring data segregation. A customer’s data could be exposed to disclosure, which would be especially harmful if the information of rival businesses from the same business domain is stored in the same storage location. This research proposes a technique that classifies business competitors so that their data can be stored in safe locations far from each other. Although other researchers have developed many segregation techniques, this research contributes to the body of knowledge by introducing classification according to business type: it measures the similarity between the businesses whose data will be stored in the shared environment. The technique uses an existing similarity measure to calculate the logical distance between these competitors, and an ontological taxonomy is needed to calculate this distance. The classification technique developed in this work is based on data segregation, which is used to prevent data belonging to rival companies from being stored in nearby locations. The new technique segregates tenants’ databases based on business domain competitiveness: competitors’ data are segregated according to the logical distance between their business domains, and the closest business domains are placed the farthest apart. Using business domain classification enhances data security and reduces the risk of disclosing user information by segregating business competitors. There are various measures for calculating the logical distance between business domains, and several experiments have been performed to choose the one that best suits our segregation technique.
The applicability of several semantic similarity measures has been evaluated. This research builds a business domain taxonomy that classifies business domains for use by these measures. This taxonomy, which mimics the WordNet taxonomy, is used in the experiments performed in this research. The proposed data segregation method is based on the semantic similarity value: businesses are classified on the basis that the similarity among business domains decides the distance that should be kept between the storage locations allocated to those businesses. The higher the similarity value, the longer the distance. Several experiments have been conducted to identify the best of these measures. The results showed that the shortest-path similarity measure is the best measure for calculating the logical distance between business domains, with minimal error.
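As a minimal sketch of how a shortest-path similarity can be computed over a taxonomy, the example below uses the WordNet-style definition, similarity = 1 / (1 + path length). The taxonomy, its domain names, and the function names are hypothetical placeholders for illustration; they are not the taxonomy or the implementation built in this research.

```python
# Hypothetical toy taxonomy (child -> parent); illustrative only.
PARENT = {
    "business": None,
    "finance": "business",
    "retail": "business",
    "banking": "finance",
    "insurance": "finance",
    "grocery": "retail",
    "clothing": "retail",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the taxonomy root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def shortest_path_length(a, b):
    """Count the edges between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes share no common ancestor")


def path_similarity(a, b):
    """WordNet-style path similarity: 1 / (1 + shortest path length)."""
    return 1.0 / (1.0 + shortest_path_length(a, b))
```

In this toy taxonomy, two sibling domains such as banking and insurance are two edges apart and score 1/3, whereas banking and grocery are four edges apart and score 0.2, so the sibling pair would be the one kept farther apart in storage.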
The experiments also evaluated different techniques for distributing data among several storage locations: random distribution, distribution by semantic similarity segregation, and least-similarity distribution segregation. The results of these distribution techniques were evaluated by comparing them with the manual distribution produced by a group of human experts in order to judge their reliability.
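One way a similarity-driven distribution might look is sketched below: each tenant is placed in the storage location whose current occupants are least similar to it. This greedy rule, the tenant names, and the similarity values are assumptions made for illustration only; the abstract does not specify the actual distribution algorithm.

```python
def place_tenants(tenants, similarity, n_locations):
    """Greedy sketch: put each tenant where its most similar co-tenant
    is as dissimilar as possible (ties: fewer occupants, then lower index)."""
    locations = [[] for _ in range(n_locations)]
    for tenant in tenants:
        def cost(i):
            # Highest similarity between this tenant and anyone already at location i.
            worst = max(
                (similarity(tenant, other) for other in locations[i]),
                default=0.0,
            )
            return (worst, len(locations[i]), i)

        best = min(range(n_locations), key=cost)
        locations[best].append(tenant)
    return locations


# Hypothetical pairwise similarity values between three tenants.
SIM = {
    frozenset({"bank_a", "bank_b"}): 1.0,
    frozenset({"bank_a", "grocer"}): 0.2,
    frozenset({"bank_b", "grocer"}): 0.2,
}


def lookup(a, b):
    """Symmetric lookup into the hypothetical similarity table."""
    return SIM[frozenset({a, b})]


result = place_tenants(["bank_a", "bank_b", "grocer"], lookup, 2)
```

With two locations, the two banks (similarity 1.0) end up in different locations, and the grocer shares a location with one of them.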