Author: Admin | 2025-04-28
Of different techniques are evaluated within this scope.

2. Data mining with privacy

Privacy-Preserving Data Mining (PPDM) techniques have been developed to allow information to be extracted from data sets while preventing the disclosure of data subjects' identities or sensitive information. PPDM also allows more than one researcher to collaborate on a dataset [11, 12]. PPDM can also be defined as performing data mining, in a multi-party environment, on data sets drawn from databases that contain sensitive and confidential information, without disclosing any party's data to the other parties [13].

Statistical and cryptographic approaches have been proposed to protect privacy in data mining. Most of these approaches modify the original data in order to protect privacy, which creates a natural trade-off between data quality and privacy level. PPDM methods are studied with the aim of performing effective data mining while guaranteeing a certain level of privacy, and several different taxonomies have been proposed for them.
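The trade-off between data quality and privacy level mentioned above can be made concrete with a small experiment. The sketch below is a minimal illustration under stated assumptions: additive zero-mean Gaussian noise stands in for the privacy mechanism, the noise standard deviation is treated as a rough proxy for the privacy level, and the mean absolute change per record is used as the data-quality loss. These choices, and the names `perturb`, `record_error` and `tradeoff`, are hypothetical and not taken from the cited works.

```python
import random
import statistics


def perturb(values, noise_std, rng):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry."""
    return [v + rng.gauss(0.0, noise_std) for v in values]


def record_error(original, perturbed):
    """Mean absolute per-record change, used here as a simple data-quality loss."""
    return statistics.fmean(abs(o - p) for o, p in zip(original, perturbed))


def tradeoff(values, noise_levels, seed=0):
    """Map each noise standard deviation to the record-level error it introduces."""
    rng = random.Random(seed)
    return {s: record_error(values, perturb(values, s, rng)) for s in noise_levels}


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic, illustrative data; not from any real data set.
    ages = [data_rng.randint(18, 80) for _ in range(10_000)]
    for sigma, err in tradeoff(ages, [1, 5, 20]).items():
        print(f"noise std {sigma:>2}: mean absolute record error {err:.2f}")
```

Larger noise hides individual values better but moves each record further from its true value: for Gaussian noise the expected absolute change per record is σ·√(2/π), roughly 0.8 times the noise standard deviation.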
In the literature, these methods are classified either by data life cycle stage (data collection, data publishing, data distribution and output of data mining) [10] or by the method used (anonymization-based, perturbation-based, randomization-based, condensation-based and cryptography-based) [14]. In this study, PPDM approaches are examined using a simple taxonomy: methods applied to the input data that is subject to data mining, and methods applied to the processed data (output information).

2.1 Methods applied to input data

This section covers the methods suggested for the collection, cleaning, integration, selection and transformation phases of the input data that will be subject to data mining.

Although it depends on the application and on the degree of trust in the institution collecting the data, it is recommended, in order to prevent privacy disclosure, that the original values not be stored and that they be used only during the transformation process. For example, data collected by sensors, which are now widely used in the Internet of Things (IoT), can be transformed at the point of collection: the obtained values are randomized, so that the raw data is transformed before it is used in data mining.

In this section, data perturbation, randomization, suppression, data swapping, anonymity, cryptography and differential privacy methods are discussed.

2.1.1 Data perturbation

Perturbation can produce data that resists privacy attacks while largely preserving the statistical integrity of the data [15, 16]. Randomization of the original data is widely used for data perturbation [17, 18, 19]; another approach is microaggregation [20].

In the randomization method, noise drawn from a known statistical distribution is added to the data, so that when data mining methods are applied, the distribution of the original data can be reconstructed without access to the original values. To do this, data providers first randomize their data and then transmit it to the data recipient.
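The two roles in the randomization method (a provider that adds noise from a known distribution, and a recipient that reconstructs the original distribution from the noisy values alone) can be sketched as follows. This is a minimal sketch that assumes Gaussian noise with a publicly known standard deviation; the text only requires that the noise distribution be known. The recipient side uses an iterative Bayesian reconstruction over histogram bins, in the style of the procedure Agrawal and Srikant proposed for randomized data; the works cited here [17, 18, 19] may use other reconstruction methods. `NOISE_SIGMA`, `randomize`, `noise_density` and `reconstruct` are hypothetical names chosen for illustration.

```python
import math
import random

NOISE_SIGMA = 10.0  # hypothetical public parameter: noise ~ Normal(0, NOISE_SIGMA^2)


def randomize(values, rng, sigma=NOISE_SIGMA):
    """Provider side: add independent noise from the publicly known distribution."""
    return [x + rng.gauss(0.0, sigma) for x in values]


def noise_density(y, sigma=NOISE_SIGMA):
    """Density of the Normal(0, sigma^2) noise at y."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct(randomized, lo, hi, n_bins, sigma=NOISE_SIGMA, n_iter=200):
    """Recipient side: estimate the histogram of the original values X from
    W = X + Y, using only the observed W and the known noise density p_Y.

    Iterative Bayesian update over bins with midpoints m_k:
        f_k <- (1/n) * sum_i p_Y(w_i - m_k) f_k / sum_j p_Y(w_i - m_j) f_j
    """
    width = (hi - lo) / n_bins
    mids = [lo + (k + 0.5) * width for k in range(n_bins)]
    # The noise likelihood of each observation under each bin never changes,
    # so it is computed once up front.
    lik = [[noise_density(w - m, sigma) for m in mids] for w in randomized]
    f = [1.0 / n_bins] * n_bins  # start from a uniform estimate
    for _ in range(n_iter):
        acc = [0.0] * n_bins
        for row in lik:
            post = [l * fk for l, fk in zip(row, f)]  # unnormalized posterior over bins
            total = sum(post)
            for k in range(n_bins):
                acc[k] += post[k] / total
        f = [a / len(randomized) for a in acc]  # average posterior = new histogram
    return mids, f


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic bimodal "original" data that the recipient never sees.
    original = [rng.gauss(20, 3) if rng.random() < 0.5 else rng.gauss(70, 3)
                for _ in range(1000)]
    shared = randomize(original, rng)  # the only data sent to the recipient
    mids, f = reconstruct(shared, lo=0.0, hi=100.0, n_bins=10)
    for m, p in zip(mids, f):
        print(f"bin centred at {m:5.1f}: {p:.3f}")
```

The provider never needs the reconstruction code, and the recipient needs only the randomized values and the noise parameter. With the synthetic bimodal data in the demo, the reconstructed histogram should concentrate around the two modes much more sharply than a plain histogram of the randomized values, although exact figures depend on the seed, the bin width and the number of iterations.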
After receiving the randomized data, the recipient estimates the original distribution using distribution reconstruction methods. Randomization can be applied independently to each record during the data collection phase, and once the original distribution has been reconstructed, the statistical properties of the data are preserved. For example, the result of the randomization of