Michael Khavkin presented his research at the 24th Industrial Engineering & Management Conference on April 28th.
The title and abstract of the talk:
The impact of privacy guarantees and user decisions on the performance of machine learning models
The growing demand for data to enhance Machine Learning (ML) performance has increased reliance on user-contributed data, which often includes personal and sensitive information. Differential Privacy (DP) is one of the leading privacy mechanisms for user data; it is achieved by adding noise calibrated to the privacy budget parameter ε, either directly to the data, to analysis outputs, or during ML training. Organizations now use Federated Learning combined with DP to collaboratively train privacy-compliant AI models without centralizing user data, enhancing industrial processes such as customer service and sales. Moreover, this approach is now extending to Large Language Models (LLMs), enabling privacy-preserving advancements in human tasks and decision-making. DP is often analyzed through privacy-utility trade-offs, in which the addition of more noise provides better privacy guarantees but reduces the utility of the data and the accuracy of ML models built on it. However, this trade-off has predominantly been investigated through technical lenses, neglecting the impact of users' disclosure decisions, even when users are able to consent to data collection. These decisions are particularly important in voluntary opt-in settings, such as when Google requests user permission for location-based services. In such cases, users weigh the benefits of sharing their data against potential privacy risks. Moreover, despite extensive research on privacy behavior, privacy decision-making in the context of DP and ML remains underexplored. In this paper, we analyzed how user decisions under varying levels of DP protection affect collected data and, consequently, ML model performance. To that end, we designed a user study in which N=718 participants were invited to share their location under varying DP levels and different types of incentives. We then investigated the contribution of participants' disclosure decisions to the performance of an ML model applied to their data. We evaluated the model's performance across three localization levels, with varying location weights. Our findings highlight the need to account for user decisions in individual-level data collection. We show that the elasticity of user behavior directly influences ML performance, emphasizing its critical role. Consequently, we provide a user-centered perspective for evaluating ML models trained on individual-level data within differentially private systems.
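The privacy-utility trade-off described in the abstract can be made concrete with a minimal sketch of the Laplace mechanism, one standard way to satisfy ε-differential privacy. This example is illustrative only and is not taken from the talk; the function name, sensitivity, and location values are assumptions for demonstration.

```python
import numpy as np

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private version of `value`.

    The noise scale is sensitivity / epsilon, so a smaller privacy
    budget epsilon yields more noise (stronger privacy, lower utility).
    """
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: privatize an average latitude before sharing it.
true_mean_latitude = 32.08  # assumed example value, not study data
strong_privacy = laplace_mechanism(true_mean_latitude, sensitivity=0.01, epsilon=0.1)
weak_privacy = laplace_mechanism(true_mean_latitude, sensitivity=0.01, epsilon=10.0)
print(strong_privacy, weak_privacy)  # the epsilon=0.1 output is typically much noisier
```

Running this a few times shows the trade-off the abstract refers to: with ε=0.1 the released value fluctuates widely around the true one, while with ε=10 it stays close, at the cost of weaker privacy guarantees.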