The DAPSA project (Data Access with Privacy, Security, and Accountability) analyzes how privacy, security, and accountability challenges can be managed to increase access to data science resources by researchers and the public.

The project is funded by the Data Science Excellence program at the Israel Council for Higher Education through TAD – Center for Artificial Intelligence & Data Science at Tel Aviv University. Collaborators within the project include Ran Gilad-Bachrach’s MLwell Lab at the Faculty of Engineering, Mahmood Sharif’s Group at the School of Computer Science, Michal Feldman’s Economics and Computer Science research group, Analia Schlosser’s Group at the School of Economics, and Michael Birnhack at the School of Law.

Publications

  1. Eilat Lev Ari, Maayan Roichman, and Eran Toch. Strategies of Product Managers: Negotiating Social Values in Digital Product Design. Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024. Info. Download.
  2. Danielle Movsowitz-Davidow, Yacov Manevich, and Eran Toch. Privacy-Preserving Payment System With Verifiable Local Differential Privacy. 5th Conference on Advances in Financial Technologies (AFT), 2023. Info. Download. Code.
  3. Maya Benarous, Eran Toch, and Irad Ben-Gal. Synthesis of Longitudinal Human Location Sequences: Balancing Utility and Privacy. ACM Transactions on Knowledge Discovery from Data (TKDD), 16(6), 2022. Info. Code.

Code

Open-source code created as part of DAPSA includes:

  • synthesize_mobility_chains: The repository contains code that analyzes repositories of long mobility chains (such as the location traces of an app user over several weeks) and synthesizes location traces using long short-term memory networks (LSTMs), Markov chains (MCs), and variable-order Markov models (VMMs).
  • evaluate_synthetic_time_series: The repository contains code that computes evaluation measures for synthetic time-series data, such as synthesized sequences of the locations a set of people visit over a couple of weeks. The scripts compare the synthetic data to the original data and analyze how well the synthesis preserves the privacy of the original data subjects, the statistical similarity, the per-instance similarity, and the diversity of the synthetic data.
  • GPTalyze (open-source GitHub repository): The library uses ChatGPT’s API to analyze short textual snippets, such as tweets, employing ChatGPT’s zero-shot-like abilities to summarize the topics discussed in a textual corpus and to perform other Natural Language Processing (NLP) tasks, such as sentiment analysis and emotion detection.
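
To give a sense of the simplest synthesis approach mentioned for synthesize_mobility_chains, the following is a minimal sketch of first-order Markov chain synthesis of location traces. It is not taken from the repository: the function names (fit_markov_chain, synthesize) and the toy traces are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Count first-order transitions between consecutive locations."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def synthesize(transitions, start, length, rng):
    """Sample a synthetic trace by walking the fitted chain from `start`."""
    trace = [start]
    while len(trace) < length:
        options = transitions.get(trace[-1])
        if not options:  # dead end: no observed transition out of this location
            break
        locations = list(options.keys())
        weights = list(options.values())
        # Pick the next location in proportion to its observed frequency.
        trace.append(rng.choices(locations, weights=weights, k=1)[0])
    return trace


# Hypothetical toy traces; real input would cover weeks of visits per user.
traces = [
    ["home", "work", "gym", "home"],
    ["home", "work", "home"],
    ["home", "cafe", "work", "home"],
]
model = fit_markov_chain(traces)
synthetic = synthesize(model, start="home", length=6, rng=random.Random(0))
print(synthetic)
```

A first-order chain conditions only on the most recent location. Variable-order Markov models and LSTMs condition on a longer history, which may capture longer routines but also raises the risk of reproducing an individual's original trace, the trade-off that the evaluation measures above are meant to quantify.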