When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Abstract
Mobility data is the cornerstone of crucial applications, including traffic monitoring, crowdsourcing, and social networks. However, research shows that publishing accurate mobility data aggregates may jeopardize the participants' privacy. As a robust and rigorous technique, differential privacy provides a quantifiable protection guarantee by injecting enough noise into the aggregates to make them resilient to privacy attacks while still allowing learning and analysis.
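As an illustration of noise injection into aggregates, the following is a minimal sketch of the standard Laplace mechanism applied to per-location counts. It is not the algorithm developed in the thesis. The function names, the example location keys, and the assumption that each participant contributes to at most one count (sensitivity 1) are hypothetical choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy copy of `counts` using the Laplace mechanism.

    Assumption (not from the source): one participant changes the
    counts by at most `sensitivity` in total (L1 norm).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return {loc: c + laplace_noise(scale, rng) for loc, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical visit counts for three locations.
    visits = {"station_a": 120, "station_b": 45, "station_c": 7}
    print(private_counts(visits, epsilon=0.5))
```

A smaller `epsilon` gives a stronger guarantee at the cost of noisier counts, which is the privacy and utility trade-off the abstract refers to.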
Applying differential privacy raises two challenges that stem from the characteristics of mobility data. First, mobility data is usually spread across multiple organizations, whereas standard differential privacy relies on a centralized trusted curator. Second, mobility data is typically sequential, while the guarantee provided by differential privacy degrades with each consecutive aggregation of the sensitive data. This thesis tackles these challenges for two application scenarios: decentralized mobility aggregate sharing and forecasting.
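The degradation under consecutive aggregation can be made concrete with the standard sequential composition property of differential privacy. The notation below (mechanisms M_t, budgets ε_t, horizon T) is introduced here for illustration and does not appear in the abstract.

```latex
\[
\text{If each } \mathcal{M}_t \text{ is } \varepsilon_t\text{-differentially private for } t = 1, \dots, T,
\text{ then } (\mathcal{M}_1, \dots, \mathcal{M}_T) \text{ is }
\Bigl(\sum_{t=1}^{T} \varepsilon_t\Bigr)\text{-differentially private.}
\]
```

With a fixed budget per release, the total privacy loss therefore grows linearly with the number of releases over the same data, which is why an unbounded stream needs an explicit budget allocation strategy.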
We leverage a distributed variant of differential privacy to enable decentralized mobility aggregate sharing, in which each organization obfuscates its dataset locally before sending it to the data curator. To tackle the consecutive data access challenge, we use a sliding window approach to allocate the privacy budget. Moreover, we design an approximation strategy to calculate the closest private statistics to the current timestamp. We formally prove the privacy guarantee of our algorithms. Finally, we demonstrate on two datasets that our solution enables decentralized statistical release with a robust privacy guarantee.
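One generic way to picture sliding-window budget allocation is sketched below, assuming the total budget spent over any `window` consecutive timestamps must not exceed `total_epsilon`. This is a hypothetical illustration, not the thesis's algorithm: the class `SlidingWindowBudget`, the function `release_stream`, and the `noisy_release` callback are names invented for this example. When no budget is left, the sketch republishes the previous private release, which costs no extra budget because it is post-processing of an already private output.

```python
from collections import deque


class SlidingWindowBudget:
    """Tracks budget so any `window` consecutive grants sum to at most `total_epsilon`."""

    def __init__(self, total_epsilon, window):
        if total_epsilon <= 0 or window < 1:
            raise ValueError("total_epsilon must be > 0 and window >= 1")
        self.total = total_epsilon
        # Budgets granted at the previous (window - 1) timestamps.
        self.spent = deque([0.0] * (window - 1), maxlen=window - 1)

    def remaining(self):
        return max(0.0, self.total - sum(self.spent))

    def spend(self, requested):
        # Grant at most what is still available in the current window.
        granted = min(max(requested, 0.0), self.remaining())
        self.spent.append(granted)
        return granted


def release_stream(true_values, budget, per_step_epsilon, noisy_release):
    """Publish one value per timestamp; reuse the last release when the budget is exhausted.

    `noisy_release(value, epsilon)` is a hypothetical callback that returns
    an epsilon-differentially private version of `value`.
    """
    released, last = [], None
    for value in true_values:
        eps = budget.spend(per_step_epsilon)
        if eps > 0:
            last = noisy_release(value, eps)
        released.append(last)  # approximation: fall back to the latest private release
    return released
```

In this sketch the decision to skip a timestamp depends only on the budget ledger and never on the sensitive data, so the skip itself leaks nothing. A data-dependent skipping rule would need its own privacy analysis.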
Before addressing the privacy aspect of distributed mobility forecasting, we design a mobility vertical federated forecasting (MVFF) framework that allows the learning process to be jointly conducted over vertically partitioned data belonging to multiple organizations. Since each organization only holds a subset of the location domain, none can train a forecasting model that covers the whole location domain. Moreover, distributing mobility data breaks the spatio-temporal correlation between locations, which hinders learning and hence reduces forecasting accuracy. MVFF uses a local learning model for each organization to extract the embedded spatio-temporal correlation between its locations. A global model synchronizes with the local models to incorporate the correlation between all the organizations' locations. We investigate the performance of MVFF under four variations of local and global models. We compare MVFF's performance to two other federated frameworks on real-life datasets, New York Bike and Yelp reviews, and MVFF achieves better performance.
Finally, we design two adaptive differential privacy budget algorithms for each organization participating in collaborative mobility forecasting. We define a new metric to assess the different organizations' participation levels in the learning task and adjust the privacy budget accordingly. Then, we adapt each organization's privacy protection level (privacy budget) to the accuracy dynamics of the learning task. Lastly, we empirically evaluate our adaptive differential privacy budget algorithms using MVFF and two real-world datasets: a trajectory dataset collected in New York and Beijing over multiple months and a Yelp business review dataset.