Author: "Domingo-Ferrer, Josep" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Domingo-Ferrer, Josep"' showing total 1,807 results

Start Over Author "Domingo-Ferrer, Josep"

1,807 results on '"Domingo-Ferrer, Josep"'

1. Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

Author: Blanco-Justicia, Alberto, Jebreel, Najeeb, Manzanares, Benet, Sánchez, David, Domingo-Ferrer, Josep, Collell, Guillem, and Tan, Kuan Eeik
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, 68, K.4.1, I.2.6, I.2.7
Abstract: The objective of digital forgetting is, given a model with undesirable knowledge or behavior, obtain a new model where the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Effective digital forgetting has to be effective (meaning how well the new model has forgotten the undesired knowledge/behavior), retain the performance of the original model on the desirable tasks, and be scalable (in particular forgetting has to be more efficient than retraining from scratch on just the tasks/data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail datasets, models and metrics used for the evaluation of forgetting, retaining and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks., Comment: 70 pages
Published: 2024

2. Conciliating Privacy and Utility in Data Releases via Individual Differential Privacy and Microaggregation

Author: Soria-Comas, Jordi, Sánchez, David, Domingo-Ferrer, Josep, Martínez, Sergio, and Del Vasto-Terrientes, Luis
Subjects: Computer Science - Cryptography and Security
Abstract: $\epsilon$-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees. However, when applied to data releases, DP significantly deteriorates the analytical utility of the protected outcomes. To keep data utility at reasonable levels, practical applications of DP to data releases have used weak privacy parameters (large $\epsilon$), which dilute the privacy guarantees of DP. In this work, we tackle this issue by using an alternative formulation of the DP privacy guarantees, named $\epsilon$-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects. We enforce iDP in data releases by relying on attribute masking plus a pre-processing step based on data microaggregation. The goal of this step is to reduce the sensitivity to record changes, which determines the amount of noise required to enforce iDP (and DP). Specifically, we propose data microaggregation strategies designed for iDP whose sensitivities are significantly lower than those used in DP. As a result, we obtain iDP-protected data with significantly better utility than with DP. We report on experiments that show how our approach can provide strong privacy (small $\epsilon$) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis., Comment: 17 pages, 6 figures
Published: 2023

3. Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

Author: Haffar, Rami, Sánchez, David, and Domingo-Ferrer, Josep
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Human facial data hold tremendous potential to address a variety of classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, such as the EU General Data Protection Regulation and others, have restricted the ways in which human images may be collected and used for research. As a result, several previously published data sets containing human faces have been removed from the internet due to inadequate data collection methods that failed to meet privacy regulations. Data sets consisting of synthetic data have been proposed as an alternative, but they fall short of accurately representing the real data distribution. On the other hand, most available data sets are labeled for just a single task, which limits their applicability. To address these issues, we present the Multi-Task Faces (MTF) image data set, a meticulously curated collection of face images designed for various classification tasks, including face recognition, as well as race, gender, and age classification. The MTF data set has been ethically gathered by leveraging publicly available images of celebrities and strictly adhering to copyright regulations. In this paper, we present this data set and provide detailed descriptions of the followed data collection and processing procedures. Furthermore, we evaluate the performance of five deep learning (DL) models on the MTF data set across the aforementioned classification tasks. Additionally, we compare the performance of DL models over the processed MTF data and over raw data crawled from the internet. The reported results constitute a baseline for further research employing these data. The MTF data set can be accessed through the following link (please cite the present paper if you use the data set): https://github.com/RamiHaf/MTF_data_set, Comment: 21 pages, 2 figures, 9 Tables
Published: 2023

4. An Examination of the Alleged Privacy Threats of Confidence-Ranked Reconstruction of Census Microdata

Author: Sánchez, David, Jebreel, Najeeb, Muralidhar, Krishnamurty, Domingo-Ferrer, Josep, and Blanco-Justicia, Alberto
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: The threat of reconstruction attacks has led the U.S. Census Bureau (USCB) to replace in the Decennial Census 2020 the traditional statistical disclosure limitation based on rank swapping with one based on differential privacy (DP), leading to substantial accuracy loss of released statistics. Yet, it has been argued that, if many different reconstructions are compatible with the released statistics, most of them do not correspond to actual original data, which protects against respondent reidentification. Recently, a new attack has been proposed, which incorporates the confidence that a reconstructed record was in the original data. The alleged risk of disclosure entailed by such confidence-ranked reconstruction has renewed the interest of the USCB to use DP-based solutions. To forestall a potential accuracy loss in future releases, we show that the proposed reconstruction is neither effective as a reconstruction method nor conducive to disclosure as claimed by its authors. Specifically, we report empirical results showing the proposed ranking cannot guide reidentification or attribute disclosure attacks, and hence fails to warrant the utility sacrifice entailed by the use of DP to release census statistical data., Comment: In Lecture Notes in Artificial Intelligence, vol. 14915, pp. 213-224. Vol. Privacy in Statistical Databases (PSD 2024), Antibes Juan-les-Pins, France, Sep. 25-27, 2024. 20 pages, 5 figures, 4 tables
Published: 2023

5. Federated learning-based natural language processing: a systematic literature review

Author: Khan, Younas, Sánchez, David, and Domingo-Ferrer, Josep
Published: 2024
Full Text: View/download PDF

6. Defending Against Backdoor Attacks by Layer-wise Feature Analysis

Author: Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, and Li, Yiming
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find out that the feature difference between benign and poisoned samples tends to be maximum at a critical layer, which is not always the one typically used in existing defenses, namely the layer before fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. We conduct extensive experiments on two benchmark datasets, which confirm the effectiveness of our defense., Comment: This paper is accepted by PAKDD 2023
Published: 2023

7. Database Reconstruction Is Not So Easy and Is Different from Reidentification

Author: Muralidhar, Krishnamurty and Domingo-Ferrer, Josep
Subjects: Computer Science - Cryptography and Security, Computer Science - Databases, 68P27 Privacy of data, H.2, G.3
Abstract: In recent years, it has been claimed that releasing accurate statistical information on a database is likely to allow its complete reconstruction. Differential privacy has been suggested as the appropriate methodology to prevent these attacks. These claims have recently been taken very seriously by the U.S. Census Bureau and led them to adopt differential privacy for releasing U.S. Census data. This in turn has caused consternation among users of the Census data due to the lack of accuracy of the protected outputs. It has also brought legal action against the U.S. Department of Commerce. In this paper, we trace the origins of the claim that releasing information on a database automatically makes it vulnerable to being exposed by reconstruction attacks and we show that this claim is, in fact, incorrect. We also show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques. We further show that the geographic level at which exact counts are released is even more relevant to protection than the actual SDC method employed. Finally, we caution against confusing reconstruction and reidentification: using the quality of reconstruction as a metric of reidentification results in exaggerated reidentification risk figures., Comment: Journal of Official Statistics (to appear)
Published: 2023

8. GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)

Author: Jefferson, Emily, Liley, James, Malone, Maeve, Reel, Smarti, Crespi-Boixader, Alba, Kerasidou, Xaroula, Tava, Francesco, McCarthy, Andrew, Preen, Richard, Blanco-Justicia, Alberto, Mansouri-Benssassi, Esma, Domingo-Ferrer, Josep, Beggs, Jillian, Chuter, Antony, Cole, Christian, Ritchie, Felix, Daly, Angela, Rogers, Simon, and Smith, Jim
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: TREs are widely, and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g., health, police, tax and education) as they enable secure and transparent research whilst protecting data confidentiality. There is an increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spotting human errors, streamlining processes, task automation and decision support. These complex AI models require more information to describe and reproduce, increasing the possibility that sensitive personal data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all TREs to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training. GRAIMATTER has developed a draft set of usable recommendations for TREs to guard against the additional risks when disclosing trained AI models from TREs. The development of these recommendations has been funded by the GRAIMATTER UKRI DARE UK sprint research project. This version of our recommendations was published at the end of the project in September 2022. During the course of the project, we have identified many areas for future investigations to expand and test these recommendations in practice. Therefore, we expect that this document will evolve over time.
Published: 2022
Full Text: View/download PDF

9. Enhanced Security and Privacy via Fragmented Federated Learning

Author: Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, Blanco-Justicia, Alberto, and Sanchez, David
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: In federated learning (FL), a set of participants share updates computed on their local data with an aggregator server that combines updates into a global model. However, reconciling accuracy with privacy and security is a challenge to FL. On the one hand, good updates sent by honest participants may reveal their private local information, whereas poisoned updates sent by malicious participants may compromise the model's availability and/or integrity. On the other hand, enhancing privacy via update distortion damages accuracy, whereas doing so via update aggregation damages security because it does not allow the server to filter out individual poisoned updates. To tackle the accuracy-privacy-security conflict, we propose {\em fragmented federated learning} (FFL), in which participants randomly exchange and mix fragments of their updates before sending them to the server. To achieve privacy, we design a lightweight protocol that allows participants to privately exchange and mix encrypted fragments of their updates so that the server can neither obtain individual updates nor link them to their originators. To achieve security, we design a reputation-based defense tailored for FFL that builds trust in participants and their mixed updates based on the quality of the fragments they exchange and the mixed updates they send. Since the exchanged fragments' parameters keep their original coordinates and attackers can be neutralized, the server can correctly reconstruct a global model from the received mixed updates without accuracy loss. Experiments on four real data sets show that FFL can prevent semi-honest servers from mounting privacy attacks, can effectively counter poisoning attacks and can keep the accuracy of the global model., Comment: IEEE Transactions on Neural Networks and Learning Systems (To Appear)
Published: 2022
Full Text: View/download PDF

10. Bistochastic privacy

Author: Ruiz, Nicolas and Domingo-Ferrer, Josep
Subjects: Computer Science - Cryptography and Security
Abstract: We introduce a new privacy model relying on bistochastic matrices, that is, matrices whose components are nonnegative and sum to 1 both row-wise and column-wise. This class of matrices is used to both define privacy guarantees and a tool to apply protection on a data set. The bistochasticity assumption happens to connect several fields of the privacy literature, including the two most popular models, k-anonymity and differential privacy. Moreover, it establishes a bridge with information theory, which simplifies the thorny issue of evaluating the utility of a protected data set. Bistochastic privacy also clarifies the trade-off between protection and utility by using bits, which can be viewed as a natural currency to comprehend and operationalize this trade-off, in the same way than bits are used in information theory to capture uncertainty. A discussion on the suitable parameterization of bistochastic matrices to achieve the privacy guarantees of this new model is also provided., Comment: To be published in Lecture Notes in Artificial Intelligence vol 13408, Modeling Decisions for Artificial Intelligence 19th International Conference MDAI 2022, Sant Cugat, Catalonia, August 30 - 2 September 2022
Published: 2022

11. Defending against the Label-flipping Attack in Federated Learning

Author: Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, Sánchez, David, and Blanco-Justicia, Alberto
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data in their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform and hard to detect and it negatively impacts on the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we deeply investigate the LF attack behavior and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for the attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, and then clusters the extracted gradients, analyzes the resulting clusters and filters out potential bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. Also, the proposed defense outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.
Published: 2022

12. FL-Defender: Combating Targeted Attacks in Federated Learning

Author: Jebreel, Najeeb and Domingo-Ferrer, Josep
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Federated learning (FL) enables learning a global machine learning model from local data distributed among a set of participating workers. This makes it possible i) to train more accurate models due to learning from rich joint training data, and ii) to improve privacy by not sharing the workers' local private data with others. However, the distributed nature of FL makes it vulnerable to targeted poisoning attacks that negatively impact the integrity of the learned model while, unfortunately, being difficult to detect. Existing defenses against those attacks are limited by assumptions on the workers' data distribution, may degrade the global model performance on the main task and/or are ill-suited to high-dimensional models. In this paper, we analyze targeted attacks against FL and find that the neurons in the last layer of a deep learning (DL) model that are related to the attacks exhibit a different behavior from the unrelated neurons, making the last-layer gradients valuable features for attack detection. Accordingly, we propose \textit{FL-Defender} as a method to combat FL targeted attacks. It consists of i) engineering more robust discriminative features by calculating the worker-wise angle similarity for the workers' last-layer gradients, ii) compressing the resulting similarity vectors using PCA to reduce redundant information, and iii) re-weighting the workers' updates based on their deviation from the centroid of the compressed similarity vectors. Experiments on three data sets with different DL model sizes and data distributions show the effectiveness of our method at defending against label-flipping and backdoor attacks. Compared to several state-of-the-art defenses, FL-Defender achieves the lowest attack success rates, maintains the performance of the global model on the main task and causes minimal computational overhead on the server.
Published: 2022

13. A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning

Author: Blanco-Justicia, Alberto, Sanchez, David, Domingo-Ferrer, Josep, and Muralidhar, Krishnamurty
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning, I.2.6
Abstract: We review the use of differential privacy (DP) for privacy protection in machine learning (ML). We show that, driven by the aim of preserving the accuracy of the learned models, DP-based ML implementations are so loose that they do not offer the ex ante privacy guarantees of DP. Instead, what they deliver is basically noise addition similar to the traditional (and often criticized) statistical disclosure control approach. Due to the lack of formal privacy guarantees, the actual level of privacy offered must be experimentally assessed ex post, which is done very seldom. In this respect, we present empirical results showing that standard anti-overfitting techniques in ML can achieve a better utility/privacy/efficiency trade-off than DP., Comment: ACM Computing Surveys (to appear)
Published: 2022
Full Text: View/download PDF

14. MemberShield: A framework for federated learning with membership privacy

Author: Ahmed, Faisal, Sánchez, David, Haddi, Zouhair, and Domingo-Ferrer, Josep
Published: 2025
Full Text: View/download PDF

15. Circuit-Free General-Purpose Multi-Party Computation via Co-Utile Unlinkable Outsourcing

Author: Domingo-Ferrer, Josep and Manjón, Jesús
Subjects: Computer Science - Cryptography and Security, 68M25, 68M14, 68P27, K.6.5, C.2.4
Abstract: Multiparty computation (MPC) consists in several parties engaging in joint computation in such a way that each party's input and output remain private to that party. Whereas MPC protocols for specific computations have existed since the 1980s, only recently general-purpose compilers have been developed to allow MPC on arbitrary functions. Yet, using today's MPC compilers requires substantial programming effort and skill on the user's side, among other things because nearly all compilers translate the code of the computation into a Boolean or arithmetic circuit. In particular, the circuit representation requires unrolling loops and recursive calls, which forces programmers to (often manually) define loop bounds and hardly use recursion. We present an approach allowing MPC on an arbitrary computation expressed as ordinary code with all functionalities that does not need to be translated into a circuit. Our notion of input and output privacy is predicated on unlinkability. Our method leverages co-utile computation outsourcing using anonymous channels via decentralized reputation, makes a minimalistic use of cryptography and does not require participants to be honest-but-curious: it works as long as participants are rational (self-interested), which may include rationally malicious peers (who become attackers if this is advantageous to them). We present example applications, including e-voting. Our empirical work shows that reputation captures well the behavior of peers and ensures that parties with high reputation obtain correct results., Comment: IEEE Transactions on Dependable and Secure Computing, to appear
Published: 2021

16. Secure and Privacy-Preserving Federated Learning via Co-Utility

Author: Domingo-Ferrer, Josep, Blanco-Justicia, Alberto, Manjón, Jesús, and Sánchez, David
Subjects: Computer Science - Cryptography and Security, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, 68P27, 68Txx, 91, I.2.11, K.6.5
Abstract: The decentralized nature of federated learning, that often leverages the power of edge devices, makes it vulnerable to attacks against privacy and security. The privacy risk for a peer is that the model update she computes on her private data may, when sent to the model manager, leak information on those private data. Even more obvious are security attacks, whereby one or several malicious peers return wrong model updates in order to disrupt the learning process and lead to a wrong model being learned. In this paper we build a federated learning framework that offers privacy to the participating peers as well as security against Byzantine and poisoning attacks. Our framework consists of several protocols that provide strong privacy to the participating peers via unlinkable anonymity and that are rationally sustainable based on the co-utility property. In other words, no rational party is interested in deviating from the proposed protocols. We leverage the notion of co-utility to build a decentralized co-utile reputation management system that provides incentives for parties to adhere to the protocols. Unlike privacy protection via differential privacy, our approach preserves the values of model updates and hence the accuracy of plain federated learning; unlike privacy protection via update aggregation, our approach preserves the ability to detect bad model updates while substantially reducing the computational overhead compared to methods based on homomorphic encryption., Comment: IEEE Internet of Things Journal, to appear
Published: 2021

17. Achieving Security and Privacy in Federated Learning Systems: Survey, Research Challenges and Future Directions

Author: Blanco-Justicia, Alberto, Domingo-Ferrer, Josep, Martínez, Sergio, Sánchez, David, Flanagan, Adrian, and Tan, Kuan Eeik
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: Federated learning (FL) allows a server to learn a machine learning (ML) model across multiple decentralized clients that privately store their own training data. In contrast with centralized ML approaches, FL saves computation to the server and does not require the clients to outsource their private data to the server. However, FL is not free of issues. On the one hand, the model updates sent by the clients at each training epoch might leak information on the clients' private data. On the other hand, the model learnt by the server may be subjected to attacks by malicious clients; these security attacks might poison the model or prevent it from converging. In this paper, we first examine security and privacy attacks to FL and critically survey solutions proposed in the literature to mitigate each attack. Afterwards, we discuss the difficulty of simultaneously achieving security and privacy protection. Finally, we sketch ways to tackle this open problem and attain both security and privacy., Comment: 40 pages, 19 figures
Published: 2020

18. LFighter: Defending against the label-flipping attack in federated learning

Author: Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, Sánchez, David, and Blanco-Justicia, Alberto
Published: 2024
Full Text: View/download PDF

19. The Limits of Differential Privacy (and its Misuse in Data Release and Machine Learning)

Author: Domingo-Ferrer, Josep, Sánchez, David, and Blanco-Justicia, Alberto
Subjects: Computer Science - Cryptography and Security
Abstract: Differential privacy (DP) is a neat privacy definition that can co-exist with certain well-defined data uses in the context of interactive queries. However, DP is neither a silver bullet for all privacy problems nor a replacement for all previous privacy models. In fact, extreme care should be exercised when trying to extend its use beyond the setting it was designed for. This paper reviews the limitations of DP and its misuse for individual data collection, individual data release, and machine learning., Comment: Communications of the ACM, to appear
Published: 2020

20. Multi-Dimensional Randomized Response

Author: Domingo-Ferrer, Josep and Soria-Comas, Jordi
Subjects: Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms
Abstract: In our data world, a host of not necessarily trusted controllers gather data on individual subjects. To preserve her privacy and, more generally, her informational self-determination, the individual has to be empowered by giving her agency on her own data. Maximum agency is afforded by local anonymization, that allows each individual to anonymize her own data before handing them to the data controller. Randomized response (RR) is a local anonymization approach able to yield multi-dimensional full sets of anonymized microdata that are valid for exploratory analysis and machine learning. This is so because an unbiased estimate of the distribution of the true data of individuals can be obtained from their pooled randomized data. Furthermore, RR offers rigorous privacy guarantees. The main weakness of RR is the curse of dimensionality when applied to several attributes: as the number of attributes grows, the accuracy of the estimated true data distribution quickly degrades. We propose several complementary approaches to mitigate the dimensionality problem. First, we present two basic protocols, separate RR on each attribute and joint RR for all attributes, and discuss their limitations. Then we introduce an algorithm to form clusters of attributes so that attributes in different clusters can be viewed as independent and joint RR can be performed within each cluster. After that, we introduce an adjustment algorithm for the randomized data set that repairs some of the accuracy loss due to assuming independence between attributes when using RR separately on each attribute or due to assuming independence between clusters in cluster-wise RR. We also present empirical work to illustrate the proposed methods., Comment: IEEE Transactions on Knowledge and Data Engineering, to appear. (First version submitted on May 8, 2019 as TKDE-2019-05-0430; first revision submitted on July 13, 2020 as TKDE-2019-05-0430.R1; second revision submitted on Nov. 5, 2020 as TKDE-2019-05-0430.R2 and accepted without changes on Dec. 16, 2020)
Published: 2020

21. General Confidentiality and Utility Metrics for Privacy-Preserving Data Publishing Based on the Permutation Model

Author: Domingo-Ferrer, Josep, Muralidhar, Krishnamurty, and Bras-Amorós, Maria
Subjects: Computer Science - Cryptography and Security
Abstract: Anonymization for privacy-preserving data publishing, also known as statistical disclosure control (SDC), can be viewed under the lens of the permutation model. According to this model, any SDC method for individual data records is functionally equivalent to a permutation step plus a noise addition step, where the noise added is marginal, in the sense that it does not alter ranks. Here, we propose metrics to quantify the data confidentiality and utility achieved by SDC methods based on the permutation model. We distinguish two privacy notions: in our work, anonymity refers to subjects and hence mainly to protection against record re-identification, whereas confidentiality refers to the protection afforded to attribute values against attribute disclosure. Thus, our confidentiality metrics are useful even if using a privacy model ensuring an anonymity level ex ante. The utility metric is a general-purpose metric that can be conveniently traded off against the confidentiality metrics, because all of them are bounded between 0 and 1. As an application, we compare the utility-confidentiality trade-offs achieved by several anonymization approaches, including privacy models (k-anonymity and $\epsilon$-differential privacy) as well as SDC methods (additive noise, multiplicative noise and synthetic data) used without privacy models., Comment: IEEE Transactions on Dependable and Secure Computing (to appear)
Published: 2020

22. Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Author: Nanni, Mirco, Andrienko, Gennady, Barabási, Albert-László, Boldrini, Chiara, Bonchi, Francesco, Cattuto, Ciro, Chiaromonte, Francesca, Comandé, Giovanni, Conti, Marco, Coté, Mark, Dignum, Frank, Dignum, Virginia, Domingo-Ferrer, Josep, Ferragina, Paolo, Giannotti, Fosca, Guidotti, Riccardo, Helbing, Dirk, Kaski, Kimmo, Kertesz, Janos, Lehmann, Sune, Lepri, Bruno, Lukowicz, Paul, Matwin, Stan, Jiménez, David Megías, Monreale, Anna, Morik, Katharina, Oliver, Nuria, Passarella, Andrea, Passerini, Andrea, Pedreschi, Dino, Pentland, Alex, Pianesi, Fabio, Pratesi, Francesca, Rinzivillo, Salvatore, Ruggieri, Salvatore, Siebes, Arno, Trasarti, Roberto, Hoven, Jeroen van den, and Vespignani, Alessandro
Subjects: Computer Science - Computers and Society, Computer Science - Social and Information Networks
Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society., Comment: Revised text. Additional authors
Published: 2020

23. Generation of Synthetic Trajectory Microdata from Language Models

Author: Blanco-Justicia, Alberto, Jebreel, Najeeb Moharram, Manjón, Jesús A., Domingo-Ferrer, Josep, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Domingo-Ferrer, Josep, editor, and Laurent, Maryline, editor
Published: 2022
Full Text: View/download PDF

24. Tit-for-Tat Disclosure of a Binding Sequence of User Analyses in Safe Data Access Centers

Author: Domingo-Ferrer, Josep, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Domingo-Ferrer, Josep, editor, and Laurent, Maryline, editor
Published: 2022
Full Text: View/download PDF

25. Fair detection of poisoning attacks in federated learning on non-i.i.d. data

Author: Singh, Ashneet Khandpur, Blanco-Justicia, Alberto, and Domingo-Ferrer, Josep
Published: 2023
Full Text: View/download PDF

26. Explaining predictions and attacks in federated learning via random forests

Author: Haffar, Rami, Sánchez, David, and Domingo-Ferrer, Josep
Published: 2023
Full Text: View/download PDF

27. Secure, accurate and privacy-aware fully decentralized learning via co-utility

Author: Manjón, Jesús, Domingo-Ferrer, Josep, Sánchez, David, and Blanco-Justicia, Alberto
Published: 2023
Full Text: View/download PDF

28. The future of statistical disclosure control

Author: Elliot, Mark and Domingo-Ferrer, Josep
Subjects: Computer Science - Cryptography and Security, 94A60, K.4.1, D.4.6, H.2.0
Abstract: Statistical disclosure control (SDC) was not created in a single seminal paper nor following the invention of a new mathematical technique, rather it developed slowly in response to the practical challenges faced by data practitioners based at national statistical institutes (NSIs). SDC's subsequent emergence as a specialised academic field was an outcome of three interrelated socio-technical changes: (i) the advent of accessible computing as a research tool in the 1980s meant that it became possible - and then increasingly easy - for researchers to process larger quantities of data automatically; this naturally increased demand for such data; (ii) it became possible for data holders to process and disseminate detailed data as digital files and (iii) the number of organisations holding data about individuals proliferated. This also meant the number of potential adversaries with the resources to attack any given dataset increased exponentially. In this article, we describe the state of the art for SDC and then discuss the core issues and future challenges. In particular, we touch on SDC and big data, on SDC and machine learning, and on SDC and anti-discrimination., Comment: A contributing article to the National Statistician's Quality Review into Privacy and Data Confidentiality Methods
Published: 2018

29. Defending Against Backdoor Attacks by Layer-wise Feature Analysis

Author: Jebreel, Najeeb Moharram, primary, Domingo-Ferrer, Josep, additional, and Li, Yiming, additional
Published: 2023
Full Text: View/download PDF

30. Measuring Fairness in Machine Learning Models via Counterfactual Examples

Author: Haffar, Rami, Singh, Ashneet Khandpur, Domingo-Ferrer, Josep, Jebreel, Najeeb, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Torra, Vicenç, editor, and Narukawa, Yasuo, editor
Published: 2022
Full Text: View/download PDF

31. FL-Defender: Combating targeted attacks in federated learning

Author: Jebreel, Najeeb Moharram and Domingo-Ferrer, Josep
Published: 2023
Full Text: View/download PDF

32. How to Avoid Reidentification with Proper Anonymization

Author: Sánchez, David, Martínez, Sergio, and Domingo-Ferrer, Josep
Subjects: Computer Science - Cryptography and Security, 68, K.4.1
Abstract: De Montjoye et al. claimed that most individuals can be reidentified from a deidentified transaction database and that anonymization mechanisms are not effective against reidentification. We demonstrate that anonymization can be performed by techniques well established in the literature., Comment: 5 pages
Published: 2018
Full Text: View/download PDF

33. Connecting Randomized Response, Post-Randomization, Differential Privacy and t-Closeness via Deniability and Permutation

Author: Domingo-Ferrer, Josep and Soria-Comas, Jordi
Subjects: Computer Science - Cryptography and Security, 68P99, H.2.7, K.4.1
Abstract: We explore some novel connections between the main privacy models in use and we recall a few known ones. We show these models to be more related than commonly understood, around two main principles: deniability and permutation. In particular, randomized response turns out to be very modern in spite of it having been introduced over 50 years ago: it is a local anonymization method and it allows understanding the protection offered by $\epsilon$-differential privacy when $\epsilon$ is increased to improve utility. A similar understanding on the effect of large $\epsilon$ in terms of deniability is obtained from the connection between $\epsilon$-differential privacy and t-closeness. Finally, the post-randomization method (PRAM) is shown to be viewable as permutation and to be connected with randomized response and differential privacy. Since the latter is also connected with t-closeness, it follows that the permutation principle can explain the guarantees offered by all those models. Thus, calibrating permutation is very relevant in anonymization, and we conclude by sketching two ways of doing it., Comment: Submitted manuscript
Published: 2018

34. Detecting Bad Answers in Survey Data Through Unsupervised Machine Learning

Author: Jebreel, Najeeb Moharram, Haffar, Rami, Singh, Ashneet Khandpur, Sánchez, David, Domingo-Ferrer, Josep, Blanco-Justicia, Alberto, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Domingo-Ferrer, Josep, editor, and Muralidhar, Krishnamurty, editor
Published: 2020
Full Text: View/download PDF

35. -Differential Privacy for Microdata Releases Does Not Guarantee Confidentiality (Let Alone Utility)

Author: Muralidhar, Krishnamurty, Domingo-Ferrer, Josep, Martínez, Sergio, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Domingo-Ferrer, Josep, editor, and Muralidhar, Krishnamurty, editor
Published: 2020
Full Text: View/download PDF

36. Differentially private publication of database streams via hybrid video coding

Author: Parra-Arnau, Javier, Strufe, Thorsten, and Domingo-Ferrer, Josep
Published: 2022
Full Text: View/download PDF

37. Decentralized [formula omitted]-anonymization of trajectories via privacy-preserving tit-for-tat

Author: Domingo-Ferrer, Josep, Martínez, Sergio, and Sánchez, David
Published: 2022
Full Text: View/download PDF

38. Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees

Author: Soria-Comas, Jordi, Domingo-Ferrer, Josep, Sánchez, David, and Megías, David
Subjects: Computer Science - Cryptography and Security
Abstract: Differential privacy is a popular privacy model within the research community because of the strong privacy guarantee it offers, namely that the presence or absence of any individual in a data set does not significantly influence the results of analyses on the data set. However, enforcing this strict guarantee in practice significantly distorts data and/or limits data uses, thus diminishing the analytical utility of the differentially private results. In an attempt to address this shortcoming, several relaxations of differential privacy have been proposed that trade off privacy guarantees for improved data utility. In this work, we argue that the standard formalization of differential privacy is stricter than required by the intuitive privacy guarantee it seeks. In particular, the standard formalization requires indistinguishability of results between any pair of neighbor data sets, while indistinguishability between the actual data set and its neighbor data sets should be enough. This limits the data controller's ability to adjust the level of protection to the actual data, hence resulting in significant accuracy loss. In this respect, we propose individual differential privacy, an alternative differential privacy notion that offers em the same privacy guarantees as standard differential privacy to individuals (even though not to groups of individuals). This new notion allows the data controller to adjust the distortion to the actual data set, which results in less distortion and more analytical accuracy. We propose several mechanisms to attain individual differential privacy and we compare the new notion against standard differential privacy in terms of the accuracy of the analytical results.
Published: 2016
Full Text: View/download PDF

39. Towards Machine Learning-Assisted Output Checking for Statistical Disclosure Control

Author: Domingo-Ferrer, Josep, Blanco-Justicia, Alberto, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Torra, Vicenç, editor, and Narukawa, Yasuo, editor
Published: 2021
Full Text: View/download PDF

40. Explaining Image Misclassification in Deep Learning via Adversarial Examples

Author: Haffar, Rami, Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, Sánchez, David, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Torra, Vicenç, editor, and Narukawa, Yasuo, editor
Published: 2021
Full Text: View/download PDF

41. The counterfactual framework in Jarmin et al. is not a measure of disclosure risk of respondents

Author: Muralidhar, Krishnamurty, primary, Ruggles, Steven, additional, Domingo-Ferrer, Josep, additional, and Sánchez, David, additional
Published: 2024
Full Text: View/download PDF

42. Privacy-Preserving Technologies

Author: Domingo-Ferrer, Josep, Blanco-Justicia, Alberto, Gordijn, Bert, Series Editor, Roeser, Sabine, Series Editor, Birnbacher, Dieter, Editorial Board Member, Brownsword, Roger, Editorial Board Member, Chadwick, Ruth, Editorial Board Member, Dempsey, Paul Stephen, Editorial Board Member, Froomkin, Michael, Editorial Board Member, Gutwirth, Serge, Editorial Board Member, Ten Have, Henk, Editorial Board Member, Holm, Søren, Editorial Board Member, Khushf, George, Editorial Board Member, Kirby, Justice Michael, Editorial Board Member, Knoppers, Bartha, Editorial Board Member, Krieger, David, Editorial Board Member, Laurie, Graeme, Editorial Board Member, Oosterlinck, René, Editorial Board Member, Weckert, John, Editorial Board Member, Christen, Markus, editor, and Loi, Michele, editor
Published: 2020
Full Text: View/download PDF

43. Privacy-Preserving Computation of the Earth Mover’s Distance

Author: Blanco-Justicia, Alberto, Domingo-Ferrer, Josep, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Susilo, Willy, editor, Deng, Robert H., editor, Guo, Fuchun, editor, Li, Yannan, editor, and Intan, Rolly, editor
Published: 2020
Full Text: View/download PDF

44. Explaining Misclassification and Attacks in Deep Learning via Random Forests

Author: Haffar, Rami, Domingo-Ferrer, Josep, Sánchez, David, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Torra, Vicenç, editor, Narukawa, Yasuo, editor, Nin, Jordi, editor, and Agell, Núria, editor
Published: 2020
Full Text: View/download PDF

45. Efficient Detection of Byzantine Attacks in Federated Learning Using Last Layer Biases

Author: Jebreel, Najeeb, Blanco-Justicia, Alberto, Sánchez, David, Domingo-Ferrer, Josep, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Torra, Vicenç, editor, Narukawa, Yasuo, editor, Nin, Jordi, editor, and Agell, Núria, editor
Published: 2020
Full Text: View/download PDF

46. Achieving security and privacy in federated learning systems: Survey, research challenges and future directions

Author: Blanco-Justicia, Alberto, Domingo-Ferrer, Josep, Martínez, Sergio, Sánchez, David, Flanagan, Adrian, and Tan, Kuan Eeik
Published: 2021
Full Text: View/download PDF

47. Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Author: Nanni, Mirco, Andrienko, Gennady, Barabási, Albert-László, Boldrini, Chiara, Bonchi, Francesco, Cattuto, Ciro, Chiaromonte, Francesca, Comandé, Giovanni, Conti, Marco, Coté, Mark, Dignum, Frank, Dignum, Virginia, Domingo-Ferrer, Josep, Ferragina, Paolo, Giannotti, Fosca, Guidotti, Riccardo, Helbing, Dirk, Kaski, Kimmo, Kertesz, Janos, Lehmann, Sune, Lepri, Bruno, Lukowicz, Paul, Matwin, Stan, Jiménez, David Megías, Monreale, Anna, Morik, Katharina, Oliver, Nuria, Passarella, Andrea, Passerini, Andrea, Pedreschi, Dino, Pentland, Alex, Pianesi, Fabio, Pratesi, Francesca, Rinzivillo, Salvatore, Ruggieri, Salvatore, Siebes, Arno, Torra, Vicenc, Trasarti, Roberto, Hoven, Jeroen van den, and Vespignani, Alessandro
Published: 2021
Full Text: View/download PDF

48. Flexible Attribute-Based Encryption Applicable to Secure E-Healthcare Records

Author: Qin, Bo, Deng, Hua, Wu, Qianhong, Domingo-Ferrer, Josep, Naccache, David, and Zhou, Yunya
Subjects: Computer Science - Cryptography and Security
Abstract: In e-healthcare record systems (EHRS), attribute-based encryption (ABE) appears as a natural way to achieve fine-grained access control on health records. Some proposals exploit key-policy ABE (KP-ABE) to protect privacy in such a way that all users are associated with specific access policies and only the ciphertexts matching the users' access policies can be decrypted. An issue with KP-ABE is that it requires an a priori formulation of access policies during key generation, which is not always practicable in EHRS because the policies to access health records are sometimes determined after key generation. In this paper, we revisit KPABE and propose a dynamic ABE paradigm, referred to as access policy redefinable ABE (APR-ABE). To address the above issue, APR-ABE allows users to redefine their access policies and delegate keys for the redefined ones; hence a priori precise policies are no longer mandatory. We construct an APR-ABE scheme with short ciphertexts and prove its full security in the standard model under several static assumptions.
Published: 2015
Full Text: View/download PDF

49. Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics

Author: D'Acquisto, Giuseppe, Domingo-Ferrer, Josep, Kikiras, Panayiotis, Torra, Vicenç, de Montjoye, Yves-Alexandre, and Bourka, Athena
Subjects: Computer Science - Cryptography and Security, 94A60, K.4.1, D.4.6, H.2.0
Abstract: The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals' private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguards in the analytics value chain. ENISA, with the current report, supports this approach and the position that the challenges of technology (for big data) should be addressed by the opportunities of technology (for privacy). We first explain the need to shift from "big data versus privacy" to "big data with privacy". In this respect, the concept of privacy by design is key to identify the privacy requirements early in the big data analytics value chain and in subsequently implementing the necessary technical and organizational measures. After an analysis of the proposed privacy by design strategies in the different phases of the big data value chain, we review privacy enhancing technologies of special interest for the current and future big data landscape. In particular, we discuss anonymization, the "traditional" analytics technique, the emerging area of encrypted search and privacy preserving computations, granular access control mechanisms, policy enforcement and accountability, as well as data provenance issues. Moreover, new transparency and access tools in big data are explored, together with techniques for user empowerment and control. Achieving "big data with privacy" is no easy task and a lot of research and implementation is still needed. Yet, it remains a possible task, as long as all the involved stakeholders take the necessary steps to integrate privacy and data protection safeguards in the heart of big data, by design and by default., Comment: 80 pages. European Union Agency for Network and Information Security (ENISA) report, December 2015, ISBN 978-92-9204-160-1. https://www.enisa.europa.eu/activities/identity-and-trust/library/deliverables/big-data-protection/
Published: 2015
Full Text: View/download PDF

50. t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation

Author: Soria-Comas, Jordi, Domingo-Ferrer, Josep, Sánchez, David, and Martínez, Sergio
Subjects: Computer Science - Cryptography and Security
Abstract: Microaggregation is a technique for disclosure limitation aimed at protecting the privacy of data subjects in microdata releases. It has been used as an alternative to generalization and suppression to generate $k$-anonymous data sets, where the identity of each subject is hidden within a group of $k$ subjects. Unlike generalization, microaggregation perturbs the data and this additional masking freedom allows improving data utility in several ways, such as increasing data granularity, reducing the impact of outliers and avoiding discretization of numerical data. $k$-Anonymity, on the other side, does not protect against attribute disclosure, which occurs if the variability of the confidential values in a group of $k$ subjects is too small. To address this issue, several refinements of $k$-anonymity have been proposed, among which $t$-closeness stands out as providing one of the strictest privacy guarantees. Existing algorithms to generate $t$-close data sets are based on generalization and suppression (they are extensions of $k$-anonymization algorithms based on the same principles). This paper proposes and shows how to use microaggregation to generate $k$-anonymous $t$-close data sets. The advantages of microaggregation are analyzed, and then several microaggregation algorithms for $k$-anonymous $t$-closeness are presented and empirically evaluated.
Published: 2015
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

1,807 results on '"Domingo-Ferrer, Josep"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources