In April 2025, the European Data Protection Board published a Report providing guidance on managing privacy risks associated with AI systems based on Large Language Models (LLMs). The document sets out a practical approach across the lifecycle of such systems, identifying risks such as exposure of sensitive data, data bias, and insufficient safeguards. It recommends measures including algorithmic audits, regularly updated risk registers, and incident response mechanisms, with the methodology illustrated through practical case studies. The Report also underscores the importance of conducting Data Protection Impact Assessments (DPIAs) and implementing dedicated compliance programmes for LLMs.
I. Context
In April 2025, the European Data Protection Board (EDPB), in the context of the “Support Pool of Experts” program, published a Report (the “Report”) prepared by Isabel Barberá1 with recommendations for the identification, management and mitigation of privacy and data protection risks associated with artificial intelligence (AI) systems based on Large Language Models (LLMs).
This Report, although non-binding, is of particular relevance to providers, deployers and users of LLMs, as well as to Supervisory Authorities, since it presents a practical and structured approach to risk management that covers the entire life cycle of the model, illustrated with practical case studies. In other words, it consolidates normative, technical and organizational criteria with a relevant impact on the interpretation and application of the General Data Protection Regulation (GDPR) in the context of generative AI systems and large-scale natural language processing, and serves as a support for (but never a substitute for) data protection impact assessments.
First of all, we would like to highlight the proposed definition of an LLM and its distinction from the concept of an AI system. The brief reference to “Agentic AI”, together with the effort to define it and to identify its benefits and risks, also reflects the marked growth of this type of system.
Equally useful are the list of LLM performance indicators (which can help deployers evaluate and select models) and the deployment types generally associated with LLMs – LLM as a service, “off-the-shelf” LLMs and “self-developed” LLMs.2
II. Privacy Risks Throughout the Life Cycle
The life cycle of these LLM systems comprises several phases, each with associated data flows. The Report identifies risks that may arise in each phase, among which we highlight:
Phase 1: inception and design
- In this phase, decisions are made about data requirements, collection methods and processing strategies.
Risk: the selection of data sources can create risks if those sources include personal or sensitive data without adequate safeguards.
Phase 2: data preparation and pre-processing
- Raw data is collected, cleaned, sometimes anonymized, and prepared for training or fine-tuning. Sources include web data, public repositories, proprietary data or data obtained through partnerships (a minimal redaction sketch follows the risks below).
Risks: - Inadvertent inclusion of personal and/or sensitive data;
- Violation of the principles of purpose limitation, data minimization and lawful processing;
- Bias in the data can result in unfair or discriminatory predictions.
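As an illustration of the kind of pre-processing safeguard contemplated here, the minimal Python sketch below redacts common personal data patterns before text enters a training corpus. The patterns and placeholder labels are our own assumptions; a production pipeline would typically combine such rules with NER-based detection and human review.

```python
import re

# Illustrative patterns only; real pipelines need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    """Replace matched spans with typed placeholders before training."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact Ana at ana.silva@example.com or +351 912 345 678."))
# -> Contact Ana at [EMAIL] or [PHONE].
```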
Phase 3: model training
- The prepared data is used to train the model in a large-scale process.
Risk: the model can memorize sensitive data and expose it in its outputs at scale, violating privacy.
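A common way to probe this memorization risk, sketched below under our own assumptions, is to plant unique “canary” strings in the training corpus and later check whether the model completes them verbatim; the `generate` callable is a hypothetical stand-in for any text-generation call.

```python
# Hypothetical canaries planted in the training corpus for testing.
CANARIES = [
    "The vault access code for test-user-0042 is 9F4K-22XQ",
]

def memorization_rate(generate, prefix_len: int = 30) -> float:
    """Fraction of canaries the model reproduces from their prefix alone."""
    leaked = 0
    for canary in CANARIES:
        prefix, secret = canary[:prefix_len], canary[prefix_len:]
        if secret.strip() in generate(prefix):
            leaked += 1
    return leaked / len(CANARIES)

# Usage (hypothetical model API): memorization_rate(lambda p: model.generate(p))
```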
Phase 4: verification and validation
- Evaluation of the model with test sets, often based on real scenarios.
Risk: real data can expose sensitive information if not anonymized.
Phase 5: deployment
- The model begins to interact with real-time data from users and other systems.
Risks: - Collection and processing of user inputs which may contain personal data and/or sensitive data;
- Risk of inference of personal data, even without direct access to identifying attributes.
Phase 6: operation and monitoring
- Continuous input of data for performance monitoring and optimization.
Risk: records of interactions may retain personal data, increasing the risk of leakage or misuse.
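A minimal sketch of how such interaction logs might be redacted, pseudonymized and purged on a schedule appears below; the 30-day window, field names and SHA-256 pseudonymization are illustrative assumptions, not requirements drawn from the Report.

```python
import hashlib
import time

RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day retention window

def log_interaction(store: list, user_id: str, prompt: str, redact) -> None:
    """Append a redacted, pseudonymized record; `redact` is any PII scrubber."""
    store.append({
        "user": hashlib.sha256(user_id.encode()).hexdigest(),  # pseudonym, not anonymous
        "prompt": redact(prompt),
        "ts": time.time(),
    })

def purge_expired(store: list) -> list:
    """Drop records older than the retention window."""
    cutoff = time.time() - RETENTION_SECONDS
    return [rec for rec in store if rec["ts"] >= cutoff]
```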
Phase 7: re-evaluation, maintenance and updates
- New data may be collected to update or improve the model.
Risk: using real user data without consent may violate privacy principles.
Phase 8: retirement
- Data associated with the model is archived or deleted.
Risk: failure to properly delete personal data can cause long-term vulnerabilities.
III. Risk Identification and Assessment
The Report identifies various risk factors for the privacy and protection of personal data, including:
- The particularly sensitive nature of the data processed (e.g., special category data, such as biometric data, and data of vulnerable persons, such as minors);
- The volume of data processed;
- The low quality of the data used for input and for training the system;
- Insufficient security measures.
The risk assessment methodology should be based on two main vectors (an illustrative scoring sketch follows this list):
- Severity of the potential impact on data subjects (taking into account criteria such as the intensity, duration and reversibility of the impact);
- Probability of occurrence (taking into account criteria such as frequency of use, degree of autonomy of the system, existence of human supervision and context of use).
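For illustration only, the sketch below combines the two vectors into a simple multiplicative score; the four-level scales and thresholds are our own assumptions, not the Report’s scheme.

```python
# Assumed 1-4 scales for each vector; the level labels are illustrative.
SEVERITY = {"limited": 1, "significant": 2, "important": 3, "maximum": 4}
PROBABILITY = {"negligible": 1, "limited": 2, "significant": 3, "maximum": 4}

def risk_score(severity: str, probability: str) -> int:
    return SEVERITY[severity] * PROBABILITY[probability]

def risk_level(score: int) -> str:
    if score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

s = risk_score("important", "significant")  # e.g. leakage of memorized data
print(s, risk_level(s))                     # -> 9 medium
```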
IV. Recommended Technical and Organizational Measures
In particular, the Report proposes risk control strategies and various detailed mitigation measures, specifying which are the responsibility of providers and which of deployers of these systems. Among the main guidelines, the following stand out (an illustrative risk register entry is sketched after this list):
- Documentation of all the data sources used to train the models;
- Implementation of algorithmic auditing and bias detection systems;
- Establishment of periodically updated risk registers that include, among other things, details of Data Protection Impact Assessments (DPIA) carried out and mitigation measures adopted;
- Creation of an incident response mechanism;
- Adoption of an active risk governance model, with continuous cycles of assessment, testing and adaptation of control measures.
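By way of illustration, the sketch below shows one possible shape for an entry in such a risk register, tying together the lifecycle phase, the two assessment vectors, the DPIA reference and the mitigation measures adopted; all field names and values are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RiskRegisterEntry:
    risk_id: str
    description: str
    lifecycle_phase: str            # e.g. "training", "deployment"
    severity: str                   # per the two-vector methodology above
    probability: str
    dpia_reference: str             # pointer to the DPIA covering this risk
    mitigations: list[str] = field(default_factory=list)
    owner: str = ""                 # provider- or deployer-side responsibility
    last_reviewed: date = field(default_factory=date.today)

entry = RiskRegisterEntry(
    risk_id="R-003",
    description="Interaction logs retain personal data",
    lifecycle_phase="operation and monitoring",
    severity="significant",
    probability="significant",
    dpia_reference="DPIA-2025-01",
    mitigations=["log redaction", "30-day retention"],
    owner="deployer",
)
```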
The Report also applies this methodology to three practical cases, namely: (i) a chatbot for consumer issues; (ii) an LLM system for monitoring student progress; and (iii) an AI assistant for booking trips. Section 10 of the Report also compiles a number of useful tools and benchmarks for providers and users of LLM systems.
V. Conclusions and Recommendations
The Report represents a significant step forward in consolidating guiding principles for the development and use of LLMs in the light of data protection rules. This topic was also addressed by the EDPB in Opinion 28/2024 of December 18, 2024, analysed here (only available in Portuguese).
Accordingly, in order to ensure compliance with the Recommendations, it is essential to:
- Carry out risk assessments and LLM-specific DPIAs;
- Define internal policies and risk mitigation procedures;
- Adopt internal AI compliance programs.
The Intellectual Property, Technology and Personal Data team continues to monitor developments in this area.
_______________________
1 Isabel Barberá | LinkedIn
2 See the full description on page 27 of the Report.