Surveys have found that patients' opinions of how their data should be protected fall along a continuum, and although most can be classified as cautious regarding the use of their electronic health record (EHR) data for research, the majority of patients are not averse to the idea.1 The most prevalent reason for keeping EHR data private is the perceived risk to patients' personal lives: stigmatizing health conditions appearing in the EHR can threaten social relationships and status.2 It should not be taken for granted that EHR data may be used for anything other than caring for the specific patient from whom they were collected, but when patients' legitimate concerns are dealt with in a sensitive manner, it is possible to work with EHR data in ethical ways while promoting clinical research. We have identified three areas of importance in maintaining patient privacy: de-identification of the data, the patients' trust in the researcher and in the research, and the technical data security of the computer system.

Is there a form of de-identification appropriate for all comers? Algorithms have been developed for structured data to prevent the disclosure of sensitive information even when the data are distributed at a detailed, non-aggregated, line-item patient level, using a method known as 'k-level anonymity',3 4 where k represents the minimum number of records in the set that must be indistinguishable from one another for the data to pass scrutiny. When a patient record exceeds this level of uniqueness, that is, when fewer than k records share its identifying values, data values are removed until the record is no longer unique. Although such methods superficially seem an adequate solution to the de-identification problem, they have been shown to be subject to 'reverse engineering', an undoing of the obfuscation.5 Furthermore, they often remove critical attributes from the data.6 Another popular form of de-identification used for medical records is the 'scrubbing' of textual medical reports.7–9 Computer programs search the text and attempt to remove patient names, dates, locations and other potentially identifying information. These programs perform with varying levels of accuracy and involve trade-offs similar to those described above for structured data: to ensure that the data are de-identified and 'unmatchable' to the original record, sentence structure and other important attributes of the data must often be removed.10
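To make the structured-data trade-off concrete, the sketch below enforces k-anonymity by simple suppression. The quasi-identifier fields, the toy cohort and the strategy of blanking or dropping values are illustrative assumptions only; the published algorithms instead generalize values (for example, truncating a ZIP code) so that more of the data survive.

```python
from collections import Counter

# Quasi-identifiers: fields that, in combination, could re-identify a patient.
# The field names and the toy cohort below are illustrative only.
QUASI_IDENTIFIERS = ["zip", "birth_year", "sex"]

def equivalence_classes(records, fields):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[f] for f in fields) for r in records)

def suppress_to_k_anonymity(records, k, fields=QUASI_IDENTIFIERS):
    """Crude k-anonymity by suppression: blank quasi-identifier values, one field
    at a time, for any record whose combination of values is shared by fewer than
    k records; drop records that remain unique even so.  Published algorithms
    generalize values instead (e.g. a 5-digit ZIP to a 3-digit ZIP) to retain
    more information, but the trade-off is the same."""
    records = [dict(r) for r in records]              # work on a copy
    for field in fields:                              # suppress progressively more fields
        counts = equivalence_classes(records, fields)
        if all(c >= k for c in counts.values()):
            break                                     # every record already hides in a crowd of k
        for r in records:
            if counts[tuple(r[f] for f in fields)] < k:
                r[field] = "*"                        # information loss: the price of anonymity
    counts = equivalence_classes(records, fields)
    return [r for r in records if counts[tuple(r[f] for f in fields)] >= k]

cohort = [
    {"zip": "53449", "birth_year": 1947, "sex": "F", "dx": "diabetes"},
    {"zip": "53449", "birth_year": 1947, "sex": "F", "dx": "asthma"},
    {"zip": "53401", "birth_year": 1982, "sex": "M", "dx": "hypertension"},
]
# With k=2 the third record stays unique even after its quasi-identifiers are
# blanked, so it is dropped: anonymity is bought with lost data.
print(suppress_to_k_anonymity(cohort, k=2))
```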
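In the same spirit, the following is a much-simplified sketch of report scrubbing, assuming a handful of invented patterns and a caller-supplied name list; the systems cited above rely on dictionaries, statistical models and institution-specific knowledge, and achieve correspondingly better accuracy.

```python
import re

# Pattern-based scrubber.  Real scrubbers use dictionaries, statistical models
# and institution-specific knowledge; these few regular expressions are only a
# sketch of the idea.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),                   # e.g. 3/14/2009
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                          # e.g. 123-45-6789
    (re.compile(r"\b(MRN|medical record number)[:\s]*\d+\b", re.I), "[MRN]"),
    (re.compile(r"\b(Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b"), "[NAME]"),            # titled names only
]

def scrub(text, known_names=()):
    """Replace likely identifiers with placeholder tags.  Anything the patterns
    miss (unusual date formats, untitled names, local place names) leaks
    through; this is the accuracy/information trade-off noted above."""
    for name in known_names:                  # e.g. names from the registration system
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text

note = "Jane Roe (MRN 004521) seen by Dr. Smith on 3/14/2009 for follow-up."
print(scrub(note, known_names=["Jane Roe"]))
# -> [NAME] ([MRN]) seen by [NAME] on [DATE] for follow-up.
```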
The failure of technology alone to offer a foolproof de-identification solution is not surprising. People are extremely resourceful at solving challenging puzzles, such as the re-identification of de-identified data. However, the true risk may be greatly overemphasized by these demonstrations,11 and the result is two not entirely satisfactory approaches to de-identification: one that produces de-identified output stripped of meaningful data, and another that retains germane information using methods that can be breached if they fall into the wrong hands. In attempts to resolve this paradox, illogical decisions can be made about patient privacy solutions. For example, at Marshfield Clinic there has been an enormous investment in a bank of over 20 000 consented patients who are genotyped using donated blood and tissue. These genotypes are combined with de-identified phenotypic data from the Marshfield Clinic electronic medical record.12 A well-intentioned policy was put into place to keep the people who view identified phenotypic data from having access to the associated de-identified genomic data, the reasoning being that a person who could see both datasets might find a way to tie them together. Because all physicians at Marshfield Clinic must have access to the EHR, the outcome is that many Marshfield physicians who are investigators cannot look at the data from their own studies.

One approach to resolving such privacy management discordance is to match the level of data de-identification to the trustworthiness of the data recipients: the more identified the data, the more 'trustworthy' the recipients are required to be, and vice versa. This solution requires that trustworthiness be quantified and governed by established, socially acceptable processes, such as the criminal history checks, letters of reference and credentialing systems that have long been used in many areas of society to perform objective trust assessments. Specific methods used at Partners Healthcare and Harvard University will be described later in the paper. The level of trust in a data recipient thus becomes a critical factor in determining what data that person may see.

We must also consider the technical protection of the patient data itself. The Health Information Technology for Economic and Clinical Health (HITECH) Act requires covered entities to conduct a risk analysis and to implement the physical, administrative and technical safeguards that each covered entity determines are reasonable and appropriate.13 Technical safeguards to consider include user access and authentication controls, assignment of privileges, maintenance of file and system integrity, back-ups, monitoring processes, log-keeping, auditing and physical security of the data.

A range of possible solutions exists for managing the technical protection of the data, each representing a different balance among risk, cost and flexibility. The solution at the University of California at San Francisco (UCSF) was to create an exclusive, protected area for data and analysis inside a specially firewalled zone for the research community. The incentive to use the protected area is that legal coverage is provided should a data breach occur within it. This solution guarantees that the technical safeguards implemented by the institution within the protected area, such as firewalls, network intrusion detection, virtual private networks and disk encryption, are followed by the researchers. However, it requires a high resource commitment from the institution to maintain the protected area, and the use of specialized software on privately funded platforms is not supported.
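Several of the safeguards listed above, notably access and authentication controls, assignment of privileges, log-keeping and auditing, ultimately reduce to a small amount of enforcement code on whatever platform holds the data. The sketch below is a hypothetical illustration of that point, with an invented role-to-privilege table and log file name; it is not the access-control implementation of any institution discussed here.

```python
import logging
from datetime import datetime, timezone

# Hypothetical privilege table: which form of the data each role may request.
# The role names and data levels are invented for this illustration.
ROLE_PRIVILEGES = {
    "analyst":      {"aggregate"},
    "investigator": {"aggregate", "de-identified"},
    "care_team":    {"aggregate", "de-identified", "identified"},
}

# Append-only audit log: who asked for what, when, and whether it was allowed.
logging.basicConfig(filename="phi_access_audit.log", level=logging.INFO)
audit = logging.getLogger("phi_audit")

def request_data(user, role, data_level):
    """Grant or deny a data request and record the decision for later auditing."""
    allowed = data_level in ROLE_PRIVILEGES.get(role, set())
    audit.info("%s user=%s role=%s level=%s allowed=%s",
               datetime.now(timezone.utc).isoformat(), user, role, data_level, allowed)
    if not allowed:
        raise PermissionError(f"role '{role}' may not access {data_level} data")
    return f"(query against the {data_level} repository would run here)"

print(request_data("jdoe", "investigator", "de-identified"))  # allowed, and logged
```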
With more responsibility and trust given to the researchers, institutions such as Partners Healthcare have policies similar to those of UCSF; however, researchers are free to use most areas behind the institutional firewalls. Researchers must demonstrate their knowledge of the security policies by taking a certified course on human subject research protection and by specifying the technical protections of the patient data in their institutional review board (IRB) applications. The researchers have more freedom to use their local computational platforms and software, but the institution loses the 'guarantee' of a flawless implementation of its technical security policies that exists in the UCSF solution. The more liberal approach at institutions such as Partners Healthcare therefore demands greater attention to data de-identification or encryption, and a better determination of the trustworthiness of its data recipients and of their ability to set up a technically safe environment.

Our objective was to create the i2b2 software platform so that it complies with real-world use cases for how patient privacy solutions are implemented and, given that no solution can be perfect, represents a balance among the data de-identification technology, the safety of the technical platform, and the various levels of trust required of the researchers. The use cases were simplified to five patient privacy levels, each with clear requirements for these three components, not because the situation is simple, but because of the complexity of keeping the platform consistent across the data protection levels. Of course, as i2b2 is open source, it can be adapted to satisfy the patient privacy requirements of a local site; however, careful attention must be paid to maintaining a consistent data protection formulation throughout the platform.
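Because the five i2b2 privacy levels are not enumerated in this section, the sketch below is only a hypothetical illustration of the kind of consistent three-component formulation described above: each level ties a form of the data to the trust required of its recipient and to the technical environment in which the data may reside. The level names and requirements are invented for illustration and are not the i2b2 definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionLevel:
    """One row of a hypothetical tiered-access policy (not the i2b2 definitions)."""
    data_form: str        # how identified the released data are
    trust_required: str   # credential the recipient must hold
    environment: str      # technical platform on which the data may reside

# Invented example tiers: the more identified the data, the more trustworthy the
# recipient and the more controlled the technical environment must be.
POLICY = {
    1: ProtectionLevel("aggregate counts only", "registered user",
                       "any institutional workstation"),
    2: ProtectionLevel("de-identified, limited fields", "human-subjects training",
                       "institutional network"),
    3: ProtectionLevel("de-identified, full record", "IRB protocol on file",
                       "encrypted, access-logged server"),
    4: ProtectionLevel("limited data set (dates, ZIP)", "IRB approval and data use agreement",
                       "firewalled enclave"),
    5: ProtectionLevel("fully identified record", "IRB approval and clinical credentialing",
                       "firewalled enclave, audited"),
}

def may_release(level, recipient_credentials, environment):
    """Release data at `level` only if both the trust and the technical requirements are met."""
    p = POLICY[level]
    return p.trust_required in recipient_credentials and environment == p.environment

print(may_release(3, {"human-subjects training", "IRB protocol on file"},
                  "encrypted, access-logged server"))           # True
print(may_release(5, {"IRB protocol on file"}, "personal laptop"))  # False
```

The point of such a table is that a release decision can be evaluated the same way at every level, which is what keeping the platform consistent across the data protection levels requires.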