MuleSoft Accelerator for Healthcare

Use case 9 - Patient de-identification solution

Remove patient identifying information to power research and AI initiatives


In today's healthcare landscape, digital transformation has significantly increased the amount of patient data available for clinical research and AI applications. These large datasets drive new discoveries that can improve patients' lives, and they power AI solutions that streamline clinicians' processes and supplement their medical knowledge. To uphold patient privacy and safety standards when working with such datasets, certain personal and identifying information about patients must be de-identified.

Use case description

This use case provides organizations with a method to connect to their EHR systems, extract patient data, and remove targeted identifying information so that the data can be used for subsequent analysis while patient privacy is maintained. The solution lets users configure a FHIR-based de-identification application and consists of two primary components.

First, the solution provides a set of implementation templates and API specifications, including pre-built mappings, that invoke a FHIR server, de-identify the received dataset, and produce de-identified FHIR output to be sent to a data lake of the organization's choosing. The solution is designed to support the Patient, Condition, and Observation FHIR resources, but it can be extended to include others.

The second component is a web-based configuration tool that lets users make selections and apply specific business rules, which often differ between de-identification implementations. For example, some pediatric research settings require the full birth date in the output because the exact age in days can be relevant to the research, while in studies of rarer diseases, including the full birth date can increase the likelihood of re-identification. Together, the reference implementation and the configuration tool give customers a foundation on which to start their de-identification efforts and the flexibility to apply the logic required by each effort they support.
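To illustrate how a configurable business rule can change the output, the following Python sketch applies a birth date policy to a FHIR Patient resource represented as a dictionary. The `apply_birth_date_policy` helper and its policy values (`keep`, `year_only`, `remove`) are hypothetical and chosen for illustration; they are not the configuration tool's actual options.

```python
from copy import deepcopy

# Hypothetical policy values; the real configuration tool may expose different options.
VALID_POLICIES = {"keep", "year_only", "remove"}


def apply_birth_date_policy(patient: dict, policy: str) -> dict:
    """Return a copy of a FHIR Patient dict with birthDate handled per the policy."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown birth date policy: {policy}")
    result = deepcopy(patient)  # never mutate the source record
    birth_date = result.get("birthDate")
    if birth_date is None or policy == "keep":
        # e.g. pediatric research that needs the exact date
        return result
    if policy == "year_only":
        # The FHIR date type allows partial precision (YYYY), so the year alone is valid
        result["birthDate"] = birth_date[:4]
    else:
        # policy == "remove": drop the element entirely, e.g. for rare-disease studies
        del result["birthDate"]
    return result


patient = {"resourceType": "Patient", "id": "example", "birthDate": "2015-06-23"}
print(apply_birth_date_policy(patient, "year_only")["birthDate"])  # 2015
```

Returning a copy keeps the original extract available in case several studies need differently de-identified outputs from the same data.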

Functional view

Patient De-identification functional view


De-identification: A process to remove information that can identify an individual.
PHI: Protected Health Information is any information in a patient's medical record that can identify an individual.
EHR: An Electronic Health Record is a digital version of a patient's paper chart. It contains the patient's medical history, diagnoses, medications, treatment plans, immunization dates, allergies, radiology images, and laboratory test results.
FHIR: Fast Healthcare Interoperability Resources is an HL7 standard for healthcare interoperability. It describes data formats and elements, with JSON and XML representations, and an application programming interface for exchanging electronic health records.
Amazon HealthLake: A HIPAA-eligible service that offers healthcare and life sciences companies a complete view of individual or patient population health data for query and analytics at scale.
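As a rough illustration of what removing PHI can look like for a FHIR Patient resource, the following Python sketch drops a set of direct-identifier elements and generalizes each address to its state. The element list and the `strip_direct_identifiers` helper are assumptions for illustration, not the accelerator's predefined rule set.

```python
from copy import deepcopy

# Hypothetical set of Patient elements treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = ("identifier", "name", "telecom", "photo", "contact")


def strip_direct_identifiers(patient: dict) -> dict:
    """Return a copy of a FHIR Patient dict with direct identifiers removed."""
    result = deepcopy(patient)
    for element in DIRECT_IDENTIFIERS:
        result.pop(element, None)
    # Generalize each address to its state only; street, city and postal code are dropped.
    states = [{"state": a["state"]} for a in result.pop("address", []) if "state" in a]
    if states:
        result["address"] = states
    # Resource ids and references are left unchanged in this sketch.
    return result
```

For example, a Patient with a name, phone number and full street address would come back with only its `id`, `gender`, and `address[].state`.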

High-level architecture

Patient De-identification high level architecture


End-to-end scenarios

  • Patient information is extracted from EHR systems at a configured interval. Default and user-configured rules are applied to de-identify the data, and the result is made available in Amazon HealthLake.

Processing logic

The Patient De-Identification Process API orchestrates the patient data de-identification process as follows:

  1. A scheduled process is triggered to initiate the flow.
  2. Retrieve the list of data export configurations from a data store and determine which of them are ready for export.
  3. For each data export configuration, initiate the export job by invoking the Generic FHIR Client to retrieve the patient clinical information from the backend EHR system. Upon successful initiation, the job details are stored in the data store and a message is published to an Anypoint MQ queue for asynchronous processing.
  4. Consume the message from the Anypoint MQ queue and invoke the Generic FHIR Client to confirm that the export job has completed successfully.
  5. Once the export job has completed successfully, invoke the Generic FHIR Client to retrieve the data files for each dataset, such as Patients, Conditions, and Observations.
  6. De-identify and transform the data, per the defined rules, into a FHIR Bundle.
  7. Invoke Amazon HealthLake System API to ingest the FHIR Bundle data into Amazon HealthLake data store.
  8. Invoke the Generic FHIR Client to delete the generated data files from the EHR system, per EHR vendor guidance.
  9. Update the transaction record in the data store and acknowledge the message on Anypoint MQ Queue.
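Steps 6 and 7 can be sketched as follows: de-identified resources are wrapped into FHIR Bundles of type `batch`, and each entry carries a `request` that tells the server how to store it. This is a minimal Python sketch, not the accelerator's mapping; the `to_batch_bundles` helper and its `max_entries` chunk size are hypothetical, and using `PUT` with the resource id (rather than `POST`) is an assumption made so that re-running an export overwrites earlier copies instead of duplicating them.

```python
from typing import Iterable, Iterator


def to_batch_bundles(resources: Iterable[dict], max_entries: int) -> Iterator[dict]:
    """Yield FHIR batch Bundles holding at most max_entries entries each."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    entries = []
    for resource in resources:
        # PUT to [type]/[id] makes the write idempotent if an export is re-run.
        entries.append({
            "resource": resource,
            "request": {"method": "PUT", "url": f"{resource['resourceType']}/{resource['id']}"},
        })
        if len(entries) == max_entries:
            yield {"resourceType": "Bundle", "type": "batch", "entry": entries}
            entries = []
    if entries:  # flush the final, partially filled Bundle
        yield {"resourceType": "Bundle", "type": "batch", "entry": entries}


resources = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Condition", "id": "c1"},
    {"resourceType": "Observation", "id": "o1"},
]
bundles = list(to_batch_bundles(resources, max_entries=2))
print([len(b["entry"]) for b in bundles])  # [2, 1]
```

Chunking keeps each ingest call within whatever per-Bundle limit the target store enforces, and because a `batch` Bundle processes entries independently, the response for each chunk still needs to be checked entry by entry.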

Activity diagram

The activity diagram illustrates the sequence of processing that exports patient data from the organization's EHR, de-identifies it, and ingests it into Amazon HealthLake.

Patient De-identification Activity diagram

Success conditions

The following conditions are met upon successful completion:

  • A de-identified data set for each data export configuration is available in Amazon HealthLake
  • The data export status for each configuration is updated in the data store

Assumptions and constraints

  • The data export implementation is based on the HL7 FHIR Bulk Data Access Implementation Guide (IG)
  • Data export is supported for a Group of Patients
  • Only one data export configuration is allowed per FHIR Group ID (Group of Patients)
  • EHR FHIR servers allow multiple export jobs to be initiated
  • The application user reviews the predefined de-identification rules in the IDE and adjusts the code as needed
  • The de-identification process applies both the user-selected rules and the predefined, generally applicable rules to ensure the datasets are properly de-identified
  • The scope does not include authentication based on user accounts within the configurator tool
  • Organizations are responsible for securing the data store used by the de-identification solution
  • Transient failures are marked as Failed in the data store, and the next scheduled run exports the data starting from the last successful run
  • Amazon HealthLake supports only the batch FHIR Bundle type
  • Amazon HealthLake can ingest up to 160 individual resource types in a single Bundle operation
  • Amazon HealthLake does not consider resource instance failures for individual messages within the FHIR Bundle
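In the Bulk Data Access IG, a group-level export starts with a kick-off request to `[base]/Group/[id]/$export` carrying the `Prefer: respond-async` header, and the server answers `202 Accepted` with a status URL in the `Content-Location` header. The Python sketch below only builds that kick-off URL and its headers. It uses the IG's `_type` parameter to limit output to the three supported resource types and `_since` to export only data changed since the last successful run, which is one possible way to implement the retry behavior described in the assumptions above. The base URL, group ID, and `build_group_export_request` helper are hypothetical, and whether a given EHR honors `_since` is vendor-specific.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode

# Resource types supported by this use case; limits the export via _type.
EXPORT_TYPES = "Patient,Condition,Observation"


def build_group_export_request(
    base_url: str, group_id: str, last_success: Optional[datetime]
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a Bulk Data group-level $export kick-off request."""
    params = {"_type": EXPORT_TYPES}
    if last_success is not None:
        if last_success.tzinfo is None:
            raise ValueError("last_success must be timezone-aware")
        # _since takes a FHIR instant; only resources changed after it are exported.
        since = last_success.astimezone(timezone.utc).isoformat(timespec="seconds")
        params["_since"] = since.replace("+00:00", "Z")
    query = urlencode(params, safe=",:")  # keep commas and colons readable
    url = f"{base_url.rstrip('/')}/Group/{quote(group_id, safe='')}/$export?{query}"
    # The kick-off must request asynchronous processing via the Prefer header.
    headers = {"Accept": "application/fhir+json", "Prefer": "respond-async"}
    return url, headers


url, headers = build_group_export_request(
    "https://ehr.example.com/fhir", "example-group", datetime(2024, 5, 1, tzinfo=timezone.utc)
)
print(url)
```

A client would then poll the status URL until it returns `200 OK` with a manifest listing the output files, which corresponds to steps 4 and 5 of the processing logic.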

Before you begin

The Getting Started with MuleSoft Accelerators guide provides general information on getting started with the accelerator components. This includes instructions on setting up your local workstation for configuring and deploying the applications.

Downloadable assets

Process APIs

System APIs


Here are some links to related and supporting documentation.


Published by
MuleSoft Solutions
Published on May 14, 2024
Asset versions for 2.23.x
