64 Generating synthetic electronic patient records
  1. Christopher Tan,
  2. Huawei Jian,
  3. Ben Margetts
  1. Great Ormond Street Hospital


Introduction Following the publication of the GOSH digital strategy, electronic patient record (EPR) data has been recognised as an ever-growing cornerstone of the hospital infrastructure. Because EPR data remains the property of the patient and the NHS, access to it is rightly controlled by stringent governance processes, which introduce a hurdle for researchers who need to use the data. In short, these restrictions can make it challenging to run short-term clinical research projects, to publish reproducible results without compromising patient anonymity, and for researchers to train on complex EPR data before applying to access it.

Methods The GOSH DRE team have worked with collaborators at UCL and NHS Digital to develop generative statistical and deep learning (AI) models that learn the structure and statistical properties of EPR data. These models can generate synthetic EPR data without reproducing individual patient records. To facilitate this work, de-identified EPR data from all patients treated at GOSH between January 2016 and January 2019 were extracted from the DRE data lake and modelled using the PyTorch, TensorFlow, and scikit-learn Python libraries.
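As an illustration only, the idea of a generative statistical model can be sketched with a first-order Markov chain over ward movements. This is not the authors' model: the abstract does not describe the architectures used, and the ward names, the `fit_markov_chain` and `sample_sequence` helpers, and the toy data below are all invented for the example. It uses only the Python standard library rather than the libraries named above.

```python
import random
from collections import Counter, defaultdict

# Sentinel states marking the beginning and end of a hospital stay.
START, END = "<start>", "<end>"

def fit_markov_chain(sequences):
    """Count first-order transitions between consecutive states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return counts

def sample_sequence(counts, rng, max_len=50):
    """Sample one synthetic sequence by walking the fitted chain."""
    state, out = START, []
    while len(out) < max_len:
        options = counts[state]
        # Pick the next state in proportion to how often it was observed.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        out.append(state)
    return out

# Toy, invented ward pathways (hypothetical; not real GOSH data).
toy_stays = [
    ["ED", "Ward A", "Discharge lounge"],
    ["ED", "ICU", "Ward A", "Discharge lounge"],
    ["Ward B", "Discharge lounge"],
    ["ED", "Ward B", "Discharge lounge"],
]

model = fit_markov_chain(toy_stays)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic_stays = [sample_sequence(model, rng) for _ in range(5)]
```

Each synthetic stay is assembled from transitions seen across many stays rather than copied from one record. A model this simple can still reproduce a real pathway exactly by chance, so on its own it offers no privacy guarantee; that is one reason generated data needs separate disclosure checks.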

Results A range of privacy-preserving models were developed on admissions, ward movements, demographics, drug administrations, laboratory tests, vital signs, and microbiological isolate datasets. On inspection, each model was capable of generating realistic EPR data. Each model’s output was evaluated automatically by applying stringent statistical similarity and disclosure control metrics to the generated data.
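The abstract does not name the metrics used, so the following is a sketch under assumptions rather than the evaluation that was run. It pairs one possible statistical similarity measure (the two-sample Kolmogorov-Smirnov statistic) with one crude disclosure check (the share of synthetic rows that exactly match a real row). The function names and all values are invented for illustration.

```python
from bisect import bisect_right

def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap

def exact_match_rate(real_rows, synthetic_rows):
    """Share of synthetic rows identical to some real row (a crude disclosure check)."""
    real_set = set(real_rows)
    return sum(row in real_set for row in synthetic_rows) / len(synthetic_rows)

# Invented example values (hypothetical; not real patient data).
real_heart_rates = [88, 92, 101, 110, 95, 120]
synthetic_heart_rates = [90, 94, 99, 112, 97, 118]

# Invented rows of (age in years, sex, ward).
real_rows = [(4, "F", "Ward A"), (7, "M", "Ward B"), (2, "F", "ICU")]
synthetic_rows = [(5, "F", "Ward A"), (7, "M", "Ward B"), (3, "M", "ICU")]

ks = ks_statistic(real_heart_rates, synthetic_heart_rates)
match_rate = exact_match_rate(real_rows, synthetic_rows)
```

A KS statistic near 0 means the two distributions of a variable are close, while a non-zero match rate flags synthetic rows that coincide with real ones and may need review. In an automated pipeline, checks like these could run on every generated batch against agreed thresholds; any thresholds would be a local governance decision and are not given in the abstract.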

Discussion Integrating the models into a programmatic interface has enabled the production of realistic, consistent, high-quality EPR records that are not related to any single patient yet share the format of the clinical data made available through the GOSH Digital Research Environment (DRE). This will enable future data-driven developments in GOSH, DRIVE, and the DRE.
