Objective Electronic health records (EHRs) are routinely used to identify family violence, yet reliable evidence of their validity remains limited. We conducted a systematic review and meta-analysis to evaluate the positive predictive values (PPVs) of coded indicators in EHRs for identifying intimate partner violence (IPV) and child maltreatment (CM), including prenatal neglect.
Methods We searched 18 electronic databases between January 1980 and May 2020 for studies comparing any coded indicator of IPV or CM, including prenatal neglect defined as neonatal abstinence syndrome (NAS) or fetal alcohol syndrome (FAS), against an independent reference standard. We pooled PPVs for each indicator using random effects meta-analyses.
Results We included 88 studies (3 875 183 individuals) involving 15 indicators for identifying CM in the prenatal period and childhood (0–18 years) and five indicators for IPV among women of reproductive age (12–50 years). Based on the International Classification of Disease system, the pooled PPV was over 80% for NAS (13 studies) but lower for FAS (<40%; seven studies). For young children, primary diagnoses of CM, specific injury presentations (eg, rib fractures and retinal haemorrhages) and assaults showed a high PPV for CM (pooled PPVs: 55.9%–87.8%). Indicators of IPV in women had a high PPV, with primary diagnoses correctly identifying IPV in >85% of cases.
Conclusions Coded indicators in EHRs have a high likelihood of correctly classifying types of CM and IPV across the life course, providing a useful tool for assessment, support and monitoring of high-risk groups in health services and research.
- child abuse
- health services research
- drug withdrawal
- data collection
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known on this topic?
Electronic health records (EHRs) are readily available and are increasingly used to identify different forms of family violence in practice and public health.
Few studies of EHRs provide comprehensive estimates on the positive predictive values (PPVs) for coded indicators of family violence including child maltreatment, prenatal neglect (neonatal abstinence syndrome or fetal alcohol syndrome) and intimate partner violence.
What this study adds?
This comprehensive meta-analysis provides PPVs of coded indicators in EHRs for different forms of family violence based on external independent reference standards.
We show that routinely coded indicators of family violence have high predictive value for identifying at-risk groups who may benefit from targeted interventions.
Findings emphasise that improving the quality and use of available coded indicators for identifying groups affected by family violence across data systems should be a public health priority.
Intimate partner violence (IPV) and child maltreatment (CM) are forms of family violence that often go unnoticed by services,1–3 despite repeated recommendations by the WHO to improve monitoring efforts.4 5 CM and IPV refer to any act of commission or omission that causes biopsychosocial harm to a child, a future child or partner.6 7 Statutory definitions of CM in the UK include fetal alcohol syndrome (FAS) and neonatal abstinence syndrome (NAS) due to neglect or harm during pregnancy.8
Assessing health records for detailed information on family violence is time consuming and expensive. Instead, studies and services are increasingly using routinely coded electronic health records (EHRs) for assessing family violence.9 10 Coded EHRs allow for longitudinal population-based assessments, automated early warning systems and identification of high-risk populations for targeted interventions at relatively low costs.11–13 However, the potential utility of EHRs to support surveillance and clinical decisions is often undermined by reported quality issues, and coded conditions are rarely validated externally.14 15 Unable to check the data themselves for accuracy, large-scale studies (eg, Global Burden of Disease Study) and services rely on routinely coded indicators based on the International Classification of Diseases (ICD) with unknown predictive values.10 16 17 The validity of coded indicators is further compromised by varying case definitions, ad hoc classifications by coders5 15 18–20 and under-recording due to clinician fears of potential harm or lack of awareness.18 21
To our knowledge, no previous review has estimated the positive predictive values (PPVs) of coded indicators for different forms of family violence including CM, prenatal neglect (NAS or FAS) and IPV based on external independent reference standards (ie, not using other codes in the EHR to validate indicators). This meta-analysis provides a comprehensive summary of PPVs for multiple coded indicators in EHRs aimed at identifying family violence in general healthcare settings, compared with an independent reference standard.
We followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses for Diagnostic Test Accuracy Studies and the Meta-analysis of Observational Studies in Epidemiology guidelines.22 23 The review protocol was published in the PROSPERO registry (CRD 42019139300),24 with any protocol deviations provided in online supplementary table S1.
We searched 18 electronic databases and 20 selected journals for studies published between 1 January 1980 and 24 May 2020 (complete search strategy in online supplementary table S2). Reference lists of eligible full-text articles were hand-searched and 18 frequently cited authors of eligible studies were contacted for article recommendations.
Three independent reviewers (SS, RA and MS) screened abstracts, full texts and corresponding reference lists of articles using Covidence’s systematic review software.25 Any disagreement over study inclusions between reviewers was resolved by a family violence expert (RG). We included: (1) studies with data to calculate the PPV of a coded EHR indicator for a specific family violence outcome as reported by at least two other eligible studies,26 (2) studies from primary care, paediatric units (including trauma centres) or general hospital settings, (3) studies published in English/Swedish/German and (4) studies distinguishing family violence cases from non-cases (tables 1–2; online supplementary tables S3–S4).
Indicators and outcomes of family violence
An overview of key definitions of indicators and outcomes is provided in tables 1–2. Briefly, indicators were defined as any coded marker or risk factor for family violence, ranging from specific primary diagnostic codes (eg, ‘T74.1 physical abuse’) to injuries (eg, rib fractures and retinal haemorrhages), assaults and combinations of adversity-related codes (table 2). For IPV, we predominately included indicators for women (eg, >80% of study sample were women), as the health consequences for women are higher, and the prevalence among men is significantly lower.27 In terms of family violence, men’s parental status is also more difficult to ascertain than women’s. Outcomes were defined as mutually exclusive categories of family violence according to different life periods: NAS or FAS (representing the prenatal period),8 any form of CM (representing childhood) or IPV (representing women of reproductive age), respectively. Outcomes had to be ascertained and verified by an independent reference standard such as recorded multidisciplinary decisions of family violence extracted from chart reviews (table 1).
Data extraction and quality assessments
Using a piloted standardised extraction form, three reviewers independently extracted relevant study characteristics. If studies reported separate estimates for multiple codes, reference standards or age criteria, we extracted all estimates and prioritised those based on the criterion most frequently used by other studies to increase overall homogeneity. We requested additional information from 25 authors, 6 of whom responded within the 2-month deadline and were included. The risk of bias was assessed by the same three reviewers using a revised version of the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2; online supplementary table S4).28 We rated each study's reference standard according to assessment quality and exclusion criteria based on previous reviews (table 1).29
The PPVs for each family violence outcome were calculated as the proportion of identified cases by an indicator validated as true cases by an independent reference standard. For studies that reported only sensitivity and specificity, we obtained PPVs using Bayes’ theorem.30 PPVs were pooled using random effects intercept logistic regression models with the logit transformation when at least three studies were available for the same outcome.31 The model accounts for potentially misleading back transformations of PPVs when pooling studies with highly variable sample sizes.31 Where applicable, we also examined documentation quality and coding errors by pooling the proportions of coded medical charts with missing key information (full procedures in online supplementary table S5).
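The two ways of obtaining a PPV described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the Bayes' theorem version assumes that a prevalence estimate is available alongside sensitivity and specificity; neither function is taken from the review's own analysis code.

```python
def ppv_from_counts(true_positives, false_positives):
    """PPV as the share of indicator-flagged cases confirmed by the reference standard."""
    flagged = true_positives + false_positives
    if flagged <= 0:
        raise ValueError("at least one flagged case is required")
    return true_positives / flagged


def ppv_from_bayes(sensitivity, specificity, prevalence):
    """PPV from sensitivity, specificity and prevalence via Bayes' theorem.

    The prevalence argument is an assumption of this sketch: sensitivity and
    specificity alone do not determine the PPV.
    """
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    denominator = true_pos_rate + false_pos_rate
    if denominator == 0:
        raise ValueError("PPV is undefined when no case would be flagged")
    return true_pos_rate / denominator
```

With the same sensitivity and specificity, a lower prevalence gives a lower PPV, which is why PPVs from settings with different underlying prevalence are not directly comparable.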
We measured the extent to which the PPVs varied between studies (ie, between-study heterogeneity).32 As the PPV is a measure of proportions, we measured the between-study heterogeneity using the I2 statistic (>75% indicates substantial heterogeneity),32 standard χ2 tests, prediction intervals (where the PPV is expected for 95% of similar future studies)32 and subgroup analyses when at least four studies were available for each subgroup.24 We used random effects meta-regression to assess the impact of publication year on the between-study heterogeneity. The influence of individual studies was explored by serially omitting different studies from the overall estimates.
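As a rough illustration of pooling on the logit scale and of the I2 statistic, the sketch below uses the simpler inverse-variance DerSimonian-Laird estimator rather than the random intercept logistic regression model used in this review, so its results would not reproduce the reported estimates. The function name, the continuity correction and the normal-approximation confidence interval are assumptions of the sketch.

```python
import math


def pool_logit_dl(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird random effects).

    events[i] is the number of confirmed cases among totals[i] coded cases.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching lists with at least two studies")
    logits, variances = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            # Continuity correction, applied only when a cell is zero.
            e, n = e + 0.5, n + 1.0
        logits.append(math.log(e / (n - e)))
        variances.append(1.0 / e + 1.0 / (n - e))

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    df = len(logits) - 1

    # Between-study variance (tau^2) and I^2 as a percentage.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects pooled logit, back-transformed to a proportion.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return {
        "ppv": expit(mu),
        "ci_low": expit(mu - 1.96 * se),
        "ci_high": expit(mu + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

Back-transforming the pooled logit keeps the pooled PPV and its interval within 0% to 100%, which a pooled raw proportion with a symmetric interval would not guarantee.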
Publication bias occurs when studies with favourable results (ie, high PPV) are more likely to be published than unfavourable results.33 We explored publication bias by plotting the PPV against the SEs of individual estimates using funnel plots for indicators with at least 10 studies for the same outcome. To test for statistically significant funnel plot asymmetry (ie, publication bias), we used Egger’s test and Begg and Mazumdar’s rank correlation test.33 We used R (V.3.6.1) and the ‘meta’ package with the ‘metaprop’ command to perform the analyses.34 35
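The idea behind Egger's test can be sketched as an ordinary least squares regression of each standardised estimate (estimate divided by its SE) on its precision (1 divided by the SE); an intercept far from zero suggests funnel plot asymmetry. The Python sketch below is illustrative only: the function name is hypothetical, it returns the intercept and its SE without a p value, and it is not the implementation in the R 'meta' package.

```python
import math


def egger_regression(estimates, standard_errors):
    """Regress standardised estimates on precision; return slope and intercept.

    estimates would typically be logit-transformed PPVs, one per study.
    """
    if len(estimates) != len(standard_errors) or len(estimates) < 3:
        raise ValueError("need matching lists with at least three studies")
    x = [1.0 / se for se in standard_errors]                   # precision
    z = [y / se for y, se in zip(estimates, standard_errors)]  # standardised estimate
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar

    # Residual variance and the standard error of the intercept.
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = residual_ss / (n - 2)
    intercept_se = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return {"intercept": intercept, "intercept_se": intercept_se, "slope": slope}
```

In a full test, the intercept divided by its SE would be compared with a t distribution on n−2 degrees of freedom to obtain the p value.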
Details of the included study characteristics are provided in online supplementary tables S6–S8 and tables S12–15. In total, 65 cross-sectional and 23 longitudinal studies (81 unique publications), involving 20 indicators and 3 875 183 individuals from 11 different countries met the inclusion criteria (figure 1). Overall, 13 studies provided indicators for NAS,36–45 7 for FAS,46–52 50 for CM (0–18 years)14 53–98 and 18 for IPV among women (12–50 years).99–116 Most studies were from the USA (72 studies, 81%), with a minority from Australia (8 studies, 9%) and Europe (4 studies, 4%). The majority of indicators comprised different ICD-9 coding clusters (64 studies) or modified versions of ICD-10 (14 studies). Chart reviews with predefined family violence criteria were the most frequent type of reference standard for all outcomes (75 studies, median sample size: 301 participants). The smallest study included 38 children,94 involving all ICD-9 coded subdural haematomas (SDH) at one hospital over 10 years. The study was retained as it involved all potential cases presenting to a generic children’s hospital and as SDHs are deemed to be extremely rare in young children and an important indicator for CM.117 We included one unpublished dissertation from 2019,59 specifically assessing ICD codes for CM.
Study quality assessment
Individual QUADAS-2 scores for studies are provided in online supplementary table S7. Overall, 31 studies (35%) were rated as high risk of bias and lower quality in most domains, and 57 studies (65%) were rated as low risk of bias and higher quality in most domains. The majority of studies were rated lower as they did not mask the outcome when reviewing coded EHRs (95%) or used a lower rated reference standard (42%; a rating <4 for CM; rating <3 for NAS, FAS or IPV).
Pooled estimates of PPVs
Table 3 depicts details for each pooled PPV and between-study heterogeneity by indicator and outcome. Individual forest plots of PPVs and study-specific ICD codes are provided in online supplementary figures S1-S19 and tables S6-S8.
Neonatal abstinence syndrome
The pooled PPV of primary ICD diagnoses for NAS was 80.9% (95% CI 71.0% to 87.9%), and the PPV ranged from 31.8% to 98.2%, with substantial between-study heterogeneity. Most studies (54%) used the Finnegan scale (a validated 21-item scale for documented symptom severity) to determine the accuracy of the coded diagnosis,118 with a cut-off score of 8 as a reference standard (except for one study, cut-off score of 4).41 The remaining studies used recorded clinician diagnosis or required pharmacological NAS treatment as reference standards.
Fetal alcohol syndrome
For FAS, we found a low pooled PPV of 39.3% (95% CI 25.3% to 55.4%), and the PPV ranged from 14.0% to 66.9%, with large between-study heterogeneity. All studies used ICD-9 codes to identify cases (eg, alcohol affecting fetus) and required 3–5 prespecified criteria to be met in chart reviews, including facial anomalies, prenatal growth deficiencies and maternal alcohol exposure (online supplementary table S6).
The pooled PPV of primary diagnoses for CM was 87.8% (95% CI 83.4% to 91.2%) among children aged 0–18 years across 19 studies, with significant between-study heterogeneity. Individual PPVs from studies ranged from 65.0% to 100.0%.
Overall, 37 studies assessed 11 different indicators of CM (online supplementary figures S3-S14). The pooled PPVs ranged from 88.3% (95% CI 55.2% to 97.9%) for rib fractures to 19.6% (95% CI 8.9% to 37.9%) for multiple burn injuries in children under 5 years. The between-study heterogeneity ranged from small to large (I2 range 0.0%–98.5%). Four studies also assessed the PPV of poisonings (range 7.0%–95.0%),78 88 96 98 but the extreme heterogeneity precluded reliable pooling (prediction interval: 0%–100%). The majority of studies assessing injury indicators of CM actively excluded transport injuries (28 studies, 65%), metabolic bone diseases (10 studies) and birth injuries (9 studies) from the CM group.
Subgroup analyses for NAS, FAS and CM
Details of each subgroup analysis are provided in online supplementary tables S9–S11. For primary diagnoses of CM, we found significantly higher PPVs of CM in studies from inpatient settings (PPV=90.6%) compared with studies from emergency departments (EDs; PPV=80.8%) and in studies that applied a lower rated reference standard (89.4%; a rating <4) relative to a higher rated reference standard (70.8%; a rating ≥4). We found no significant differences when comparing subgroups by coding systems (ICD-9 vs ICD-10), age (younger vs older children) or publication year (p>0.179).
Intimate partner violence
Three studies assessed primary ICD diagnoses of IPV among women presenting to EDs (main age range: 12–55 years), with a pooled PPV of 86.1% (95% CI 72.2% to 93.6%). Individual PPVs from studies ranged from 73.6% to 94.4%. Two studies used the Flitcraft criteria as a reference standard (ie, prespecified IPV criteria) and included violence by both ex and current partners as cases. One study from Hong Kong included only IPV within cohabiting couples as cases and used medical chart reviews to verify that diagnoses matched any documented disclosure of IPV.115
We found 16 studies (10 in EDs/6 inpatient) that assessed four different injury-related presentations of IPV among predominantly women (primary age range: 12–55 years). The pooled PPVs were low, ranging from 31.6% (22.3%–42.7%) for assault-related codes to 3.3% (2.2%–5.0%) for orbital floor fractures. All studies used ICD-9 codes to identify presentations, except for one Finnish study that used ICD-10 codes.108 The between-study heterogeneity ranged from low to substantial across indicators (I2 range=0.0%–99.4%). For assaults and facial fractures, meta-regressions showed that more recent studies were associated with a lower PPV of IPV (p<0.001). Three studies involved a small proportion of men (4%–17%),108 113 114 as estimates could not be separated from women. No study focused on pregnant or elderly women.
Coding errors for NAS, CM and IPV
The proportion of misclassifications (false positives) due to coding errors was on average 2.1% (95% CI 0.8% to 5.6%) across nine studies on NAS, CM, IPV and assaults (I2 range=97.3%–99.7%; online supplementary figure S20).
Documentation quality of assault indicators for IPV
We found that on average 28.0% (95% CI 14.7% to 46.8%; 18074/31224 charts; six studies) of assault coded cases among women had no recorded perpetrator information in the underlying medical charts (ie, missing data), preventing any classification/coding of IPV (online supplementary figure S21; table S5).
Serially excluding individual studies revealed that no study significantly impacted the between-study heterogeneity, and it remained substantial across all pooled PPVs (online supplementary tables S9-S10).
The funnel plot of studies reporting on the PPV for primary CM diagnoses was asymmetric (Egger’s test p=0.001, slope=0.999, rank correlation test: p=0.248; online supplementary figure S23), meaning that studies with higher PPVs were potentially more likely to be published. We found no evidence of funnel plot asymmetry for any other indicator with at least 10 studies (Egger’s test: p>0.366, rank correlation test: p>0.190; online supplementary figures S22-25).
This is the largest meta-analysis to investigate the predictive value of indicators for family violence in EHRs, involving over 3.8 million individuals across 11 countries. Despite the large between-study heterogeneity, the results highlight that EHRs provide consistently high PPV for CM or IPV. We found that more than 8 in 10 coded primary diagnoses were confirmed as cases of NAS, CM and IPV. The findings also indicate that 8 in 10 recorded rib fractures and retinal haemorrhages met criteria for CM, and 1 in 3 assault-related presentations among women met criteria for IPV. Given the consistent recommendations for improved surveillance of violence,119 our findings underscore the utility of using commonly available coded medical data to evaluate services for at-risk groups across the life course. However, estimates varied depending on indicator and outcome, with substantial heterogeneity.
Compared with the primary diagnoses of NAS, CM and IPV, FAS showed a markedly lower PPV. This most likely reflects the poor availability of specific ICD-9 codes for FAS, along with the complexity of the diagnosis. All included studies used the ICD-9 code ‘760.71’ focusing on fetuses being affected by alcohol without further description. Yet, the applied reference standard across studies required additional FAS criteria to be met, including facial anomalies and growth deficiencies.46–52 There is also an absence of FAS criteria that are widely recognised by clinicians,120 which may explain a higher underlying rate of misclassifications. Further studies on the accuracy of specific FAS codes using the ICD-10 are needed and may yield higher validity.
The high PPV of coded high-risk injuries of CM aligns with findings from previous reviews based on clinician diagnoses across paediatric healthcare settings. Compared with Kemp et al’s meta-analyses of 32 studies on fractures,29 our results showed higher PPVs for rib fractures (88.3% vs 70.9%) and similar estimates for lower limb fractures but lower estimates for skull (22.1% vs 30.1%) and upper limb fractures (38.5% vs 47.6%). Our PPVs for retinal haemorrhages and burn injuries were also consistent with previous reviews of children across settings.121 122 While there are significant methodological differences between this study and previous work (eg, overall age criteria), the consistency between findings suggests that some coded injury patterns could be considered as a broader measure of CM in EHRs to aid identification of high-risk groups.
Coded injury patterns of IPV, such as orbital fractures, provided relatively low PPVs. This is not surprising as the PPV is related to the underlying prevalence of IPV. For example, most studies on IPV were conducted in large populations (eg, EDs) and investigated injury patterns applicable to a wide range of causes. Women are also known to under-report IPV because of safety reasons and stigma.123 124 The under-reporting is consistent with the findings that more than 1 in 4 pooled female assault coded records were missing perpetrator information in the underlying medical charts. Still, broader assault-related presentations and upper body contusions showed higher PPVs of IPV (>25%),125 and their utility in combination with other risk factors might yield comparable predictive accuracy in further studies.
This review has important limitations. First, NAS and FAS are not always recognised as forms of prenatal neglect or CM, and the categorisation should be considered with caution to prevent stigma and barriers to help-seeking. Women who misuse substances may be unaware of their pregnancy, and opioids can be prescribed during pregnancy by clinicians for pain or opioid addiction treatment, increasing the risk of NAS.
Second, PPVs were analysed without accounting for study prevalences and within-study correlations of family violence.126 As a result, pooled estimates from studies with higher underlying prevalence estimates will generally lead to higher PPVs. However, we aimed to minimise variation of underlying prevalences by only including studies from general hospitals or paediatric settings.
Third, we were unable to obtain adequate data to reliably pool estimates on specificity and sensitivity. However, these absolute accuracy measures were not the focus of this review. The high volume of eligible patients presenting to healthcare combined with the rare occurrence and under-reporting of all outcomes limits the feasibility to apply a reference standard to non-coded cases to ascertain false negatives. Reliable measures on sensitivity are therefore unlikely to be obtained.
Fourth, the between-study heterogeneity of the pooled PPVs was substantial. As in most large meta-analyses, several factors not examined in the subgroup analyses may have influenced our results. Many estimates were ascertained from lower quality reference standards, prone to circularity bias (ie, using suspicion of CM as a reference standard in the absence of other explanations).127 Still, the consistency of studied ICD codes, combined with the larger samples and the high PPV across indicators, suggests that these estimates are valid and merit further study.
Finally, identifying CM and IPV in practice is complex. Obtaining information is a difficult task, as patients often under-report their experiences or symptoms, and high-risk groups such as infants cannot communicate. Similarly, some symptoms addressed by the reference standard (eg, linkage to social service assessments) may not have been conveyed to the clinician. EHRs may thus lead to potentially missed diagnoses or misclassifications. Estimates of indicators, therefore, reflect the ‘best data’ available and should be viewed in terms of routinely recorded indicators to help inform decisions about the likelihood of abuse, rather than definitive diagnoses.
More than a billion children and women aged 0–45 years globally reported being victims of abuse in 2014.128 129 Yet, in the UK, studies show that only about 1 in 3 violence-related ED visits for children and adults appear in police records, and self-report studies reveal that 1 in 5 affected women feel reluctant to report abuse to healthcare.3 130 131 In response to WHO’s priorities on addressing gaps in violence prevention, our findings highlight the potential to improve targeted care using routine EHRs to identify, prevent and support high-risk groups of family violence. On a service level, coded indicators of co-occurring family violence have the potential to be incorporated into computerised clinical decision support systems or risk prediction models to flag potential at-risk individuals.132 In the UK, linkage of family members’ EHRs could also allow for a ‘Think-Family’ approach, where indicators have the potential to identify vulnerable children through mothers or vice versa.133 Despite these potential implications, it remains unknown whether the benefits of using automated EHR systems to identify at-risk individuals outweigh potential harms, including stigma, legal consequences, loss of trust and reduced help-seeking.
First, we are very grateful to Dr Erin Shriver (University of Iowa Hospitals and Clinics), Professor John Leventhal (Yale School of Medicine), Dr Jennifer N. Lind (Centers for Disease Control and Prevention), Peggy Brozicevic (Vermont Department of Health), Luigi Garcia Saavedra (New Mexico Department of Health), Dr Jane Fornoff (Illinois Department of Public Health), Dr Ulf Höglund (Uppsala University), Björn Tingberg (Linköping University) and Dr Melissa O'Donnell (The University of Western Australia) for responding to additional data requests to complete our analyses. Second, this research benefits from and contributes to the National Institute for Health Research (NIHR) Children and Families Policy Research Unit but was not commissioned by the NIHR Policy Research Programme. The study also benefitted from funding for infrastructure through the NIHR GOSH Biomedical Research Centre and Health Data Research UK. Finally, we express our deepest appreciation to the children and families worldwide for their contribution and participation in interpersonal violence research.
Contributors Concept: SS and RG. Design: SS. Drafting of the manuscript: SS, RA, RG and LL. Literature search and screening: SS, RA and MS. Acquisition, analysis or interpretation of data: all authors contributed equally. Statistical analysis: SS and LL. Critical revision of the manuscript for important intellectual content: all authors contributed equally. Study supervision: RG and LL.
Funding The corresponding author had full access to all of the data and had final responsibility to submit for publication.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.