In recent years, there have been significant advances in genetic technologies, allowing the field of genomics to evolve from genetics. This has huge diagnostic potential as genomic testing increasingly becomes part of mainstream medicine. However, there are numerous potential pitfalls in the interpretation of genomic data. It is therefore essential that we educate clinicians more widely about the appropriate interpretation and use of genomic testing.
What is already known?
Genomic testing has the potential to significantly increase diagnostic rates in all areas of medicine.
Genomic data can be overwhelming in their quantity and complexity and produce results which are sometimes difficult to interpret and/or unexpected in any given patient.
What this study adds?
This paper brings together the various practical aspects of genomic testing in an accessible way for the non-genetic specialist.
This is particularly relevant in the context of ongoing initiatives such as Genomics England’s 100,000 Genomes Project in the UK and the associated drive for genomic testing to be incorporated into mainstream medicine.
It is clearly important to make accurate diagnoses to inform treatment, prognosis and recurrence risk. With advances in technology and the increased availability of genome-wide tests, there is the potential to end the lengthy and costly diagnostic odyssey in more and more families with rare diseases (see box 1). This is relevant because rare diseases are collectively estimated to affect 1 in 17 people, or approximately 6% of the population.1
UK Chief Medical Officer quote
“Genomic medicine has the potential to save costs and improve quality of care by targeting treatment, maximising benefit and reducing side effects. For patients with rare diseases, it can shorten their ‘diagnostic odyssey’ helping to identify therapeutic options faster and improve outcomes. The new science of genomics is opening up better diagnoses for patients, better and safer treatments, opportunities for screening and the possibilities for prevention. These will all improve as we learn more about genomes and their relation with illness and treatment response.”44
Diagnostic genetics was once generally based on a one-gene-equals-one-disease concept: a clinical diagnosis was considered and sequencing of a single gene instigated to confirm or refute it. In the era of genomics, next-generation sequencing (NGS), coupled with high-resolution array comparative genomic hybridisation (Array CGH), gives clinicians the power to make diagnoses in patients with more complex, atypical or non-specific phenotypes.2
We will provide an overview of the classification of germline variants and the potential pitfalls in their interpretation, using clinical vignettes to illustrate certain points. We will introduce some associated consent and communication issues and offer a taster of potential future developments and challenges. There will inevitably be a degree of technical jargon, but we have tried to keep this to a minimum and have also provided a glossary (box 2).
Genomics: the study of the entire genome (cf. genetics which generally refers to the study of a particular gene).
Genome: the entirety of the genetic material of an organism (cf. exome, which is the portion of the genome containing expressed sequence, ie, that which codes for proteins).
Phenotype: an individual’s observable traits, including normal variation, as well as diseases or conditions (the genetic contribution to which is called the genotype).
Germline: a lineage of cells that culminates in the formation of germ cells (eggs and sperm) and whose genetic material is therefore passed on to successive generations (cf. somatic cells which are every other cell).
Variant: a locus (place) in the genome where an individual differs from the reference sequence (itself an amalgamation of the genome sequences of more than one individual). Variants can be subdivided by size into structural variants, copy number variants (CNVs) and single nucleotide variants (SNVs).
Microdeletion: a missing piece of chromosome too small to be seen by karyotyping using traditional light microscopy (typically less than three megabases), detectable only with fluorescent probes (fluorescence in situ hybridisation) or array comparative genomic hybridisation (Array CGH or microarray analysis). Microduplications are extra pieces of chromosome of a similar order of magnitude; microdeletions and microduplications are collectively referred to as CNVs.
Polygenic: a phenotype influenced by more than one genetic factor.
Multifactorial: a phenotype influenced by the interaction between environmental factors (known or otherwise) and one or more genetic factors.
de novo: a variant that is newly arisen in an individual, as opposed to one that has been inherited from a parent.
Genetic heterogeneity: this can be either allelic heterogeneity when a similar phenotype is produced by different alleles (variants) within the same gene; or locus heterogeneity when a similar phenotype is produced by variants within different genes.
Homozygosity: the state of possessing two identical alleles (forms) of a particular gene, one inherited from each parent (cf. heterozygosity which is the state of possessing two different alleles of a particular gene).
Consanguinity: the property of being from the same kinship (ie, descended from the same ancestor) as another person. In clinical practice, the term usually refers to a reproductive union between two partners who are related, most often as cousins.
Haploinsufficiency: this is when the loss of function of one allele has a deleterious effect, due to the other allele alone not being able to produce sufficient functional product.
Expression: this is when a gene has a functional effect, the main mechanism for which is transcription, followed by translation. Transcription is the reading of the gene sequence by RNA polymerase to generate complementary messenger RNA; translation is the reading of the messenger RNA by a ribosome to create a chain of amino acids that eventually forms a functional protein.
Penetrance: the proportion of individuals with a particular variant who display a given phenotype. A condition which does not manifest in every individual with a disease-causing variant is said to show reduced penetrance.
Expressivity: the extent to which a genetic condition manifests; certain genetic conditions can manifest in different ways, and to different degrees, even within the same family. Such conditions are said to show variable expression.
in vivo: an experimental procedure performed using living organisms or cells (cf. in vitro which is an experimental procedure performed in a controlled environment outside of a living organism; and in silico which are computational methods for predicting the effect of a variant in a gene on the function of its protein product).
Array CGH: a method of detecting the gain or loss of genetic material, ranging in size from several thousand to tens of millions of nucleotides (CNVs). Based on measurement of the relative intensity of the patient versus control fluorescent dye signal at each probe on the array.
Sanger sequencing: a method of DNA sequencing developed in 1977 by Frederick Sanger and colleagues and widely used since. Based on the selective incorporation of chain-terminating dideoxynucleotides (ddNTPs) by DNA polymerase during in vitro DNA replication (PCR) and subsequent size separation of the products of this PCR via capillary electrophoresis.
Next-generation sequencing (NGS): a catch-all term for a number of post-Sanger sequencing technologies whereby millions of small fragments of DNA are sequenced in parallel, multiple times, to provide high depth of coverage. Bioinformatics analyses map these fragments, or individual reads, to the human reference genome. NGS can be used to sequence entire genomes, exomes or disease-specific gene panels.
Bioinformatics: the application of computer technology to the management and interpretation of genomic data.
Panel test: the targeted NGS testing of a panel (set) of genes, specifically selected because they are associated with one disease or a clinically related group of diseases.
Minigene splicing assay: used in in vivo or in vitro experiments to test the functional elements of the splicing process. The minigene is an artificial gene that can be constructed to test specific hypotheses.
Variant of uncertain significance: a variant in a disease-associated gene, the specific effect of which is unknown or uncertain.
Incidental finding: a pathogenic or likely pathogenic variant in a gene that is not relevant to the initial reason for sequencing.
Reverse phenotyping: the objective reassessment of a patient’s phenotype in the context of the detected variant(s) in specific gene(s).
Ascertainment bias: the systematic distortion in the measurement of the true frequency of a phenomenon due to the way in which data are collected. In genomics in particular, the sample sizes and populations in which variants are discovered affect the apparent frequency and characteristics of those variants.
Personalised medicine: sometimes also called precision medicine, this is the ultimate goal of tailoring the medical management of each individual to their particular genotype (cf. stratified medicine, which is identifying subgroups of patients who will respond to one intervention better than another).
Mainstreaming: the move away from genomics being seen as the sole preserve of specialist services like clinical genetics to applications available in every clinic and specialty across the health system, from tertiary to primary care.
Advances in testing
The last decade has seen Array CGH, or so-called microarray technology, increasingly incorporated into standard clinical practice. This can be considered a genome-wide screen to identify microdeletions or microduplications smaller than those previously detectable via light microscopy. These are generally known as copy number variants or CNVs. They can be part of normal human variation. However, depending on the number and/or functional significance of the genes involved, they can be associated with a clinical phenotype. Some of these CNVs are recurrent and are now being categorised as syndromes.3 However, as with most genetic conditions, there is often variable expression or reduced penetrance.4
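To make the idea of a genome-wide copy number screen more concrete, the Python sketch below converts patient and control probe intensities into log2 ratios and flags runs of consecutive probes that fall beyond a cut-off. For a heterozygous deletion the expected patient:control ratio is 1:2 (log2 = -1), and for a single-copy duplication it is 3:2 (log2 of about 0.58). The cut-offs, the minimum run length and the probe intensities are hypothetical values chosen for illustration; clinical pipelines use validated statistical segmentation methods (for example, circular binary segmentation) rather than a fixed threshold.

```python
import math

def log2_ratios(patient, control):
    """Log2 ratio of patient to control signal at each probe, in genomic order."""
    return [math.log2(p / c) for p, c in zip(patient, control)]

def call_cnvs(ratios, loss_cutoff=-0.5, gain_cutoff=0.35, min_probes=3):
    """Report runs of >= min_probes consecutive probes beyond a cut-off.

    Returns (first_probe, last_probe, "loss" or "gain") tuples.
    The cut-offs are illustrative assumptions, not validated thresholds.
    """
    calls = []
    state, start = None, 0
    # A neutral sentinel value (0.0) closes any run still open at the end.
    for i, r in enumerate(ratios + [0.0]):
        if r < loss_cutoff:
            current = "loss"
        elif r > gain_cutoff:
            current = "gain"
        else:
            current = None
        if current != state:
            if state is not None and i - start >= min_probes:
                calls.append((start, i - 1, state))
            state, start = current, i
    return calls

# Hypothetical intensities for nine consecutive probes.
control = [1000] * 9
patient = [1000, 1010, 500, 490, 510, 505, 990, 1500, 1000]
print(call_cnvs(log2_ratios(patient, control)))  # [(2, 5, 'loss')]
```

Here the four-probe loss is reported, while the single high probe is not, because it falls below the minimum run length. Whether a called CNV matters clinically still depends on the genes it contains.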
More recently, NGS has started to enable the introduction of large specialist gene panels, whole-exome sequencing and even whole-genome sequencing to screen for variants within large numbers of genes. Most of these variants affect single nucleotides and are thus referred to as single nucleotide variants or SNVs. NGS generally refers to any method of DNA sequencing developed after Sanger sequencing, and now encompasses a number of different technologies, all of which offer higher throughput at lower cost than was previously possible.5 This promises to deliver larger amounts of sequence data, potentially more rapidly, as the technology becomes established on a service basis.
As the human genome contains approximately three billion base pairs, the volume of data produced by whole-genome sequencing is enormous. Approximately 1% of the genome constitutes coding sequence, and this can be referred to as the exome. It is estimated that the genomes of any two individuals differ at 0.5% of all base positions, approximately 4.1 million variants per genome,6 the vast majority of which constitute normal human variation. The number of variants is lower than the number of differing bases because some variants, such as insertions, deletions and structural variants, span more than one base. In a patient presenting with a clinical phenotype, we essentially wish to identify the significant disease-causing variant(s) in their particular case.
As a result of the substantial data-processing burden of NGS, laboratories now employ bioinformaticians (staff with specialist computer programming skills) to develop and maintain bespoke data-processing pipelines. DNA sequence data must be accurately aligned against a reference genome sequence, regions of poor or no coverage must be defined, and thousands of identified variants must be filtered against strictly defined criteria, essentially to pick out any variant(s) associated with the patient’s phenotype. All of this requires careful attention to quality and consistency.7 8
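Why regions of poor coverage have to be defined explicitly can be illustrated with the classic idealised model in which reads land uniformly at random, so that depth at each base follows a Poisson distribution with mean C = (number of reads × read length) / target size. The Python sketch below computes C and the expected fraction of bases below a depth threshold under that model. The read count, read length and thresholds are hypothetical, and real data deviate from the model (for example in GC-rich or repetitive regions), which is precisely why laboratories measure coverage rather than assume it.

```python
import math

def mean_depth(n_reads, read_length, target_size):
    """Average number of reads covering each base of the target."""
    return n_reads * read_length / target_size

def fraction_below(depth_threshold, mean):
    """Expected fraction of bases covered by fewer than `depth_threshold` reads,
    assuming an idealised Poisson distribution of depth."""
    term = math.exp(-mean)  # P(depth == 0)
    total = 0.0
    for k in range(depth_threshold):
        total += term
        term *= mean / (k + 1)  # P(depth == k + 1) from P(depth == k)
    return total

# Hypothetical whole-genome run: 800 million reads of 150 bases.
depth = mean_depth(800_000_000, 150, 3_000_000_000)
print(depth)                       # 40.0
print(fraction_below(10, depth))   # roughly 4e-09 under the ideal model
```

In practice the fraction of poorly covered bases is far higher than this idealised figure, so pipelines report coverage for each targeted region and flag those falling below a laboratory-defined threshold.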
As in any other area of medicine, it is important to choose the most appropriate diagnostic test for each individual situation. For example, in a child with complex epilepsy, a dedicated epilepsy gene panel (offering better coverage of the targeted genes) may be the most appropriate test, whereas for a child with non-specific problems, an NGS whole-exome screen may be better. There is evidence that diagnostic yield can be improved by selecting the most appropriate test.9 Furthermore, the more specific a test, the less likely it is to produce findings that are of uncertain significance or (co)incidental to the reason for testing (see below).
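The filtering step, and the effect of restricting analysis to a gene panel, can be sketched in a few lines of Python. The record fields, the thresholds (minimum read depth of 20, maximum population allele frequency of 0.1%), the consequence categories and the variants themselves are all hypothetical; real pipelines apply many more criteria, and the two-gene "panel" here is made up rather than any real epilepsy panel.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    depth: int            # reads covering the variant position
    population_af: float  # allele frequency in a control population (0.0 if absent)
    consequence: str      # predicted effect, e.g. "missense", "synonymous"

# Hypothetical set of consequence types retained for review.
RETAINED_CONSEQUENCES = {"missense", "nonsense", "frameshift", "splice_site"}

def filter_variants(variants, panel_genes=None, min_depth=20, max_af=0.001):
    """Keep well-covered, rare, potentially damaging variants.

    If panel_genes is given, only variants in those genes are kept
    (a panel-style analysis); otherwise the whole exome is considered.
    """
    kept = []
    for v in variants:
        if v.depth < min_depth:
            continue  # call is unreliable at low coverage
        if v.population_af > max_af:
            continue  # likely too common for a rare, highly penetrant disorder
        if v.consequence not in RETAINED_CONSEQUENCES:
            continue  # e.g. synonymous changes are deprioritised in this sketch
        if panel_genes is not None and v.gene not in panel_genes:
            continue  # outside the genes selected for this clinical question
        kept.append(v)
    return kept

# Entirely hypothetical variant calls.
calls = [
    Variant("SCN1A", 45, 0.0, "missense"),
    Variant("SCN1A", 8, 0.0, "nonsense"),     # too few reads
    Variant("TTN", 60, 0.02, "missense"),     # too common
    Variant("BRCA1", 50, 0.0, "frameshift"),  # rare, but unrelated to seizures
    Variant("CHD7", 38, 0.0, "synonymous"),
]

exome_wide = filter_variants(calls)
panel = filter_variants(calls, panel_genes={"SCN1A", "SCN2A"})
print([v.gene for v in exome_wide])  # ['SCN1A', 'BRCA1']
print([v.gene for v in panel])       # ['SCN1A']
```

The exome-wide analysis retains the BRCA1 variant, which has nothing to do with the presenting problem, whereas the panel analysis never looks at that gene. This mirrors the point that a more targeted test is less likely to generate findings incidental to the reason for testing.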
The terms mutation and polymorphism are being replaced with the generic term variant, with one of the following modifiers: benign (type 1); likely benign (type 2); uncertain significance (type 3); likely pathogenic (type 4); and pathogenic (type 5).7 Type 3s are often referred to as variants of uncertain significance (VUSs). Various types of data are considered in the interpretation of SNVs, including: (1) effect on the encoded protein; (2) mode of inheritance; (3) frequency in disease-affected and unaffected populations (population data); (4) in vitro and/or in vivo functional studies; (5) computational prediction tools; and (6) co-occurrence and segregation (family) studies.7 10
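Classification frameworks combine these lines of evidence according to explicit rules. As one widely used example, the 2015 ACMG/AMP guidelines label each piece of evidence as very strong (PVS), strong (PS), moderate (PM) or supporting (PP) for pathogenicity, and stand-alone (BA), strong (BS) or supporting (BP) for a benign effect, then combine the counts. The Python sketch below encodes those combining rules as we understand them; it is illustrative only, omits later refinements to the framework, and is no substitute for expert review. Deciding which evidence codes apply to a given variant is where most of the interpretive difficulty lies.

```python
from collections import Counter

def classify(evidence_codes):
    """Combine ACMG/AMP-style evidence codes (e.g. ["PVS1", "PM2"]) into
    one of the five classes. A sketch of the 2015 combining rules only."""
    # Strip the trailing number so that "PM2" counts as one "PM" item.
    counts = Counter(code.rstrip("0123456789") for code in evidence_codes)
    pvs, ps, pm, pp = counts["PVS"], counts["PS"], counts["PM"], counts["PP"]
    ba, bs, bp = counts["BA"], counts["BS"], counts["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Conflicting pathogenic and benign evidence leaves the variant uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "uncertain significance"
    if pathogenic:
        return "pathogenic"
    if likely_pathogenic:
        return "likely pathogenic"
    if benign:
        return "benign"
    if likely_benign:
        return "likely benign"
    return "uncertain significance"

print(classify(["PVS1", "PS2"]))  # pathogenic
print(classify(["PVS1", "PM2"]))  # likely pathogenic
print(classify(["PM2", "PP3"]))   # uncertain significance
print(classify(["BA1"]))          # benign
```

The five return values correspond to types 5, 4, 3 and 1 above (and "likely benign" to type 2). Because a single added or removed evidence code can move a variant between classes, classifications can change as new evidence emerges.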
Effect on the encoded protein
SNVs within the coding sequence can be categorised by their effect on the encoded protein. For example, a missense variant results in one amino acid in the protein being replaced by another, whereas a nonsense variant results in premature termination of translation, potentially leading to the production of a truncated protein. As a very general rule, the latter is likely to have a more severe effect than the former and hence carries a greater prior likelihood of a phenotypic effect.
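The distinction between synonymous, missense and nonsense changes follows directly from the genetic code. The Python sketch below builds the standard codon table and classifies a single-base change within one codon. It assumes the codon is given in the correct reading frame on the coding strand, and it ignores effects on splicing, which even a "synonymous" change can have.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def coding_effect(ref_codon, alt_codon):
    """Classify the effect of a codon change on the encoded protein."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid encoded
    if alt_aa == "*":
        return "nonsense"    # premature stop: translation terminates early
    if ref_aa == "*":
        return "stop-loss"   # stop codon replaced: protein may be extended
    return "missense"        # one amino acid replaced by another

print(coding_effect("GAG", "GTG"))  # missense (glutamate -> valine)
print(coding_effect("CAG", "TAG"))  # nonsense (glutamine -> stop)
print(coding_effect("CTG", "CTA"))  # synonymous (both leucine)
```

Note that this classification says nothing about whether a particular missense change is tolerated; that judgement requires the other lines of evidence discussed below.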
Mode of inheritance
The mode of inheritance is obviously useful to know when attempting to interpret variants identified in any given disease gene. However, for newly described genes, the mode of inheritance may not yet be certain. It is also possible for some pathogenic variants within a gene to cause dominantly inherited disease while others cause recessively inherited disease (see clinical vignette 2).
Population data
Whether a variant has previously been reported in the literature is clearly important, but it is essential to consider ascertainment bias when assessing such evidence. A variant may be declared disease-causing because it has been found in patients with similar symptoms, but it is crucial to consider whether the variant is also found in a healthy control population and, if so, at what frequency.11
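One way to make the population-frequency argument quantitative is to ask how common a single causal allele could plausibly be, given how rare the disease is. For an autosomal dominant condition with prevalence P, suppose one allele accounts for at most a fraction A of cases and has penetrance K. Heterozygous carriers make up roughly 2q of the population for a rare allele of frequency q, so about 2qK of the population is affected because of that allele, which cannot exceed PA; hence q ≤ PA / 2K. The Python sketch below encodes this simplified bound. The prevalence, allelic contribution and penetrance values are hypothetical, and the bound assumes random mating and a control population comparable to the patient's ancestry, which ascertainment bias can undermine.

```python
def max_credible_af(prevalence, allelic_contribution, penetrance):
    """Upper bound on the population frequency of one causal allele for an
    autosomal dominant condition (simplified; assumes a rare allele, so
    carrier frequency is approximately twice the allele frequency)."""
    return prevalence * allelic_contribution / (2 * penetrance)

def too_common(observed_af, prevalence, allelic_contribution, penetrance):
    """True if the allele is seen in controls more often than the bound allows."""
    return observed_af > max_credible_af(prevalence, allelic_contribution, penetrance)

# Hypothetical disorder: prevalence 1 in 10,000; no single allele explains
# more than 10% of cases; penetrance assumed to be complete.
bound = max_credible_af(1 / 10_000, 0.10, 1.0)
print(bound)                                        # about 5e-06
print(too_common(0.001, 1 / 10_000, 0.10, 1.0))     # True
print(too_common(0.000001, 1 / 10_000, 0.10, 1.0))  # False
```

Halving the assumed penetrance doubles the ceiling, so variants with reduced penetrance can legitimately be found in apparently healthy controls; frequency data argue most strongly against pathogenicity when a variant is common relative to the disease.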
in vitro and/or in vivo functional studies
These encompass widely varying methods, including direct assays on patient samples, studies in animal models or cell lines, and/or minigene splicing assays. The direct demonstration of a functional effect of a specific variant by any such means can be a highly valuable piece of evidence. It is not currently feasible for diagnostic laboratories to carry out such work for every new potentially significant variant. However, efforts are being made to develop high-throughput functional assays that can be applied to analyse very large numbers of variants, which should provide very useful data.12
Computational prediction tools
A growing number of in silico tools are available for predicting the likelihood of a variant having a significant effect on the function of a protein. The accuracy of these predictions depends on the volume and quality of the data drawn on and the ‘intelligence’ of the algorithms. For example, many tools estimate the degree of evolutionary conservation of a particular amino acid position. However, this relies on good quality cross-species sequence alignment. Recent systematic evaluation of the performance of a range of these tools supports the general consensus that their predictions can only be considered as weak evidence.13
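As a simple illustration of how conservation can be scored from a cross-species alignment, the Python sketch below computes the Shannon entropy of the residues in one alignment column: 0 bits means every species has the same amino acid, and higher values mean the position tolerates more variation. The alignment is a made-up toy example, and this is a far cruder measure than those used by published prediction tools. It also shows why alignment quality matters, since gaps and misaligned sequences change the score directly.

```python
import math
from collections import Counter

def column_entropy(alignment, position):
    """Shannon entropy (bits) of the residues at one alignment column, ignoring gaps."""
    residues = [seq[position] for seq in alignment if seq[position] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())

# Hypothetical aligned protein fragments from four species ("-" is a gap).
alignment = [
    "MKTAYIAK",  # species 1 (e.g. human)
    "MKTAHIAK",  # species 2
    "MRTAYLAK",  # species 3
    "MKSAY-AK",  # species 4
]
print(column_entropy(alignment, 0))            # 0.0: fully conserved
print(round(column_entropy(alignment, 4), 3))  # 0.811: some variation tolerated
```

With only four species, a single misaligned or atypical sequence shifts the score considerably, which is one reason such predictions are treated as weak evidence.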
Co-occurrence and segregation (family) studies
Whether a variant is de novo or inherited is a potentially powerful piece of evidence, so family studies are often essential. For instance, if a variant is present in a parent (and/or other relatives) and they have similar symptoms, this increases the likelihood that it is significant. Likewise, if found to be de novo and both parents are clinically unaffected, this also supports pathogenicity. Alternatively, if present in a parent with no such problems, it is more likely to represent a benign, normal familial variant.
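The strength of co-segregation evidence can be roughed out with simple probability. If a variant is unrelated to the disease, then in each informative meiosis it has a 1 in 2 chance of being transmitted in a way that happens to match the phenotype, so the chance of perfect co-segregation across n informative meioses is (1/2)^n. The Python sketch below assumes full penetrance, no phenocopies and independent meioses, none of which can be taken for granted; it is a teaching simplification rather than a formal linkage analysis.

```python
import math

def chance_cosegregation(informative_meioses):
    """Probability that an unrelated variant co-segregates perfectly by chance."""
    return 0.5 ** informative_meioses

def meioses_needed(max_chance):
    """Smallest number of informative meioses for which chance co-segregation
    falls to max_chance or below."""
    return math.ceil(math.log2(1 / max_chance))

print(chance_cosegregation(3))  # 0.125
print(meioses_needed(1 / 32))   # 5
```

In a small nuclear family there may be only one or two informative meioses, which is why segregation within a single family is often suggestive rather than conclusive.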
It is of course always important to beware of the pitfall of variable expression or reduced penetrance and to note that the average human germline SNV mutation rate equates to ~74 novel SNVs per genome per generation.14 Therefore, de novo status alone cannot be assumed to signify pathogenicity and must be interpreted with caution. Paternal age is also a significant factor: children of older fathers tend to have more de novo variants, not all of which are necessarily pathogenic.14 The possibility of non-paternity must also be considered when trying to establish if a variant is de novo.
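The figure of around 74 new SNVs per genome per generation can be turned into an expectation for exome sequencing. Assuming, as a simplification, that de novo SNVs fall uniformly across the genome and that the exome makes up about 1% of it (the figure given earlier in this article), a child is expected to carry somewhat fewer than one de novo coding SNV on average, and a Poisson model gives the chance of carrying at least one. Mutation rates in fact vary across the genome and with paternal age, so these numbers are only indicative.

```python
import math

DE_NOVO_SNVS_PER_GENOME = 74  # per generation, as quoted in the text
EXOME_FRACTION = 0.01         # approximate coding share of the genome (assumption)

expected_coding = DE_NOVO_SNVS_PER_GENOME * EXOME_FRACTION
# Poisson model: P(at least one) = 1 - P(none) = 1 - exp(-expected)
p_at_least_one = 1 - math.exp(-expected_coding)

print(round(expected_coding, 2))  # 0.74
print(round(p_at_least_one, 2))   # 0.52
```

Under these assumptions, roughly half of children would carry at least one de novo coding SNV by chance alone, which is why a de novo finding must still be weighed against the gene involved and the patient's phenotype.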
There are particular interpretive challenges associated with consanguinity. There is an increased prior likelihood that a child of a consanguineous union will have a rare autosomal recessive disorder, but the child will also have increased homozygosity and so is likely to have relatively more rare homozygous variants anyway. Consanguineous families can of course have genetic conditions due to other genetic mechanisms, unrelated and coincidental to their consanguinity. It is especially important therefore to keep an open mind in the interpretation of such family studies.
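The effect of consanguinity on homozygosity can be quantified with the inbreeding coefficient F, the probability that the two alleles at a locus are identical by descent. For the child of first cousins, F = 1/16, so about 6% of the genome is expected to be homozygous by descent. For an allele of population frequency q, the probability of being homozygous becomes q^2 + Fq(1 - q), compared with q^2 under random mating. The Python sketch below applies these standard formulas; the allele frequency is hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate haploid genome size

# Inbreeding coefficient F of a child, by the parents' relationship.
INBREEDING = {
    "uncle-niece": 1 / 8,
    "first cousins": 1 / 16,
    "first cousins once removed": 1 / 32,
    "second cousins": 1 / 64,
}

def homozygous_probability(q, f):
    """Probability that a child is homozygous for an allele of frequency q."""
    return q * q + f * q * (1 - q)

for relationship, f in INBREEDING.items():
    megabases = f * GENOME_SIZE_BP / 1e6
    print(f"{relationship}: ~{megabases:.0f} Mb expected to be homozygous by descent")

q = 0.001  # hypothetical rare allele
fold = homozygous_probability(q, 1 / 16) / homozygous_probability(q, 0.0)
print(f"first cousins: {fold:.0f}-fold higher chance of homozygosity")  # 63-fold
```

This is why a child of a consanguineous union is likely to carry several rare homozygous variants whether or not any of them causes disease.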
Variant classification should be considered a dynamic or iterative process, with classifications revised according to newly emerging clinical, family and/or molecular evidence.15 It is therefore important that scientists and clinicians work closely together to ensure evidence-based interpretation of variants, with the clinician then carefully reverse phenotyping the patient. Essentially, reverse phenotyping is checking whether detected variants are consistent with the patient’s presenting problems. As far as possible, this has to be a rigorous and objective process.
It is easy to jump to an incorrect diagnosis because of preconceived ideas, previous experience and/or apparently supportive evidence, without due regard to other potential aetiological factors and/or evidence to the contrary. This observer bias is also sometimes known as a Procrustean error.16 It can be compounded by subject bias if a patient and/or their family subsequently research the putative diagnosis themselves and believe this explains their problems. In the context of genomic test results, there is significant potential for such a pitfall, especially if a variant is identified in a gene that potentially fits the patient’s phenotype (which of course it will if identified through a targeted gene panel). Caution must therefore be exercised in assigning a variant as being definitely pathogenic in any given case.
For example, in clinical vignette 1 (box 3), this child was prematurely given the diagnostic label of 15q11.2 microdeletion syndrome. The 15q11.2 microdeletion is one of the more commonly found CNVs in clinical practice and is generally believed to be enriched in populations with learning difficulties.17 However, it has a low penetrance and an extremely non-specific and variable spectrum of associated features, such that many now argue that it is not a recognisable syndrome and probably not responsible (at least not solely) for the problems of any given child.18 19 Therefore, the further finding of an SCN1A variant, if it does not entirely replace the previous ‘diagnosis’, certainly has the potential to add to it in this example.20
Illustrative clinical vignettes
Vignette 1: Example of observer bias or Procrustean error
A 7-year-old girl with significant delay and seizure disorder. Previous array comparative genomic hybridisation (Array CGH) analysis showed 15q11.2 microdeletion and diagnosis of 15q11.2 microdeletion syndrome assigned by clinician ordering testing. After review a few years later, entered into a whole-exome study, as problems now thought to be more significant than usually seen with this CNV, which is also present in the clinically unaffected father. Exome study subsequently reported de novo pathogenic variant in SCN1A.
Vignette 2: Example of complex phenotype
An 11-year-old girl, generally well, but severe intellectual disability, non-specific facial dysmorphism, seizures and degree of sensorineural hearing loss. No history of regression and growth parameters in normal range. No significant family history. Whole-exome sequencing found de novo heterozygous variants in three genes: MECP2, TBC1D24 and CHD7.
Vignette 3: Example of incidental finding
A 9-month-old boy, born at 30 weeks’ gestation, with intrauterine growth retardation. Subsequent failure to thrive and non-specific dysmorphism. Array CGH analysis showed only one abnormality: a microdeletion involving part of the BRCA1 gene. Subsequent inquiry identified grandmother with ovarian cancer. Whole-exome studies instigated to look for diagnosis of presenting problems.
Clinical vignette 1 can be used as an example of a Procrustean error and of a complex diagnosis, due to a combination of more than one genomic variant. It is emerging that in around 5% of cases in which whole-exome sequencing is informative, two or more molecular diagnoses are being made.21 22 Such complex phenotypes can be overlapping or distinct, depending on whether the associated phenotypic features are shared. Overlapping phenotypes are more likely if the genes concerned are connected by the same biological pathway. Careful variant interpretation and reverse phenotyping are needed to determine which variant(s), if indeed any, are pathogenic, in isolation or in combination, in such complex cases.
As another example of a complex phenotype, in clinical vignette 2, de novo variants were found in three genes: MECP2, TBC1D24 and CHD7. MECP2 and CHD7 are associated with Rett syndrome and CHARGE syndrome, respectively.23 24 However, we are now aware of a much broader phenotypic spectrum for both these genes. TBC1D24 has been associated with both autosomal recessive syndromal deafness and autosomal dominant non-syndromal deafness.25 26 In this example, therefore, the child’s developmental delay could potentially be attributed to any of these three genes, individually or in combination; her seizures could be attributed to MECP2 and/or TBC1D24; and her hearing impairment to TBC1D24 and/or CHD7. Thorough reverse phenotyping, plus an iterative approach to new information, would hopefully resolve such a diagnostic conundrum in time.
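The combinatorics of vignette 2 can be made explicit by recording, for each clinical feature, which of the three genes could account for it (taken directly from the preceding paragraph) and enumerating the smallest gene combinations that could explain every feature. The Python sketch below does this. It deliberately ignores mode of inheritance (for example, that syndromal deafness has been associated with recessive TBC1D24 variants, whereas this child's variant is heterozygous), penetrance and strength of evidence, so its output is a list of hypotheses to test by reverse phenotyping, not a diagnosis.

```python
from itertools import combinations

# Feature -> genes that could plausibly account for it, as listed in the text.
FEATURE_TO_GENES = {
    "developmental delay": {"MECP2", "TBC1D24", "CHD7"},
    "seizures": {"MECP2", "TBC1D24"},
    "hearing impairment": {"TBC1D24", "CHD7"},
}

def explaining_sets(feature_to_genes):
    """All gene combinations that account for every feature."""
    genes = sorted(set().union(*feature_to_genes.values()))
    sets = []
    for size in range(1, len(genes) + 1):
        for combo in combinations(genes, size):
            chosen = set(combo)
            if all(chosen & candidates for candidates in feature_to_genes.values()):
                sets.append(chosen)
    return sets

def minimal_sets(sets):
    """Drop any combination that contains a smaller explaining combination."""
    return [s for s in sets if not any(other < s for other in sets)]

candidates = explaining_sets(FEATURE_TO_GENES)
print(len(candidates))                                     # 5
print([sorted(s) for s in minimal_sets(candidates)])  # [['TBC1D24'], ['CHD7', 'MECP2']]
```

Two minimal explanations emerge on paper: TBC1D24 alone, or MECP2 together with CHD7. Which, if either, is correct depends on exactly the clinical and molecular details that the sketch leaves out.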
In addition to VUSs, there is also the potential for incidental findings (IFs) in genomic testing, especially with whole-exome or whole-genome sequencing. These are pathogenic or likely pathogenic variants that are not apparently responsible for the individual’s presenting problems. For example, a child with learning difficulties might be found to have a pathogenic variant in an adult-onset cancer predisposition gene such as BRCA1 (see clinical vignette 3).
There is active debate about the return of IFs and recommendations are evolving. The American College of Medical Genetics and Genomics previously recommended that laboratories actively seek and report a limited and defined number of variants. The variants were selected on the basis that they are potentially medically important to that individual or to the rest of their family.27 Initially, these recommendations came without reference to patient preference; in other words, the only way to opt out of receiving IFs was to refuse the whole test. The American College’s view has since been relaxed, to incorporate some degree of patient choice, with patients able to opt out of receiving IFs.28
Interestingly, this opt in/opt out policy for IFs applies irrespective of age. In the case of testing minors, this approach may lead to information being disclosed that only becomes relevant to that individual later in life. This is at odds with British and European recommendations,29 30 and with long-established practice in clinical genetics in the UK, as it undermines the child’s right to decide for themselves what information they would like when older. While European recommendations largely support the American guidelines, they do acknowledge this conflict and suggest a need to try to balance the autonomy and interests of the child with parental wishes.31
Communication and consent
Given the complexities discussed above, it is essential that the healthcare professional taking the patient through the consent process for NGS has the appropriate knowledge and skills to do so. Proposed minimal requirements for consent forms for NGS are helpful and have been presented and discussed elsewhere.32 These include: a description of the test process; the possible benefits as well as disadvantages of the test; a description of the measures taken to ensure confidentiality; and information about how IFs will be managed. There is a danger, however, of becoming focused on consent as an end in itself; consent should of course be more than just a paper exercise.
Consent for genetic testing has traditionally followed an autonomy and information provision model, with full disclosure of all risks and benefits of a particular genetic test.33 In the context of genomic testing, and with the move to mainstreaming such testing, we need to question whether consent can ever be fully informed.34 If nothing else, there simply will not be enough time to go through all possible genomic testing outcomes, in the context of a busy general clinic.
There clearly needs to be some thought as to how best to approach the consent process for genomic testing, especially within the context of routine healthcare. With this in mind, some have proposed a broader consent process, in which individuals are informed of the types or categories of results that may be obtained by NGS.35 36 Others argue for a more relational approach to consent, grounded in the principles of so-called virtue ethics. This puts less emphasis on the informational aspect of consent and more on building a collaborative, ongoing relationship between patient and healthcare professional, based around values such as honesty, openness and trustworthiness.37
While helping individuals and their families to achieve realistic expectations of NGS is clearly an important and challenging part of the consent process,38 39 we also need to be mindful of societal attitudes towards the return of genomic data. For example, there is some evidence of a disconnect between members of the public and genetic healthcare professionals, with the public appearing to want access to more information than professionals generally consider appropriate.40
As technology and the associated body of knowledge continue to improve, many currently considered VUSs will be reclassified as either benign or pathogenic, and new pathogenic mechanisms and modifiers will be described. These advances will be optimised if there is international collaboration in the contribution to, and curation of, databases of variants.41 Beyond this, our knowledge of how the genome functions (in health and disease) will expand through the development of other ‘omics’ fields or so-called ‘multi-omics’ research initiatives.42
Advances in technology and knowledge are likely to move us closer to the promise of personalised medicine. Recent advances in genome editing technologies are now opening up new opportunities to correct the underlying genetic variant(s) or alter the expression of specific genes in the relevant cells of a given patient.43 Moreover, if genomic knowledge will help to determine treatment options, then it follows that genomics will need to be incorporated into standard mainstream medical practice at a much earlier stage in the diagnostic process.44
Each human genome contains around four million variants, most of which contribute to normal variation and are not pathogenic. Recent advances in NGS technologies and bioinformatics are enabling us to identify potentially pathogenic variants. However, genetic and environmental mechanisms and interactions remain to be described. It is therefore important that interpretation is an iterative process, ideally with planned clinical review of undiagnosed patients in the light of new findings and associated knowledge.
Because of the scale and complexity of NGS data, and the almost infinite number of presentations, genetic backgrounds and personal circumstances, there is potential for the misinterpretation of genomic data for any given individual. Reverse phenotyping is therefore an imperative clinical process for assessing if and how these results are relevant (or indeed not) to a patient’s presenting problems (phenotype). NGS also has the potential to detect VUSs and IFs.
Obtaining informed consent for genomic testing is a challenge, especially in mainstream medical practice, where such testing will almost certainly be used earlier and more widely. Effective genomics teaching will therefore need to be integrated into core healthcare training programmes. Please also see Take home messages (box 4).
Take home messages
Genomic screening of multiple genes has huge potential to increase diagnostic rates in paediatrics (and indeed all other areas of medicine) and ultimately lead to targeted treatments (personalised medicine).
The sheer quantity of genomic data presents unprecedented challenges in interpreting the diagnostic significance of the variants identified in any given patient, with several potential pitfalls for misinterpretation.
Such technology will inevitably be used more and more in mainstream clinical practice, so these issues can no longer be seen as the preserve of clinical genetics alone (although this specialty will no doubt retain an important role in complex cases, family studies/counselling and education generally).
Multidisciplinary working will be particularly important in classifying variants and reverse phenotyping; this will have to be a dynamic process with the significance of certain variants being reconsidered with updated scientific and clinical information.
A traditional informational approach to consent is likely to prove impracticable for genomic testing, especially in the context of mainstream medical practice.
MJP thanks Dr Mohnish Suri, Consultant Clinical Geneticist, for his eloquent exposition of the Procrustean error in the context of making syndromal diagnoses.
Contributors MJP conceived of the idea for the review with design input from CLH and JMW. All authors reviewed the appropriate literature. CLH drafted the article with MJP and JMW providing input on specific content.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Correction notice This paper has been amended since it was published Online First. The corresponding author’s details were incorrect and these have now been updated.