Introduction

Humans have a remarkable ability to recognize and identify faces. Computational models of face recognition are particularly interesting because they can contribute not only to theoretical insights into the visual processing of human faces, but also to practical applications such as the identification of individuals for security or forensic purposes. The presence of a specific facial pattern in patients with a genetic syndrome indicates that there is consistency in the phenotypic expression of the affected genes.1 Compared with the recognition of familiar faces, the recognition of a disease-specific facial pattern in unrelated individuals is much more difficult, because it requires extracting that pattern from a facial appearance that is also shaped by family background and environment. Experienced geneticists can do this relatively well, whereas computers have so far been unable to do so.2 Objective techniques for assessing craniofacial morphology, whether by anthropometric measurements on patients (anthropometry) or on 2D and 3D photographs (photogrammetry),3,4,5,6,7,8 can determine distances between facial landmarks and delineate certain facial features, but they are insufficient to describe the overall facial pattern or ‘gestalt’. Preliminary work by Herpers et al9 aimed to distinguish between images of normal and dysmorphic faces. We asked whether a computer can recognize and classify dysmorphic faces solely on the basis of gray-scale 2D digital photographs. The photographs were preprocessed with Gabor wavelets10 to yield robust descriptions of image regions. These filters also serve as a model of simple and complex cells in the primary visual cortex of mammals,11 and are therefore close to the early processing that takes place in human perception. Technically, they are well suited to the task because they provide some invariance to changes in lighting.

Patients and methods

Patients

Patients with Cornelia de Lange syndrome (n=12) (MIM 122470), fragile X syndrome (n=12) (MIM *309550), mucopolysaccharidosis type III (n=6) (MIM *252920), Prader–Willi syndrome (n=12) (MIM 176270), and Williams–Beuren syndrome (n=13) (MIM 194050) were ascertained in 2000 by contacting the respective parents' support groups. At the annual meetings of these support groups, the patients and their parents were informed about the study design, and photographs of the patients were taken after informed consent had been obtained. Several photographs were taken of each patient, especially of uncooperative ones, to ensure that at least one sharp photograph in optimal pose was obtained. In each patient, the diagnosis had been established by at least two independent clinical geneticists. For fragile X syndrome, mucopolysaccharidosis type III, Prader–Willi syndrome, and Williams–Beuren syndrome, the diagnosis was confirmed by biochemical or molecular tests. As there is no specific test for Cornelia de Lange syndrome, we included only patients who presented with typical clinical findings, that is, a characteristic facial phenotype, microcephaly, short stature, brachydactyly, hirsutism, and psychomotor delay. The patients (22 females, 33 males) were aged from 1 8/12 to 33 8/12 years, with a mean of 11 5/12 years.

Methods

Digital photographs of the patients' faces were taken in frontal pose against a white, unstructured background with a Nikon Coolpix 950 camera at a resolution of 640 × 480 pixels. Diffuse skylight was used to avoid shadows. The faces were cropped and standardized to 256 × 256 pixels, color photographs were converted to gray scale, and the best photograph of each patient was selected (Figure 1). Further processing is based on the bunch graph matching algorithm,12 which was originally developed to solve the correspondence problem in face recognition, as used in the face recognition technology test (FERET).13 The database of faces is represented as a bunch graph (Figure 2b). The bunch graph is constructed by hand-labeling each face with 48 graph nodes at defined regions of the face (Figure 2c). The nodes are connected by 117 edges, which encode the topology between the individual nodes. Each graph node is labeled with a bunch of feature vectors, the so-called jets (Figure 2a). A jet contains local texture information and is computed with a set of Gabor wavelets of different spatial sizes (five exponentially spaced values with effective radii of 8, 11.3, 16, 22.6, and 32 pixels) and orientations (eight evenly spaced values between 0° and 157.5°). For each of the 40 size–orientation combinations, one filter of odd and one of even symmetry are applied, and the square root of the sum of the squared responses of both filters is stored in a 40-dimensional feature vector, the jet (Figure 2a). To classify a face, a bunch graph built from the other 54 faces, excluding the test image, was used. This leave-one-out method was chosen because of the sparseness of the available data; it made it possible to test 55 different bunch graphs, each on one or two test images of a person completely unknown to the system.
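The jet computation described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes that each "effective radius" is the standard deviation of the Gaussian envelope and that the carrier wavelength equals that radius, and the names `RADII`, `ORIENTATIONS`, `gabor_pair_response`, and `jet` are hypothetical. The image is a list of rows of gray values, pixels outside the image count as zero, and each filter window is truncated at two standard deviations.

```python
import math

# Assumed parameterization: radius = envelope standard deviation = carrier wavelength
RADII = [8.0, 11.3, 16.0, 22.6, 32.0]
ORIENTATIONS = [i * math.pi / 8 for i in range(8)]  # 0 deg to 157.5 deg in 22.5 deg steps


def gabor_pair_response(image, x0, y0, sigma, theta):
    """Return the (even, odd) Gabor filter responses centred on pixel (x0, y0)."""
    k = 2 * math.pi / sigma  # carrier wave number for wavelength = sigma
    kx, ky = k * math.cos(theta), k * math.sin(theta)
    # Subtracting this term makes the continuous even filter zero-mean
    dc = math.exp(-((k * sigma) ** 2) / 2)
    half = int(2 * sigma)  # truncate the window at +/- 2 sigma
    height, width = len(image), len(image[0])
    even = odd = 0.0
    for dy in range(-half, half + 1):
        y = y0 + dy
        if not 0 <= y < height:
            continue  # rows outside the image contribute zero
        for dx in range(-half, half + 1):
            x = x0 + dx
            if not 0 <= x < width:
                continue
            envelope = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            phase = kx * dx + ky * dy
            value = image[y][x]
            even += value * envelope * (math.cos(phase) - dc)
            odd += value * envelope * math.sin(phase)
    return even, odd


def jet(image, x0, y0):
    """40-dimensional jet: magnitude of the even/odd pair for every size and orientation."""
    return [
        math.hypot(*gabor_pair_response(image, x0, y0, sigma, theta))
        for sigma in RADII
        for theta in ORIENTATIONS
    ]
```

Because the filters are linear and the jet stores magnitudes, multiplying all gray values by a positive constant multiplies every jet entry by the same constant; the normalized scalar product described in the next paragraph removes this factor.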
The test image was labeled automatically using elastic graph matching,12,14 which has been shown to be among the best methods for face recognition12,13 and face finding.15 To classify a face, the bunch graph is matched to the image by optimizing the similarity between the automatically labeled image graph and the bunch graph; this similarity is defined as the node similarity averaged over all 48 nodes. The node similarity is the maximum of all jet similarities at that node, and the similarity between two jets is their normalized scalar product, which makes it independent of contrast changes in the image. At each node, the most similar jet votes for its syndrome (jet voting),16 and a simple majority over all nodes, that is, the syndrome receiving the most votes, determines the final classification. The set of nodes entitled to vote can be restricted (here, from 48 to 32), a modification that improved the results considerably.
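The similarity measures and the voting rule described above can be illustrated with the following sketch. It assumes that the image jets have already been extracted at the node positions found by elastic graph matching (the position search itself is omitted), that a bunch is stored as a list of (syndrome, jet) pairs per node, and that tied vote counts are reported together; all function names are hypothetical.

```python
import math
from collections import Counter


def jet_similarity(a, b):
    """Normalized scalar product of two jets; unchanged if either jet is rescaled."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def node_similarity(image_jet, bunch_at_node):
    """Maximum similarity between the image jet and all jets in the bunch at one node."""
    return max(jet_similarity(image_jet, other) for _, other in bunch_at_node)


def graph_similarity(image_jets, bunch):
    """Node similarity averaged over all nodes (the quantity graph matching maximizes)."""
    return sum(node_similarity(image_jets[n], bunch[n]) for n in image_jets) / len(image_jets)


def jet_vote(image_jets, bunch, voting_nodes=None):
    """At each voting node, the most similar bunch jet votes for its syndrome.

    Returns the syndrome(s) with the most votes, sorted; several on a tie.
    """
    nodes = list(image_jets) if voting_nodes is None else voting_nodes
    votes = Counter()
    for n in nodes:
        image_jet = image_jets[n]
        syndrome, _ = max(bunch[n], key=lambda entry: jet_similarity(image_jet, entry[1]))
        votes[syndrome] += 1
    top = max(votes.values())
    return sorted(s for s, count in votes.items() if count == top)
```

Restricting `voting_nodes` changes only which nodes vote; `graph_similarity` still uses every node, in line with the use of all nodes to locate the face in the image.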

Figure 1

Overview of the patients. Patients 1–12 – Cornelia de Lange syndrome: six females, six males, age ranges from 5 3/12 to 33 8/12 years, mean 14 7/12 years. Patients 13–24 – fragile X syndrome: all males, age ranges from 4 9/12 to 13 10/12 years, mean 9 8/12 years. Patients 25–30 – mucopolysaccharidosis type III: four females, two males, age ranges from 7 to 13 2/12 years, mean 9 11/12 years. Patients 31–42 – Prader–Willi syndrome: seven females, five males, age ranges from 4 11/12 to 20 10/12 years, mean 10 6/12 years. Patients 43–55 – Williams–Beuren syndrome: five females, eight males, age ranges from 1 8/12 to 28 5/12 years, mean 11 8/12 years. *, correct syndrome recognition with inner nodes; ^, the correct and an incorrect diagnosis scored equally well.

Figure 2

Computer-based recognition of dysmorphic faces. (a) Examples of Gabor wavelets of different spatial sizes and orientations, with even and odd symmetry, which are used to analyze the digital photographs. At each point, the results for all sizes and orientations are arranged into a feature vector called a jet. (b) Schematic view of a bunch graph. The graph shown contains nine nodes and 13 edges; each node is labeled with the jets from six persons. During comparison with an image, the jet with the highest similarity is selected independently at each node, indicated here by gray shading. This jet votes for its person's syndrome, and the overall decision is reached by majority. (c) Frontal-view photograph (256 × 256 pixels) of Patient 4 with Cornelia de Lange syndrome, overlaid with the graph including all 48 nodes, each of which refers to a fixed evaluation point. (d) The number at each node position indicates how often a jet at that node contributed to a correct diagnosis. The inner facial nodes are much more important for correct syndrome recognition than the outer nodes or the nodes placed on the hair, but all nodes are necessary to locate the face in the image initially.

To determine the recognition rate of human experts, the photographs were shown to six clinical geneticists with differing levels of experience who attended a dysmorphology workshop in Kiel, Germany, on 19 June 2002. Each photograph was shown for 12 s without disclosure of any clinical data, and each geneticist was asked to assign the patient to one of the five syndromes.

Results and discussion

We took frontal-view 2D digital photographs of 55 patients, each with a well-known genetic syndrome characterized, among other features, by a specific facial phenotype: Cornelia de Lange syndrome (n=12), fragile X syndrome (n=12), mucopolysaccharidosis type III (n=6), Prader–Willi syndrome (n=12), and Williams–Beuren syndrome (n=13) (Figure 1). The photographs were preprocessed with Gabor wavelets to yield robust descriptions of image regions (Figure 2a). The feature vectors (jets) at defined facial nodes were stored in a database called a bunch graph (Figure 2b,c). Using a leave-one-out method, each photograph was then compared with all other faces in the database.
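The leave-one-out protocol amounts to a loop that removes the held-out patient from the reference set before classifying. In this minimal sketch, `classify` is a hypothetical stand-in for bunch graph matching with jet voting, and reference entries are excluded by patient ID, so that no image of the test patient remains in the reference set (as was also done for the alternative photographs). The scalar features and the nearest-neighbour rule in the toy usage are illustrative assumptions only.

```python
from typing import Callable, List, Tuple

# (patient id, syndrome, features); the features are whatever the classifier uses
Entry = Tuple[int, str, object]


def leave_one_patient_out(
    database: List[Entry],
    test_images: List[Entry],
    classify: Callable[[List[Entry], object], str],
) -> List[Tuple[int, str, str]]:
    """Classify each test image against the database entries of all *other* patients."""
    results = []
    for patient_id, syndrome, features in test_images:
        reference = [entry for entry in database if entry[0] != patient_id]
        results.append((patient_id, syndrome, classify(reference, features)))
    return results


def recognition_rate(results: List[Tuple[int, str, str]]) -> float:
    """Fraction of test images whose predicted syndrome equals the true one."""
    return sum(true == predicted for _, true, predicted in results) / len(results)


# Toy usage with hypothetical scalar features and a nearest-neighbour rule
def nearest_neighbour(reference: List[Entry], features: float) -> str:
    return min(reference, key=lambda entry: abs(entry[2] - features))[1]


toy_db = [(1, "CdL", 0.0), (2, "CdL", 0.1), (3, "WBS", 1.0), (4, "WBS", 1.1)]
toy_results = leave_one_patient_out(toy_db, toy_db, nearest_neighbour)
```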

In the first run, using all 48 nodes, syndrome recognition was correct in 32/55 patients (58%). The syndrome-specific recognition rates ranged from 0% (mucopolysaccharidosis type III) to 83% (Cornelia de Lange syndrome); detailed results are given in Figure 3. The overall recognition rate was significantly higher than the 20% expected from random assignment of the patients to the five syndromes. This result suggested that the program has the potential to recognize disease-specific facial patterns.
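The text does not state which test was used for this comparison. One simple check, assuming that each of the 55 classifications is an independent trial with a 1/5 probability of being correct under random assignment, is an exact one-sided binomial test; the sketch below computes its p-value with the standard library only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_patients, n_correct, p_chance = 55, 32, 1 / 5  # five syndromes, 20% by chance
expected_by_chance = n_patients * p_chance        # about 11 correct assignments
p_value = binomial_upper_tail(n_correct, n_patients, p_chance)
print(f"observed {n_correct / n_patients:.0%}, expected {expected_by_chance:.0f}, p = {p_value:.1e}")
```

An exact tail sum is used rather than a normal approximation because the expected count under chance is small.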

Figure 3

Syndrome recognition rates: white bars, results based on all nodes; light gray bars, inner nodes only; dark gray bars, human experts. Numbers at the top of the bars indicate absolute numbers. CdL, Cornelia de Lange syndrome; fraX, fragile X syndrome; MPS, mucopolysaccharidosis type III; PWS, Prader–Willi syndrome; WBS, Williams–Beuren syndrome. all-1, overall analysis; all-2, overall analysis based on correctly identified cases plus cases in which the correct diagnosis and an incorrect diagnosis scored equally well. The mean numbers of patients correctly classified by the six human experts have been rounded to the nearest integer.

To improve the recognition rate, we investigated how often each node contributed to a correct syndrome recognition. As expected, the nodes on the forehead, eyes, nose, mouth, chin, and cheeks were more important for decision-making than the nodes at the border of the face and on the hair (Figure 2d). Consequently, we selected a set of 32 inner facial nodes (all nodes except 14–16, 18–25, and 28–32; see Figure 2c) and reanalyzed the photographs. This time, the syndrome-specific recognition rates ranged from 50% (mucopolysaccharidosis type III) to 83% (Cornelia de Lange syndrome and Prader–Willi syndrome; for detailed results, see Figure 3). Although the recognition of MPS faces improved somewhat, its rate remained the lowest, probably because of the small number of MPS photographs (n=6) in the database. The overall syndrome recognition rate improved to 42/55 (76%). In another four patients (7%), the correct diagnosis and an incorrect diagnosis scored equally well; if these ties are counted as hits, the overall recognition rate was 84%.
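The node selection and the percentages in this paragraph can be reproduced with the sketch below. It assumes that the nodes in Figure 2c are numbered 1 to 48, and it interprets "contributed to a correct diagnosis" as a node whose vote matched the true syndrome in a correctly classified case; `node_contributions` and the per-case vote records it expects are hypothetical.

```python
from collections import Counter

# Assumed numbering 1..48 as in Figure 2c; border and hair nodes are excluded
EXCLUDED_NODES = set(range(14, 17)) | set(range(18, 26)) | set(range(28, 33))
INNER_NODES = [node for node in range(1, 49) if node not in EXCLUDED_NODES]


def node_contributions(cases):
    """Per node, count votes for the true syndrome in correctly classified cases.

    cases: iterable of (true_syndrome, predicted_syndrome, {node: voted_syndrome}).
    """
    counts = Counter()
    for true_syndrome, predicted, node_votes in cases:
        if predicted != true_syndrome:
            continue  # only correct diagnoses are credited
        for node, vote in node_votes.items():
            if vote == true_syndrome:
                counts[node] += 1
    return counts


def percent(hits: int, total: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * hits / total)


correct, ties, total = 42, 4, 55  # inner-node results reported in the text
print(len(INNER_NODES), percent(correct, total), percent(ties, total), percent(correct + ties, total))
```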

To compare the performance of our ‘syndrome classifier’ with that of human experts, we tested the rate of immediate pattern recognition by six clinical geneticists (for details, see Methods). As shown in Figure 3, the experts recognized the correct syndrome in a mean of 34/55 patients (62%; range, 44–69%). We are well aware that this setting does not reflect clinical practice and that the two test methods are not strictly comparable. However, the results indicate that the computer can approach the rate of immediate pattern recognition by humans.

For nine correctly classified patients, we had an alternative photograph, which, however, was of lower quality than the original (Figure 4). Nevertheless, in 7/9 cases, the second photograph was correctly assigned, based on either all or inner nodes, after the original photograph had been removed from the database. This indicates that recognition is reproducible in the same patient.

Figure 4

Alternative photographs: the numbers refer to the patients shown in Figure 1.

The overall recognition rate was remarkably high, especially considering that our set of patients includes both sexes, that their age range is wide (1 8/12 to 33 8/12 years), and that some photographs are suboptimal with regard to pose, facial expression, and lighting. It is plausible that better-standardized photographs, as well as sex- and age-specific databases, will improve the performance of our program. It should be noted that in clinical practice, additional information such as the medical history and clinical findings is used to make a specific diagnosis. We envision that combining our ‘syndrome classifier’ with established clinical databases17,18 will provide a powerful system for syndrome recognition. It is also worth noting that the ‘syndrome classifier’ does not require special equipment: the pictures can be taken with a standard digital camera in diffuse skylight, and the program runs on a personal computer, evaluating one image in under 10 s.

To our knowledge, this is the first successful demonstration that a computer can recognize a syndrome by facial resemblance. Moreover, our results show that certain syndromes do have a specific facial pattern and that these patterns can be compared with mathematical tools. This achievement provides a quantitative basis for analyzing the genetic variation of the facial ‘gestalt’ and opens new possibilities for studying what is probably the most complex human trait: the face.