Can We Define Race?


August 27, 2015

Nicholas Wade has been a staff writer for the Science page of the New York Times and in 2014 published A Troublesome Inheritance: Genes, Race and Human History. Some of the author’s conclusions have raised controversy, prompting two scientists to publish a thorough and detailed rebuttal of what they see as errors in Wade’s attempt to suggest racial differences. Read the full text of the case made by Drs. DeSalle and Tattersall, which they published in Gene Watch in June 2014, reproduced below.

Mr. Murray, You Lose the Bet
Nicholas Wade's newest book, A Troublesome Inheritance, suggests a biological basis for the existence of five distinct human 'races.' Charles Murray's Wall Street Journal review of the book praises Wade for shunning political correctness, but misses an important point: It's all based on some very bad science.

By Rob DeSalle and Ian Tattersall

Nicholas Wade's new book on the biology of human races, A Troublesome Inheritance, has by now been reviewed in many venues. The book has a simple structure. The first part argues that scientific orthodoxy can be stifling, and that in order to break from it there have to be brave purveyors of the truth. The second section argues that there is indeed genetic evidence for the biological basis of race. The third part suggests that, because there are races, we can now pinpoint a reason why different peoples purportedly behave differently. In his Wall Street Journal review of the book, Charles Murray suggests that this last part will be the target of most criticisms, reasoning that:

"The orthodoxy's clerisy will take that route, ransacking these chapters [the final five chapters] for material to accuse Mr. Wade of racism, pseudoscience, reliance on tainted sources, incompetence and evil intent. You can bet on it." (Italics added).

In contrast, our intent here is to examine the science and premises in the first two parts or first five chapters of this book. This is because only if the premises of these chapters have any scientific validity can the third part of the book be taken seriously.

Our reading of the first half of A Troublesome Inheritance indicates that Wade has made at least seven mistakes that are routinely committed when genomics and genetic information are used to examine the biological basis for human races, and are used as a justification for reifying race as a biological reality. We start with a foundational problem that all scientists face:

1. Misunderstanding the nature of hypothesis testing.
This first aspect of the "biology of race" controversy gets at the very core of what science really is, and indeed what the problems really are in understanding human variation. It is commonly accepted that the hypothetico-deductive approach provides the most sound and productive way to conduct science. In contrast, inductive approaches are to be avoided, because induction can only confirm what one already knows. This latter position might at first sound extreme; if you have an approach that actually confirms a scientific phenomenon, why not use it? The answer is simple: Science advances at the cost of hypotheses that are rejected, while inductive processes will always give you a positive answer. Hence, with respect to racial variation in human populations the proper approach is to pose hypotheses, and subsequently test them.

Unfortunately, one of the most common methodologies applied in the analysis of human population genetic information takes an entirely inductive approach. Called STRUCTURE, it throws data at an algorithm and asks: "How many units do I have?" This method is approvingly cited by Wade as the ultimate proof that there are five races of humankind. But while the algorithm itself is an important technical advance, how the results of such analyses bear on definitions of "race" is an entirely separate question because, as we have suggested, STRUCTURE is an inherently inductive approach. And while inductive approaches do a great job of summarizing and displaying information given a specific set of prior knowledge of a system, and in doing so can encourage the formulation of new hypotheses and refinement of existing hypotheses, they cannot be used to test hypotheses.

To make scientific statements about race, then, we need to have hypotheses in hand, arrived at inductively or otherwise. So what useful hypotheses can we offer up with respect to human genetics and the existence of human races? The most obvious hypothesis is:

H0 =  There are n "races" of a type of organism (A) that correspond to the n geographical divisions (often taken to be Africa, Asia and Europe) that we see on the planet today.

But simply posing our hypothesis in these terms brings us to the second problem with using biology to "prove" race:

2. Subjectivity in defining race (or a misunderstanding of what a species is).

How can we test a hypothesis of the kind we have just presented? First of all, we need a definition of "race" that is both objective and operationally testable. Without such a definition we cannot proceed to test the hypothesis. We cannot ask an algorithm to give us an idea of the number of races, because that would be inductive. We do have a good idea of what a species is, but the definitions of the subordinate units of "race" and "subspecies" are substantially less than objective. In fact, we defy any scientist, journalist, philosopher or layperson to define race meaningfully in this biological context, and in such a way that it can be used to test H0 above. And if this can't be done, H0 becomes a useless hypothesis. However if, in contrast, you change the hypothesis to:

H1 = There are n species of a type of organism (A, B and C) corresponding to the geographical divisions (for the sake of argument, Africa, Asia and Europe) that we see on the planet today.

Then we do have a testable hypothesis because we do have an operational definition of species. You might object that this is just semantics. But in fact, objective definitions are hugely important in hypothesis testing. Without objective criteria to test our hypotheses, we simply cannot reject them.

But then you might say that "I will objectively define a race as being differentiated from other closely related entities." This is slightly better, but it is still subjective and untestable because "differentiated" is an extremely vague term. Putting numbers on it does not necessarily help, because if, for example, you refine your definition by saying that "a race is a group of organisms that are 50% divergent from the next most closely related group of things," you still have two problems. The first is that the 50% figure is entirely arbitrary, and others might think your "magic" number is not so magical. Most scientists will agree that genetic or morphological cohesion, or reproductive isolation, lie at the core of what a species is. But there is no consensus as to what degree of divergence is significant as entities go their separate ways in nature. For one group the magic number might be 5%, so that if it achieves over a 5% divergence level the probability of ending up with complete divergence, and hence becoming a new species, is high. But for another group of organisms, the magic number might well be 95%.

The second problem is that, whatever percent divergence you choose, it must mean something biological. The species definition that most taxonomists use (see below) requires 100% divergence in traits. It is either/or, and there is no subjectivity to it. The biological meaning of that 100% is that your entity is no longer meaningfully reproducing or significantly swapping genes with its closest relatives. They are on separate and historically established evolutionary trajectories. Percent divergence might mean something if researchers could pin down a magic threshold, but as we have just pointed out this is a very slippery concept.

Yet this is how Wade described the process of species formation in a recent broadcast interview:
"Since evolution happens all the time, it's a continuous, unstoppable process that as a population splits, the two halves will continue to evolve, but now independently. So, over time they will accumulate differences between each other and eventually they'll become new species."

While we know from experience that radio interviews can be harrowing, and that it is difficult to completely explain things in short sound bites, this description of species formation is pretty close to the portrayal he provides in his book. And what is particularly enlightening is that, directly prior to offering this definition he said:

"… regionality underlines the fact of race because the populations on each continent have been evolving independently since we left our African homeland about 50,000 years ago."

The subjective perception of species, population evolution and regionality expressed here leads to unwarranted conclusions about the existence of any entity below the level of the species Homo sapiens. This appears to reflect a failure on Wade's part to grasp the subtleties of taxonomic science. This misapprehension has led to the third mistake we see in his reasoning:

3. A misunderstanding of the rigors of taxonomic science.
Understanding our origins, and indeed the biology of all organisms on the planet, is really a problem of taxonomy. This vital branch of natural history is sometimes derided as "stamp collecting," but this claim could hardly be farther from the truth. Taxonomy is a well-developed and highly scientific endeavor that has been around in some form ever since humans began to name things. The science of taxonomy combines simple but rigorous hypothesis testing approaches, with objective definitions of species. It is true that taxonomists occasionally use the terms "subspecies" and "race" in their descriptions, but only as conveniences to imply future hypotheses to be tested.

The genomic approach to the existence of races in human beings has usually involved collecting the frequencies of variants at a large number of locations in the human genome, from increasingly large numbers of people. Of course, nobody would put much stock in a test of a hypothesis involving only two individuals from each of the geographic regions suspected of diverging. If one examines too few individuals there is a danger of over-diagnosing the number of entities (i.e. of finding purely random evidence for differentiation). Another caveat is that examining too few populations will also result in over-diagnosis. Consider the following scenario: populations of a cosmopolitan organism are examined for their genetic variability by sequencing the genomes of individuals from Africa and Oceania. Not surprisingly some genetic differences are detected and found to be significant, in that some are unique to the individuals from Africa while others are unique to individuals from Oceania. A big hoopla could be made, and species existence could be claimed, but this would be poor science because the severity of the test is so low as to make the test meaningless. Why? Because the organism might also exist in Europe, the Americas and East Asia. By leaving out the populations "in between" one would miss the connectedness of the two populations initially sequenced. This phenomenon in widely-distributed populations has led many researchers of human genetics to the words of Frank Livingstone: "There are no human races, there are only clines.”
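The sampling problem described above can be made concrete with a toy model. The sketch below is purely illustrative: the linear cline, the frequency values, and the function names are assumptions of ours, not data or methods from any study discussed here. It shows that a perfectly smooth gradient looks like two sharply distinct groups when only its two ends are sampled, and that the apparent break shrinks once the populations "in between" are included.

```python
# Toy model (an assumption for illustration, not data from any study):
# one allele whose frequency rises smoothly along a geographic transect.

def allele_frequency(position):
    """Frequency of a hypothetical allele at a position in [0, 1] on the transect."""
    return 0.1 + 0.8 * position  # a perfectly smooth cline from 0.1 to 0.9

def largest_gap(positions):
    """Largest jump in allele frequency between neighbouring sampled sites."""
    freqs = sorted(allele_frequency(p) for p in positions)
    return max(b - a for a, b in zip(freqs, freqs[1:]))

# Sampling only the two ends of the range.
ends_only = [0.0, 1.0]

# Sampling the same range with eleven evenly spaced sites,
# i.e. with the intermediate populations included.
with_intermediates = [i / 10 for i in range(11)]

gap_ends = largest_gap(ends_only)
gap_full = largest_gap(with_intermediates)

print(f"largest gap, ends only:          {gap_ends:.2f}")
print(f"largest gap, with intermediates: {gap_full:.2f}")
```

Under these assumptions the underlying variation is identical in both cases; only the sampling scheme differs, and with it the impression of discreteness.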

Wade understands this. Here is how he describes a genome-level polymorphism study and how it can be interpreted in a taxonomic context. He first uses a study by Rosenberg et al. (2006) to suggest that there are five clusters of people on the planet. This important study used genomic information (nearly 400 markers) from 1,000 people, and employed the STRUCTURE clustering approach. These 1,000 subjects "clustered naturally into five groups, corresponding to the five continental races." This study was soon criticized by several researchers, who objected that intermediate populations needed to be examined to exclude potential clinal variation. Wade then describes the next study that Rosenberg et al. did, which was to increase the number of markers to nearly 1,000 (REF). Not surprisingly, they obtained the same results. Wade uses this second study to suggest that more data in this case address the "cline" criticism. More data would certainly help – they always do – but the critical addition in this case would not be more genetic markers, but more individuals from different geographic areas. These were not supplied, but Wade nevertheless uses the expanded genomic information (i.e. the doubling of the number of markers) to state categorically that "They found the clusters are real." (Italics added).

More importantly for our argument about taxonomy, Wade goes on to discuss the inclusion of new information (using a newer genetic survey technology than in the Rosenberg et al. study) to address the problem. In this newer study (Jun et al., 2008), 1,000 different individuals were surveyed, but from 51 well-defined geographic areas. And instead of five major groups, the researchers in this study clustered their subjects into seven major groups. What is more, when even more subjects were added to Rosenberg's data set, as was done by Sarah Tishkoff and her colleagues (Tishkoff et al., 2008), 14 clusters were inferred. You might have smelled a rat here. But here is how Wade handles this new information:

"It might be reasonable to elevate the Indian and Middle Eastern groups (the two new ones) to the level of major races, making seven in all. But then many more subpopulations could be declared races, so to keep things simple, the five-race continent based scheme seems the most practical for most purposes." (Chapter 5, p 102)

Any self-respecting taxonomist would avoid the kind of language used by Wade here. It is unscientific and circular. We have heard the argument that just because inferences about the number of races vary, it doesn't mean race doesn't exist. An argument commonly used to shore up this view is that people disagree on the number of shapes, but shapes still exist. But this argument merely trivializes the definitions we use in science generally and taxonomy specifically.

There are 6-7 billion human beings on the planet, and the best test of any hypothesis about human genomes and populations would include them all. Of course, this is not possible at present. But if it were possible, and the clustering were performed as in the two studies we refer to above, we wonder how many groups might fall out. We suspect that, depending on the markers used, it might be as many as the number of nuclear families there are on the planet. Certainly the patterns that would emerge from such a global analysis would not be anywhere near clear with respect to any definition of race that one could come up with. Clearly, clustering is inadequate on its own to address problems like this in taxonomy and systematics. Which brings us to the fourth mistake made by proponents of a biological basis for race.

4. Misunderstanding the meaning of clustering and evolutionary trees.
Wade's "evidence" for the biological basis of races is based purely on clustering. But clustering is only one way genetic data (or any other kind of discrete data) can be analyzed to test hypotheses. Perhaps a better way to do this is to use a branching diagram based on the reconstruction of the evolutionary events that led to the branches. Significantly, Wade does not present this kind of information or analysis in his book, possibly because researchers have for a long time realized that branching diagrams cannot represent the patterns of evolution of individuals that belong to the same species, something that directly reflects the difficulty and artificiality of sorting individuals into "races." Branching diagrams can be very useful when used on single genes, and are extremely informative when used on clonal molecules like the maternally inherited mitochondrial DNA and the paternally inherited Y chromosome. But, to our knowledge, no correctly-conceived attempt to build evolutionary trees with a large number of recombining genetic regions such as those on our autosomes has resulted in a tree with any resolution. The bottom line here, then, is that hierarchical structuring of humans using phylogenetic trees based on the entire genome gives an unrecognizable and unresolved bush. But if that is the case, why do clustering methods appear to recover "structure"? We suggest that part of the reason is the next mistake in our list, the one that is made in doing genetic studies of geographically separated human populations by cherry picking, or the phenomenon we prefer to call the "Stephen Colbert effect.”

5. Cherry picking AIMs: The Stephen Colbert Effect.
Most of the early clustering studies used a number of genetic markers (in the range of 1,000 markers). More modern studies up the ante into the hundreds of thousands of markers. These markers are chosen because they are believed to be informative about the ancestry of people, which is why they are known as "Ancestry Informative Markers," or AIMs. These markers are established using what we like to call the "white swan" principle. People of different geographic origins have their genomes scanned, and when a particular variant appears at high frequency for a geographic location, that variant is said to be a marker for people from the geographic region concerned.

It is safe to say that this procedure introduces a bias into how the data are interpreted. This bias is so extreme that, when Stephen Colbert was presented with a genetic survey of his genome on the PBS show Faces of America, he was told he is 100% Caucasian. Some of the other guests were given similar results: YoYo Ma was told he is 100% Asian. But some individuals were shocked by their results, among them Eva Longoria who was given figures that deviated considerably from her prior view of her ancestry.

So what was going on? Currently, there are nearly 30 million places along the chromosomes of humans (of 3 billion total places) at which we can vary. But between any two randomly-chosen humans there are only about 3 million places at which individual people might have different DNA sequences. So if the typical ancestry study uses 300,000 markers (not too far from the actual number examined by commercial laboratories nowadays), it will only be looking at 10% of the potential differences between any two genomes, or about 0.01% of the entire genome. At best, then, these studies scan less than 1% of a human genome. What about the other 99% or so? Much of this remainder is not variable, but that part of it which is variable is African in origin. This means that 99% of the total variation in any human genome should be considered as African. And what that in turn means is that Stephen Colbert is actually 99% African, and at most 1% Caucasian.
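As a quick check on the arithmetic, the sketch below recomputes the two shares from the round figures quoted above. The constant names are ours and the inputs are the approximations given in the text, not measured values. With those inputs a 300,000-marker panel covers 10% of the sites at which two people typically differ, and about 0.01% of the genome as a whole, which is comfortably under the 1% ceiling.

```python
# Round figures as quoted in the text above (approximations, not measurements).
GENOME_SITES = 3_000_000_000      # total positions in a human genome
PAIRWISE_DIFFERENCES = 3_000_000  # sites at which two random people differ
MARKERS_TYPED = 300_000           # markers in a typical ancestry panel

# Share of the differences between two people that the panel examines.
share_of_differences = MARKERS_TYPED / PAIRWISE_DIFFERENCES

# Share of the whole genome that the panel examines.
share_of_genome = MARKERS_TYPED / GENOME_SITES

print(f"share of pairwise differences: {share_of_differences:.0%}")  # 10%
print(f"share of whole genome:         {share_of_genome:.2%}")       # 0.01%
```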

A common argument used against this observation is known as the "Mount Everest Paradox." The argument goes as follows: The elevation of Mount Everest differs from the surface of the ocean by an incredibly small fraction (about 0.0008) of the Earth’s diameter. But anyone standing at the foot of Mount Everest can tell the difference, and it is huge. Again, this is a trivial and unscientific argument: One could just as easily argue that, to a bacterium, a golf ball looks like Mount Everest (indeed, a 0.0002 percentage diameter-wise). Indeed, any golfer can tell you how hard it is to find a golf ball in the rough. It is not the changes or differences that matter, but rather what the differences mean, and whether or not there is an objective way to interpret them. Some researchers prefer to interpret this information in the context of ancestry, which brings us to the sixth major mistake Wade makes.

6. Conflating racially based genetic differences with explanation of ancestry.
The broader availability of genetic ancestry testing has made it something of the norm amongst people who are interested in their ancestry. But what do ancestry tests tell us? They basically tell us about the chunks of DNA in our genomes and where they might have come from. In this context, as some authors have claimed, ancestry testing has become a proxy for race determination. This is an unfortunate development in the use of genetics and genomics, mostly because our genomes are mosaics of ancestry, even including chunks of DNA that show ancestry with other species. But this ancestry approach is also flawed when it comes to our understanding of race in humans – again, because there are no definitions as to how many of the variants (and even more complicated, which variants) can make a difference between groups of people. Because ancestry can be traced all the way to the related family level, we suggest that the ancestry approach is not informative to the hypotheses we posed in the first part of this piece. Like race, ancestry is clinal with respect to any purported higher level, and ancestry simply connects us with one another. So what, in the end, do genetic ancestry tests tell us? Perhaps a good way to view the whole ancestry business is to use a term recently appearing in the literature to describe ancestry tests from companies: "recreational genomics." Such recreational approaches offer little, if anything, to science. It is arguable whether they even offer anything to those engaged in the recreation.

It is often argued that, in order to study the movements of humans and their evolutionary history, we need to speak about races. But this is entirely false, because we already have an excellent grasp of how humans migrated in the past based on mtDNA and Y chromosomes and the fossil record. We are not impeded at all in these endeavors by the lack of formally defined biological races. This is because we use clinal markers that follow individual haplotypes, and hence no a priori definition of race is needed to interpret the results of such tree-based analyses. It is also argued that ancestry is an important component in medicine, and the jump is made then that race is essential to the health of people. Because we argue that medicine will soon benefit from individualized genomics – and because, as we point out in our book, race and ancestry have been poor tools in medicine – we suggest that there are no coherent nor permanently cogent reasons to consider race in medicine. Perhaps ancestry will be important, but a concept of race in medicine is really barking up the wrong tree.

7. Conflating variation and allele frequency differences with adaptation (and hence elements of the human condition).
Adaptation and allele frequencies are the focus of Wade's last five chapters, and are extensively discussed in our book Race? Debunking a Scientific Myth. Wade's apparent justification for this is that we need to have a notion of races so that we can explain why some of us look different from others. Yet nearly all of the (remarkably few) "adaptations" that can be identified appear to be intensely local in their occurrence – for example, the diverse responses to high-altitude living, and to living under intense solar radiation – and are not at all usefully illuminated by any concept of major "races."

As noted above, Charles Murray placed the bet that attacks on Wade's book would be made on more sociological lines, based on scientists' fear of breaking away from tyrannical orthodoxy. Indeed, Wade addresses this tyranny issue in the first few pages of A Troublesome Inheritance. The fear from his perspective is that unorthodox thinking tends to get stifled by orthodoxy so that progress, both scientific and social, is impeded. We could not disagree more with Murray and Wade on this matter, but have refrained from going anywhere near that kind of argument. To us, the most important thing is that when the science itself is examined, and placed under real scrutiny, the thesis of the book fails miserably and Mr. Murray loses his bet.

We call Wade's insistence that science advances by departure from orthodoxy the Indiana Jones Fallacy. It is especially important to understand this idea's fallacious nature because all of the positive reviews of Wade's book (Murray's included) have harped on the far-reaching importance of Wade's departure from the tyranny of scientific orthodoxy.

As scientists, we recognize how gratifying it would be if every published scientific paper was earth-shaking and unorthodox. If so, scientific progress would be rapid and unlimited! But the sad truth is that much of science is rather boring and procedural – just as rigor demands. Even the hypothesis that there are genetic differences amongst people from different geographic regions – classifiable or not – is really quite mundane, since of course there are differences, as there are in any widespread species. We don't need to spend millions of dollars sequencing genomes to know this. The real questions are whether or not the differences really are significant, and/or interpretable in a rigorous scientific context, and whether the classification of people into races helps us to understand them. In the first case, while there may be minor differences, they do not seem to sort out on larger scales. And in the second case, the answer is a resounding “No!"

This last point may seem at first glance a bit counter-intuitive, because on the street it is often possible to broadly sort a fairly large proportion of your fellow citizens by general geographic origin. And indeed, for almost all of the past 50,000 years or so since Homo sapiens has been widely present throughout the Old World, our hunting-gathering precursors were sparsely spread out across vast landscapes, and constantly buffeted by rapidly-changing climatic and environmental conditions. This provided optimal circumstances for the incorporation of minor genetic novelties into local populations, and explains why, for example, Africans generally tend to resemble each other more closely than they do Eastern Asians or Europeans. But all of us remained members of one single, interbreeding species, and we guarantee that the edges between populations were never sharp. What is more, over the past ten thousand years since the adoption of a more settled way of life, demographic circumstances have changed entirely as populations have mingled on a large scale and often over vast distances. This, above all, is why it is hopeless to look for the boundaries that are necessary if we are to usefully recognize "races." The central tendencies may be there, but the boundaries aren't. Which means that "race" is a totally inadequate way of characterizing, or even of helping us to understand, the glorious variety that is humankind.

Rob DeSalle is a curator at the American Museum of Natural History in the Sackler Institute for Comparative Genomics, a co-director of its molecular laboratories and a member of the Board of Directors of the Council for Responsible Genetics. He has written over 300 peer-reviewed scientific publications and several books.

Ian Tattersall is curator emeritus in the American Museum of Natural History and author of several books, including Paleontology: A Brief History of Life (2010). Tattersall and DeSalle co-authored Race? Debunking a Scientific Myth (2012) and Human Origins: What Bones and Genomes Tell Us about Ourselves (2007).