IQ / Intelligence (with Dr. Steven Piantadosi)

The IQ test is supposed to be a measure of human intelligence. But is it? Today, we’re joined by UC Berkeley professor Dr. Steven Piantadosi to understand why the diversity of human intellect can’t be captured by a single number. We’ll explore the history of IQ, from its innocent beginnings in French schools to its dark role in the eugenics movement. We’ll understand why IQ testing is fallible, thanks to rising scores, the effects of motivation, and cultural bias, and learn about a test in the 1970s that was designed to flip that bias upside down. Go full galaxy brain and tune in to hear why IQ isn’t all it’s cracked up to be.

See Dr. Piantadosi’s website here.

Video version:

The Truth About IQ Testing: Myths & Biases with Dr. Steven Piantadosi | Taboo Science S3 E11

Hear Dr. Robert Williams talk about the BITCH test he designed in the 1970s:

Joe Madison Show. Dr. Robert Williams was on the show to discuss the test he developed in 1972 called the B.I.T.C.H. test. The Black Intelligence Test of Cultural Homogeneity, or BITCH-100, is an intelligence test which was oriented toward the language, attitudes, and life-styles of African Americans.

Ashley: Something I’ve learned by doing this podcast is that when you try to measure the vast diversity of the human experience with a single number, you’re gonna have a bad time.

Ashley: I mean, take something as simple as your credit score. It’s a single number, taken from how much debt you have, how much credit you have, and how often you pay your credit card on time.

Ashley: We use that single number to judge how trustworthy and reliable you are, not only for car loans and mortgages, but even for employment. And yet it hardly captures what it’s supposed to capture. Someone who’s so financially responsible that they’ve paid everything in cash and never even used a credit card would have a lousy credit score. And in the U.S., so would someone who’s paid everything on time their whole life but got a serious medical condition that saddled them with medical bills that they couldn’t pay that went to collections. That one number works most of the time, but when it doesn’t work, it really hurts people.

Ashley: The same is true of BMI, the ratio of weight to height squared that purports to measure how fat you are, and by extension, your overall health.

Ashley: Race is another one. It’s not a number, but it is a single label we give people, and with it we make all sorts of extrapolations about their personality, abilities, income, and criminal background.

Ashley: So when you learn that there’s a single number that can measure all of human intelligence, you should be skeptical.

Ashley: There are 8 billion ways to be human, and a single measurement like IQ is not going to capture all that. Instead, it’s going to cause problems. And today, we’re going to find out why.

Ashley: I’m Ashley Hamer, and this is Taboo Science, the podcast that answers the questions you’re not allowed to ask.

Ashley: I’ve been interested in the shortcomings of IQ for a few years now, and when I set out to find the right researcher for this episode, I did what any journalist does. I turned to Twitter. And boy, did I find what I was looking for.

Steven Piantadosi: I wrote a pretty long Twitter thread kind of outlining some of the problems that I saw with IQ research, and a lot of that came about because part of my lab’s work is doing work with an indigenous group in Bolivia, and we look and see what happens with language and math learning without formal schooling.

Ashley: That author of a 62-tweet thread back in 2020 that starts with, “Here’s why IQ is bullshit”? That’s Dr. Steven Piantadosi.

Steven Piantadosi: I’m a professor at UC Berkeley in psychology and neuroscience, um, and my lab studies language and math learning in kids.

Ashley: The indigenous group in Bolivia Dr. Piantadosi works with is called the Chimane. They are a group of foraging farmers, sometimes called hunter-gatherers, who are incredibly popular with scientists. If you’ve ever seen a headline that says “Hunter-Gatherer Societies Have the Healthiest Hearts” or “Hunter-Gatherer Lifestyle May Be Key to Healthy Brain Aging,” that was about the Chimane.

Ashley: The idea is that their way of life is closer to that of early humans than that of people in industrialized societies. So by studying them, people in industrialized societies can get a better idea of where they came from, and what industrialized society is doing to their health.

Ashley: Likewise, studying children in these societies can tell you what kind of knowledge is fundamental to being human, and what you need school and written language to learn.

Steven Piantadosi: One thing that’s interesting is that if you work with people who’ve never been to school, many of the kind of familiar school tasks or things that are familiar to us are completely unfamiliar to them. And that means that if you give them, for example, an IQ test, they won’t score very well on it.

Steven Piantadosi: So, that made me start thinking about IQ and looking into some of the problems that people have argued are present for IQ research, namely big cultural differences and different kinds of expectations and motivations and things that people bring into an IQ test. And that’s what led to the Twitter thread.

Ashley: Like so many of these numbers we use to describe humanity, IQ started innocently enough. Around the turn of the 20th century, French psychologists Alfred Binet and Theodore Simon came up with the Binet-Simon test, which was designed to identify schoolchildren whose mental abilities were developing more slowly than average.

Ashley: This is where we get the concept of someone’s mental age. If your score on the test said your mental age was younger than your chronological age, you needed extra help in school. And if it was older, you were gifted.

Ashley: Binet, for his part, did not believe that intelligence was one thing that you got at birth. He believed that there were many ways to be intelligent, and that children could learn to become more intelligent.

Ashley: But then America got a hold of the test and gave it the old red, white, and blue. In 1916, Lewis M. Terman published the first Stanford-Binet test of intelligence, which was an expanded version of the Binet-Simon test.

Ashley: It was the most popular test in the U.S. for decades, and it’s where we get the typical IQ numbers that too many of my Tinder dates have thrown around. You get it by dividing a person’s mental age by their chronological age, then multiplying by 100.

Ashley: By that measure, 100 is average. It’s what you get if your mental age matches your chronological age. And anything higher or lower is less common, dropping off toward the extremes in a bell curve.
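
For anyone who wants to see that arithmetic spelled out, here is a minimal sketch of the ratio formula described above. The function name `ratio_iq` and the sample ages are made up for illustration; they don’t come from the test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described in the episode: mental age divided by
    chronological age, multiplied by 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# Hypothetical 10-year-olds with different mental ages on the test.
for mental_age in (8, 10, 12):
    print(mental_age, ratio_iq(mental_age, 10))
```

A mental age that matches the chronological age gives exactly 100, a higher mental age gives a score above 100, and a lower one gives a score below 100.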

Ashley: In the original test manual, Terman laid out the benefits and uses of the new test. Next to simple things like identifying gifted students and determining vocational fitness, there was the line, quote, “This will ultimately result in curtailing the reproduction of feeble-mindedness and in the elimination of an enormous amount of crime, pauperism, and industrial inefficiency.” End quote. If you were playing the eugenics drinking game, it’s time to take a shot.

Ashley: Another important figure in what we now know as IQ was British psychologist Charles Spearman. He observed that children’s grades in unrelated subjects tend to be correlated.

Steven Piantadosi: So for example, people who tended to get good grades in math also tended to get good grades in English. Okay. And you could think about that and think like, that’s not necessarily how it has to be, right? Like maybe you have the intuition that it could be the other way around, that if you get good grades in math, you’re less good in English, right?

Steven Piantadosi: But that’s not kind of empirically or quantitatively how it turns out. People’s objective scores on different things tend to correlate positively.

Ashley: Spearman proposed that all mental traits were related to a single common factor, which he called G for general intelligence. Pretty much every intelligence test since has been designed to correlate as much as possible with this G factor.

Steven Piantadosi: For example, if somebody takes the SATs, that number will be pretty highly correlated with G as you would have measured it by looking at their grades across a bunch of different topics.

Steven Piantadosi: So, what kind of intelligence testing has tried to do over the years is find tests which are highly G loaded, meaning some test I can give you, hopefully something that’s kind of short and simple, which, when I get your number, it’s correlated with your G, your general intelligence factor. And therefore also would do a good job at predicting, say, your grades across a bunch of different topics or other things that intelligence tests are supposed to correlate with.
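
To make the idea of a “G loaded” test concrete, here is a rough sketch using simulated grades. Everything in it is an assumption for illustration: the subject names, the sample size, and especially the `shared` term, which simply stands in for whatever the subjects have in common. It also uses the leading eigenvector of the correlation matrix (found by power iteration) as a simple stand-in for factor analysis, not the exact procedure any real test uses.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def first_factor_loadings(corr, iterations=200):
    """Leading eigenvector of a correlation matrix via power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


random.seed(0)
subjects = ["math", "english", "history", "science"]  # hypothetical subjects
scores = {s: [] for s in subjects}
for _ in range(500):  # 500 simulated students
    shared = random.gauss(0, 1)  # unnamed common influence (an assumption)
    for s in subjects:
        scores[s].append(shared + random.gauss(0, 1))  # plus subject-specific noise

corr = [[pearson(scores[a], scores[b]) for b in subjects] for a in subjects]
loadings = first_factor_loadings(corr)

for name, loading in zip(subjects, loadings):
    print(f"{name}: loading on first factor = {loading:.2f}")
```

In this simulation every pair of subjects comes out positively correlated and every subject loads positively on the first factor, but nothing in the math says what `shared` actually is.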

Ashley: The problem is that nobody has ever definitively identified what G even is.

Steven Piantadosi: It’s a real kind of magician’s trick to call it general intelligence, right? When nobody actually knows if it’s general intelligence, as opposed to, for example, motivation or experience or any of these other kinds of cultural factors that matter.

Steven Piantadosi: One thing that happens in psychology and in some of the kind of culture wars around this is people sometimes say, well, intelligence is the most statistically robust area of psychology, right? It’s the thing that is most replicable. And there’s tons of studies, and it’s true that there’s tons of studies that robustly find a general intelligence score, right? And once you find it, it’s true that there’s high replicability in terms of what tasks are highly G loaded, for example.

Steven Piantadosi: So on kind of a raw statistical level, it’s true that intelligence is very well justified. Where it’s very poorly justified is on the interpretation. So, when I say that, you know, there’s other things that could determine G or there’s other kinds of confounding factors, those things have not been well examined by intelligence research and ruled out.

Steven Piantadosi: Um, in fact, kind of the opposite, right? There have been these people who have manipulated those factors and shown that those really affect how you do on intelligence tests. And therefore, it’s really not good to call them intelligence tests, right? That’s kind of the trick is in calling them intelligence tests. And once you say that, it sort of feels like they’re measuring intelligence.

Ashley: If you don’t know exactly what the test measures, and you don’t even have a scientific definition of general intelligence — what it is, whether it’s inherited, whether you can change it — that leaves a ton of wiggle room for people to make their own interpretations to further their own goals. A score that a school administrator might say is reason to give a student extra guidance could be the same score that makes a dictator call for their extermination.

Ashley: I’m not exaggerating. Nazi Germany systematically wiped out people with disabilities, and that included intellectual disabilities. They often used IQ testing to determine who was unfit to live.

Ashley: But the U.S. did some pretty horrific stuff around IQ, too. During the 20th century, more than 60,000 people were sterilized in 32 states, based on the idea that preventing the, quote, feeble-minded and other, quote, degenerate stock from having kids would reduce crime, cut healthcare costs, and generally improve society.

Ashley: Some of this continued as recently as 2010, by the way. The eugenics drinking game people are out of whiskey at this point.

Ashley: Throughout all of this, the types of people who got low scores on the IQ test fit a pattern. Recent immigrants, people of color, people in poverty. While some take that as evidence that these groups are just less intelligent, researchers throughout the decades have found that there are other elements at play.

Steven Piantadosi: This is a point actually that has been made by a number of sociologists over the years, that the kind of content of IQ tests is inherently biased towards white and dominant cultural groups. And they make this point by coming up with other versions of IQ tests, right?

Steven Piantadosi: If you ask different kinds of questions, which emphasize, say, black culture, then black kids will do better than white kids on those tests. And so the fact that you can construct tests like that, I think really emphasizes the fact that the tests are constructed and they’re not constructed in a vacuum, right?

Steven Piantadosi: They’re, they’re constructed in a way which happens to preferentially treat white kids, for example, or rich kids, or kids that can afford the kinds of tutoring and practice and schools which lead to high performance on the tests. So, I think that the main mythology around IQ tests is that somehow because it’s a test, it’s objective, right?

Steven Piantadosi: And that I think is the main thing that’s wrong, right? It’s true it’s objective in the sense that you can take it and get a number out, but somebody had to construct that test. And if you construct it in a different way, you’ll end up biasing in favor of some other racial or ethnic or socioeconomic group.

Ashley: One example of a test constructed to be biased in favor of another racial group was developed in the 1970s by Robert Williams, a psychologist and professor at Washington University in St. Louis. It was called the Black Intelligence Test of Cultural Homogeneity. Yes, that forms the acronym BITCH.

Ashley: From what I understand, it looks like it was also sometimes called the Black Intelligence Test Counterbalanced for Honkies. Which honestly makes a lot more sense than cultural homogeneity. I mean, it was counterbalanced for honkies, so I don’t know.

Ashley: Dr. Williams had done a lot of research about the bias of standardized IQ tests. One question, for example, had the child point to a squirrel that is beginning to climb the tree. His Black subjects did poorly on this question. But when he changed the question to say, point to the squirrel that is fixing to climb the tree, suddenly Black children did better than White children.

Ashley: So the Black Intelligence Test of Cultural Homogeneity flipped that bias and took it to its extremes. This test listed 100 words from the Dictionary of Afro-American Slang and asked test takers to identify their meaning. Things like black draft and apple alley, along with words that exist in white culture but had different meanings in 1970s black culture, like “clean” and “Mother’s Day.” Those mean well dressed and the day the welfare checks come in, respectively.

Ashley: Unsurprisingly, white test takers do much worse on this test than black test takers.

Steven Piantadosi: There’s quite a few historical examples of racial differences and people reading into those as being true differences in intelligence as opposed to cultural differences, right? Differences in schooling, for example, or opportunity, right? These other kinds of things that we know influence intelligence test performance, but which we really shouldn’t call intelligence.

Steven Piantadosi: This is true in the group I work with. When people give them intelligence tests, they test close to the threshold for intellectual disability. So down in the seventies or eighties, right? And, I think anyone who works with them, like, nobody who works with them would think that they are intellectually disabled, right? The issue is that they’re not used to taking tests, right? Because if you’ve never been to school and somebody brings you in and starts showing you, you know, geometric shapes and numbers and asking you to find patterns and things, those are completely unfamiliar tasks that they’ve never done before.

Steven Piantadosi: And so, of course, they don’t score well. And so you find those kinds of cultural differences without the difference being actually reflective of a difference in intelligence, right? It’s a difference in culture or practice or something like that.

Ashley: One difference might be motivation. Turns out you can actually improve people’s scores by offering them money. A 2011 meta-analysis led by Angela Duckworth found that a $10 reward can increase your IQ score by as much as 20 points.

Ashley: You can imagine how this might play out in a classroom. A kid who knows their family can afford college and as long as they get good test scores they can go to any school they want is probably going to try a lot harder than a kid who knows that no matter what score they get, the needs of their family mean that college is off the table.

Ashley: And if IQ tests are really measuring a person’s general intelligence, average scores over the decades shouldn’t be changing that much. We haven’t had enough time to evolve better brains, you know? And yet.

Steven Piantadosi: If you look at IQ tests over the years, like over decades of time, in general, scores are going up on them. And this has been a bit of a mystery in intelligence testing. You know, if you thought intelligence testing was a measure of true intelligence, that people are getting, you know, smarter over time, that I think goes against some dominant cultural narratives, right?

Ashley: This phenomenon is known as the Flynn effect, discovered by researcher James Flynn in 1984. He found that over 46 years, representative samples of Americans’ scores on IQ tests rose by about 14 points. Which is weird, right? I mean, name the greatest geniuses in history: Newton, Galileo, Edison, Einstein.

Ashley: They all lived a long time ago. But if you believe that IQ measures true intelligence, you’d have to also believe that your average Joe today is way smarter than any one of them. So what’s the deal?

Steven Piantadosi: But I think it’s maybe interesting for people who care about the mechanisms of what’s going on cognitively, right? Because you want to know what it is that’s changing, right? Is it something about education that’s changing? Is it something about test prep, right? Maybe we’re testing kids more or something and they’re, they’re getting better at it. I’ve heard theories that our educational systems are emphasizing abstraction more. I think this is what Flynn himself thought, that we taught abstraction from younger ages and tried to encourage abstraction and thinking about problems abstractly, and that was something that maybe wasn’t as true in, for example, our grandparents’ generation.

Steven Piantadosi: And many of the questions you get in an IQ test involve, you know, recognizing abstract patterns among shapes or numbers or something like that, right? And so if we’re reinforcing that kind of abstraction early on and more in school, then maybe you would tend to score better on those tests.

Ashley: Yeah. Is there any evidence that we actually are getting smarter? Like maybe it’s nutrition or maybe it’s like something else that isn’t about learning how to take the tests?

Steven Piantadosi: I mean, people take the Flynn effect as evidence of that. I don’t know, so, I think it’s a very hard kind of question to answer because you need some objective way of measuring how smart people are. And I think that that pretty much doesn’t exist.

[music break]

Ashley: If you didn’t call it intelligence, what would you call it?

Steven Piantadosi: I don’t know. So that’s actually something I’ve thought about a little bit. Like, what is the right kind of name for these things? I think that often they have kind of clinical names, and that’s fine.

Steven Piantadosi: So for instance, there’s a test called Raven’s Test, named after Raven, who was the psychologist who developed it. And I think it’s perfectly fine to talk about what somebody’s Raven score would be, right? That’s often how we might talk about it in the lab, right? You know, you’re looking at something and you say, what was their Raven score? And that doesn’t bring any kind of inherent judgment that Raven’s is the measure of intelligence. Right?

Steven Piantadosi: And I think that that’s probably good, right? It’d be good to, to kind of remove those loaded terms from these tests.

Ashley: But Dr. Piantadosi doesn’t believe that intelligence testing is totally useless. It does have its place.

Steven Piantadosi: There’s certain settings where I think this kind of testing is actually useful, right? So, so people use it in a clinical setting, for example, and oftentimes in a clinical setting, you might be comparing, you know, within a patient, right? So, you know, patient at one time point to another time point, or, within patients maybe that, that are very tightly normed in terms of experience or age or other kinds of things like that, education level. And there, like, you often need a quantitative cutoff, right? So you might need a quantitative cutoff to decide what kind of treatment you should do with this person or if they are okay and you can continue watching them or, or whatever, right?

Steven Piantadosi: So, I think that there’s lots of areas where we need some quantitative number, but the problem is when you convince yourself that that number is the objective thing, right?

Ashley: I don’t know about you, but I don’t see people abandoning the IQ test anytime soon. It’s too embedded in our culture at this point. So if people are using it, for entrance into the military, or for school placement, or a medical diagnosis, what should they be looking out for?

Steven Piantadosi: If you use them for something, you have to be very aware of these kinds of biases, in particular, the racial biases in the U.S. That awareness, I think, means not taking the numbers, you know, very seriously, right? So they might be kind of useful as a guide or something.

Steven Piantadosi: But if you’re looking at a student, for example, and trying to decide what level to put them in, you can’t just use the number because the number is, is biased. Right? And so you might look at other kinds of information, like, you know, what their background was or what their family background was, and try to evaluate their performance relative to their opportunities or to their training or to the motivation that they have. All of those things are relevant, and it means that the decision isn’t a simple one.

Steven Piantadosi: So I think that the people in charge of these decisions probably won’t like that advice, right? It’s, you can’t just, you know, draw a line on a spreadsheet and say everybody above this is above the line. It’s really a complicated, kind of context-dependent decision that you have to make. And that really comes from just not taking them very seriously, because they’re not measuring what they’re supposed to or what they claim to.

Ashley: Right. Yeah. So just, you’re just making things more complicated, but that is what things are in reality. So people just need to deal with it.

Steven Piantadosi: Yeah, exactly. Yeah. I think it’s almost an effort against the complicated nature of reality. And so if you’re, if you’re denying that, then you’re gonna end up making worse decisions.

Ashley: That’s it. If you’re fighting against the complicated nature of reality, you’re going to make worse decisions. That’s race. That’s BMI. That’s all sorts of medical decisions. I wanted to hear what Dr. Piantadosi thought of this big picture view I was taking.

Ashley: It seems like every time scientists try to come up with a way to simplify human diversity, um, it causes problems.

Ashley: Like, even though it’s like, we’re, it, it, it’s understandable they’re trying to do that because things are complex and you, you wanna be able to, to understand the world better. And, and, and organizing things, uh, helps you do that. But it, it seems like that’s always, it always seems to go wrong. Could, could you talk about that a little bit?

Steven Piantadosi: That’s a, a nice observation. You know, I, I think that there’s something maybe even inherent to doing science, which is trying to find generalizations, right? And those generalizations are really necessarily simplifications of what we see happening in the world. And you can think about even like, textbook kind of scientific laws or, or results, you know, think about like Newton’s Law of Gravity or something, or if you, you know, remember in physics class computing, you know, the trajectory of a ball that you throw or something like that, right? Like all of those are simplifications because you, you ignore things like friction or you ignore wind resistance or, or whatever, and those physical laws are, are useful, even though they’re simplifications, right? Because those simplifications allow you to solve the problem and get you kind of a good enough answer or a good enough approximation in, in those situations.

Steven Piantadosi: And I agree with your observation. I mean, it seems like many of our simplifications of human nature aren’t like that. And it could be that human nature is just really complicated, right? And I think there’s other systems which are kind of intrinsically or inherently complicated, like the stock market, for example, right? That, you know, there may not be a simple kind of law that you could write down for the stock market that describes its behavior in the same way that you have for, uh, for Newton’s Laws.

Steven Piantadosi: And I’m not sure why this happens with theories of human psychology or people. Part of it might be that it’s a little bit hard to appreciate all of the complexity that there is within people, right? So, where we grow up in some community and we’re used to people who have similar experiences to us, and that might make us think that our experience is universally shared among other people. But when you go to a completely different culture, right, a completely, even industrialized culture, but, but also, you know, a non industrialized group that lives in the Amazon, for example, right? Like, you start to see that things are just really, really different, and they approach problems and have a very different set of kind of cognitive skills and tools that they need for their lives. And it’s different than what you need for your life. But it makes it clear that people are just very flexible and very adaptable across different environments and different settings.

Steven Piantadosi: There’s all kinds of sophisticated and really beautiful ways that people have of thinking about the world and thinking about different problems. And those vary among different human groups and among cultural groups and among different languages, and none of that kind of interestingness is really captured by detecting shape patterns or, or something, right?

Steven Piantadosi: So, for people like me who are, who are interested in the real kind of mental mechanisms of how our brains work, right? This one number is just a kind of hopeless attempt, right, at characterizing that. And it’s almost completely irrelevant to all of the interesting kinds of processes that actually happen right inside people’s brains as they’re thinking about things.

Ashley: Thanks for listening. Big, big thanks to Steve Piantadosi. You can find more of his research at his website, which is linked in the show notes.

Ashley: Taboo Science is written and produced by me, Ashley Hamer. The theme was by Danny Lopatka of DLC Music. Episode music is from Epidemic Sound.

Ashley: There’s a referral link in the show notes if you want to use it for your own stuff.

Ashley: Hey, did you know September is International Podcast Month? I think you should celebrate by leaving a nice review for a podcast you love. The link to leave a review on Apple Podcasts for this podcast is at the bottom of the show description. Just saying.

Ashley: Anyway, we’ve just got a few episodes left in the season, and I have big, big plans for the next one. Hope you tune in! I won’t tell anyone.