Colleges shouldn’t use standardized admissions tests alone to measure scientific thinking skills (opinion)
Alice had sky-high GRE scores and terrific college grades, but as a graduate student in psychological science, she was lacking in one critical skill set: creativity. In her first year, she had performed just as her GREs predicted — she was a classroom star. But once she had to do a first research project, she had great difficulty in coming up with ideas that were original and useful, the hallmarks of creative thinking. As her adviser, I eventually determined that Alice’s problem was not that she could not be creative, but rather that after years of being rewarded for being a superstar analytical thinker, she had deeply suppressed any creative urges she might have had. Now she could not find them anymore.
That was back in the 1980s. I concluded that GREs and college grades did not tell us much about a student’s STEM thinking skills. By the 1990s, Wendy Williams and I decided to do a study on the predictive validity of the Graduate Record Examination for graduate study at Yale University in psychological science. We found that, beyond first-year grades, the GRE generally was a poor predictor of professors’ ratings of students’ research-based, teaching-based, analytical, creative and practical skills, as well as of professors’ ratings of doctoral dissertations.
Trouble was, we did not have a better test. Meanwhile, colleagues and I set out to study undergraduate admissions. By the 2000s, we found, in studies across the United States and a wide range of colleges and universities, that it was possible to increase prediction of undergraduate GPA if we tested not only the analytical skills measured by the SAT and ACT but also practical and especially creative skills. At the same time, it was possible to substantially reduce differences across ethnic groups in test scores, so that one could increase prediction while also increasing diversity. These findings applied not just to tests measuring general academic skills but also to tests of specific ones. Augmenting College Board Advanced Placement tests by including creative and practical items also decreased differences across ethnic groups.
Those empirical findings, taken together, suggested to my colleagues and me that, whatever we were doing in college and university admissions, something was not quite right. Now I returned to the original problem: the question of who would succeed not only in psychological science but also in advanced STEM education more generally.
In a series of studies, we hypothesized that whatever college and university admissions tests measure, it is peripheral rather than central to success in STEM education and later research (as well as teaching). So we designed a series of assessments that would measure STEM reasoning in particular. The first assessments included measures of skills in generating alternative hypotheses, generating experiments and drawing conclusions from empirical data. These skills seemed to us to be at the heart of scientific thinking.
We presented students at Cornell University with test items directly measuring those scientific thinking skills in the domain of psychological science. We also presented tests of general academic thinking skills: inductive reasoning (number series and classification of letter sets) of the kinds found on conventional intelligence tests. We further asked the students for self-reports of their SAT scores.
The results suggested that, whatever it is that conventional standardized tests directly measure, it is not scientific thinking skills. In particular, we found that, statistically, the tests of scientific reasoning tended to cluster together into one factor and the tests of general academic thinking skills tended to cluster into another factor. This is not to say that skills measured by conventional admissions tests are irrelevant to STEM success; they just do not appear to be central to it. Relying on them in isolation in admissions can, in fact, be STEM malpractice.
In further research, we sought to replicate these findings and also extend them to another domain of thinking important to STEM careers: teaching. In this work, we had students engage not only in the previous assessments but also in a new one in which they were presented with recorded scenarios of two professors teaching lessons in psychological science. Both professors purposely introduced flaws into their teaching, for example, being disorganized, answering questions poorly or even sarcastically, appearing not to know their material well, and so forth. Student participants were asked to view the teaching and to analyze the flaws in the professors’ teaching. We found that students’ skill in spotting flaws in science teaching clustered with the scientific thinking assessments rather than with the assessments of general academic thinking skills, such as number series and letter-set classifications. STEM research and STEM teaching skills, therefore, are nonidentical but closely related.
But what about other aspects of STEM thinking outside of psychological science? My colleagues and I did a further study in which we assessed the same scientific thinking skills but across a variety of STEM areas, not just psychological science. The results from the earlier studies replicated. It did not matter whether we used scientific thinking items from one STEM area or another: the scientific thinking items clustered together, as did the general academic thinking skills items.
We were still left with another question. In our assessments, students gave free responses to test items. They wrote down their hypotheses, proposed experiments and analyzed the conclusions to be drawn. What would happen if we instead made these items multiple choice so that they more closely corresponded to the kinds of items used to measure general academic thinking skills? On the one hand, using multiple choice, it seemed to us, would decrease the content validity of the items because, in STEM research and teaching, problems are not presented in multiple-choice format. Scientists, for example, have to figure out their own alternative hypotheses to explain their results rather than selecting from among multiple-choice options created by unknown test constructors. On the other hand, it seemed to us that introducing a multiple-choice format might increase correlations with the conventional tests of general academic thinking skills. And this is exactly what we found. By mimicking the multiple-choice format, we increased correlations with conventional standardized multiple-choice tests.
What can we conclude from this series of studies? We can conclude what most of us, I suspect, already know: the standardized tests currently used in the United States and elsewhere for admission to STEM (and other) programs are remarkably incomplete in what they measure. Without STEM-relevant supplementation, using them in isolation can lead to a generation of scientists, like Alice, who are much more comfortable critiquing others’ ideas than coming up with their own creative ideas. The conventional tests do not measure creative or practical skills; they do not even directly measure scientific reasoning. They are, for many students, somewhat useful measures of what is sometimes called general mental ability (GMA), but not of many of the skills that will matter most, whether the students go into STEM fields or other fields.
The world is facing enormous problems. Many leaders who went through an educational funnel shaped by standardized tests are failing us. We can — and given the severity of our problems, we must — do better.
Robert J. Sternberg is professor of human development at Cornell University and honorary professor of psychology at the University of Heidelberg, Germany.