Where this op-ed falls short

In a May 2026 New York Times op-ed, economist Emily Oster argues that recent evidence shows school phone bans have had "very minimal impact" on student behavior and academics, but that schools should keep them anyway. Her central evidence is a new National Bureau of Economic Research working paper by Allcott, Baron, Dee, Duckworth, Gentzkow and Jacob (2026), which evaluated schools that adopted Yondr lockable phone pouches. The op-ed also critiques two other lines of phone-effects research it finds methodologically weaker, and grounds part of its case in Oster's own classroom experience.

Oster is not wrong to say the new study tempers expectations for phone bans. The problem is what the op-ed leaves out, mischaracterizes, or treats with one-sided skepticism. Six patterns of incomplete reporting, weak inference, and rhetorical framing run through the piece.

1 The op-ed selectively omits an adverse finding from the central study. Oster summarizes Allcott et al. (2026) as showing that "phone usage went down, and teachers liked the policy (all good)." The paper also reports a 0.2 standard deviation decline in student-reported well-being in year one before rebounding to +0.16 SD by year two — a finding directly relevant to Oster's later call for "richer outcomes" to be measured. The year-one decline does not appear in the op-ed.
2 The op-ed mischaracterizes a companion Florida paper. Oster cites Figlio and Özek (2025), a separate NBER paper studying Florida's 2023 statewide phone ban, as having "found similarly small effects on test scores." Their estimates are modest in magnitude but positive and statistically significant: 0.6–1.1 percentile points overall, with larger gains for male and middle/high-school students. The same paper also reports a 30% increase in in-school suspensions among Black students. Calling the Florida findings "similarly small" leaves out direction, significance, and the disparate racial impact.
3 The op-ed omits causal and quasi-experimental evidence that complicates the "minimal impact" framing. Quasi-experimental work in Norway by Abrahamsson (2024; forthcoming 2026) finds that smartphone bans in middle schools improved girls' GPA by 0.08 SD and their externally graded math exam scores by 0.22 SD, reduced specialist mental-health consultations by approximately 60%, and reduced bullying for both sexes. Beland and Murphy (2016) find UK phone bans raised low-achievers' test scores by about 0.14 SD. None of this evidence appears in the op-ed.
4 The op-ed moves the goalposts. Much of the public case for phone bans has emphasized test scores, attention, bullying, and school climate. When evidence shows null or negative effects on those measures, Oster's response is that "we need a richer approach to what counts as a positive outcome." If the original criteria fail, the criteria are redefined rather than the policy reassessed.
5 The op-ed applies asymmetric epistemic standards. Oster correctly critiques two correlational studies on phones and academic outcomes for confounding and reverse causation. She does not apply the same skepticism to the conclusion that bans should remain in place. The same standard should cut both ways. If correlational data is insufficient to establish that phones harm students, the omitted causal and quasi-experimental evidence on bans (point 3) becomes central to the policy question.
6 The op-ed conflates a college classroom with K–12 policy. Oster's "I do not allow my students to have phones or laptops in my classroom" anecdote comes from her role as a professor at Brown University, not from a K–12 setting. The op-ed is about middle and high school phone-ban policy. Different populations, different developmental stages, different evidence base — the anecdote does emotional work the comparison cannot support.

The annotated text

It's Still Demoralizing to Teach a Classroom of Scrolling Students

The op-ed is reproduced in full below for purposes of criticism and commentary. Highlighted passages have corresponding margin notes; unhighlighted text is included for context.

In the past several years, about three dozen states have instituted phone bans in schools, and more are likely to follow. These bans have been trumpeted as game changers. Anecdotal reporting points to more books being checked out from school libraries and more students engaging with one another in the hallway. "How the Phone Ban Saved High School," reads one headline. At the same time, respected academics have suggested that the arrival of phones in schools is linked to large test score declines in countries around the world.

It was, therefore, surprising to many people when a new paper this week showed that phone bans had a very minimal impact on student behavior and academics in a nationwide sample of schools. Phone usage went down, and teachers liked the policy (all good), but test scores didn't change much, disciplinary infractions increased in the short term and there was no demonstrable effect on bullying or student attention. Basically, not much changed.

This finding should not have been as surprising as it was. Based on what we know about phones and education, it is not realistic to expect phone bans to have enormous impacts on academic outcomes. But that doesn't mean that they are a bad idea, or that they should be walked back. Instead, we need to approach this topic with more realistic expectations, a richer approach to what counts as a positive outcome and more help for families and schools.

The expectations for phone bans were poorly calibrated, largely because the data on which some of the more extreme claims about phones is based is subject to considerable biases. For example, a paper published last fall argued that increases in phone usage were tied to large reductions in test scores in many countries between 2012 and 2022. The study found bigger drops in test scores in countries with greater smartphone adoption. But it turns out that those were also the countries that had longer school closures during the Covid-19 pandemic. Phones may have played a role in driving test scores down, but since we know school closures mattered for academic progress, too, the emphasis on phones overstates their role.

There is also plenty of data showing that children who spend more time on social media do worse in school, but they tend to come from households with fewer resources. It may also be that problems in school are contributing to social media use, rather than the other way around. Finally, given that a lot of phone usage is outside of school, it's unclear if these results would really apply to phone bans in school.

The new paper out this week takes a better approach, looking at how test scores and behavior varied over time as schools restricted phone use by introducing Yondr pouches that lock away phones during the day. An earlier paper, which looked at variation across school districts in Florida as some introduced phone bans earlier than others, found similarly small effects on test scores. These are the studies we should be focusing on.

Over the next several years, we will get more data exploring these questions. I expect a cottage industry of papers on school phone bans — and we'll probably also start to see results from school districts that change technology in other ways (for example, taking computers out of early childhood classrooms). We should expect to see similar results.

It would be a mistake to interpret these findings as a sign that we should forget about phone bans altogether. There are no magic bullets in education. Improving student learning is a game of inches, not miles. There is no clear positive reason for students to have phones in the classroom. No phones should be the default, and to introduce phones, we'd want to see evidence that they meaningfully improve learning or help in another way. None of that appears in the data. On the flip side, I think the knee-jerk reaction to also remove all computers and tech is an overstatement and unrealistic.

Instead, we need to alter our expectations. Phone bans may be helpful in some ways, but they aren't a cure-all, and that shouldn't be the bar for success.

Second, we have to get better data. Test scores are easy to measure, but a lot of the discussion around phone bans focuses on the experiences of students, how they interact with one another and whether the classroom feels engaging to both students and teachers. We should be measuring those outcomes systematically. I do not allow my students to have phones or laptops in my classroom, because screens affect their participation and, quite honestly, it's demoralizing to look out at a classroom of kids scrolling on their phones. I'm guessing other teachers feel similarly; we should figure out how to measure and evaluate this, too.

Finally, we need to find a more helpful approach for schools and parents to manage technology. We've sent parents and schools messages that are simultaneously fear-inducing ("phones are ruining your children") and overly optimistic ("phone bans will make it better"). Neither of these is true, and it's time to move to something that promises less but delivers more.

For schools, that may mean keeping phone bans and making additional changes, like modifying laptop use in some classrooms, while recognizing that technology is part of modern life and not the enemy. It could also mean focusing on resources and instructional support that will actually move the needle on test scores.

On the parental side, we need fewer blanket warnings about the dangers of technology and more help drawing appropriate boundaries for our kids. Teenagers absolutely need rules and restrictions on their phone use, and they need their parents to set those — and parents need help doing that. Phone bans promised an easy fix, but they aren't magic. The faster we realize that, the faster we can make realistic progress.

Overall framing oversimplifies a heterogeneous picture. "Very minimal impact" is a reasonable summary of the pooled test-score average, but it skips the rest of the paper. The findings are mixed: a year-one decline in student-reported well-being (rebounding by year two), a ~16% suspension surge in year one, opposite-signed but small test-score effects for high schools and middle schools, and a negative classroom-attention estimate the authors flag as caveat-laden. Subsequent notes (n2–n6) discuss how the op-ed handles each; this note flags the overall pitch.
Allcott, Baron, Dee, Duckworth, Gentzkow & Jacob (2026), Tables 3–7. Well-being: Year 1 estimate −0.20 SD, Year 2 estimate +0.16 SD. High-school Math: small positive (~0.024 SD). Middle-school test scores: small negative (roughly half as large). Classroom attention: negative and statistically significant in Year 2 with a pre-trend caveat. Suspensions: roughly 0.03 student-level SD increase in Year 1, equivalent to a ~16% increase in suspension rates.
Allcott et al. (2026), NBER Working Paper No. 35132
"All good" omits the well-being decline. Student-reported subjective well-being declined by approximately 0.2 standard deviations in the first year of phone-ban implementation before rebounding in year two. Calling this "all good" describes only the year-two recovery and only the teacher response; it omits the year-one student experience entirely. The framing matters because Oster's later argument relies on student well-being as one of the "richer outcomes" we should now track. The paper measured exactly that outcome — and found a year-one decline. Allcott et al. (2026), Tables 5 and 7
Glosses heterogeneity by school level. The pooled test-score average is close to zero, and "didn't change much" is a reasonable summary of that average. The paper, however, reports heterogeneity by school level. High schools show a small positive Math effect (~0.024 SD ≈ 0.9 percentile points). Middle schools show a small negative effect, roughly half as large in magnitude. Both magnitudes are modest. The op-ed reports the pooled average without flagging the directional split, which matters for a policy debate where middle and high schools may merit different recommendations. Allcott et al. (2026), Table 4 and Figure 5
Buries the magnitude and ignores the racial disparity. "Increased in the short term" is correct in direction but understates the magnitude and entirely omits the disparate racial impact reported in the companion Florida paper Oster cites in the next paragraph. Allcott et al. report a ~16% national increase in suspensions in year one. The Figlio & Özek Florida paper shows that suspensions more than doubled in the month following enforcement and that in-school suspensions increased by approximately 30 percent among Black students specifically in year one. A short-run implementation effect that includes a racially disparate disciplinary increase merits more than a passing clause — particularly in a piece that goes on to recommend keeping the policy.
Figlio & Özek (2025), abstract and pp. 4–6: "the enforcement of cellphone bans in schools led to a significant increase in student suspensions in the short-term, especially among Black students … the ban increased in-school suspensions by about 30 percent among [Black students]." Suspension rate "more than doubled in the month after disciplinary enforcement started." Adverse effects "much more pronounced for Black students and male students."
Allcott et al. (2026); Figlio & Özek (2025), NBER Working Paper No. 34388, abstract and pp. 4–6
Conflates "no effect on bullying" with "no effect on attention". Bullying: correctly characterized — the paper estimates a null effect. Classroom attention: not correctly characterized. The paper reports a negative and statistically significant estimate for classroom attention in year two, with a pre-trend caveat acknowledged by the authors. "No demonstrable effect" doesn't capture this. A more accurate phrasing would be "no effect on bullying; classroom attention shows a negative point estimate that the authors flag as uncertain due to pre-trends." Allcott et al. (2026), Table 6
Summary compression. The paper finds an 80% drop in classroom phone use, a year-one well-being decline followed by a year-two rebound, a ~16% suspension surge that fades by year three, negative test-score effects for middle schoolers, positive Math effects for high schoolers, and a negative estimate on classroom attention. "Basically, not much changed" reduces this heterogeneous, partially adverse, partially beneficial set of findings to a single dismissive line. That reduction is what allows the goalpost-shift that follows.
Partly fair, but incomplete. The COVID-closure point is legitimate: Twenge does not directly control for closure duration. Twenge herself acknowledges that country-level associations cannot prove causation, that COVID closures matter, and that the 2022 school-day leisure-use measure cannot establish trends in that measure over time. These are real limitations.

But the critique is incomplete. Twenge's paper also documents PISA-score declines beginning in 2012, well before COVID. The same paper reports that academic declines were larger in countries where adolescents spent more school time using electronic devices for leisure. And Twenge cites a 2025 randomized experiment by Sungu et al. in which collecting students' phones improved academic performance, an actual causal design. None of this evidence is engaged in the op-ed. "The emphasis on phones overstates their role" is a defensible reading of the cross-country evidence. It is not a sufficient response to the broader literature. Twenge (2026), Journal of Adolescence; Sungu, Choudhury & Bjerre-Nielsen (2025), referenced within Twenge
Critique applies only to correlational studies. The reverse-causation and SES-confound points are legitimate critiques of correlational social-media-and-school-performance studies. But the critique applies only to that correlational literature on out-of-school phone use. It does not address the causal or quasi-experimental evidence on in-school bans — Abrahamsson (2024) in Norway, Beland & Murphy (2016) in the UK, the Sungu et al. (2025) RCT cited within Twenge (2026), and Figlio & Özek (2025) in Florida — none of which the op-ed cites.
Abrahamsson (2024) uses an event-study design on Norwegian middle-school bans and finds significant improvements in girls' GPA alongside significant reductions in mental-health consultations and bullying. Beland & Murphy (2016) find phone bans improved low-achievers' UK test scores by approximately 0.14 SD. The Sungu et al. (2025) randomized experiment, cited within Twenge (2026), found collecting students' phones improved academic performance. Figlio & Özek (2025) find significant positive year-two test-score effects in Florida.
Abrahamsson (2024); Beland & Murphy (2016); Figlio & Özek (2025)
Mischaracterizes the Florida paper's findings. "Similarly small effects on test scores" misses what Figlio and Özek actually find. Their abstract reports "significant improvements in student test scores in the second year of the ban." The body of the paper estimates Year 2 spring test-score gains of 1.1 percentile points overall, 1.4 percentile points for male students, and 1.3 percentile points for middle and high-school students. These are modest but positive and statistically significant effects, distinct in direction and significance from the close-to-zero average in Allcott et al. The two papers do not converge on the same finding. The Florida paper reports test-score benefits; the national paper does not. Treating them as a unified "small effects" story erases that distinction. It also obscures the most important contrast between the two designs. In Figlio and Özek's Florida setting, the statewide ban is associated with measurable test-score gains. In Allcott et al.'s Yondr-pouch setting, the pooled average is close to zero. That contrast is the actual puzzle, and the op-ed does not engage it.
Figlio & Özek (2025), abstract: "we find significant improvements in student test scores in the second year of the ban after that initial adjustment period." Body, p. 6: "student test scores improved by 0.6 percentiles, with the ban increasing spring test scores 1.1 percentiles in the second year … These positive test score effects are larger for male students (an effect of 1.4 percentiles on the spring test in the second year) and for students in middle and high schools (1.3 percentiles)." These effects are modest, but they are positive and statistically significant; that differs from Allcott et al.'s close-to-zero pooled average.
Figlio & Özek (2025), abstract and p. 6
Burden of proof loaded asymmetrically. The argument structure is: "There is no clear positive reason for phones in classrooms; therefore phone-ban defaults are justified absent evidence that phones help." On the subset of evidence Oster foregrounds, however, the case for bans on the original outcomes is itself mixed: average test-score effects are near zero, attention and bullying do not improve, and discipline worsens in the short run. The conclusion that bans should remain in place may still be defensible, but the argument is presented as if the evidence settles it; the actual work is being done by a default policy preference and by broader evidence (Abrahamsson 2024; Beland & Murphy 2016; Figlio & Özek 2025) that the op-ed leaves uncited.
Goalpost-moving. Much of the public case for phone bans has emphasized test scores, attention, bullying, and school climate. The New York Times news coverage of the new study describes the original expectations as bans being "supposed to improve many of the problems ailing American education, including distraction, bullying, declining test scores and absenteeism." Those are precisely the outcomes on which the new evidence is mixed. Average test-score effects are near zero, middle-school test-score effects are negative, bullying is null, and classroom attention has a negative year-two estimate with a pre-trend caveat. Saying these "shouldn't be the bar for success" after the data come in redefines the criteria. New York Times, May 4, 2026 coverage of Allcott et al.
College anecdote conflated with K–12 policy. Oster's classroom example comes from her role as a professor at Brown University, where she teaches both undergraduate and graduate-level economics courses. Her students are post-secondary; they are not K–12 students. The op-ed is about phone-ban policy in K–12 schools, predominantly affecting students aged 11–18 in compulsory settings. A university classroom and a middle-school classroom differ in students' age, developmental stage, voluntariness of attendance, and the interventions available to enforce a phone policy. The personal anecdote does more argumentative work than the comparison can support. Teacher experience is a legitimate input to policy debate, but it is not the same kind of evidence as a K–12 policy evaluation, and presenting it as one of the "richer outcomes we should measure" blurs the difference. Brown University, Department of Economics, faculty profile

What a more defensible version of this argument would have looked like

An honest summary of the central paper. Allcott et al. (2026) is a serious piece of work with heterogeneous findings. The paper reports an 80% drop in in-class phone use, a year-one well-being decline followed by a year-two rebound, and a ~16% suspension surge that fades over time. It also finds positive Math effects in high schools, negative test-score effects in middle schools, and a negative estimate on classroom attention. A faithful summary would report all of this. The op-ed emphasizes the parts that fit a "minimal impact" headline while omitting several adverse or heterogeneous findings.

A faithful summary of the Florida paper. Figlio and Özek (2025) find significant positive test-score effects in year two, alongside a doubling of suspensions in the first month of enforcement that disproportionately affected Black students. These findings are consequential and contested, not "similarly small." The disparate racial impact is a major policy concern that deserves direct engagement, not omission.

Engagement with the strongest causal evidence. Abrahamsson (2024; forthcoming 2026), an event-study design using Norwegian administrative data, finds that smartphone bans in middle schools improved girls' GPA by 0.08 SD and externally graded math exam scores by 0.22 SD. The same analysis shows that mental-health consultations declined by approximately 60% and bullying decreased for both boys and girls, with larger effects for girls from low-socioeconomic backgrounds. Beland and Murphy (2016) find UK phone bans raised the test scores of the lowest-achieving quintile by 0.14 SD. A piece arguing that the new evidence tempers expectations for phone bans should at minimum acknowledge this body of work and explain why it is unpersuasive. Omitting it is a choice, not a conclusion the data forces.

Symmetric epistemic standards. Oster's critiques of Twenge (2026) and the correlational social-media evidence are partially fair. The same skepticism should apply to her own conclusion. If correlational data is insufficient to establish that phones harm students, it is also insufficient to establish that bans help. The op-ed applies the standard in one direction only.

Disclosure of the central paper's industry partnership. Allcott et al. (2026) used data supplied by Yondr, the lockable-pouch company, and acknowledges Yondr as a research partner. Funding came from several major education-policy philanthropies and organizations: Arnold Ventures, the Bezos Family Foundation, the National Governors Association, Stanford Impact Labs, the Stuart Foundation, and the Walton Family Foundation. None of this invalidates the paper's findings. But a piece making an evidence-based argument should note that the central paper was conducted in partnership with the firm whose product is being evaluated. The op-ed does not.

Clarity about which question is being argued. The op-ed slides between three distinct claims. The first is that phone bans should remain in place because there is no positive reason for in-class phones and no strong evidence that phones help. The second is that phone bans help on outcomes that "richer" measurement would capture. The third is that the case against phones in schools is settled even if the test-score evidence is weak. These are different arguments with different evidentiary requirements. Treating them as a single coherent position obscures which one is actually being defended, and on what grounds.

References