A study can carry every mark of trust — federal money, a registered protocol, a double blind, a prestige journal — and still be built to deliver a foregone conclusion. I believe the Arnold 2021 trial is such a study, and that its real function is to supply a citable verdict against neurofeedback as a whole. Its conclusion is now used to deny people reimbursement. In my opinion, it does not survive contact with its own methods.
Both arms improved. About equally.
Start with the trial's own headline statistic. From baseline to treatment end, the primary outcome improved significantly (p<.0001) in both arms — neurofeedback (d=1.51) and the so-called "sham" (d=1.47) — with no significant difference between them. A within-group result like this cannot, on its own, prove that neurofeedback works; both arms could be improving through expectation, attention, or natural course. But it equally cannot support the headline that was sold — "neurofeedback fails" — because, as the critics show, the comparator was never inert.
[Figure: bar chart of within-group improvement effect sizes from baseline to treatment end. Neurofeedback, Cohen's d = 1.51; "sham" control (EMG biofeedback), d = 1.47. The two are nearly identical, with no statistically significant difference between arms.]
Two within-group effect sizes, almost the same height. The data say "no difference between two active arms." It was published as "neurofeedback doesn't work." That gap is where my argument begins.
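The difference between a within-group and a between-group effect size can be made concrete with a small sketch. Everything below is an illustration under stated assumptions: the scores are simulated, the function names (`within_group_d`, `between_group_d`, `synthetic_arm`) are hypothetical, and dividing by the baseline SD is only one common convention for within-group d, not necessarily the formula the trial used. Only the arm sizes (84 and 58) come from the figures quoted in this article.

```python
import random
import statistics

def within_group_d(pre, post):
    """Within-group Cohen's d: mean improvement over the baseline SD.

    Dividing by the baseline SD is one common convention; the trial's
    exact formula is not given in this article, so this is an assumption.
    """
    improvement = statistics.mean(pre) - statistics.mean(post)
    return improvement / statistics.stdev(pre)

def between_group_d(change_a, change_b):
    """Between-group Cohen's d on change scores, using the pooled SD."""
    n_a, n_b = len(change_a), len(change_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(change_a)
        + (n_b - 1) * statistics.variance(change_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(change_a) - statistics.mean(change_b)) / pooled_var ** 0.5

def synthetic_arm(n, rng, mean_drop):
    """Invented symptom scores (lower is better); NOT trial data."""
    pre = [rng.gauss(30.0, 5.0) for _ in range(n)]
    post = [score - mean_drop + rng.gauss(0.0, 3.0) for score in pre]
    return pre, post

rng = random.Random(1)
# Arm sizes follow the article (84 and 58); every score is simulated.
nf_pre, nf_post = synthetic_arm(84, rng, mean_drop=7.5)
ctrl_pre, ctrl_post = synthetic_arm(58, rng, mean_drop=7.5)

nf_change = [a - b for a, b in zip(nf_pre, nf_post)]
ctrl_change = [a - b for a, b in zip(ctrl_pre, ctrl_post)]

d_nf = within_group_d(nf_pre, nf_post)
d_ctrl = within_group_d(ctrl_pre, ctrl_post)
d_between = between_group_d(nf_change, ctrl_change)

print(f"within-group d, neurofeedback: {d_nf:.2f}")
print(f"within-group d, control:       {d_ctrl:.2f}")
print(f"between-group d:               {d_between:.2f}")
```

In this toy setup both arms are given the same average improvement, so each within-group d comes out large while the between-group d stays small. That pattern alone cannot say whether both arms were effective or neither was; the answer depends on what the comparator actually did.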
Where I have seen this before
In 1969, an executive at the tobacco company Brown & Williamson wrote a sentence that became the template for every industry that needed inconvenient science to go away. The strategy is not to disprove your opponent; it is to manufacture enough doubt that the public, regulators, and insurers stop trusting the evidence. Historians Naomi Oreskes and Erik Conway later named the practice in their book Merchants of Doubt.
"Doubt is our product, since it is the best means of competing with the 'body of fact' that exists in the mind of the general public. It is also the means of establishing a controversy."
Brown & Williamson internal memo, 1969
I am not claiming the people involved here smoke cigars in a back room. I am claiming the pattern is the same — and that, in my opinion, this trial is one move in a coordinated effort to discredit a whole field, not an honest test that happened to come out negative. Here is the playbook, and how I think this case maps onto it.
Design the study so the answer is fixed
Choose a comparator, an outcome measure, and a reward protocol that cannot detect the effect you claim to test.
Reframe the result after the fact
When the data don't cooperate, change the wording so a null becomes a defeat.
Use credentialed, conflicted experts
Let people with industry ties deliver the verdict, lending it authority.
Turn the verdict into policy
Feed the conclusion to regulators, guideline bodies, and insurers so it becomes a wall.
Everything you're trained to trust — flip it over
By the standard academic heuristics this should be a high-trust study. That is exactly what makes it effective: the credibility markers stay intact while, in my reading, the experiment beneath them is hollow.
The critique lists nineteen criticisms. Here are the load-bearing ones.
These criticisms are documented in the peer-reviewed critical review by Schummer & Sguigna (NeuroRegulation, 2024), which formally calls for the study's retraction, and in the methodological analysis by Pigott, Cannon & Trullinger (2021). The findings and arguments are theirs; below are the ones I find most damning.
The hypothesis was reworded after the result was known
The registered primary comparison was neurofeedback versus "placebo sham." When the so-called sham produced a large, durable improvement, the published conclusion swapped the wording to "control treatment." The relabelling, the critics argue, buries the actual finding: the comparator was never a placebo. (The trial randomised 144 children; 142 entered the primary analysis, 84 neurofeedback / 58 control.)
The "sham" delivered EMG biofeedback — an active intervention
Barth et al. (2017) reported that EMG biofeedback alone reduces ADHD hyperactivity. Comparing neurofeedback against a second active intervention and finding "no difference" is not evidence of failure — in the critics' view it dismantles the sham-controlled inference the trial rests on. (Some trialists do accept EMG biofeedback as a legitimate sham; I address that objection below.)
It claimed to test operant conditioning — but never checked if anyone learned
The trial set out to test whether children could be conditioned to shift their theta-beta ratio, yet recorded no learning curves. There is no evidence in the data that a single child in the neurofeedback arm learned to modify the signal. You cannot conclude a treatment failed if you never confirmed it was delivered.
Auto-thresholding, Pigott et al. argue, punished success and rewarded failure
Thresholds were auto-adjusted to hold a ~80% reward rate. Pigott et al. argue that this means a child who was learning had reinforcement withdrawn, while a child whose signal worsened was rewarded, which inverts operant conditioning. (Adaptive thresholding is common practice; the dispute, addressed below, is whether this specific implementation defeats the learning it measures.)
An outcome measure already flagged as unreliable
Janssen et al. (2017) and Ogrim & Hestad (2013) had reported the theta-beta ratio is "remarkably stable" and does not reliably move with training. Choosing a metric the literature had already flagged as insensitive, the critics argue, all but guarantees a null result.
Technicians weren't certified; supervision fell below clinical standard
Authors interviewed for the critique disclosed that technicians lacked basic neurofeedback skill and that oversight — annual site visits plus weekly calls from one expert — sat below the clinical standard of care. A treatment delivered badly reads as a treatment that does not work.
Medicated and dual-diagnosis children were left in
Stimulant medication alters the very EEG measures used as outcomes. Including medicated and comorbid participants without stratification injects noise precisely where the signal was supposed to appear.
The reward that, on Pigott's reading, punished learning
To make the auto-thresholding criticism tangible, suppose a child genuinely learns to raise the targeted brain signal. Consider what auto-thresholding, on Pigott et al.'s interpretation, does the instant they succeed.
Operant conditioning, inverted
In neurofeedback as intended, crossing the threshold earns a reward and the child learns to stay there. Under the protocol Pigott et al. describe, every time the signal rises the threshold is pushed up to keep rewards near 80% — so the moment learning shows, it is taken away.
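The thresholding rule described above can be sketched as a toy simulation. This is not the trial's software: the rolling-percentile rule, the window length, the linear drift, and the names `percentile` and `reward_rates` are all assumptions chosen to illustrate Pigott et al.'s reading. The model treats a higher signal as better performance and compares an auto-adjusted threshold with one that is set once and left alone.

```python
import random

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    index = round(q / 100 * (len(ordered) - 1))
    return ordered[index]

def reward_rates(drift, n=2000, window=50, target_rate=0.80, seed=0):
    """Return (adaptive_rate, fixed_rate) for a noisy signal with a linear trend.

    Hypothetical model: higher signal = better performance. The adaptive
    threshold is re-set each step so that about `target_rate` of the last
    `window` samples would have been rewarded. The fixed threshold is set
    once from the first `window` samples and never moves.
    """
    rng = random.Random(seed)
    signal = [drift * t + rng.gauss(0.0, 1.0) for t in range(n)]
    cut = 100 * (1 - target_rate)  # e.g. the 20th percentile for an 80% reward rate
    fixed_threshold = percentile(signal[:window], cut)

    adaptive_hits = fixed_hits = 0
    for t in range(window, n):
        adaptive_threshold = percentile(signal[t - window:t], cut)
        adaptive_hits += int(signal[t] > adaptive_threshold)
        fixed_hits += int(signal[t] > fixed_threshold)

    scored = n - window
    return adaptive_hits / scored, fixed_hits / scored

# Same noise in both runs; only the direction of the trend differs.
learner_adaptive, learner_fixed = reward_rates(drift=+0.002)    # signal improves
decliner_adaptive, decliner_fixed = reward_rates(drift=-0.002)  # signal worsens

print(f"adaptive threshold: learner {learner_adaptive:.2f}, decliner {decliner_adaptive:.2f}")
print(f"fixed threshold:    learner {learner_fixed:.2f}, decliner {decliner_fixed:.2f}")
```

In this toy model the adaptive rule rewards the improving and the declining child at roughly the same rate, while the fixed threshold clearly separates them. Whether the trial's actual implementation behaved this way is exactly the contested point; the sketch only shows what Pigott et al.'s interpretation would imply at the level of overall reward rate.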
A conclusion allegedly required before publication
One author described a surreptitious communication between the study's authors and a journal editor, indicating that JAACAP would publish the study only if its conclusion stated that neurofeedback was no better than a placebo, after which, the critique reports, the manuscript was modified to conform.
Source: Schummer & Sguigna (2024), reporting a single, uncorroborated account from one author (the co-author Schummer sat on the committee that received it).
I want to be precise: this is one uncorroborated account, reported by a critic who is not a neutral party, naming "a journal editor" in the singular. It is an allegation, not an adjudicated fact, and I present it as such. But if it is true, it is the keystone — it would mean the verdict was decided before the data were written up, and the methodological problems above are not accidents but the mechanism that delivered it.
Who benefits when neurofeedback "fails"?
Neurofeedback competes with stimulant medication. By their own published disclosures, two of the people who shaped this study's reach are tied to the makers of that competing product line.
Per the disclosures cited in the critique: Arnold received research funding from Shire (since acquired by Takeda, maker of a leading ADHD stimulant), Supernus, Otsuka, Roche/Genentech and Young Living, and sat on advisory boards for six pharmaceutical companies. He earlier led the $17.7M NIMH MTA study, whose follow-ups were criticised for downplaying stimulants' diminishing efficacy. The scathing American Journal of Psychiatry editorial that amplified this trial, "Neurofeedback for ADHD: Time to Call It Quits?", was written by James McGough, who per the critique served on the board of Sunovion and consulted for Eli Lilly, Takeda and Tris Pharma. A disclosed conflict does not by itself invalidate research, but in my view it sets the prior on which way the "errors" were likely to lean.
The strongest objections — and why I'm not persuaded
A signed opinion owes you the counterarguments. Here are the best defences of the trial, and my honest answer to each.
"EMG biofeedback is a standard, legitimate sham for neurofeedback."
"Adaptive thresholding is routine — it isn't sabotage."
"It's a large, NIH-funded, double-blind RCT — that beats clinic anecdote."
"You sell neurofeedback — of course you'd say this."
A contested study with real-world teeth
A flawed paper is a footnote. A flawed paper that becomes policy is a harm, and not to one study but to every clinician and family relying on the field.
Cited by insurance carriers to refuse neurofeedback reimbursement — turning a contested result into a coverage wall for families.
Amplified by McGough's 2022 AJP editorial "Time to Call It Quits?" — by an author with disclosed stimulant-industry ties.
No original author has asked for retraction. JAACAP rejected the letter documenting the criticisms, saying it did not meet the journal's standards.
It did not prove neurofeedback fails. It proved the experiment did.
Here is the narrow, honest reading I stand behind: a trial whose control was active, whose outcome was insensitive, whose reward scheme may have inverted learning, and whose hypothesis was reworded after the fact was never structurally capable of testing what it claimed to test. The strongest claim its data support is not "neurofeedback failed." It is "this study cannot tell us." That a study this weak is being used to shut a whole field out of reimbursement is, in my opinion, not an accident of science. It is manufactured doubt, doing exactly what it was built to do.