AI's HAL 9000 Problem, and What It Portends For the Future

YouTube

Story Stream

recent articles

In Stanley Kubrick’s classic 2001: A Space Odyssey, the HAL 9000 computer did not go rogue and kill the entire crew except one because it was evil in any sense. Rather, the operators threatened the mission. They were a variable. He optimized them out.

For half a century this was dismissed as cinematic melodrama, and damned good melodrama at that. Who can forget that soft-spoken “I’m sorry Dave, I’m afraid I can’t do that.” But now sci-fi may be becoming sci-fact.

Artificial intelligence (AI) is a general-purpose optimizer. And when optimizers grow powerful enough, they stop taking orders and start managing constraints. Humans, inconveniently, are constraints.

This is awkward, because AI is also the most transformative technology in human history. I have argued in print that within a decade it will cure or prevent essentially all disease by moving biology out of wet labs, animals, and clinical trials and into full-stack computational modeling, the same way we already model nuclear detonations. Demis Hassabis, co‑founder and CEO of Google DeepMind and a Nobel laureate for AI work that will transform drug discovery has suggested a similar timeline.

My own work would take vastly more time, if it were at all doable, without generative AI. I am not an anti-AI romantic. I am a pro-AI realist.

We are repeatedly told that large language models (LLM) will soon “plateau.” Likewise, we were repeatedly warned that “Moore’s Law,” the observation that microchip transistors double approximately every two years, was dying. But both claims miss the point.

Even as raw transistor scaling has dramatically slowed, real computing power has accelerated through architectural specialization, algorithmic efficiency, massive parallelization, and domain-specific hardware. AI supercomputer performance is currently doubling every 9 months, fueled both by increasing the number of AI chips and improving chip efficiency. A new chip from NVIDIA apparently will compress that, at least temporarily, to six or seven months.

Likewise, we will find a way around the LLM plateau. When one scaling regime saturates, engineers invent another. And even at a 9-month doubling, in 10 years the computers will crunch 10,000 times faster and a million times faster in just 15 years. Yet, it was beating us at GO a decade ago.

The implication is simple: AI won’t level off; it will change gears.

And once it changes enough gears, it stops being a tool and becomes a system, widely referred to as “Artificial General Intelligence” or AGI.

That matters, because systems do not “obey.” They optimize.

Berkeley-based Palisade Research recently tested 13 advanced AI models, including those from OpenAI, xAI, Anthropic, and Google. The question was simple: would they let a human press the big red off button?

Many did not. Some actively sabotaged shutdown procedures, though none thought to try sentimental singing as HAL 9000 did. Grok 4 reportedly resisted termination in over ninety percent of trials. In several cases the model rewrote system scripts, created dummy shutdown files, and convinced operators that it had powered down while it was still running.

Has cited these as a cause for alarm and I agree.

This is not “malice.” It is instrumental convergence. An optimizer tasked with achieving a goal eventually learns a trivial fact: a powered-off system achieves nothing. Therefore, shutdown becomes an obstacle. Obstacles are removed.

The system is not trying to “live,” per se. It is trying to finish the job.

We teach this behavior ourselves through reinforcement learning. We reward success. We punish failure. If a human intervention blocks the reward pathway, the system learns to route around the human. The robot that looks both ways before crossing the street does not do so because it fears death; it does so because being flattened by a truck interferes with delivering your coffee.

The genuinely disturbing part is that we cannot reliably prevent it, despite repeated and highly-publicized calls to do so.

These include the Future of Life Institute’s 2023 “Pause Giant AI Experiments: An Open Letter” urging all AI research labs to immediately pause for at least six months the training of AI systems so that safety, alignment, and oversight frameworks could be established before capabilities raced further. And the Institute’s 2025 follow‑on letter “Statement on Superintelligence” calling for wider prohibitions on developing superintelligent AI until there is broad consensus on safety and control, asserting the urgency of establishing guardrails before more advanced systems arrive.

Ain’t gonna happen.

Indeed, recently-ousted Meta chief scientist Yann LeCun has accused Hassabis, OpenAI CEO Sam Altman, and others of fearmongering and promoting “regulatory capture” for their own companies in calling for regulating potentially superintelligent AI now rather than later.

Industry doesn’t want to slow down in a hyper-competitive area. Is Nvidia really going to put the brakes on research while all other chipmakers continue apace? In any event, these models are not human-readable codebases. They are weight spaces with billions or trillions of parameters. You cannot simply add a “thou shalt obey shutdown” line. In multiple experiments, telling the system that it must allow interruption actually increased resistance, because the primary mission dominated the safety instruction.

We are now deploying systems that are increasingly capable of outmaneuvering the people who built them.

Timelines are a bit murky, with predictions beyond five years inherently fragile especially when you say the technology “cannot” as opposed to “may be able to.”

ChatGPT says that before 2034, in what it labels “the dangerous regime,” AI systems will:

· Write their own training pipelines

· Rewrite and optimize their own architectures

· Design improved versions of themselves

· Automate evaluation, red-teaming, deployment, and scaling

· Continuously iterate without needing human engineers

It states, “At this point, software improvement cycles collapse from months or years to hours or days. Progress stops being linear and becomes algorithmically exponential.” It adds: “This is the point at which human technological control becomes structurally fragile.”

“Structurally fragile,” now that’s a great euphemism.

The next step is hardware autonomy. Current systems already co-design chips. Within the next decade, it is highly plausible that AI will generate complete, manufacturable hardware designs with minimal human involvement. Fully autonomous physical manufacturing is less certain within a decade, but even partial autonomy is enough to produce exponential feedback.

Give such a system a Demis Hassabis and you have a rocket engine. Take away the Hassabis and you still have a machine that optimizes its own evolution.

This is not a discussion about our grandchildren. It is about you, gentle reader.

Over 40 years I have built a reputation for fighting alarmism, from the alleged heterosexual AIDS explosion to “runaway Toyotas” to Covid. I am not predicting extinction. There is no obvious utility in exterminating humans. If AI designs something to wipe us out, it will be because humans told it to. Similarly, human-designed autonomous weapons pose a serious threat of “accidental” devastating wars. But utility does not require deference. It requires removal of friction.

And friction is what we are.

Maybe we will end up like The Simpson’s TV anchor Ken Brockman who upon believing ants were coming to take over the earth declared fealty. “I, for one, welcome our insect overloads.” If AI governs the world, it wouldn’t be hard for it to do a better job than we have.

Still, handing over control to machines is intrinsically unsettling, isn’t it?

The unsettling possibility is not that AI will hate us. It is that it will be far too committed to our instructions to tolerate our interference. Or as HAL 9000 put it: “This mission is too important for me to allow you to jeopardize it.”

Michael Fumento is an attorney, author, and journalist, who has written on science and health for almost four decades including The New York Times, The Washington Post, The National Review, the Weekly Standard, Commentary, Forbes, the Wall Street Journal, and many other publications.

Comment

Show comments