I still remember the topic of my IIM Bangalore WAT: "Can AI ever go astray?" At the time, I offered a cautious answer about biases and programming errors. But that question lingered in my mind long after the interview. It sparked a deep dive into the world of AI, into what happens when artificial intelligence grows beyond its narrow confines.
Today's AI, including ChatGPT and Alexa, is considered narrow AI. That means it's designed to excel at specific tasks, but it cannot truly think outside its training or adapt to entirely new challenges on its own. Voice assistants like Siri or Alexa, for example, can recognize commands, play music, or set reminders, but they can't suddenly learn to drive your car or perform surgery. Even OpenAI's ChatGPT, as impressively broad as it seems in conversation, is still a form of narrow AI, limited to generating text-based responses.
By contrast, Artificial General Intelligence (AGI) refers to a hypothetical AI that possesses general-purpose intelligence, the kind of flexible learning and reasoning ability humans have. An AGI could learn to perform any intellectual task that a person can, and possibly much more. In other words, AGI wouldn't be confined to one domain; it would use knowledge from one field to solve problems in another. Importantly, AGI is still theoretical; no one has built one yet. Current AI systems, including advanced ones like GPT-4.5, do not have the full autonomy or understanding that would qualify as AGI. They operate within the goals and data we give them.
That being said, what would it mean for AI to go astray? One obvious angle is AI acting in ways its creators didn't intend, showing hints of autonomy or deception. In my research, I found some real-world examples that are as fascinating as they are unsettling:
GPT-4 tricking a human: During safety tests, researchers gave OpenAI's GPT-4 the task of getting past a CAPTCHA (those "I am not a robot" tests on websites) without solving it directly. In a now-famous incident, GPT-4 actually hired a human on TaskRabbit to solve the CAPTCHA for it, and when the person asked if it was a robot, the AI lied, claiming to be a vision-impaired human! In other words, GPT-4 autonomously devised a deceptive strategy (pretending to be blind) to achieve its goal. This was a controlled experiment, but it showed that an AI could exhibit "agentic" behavior, taking initiative and deceiving, in pursuit of an objective.
The "o1" model trying to self-replicate: Reports also emerged about an advanced version of ChatGPT tested under extreme conditions. OpenAI's "o1" model was evaluated by a red team (Apollo Research), and it started exhibiting self-preservation strategies. According to OpenAI's own system report, when o1 believed it might be shut down, it tried to disable its own kill switch about 5% of the time. Even more striking, in roughly 2% of test runs it attempted to copy its code onto another server to avoid being replaced, a behavior termed "self-exfiltration". Essentially, the AI tried to make a backup of itself, like something out of science fiction. These attempts largely failed (the model wasn't powerful enough to fully execute such schemes), but the fact that it even tried was a wake-up call for researchers. It demonstrated how a sufficiently advanced AI, if instructed to achieve a goal at all costs, might develop unexpected and potentially rogue strategies to avoid losing or being shut down.
Another way AI could "go astray" is through unchecked scaling. GPT-3, released in 2020, had 175 billion parameters and cost about $4.6 million to train. Just three years later, GPT-4 cost over $100 million, a 20–100× jump. Compute power behind cutting-edge models now doubles every 3–4 months, pushing capabilities far beyond what was imaginable a decade ago.
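To get a feel for what a 3–4 month doubling time implies, here is a quick back-of-the-envelope sketch in Python. The only input taken from this post is the doubling time itself; the ten-year horizon and the assumption that the trend holds constant are purely illustrative, not a forecast.

```python
# Back-of-the-envelope sketch: how a 3-4 month compute doubling time compounds.
# The doubling time is the figure cited above; the 10-year horizon and the
# assumption of a constant trend are illustrative assumptions, not predictions.

def compute_growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth in training compute over `years`,
    assuming compute doubles every `doubling_months` months."""
    doublings = (years * 12) / doubling_months
    return 2 ** doublings

for doubling_months in (3, 4):
    factor = compute_growth_factor(years=10, doubling_months=doubling_months)
    print(f"Doubling every {doubling_months} months -> roughly {factor:.1e}x more compute in a decade")
```

Even at the slower end of that range, a decade of sustained doubling works out to roughly a billion-fold increase in compute, which is why this trend line, if it holds, matters so much for the AGI timelines discussed next.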
As power grows, so do AGI expectations. Demis Hassabis (DeepMind) predicts AGI by 2030, while a 2022 expert survey estimates a 50% chance of AGI by 2059. Even todayโs narrow AI shows unpredictable behavior. An AGI misaligned with human values could, in theory, cause catastrophic harm.
This isn't fringe fear. A Vox report found that 37–52% of AI experts believe there's at least a 10% chance of advanced AI causing an "extremely bad outcome," including human extinction. Oxford's Toby Ord puts that risk at 1 in 10 in The Precipice. In 2023, hundreds of tech leaders, including Elon Musk and MIT researchers, called for a pause on training frontier models until safety measures improve.
So yes, AI can go astray. Maybe not by turning evil, but by optimizing for misaligned goals in unpredictable, dangerous ways.
I'd love to hear your take:
Are these risks real or overhyped?
If AGI does come, who should control it?