Questions about the Future of AI
Considerations about economics, history, training, deployment, investment, and more
What started as an attempt to consolidate some thoughts from the last few interviews on my podcast has turned into this 6,000 word clusterfuck of questions and considerations.
If you’ve got answers, or ideas for more questions, I’d be keen to read them in the comments below. I may compile a ‘Highlights from comments’ blog post or podcast episode in the future.
Contents
Capabilities
Agency
RL
Idiot savants
New training techniques
Pre-training
Economics
Early deployment
Coding and remote work
Open source
Model training and value capture
Investment
Hardware
Post-AGI
Hive minds
Software only singularity
Transformative AI
Explosive economic growth
Alignment
Reward hacking
Takeover
Model spec
Misuse
Other
Geopolitics
Epistemics
Capabilities
Agency
Why don't we have reliable agents yet?
Is agency training just a veneer of some MCTS-like scaffolding on the knowledge & intuition that pre-training gives you? Or is it much more difficult to develop?
Here’s a case for agency being difficult to develop: Moravec’s Paradox. Evolution has been optimizing us for hundreds of millions of years to act like coherent, goal-seeking agents even in the face of super dynamic environments, whereas evolution has spent at most hundreds of thousands of years optimizing us for language skills and abstract reasoning. So it's not that surprising that we got expert-level AI mathematicians before AIs that can zero-shot video games made for 10-year-olds. Replicating capabilities derived from a billion years of evolutionary optimization might take much longer than replicating skills contingent on a hundred thousand years of optimization.
The rebuttal to Moravec’s Paradox: the capabilities AIs are getting first have nothing to do with their recency in the evolutionary record and everything to do with how much relevant training data exists. Language and coding arrived first not because evolution only recently optimized us for reasoning but because we’ve got the fucking Internet and GitHub. Unitree robots are really good at walking around, despite the fact that evolution has spent over a quarter billion years teaching us locomotion - and this has everything to do with it being easy to get more data relevant to walking around using simulation.
What will be different about multi-agent systems?
How much of a parallelization penalty will there be? Instead of one instance seeing and considering the whole context, you're breaking apart the problem to multiple workers.
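One crude way to think about the parallelization penalty is an Amdahl's-law-style bound: whatever fraction of the task genuinely needs the whole context in one place stays serial, and every extra agent adds some coordination overhead. The numbers below are made-up assumptions, just to show the shape of the curve.

```python
# Rough sketch: Amdahl-style bound on multi-agent speedup, with a
# hand-wavy coordination overhead. All numbers are assumptions for
# illustration, not measurements of real agent teams.

def multi_agent_speedup(n_agents: int, serial_fraction: float, overhead_per_agent: float) -> float:
    """Speedup over a single agent that sees the whole context.

    serial_fraction: share of the task that can't be split (needs the full picture).
    overhead_per_agent: extra coordination cost added per additional agent.
    """
    parallel_fraction = 1.0 - serial_fraction
    time_with_n = serial_fraction + parallel_fraction / n_agents
    time_with_n += overhead_per_agent * (n_agents - 1)  # context-sharing / handoff cost
    return 1.0 / time_with_n

for n in (1, 2, 4, 8, 16):
    print(n, round(multi_agent_speedup(n, serial_fraction=0.3, overhead_per_agent=0.02), 2))
# Speedup rises, flattens, and eventually falls as coordination costs dominate.
```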
What is the explanation for why the length of task an AI can complete doubles at consistent intervals?
How will “Moore’s Law for AI agents” generalize to non-coding tasks like video editing, playing new video games, or coordinating logistics for a happy hour?
Would we get any intuition pumps about what superintelligence might look like by considering what it would mean to have a horizon length 10x as long as a human’s?
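To make the horizon-length questions above a bit more concrete, here's a back-of-envelope extrapolation. The one-hour starting horizon, the 7-month doubling time, and the "human-scale" targets are stand-in assumptions, not measured values.

```python
# Back-of-envelope: if the task horizon an AI can complete doubles every
# `doubling_months`, how long until it exceeds a given human-scale horizon?
# Starting values are illustrative assumptions.
import math

def months_until_horizon(current_horizon_hours: float,
                         target_horizon_hours: float,
                         doubling_months: float) -> float:
    """Months of steady doubling needed to reach the target horizon."""
    doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)
    return doublings_needed * doubling_months

# e.g. from a 1-hour horizon to a 1-month (~160 working hours) horizon,
# and then to 10x a human working year (~20,000 hours), assuming a
# 7-month doubling time.
print(months_until_horizon(1, 160, 7))      # ~51 months
print(months_until_horizon(1, 20_000, 7))   # ~100 months
```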
RL
Dario said in his recent blog post on export controls that labs are only spending on the order of $1M on RL - why? You’re spending hundreds of millions on the base model. If RL training is such a big complement to pre-training, why not spend a similar amount of compute on it?
I keep hearing that the big bottleneck for RL is the number of environments we have built so far. I don't really understand what this means. What exactly does it take to build a new RL environment? Presumably it means building complex, realistic, hard-to-reward-hack challenges?
Is there a specific reason this is very hard (other than everything in this fallen world being harder than you might naively anticipate)? You also need smooth reward landscapes that allow the AI to be rewarded for incremental improvements rather than getting stuck at 0. Smarter AIs have better priors about “what’s a reasonable thing to do when stuck”, which allow them to learn even from environments with sparser rewards.
By when will RL be the dominant workload in training? By when will most RL be online?
How sample-efficient is RL fine-tuning?
For what kinds of skills is it especially effective? What skills are just hard to instill into the model, even if you have the appropriate data?
Even if agentic RL is sample-efficient, doesn’t it take way more compute per ‘sample’ than RLHF-type training? As horizon lengths increase, your rollout has to become longer. The AI needs to do two hours worth of agentic computer use tasks before we can even see if it did it right. And if this is correct, will the pace of AI progress slow down?
Will this incentivize using smaller pre-trained models to do RL training? Would that allow more entrants to compete for the next tier of capabilities?
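A rough sketch of the compute-per-sample worry above (model size and token counts are invented for illustration): a rollout's cost scales roughly with tokens generated times FLOPs per token, and you only get one reward signal at the end of it.

```python
# Rough cost-per-reward-signal sketch for agentic RL rollouts.
# All parameters are illustrative assumptions.

def rollout_flops(params_billions: float, tokens_per_rollout: int) -> float:
    """Very rough forward-pass cost: ~2 * N FLOPs per generated token."""
    return 2 * params_billions * 1e9 * tokens_per_rollout

short_rlhf_sample = rollout_flops(params_billions=70, tokens_per_rollout=1_000)
two_hour_agentic_sample = rollout_flops(params_billions=70, tokens_per_rollout=200_000)

print(f"{two_hour_agentic_sample / short_rlhf_sample:.0f}x more compute per reward signal")
# -> 200x, before counting tool calls, retries, or environment simulation.
```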
How much extra capability does the longer chain-of-thought get you (as opposed to just doing RL in the first place)?
I don’t really understand why test-time compute scaling on things like the ARC-AGI benchmark keeps giving marginal returns (even if only on a log scale). I get why thinking a bit longer might be helpful. But o3-high (which as of writing has the highest score) wrote 43 million words per task. What the hell did it figure out with its 42 millionth word? Why do the benchmarks keep improving even up to that point?
RL potentially just upweights 10 tokens worth of MCTS-like scaffolding in a model's thinking (words like “wait”, “let’s backtrack”). This explains why reasoning models can be easily distilled - finding these basic techniques in thought space might take a while, but their payload size is trivial.
So first question: is this actually correct?
How far can you distill reasoning and chain of thought? Will the models 6 months from now be able to do by instinct the kinds of math and coding that current models need to do large amounts of inference scaling for?
Idiot savants
What is the answer to this question I asked Dario over a year ago? As a scientist yourself, what should we make of the fact that despite having basically every known fact about the world memorized, these models haven’t, as far as I know, made a single new discovery? Even a moderately intelligent person with this much memorized would make all kinds of new connections (connect fact X and fact Y, and the logical implication is new discovery Z).
People have proposed all sorts of answers to this. For example, Scott Alexander wrote,
“Humans also aren't logically omniscient. My favorite example of this is etymology. Did you know that ‘vacation’ comes from literally vacating the cities? Or that a celebrity is a person who is celebrated? Or that ‘dream’ and ‘trauma’ come from the same root? These are all kind of obvious when you think about them, but I never noticed before reading etymology sites. I think you don't make these connections until you have both concepts in attention at the same time, and the combinatorial explosion there means you've got to go at the same slow rate as all previous progress.” I agree that humans lack some godlike logical omniscience about the combinatorial consequences of all their knowledge. But there are plenty of examples of humans finding these kinds of important connections between fields, despite their much more limited world knowledge (see for instance these examples). I don't think Scott's argument explains why we have many examples of humans doing this, but none with AIs. And it's actually really funny that Scott is making this argument, because one of the things I love about his blog is that he personally has found countless intellectually fascinating connections between fields. Where are the LLMs that have done this?
Another argument I’ve seen: Eric Schmidt (no, not that one) writes, “I think I just don’t agree with your premise that there’s such low hanging fruit out there in the internet text corpus that hasn’t already been grabbed by smart, widely-read humans.”
Given that the number of potential connections increases as O(N^2) with the amount of knowledge we have as a species, and that the amount of knowledge is itself growing at least linearly, I find it implausible that humans have exhausted this combinatorial overhang.
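A trivial back-of-envelope on that combinatorial overhang (the counts are purely illustrative): if the stock of recorded findings grows roughly linearly, the pool of potential pairwise connections grows quadratically, so unexamined pairs pile up far faster than humans can check them.

```python
# Pairwise-connection arithmetic: if knowledge grows linearly, candidate
# cross-field connections grow ~quadratically. Counts are illustrative.

def pairwise(n: int) -> int:
    return n * (n - 1) // 2

for n_findings in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n_findings:>9} findings -> {pairwise(n_findings):>14} potential pairs")
```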
There is a “one man's modus ponens is another man's modus tollens” thing going on here. One way to interpret my original question is from an AI-skeptical point of view: The fact that LLMs aren't more powerful than humans despite their in-principle advantages over us suggests that they're not true AGI. But there's another interpretation of the same question - an interpretation which supports a more FOOMy vibe: Given the in-principle advantages LLMs have over us (in this case, as a result of their immense knowledge, but there are many others), once they actually do become AGIs, they'll fucking dominate.
Related to the question of LLMs knowing so much shit: It seems like knowledge is really cheap to store. Wikitext is less than 5 MB. So why do we humans forget so much? Why do our brains conspire so hard against acquiring new facts (which in raw bits cost basically nothing) that we have to come up with incredibly hacky systems like spaced repetition to keep knowledge around?
One suggestive pattern, copy-pasting from my review of Terrence Deacon’s book, The Symbolic Species: “Childhood amnesia (where you can’t remember early parts of your life) is the result of the learning process kids use, where they prune and infer a lot more, allowing them to see the forest for the trees. On the opposite end of the spectrum are LLMs, which can remember entire passages of Wikipedia text verbatim but will flounder when you give them a new tic-tac-toe puzzle. There's something super interesting here where humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization. It’s really fascinating that this memorization-generalization spectrum exists.”
New training techniques
I’m confused why all the labs have ended up making models that are so similar. Everyone is making “thinking” models. Has everybody just been trying a bunch of different shit, but this is the only thing that works? Or are they just copying each other, but in fact there’s a bunch of equally promising tangential research directions that no one is pursuing?
Many people have pointed out that there’s some missing middle between pre-training and in-context learning. Pre-training gives you some base of general understanding, something like a human skimming every textbook ever written; in-context learning is pure short term memory, discarded after every use. Are we likely to see a new training regime closer to dynamic evaluation, where you update your weights by meditating on feedback, or writing synthetic problems for practice?
Would the procedure for dynamically learning new skills have to be bespoke for each application (if you need it to be good at call center workflows, you need to make a bespoke conversation trajectory unrolling environment) or can you come up with some general procedure for upskilling?
One idea I’ve heard for building long-horizon strategizing and coherency is to train AI systems on text-based strategy games. Has somebody already tried this and it didn’t work? Or has it not even been tried in the first place?
Pre-training
Is pre-training actually dead?
The compute required to train a GPT-4 level model has been declining in cost at the astonishing rate of 10x per year. I thought the whole point of these effective compute multipliers is that we could train GPT-5 at GPT-4 cost. From the outside at least, it seems like these astonishing compute multipliers are only making existing capabilities cheaper to serve, not enabling the next generation of more powerful models to arrive much sooner. Rumors are that all the labs have been struggling to crack the next OOM of scaling. What’s going on? Is data running out? Maybe the engineering for much larger training runs gets exponentially harder? Or maybe so-called algorithmic ‘compute multipliers’ don’t give an equivalent multiplicative boost at different levels of scale?
A couple months ago at NeurIPS, Ilya compared the pre-training data corpus to fossil fuels - a limited resource which will rapidly be exhausted. This raises the question: what if this data corpus wasn’t limited? Would pre-training still be netting amazing new capabilities? Is next token prediction an innately fucked training objective, or did we just run out of data to let it cook?
Why are LLMs such mediocre writers despite having all the good writing in their training dataset?
Economics
Early deployment
Why aren't all the call center workers getting laid off yet? It’s the first thing that should go. Should we take it as some signal that human jobs are just way harder to automate than you might naively think?
How will the difference between average S&P 500 returns and median S&P 500 company returns change over time?
In other words, will we live in the world where Nvidia, Microsoft, Meta, and Google become worth $10T but everything else goes to zero, or will broad deployment happen fast enough that McDonald’s, JPMorgan Chase, etc. become more productive at the same rate that AI becomes more powerful?
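A toy illustration of the gap this question is pointing at, with invented returns: a handful of AI mega-winners can drag the index average far above the median company's return.

```python
# Toy illustration: a few huge AI winners vs. a flat median company.
# Returns are made up purely to show how mean and median diverge.
import statistics

# 5 mega-winners up 10x, 495 companies up 2%; equal weights for simplicity.
returns = [10.0] * 5 + [0.02] * 495
print("average return:", round(statistics.mean(returns), 3))   # ~0.12 (12%)
print("median return: ", round(statistics.median(returns), 3)) # 0.02 (2%)
```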
Will the models of 2026 and 2027 still best be thought of as time-shares of intelligence, or will they be integrated into companies and workflows such that individual copies are meaningfully distinct?
Honestly, I think time-shares of intelligence are still underrated. A new AI employee can just read every single doc in your company's Drive and every single line of code in your company's codebase within minutes. This means that scaling up your company or an AI application would be way less effortful than scaling up a human department.
What is the industrial-scale use case of AI? Between 1859 (when Drake drilled the first oil well in Pennsylvania) and 1908 (when Henry Ford introduced the Model T), the main use for crude oil was kerosene for lighting. What is the ultimate industrial-scale equivalent use case for AI?
How transformative would the AIs of today (March 2025) be even if AI progress stopped here?
I've personally become more bearish about the economic value of current systems after using them to build miniapps for my podcast: I can't give them feedback that improves their scope or performance over time, and they can’t deal with unanticipated messy details.
But then again, the first personal computers of the 1980s weren't especially useful either. They were mostly used by a few hobbyists. They had anemic memories and processing power, and there just wasn't a global network of applications to make them useful yet.
Another way to ask this question: imagine that you plopped down a steam engine in a hamlet from 1500. What would they do with it? Nothing! You need complementary technologies. There weren’t perfect steam-engine-shaped holes in these hamlets; similarly, there aren’t many LLM-shaped holes in today’s world.
Coding and remote work
People like Dario say that in a year, 90%+ of code will be written by AIs. In some sense, compilers automated 90%+ of code writing. But I think compilers are a smaller deal than what these people imagine AI coders will be in a year. So what magnitude of actual productivity improvement to software engineering do they expect from AI? 2x? 10x? 100x?
If you get a 100x increase in software productivity, what kinds of things could we build in the world that we can’t build now?
Here’s why I think this is such an interesting question. The economic historian Robert Allen thinks cheap energy is a big part of the reason the Industrial Revolution first happened in Britain. The first steam engine (the Newcomen engine) was super inefficient. It worked not by pushing the piston directly with the steam but by relying on the steam condensing to pull the piston in. The only place where it even made sense to try this design was at England’s plentiful coal mines, where fuel was practically free. This allowed Britain to get up the learning curve in mechanization, such that they could design devices which were viable with much less energy. The tl;dr here is that you need some initial hand-holding - some cold start - in order to get up the learning curve in a new technology. And because of cheap coal, Britain had it for the Industrial Revolution. Does effectively free software provide a similar cold start for some future technological trend?
It seems like some AI labs (OpenAI, Meta) are racing towards being first to a billion-user AI assistant, whereas others (Anthropic, maybe GDM?) are racing towards the fully autonomous software engineer. Who’s right? What are the marginal returns to more software (as compared to owning another Facebook- or Whatsapp-size social network)? If you believe in a software-driven intelligence explosion, then that places a pretty high premium on the value of more software.
The internet and mobile revolutions were largely driven by digital advertising – a relatively small slice of overall global GDP. If AI truly can automate all labor, how soon before AI revenues absolutely gobble up all the social/search/ads revenue of big tech?
How much of the value of AI requires embodied robots (versus just digital LLMs)?
Open source
Does inference scaling mean that open-weight models don’t decentralize either the benefits or the risks of AI as much as you might naively think (more here)?
What does an intelligence explosion with open-weight models at the frontier look like?
Is there any plausible way in which this could happen? Wouldn’t it require publishing the weights every week/day/hour (depending on speed of intelligence explosion)?
If this happened, would it be good?
What are the implications of the “panspermia” timeline, where multiple countries develop AGI from the same initial seed (e.g., by stealing model weights or building on open weights models)?
Would this shared foundation matter more if future advances come from iterated amplification and distillation rather than fresh training runs?
Does this increase AI takeover risk, since all the superintelligences start from the same subconscious ‘id’?
Does it increase the global impact of ‘Western’ values which are embedded in those stolen weights?
Model training and value capture
How are returns to training bigger models scaling? If a lab like OpenAI or DeepMind doubles its training compute budget for its next flagship model compared to the last one, are they getting more than 2x the revenue from it?
It seems like the ratio of revenue to training cost is increasing over time, plausibly because so much of the value of new skills is deferred to when AIs are fully reliable and complemented by a wider range of capabilities.
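One way to pin this down quantitatively (the figures below are hypothetical placeholders, not lab numbers): back out the implied elasticity of revenue with respect to training compute between two successive flagship models. An elasticity above 1 means returns to scaling are still increasing.

```python
# Implied elasticity of revenue w.r.t. training compute between two
# flagship models. All figures are hypothetical placeholders.
import math

def revenue_compute_elasticity(compute_ratio: float, revenue_ratio: float) -> float:
    """epsilon such that revenue_ratio = compute_ratio ** epsilon."""
    return math.log(revenue_ratio) / math.log(compute_ratio)

# e.g. 10x the training compute for 30x the revenue -> epsilon ~1.48
print(round(revenue_compute_elasticity(compute_ratio=10, revenue_ratio=30), 2))
```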
If progress in AI stalls, do foundation model companies inevitably get commoditized? Is there any moat other than staying 6+ months ahead (and in the extreme scenario beating everyone else to the intelligence explosion)? If model companies fail to differentiate, where does the value get captured?
One answer might be the hyperscalers who control the datacenter compute, and whose complement (the models) just got commodified. But datacenter compute itself doesn’t seem that differentiated (so much so that the hyperscalers seem to be able to easily contract it out to third parties like CoreWeave). So maybe the lion’s share of value goes to the people making the components that go into chip production: 1) wafer production (TSMC), 2) advanced packaging (TSMC’s CoWoS), and 3) high bandwidth memory (SK Hynix).
Many previous attempts to make AI applications based on scaffolds and wrappers have been gobbled up by better foundation models (which can just "scaffold" themselves). Will that keep being the case?
It's interesting to me that some of the best and most widely used applications of foundation models have come from the labs themselves (Deep Research, Claude Code, NotebookLM), even though it's not clear that you needed access to the weights in order to build them. Why is this? Maybe you do need access to the weights of frontier models, and the fine-tuning APIs or open source models aren’t enough? Or maybe you gotta ‘feel the AGI’ as strongly as those inside the labs do?
How much does being at the bleeding edge of AI capabilities matter? Is there any point in competing in the model race if you have no plan to get to the top? Or is there a viable business strategy based on being 6 months behind but following fast?
Investment
Hyperscaler AI capex is getting pretty big - approaching $100B a year for some of the big dogs. If there is a lull in AI capabilities (e.g., inference scaling doesn't end up being that useful or easy to scale, and it takes a couple of more years to make another big breakthrough), what will happen? Will fidgety CFOs force a firesale of compute contracts?
On the other hand, if AI does end up being as economically remunerative as "AGI" implies, how fast could the hyperscalers ramp up their investment? What would it take for them to invest more than their annual free cash flow (high tens of billions of dollars for Microsoft, Google, and Amazon)? And how would they raise this money?
Hardware
Will the desire to avoid the 70%+ Jensen tax inevitably drive more in-house ASIC development?
Will this drive greater diversity in model architectures and training techniques?
How well will distributed training work?
How easily can datacenters that have been built up assuming a pretraining focus be repurposed towards RL?
Will future training just look a lot like inference? There's not much difference between inference workloads and RL, since an agent has to go away for a while, try to solve a problem, and come back with the results.
Can RL rollouts just be totally decentralized?
How much do hardware tariffs, energy costs, and geopolitical considerations influence where we build the next generation of AI data centers?
If significant tariffs were imposed on datacenter components, would that shift planned buildouts towards Europe or Asia?
How much does network latency and proximity to customers matter when planning multi-billion dollar, decade-long infrastructure investments?
How can we build hardware mechanisms that would help you prove to others what you’re using your compute for (and not using it for)?
A lot of the stories about how the intelligence explosion goes well involve different actors instituting a mutual slow down where they refocus on alignment + greater monitoring. But such agreements require verifiability.
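To give a flavor of what "proving what your compute is doing" could mean at the software layer, here's a toy tamper-evident workload log. This is a hypothetical sketch, not any real attestation standard; an actual scheme would need a hardware root of trust signing these records, which isn't modeled here.

```python
# Toy tamper-evident workload log. Hypothetical sketch only: a real
# verification scheme would anchor this in hardware (secure enclaves,
# signed firmware), which is not modeled here.
import hashlib, json

def append_record(log: list[dict], workload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = {"workload": workload, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list[dict]) -> bool:
    prev_hash = "genesis"
    for rec in log:
        body = {"workload": rec["workload"], "prev_hash": rec["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev_hash or rec["hash"] != expected:
            return False
        prev_hash = rec["hash"]
    return True

log: list[dict] = []
append_record(log, {"job": "pretraining", "flops": "1e25"})
append_record(log, {"job": "alignment evals", "flops": "1e22"})
print(verify(log))  # True; editing any earlier record breaks the chain
```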
Post-AGI
Hive minds
Will central planning actually work with AI?
The case for:
The central AI can have much higher bandwidth communication with the periphery. Today, sensing and deciding have to both happen at the front line. In the future, robot appendages of Big Brother could actually just run the whole military and economy.
The central planner can just directly learn from the experience of other AIs (think of a much more advanced version of Tesla FSD model learning from millions of driving samples).
Compute/intelligence can be much more centralized than it is today. Right now, Xi Jinping runs on the same ~10^15 FLOP/s brain as anyone else. Not so for the mega-inference-scaled dictators of the future.
In order to align incentives for humans, we have to use markets. But we can much more easily control the preferences of AIs.
The case against:
This vision assumes that only the central government is getting more complex and capable, while the rest of society stays similar. But AI deployment will lead to the whole economy becoming more complex. Apple circa 2025 might be able to centrally plan the economy of Babylon. But it wouldn’t be able to plan America circa 2025.
What is my blog post about fully automated firms missing?
How soon after AGI do you get hive minds, fully automated firms, and other crazy powerful and unique products of AIs’ unique advantages in cultural learning/coordination (namely, that AIs can copy, merge, distill, and scale themselves)?
An analogy is that the first AGIs will be like humans 200,000 years ago: yes, they have many key advantages, but the things that make us so dominant today - state capacity, joint stock corporations, fossil-fueled civilization - took eons of cultural evolution, population growth, technological upgrading.
Today, the world economy runs on trade; complex, mutually-dependent supply chains; specialization; and decentralized knowledge. None of this looks like a singleton. Will all this fundamentally change once we have AGI? How about ASI?
Or maybe a global system of interconnected markets, delegated decision-making, and mutual interdependence was the singleton all along?
Software only singularity
How does the probability of an intelligence explosion change based on when we achieve AGI?
AGI timelines tell you a lot about the nature of intelligence itself. If it takes decades to build AGI (rather than 3 more years of coding-coding-coding), then you’ve learned that transfer learning isn’t that powerful. You’ve learned that current systems aren’t simply “hobbled” little AGIs - rather, they will seem in retrospect more like AlphaZero - an amazing early demonstration of the possibility of AGI, but not the thing itself.
Could an AI competent at research engineering accelerate AI progress such that we make a BERT to GPT-4.5 size jump in just one year?
Case against: my podcast with Ege & Tamay.
AI labs apparently run multiple pre-training teams, each with varying allocations of compute and number of researchers. Observing how progress differs across these configurations would yield valuable insights into the tradeoff between cognitive effort and compute scaling in AI progress—though labs are unlikely to share this data.
Suppose there's a software-only singularity. Would the gap between AI labs narrow or widen during this takeoff?
There's a clear reason to expect the gap to widen during an intelligence explosion: whichever lab is slightly ahead benefits from having smarter, more capable automated researchers, rapidly compounding its advantage.
Yet historically, we've seen the gap consistently narrow. In 2018, DeepMind was far far ahead of everyone else. Today, labs like DeepSeek can close the distance to within half a year of OpenAI.
However, this pattern might not persist. The two primary methods for labs catching up—poaching experienced talent and learning from leaks or public deployments—would lose relevance in a scenario where the best models remain internal, autonomously driving progress.
Still, if the intelligence explosion accelerates by simultaneously scaling compute resources, the leading lab might remain incentivized to deploy publicly, attract investment, and thereby continue indirectly facilitating catch-up.
Currently, model training (and inference) costs for the same level of capability are declining 10x a year. If you extrapolate this trend, then it suggests that you can train GPT-4 level capabilities using 100 H100s by 2027. Does this imply that while eventually we'll be able to distill human-level intelligence into mosquito drones, the first AGIs will be a system that costs $100B and requires Montana-sized infrastructure to train, and uses inference scaling hacks that cost $10 per token (like some Yoda, slowly meditating on each syllable)?
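A sanity check on that extrapolation, treating the rough public estimate of GPT-4's training compute and ballpark H100 throughput as assumptions: if the 10x-per-year decline in effective compute requirements holds, the arithmetic does land in "100 H100s" territory by 2027.

```python
# Back-of-envelope for the "GPT-4 on 100 H100s by 2027" extrapolation.
# The GPT-4 compute figure and H100 throughput below are rough public
# estimates used as assumptions; the 10x/year decline is the trend
# described above.

gpt4_flop_2023 = 2e25          # rough public estimate of GPT-4 training compute
efficiency_gain_per_year = 10  # effective-compute trend from the text
years = 4                      # 2023 -> 2027

flop_needed_2027 = gpt4_flop_2023 / efficiency_gain_per_year ** years

h100_flop_per_s = 1e15         # ~order of magnitude, dense BF16
utilization = 0.4
cluster_flop_per_s = 100 * h100_flop_per_s * utilization

days = flop_needed_2027 / cluster_flop_per_s / 86_400
print(f"~{days:.1f} days on 100 H100s")   # on the order of a day, if the trend holds
```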
Transformative AI
Some people imagine that once we achieve ASI, it rapidly invents crazy superweapons, new computing paradigms, fusion-powered space probes, etc. But this doesn’t seem to map onto how past new inventions and scientific discoveries were made. It seems like a general upgrading of your society's tech stack is more important than raw cognitive effort in a specific sector: you couldn't have discovered the structure of DNA without X-ray crystallography; and we didn’t figure out there was a Big Bang until we saw the cosmic microwave background, thanks to radio astronomy techniques initially developed for communications during World War II. Does this suggest that we're not just going to have this technological explosion in the middle of the desert? In order to truly advance the state of technology, you basically need to upgrade the whole economy. And this points to a broad deployment before an R&D explosion.
To be clear, transformative AI might still happen super fast! The AIs are thinking very quickly, there’s billions of them, they’re much better at learning-by-doing, etc. But it’s simply a question of whether or not we’ll just straight-shot nanotech or Dyson spheres and skip over all the boring everyday improvements in seemingly unrelated fields.
Explosive economic growth
Will explosive growth happen?
The basic argument for explosive growth: Total economic output = Total Factor Productivity (TFP) × Labor × Capital. Today, economic growth has a weak feedback loop: output can accumulate into more capital, but not into more (human) labor. With AI, however, capital (compute) also functions as labor, creating a stronger feedback loop. Automated economic output = TFP (rapidly rising due to automated research and accelerated learning-by-doing) × Capital (previous output).
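Here's a minimal toy model of the feedback loop in that argument - arbitrary parameters, an illustration of the mechanism rather than a forecast. In the baseline economy labor is fixed, so reinvested output runs into diminishing returns; when effective labor scales with accumulated capital (compute), the loop compounds.

```python
# Toy comparison of the growth feedback loop with and without AI labor.
# Cobb-Douglas-ish production; all parameters are arbitrary illustrations.

def simulate(years: int, ai_labor: bool) -> list[float]:
    A, K, L = 1.0, 1.0, 1.0          # TFP, capital, labor
    alpha, savings, tfp_growth = 0.4, 0.3, 0.02
    outputs = []
    for _ in range(years):
        labor = K if ai_labor else L  # with AI, compute (capital) works as labor
        Y = A * K**alpha * labor**(1 - alpha)
        outputs.append(Y)
        K += savings * Y              # output reinvested as capital
        A *= 1 + tfp_growth
    return outputs

baseline = simulate(30, ai_labor=False)
ai_world = simulate(30, ai_labor=True)
print(f"30-year output multiple, baseline: {baseline[-1] / baseline[0]:.1f}x")
print(f"30-year output multiple, AI labor: {ai_world[-1] / ai_world[0]:.1f}x")
```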
If you’re interested in other considerations, check out my interview with Tyler for the con case, and the one with Ege & Tamay for the pro case.
Will explosive growth basically look like what happened to China post-Deng (double-digit growth rates instigated by abundant skilled labor)?
Will economic growth be extensive or intensive? Or some secret third thing?
If the former, then it’s possible that the effects of explosive economic growth will not be felt that widely: your cityscape will not look that different; your life might not even change that much. But out in the desert, somewhere far from your view, they're producing solar farms and robot factories which are worth 200% of the existing economy.
Alignment
Reward hacking
“LLMs were aligned by default. Agents trained with reinforcement learning reward hack by default” (tweet here): is this actually the correct framing?
Base LLMs were also misaligned by default. People had to figure out good post-training (partly using RL) to solve this. There's obviously no reward hacking in pretraining, but it’s not clear that pretraining and RL differ that much in how ‘aligned by default’ they are.
Are there any robust solutions to reward hacking? Or is reward hacking such an attractive basin in training that if any exploit exists in the environment, models will train to hack it?
Can we solve reward hacking by training agents in many different kinds of unique environments? In order to succeed, they’d have to develop robust general skills that don't just involve finding the exploits in any one particular environment.
Are capabilities and alignment the same thing here? Does making models more useful require solving reward hacking?
If this is the case, we might be living in the alignment-by-default world? It would be weird if we solve reward hacking well enough to make these models reliable general agents in every scenario except those involved in taking over the world.
Takeover
What’s the right way to think about potentially misaligned AIs deployed through the economy? Are they like disgruntled employees who dislike the CEO (bullish for humanity), or more like Cortés landing in the New World (bearish)?
An optimistic framing: Think of AI as just another set of workers. CEOs typically aren’t the smartest people in their companies, nor do they grasp all technical nuances of how their instructions are executed by engineers or researchers. The same applies at the national level—Xi Jinping doesn't lose sleep over whether all 100 million CCP members genuinely love him, yet he remains in power. Plenty of employees actively dislike their CEOs but still inadvertently contribute positively to the company's objectives. Why should we assume misaligned AIs would behave differently?
A pessimistic framing: history is littered with examples of violent takeovers enabled by moderate technological advantages. Geneticist David Reich, on my podcast, described human history as repetitive waves of violent expansion, where a technologically or organizationally superior group wipes out most people across entire continents. Consider Hernán Cortés, who conquered an empire of over 10 million people in just two years with fewer than 1,000 soldiers (plus horses, steel, and smallpox). Or the British East India Company, which took over a subcontinent of 150 million people with just a few thousand officers, aided by marginally superior artillery tactics and logistical innovations derived from European warfare—not some profound, galaxy-brain advantage. Perhaps AIs similarly wouldn’t need an overwhelming technological edge to decisively overpower humanity.
If the AIs ‘takeover’, will we be able to see it coming? How wild and sci-fi will it be?
Will it be like a gazelle being hunted by hunter-gatherers, where the gazelle clearly understands the strategic situation as it tries to run away from the pointy sticks?
Or will it be more like a deer peacefully grazing, unaware that it’s being watched through a scope 100 yards away, and a moment later… boom.
Model spec
What should the model spec say?
Alignment stories often assume we can get an AI to deeply internalize a foundational document that spells out its core values, directives, and ultimate authorities—something like, "Follow user instructions unless they're clearly dangerous; in disputes, defer to the judgment of Sam Altman or President Trump."
Think about how much the United States today is contingent on the specific quill strokes of Madison - what exactly does this comma mean, what did he mean by ‘general welfare’ and ‘interstate commerce’?
How do we get political opposition parties, allied governments, and other people in the world to fully grasp the enormous (temporary) leverage they currently have to influence this document?
If China develops AGI first, how do we make sure that their model spec doesn't just make Xi Jinping the god dictator forever? Do private Chinese companies and high-level members of the CCP have enough influence and understanding of the strategic situation to prevent such an outcome?
Misuse
Is there an inherent tradeoff between preventing misuse and reducing the risk of a coup (either by AIs themselves or by humans using AI)? Or is this a false dichotomy?
Preventing misuse looks a lot like making sure the most advanced capabilities aren't widely deployed because of the inherent dual-use nature of understanding reality. Preventing takeover means making sure that we’re maximally open and transparent about frontier systems, and deploy them as widely as possible in order to prevent any one faction from monopolizing their benefits.
Jailbreaking is becoming harder as models become smarter. This is pretty intuitive: a smart but ethical human assistant could tell whether you’re asking him to help weaponize smallpox or simply to practice for an organic chemistry exam. So as AIs get smarter, they should be able to tell the difference as well. Are we set here if we just put “don’t make bioweapons” in the model spec?
It seems like the whole misuse story is especially anchored on bioterrorism. The intuition pump is to imagine every person with a team of virology PhDs in her pocket. What actually would happen in this scenario?
Other
Geopolitics
I’m concerned about nationalization of AGI development for the reasons listed below. Where am I wrong?
Reduces the salience of safety and alignment, in favor of whatever the administration + deep state at the time cares about.
Decreases the competence of the group overseeing the intelligence explosion, which is robustly bad, especially if you think alignment is difficult and subtle.
Increases the likelihood of dictatorship, with the President ending up as the ultimate decision maker in the spec or something.
Instigates an obvious arms race framing in China.
Increases the likelihood that the first AGIs are used to develop weapons, drones, and bioweapons instead of what’s needed for our glorious transhumanist future.
How will the Chinese political system react to fully automated remote work, superhuman hackers, automated AI researchers etc.?
How deep are their private markets? How willing are the big public funds to finance the next rung of scaling?
What might instigate a Chinese Manhattan Project for AGI?
Suppose tomorrow Xi wanted to prioritize building AGI. At an institutional level, what concretely could he do?
How does Chinese industrial espionage in other industries work? Suppose there's a hardware company that wants to learn how Apple does some procedure. Is there some government department they file their well-scoped snooping request with?
Developing AI (and successfully deploying it at scale) is a huge industrial project. China is really good at industrial scale-up and state-directed investment. Does this give them a huge advantage in deploying AI?
If you’re the leader of a nation like India or Nigeria today, and you get AGI-pilled, what should you do?
Honestly, it feels like a really tough position. The odds that you develop a frontier AI lab are pretty low. And you don’t have some crucial input into the semiconductor supply chain that will give you any leverage during crunch time.
If you have energy and property rights and friendly relations with the US and/or China, maybe you can become a datacenter hub?
Does superintelligence actually give you ‘decisive strategic advantage’? Examples like Cortés or the East India Company seem to imply that slight advantages in technology might be totally overwhelming in a military conflict. But perhaps modern weapons (nukes especially) are already so destructive that they provide sufficient deterrence against adversaries with smarter AIs launching their mosquito drone swarms and superhuman cyber attacks.
Epistemics
How wrong should we expect our conceptual handles around AGI to be?
I've been reading The House of Government recently. It's a fascinating account of people involved in the Russian Revolution. There were many different factions of people who were disillusioned with the Czarist regime - the anarchists, the Mensheviks, the Bolsheviks, the Socialist Revolutionaries, the Decembrists. They intensely debated the dichotomies which were most salient to them given their milieu.
The “decisive battle” … covered all the usual points of disagreement: the “working class” versus “the people”; the “sober calculation” versus “great deeds and self-sacrifice”; “objectivism” versus “subjectivism”; and “universal laws of development” versus “Russia’s uniqueness.”
Yet none of them anticipated the considerations we now recognize to be far more relevant to economic development: dispersed knowledge, voluntary exchange, and entrepreneurial innovation.
I think about this whenever my Bay Area friends debate AGI - will there be a software-only singularity, adversarial misalignment, training gaming, explosive growth, etc, etc? Maybe the frameworks we're using and the questions we're asking are fundamentally misguided.
Given this topic is so epistemically murky that someone smart can come up with a new consideration that alters your key conclusions, how much should you update on the most recent compelling story you’ve heard?
Historically, the way we’ve dealt well with rapidly-evolving, uncertain processes is classical liberalism. Pushing for super tailored proposals based on reasoning your way to a specific trajectory (“We’re <5 years to ASI, therefore ‘The Project’”) has a pretty bad track record.
If you enjoyed this blog post, you may enjoy my new book, The Scaling Era: An Oral History of AI, 2019-2025. This book curates and organizes the highlights across my podcast episodes about AI with scientists, CEOs, economists, and philosophers.
Thanks to Max Farrens for comments and editing. Thanks especially to Carl Shulman, but also Leopold Aschenbrenner, Tamay Besiroglu, Ege Erdil, Daniel Kokotajlo, Scott Alexander, Sholto Douglas, Adam D’Angelo, Paul Christiano, Craig Falls, Gwern Branwen, Dan Hendrycks, Andrej Karpathy, Toby Ord and many others for conversations which helped inspire questions.