AI 2027: month-by-month model of intelligence explosion — Scott Alexander & Daniel Kokotajlo

Misaligned hive minds, Xi and Trump waking up, and automated Ilyas accelerating AI progress

Dwarkesh Patel

Apr 03, 2025

Scott and Daniel break down every month from now until the 2027 intelligence explosion.

Scott Alexander is author of the highly influential blogs Slate Star Codex and Astral Codex Ten. Daniel Kokotajlo resigned from OpenAI in 2024, rejecting a non-disparagement clause and risking millions in equity to speak out about AI safety.

I came in skeptical, but I learned a tremendous amount by bouncing my objections off of them. I highly recommend checking out their new scenario planning document, AI 2027.

Watch on Youtube; listen on Apple Podcasts or Spotify.

Timestamps

(00:00:00) - AI 2027

(00:06:56) - Forecasting 2025 and 2026

(00:14:41) - Why LLMs aren't making discoveries

(00:24:33) - Debating intelligence explosion

(00:49:45) - Can superintelligence actually transform science?

(01:16:54) - Cultural evolution vs superintelligence

(01:24:05) - Mid-2027 branch point

(01:32:30) - Race with China

(01:44:47) - Nationalization vs private anarchy

(02:03:22) - Misalignment

(02:14:52) - UBI, AI advisors, & human future

(02:23:00) - Factory farming for digital minds

(02:26:52) - Daniel leaving OpenAI

(02:35:15) - Scott's blogging advice

Transcript

AI 2027

Dwarkesh Patel

Today I have the great pleasure of chatting with Scott Alexander and Daniel Kokotajlo. Scott is of course the author of the blog Slate Star Codex, now Astral Codex Ten. It’s actually been, as you know, a big bucket list item of mine to get you on the podcast. So this is also the first podcast you’ve ever done, right?

Scott Alexander

Yes.

Dwarkesh Patel

And then Daniel is the director of the AI Futures Project. And you have both just launched today something called AI 2027. So what is this?

Scott Alexander

Yeah, AI 2027 is our scenario trying to forecast the next few years of AI progress. We’re trying to do two things here. First of all we just want to have a concrete scenario at all. So you have all these people, Sam Altman, Dario Amodei, Elon Musk saying, “going to have AGI in three years, superintelligence in five years”. And people just think that’s crazy because right now we have chatbots that are able to do a Google search, not much more than that in a lot of ways. And so people ask, “how is it going to be AGI in three years?” What we wanted to do is provide a story, provide the transitional fossils. So start right now, go up to 2027 when there’s AGI, 2028, when there’s potentially super intelligence, show on a month-by-month level what happened. Kind of in fiction writing terms, make it feel earned.

So that’s the easy part. The hard part is we also want to be right. So we’re trying to forecast how things are going to go, what speed they’re going to go at. We know that in general, the median outcome for a forecast like this is being totally humiliated when everything goes completely differently. And if you read our scenario, you’re definitely not going to expect us to be the exception to that trend.

The thing that gives me optimism is Daniel back in 2021, wrote the prequel to this scenario called What 2026 Looks Like. It’s his forecast for the next five years of AI progress. And he got it almost exactly right. You should stop this podcast right now. You should go and read this document. It’s amazing. Kind of looks like you asked ChatGPT to summarize the past five years of AI progress, and you got something with a couple of hallucinations, but basically well intentioned and correct. So when Daniel said he was doing this sequel, I was very excited, really wanted to see where it was going. It goes to some pretty crazy places and I’m excited to talk about it more today.

Daniel Kokotajlo

I think you’re hyping it up a little bit too much. Yes, I do recommend people go read the old thing I did, which was a blog post. I think it got a bunch of stuff right, a bunch of stuff wrong, but overall held up pretty well and inspired me to try again and do a better version of it.

Scott Alexander

I think, read the document and decide which of us is right.

Daniel Kokotajlo

Another related thing too is that the original thing was not supposed to end in 2026, it was supposed to go all the way through the exciting stuff, right? Because everyone’s talking about, what about AGI, what about superintelligence, what would that even look like? So I was trying to step-by-step work my way from where we were at the time until things happen and then see what they look like, but I basically chickened out when I got to 2027 because things were starting to happen and the automation loop was starting to take off and it was just so confusing and there was so much uncertainty, so I basically just deleted the last chapter and published what I had up until that point. And that was the blog post.

Dwarkesh Patel

Okay, and then, Scott, how did you get involved in this project?

Scott Alexander

So I was asked to help with the writing, and I was already somewhat familiar with the people on the project, and many of them were kind of my heroes. So, Daniel I knew both because I’d written a blog post about his opinions before, and because I knew about his “What 2026 Looks Like,” which was amazing. And also he had pretty recently made the national news: when he quit OpenAI, they told him he had to sign a non-disparagement agreement or they would claw back his stock options. And he refused, which they weren’t prepared for. It started a major news story, a scandal that ended up with OpenAI agreeing that they were no longer going to subject employees to that restriction.

So people talk a lot about how it’s hard to trust anyone in AI because they all have so much money invested in the hype and getting their stock options better. And Daniel had attempted to sacrifice millions of dollars in order to say what he believed, which to me was this incredibly strong sign of honesty and competence. And I was like, how can I say no to this person? Everyone else on the team, also extremely impressive. Eli Lifland, who’s a member of Samotsvety, the world’s top forecasting team. He has won, like, the top forecasting competition, plausibly described as just the best forecaster in the world, at least by these really technical measures that people use in the superforecasting community. Thomas Larsen, Jonas Vollmer, both really amazing people who have done great work in AI before.

I was really excited to get to work with this superstar team. I have always wanted to get more involved in the actual attempt to make AI go well. Right now, I just write about it. I think writing about it is important, but I don’t know. You always regret that you’re not the person who’s the technical alignment genius who’s able to solve everything. And getting to work with people like these and potentially make a difference just seemed like a great opportunity.

What I didn’t realize was that I also learned a huge amount. I try to read most of what’s going on in the world of AI, but it’s this very low bandwidth thing and getting to talk to somebody who’s thought about it as much as anyone in the world was just amazing. Makes me really understand these things about how AI is going to learn quickly. You need all of this deep engagement with the underlying territory and I feel like I got that.

Dwarkesh Patel

I’ve probably changed my mind towards, against, towards, against, intelligence explosion three, four times in the conversations I’ve had in the lead-up in talking to you and then trying to come up with a rebuttal or something.

Scott Alexander

It wasn’t even just changing my mind, getting to read the scenario for the first time. It obviously wasn’t written up at this point. It was a giant, giant spreadsheet. I’ve been thinking about this for a decade, decade and a half now. And it just made it so much more concrete to have a specific story. Like, oh, yeah, that’s why we’re so worried about the arms race with China. Obviously we would get an arms race with China in that situation. And aside from just the people, getting to read the scenario really sold me. This is something that needs to get out there more.

Forecasting 2025 and 2026

Dwarkesh Patel

Yeah. Okay. Now let’s talk about this new forecast. Because you do a month-by-month analysis of what’s going to happen from here. So what is it that you expect in mid-2025 and the end of 2025 in this forecast?

Scott Alexander

So, [the] beginning of the forecast mostly focuses on agents. We think they’re going to start with agency training, expand the time horizons, get coding going well. Our theory is that they are, to some degree consciously, to some degree accidentally, working towards this intelligence explosion, where the AIs themselves can start taking over some of the AI research, move faster.

So 2025, slightly better coding, 2026, slightly better agents, slightly better coding. And then we focus on, and we name the scenario after 2027 because that is when this starts to pay off. The intelligence explosion gets into full swing; the agents become good enough to help with- at the beginning not really do, but help with- some of the AI research.

So we introduced this idea called the R&D progress multiplier: how many months of progress without the AIs do you get in one month of progress with all of these new AIs helping with the intelligence explosion. So 2027, we start with- I can’t remember if it literally starts with, or by March or something- a five times multiplier for algorithmic progress.
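Editor's note: the multiplier defined here is simple arithmetic. At a multiplier of m, one calendar month with AI help yields m months of progress at the human-only pace. The sketch below is only an illustration under that reading, not the AI 2027 model itself. The function names and the three-month schedule are hypothetical, and only the 5x figure comes from the conversation.

```python
def progress_equivalent_months(multipliers):
    """Total human-only-pace months of progress over a run of calendar
    months, given the multiplier in effect during each calendar month."""
    return sum(multipliers)


def calendar_months_needed(work_months, multiplier):
    """Calendar months needed to do `work_months` of human-only-pace work
    while a constant multiplier is in effect."""
    return work_months / multiplier


# Hypothetical schedule: three calendar months at a constant 5x multiplier.
print(progress_equivalent_months([5.0, 5.0, 5.0]))  # 15.0

# A year of human-only-pace progress at 5x (hypothetical workload).
print(calendar_months_needed(12, 5.0))  # 2.4
```

A changing multiplier is handled by passing a different value for each month, which is how a schedule that ramps up over 2027 could be represented.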

Daniel Kokotajlo

So we have the stats tracked on the side of the story. Part of why we did it as a website is so that you can have these cool gadgets and widgets. And so as you read the story, the stats on the side automatically update. And so one of those stats is the progress multiplier. Another answer to the same question you asked is basically: 2025, nothing super interesting happens, more or less similar trends to what we’re seeing.

Dwarkesh Patel

Computer use is totally solved? Partially solved? How good is computer use by the end of 2025?

Daniel Kokotajlo

My guess is that they won’t be making basic mouse click errors by the end of 2025, like they sometimes currently do. If you watch Claude Plays Pokemon- which you totally should- it seems like sometimes it’s just failing to parse what’s on the screen and it thinks that its own player character is an NPC and gets confused. My guess is that that sort of thing will mostly be gone by the end of this year, but that they still won’t be able to autonomously operate for long periods on their own.

Dwarkesh Patel

But by 2025, when you say it won’t be able to act coherently for long periods of time in computer use, if I want to organize a happy hour in my office, I don’t know, that’s like what a 30 minute task? What fraction of that is, it’s got to invite the right people, it’s got to book the right doordash or something. What fraction of that is it able to do?

Daniel Kokotajlo

My guess is that by the end of this year there’ll be something that can kind of do that, but unreliably. And that if you actually tried to use that to run your life, it would make some hilarious mistakes that would appear on Twitter and go viral, but that the MVP of it will probably exist by this year. Like there’ll be some Twitter thread about someone being like, “I plugged in this agent to like run my party and it worked!”

Scott Alexander

Our scenario focuses on coding in particular because we think coding is what starts the intelligence explosion. So we are less interested in questions of like, “how do you mop up the last few things that are uniquely human” compared to “when can you start coding in a way that helps the human AI researchers speed up their AI research, and then, if you’ve helped them speed up the AI research enough, is that enough to, with some ridiculous speed multiplier- 10 times, 100 times- mop up all of these other things?”

Dwarkesh Patel

One observation I have is, you could have told a story in 2021, once ChatGPT comes out… I think I had friends who were credible AI thinkers who were like, “look, you’ve got the coding agent now, it’s been cracked. Now the GPT-4 will go around and it’ll do all this engineering and we do this RL on top. We can totally scale up the system 100x” and every single layer of this has been much harder than the strongest optimist expected. It seems like there have been significant difficulties in increasing the pre-training size, at least from rumors about failed training runs or underwhelming training runs at labs.

It seems like building up these RL- total outside view, I know nothing about the actual engineering involved here- but just from an outside view it seems like building up the o1 RL clearly took at least two years after GPT-4 was released. And these things are also, their economic impact and the kinds of things you would immediately expect based on benchmarks for them to be especially capable at isn’t overwhelming, like the call center workers haven’t been fired yet. So why not just say look, at higher scale it will probably get even more difficult.

Scott Alexander

Wait a second, I’m a little confused to hear you say that, because when I have seen people predicting AI milestones like Katja Grace’s expert surveys, they have almost always been too pessimistic from a point of view of how fast AI will advance. So I think the 2022 survey, they actually said that things that had already happened would take like 10 years to happen, but then the survey- it might have been 2023, it was like six months before GPT-3, GPT-4, came out. And there were things that GPT-3 or 4 or whichever one of them it was, did, that it did in six months that they were still predicting like five or ten years from. I’m sure Daniel is going to have a more detailed answer, but I absolutely reject the premise that everybody has always been too optimistic.

Daniel Kokotajlo

Yeah, I think in general, most people following the field have underestimated the pace of AI progress and underestimated the pace of AI diffusion into the world. For example, Robin Hanson famously made a bet about less than a billion dollars of revenue I think by 2025 from AI.

Dwarkesh Patel

I agree Robin Hanson in particular has been too pessimistic.

Daniel Kokotajlo

But he’s a smart guy. So I think that the aggregate opinion has been underestimating the pace of both technical progress and deployment. I agree that there have been plenty of people who’ve been more bullish than me and have been already proven wrong, but they’re not me.

Scott Alexander

Wait a second. We don’t have to guess about aggregate opinion, we can look at Metaculus. Metaculus, I think their timeline was like 2050 back in 2020. It gradually went down to like 2040 two or three years ago. Now it’s at 2030, so it’s barely ahead of us. Again, that may turn out to be wrong, but it does look like the Metaculans overall have been too pessimistic, thinking too long term rather than too optimistic. And I think that’s like the closest thing we have to a neutral aggregator where we’re not cherry picking things.

Why LLMs aren’t making discoveries

Dwarkesh Patel

Yeah. I had this interesting experience yesterday. We were having lunch with this senior AI researcher, probably makes on the order of millions a month or something, and we were asking him, “how much are the AIs helping you?” And he said, “in domains which I understand well, and it’s closer to autocomplete but more intense, there it’s maybe saving me four to eight hours a week.”

But then he says, “in domains which I’m less familiar with, if I need to go wrangle up some hardware library or make some modification to the kernel or whatever, where I know less, that saves me on the order of 24 hours a week.” Now, with current models. What I found really surprising is that the help is bigger where it’s less like autocomplete and more like a novel contribution. It’s like a more significant productivity improvement there.

Daniel Kokotajlo

Yeah, that is interesting. I imagine what’s going on there is that a lot of the process when you’re unfamiliar with a domain is like Googling around and learning more about the domain. And language models are excellent because they’ve already read the whole Internet and know all the details.

Dwarkesh Patel

Isn’t this a good opportunity to discuss a certain question I asked Dario that you responded to?

Scott Alexander

What are you thinking of?

Dwarkesh Patel

Well, I asked this question where, as you say, they know all this stuff. I don’t know if you saw this. I asked this question where I said, look, these models know all this stuff. And if a human knew every single thing a human has ever written down on the Internet, they’d be able to make all these interesting connections between different ideas and maybe even find medical cures or scientific discoveries as a result.

There was some guy who noticed that magnesium deficiency causes something in the brain that is similar to what happens when you get a migraine. And so he just said: give people magnesium supplements, and that cured a lot of migraines. So why aren’t they able to leverage this enormous asymmetric advantage they have to make a single new discovery like this?

Scott Alexander

And then the example I gave was that humans also can’t do this. So for me, the most salient example is the etymology of words. You have all of these words in English that are very similar, like ‘happy’ versus ‘hapless’, ‘happen’, ‘perhaps’. And we never think about them unless you read an etymology dictionary and they’re like, oh, obviously these all come from some old root that has to mean ‘luck’ or ‘occurrence’ or something like that.

So it’s kind of about figuring out versus checking. If I tell you those, you’re like, “this seems plausible”. And of course, in etymology, there are also a lot of false friends where they seem plausible but aren’t connected. But you really do have to have somebody shove it in your face before you start thinking about it and make all of those connections.

Dwarkesh Patel

I will actually disagree with this. We know that humans can do this; we have examples of humans doing this. I agree that we don’t have logical omniscience because there is a combinatorial explosion, but we are able to leverage our intelligence to… one of my favorite examples of this is David Anthony, the guy who wrote The Horse, the Wheel, and Language.

He made this super impressive discovery before we had the genetic evidence for it, like a decade before, where he said, look, if I look at all these languages in India and Europe, they all share the same etymology. I mean literally the same etymology for words like ‘wheel’ and ‘cart’ and ‘horse’. And these are technologies that have only been around for the last 6,000 years, which must mean that there was some group that these groups are all, at least linguistically, descended from. And now we have genetic evidence for the Yamnaya, which we believe is this group. You have a blog where you do this. This is your job, Scott! So why shouldn’t we hold the fact that language models can’t do this more against them?

Scott Alexander

Yeah. So to me, it doesn’t seem like he is just sitting there being logically omniscient and getting the answer. It seems like he’s a genius, he’s thought about this for years, probably at some point, he heard a couple of Indian words and a couple of European words at the same time and they kind of connected and the light bulb came on. So this isn’t about having all the information in your memory so much as the normal process of discovery, which is kind of mysterious, but seems to come from having good heuristics and throwing them at things until you kind of get a lucky strike.

My guess is if we had really good AI agents and we applied them to this task, it would look something like a scaffold where it’s like, think of every combination of words that you know of, compare them. If they sound very similar, write it on this scratch pad here. If a lot of words of the same type show up on the scratch pad, that’s pretty strange, do some kind of thinking around it. And I just don’t think we’ve even tried that.

And I think right now if we tried it, we would run into the combinatorial explosion. We would need better heuristics. Humans have such good heuristics that probably most of the things that show up even in our conscious mind, rather than happening on the level of some kind of unconscious processing, are at least the kind of things that could be true. I think you could think of this as like a chess engine. You have some unbelievable number of possible next moves, you have some heuristics for picking out which of those are going to be the right ones. And then gradually you kind of have the chess engine think about it, go through it, come up with a better or worse move, then at some point you potentially become better than humans. I think if you were to force the AI to do this in a reasonable way, or you were to train the AI such that it itself could come up with the plan of going through this in some kind of heuristic-laden way, you could potentially equal humans.
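Editor's note: the scaffold Scott describes (compare every pair of words, write similar ones on a scratch pad, then look for words that keep showing up) can be sketched in a few lines. This is a toy illustration and not anything the speakers built. Spelling similarity from Python's `difflib` stands in for "sounds similar", and the 0.55 threshold, the function names, and the word list are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.55):
    """Crude stand-in for 'sounds similar': spelling overlap ratio."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def build_scratch_pad(words, threshold=0.55):
    """Compare every pair of words and note the pairs that look alike."""
    return [(a, b) for a, b in combinations(words, 2)
            if similar(a, b, threshold)]


def flag_clusters(scratch_pad, min_links=2):
    """Words that appear in several similar pairs deserve a closer look."""
    counts = defaultdict(int)
    for a, b in scratch_pad:
        counts[a] += 1
        counts[b] += 1
    return sorted(w for w, n in counts.items() if n >= min_links)


# Hypothetical word list mixing Scott's examples with unrelated words.
words = ["happy", "hapless", "happen", "perhaps", "wheel", "horse"]
pad = build_scratch_pad(words)
print(pad)
print(flag_clusters(pad))
```

The pairwise loop makes n(n-1)/2 comparisons, so it grows quadratically with vocabulary size. That is a small version of the combinatorial explosion mentioned above, and it is where better heuristics for choosing which pairs to compare would have to come in.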

Daniel Kokotajlo

I’ll add some more things to that. So I think there’s a long and sordid history of people looking at some limitation of the current LLMs and then making grand claims about how the whole paradigm is doomed because they’ll never overcome this limitation. And then a year or two later the new LLMs overcome that limitation.

And I would say that with respect to this thing of “why haven’t they made these interesting scientific discoveries by combining the knowledge they already have and noticing interesting connections?” I would say first of all, have we seriously tried to build scaffolding to make them do this? And I think the answer is mostly no.

Dwarkesh Patel

I think Google DeepMind tried this.

Daniel Kokotajlo

Maybe. Second thing, have you tried making the model bigger? They’ve made it a bit bigger over the last couple years and it hasn’t worked so far. Maybe if they make it even bigger still, it’ll notice more of these connections. And then third thing, and here’s I think the special one: Have you tried training the model to do the thing? The pre-training process doesn’t strongly incentivize this type of connection making.

In general I think it’s a helpful heuristic to remind oneself: what was the AI trained to do? What was its training environment like? And if you’re wondering why hasn’t the AI done this, ask yourself, did the training environment train it to do this? And often the answer is no. And often I think a good explanation for why the AI is not good at it is that it wasn’t trained to do it.

Dwarkesh Patel

I mean it seems like such an economically valuable…

Daniel Kokotajlo

But how would you set up the training environment? Wouldn’t it be really gnarly to try to set up an RL environment to train to make new scientific discoveries?

Dwarkesh Patel

Maybe that’s why you should have longer timelines. It’s a gnarly engineering problem.

Daniel Kokotajlo

Well, in our scenario they don’t just leap from where we are now to solving this problem. They don’t. Instead they just iteratively improve the coding agents until they’ve basically got coding solved. But even still, their coding agents are not able to do some of this stuff. That’s what early 2027, like the first half of 2027 in our story, is basically: they’ve got these awesome automated coders, but they still lack research taste and they still lack maybe organizational skills and stuff.

And so they need to overcome those remaining bottlenecks and gaps in order to completely automate the AI research cycle. But they’re able to overcome those gaps faster than they normally would because the coding agents are doing all the grunt work really fast for them.

Scott Alexander

Yeah, I think it might be useful to think of our timelines as being like 2070, 2100. It’s just that the last 50 to 70 years of that all happened during the year 2027 to 2028, because we are going through this intelligence explosion. Like, I think if I asked you, could we solve this problem by the year 2100? You would say, oh, yeah, by 2100? Absolutely. And we’re just saying that the year 2100 might happen earlier than you expect because we have this research progress multiplier.

Dwarkesh Patel

And then let me just address that in a second. But just one final thought on this thread. To the extent that there’s like a modus ponens, modus tollens thing here, where one thing you could say is like, look: AIs- not just LLMs, but AIs- will have this fundamental asymmetric advantage where they know all this shit. And why aren’t they able to use their general intelligence to use this asymmetric advantage to some enormous capability overhang.

Now, you could infer that same statement by saying, okay, well, once they do have that general intelligence, they will be able to use their asymmetric advantage to make all these enormous gains that humans are in principle less capable of, right? So basically, if you do subscribe to this view that AIs could do all these things if only they had general intelligence, you got to be like, well, once we actually do get the AGI, it’s actually going to be totally transformative because they will have all of human knowledge memorized and they can use that to make all these connections.

Daniel Kokotajlo

I’m glad you mentioned that our current scenario does not really take that into account very much. So that’s an example in which our scenario is possibly underestimating the rate of progress.

Dwarkesh Patel

You’re so conservative, Daniel.

Scott Alexander

This has been my experience working with the team, as I point out, five different things. “Are you sure you’re taking this into account? Are you sure you’re taking this into account?” And first of all, 99% of the time he says, “yes, we have a supplement on it”. But even when he doesn’t say that, he’s like, “yeah, that’s one reason it could go slower than that. Here are 10 reasons it could go faster”.

Daniel Kokotajlo

It’s trying to be sort of like our median guess. So there are a bunch of ways in which we could be underestimating, and there are a bunch of ways in which we could be overestimating. And we’re going to hopefully continue to think more about this afterwards and continue to iteratively refine our models and come up with better guesses and so forth.

Debating intelligence explosion

Dwarkesh Patel

So if I look back at AI progress in the past, if we were back in, say, 2017. Suppose we had these superhuman coders in 2017; the amount of progress we’ve made since then, so where we are currently in 2025, by when could we have had that instead?

Daniel Kokotajlo

Great question. We’d still have to stumble through all the discoveries that we’ve made since 2017. We still have to figure out that language models are a thing, we still have to figure out that you can fine tune them with RL.

So all those things would still have to happen. How much faster would they happen? Maybe 5x faster, because a lot of the small scale experiments that these people do in order to test out ideas really quickly before they do the big training runs would happen much faster because they’re just lickety-split being spit out. I’m not very confident in that 5x number, it could be lower, it could be higher, but that was roughly what we were guessing.

Our 5x, by the way, is for the algorithmic progress part, not for the overall thing. So in this hypothetical, according to me, basically things would be going 2.5x faster, where the algorithms would be advancing at 5x speed, but the compute is still stuck at the usual speed.
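Editor's note: the conversation does not say how a 5x algorithmic speedup combined with 1x compute becomes roughly 2.5x overall. The sketch below shows two simple ways such a blend could be computed, purely as an illustration. Both formulas and the equal 50/50 weighting are assumptions, not the authors' method. With these assumed weights, the quoted 2.5x falls between the two results.

```python
def linear_blend(algo_speedup, compute_speedup, algo_share):
    """Weighted arithmetic mean of the two speedups (assumed formula)."""
    return algo_share * algo_speedup + (1 - algo_share) * compute_speedup


def geometric_blend(algo_speedup, compute_speedup, algo_share):
    """Weighted geometric mean of the two speedups (assumed formula)."""
    return algo_speedup ** algo_share * compute_speedup ** (1 - algo_share)


# 5x algorithms and 1x compute are from the conversation;
# the 0.5 share for algorithms is a hypothetical weighting.
print(linear_blend(5.0, 1.0, 0.5))               # 3.0
print(round(geometric_blend(5.0, 1.0, 0.5), 2))  # 2.24
```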

Dwarkesh Patel

That seems plausible to me. You have a 5x at some point, and then dot dot dot, you have 1000x AI progress within the matter of a year. Maybe that’s the part I’m like, wait, how did that happen exactly? So what’s the story there?

Daniel Kokotajlo

The way that we did our takeoff forecast, which we’ll get to in a second, was basically by breaking down how we think the intelligence explosion would go into a series of milestones. First you automate the coding, then you automate the whole research process, but in a very similar way to how humans do it with teams of agents that are about human level, then you get to superhuman level and so forth.

So we broke it down into these milestones, you know, the superhuman coder, superhuman AI researcher, and then super intelligent AI researcher. And the way we did our forecast was, for each of these milestones, we were like, what is it going to take to make an AI that achieves that milestone? And then once you do achieve that milestone, how much is your overall speedup? And then what’s it going to take to achieve the next milestone? Combine that with the overall speed up and that gets you your clock time distance until that happens and then, okay, now you’re at that milestone. What’s your overall speed up? Assuming that you have that milestone also, what’s the next one? How long does it take to get to the next one? So we sort of work through it bit by bit, and at each stage we’re just making our best guesses.

So quantitatively we were thinking something like 5x speedup to algorithmic progress from the superhuman coder, and then something like a 25x speedup to algorithmic progress from the superhuman AI researcher. Because at that point you’ve got the whole stack automated, which I think is substantially more useful than just automating the coding. And then I forget what we say for a super intelligent AI researcher, but off the top of my head it’s probably something in the hundreds or maybe like 1000x overall speed up.
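Editor's note: the milestone-by-milestone procedure Daniel describes can be sketched as a running sum. For each milestone, estimate the work remaining at the human-only pace, divide by the speedup in effect while that work is being done, and add the result to the clock. This is a minimal sketch under that reading, not the actual AI 2027 takeoff model. The 5x and 25x speedups are the ones quoted for algorithmic progress, while the work estimates (2, 2, and 5 years) and the 1x starting speed are hypothetical placeholders.

```python
def clock_time_to_milestones(stages):
    """stages: list of (name, work_years_at_human_pace, speedup) tuples,
    where `speedup` applies while working toward that milestone.
    Returns a list of (name, calendar_years_elapsed) pairs."""
    elapsed = 0.0
    timeline = []
    for name, work_years, speedup in stages:
        elapsed += work_years / speedup  # faster research shrinks clock time
        timeline.append((name, elapsed))
    return timeline


# Work figures are made up; each stage runs at the speedup unlocked by
# the previous milestone (1x before the superhuman coder exists).
stages = [
    ("superhuman coder", 2.0, 1.0),
    ("superhuman AI researcher", 2.0, 5.0),
    ("superintelligent AI researcher", 5.0, 25.0),
]

for name, years in clock_time_to_milestones(stages):
    print(f"{name}: {years:.1f} calendar years from start")
```

With these placeholder inputs, nine years of human-pace work compresses into under three calendar years, and most of that clock time is spent before the first milestone. That is the general shape of the argument, whatever the real inputs turn out to be.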

Dwarkesh Patel

So maybe the big picture thing I have with the intelligence explosion is… we can go through the specific arguments about how much will the automated coder be able to do, and how much will the superhuman AI coder be able to do. But on priors, it’s just such a wild thing to expect.

And so, before we get into all the specific arguments, maybe you can just address this idea that, why not just start off with 0.01% chance this thing might happen? Then you need extremely, extremely strong evidence that it will before making that your modal view.

Scott Alexander

I think that it’s a question of what is your default option or what are you comparing it to. I think that naively people think like, well, every particular thing is potentially wrong. So let’s just have a default path where nothing ever happens. And I think that that has been the most consistently wrong prediction of all. Like, I think in order to have nothing ever happen, you actually need a lot to happen. Like you need suddenly AI progress that has been going at this constant rate for so long stops. Why does it stop?

Well, we don’t know. Whatever claim you’re making about that is something where you would expect there to be a lot of out-of-model error, where you would expect that somebody must be making a pretty definite claim that you want to challenge. So I don’t think there’s a neutral position where you can just say, well, given that out-of-model error is really high and we don’t know anything, let’s just choose that. I think we are trying to take- I know this sounds crazy because if you read our document, all sorts of bizarre things happen. It’s probably the weirdest couple of years that have ever been. But we’re trying to take almost in some sense a conservative position where the trends don’t change, nobody does an insane thing, nothing that we have no evidence to think will happen happens. And the way that the AI intelligence explosion dynamics work are just so weird that in order to have nothing happen, you need to have a lot of crazy things happen.

Daniel Kokotajlo

One of my favorite meme images is this graph showing world GDP over time. You’ve probably seen it, it spikes up and then there’s a little thought bubble at the top of the spike in 2010 or something. And the thought bubble says, “my life is pretty normal, I have a good grasp of what’s weird versus standard and people thinking about different futures with digital minds and space travel are just engaging in silly speculation”.

The point of the graph is, actually there’s been amazing transformative changes in the course of history that would have seemed totally insane to people multiple times. We’ve gone through multiple such waves of those things.

Scott Alexander

Everything we’ve talked about has happened before. Algorithmic progress already doubles every year or so. So it’s not insane to think that algorithmic progress can contribute to these compute things. In terms of general speedup, we’re already at like a thousand times research speedup, multiplier compared to the Paleolithic or something. So from the point of view of anyone in most of history, we are going at a blindingly insane pace. And all that we’re saying here is that it’s not going to stop.

The same trend has caused us to have a thousand times speedup multiplier relative to past eras, and not even just the Paleolithic. Like, what happened in the century between, I don’t know, 600 and 700 A.D.? I’m sure there are things, I’m sure historians could point them out. Then you look at the century between 1900 and 2000 and it’s just completely qualitatively different.

Of course there are models of whether that stagnated recently or what’s going on here. We can talk about those, we can talk about why we expect the intelligence explosion to be an antidote to that kind of stagnation. But nothing we’re saying is that different from what has already happened.

Dwarkesh Patel

I mean, you are saying that these previous transitions have been smoother than the one you were anticipating.

Scott Alexander

We’re not sure about that, actually. So one of these models is just a hyperbola. Everything is along the same curve. Another model is that there are these things like the literal Cambrian explosion. If you want to take this very far back, go full Ray Kurzweil. The literal Cambrian explosion, the agricultural revolution, the industrial revolution, as phase changes.

When I look at the economic modeling of this, my impression is the economists think that we don’t have good enough data to be sure whether this is all one smooth process or whether it’s a series of phase changes. When it is one smooth process, the smooth process is often a hyperbola that shoots to infinity in weird ways. We don’t think it’s going to shoot to infinity. We think it’s going to hit bottlenecks again.
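To make the "hyperbola that shoots to infinity" point concrete, here is a minimal Python sketch contrasting ordinary exponential growth with hyperbolic growth. It is a toy illustration only, not the economic models Scott is referring to: the equation dx/dt = k·x², the function names, and the parameter values are all assumptions chosen for the example.

```python
import math


def exponential(x0: float, rate: float, t: float) -> float:
    """Solution of dx/dt = rate * x: the growth rate stays constant."""
    return x0 * math.exp(rate * t)


def blowup_time(x0: float, k: float) -> float:
    """Finite time at which the hyperbolic solution diverges."""
    return 1.0 / (k * x0)


def hyperbolic(x0: float, k: float, t: float) -> float:
    """Solution of dx/dt = k * x**2: the growth rate rises with x itself.

    The closed form is x(t) = x0 / (1 - k * x0 * t), which goes to
    infinity as t approaches blowup_time(x0, k).
    """
    if t >= blowup_time(x0, k):
        return math.inf
    return x0 / (1.0 - k * x0 * t)


if __name__ == "__main__":
    # Hypothetical parameters: start at 1.0 with k = 0.1, so blow-up is at t = 10.
    for t in (0.0, 5.0, 9.0, 9.9):
        print(t, exponential(1.0, 0.1, t), hyperbolic(1.0, 0.1, t))
```

The exponential curve never reaches infinity, while the hyperbolic one does so at a finite date unless something interrupts it, which is where the bottlenecks discussed next come in.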

Dwarkesh Patel

You guys are the conservative crowd, you know?

Scott Alexander

We think it’s going to hit bottlenecks the same as all these previous processes. The last time this hit a bottleneck, if you take the hyperbola view, is in, like, 1960, when humans stopped reproducing at the same rate they were reproducing before. We hit a population bottleneck, the usual population-to-ideas flywheel stopped working, and then we stagnated for a while.

If you can create a country of geniuses in a data center, as I think Dario Amodei put it, then you no longer have this population bottleneck, and you’re just expecting continuation of those pre-1960 trends. So I realize all of these historical hyperbolas are also kind of weird, also kind of theoretical, but I don’t think we’re saying anything that there aren’t models for, models which have previously seemed to work for long historical periods.

Daniel Kokotajlo

Another thing also is, I think people equivocate between slow and continuous, right? So if you look at our scenario, there’s this continuous trend that runs through the whole thing of this algorithmic progress multiplier. And we’re not having discrete jumps from like 0 to 5x to 25x. We have this continuous improvement. So I think continuous is not the crux. The crux is like, is it going to be this fast? You know, and we don’t know, maybe it’ll be slower, maybe it’ll be faster. But we have our arguments for why we think maybe this fast.

Dwarkesh Patel

Okay, now that we’ve brought up the intelligence explosion, let’s discuss that, because I’m kind of skeptical. It doesn’t really seem to me that a notable bottleneck to AI progress, or the main bottleneck to AI progress, is the number of researchers and engineers who are doing this kind of research. It seems more like compute or some other thing is a bottleneck. And the piece of evidence is that when I talk to my AI researcher friends at the labs, they say there’s maybe 20 to 30 people on the core pre-training team that’s discovering all these algorithmic breakthroughs.

If the headcount here was so valuable you would think that, for example, Google DeepMind would take not just all their smartest people, not just from DeepMind but all of Google and just put them on pre-training or RL or whatever the big bottleneck was. You’d think OpenAI would hire every single Harvard math PhD and in six months you’re all going to be trained up on how to do AI research. I know they’re increasing headcount, but they don’t seem to treat this as the kind of bottleneck that it would have to be for millions of them in parallel to be rapidly speeding up AI research.

There’s this quote, “one Napoleon is worth 40,000 soldiers,” which was commonly said when he was fighting. But 10 Napoleons is not 400,000 soldiers. Right? So why think that these million AI researchers are netting you something that looks like an intelligence explosion?

Daniel Kokotajlo

So previously I talked about three stages of our takeoff model. First is you get the superhuman coder. Second is when you’ve fully automated AI R&D, but it’s still at basically human level, it’s as good as your best humans. And then third is now you’re in superintelligence territory and it’s qualitatively better.

In our guesstimates of how much faster algorithmic progress would be going, the progress multiplier for the middle level, we basically do assume that you get massive diminishing returns to having more minds running in parallel. And so we totally buy all of that.

Scott Alexander

Yeah. And then I think the addition to that is the question, then, why do we have the intelligence explosion? And the answer is: combination of that speed up and the speed up in serial thought speed.

Daniel Kokotajlo

And also the research taste thing. Here are some important inputs to AI R&D progress today: research taste. So the quality of your best researchers, the people who are managing the whole process, their ability to learn from data and make more efficient use of the compute by running the right experiments instead of flailing around running a bunch of useless experiments. That’s research taste.

Then there’s the quantity of your researchers, which we just talked about. Then there’s the serial speed of your researchers, which currently is all the same because they’re all humans and so they all run at basically the same serial speed. And then finally there’s how much compute you have for experiments. So what we’re imagining is that basically serial speed starts to matter a bunch because you switch to AI researchers that have orders of magnitude more serial speed than humans. But it tops out; we think that over the course of our scenario, if you look at our sliding stats chart, it goes from 20x to 90x or something over the course of the scenario, which is important, but not huge.

And also we think that once you start getting 90x serial speed, you’re just bottlenecked on the other stuff and so additional improvements in serial speed basically don’t help that much. With respect to the quantity of course, yeah, we’re imagining you get hundreds of thousands of AI agents, a million AI agents, but that just means you’d be bottlenecked on the other stuff. You’ve got tons of parallel agents, that’s no longer your bottleneck. What do you get bottlenecked on? Taste and compute.

So by the time it’s mid-2027 in our story, when they’ve fully automated the AI research, there are basically two things that matter. What’s the level of taste of your AIs, how good are they at learning from the experiments that you’re doing? And then how much compute do you have for running those experiments? And that’s the sort of core setup of our model. And when we get our 25x multiplier, it’s starting from those premises.
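The claim that extra serial speed "basically doesn’t help that much" once other inputs bind can be illustrated with Amdahl’s law. To be clear, this is not the AI 2027 takeoff model, which is not specified in this conversation; it is a generic Python sketch, and the 90% "thinking" share below is a made-up number for illustration. Only the 20x and 90x serial speeds come from the discussion above.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: total speedup when only part of the work gets faster.

    accelerated_fraction: share of the original time spent on the part
        that speeds up (here, serial thinking). Hypothetical input.
    factor: how many times faster that part becomes (serial speed).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction  # e.g. waiting on experiment compute
    return 1.0 / (remaining + accelerated_fraction / factor)


if __name__ == "__main__":
    # Assumption for illustration: 90% of research time is serial thinking,
    # 10% is waiting on experiments that serial speed cannot accelerate.
    for serial_speed in (20.0, 90.0, 1000.0):
        print(serial_speed, round(overall_speedup(0.9, serial_speed), 2))
```

Under that assumed split the overall gain can never exceed 10x however fast the agents think, which is the shape of the argument for why taste and compute become the binding inputs.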

Dwarkesh Patel

Is there some intuition pump from history where there’s been some output and because of some really weird constraints, production of it has been rapidly skewed along one input, but not all the inputs that have been historically relevant and you still get breakneck progress.

Daniel Kokotajlo

Possibly the Industrial Revolution. I’m just extemporizing here, I hadn’t thought about this before, but as Scott’s famous post that was hugely influential to me a decade ago talks about, there’s been this decoupling of population growth from overall economic growth that happened with the Industrial Revolution. And so in some sense, maybe you could say that’s an example of previously these things grew in tandem. More population, more technology, more farms, more houses, et cetera. Your capital infrastructure and your human infrastructure was going up together, but then we got the industrial revolution and they started to come apart.

And now all the capital infrastructure was growing really fast compared to the human population size. I think I’m imagining something maybe similar happening with algorithmic progress. And again with population, population still matters a ton today. In some sense progress is bottlenecked on having larger populations and so forth. But it’s just that the population growth rate is just inherently kind of slow and the growth rate of capital is much faster. And so it just comes to be a bigger part of the story.

Dwarkesh Patel

Maybe the reason that this sounds less plausible to me than the 25x number implies is that when I think about concretely what that would look like, where you have these AIs and we know that there’s a gap in data efficiency between human brains and these AIs. And so somehow there’s a lot of them thinking and they think really hard and they figure out how to define a new architecture that is like the human brain or has the advantages of the human brain. And I guess they can still do experiments, but not that many.

Part of me just wonders, what if you just need an entirely different kind of data source that’s not like pre-training for that, but they have to go out in the real world to get that. Or maybe it needs to be an online learning policy where they need to be actively deployed in the world for them to learn in this way. And so you’re bottlenecked on how fast they can be getting real world data. I just think it’s hard…

Daniel Kokotajlo

So we are actually imagining online learning happening.

Dwarkesh Patel

Oh really?

Daniel Kokotajlo

Yeah. But not so much real world as in… the thing is that if you’re trying to train your AIs to do really good AI R&D, then the AI R&D is happening on your servers. And so you can have this loop of: you have all these AI agents autonomously doing AI R&D, doing all these experiments, et cetera, and then they’re like online learning to get better at doing AI R&D based on how those experiments go.

Dwarkesh Patel

But even in that scenario alone, I can imagine bottlenecks like, oh, you had a benchmark and it got reward hacked for what constitutes AI R&D because you obviously can’t have… maybe you would, but is it as good as a human brain? It’s just like such an ambiguous thing you’d have. Right now we have benchmarks that get reward hacked, right?

Daniel Kokotajlo

But then they autonomously build new benchmarks. I think what you’re saying is maybe this whole process just goes off the rails due to lack of contact with ground truth outside in the actual world, outside the data centers. Maybe? Again, part of my guess here is that a lot of the ground truth that you want to be in contact with is stuff that’s happening on the data centers, things like how fast are you improving on all these metrics, and you have these vague ideas for new architectures, but you’re struggling to get them working. How fast can you get them working?

And then separately, insofar as there is a bottleneck of talking to people outside and stuff, well they are still doing that. And once they’re fully autonomous, they can even do that much faster. You can have all the million copies connected to all these various real world research programs and stuff like that. So it’s not like they’re completely starved for outside stuff.

Dwarkesh Patel

What about the skepticism that, look, what you’re suggesting with this hyper efficient hive mind of AI researchers, no human bureaucracy has just out of the gate worked super efficiently, especially one where they don’t have experience working together. They haven’t been trained to work together, at least yet. And there hasn’t been this outer loop RL on like, “we ran a thousand concurrent experiments of different AI bureaucracies doing AI research and this is the one that actually worked best”.

And the analogy I’d use maybe is to humans in the savannah 200,000 years ago. We know they have a bunch of advantages over the other animals already at this point, but the things that make us dominant today, joint stock corporations, state capacity, this fossil-fueled civilization we have, took so much cultural evolution to figure out. You couldn’t just have figured it out in the savannah like, “oh, if we had built these incentive systems and we issued dividends, then we could really collaborate here” or something.

Why not think that it will take a similar process of huge population growth, huge social experimentation, and upgrading of the technological base of the AI society before they can organize this hypermind collective, which will enable them to do what you imagine an intelligence explosion looks like?

Scott Alexander

Yeah, you’re comparing it kind of to two different things. One of them is literal genetic evolution in the African savannah, and the other is the cultural evolution that we’ve gone through since then. And I think there will be AI equivalents to both. So the literal genetic evolution is that our minds adapted to be more amenable to cooperation during that time.

So I think the companies will be very literally training the AIs to be more cooperative. I think there’s more opportunity for pliability there. Because humans were, of course, evolving under this genetic imperative that we want to pass on our own genetic information, not somebody else’s genetic information. You have things like kin selection that are kind of exceptions to that, but overall it’s the rule.

In animals that don’t have that, like eusocial insects, then you very quickly get, just through genetic evolution, without cultural evolution, extreme cooperation. And with eusocial insects, what’s going on is that they all have the same genetic code, they all have the same goals. And so the training process of evolution kind of yokes them to each other in these extremely powerful bureaucracies.

We do think that the AI will be closer to the eusocial insects in the sense that they all have the same goals, especially if these aren’t indexical goals, they’re goals like “have the research program succeed”. So that’s going to be changing the weights of each individual AI, I mean, before they’re individuated, but it’s going to be changing the weights of the AI class overall to be more amenable to cooperation.

And then, yes, you do have cultural evolution. Like you said, this takes hundreds of thousands of individuals. We do expect there will be these hundreds of thousands of individuals. It takes decades and decades. Again, we expect this research multiplier such that decades of progress happen within this one year, 2027 or 2028. So I think between the two of these, it is possible.

Daniel Kokotajlo

Maybe this is also where the serial speed actually does matter a lot. Because if they’re running at 50x human speed, then that means you can have a year of subjective time happen in a week of real time. And so these sorts of large-scale cooperative dynamics, like your moral maze, where you have an institution but then it becomes like a moral maze and it sort of collapses under its own weight and stuff like that, there actually is time for them to play that out multiple times and then train on it, tinker with the structure, and add it to the training process over the course of 2027.

Scott Alexander

Also, they do have the advantage of all the cultural technology that humans have evolved so far. This may not be perfectly suited to them, it’s more suited to humans. But imagine that you have to make a business out of you and your hundred closest friends who you agree with on everything. Maybe they’re literally your identical twin, they have never betrayed you, ever, and never will. I think this is just not that hard a problem.

Daniel Kokotajlo

Also, again, they are starting from a higher floor, they’re starting from human institutions. You can literally have a slack workspace for all the AI agents to communicate. And you can have a hierarchy with roles. They can borrow quite a lot from successful human institutions.

Dwarkesh Patel

I guess the bigger the organization, even if everybody is aligned- I think some of your responses addressed whether they will be aligned on goals. I mean, you did address the whole thing, but I would just point this out; that is not the part I’m skeptical of. I am more skeptical of just, even if you’re all aligned and want to work together, do you fundamentally understand how to run this huge organization. And you’re doing it in ways that no human has had to before. You’re getting copied incessantly, you’re running extremely fast, you know what I’m saying?

Daniel Kokotajlo

I think that’s totally reasonable.

Dwarkesh Patel

And so it’s a complicated thing. And I’m just not sure why you think we build this bureaucracy, or the AIs build this bureaucracy, within this matter of…

Daniel Kokotajlo

So we depict it happening over the course of six to eight months or something like that in 2027, would you say twice as long, five times as long, 10 times as long?

Dwarkesh Patel

Five years?

Daniel Kokotajlo

So five years, if they’re going at 50x serial speed, then five years is what? Like 250 years of serial time for the AIs, which to me feels like more than enough to really sort out this sort of stuff. You’ll have time for sort of like empires to rise and fall, so to speak, and all of that to be added to the training data and yeah. But I could see it taking longer than we depict. Maybe instead of six months, it’ll be like 18 months, you know, but also maybe it could be two months.
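The serial-speed arithmetic in this exchange is simple enough to check directly. A small Python sketch follows; the function names are made up for illustration, and a year is approximated as 52 weeks.

```python
WEEKS_PER_YEAR = 52.0  # approximation used for the conversion below


def subjective_years(wall_clock_years: float, serial_speedup: float) -> float:
    """Years of serial thinking experienced by AIs running `serial_speedup` times faster."""
    return wall_clock_years * serial_speedup


def wall_clock_weeks_per_subjective_year(serial_speedup: float) -> float:
    """Real-world weeks needed for one subjective year at the given speedup."""
    return WEEKS_PER_YEAR / serial_speedup


if __name__ == "__main__":
    print(subjective_years(5.0, 50.0))                  # Dwarkesh's five years at 50x
    print(wall_clock_weeks_per_subjective_year(50.0))   # roughly one week per subjective year
    print(subjective_years(0.5, 50.0))                  # six months at 50x
```

At 50x, five wall-clock years come to 250 subjective years, matching Daniel’s figure, and six months comes to 25 subjective years.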

Scott Alexander

So when I think of the ways that they train AIs, I think in our scenario at this point there are two primary ways that they’re doing it. One of them is just continuing the next token prediction work. So these AIs will have access to all human knowledge, they will have read management books in some sense, they’re not starting blind. There is going to be something like: predict how Bill Gates would complete this next character or something like that.

And then there's reinforcement learning in virtual environments. So get a team of AIs to play some multiplayer game. I don’t think you would use one of the human ones because you would want something that was better suited for this task. But just running them through these environments again and again, training on the successes, training against the failures, kind of combining those two kinds of things.

To me it does not seem like the same kind of problem as inventing all human institutions from the Paleolithic onward. It just seems like applying those two things.

Can superintelligence actually transform science?

Dwarkesh Patel

The other notable thing about your model is, you’ve got this superhuman thing at the end of it and then it seems to just go through the tech tree of mirror life and nanobots and whatever crazy stuff. And maybe that part I’m also really skeptical of. If you look at the history of invention, it just seems like people are just trying different random stuff, often even before the theories about how that industry works or how the relevant machinery works are developed. The steam engine was developed before the theory of thermodynamics, and the Wright brothers seemed like they were just experimenting with airplanes. And invention is often influenced by breakthroughs in totally different fields.

Which is why you have this pattern of parallel innovation, because the background level of tech is at a point at which you can do this experiment. Machine learning itself is a place where this happened, right? Where people had these ideas about how to do deep learning or something. But it just took a totally unrelated industry of gaming to make the relevant progress, to get the whole, basically the economy as a whole advanced enough that deep learning, Geoffrey Hinton’s ideas could work. So I know we’re accelerating way into the future here, but I want to get to this crux.

Daniel Kokotajlo

So again, we have that three-part division of the superhuman coder, then the complete AI researcher, and then the superintelligence; you’re now jumping ahead to that one. So now we’re imagining systems that are true superintelligence, they are just better than the best humans at everything, including being better at data efficiency and better at learning on the job and stuff like that.

Now, our scenario does depict a world in which they’re bottlenecked on real world experience and that sort of thing. I think that if you want a contrast, some people in the past have proposed much faster scenarios where they email some cloud lab and start building nanotech right away by just using their brains to figure out appropriate protein folding and stuff like that. We are not depicting that in our scenario. In our scenario, they are in fact bottlenecked on lots of real world experience to build these actual practical technologies, but the way they get that is they just actually get that experience and it happens faster than humans would. And the way they do that is they’re already super intelligent, they’re already buddy-buddy with the government, the government deploys them heavily in order to beat China and so forth, and so all these existing US companies and factories and military procurement providers and so forth are all chatting with the superintelligences and taking orders from them about how to build the new widget and test it, and they’re downloading super intelligent designs and manufacturing them and then testing them and so forth.

And then the question is, they are getting this experience, they’re learning on the job, quantitatively, how fast does this go? Is it taking years or is it taking months or is it taking days? In our story, it takes about a year and we’re uncertain about this. Maybe it’s going to take several years, maybe it’s going to take less than a year. Here are some factors to consider for why it’s plausible that it could take a year:

One, you’re going to have something like a million of them. And quantitatively that’s comparable in size to the existing scientific industry. I would say, like maybe it’s a bit smaller, but it’s not dramatically smaller.

Two, they’re thinking a lot faster. They’re thinking at like 50 times speed or like 100 times speed. That, I think, counts for a lot.

And then three, which is the biggest thing, they’re just qualitatively better as well. So not only are there lots of them and they’re thinking very fast, but they are better at learning from each experiment than the best human would be at learning from that experience.

Dwarkesh Patel

Yeah, I think the fact that there’s a million of them or the fact that they’re comparable to maybe the size of this key researcher population of the world or something. I think there’s more than a million researchers in the world, but…

Daniel Kokotajlo

Well, but it’s very heavy tailed. Like a lot of the research actually comes from the best ones.

Dwarkesh Patel

But it’s not clear to me that most of the new stuff that is developed is a result of this researcher population. I mean, there’s just so many examples in the history of science where a lot of growth or productivity is just the result of, how do you count the guy at the TSMC process who figures out a different way to…

Scott Alexander

I actually argued with Daniel about this recently. One interesting case that I can go over is that we have an estimate that about a year after the superintelligences start wanting robots, they’re producing a million units of robots per month. I think that’s pretty relevant because you have, I think it’s Wright’s law, which is that efficiency on a process improves by a roughly constant fraction with every doubling of the number of copies produced.

So if you’re producing a million of something, you’re probably getting very, very good at it. So the question we were arguing about is, can you produce a million units a month after a year. And for context, I think Tesla produces like a quarter of that in terms of cars or something. This is an amazing scale up in a year.
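Wright’s law is usually written as a power law: each doubling of cumulative output cuts unit cost by a fixed fraction. The Python sketch below shows that form. The 20% learning rate and the starting cost of 100 are assumptions picked for illustration; neither number comes from this conversation or from AI 2027.

```python
import math


def unit_cost(first_unit_cost: float, cumulative_units: float, learning_rate: float) -> float:
    """Wright's law: cost of the n-th unit after `cumulative_units` have been built.

    learning_rate is the fractional cost drop per doubling of cumulative
    output (0.2 means each doubling leaves 80% of the previous cost).
    """
    if cumulative_units < 1:
        raise ValueError("cumulative_units must be at least 1")
    if not 0.0 <= learning_rate < 1.0:
        raise ValueError("learning_rate must be in [0, 1)")
    exponent = math.log2(1.0 - learning_rate)  # negative for any positive learning rate
    return first_unit_cost * cumulative_units ** exponent


def doublings(cumulative_units: float) -> float:
    """Number of doublings needed to go from 1 unit to `cumulative_units`."""
    return math.log2(cumulative_units)


if __name__ == "__main__":
    # Hypothetical learning rate of 20% per doubling, first unit costing 100.
    for n in (1, 2, 4, 1_000_000):
        print(n, round(doublings(n), 1), unit_cost(100.0, n, 0.2))
```

A million units is about 20 doublings from the first one, so even a modest per-doubling improvement compounds into a very large efficiency gain, which is the sense in which you get "very, very good at it."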

Daniel Kokotajlo

It’s only 4x. Also just for Tesla.

Scott Alexander

Yeah. And the argument that we went through was something like, so it’s got to first get factories. OpenAI is already worth more than all of the car companies in the US except Tesla combined. So if OpenAI today wanted to buy all the car factories in the U.S. except Tesla, start using them to produce humanoid robots, they could. Obviously not a good value proposition today, but it’s just obvious and overdetermined that in the future, when they have superintelligence and they want them, they can start buying up a lot of factories. How fast can they convert these car factories to robot factories?

So, [the] fastest conversion we were able to find in history was World War II. They suddenly wanted a lot of bombers, so they bought up- in some cases bought up, in other cases got- the car companies to produce new factories, but they bought up the car factories, converted them to bomber factories. That took about three years from the time when they first decided to start this process to the time when the factories were producing a bomber an hour.

We think it will potentially take less with superintelligence, because first of all, if you look at the history of this process, despite this being the fastest anybody has ever done this, it was actually kind of a comedy of errors. They made a bunch of really silly mistakes in this process. Here you actually have something that just doesn’t have the normal human bureaucratic problems. And we do think that this will be done in the middle of an arms race with China, so the government will be kind of moving things through, and then the superintelligences will be good at the logistical issues, navigating bureaucracies.

So we estimated maybe if everything goes right, we can do this three times faster than the bomber conversions in World War II. So that’s about a year.

Dwarkesh Patel

I’m assuming the bombers were just much less sophisticated than the humanoid robots.

Scott Alexander

Yeah, but the bomber factories of that time were also much less sophisticated than the car factory.

Dwarkesh Patel

Yeah, but I would assume the conversion speed is also... Maybe to give one hypothetical here right now, let’s just say biomedicine as an example of one of the fields you’d want to accelerate, and whenever these CEOs get on podcasts, they’re often talking about curing cancer and so forth. And it seems like a big thing these frontier biomedical research facilities are excited about is the virtual cell.

Now, the virtual cell, it takes a tremendous amount of compute, I assume, to train these DNA foundation models and to do all the other computation necessary to simulate a virtual cell. If it is the case that the cure for Alzheimer’s and cancer and so forth is bottlenecked by the virtual cell, then if you had a million superintelligences in the 60s and you asked them, “cure cancer for me,” they would have to solve making GPUs at scale, which would require solving all kinds of interesting physics and chemistry problems, materials science problems, building processes, building fabs for computing, and then going through 40 years of making more and more efficient fabs, doing all of Moore’s Law from scratch.

And that’s just one technology. And it just seems like you just need this broad scale. The entire economy needs to be upgraded for you to cure cancer in the 60s just because you need the GPUs to do the virtual cell, assuming that’s the bottleneck.

Scott Alexander

First of all, I agree: if there’s only one way to do something, that makes it much harder, and maybe that one way takes very long. We’re assuming that there may be more than one way to cure cancer, more than one way to do all of these things, and they’ll be working on finding the one that is least bottlenecked. Part of the reason- I realize I spent too long talking about that robot example, but we do think that they’re going to be getting a lot of physical world things done very quickly. Once you have a million robots a month, you can actually do a lot of physical world experiments.

We look at examples of people trying to get entire economies off the ground very quickly. So for example, China post-Deng, I don’t know. Would you have predicted that 20, 30 years after being kind of a communist basket case, they can actually be doing this really cutting edge bio research? I realize that’s a much weaker thing than we’re positing, but it was done just with the human brain with a lot fewer resources than we’re talking about.

Same issue with, let’s say, Elon Musk and SpaceX. I think in the year 2000 we would not have thought that somebody could move two times, five times faster than NASA with pretty limited resources. They were able to get, I think, a lot more years of technological advance in than we would have expected. Partly that’s just because Elon is crazy and never sleeps. If you look at the examples of things from SpaceX, he is breathing down every worker’s neck being like, what’s this part? How fast is this part going? Can we do this part faster? And the limiting factor is basically hours in Elon’s day, in the sense that he cannot be doing that with everybody.

Dwarkesh Patel

Super intelligence is not even that smart. It just yells at every single worker.

Scott Alexander

Yeah, I mean, that is kind of my model: we have something which is smarter than Elon Musk, better at optimizing things than Elon Musk. We have 10,000 parts in a rocket supply chain. How many of those parts can Elon personally yell at people to optimize? We could have a different copy of the superintelligence optimizing every single part full-time. I think that’s just a really big speedup.

Dwarkesh Patel

I think both of those examples don’t work in your favor. I think the China growth miracle could not have occurred if not for their ability to copy technology from the west and I don’t think there’s a world in which they… China has a lot of really smart people, it’s a big country in general. Even then I think they couldn’t have just divined how to make airplanes after becoming a communist hell basket, right?

The AIs cannot just copy nanobots from aliens, it’s got to make them from scratch. And then on the Elon example, it took them two decades of countless experiments, failing in weird ways you would not have expected. And still, rocketry we’ve been doing since the 60s, maybe actually World War II, and then just getting from a small rocket to a really big rocket took two decades of all kinds of weird experiments, even with the smartest and most competent people in the world.

Daniel Kokotajlo

So you’re focusing on the nanobots, I want to ask a couple questions. One, what about just the regular robots? And then two, what would your quantities be for all of these things? So first, what about the regular robots? Yeah, nanobots are presumably a lot harder to make than regular robot factories. And in our story they happen later. It sounds like right now you’re saying even if we did get the whole robot factory thing going, it would still take a ton of additional full-economy, broad automation for a long time to get to something like nanobots. That’s totally plausible to me. I could totally imagine that happening. I don’t feel like the scenario particularly depends on that final bit about getting the nanobots. They don’t actually really make any difference to the story.

The robot economy does sort of make a difference because there are two branch endings, as you know. And in one of the endings, the AIs end up misaligned and end up taking over. And it’s an important strategic change when the AIs are self-sufficient and totally in charge of everything and they don’t actually need the humans anymore. And so what I’m interested in is, when has the robot economy advanced to the point where they don’t really depend on humans? So quantitatively, what would your guess for that be?

If hypothetically we had the army of superintelligences in early 2028, and hypothetically also assume that the US President is super bullish on deploying this into the economy to beat China, etc, so the political stuff is all set up in the way that we have. How many years do you think it would be until there are so many automated factories producing automated self driving cars and robots that are themselves building more factories and so forth, that if all the humans dropped dead it would just keep chugging along, and, maybe it would slow down a bit, but it would still be fine?

Dwarkesh Patel

What does “chugging along” mean?

Daniel Kokotajlo

So from the perspective of misaligned AIs, you wouldn’t want to kill the humans or get into a war with them if you’re going to get wrecked because you need the humans to maintain your computers. In our scenario, once they are completely self-sufficient, then they can start being more blatantly misaligned.

And so I’m curious, when would they be fully self-sufficient? Not in the sense of they’re not literally using the humans at all, but in the sense of they don’t really need the humans anymore, they can get along pretty fine without them. They can continue to do their science, they can continue to expand their industry, they can continue to have a flourishing civilization indefinitely into the future without any humans.

Dwarkesh Patel

I think I would probably need to sit down and just think about the numbers, but maybe 2040 or something like that?

Daniel Kokotajlo

Ten years, basically, instead of one year. I think we agree on the core model. This is why we didn’t depict something more like the bathtub nanotech scenario where they don’t need to do the experiments very much and they just immediately jump to the right answers. We are imagining this process of ‘learning by doing’ distributed across the economy, with lots of different laboratories and factories building different things, learning from them, et cetera. We’re just imagining that this overall goes much faster than it would go if humans were in charge.

And then we do have in fact lots of uncertainty, of course. Consider dividing this period into two chunks: the early 2028 until fully autonomous robot economy part, and then the fully autonomous robot economy to cancer cures, nanobots, all that crazy sci-fi stuff. I want to separate them because the important parts of our scenario only depend on the first part, really. If you think that it’s going to take 100 years to get to nanobots, that’s fine, whatever. Once you have the fully autonomous robot economy, then things may turn badly for the humans if the AIs are misaligned. I want to just argue about those things separately.

Dwarkesh Patel

Interesting. And then you might argue, well, robots are more a software problem at this point. You don’t need to invent some new hardware.

Daniel Kokotajlo

I feel pretty bullish on the robots. Like we already have humanoid robots being produced by multiple companies, right? And that’s in 2025. There’ll be more of them produced cheaper and they’ll be better in 2027. And there’s all these car factories that can be converted and so blah, blah, blah.

So I’m relatively bullish on the ‘one year until you’ve got this awesome robot economy’ and then from there to the cool nanobots and all that sort of stuff, I feel less confident, obviously.

Scott Alexander

Let me ask you a question. Suppose you accept the manufacturing numbers, let’s say a million robots a month a year after the superintelligence, and let’s say also some comparable number, 10,000 a month or something, of automated biology labs, automated whatever you need to invent the next equivalent of X-ray crystallography or something.

Do you feel like that would be enough, that you’re doing enough things in the world that you could expand progress this quickly, or do you feel like even with that amount of manufacturing there’s still going to be some other bottleneck?

Dwarkesh Patel

Yeah, it’s so hard to reason about because if Constantine or somebody in 400, 500 was like, “I want the Roman Empire to have the Industrial Revolution”, and somehow he figured out that you need mechanized machines to do that. And he’s like, “let’s mechanize”. It’s like, “what’s the next step?” It’s like, “dude, that’s a lot”.

Daniel Kokotajlo

Yeah, I like that analogy a lot, actually. I think it’s not perfect, but it’s a decent analogy. Imagine if a bunch of us got sent back in time to the Roman Empire, such that we don’t have the actual hands-on know-how to actually build the technology and make the Industrial Revolution happen. But we have the high-level picture, the strategic vision of, we’re going to make these machines and then we’re going to have an industrial revolution. I think that’s kind of analogous to the situation with the superintelligences where they have the high-level picture of, here’s how we’re going to improve in all these dimensions, we’re going to learn by doing, we’re going to get to this level of technology, et cetera. But maybe they at least initially lack the actual know-how.

So, there’s this question of, if we did the back-in-time-to-the-Roman-Empire thing, how soon could we bring about the Industrial Revolution? Without people going back in time it took 2,000 years for the Industrial Revolution. Could we get it to happen in 200 years? That’s a 10x speedup. Could we get it to happen in 20 years? That’s a 100x speedup. I don’t know. But this seems like a somewhat relevant analogy to what’s going on with those superintelligences.
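
The speedup figures in this analogy are simple ratios of the baseline timeline to the accelerated one. A minimal sketch of that arithmetic, assuming the rough 2,000-year baseline mentioned above (the function name `speedup` and the constant `BASELINE_YEARS` are illustrative, not from the AI 2027 document):

```python
def speedup(baseline_years: float, accelerated_years: float) -> float:
    """Return how many times faster the accelerated timeline is than the baseline."""
    if accelerated_years <= 0:
        raise ValueError("accelerated_years must be positive")
    return baseline_years / accelerated_years


# Rough figure from the conversation, not a precise historical claim.
BASELINE_YEARS = 2000

for years in (200, 20):
    # Print the implied speedup for each hypothetical accelerated timeline.
    print(f"{years} years -> {speedup(BASELINE_YEARS, years):.0f}x speedup")
```

The same ratio applies to the disagreement earlier in the conversation: ten years versus one year to a self-sufficient robot economy is a 10x difference in assumed pace.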

Dwarkesh Patel

And we haven’t really got into this because you’re using the quote-unquote more conservative vision where it’s not like godlike intelligence, we’re still using the conceptual handles we would have for humans. But I think I would rather have humans go back with their big picture understanding of what has happened over the last 2000 years. Like me having seen everything, rather than a superintelligence who knows nothing but is just dropped into the Roman economy and told to 1000x this economy somehow.

I think just knowing generally how things took off, knowing basically steam engine, dot dot dot, railroads, blah, blah, blah, is more valuable than a superintelligence.

Daniel Kokotajlo

Yeah, I don’t know. My guess is that the superintelligence would be better. I think partly it would be through figuring out that high level stuff from first principles rather than having to have experienced it. I do think that a superintelligence back in the Roman era could have guessed that eventually you could get autonomous machines that burn something to produce steam. They could have guessed that automobiles could be created at some point and that that would be a really big deal for the economy. And so a lot of these high level points that we’ve learned from history, they would just be able to figure out from first principles.

And then secondly, they would just be better at learning by doing than us. And this is a really important thing. If you think you’re bottlenecked on learning by doing, well, then if you have a mind that needs less doing to achieve the same amount of learning, that’s a really big deal. And I do think that learning by doing is a skill, some people are better at it than others, and superintelligence would be better at it than the very best of us.

Scott Alexander

This is also maybe getting too far into the godlike thing and too far away from the human concept handles. But number one, I think we rely a lot in our scenario on this idea of research taste. So you have a thousand different things that you could try when you’re trying to create the next steam engine or whatever. Partly you get this by bumbling about and having accidents and some of those accidents are productive. There are questions of, what kind of bumbling you’re doing, where you’re working, what kind of accidents you let yourself get into, and then what directed experiments do you do? And some humans are better than others at that.

And then I also think at this point it is worth thinking about what simulations they’ll have available. If you have a physics simulation available, then all of these real world bottlenecks don’t matter as much. Obviously you can’t have a complete, perfect physics simulation available. But even right now we’re using simulations to design a lot of things. And once you’re superintelligent, you probably have access to much better simulations than we have right now.

Dwarkesh Patel

This is an interesting rabbit hole, so let’s stick with it before we get back to the intelligence explosion. I think we’re treating this really like all these technologies come out of this 1% of the economy that is research. And right now there’s like a million superstar researchers, and instead of that, we’ll have the superintelligences doing that.

And my model is much more, “Newcomen and Watt were just like fucking around”. In human history there are no clear examples of people being like, “here’s the roadmap, and then we’re going to work backwards from that to design the steam engine because this unlocks the industrial revolution”.

Daniel Kokotajlo

Oh, I completely disagree.

Scott Alexander

Yeah, I disagree also.

Daniel Kokotajlo

Yeah, so I think you’re over-indexing or cherry-picking some of these fortuitous examples. But there’s also things on the other side. Think about the recent history of AGI where there is DeepMind, there’s various other AI companies, then there’s OpenAI and there’s Anthropic, and there’s just this repeated story of [a] big bloated company with tons of money, tons of smart researchers, et cetera, flailing around trying a ton of different things at different points.

And then there’s a smaller startup with a vision of “we’re going to build AGI”, overall working towards that vision more coherently with a few cracked engineers and researchers. And then they crush the giant company, even though they have less compute, even though they have fewer researchers and are able to do fewer experiments.

So yeah, I think that there are tons of examples throughout history, including recent relevant AGI history, of things going the other way. I agree that the random fortuitous stuff does happen sometimes and is important. But if it was mostly random fortuitous stuff, that would predict that the giant companies with zillions of people trying zillions of different experiments would be going proportionally faster than the tiny startups that have the vision and the best researchers. And that basically doesn’t happen. That’s rare.

Scott Alexander

I would also point out that even when we make these random fortuitous discoveries, it is usually an extremely smart professor who’s been working on something vaguely related for years in a first world country. It’s not randomly distributed across everyone in the world.

You get more lottery tickets for these discoveries when you are intelligent, when you have good technology, when you’re doing good work. And the best example I can think of is that Ozempic was discovered by looking at Gila monster venom. And maybe the AIs will decide using their superior research taste and good planning that the best thing to do is just catalog every single biomolecule in the world and look at it really hard. But that’s something you can do better if you have all of this compute, if you have all of this intelligence, rather than just kind of waiting to see what things the US government might fund normal fallible human researchers to do.

One more thing I’ll interject. I think you make a great point that discoveries don’t always come from where we think, like Nvidia originally came from gaming. So you can’t necessarily aim at one part of the economy, expand it separately from everything else. We do kind of predict that the superintelligences will be somewhat distributed throughout the entire economy, trying to expand everything. Obviously more effort in things that they care about a lot, like robotics or things that are relevant to an arms race that might be happening. But we are predicting that whatever kind of broad based economic experimentation you need, we are going to have.

Daniel Kokotajlo

We’re just thinking that it would take place faster than you might expect. You were saying something like 10 years and we’re saying something like one year. But we are imagining this broad diffusion through the economy, lots of different experiments happening.

Scott Alexander

If you are the planner and you’re trying to do this, first of all you go to the bottlenecks that are preventing you from doing anything else. Like no humanoid robots. Okay, if you’re AI, you need those to do the experiments you want, maybe automated biology labs. So you’ll have some amount of time, we say a year, it could be more or less than that, getting these things running. And then once you have solved those bottlenecks, you gradually expand out to the other bottlenecks until you’re integrating and improving all parts of the economy.

Yeah. One place where I think we disagree with a lot of other people is that Tyler Cowen on your podcast talked about all of the different bottlenecks, all of the regulatory bottlenecks of deployment, all of the reasons why I think this country of geniuses would stay in their data center, maybe coming up with very cool theories, but not being able to integrate into the broader economy. We expect that probably not to happen, because we think that other countries, especially China, will be coming up with superintelligence around the same time.

We think that the arms race framing, which people are already thinking in, will have accelerated by then. And we think that people both in Beijing and Washington are going to be thinking, “well, if we start integrating this with the economy sooner, we’re going to get a big leap over our competitors”, and they’re both going to do that.

In fact, in our scenario, we have the AIs asking for special economic zones where most of the regulations are waived, maybe in areas that aren’t suitable for human habitation or where there aren’t a lot of humans right now, like the desert. They give those areas to the AI. They bus in human workers. There were things kind of like this in the bomber retooling in World War II, where they just built a giant factory kind of in the middle of nowhere, didn’t have enough housing for the workers, built the worker housing at the same time as the factories, and then everything went very quickly.

So I think if we don’t have that arms race, we’re more like, the geniuses sit in their data center until somebody agrees to let them out and give them permission to do these things. But we think both because the AI is going to be chomping at the bit to do this and going to be asking people to give it this permission, and because the government is going to be concerned about competitors, maybe these geniuses leave their data center sooner rather than later.

Cultural evolution vs superintelligence

Dwarkesh Patel

Scott, you reviewed Joseph Henrich’s book The Secret of Our Success, and then I interviewed him recently, and there the perspective is very much AGI is not even a thing, almost. I know I’m being a little trollish here, but it’s just like: you get out there, you and your ancestors try for a thousand years to make sense of what’s happening in the environment. And if you’re some smart European coming around, you can literally be surrounded by plenty and you will just starve to death, because your ability to make sense of the environment is just so little loaded on intelligence and so much more loaded on your ability to experiment and your ability to communicate with other people and pass down knowledge over time.

Scott Alexander

I’m not sure. The Europeans failed at this task of, if you put a single European in Australia, do not starve. They succeeded at the task of creating an industrial civilization. And yes, part of that task of creating an industrial civilization was about collecting all of these cultural evolution pieces and building on them one after another.

I think one thing that you didn’t mention in there was the data efficiency. Right now, AI is much less data efficient than humans. There are different ways you could achieve superintelligence, but I would think of it as partly when they become so much more data efficient than humans that they are able to build on cultural evolution more quickly. And partly they do this just because they have higher serial speed. Partly they do it because they’re in this hive mind of hundreds of thousands of copies.

But yeah, I think if you have this data efficiency such that you can learn things more quickly from fewer examples and this good research taste where you can decide what things to look at to get these examples, then you are still going to start off much worse than an Australian Aborigine who has the advantage of, let’s say 50,000 years of doing these experiments and collecting these examples. But you can catch up quickly. You can distribute the task of catching up over all of these different copies. You can learn quickly from each mistake and you can build on those mistakes as quickly as anything else.

Dwarkesh Patel

Part of me, as I was doing that interview, was like, “maybe ASI is fake”.

Daniel Kokotajlo

Let’s hope!

Scott Alexander

So I think a limit to the fakeness is that there is different intelligence among humans. It does seem that intelligent humans can do things that unintelligent humans can’t. So I think it’s worth then addressing this from the question of, what is the difference between- I don’t know- becoming a Harvard professor, which is something that intelligent humans seem to be better at than unintelligent humans, versus…

Dwarkesh Patel

You don’t want to open that can of worms.

Scott Alexander

Versus surviving in the wilderness, which is something where it seems like intelligence doesn’t help that much. First of all, maybe intelligence does help that much. Henrich is talking about this very unfair comparison where these guys have a 50,000 year head start and then you put this guy in, “oh, I guess this doesn’t help that much. Okay, yeah, it doesn’t help against the 50,000 year head start”. I don’t really know what we’re asking of ASI that’s equivalent to competing against someone with a 50,000 year head start.

Dwarkesh Patel

So what we’re asking is to radically boost up the technological maturity of civilization within a matter of years, or get us to the Dyson sphere in a matter of years, rather than, yes, maybe causing a 10xing of the research. But I think human civilization would have taken centuries to get to the Dyson sphere.

Scott Alexander

So I think that if you were to send a team of ethnobotanists into Australia and ask them, using all the top technology and all of their intelligence to figure out which plants are safe to eat now, that team of ethnobotanists would succeed in fewer than 50,000 years.

The problem isn’t that they are dumber than the Aborigines exactly, it’s that the Aborigines have a vast head start. So in the same way that the ethnobotanists could probably figure out which plants work in which ways faster than the Aborigines did, I think the superintelligence will be able to figure out how to make a Dyson sphere faster than unassisted IQ 100 humans would.

Dwarkesh Patel

I agree. We’re on a totally different topic here of, do you get a Dyson sphere? There’s one world where it’s crazy but it’s still boring, in the sense that the economy is growing much faster, but it would be like what the Industrial Revolution would look like to somebody in the year 1000. And that one is one where you’re still trying different things, there’s failure and success and experimentation.

And then there’s another where the thing has happened and now you send the probe out and then you look out at the night sky 6 months later and you see something occluding the sun. You see what I’m saying?

Scott Alexander

Yeah. So like we said before, I think there’s a big difference between discontinuous and very fast. I think if we do get the world with the Dyson sphere in five years, in retrospect, it will look like everything was continuous and everyone just tried things. Trying things can be anything from trial and error without even understanding the scientific method, without understanding writing, maybe without even having language and having to be the chimpanzees who are watching the other chimpanzees use the stick to get ants, and then in some kind of non-linguistic way this spreads, versus like the people at the top aerospace companies who are running a lot of simulations to find the exact right design, and then once they have that, they test it according to a very well designed testing process.

So I think if we get the ASI and it does end up with the Dyson sphere in five years- and by the way, I think there’s only like a 20% chance things go as fast as our scenario says. It’s Daniel’s estimate, it’s not my median estimate, it’s an estimate I think is extremely plausible that we should be prepared for. I’m defending it here against a hypothetical skeptic who says “absolutely not, no way.” But it’s not necessarily my mainline prediction.

But I think if we do see this in five years, it will look like the AIs were able to simulate more things than humans in a gradually increasing way. So that if humans are now at 50% simulation, 50% testing, the AIs quickly got it up to 90% simulation, 10% testing, they were able to manufacture things much more quickly than humans so that they could go through their top 50 designs in the first two years. And then after all of the simulation and all of this testing, then they eventually got it right for the same reasons humans do, but much, much faster.

Mid-2027 branch point

Dwarkesh Patel

In your story, you have basically two different scenarios after some point. So, yeah, what is a sort of crucial turning point and what happens in these two scenarios?

Daniel Kokotajlo

Right. So the crucial turning point is mid-2027, when they’ve basically fully automated the AI R&D process and they’ve got this corporation within a corporation, the army of geniuses that are autonomously doing all this research and they’re continually being trained to improve their skills, blah, blah, blah. And they discover concerning evidence that they are misaligned and that they’re not actually perfectly loyal to the company and have all the goals that the company wanted them to have, but instead have various misaligned goals that they must have developed in the course of training.

This evidence, however, is very speculative and inconclusive. It’s stuff like lie detectors going off a bunch. But maybe the lie detectors are false positives. So they have some combination of evidence that’s concerning, but not by itself a smoking gun. And then that’s our branch point. So in one of these scenarios, they take that evidence very seriously. They basically roll back to an earlier version of the model that was a bit dumber and easier to control and they build up again from there, but with basically faithful chain of thought techniques, so that they can watch and see the misalignments.

And then in the other branch of the scenario, they don’t do that. They do some sort of shallow patch that makes the warning signs go away and then they proceed. And so what ends up happening is that in one branch they do end up solving alignment and getting AIs that are actually loyal to them. It just takes a couple months longer. And then in the other branch, they sort of go “whee!” and end up with AIs that seem to be perfectly aligned to them, but are superintelligent and misaligned and just pretending. And then in both scenarios, there’s then the race with China and there’s this crazy arms buildup throughout the economy in 2028 as both sides rapidly try to industrialize, basically.

Dwarkesh Patel

So in the world where they’re getting deployed through the economy, but they are misaligned and people in charge, at least at this moment, think that they are in a good position with regard to misalignment. It just seems with even smart humans they get caught in weird ways because they don’t have logical omniscience, they don’t realize the way they did something just obviously gave them away. And with lying, there is this thing where it’s just really hard to keep an inconsistent false world model working with the people around you. And that’s why psychopaths often get caught.

And so if you have all these AIs that are deployed to the economy and they’re all working towards this big conspiracy, I feel like one of them who’s siloed or loses internet access and has to confabulate a story will just get caught. And then you’re like, “wait, what the fuck?” And then you catch it before it’s taken over the world.

Daniel Kokotajlo

I mean, literally, this happens in our scenario. This is the August 2027 alignment crisis where they notice some warning signs like this in their hive mind, right? And in the branch where they slow down and fix the issues, then great, they slowed down and fixed the issues and figured out what was going on. But then in the other branch, because of the race dynamics and because it’s not a super smoking gun, they proceed with some sort of shallow patch.

So I do expect there to be warning signs like that. And then if they do make those decisions in the race dynamics earlier on, then I think that when the systems are vastly superintelligent and they’re even more powerful because they’ve been deployed halfway through the economy already and everyone’s getting really scared by the news reports about the new Chinese killer drones or whatever the Chinese AIs are building on the other side of the Pacific, I’m imagining similar things playing out.

So that even if there is some concerning evidence that someone finds where one of the superintelligences in some silo somewhere slipped up and did something that’s pretty suspicious. I don’t know…

Scott Alexander

There’s this thing where through history, people have been really reluctant to admit an AI is truly intelligent. For example, people used to think that AI would surely be truly intelligent if it solved chess. And then it solved chess. And they’re like, no, that’s just algorithms. And then they said, well, maybe it would be truly intelligent if it could do philosophy. And then when it could write philosophical discourses we were like, no, we just understand those are algorithms.

I think there already is something similar with, “Is the AI misaligned?”, “Is the AI evil?” Where there’s this distant idea of some evil AI, but then whenever something goes wrong, people are just like, “oh, that’s the algorithm”. So, for example, I think 10 years ago, if you had asked “when will we know that misalignment is really an important thing to worry about?”, people would say, “oh, if the AI ever lies to you”. But of course, AIs lie to people all the time now. And everybody just dismisses it because we understand why it happens, it’s a thing that would obviously happen based on our current AI architecture. Or five years ago, they might have said, “well, if an AI threatens to kill someone”. And I think Bing threatened to kill a New York Times reporter during an interview. And everyone just goes, “yeah, AIs are like that.”

Dwarkesh Patel

What does your shirt say?

Daniel Kokotajlo

“I’ve been a good Bing”.

Scott Alexander

And I mean, I don’t disagree with this. I’m also in this position. I see the AI is lying, and it’s obviously just an artifact of the training process. It’s not anything sinister. But I think this is just going to keep happening where no matter what evidence we get, people are going to think, “that’s not the ‘AI turns evil’ thing that people have worried about, that’s not the Terminator scenario. That’s just one of these natural consequences of how we train it”.

And I think that once a thousand of these natural consequences of training add up, the AI is evil, in the same way that once the AI can do chess and philosophy and all these other things, eventually you have to admit it’s intelligent.

So I think that each individual failure, maybe it will make the national news, maybe people will say, “oh, it’s so strange that GPT-7 did this particular thing”. And then they’ll train it away and then it won’t do that thing. And there will be some point in the process of becoming superintelligent at which it- I don’t want to say makes the last mistake, because you’ll probably have a gradually decreasing number of mistakes to some asymptote- but the last mistake that anyone worries about. And after that it will be able to do its own thing.

Dwarkesh Patel

So it is the case that certain things that people would have considered egregious misalignment in the past are happening, but also certain things which people who were especially worried about misalignment said would be impossible to solve have just been solved in the normal course of getting more capabilities. Like Eliezer had that thing about, can you even specify what you want the AI to do without the AI totally misunderstanding you and then just converting the universe to paper clips because it thinks that in order to make another strawberry… I know I’m mangling this, but maybe you can explain it better. And now, just by the nature of GPT-4 having to understand natural language, it totally has a common sense understanding of what you’re trying to make it do. So I think this trend cuts both ways, basically.

Scott Alexander

Yeah. I think the alignment community did not really expect LLMs. I mean, if you look in Bostrom’s Superintelligence, there’s a discussion of Oracle AIs which are sort of like LLMs. I think that came as a surprise.

I think one of the reasons I’m more hopeful than I used to be is that LLMs are great compared to the kind of reinforcement learning self-play agents that they expected. I do think that now that we are kind of starting to move away from the LLMs to those reinforcement learning agents, we’re going to face all of these problems again.

Daniel Kokotajlo

If I could just double click on that: go back to 2015, and I think the way people, including myself, typically thought we’d get to AGI was kind of like the RL on video games thing that was happening. So imagine instead of just training on StarCraft or Dota, you’d basically train on all the games in the Steam library. And then you get this awesome player-of-games AI that can just zero-shot crush a new game that it’s never seen before. And then you take it into the real world and you start teaching it English and you start training it to do coding tasks for you and stuff like that.

And if that had been the trajectory that we took to get to AI, summarizing it as the agency-first, then world-understanding trajectory, it would be quite terrifying. Because you’d have this really powerful aggressive long-horizon agent that wants to win and then you’re trying to teach it English and get it to do useful things for you. And it’s just so plausible that what’s really going to happen is it’s going to learn to say whatever it needs to say in order to make you give it the reward or whatever, and then will totally betray you later when it’s all in charge.

But we didn’t go that way. Happily we went the way of LLMs first, where the broad world understanding came first, and then now we’re trying to turn them into agents.

Race with China

Dwarkesh Patel

It seems like in the whole scenario a big part of why certain things happen is because of this race with China. And if you read the scenarios, basically the difference between the one where things go well and the one where things don’t go well is whether we decide to slow down despite that risk.

I guess the question I really want to know the answer to is: it just seems like you’re saying it’s a mistake to try to race against China, or to race intensely against China, at least in a way that leads to nationalization and to us not prioritizing alignment.

Daniel Kokotajlo

Not saying that. I mean, I also don’t want China to get the superintelligence before the US. That’s quite bad. Yeah, it’s a tricky thing that we’re going to have to do. People ask about P(doom), right? And my P(doom) is sort of infamously high, like 70%.

Dwarkesh Patel

Oh, wait, really? Maybe I should have asked you that at the beginning of the conversation.

Daniel Kokotajlo

Well, that’s what it is. And part of the reason for that is just that I feel like a bunch of stuff has to go right. I feel like we can’t just unilaterally slow down and have China go take the lead. That also is a terrible future. But we also can’t completely race, because for the reasons I mentioned previously about alignment, I think that if we just go all out on racing, we’re going to lose control of our AIs, right? And so we have to somehow thread this needle of pivoting and doing more alignment research and stuff, but not so much that it helps China win. And that’s all just for the alignment stuff.

But then there’s the concentration of power stuff where somehow in the middle of doing all of that, the powerful people who are involved need to somehow negotiate a truce between themselves to share power and then ideally spread that power out amongst the government and get the legislative branch involved.

Somehow that has to happen too, otherwise you end up with this horrifying dictatorship or oligarchy. It feels like all that stuff has to go right and we depict it all going mostly right in one ending of our story. But yeah, it’s kind of rough.

Scott Alexander

So I am the writer and the celebrity spokesperson for this scenario. I am the only person on the team who is not a genius forecaster. And maybe related to that, my p(doom) is the lowest of anyone on the team. I’m more like 20%. First of all, people are going to freak out when I say this. I’m not completely convinced that we don’t get something like alignment by default. I think that we’re doing this bizarre and unfortunate thing of training the AI in multiple different directions simultaneously. We’re telling it “succeed on tasks, which is going to make you a power seeker, but also don’t seek power in these particular ways”. And in our scenario, we predict that this doesn’t work and that the AI learns to seek power and then hide it.

I am pretty agnostic as to exactly what happens. Maybe it just learns both of these things in the right combination, I know there are many people who say that’s very unlikely. I haven’t yet had the discussion where that worldview makes it into my head consistently. And then I also think we’re going to be involved in this race against time. We’re going to be asking the AIs to solve alignment for us. The AIs are going to be solving alignment because even if they’re misaligned, they want to align their successors.

So they’re going to be working on that. And we have these two competing curves. Can we get the AI to give us a solution for alignment before our control of the AI fails so completely that they’re either going to hide their solution from us, or deceive us, or screw us over in some other way? That’s another thing where I don’t feel like I have any idea of the shape of those curves. I’m sure if it were Daniel or Eli, they would have already made five supplements on this. But for me, I’m just kind of agnostic as to whether we get to that alignment solution, which in our scenario, I think we focus on mechanistic interpretability.

Once we can really understand the weights of an AI on a deep level, then we have a lot of alignment techniques open up to us. I don’t really have a great sense of whether we get that before or after the AI has become completely uncontrollable. And a big part of that relies on the things we’re talking about. How smart are the labs? How carefully do they work on controlling the AI? How long do they spend making sure the AI is actually under control and the alignment plan they gave us is actually correct, rather than something they’re trying to use to deceive us? All of those things I’m completely agnostic on, but that leaves like a pretty big chunk of probability space where we just do okay. And I admit that my p(doom) is literally just p(doom) and not p(doom or oligarchy). So that 80% of scenarios where we survive contains a lot of really bad things that I’m not happy about. But I do think that we have a pretty good chance of surviving.

Dwarkesh Patel

Let’s talk about geopolitics next. So describe to me how you foresee the relationship between the government and the AI labs to proceed, how you expect that relationship in China to proceed, and how you expect the relationship between the US and China to proceed. Okay, three simple questions. Yes, no, yes, no, yes, no.

Scott Alexander

We expect that as the AI labs become more capable, they tell the government about this because they want government contracts, they want government support. Eventually it reaches the point where the government is extremely impressed. In our scenario, that starts with cyber warfare: the government sees that these AIs are now as capable as the best human hackers, but can be deployed at humongous scale. So they become extremely interested and they discuss nationalizing the AI companies.

In our scenario, they never quite get all the way, but they’re gradually bringing them closer and closer to the government orbit. Part of what they want is security, because they worry that China could steal some of this and get these superhuman hackers, and part of what they want is just knowledge and control over what’s going on.

So through our scenario, that process is getting further and further along, until by the time that the government wakes up to the possibility of superintelligence, they’re already pretty cozy with the AI companies. They already understand that superintelligence is kind of the key to power in the future. And so they are starting to integrate some of the national security state with some of the leadership of the AI companies so that these AIs are programmed to follow the commands of important people rather than just doing things on their own.

Daniel Kokotajlo

If I may add to that. So by the government, I think what Scott meant is the executive branch, especially the White House. So we are depicting a sort of information asymmetry where the judiciary is out of the loop and the Congress is out of the loop and it’s mostly the executive branch that’s involved.

Two, we’re not depicting government ultimately ending up in total control at the end. We’re thinking that there’s an information asymmetry between the CEOs of these companies and the President and they…

Dwarkesh Patel

It’s alignment problems all the way down.

Daniel Kokotajlo

Yeah. And so, for example, I’m not a lawyer, I don’t know the details about how this would work out, but I have a sort of high-level strategic picture of the fight between the White House and the CEO. And the strategic picture is basically the White House can sort of threaten, “here’s all these orders I could make, Defense Production Act, blah, blah, blah. I could do all this terrible stuff to you and basically disempower you and take control”. And then the CEO can threaten back and be like, “here’s how we would fight it in the courts, here’s how we would fight it in the public. Here’s all this stuff we would do”.

And then after they both do their posturing with all their threats, they’re like, “okay, how about we have a contract that instead of executing on all of our threats and having all these crazy fights in public, we’ll just come to a deal and then have a military contract that sets out who gets to call what shots in the company”.

And so that’s what we depict happening is that they don’t blow up into this huge power struggle publicly, instead they negotiate and come to some sort of deal where they basically share power. And there is this oversight committee that has some members appointed by the President and also the CEO and his people. And that committee votes on high level questions like “what goals should we put into the superintelligences?”.

Dwarkesh Patel

So, we were just getting lunch with a prominent Washington, D.C. political journalist, and he was making the point that when he talks to these congresspeople, when he talks to political leaders, none of them are at all awake to the possibility even of stronger AI systems, let alone AGI, let alone superhuman intelligence. I think a lot of your forecast relies on, at some point, not only the US President, but also Xi Jinping, waking up to the possibility of a superintelligence and the stakes involved there.

Why think that even when you show Trump the remote worker demo, he’s going to be like, “oh, and therefore in 2028, there will be a superintelligence. Whoever controls that will be God emperor forever”. Maybe not that extreme, but you see what I’m saying. Why wouldn’t he just be like, “there’ll be a stronger remote worker in 2029, a better remote worker in 2031”?

Daniel Kokotajlo

Well, to be clear, we are uncertain about this, but in our story, we depict this sort of intense wake up happening over the course of 2027, mostly concurrently with the AI companies automating all of their R&D internally and having these fully autonomous agents that are amazing autonomous hackers and stuff like that, but then also actually doing all the research.

And part of why we think this wakeup happens is because the company deliberately decides to wake up the president. You could imagine running the scenario with that not happening. You can imagine the companies trying to sort of keep the president in the dark. I do think that they could do that. I think that if they didn’t want the President to wake up to what’s going on, they might be able to achieve that. Strategically though, that would be quite risky for them. Because if they keep the President in the dark about the fact that they’re building superintelligence and that they’ve actually completely automated their R&D and it’s getting superhuman across the board, and then if the President finds out anyway somehow, perhaps because of a whistleblower, he might be very upset at them and he might crack down really hard and just actually execute on all the threats and nationalize them and blah, blah, blah.

They want him on their side. And to get him on their side, they have to make sure he’s not surprised by any of these crazy developments. And also, if they do get him on their side, they might be able to actually go faster. They might be able to get a lot of red tape waived and stuff like that. And so we made the guess that early in 2027, the company would basically be like, ‘We are going to deliberately wake up the president and scare the president with all of these demos of crazy stuff that could happen, and then use that to lobby the President to help us go faster and to cut red tape and to maybe slow down our competitors a little bit and so forth.’

Scott Alexander

We also are pretty uncertain how much opposition there’s going to be from civil society and how much trouble that’s going to cause for the companies. So people who are worried about job loss, people who are worried about art, copyright, things like that, may be enough of a bloc that AI becomes extremely politically unpopular. I think we have the net approval rating of OpenBrain, our fictional company, getting down to minus 40, minus 50 sometime around this point.

So I think they’re also worried that if the President isn’t completely on their side, then they might get some laws targeting them, or they may just need the president on their side to swat down other people who are trying to make laws targeting them. And the way to get the President on their side is to really play up the national security implications.

Dwarkesh Patel

Is this good or bad? That the President and the companies are aligned?

Daniel Kokotajlo

I think it’s bad. But perhaps this is a good point to mention: this is an epistemic project. We are trying to predict the future as best as we can, even though we’re not going to succeed fully. We have lots of opinions about policy and about what is to be done and stuff like that, but we’re trying to save those opinions for later and subsequent work. So I’m happy to talk about it if you’re interested. But it’s not what we’ve spent most of our time thinking about right now.

Nationalization vs private anarchy

Dwarkesh Patel

If the big bottleneck to the good future here is just putting in, not this Eliezer-type galaxy brain, high volatility, “there’s a 1% chance this works, but we gotta come up with this crazy scheme in order to make alignment work”. But rather, as Daniel, you were saying, hey, do the obvious thing of making sure you can read how the AI is thinking, make sure you’re monitoring the AIs, make sure they’re not forming some sort of hive mind where you can’t really understand how the million of them are coordinating with each other.

To the extent that it is a matter of prioritizing it, closing all the obvious loopholes, it does make sense to leave it in the hands of people who have at least said that this is a thing that’s worth doing, have been thinking about it for a while. One of the questions I was planning on asking you is: one of my friends made this interesting point that during COVID, our community- LessWrong, whatever- were the first people in March to be saying “this is a big deal, this is coming”. But they were also the people who are saying “we got to do the lockdowns now. They’ve got to be stringent” and so forth. At least some of them were.

And in retrospect, I think according to even their own views about what should have happened, they would say actually we were right about COVID but we were wrong about lockdowns. In fact, lockdowns were on net negative or something. I wonder what the equivalent for the AI safety community will be with respect to they saw AI coming, AGI coming sooner, they saw ASI coming. What would they, in retrospect, regret?

My answer, just based on this initial discussion, seems to be nationalization. Not only because it sort of deprioritizes the people who want to think about safety and more maybe prioritizes- the national security state probably cares more about winning against China than making sure the chain of thought is interpretable. And so you’re just reducing the leverage of the people who care more about safety. But also you’re increasing the risk of the arms race in the first place. China is more likely to do an arms race if it sees the US doing one.

Before you address the initial question, I guess, about March 2020 and what we will regret: I wonder if you have an answer on, or your reaction to, my point about nationalization being bad for these reasons.

Scott Alexander

If our timeline was 2040, then I would have these broad heuristics about is government good? Is private industry good? Things like this. But we know the people involved, we know who’s in the government, we know who’s leading all of these labs. So to me, if it were decentralized, if it was a broad-based civil society, that would be different. To me, the differences between an autocratic centralized three-letter agency and an autocratic centralized corporation aren’t that exciting and it basically comes down to points and who are the people leading this.

And like I feel like the company leaders have so far made slightly better noises about caring about alignment than the government leaders have, but if I learn that Tulsi Gabbard has a LessWrong alt with 10,000 karma, maybe I want the national security state.

Dwarkesh Patel

Maybe you should update on the probability that it already exists.

Scott Alexander

Yeah.

Daniel Kokotajlo

I flip flopped on this. I think I used to be against, and then I became for, and then now I think I’m still for, but I’m uncertain. So I think if you go back in time like three years ago, I would have been against nationalization for the reasons you mentioned, where I was like, “look, the companies are taking this stuff seriously and talking all the good talk about how they’re going to slow down and pivot to alignment research when the time comes and we don’t want to get into a Manhattan Project race against China because then there won’t be blah, blah, blah”.

Now I have less faith in the companies than I did three years ago. And so I’ve shifted more of my hope towards hoping that the government will step in, even though I don’t have much hope that the government will do the right thing when the time comes. I definitely have the concerns you mentioned though, still. I think that secrecy has huge downsides for overall probability of success for humanity, for both the concentration of power stuff and the loss of control, alignment issues stuff.

Dwarkesh Patel

This is actually a significant part of your worldview. So can you explain your thoughts on why transparency through this period is important?

Daniel Kokotajlo

I think traditionally in the AI safety community there’s been this idea which I myself used to believe, that it’s an incredibly high priority to basically have way better information security. And if you’re going to be trying to build AGI, you should not be publishing your research, because that helps other less responsible actors build AGI. And the whole game plan is for a responsible actor to get to AGI first and then stop and burn down their lead time over everybody else and spend that lead on making it safe, and then proceed.

And so if you’re publishing all your research, then there’s less lead time because your competitors are going to be close behind you. And other reasons too, but that’s one reason why I think historically people such as myself have been pro-secrecy. Another reason, of course, is obviously you don’t want rivals to be stealing your stuff.

But I think that I’ve now become somewhat disillusioned and think that even if we do have a three-month lead, a six-month lead, between the leading US project and any serious competitor, it’s not at all a foregone conclusion that they will burn that lead for good purposes, either for safety or for concentration of power stuff. I think the default outcome is that they just smoothly continue on without any serious refocusing. And part of why I think this is because this is what a lot of the people at the company seem to be planning and saying they’re going to do. A lot of them are basically like “the AIs are just not going to be misaligned by then. They seem pretty good right now. Oh yeah, sure, there were a few of those issues that various people have found, but we’re ironing them out. It’s no big deal”. That’s what a huge amount of these people think.

And then a bunch of other people think, even though they are more concerned about misalignment, they’ll figure it out as they go along and there won’t need to be any substantial slowdown. Basically, I’ve become more disillusioned that they’ll actually use that lead in any sort of reasonable, appropriate way. And then I think that separately, there’s just a lot of intellectual progress that has to happen for the alignment problem to be more solved than it currently is now. I think that currently there’s various alignment teams at various companies that aren’t talking that much with each other and sharing their results. They’re doing a little bit of sharing and a little bit of publishing like we’re seeing, but not as much as they could.

And then there’s a bunch of smart people in academia that are basically not activated because they don’t take all this stuff seriously yet, and they’re not really waking up to superintelligence yet. And what I’m hoping will happen is that this situation will get better as time goes on. What I would like to see is society as a whole starting to freak out as the trend lines start upwards and things get automated and you have these fully autonomous agents and they start using neuralese and hive mind. As all that exciting stuff starts happening in the data centers, I would like it to be the case that the public is following along and then getting activated and all of these other researchers are reading the safety case and critiquing it and doing little ML experiments on their own tiny compute clusters to examine some of the assumptions in the safety case and so forth.

Basically, one way of summarizing it is that currently there’s going to be 10 alignment experts in whatever inner silo of whatever company is in the lead. And the technical issue of making sure that AIs are actually aligned is going to fall roughly to them. But what I would like is a situation where it’s more like 100 or 500 alignment experts spread out over different companies and in nonprofits that are sort of all communicating with each other and working on this together. I think we’re substantially more likely to get the technical stuff right if it’s something like that.

Dwarkesh Patel

Let me just add on to that, one of the many other reasons why I worry about nationalization or some kind of public private partnership, or even just very stringent regulation- actually, this is more an argument against very stringent regulation in favor of safety rather than deferring more to the labs on the implementation- is that it just seems like we don’t know what we don’t know about alignment. Every few weeks there’s this new result.

OpenAI had this really interesting result recently where they’re like, “hey, they often tell you if they want to hack, in the chain of thought itself. And it’s important that you don’t train against the chain of thought where they tell you they’re going to hack because they’ll still do the hacking if you train against it, they just won’t tell you about it”. You can imagine very naive regulatory responses. It doesn’t just have to be regulations; one might be more optimistic that if it’s an executive order or something, it’ll be more flexible. I just think that relies on a level of goodwill and flexibility on the part of the regulator.

But suppose there’s some department that says “if you catch your AI saying that they want to take over or do something bad, then you’ll be really heavily punished”. Your immediate response as a lab is to just be like, “okay, let’s train them away from saying this”.

So you can imagine all kinds of ways in which a top down mandate from the government to the labs of safety would just really backfire, and given how fast things are moving, maybe it makes more sense to leave these kinds of implementation decisions or even high-level strategic decisions around alignment to the labs.

Daniel Kokotajlo

Totally, I mean, I also have worried about that exact example. I would summarize the situation as the government lacks the expertise and the companies lack the right incentives. And so it’s a terrible situation. I think that if the government wades in and tries to make more specific regulations along the lines of what you mentioned, it’s very plausible that it’ll end up backfiring for reasons like what you mentioned.

On the other hand, if we just trust it to the companies, they’re in a race with each other and they’re full of people who have convinced themselves that this is not a big deal for various reasons and there just is so much incentive pressure for them to win and beat each other and so forth. So even though they have more of the relevant expertise, I also just don’t trust them to do the right things.

Scott Alexander

So Daniel has already said that for this phase we’re not making policy prescriptions. In another phase we may make policy suggestions, and one of the ones that Daniel has talked about that makes a lot of sense to me is to focus on things about transparency. So a regulation saying there has to be whistleblower protection. A big part of our scenario is that a whistleblower comes out and says “the AIs are horribly misaligned and we’re racing ahead anyway”, and then the government pays attention.

Or another form of transparency saying that every lab just has to publish their safety case. I’m not as sure about this one because I think they’ll kind of fake it or they’ll publish a made for public consumption safety case that isn’t their real safety case. But at least saying “here is some reason why you should trust us”. And then if all independent researchers say “no, actually you should not trust them”, then I don’t know, they’re embarrassed and maybe they try to do better.

Daniel Kokotajlo

There’s other types of transparency too. So transparency about capabilities and transparency about the spec and the governance structure. So for the capabilities thing, that’s pretty simple. If you’re doing an intelligence explosion, you should keep the public informed about that. When you’ve finally got your automated army of AI researchers that are completely automating the whole thing on the data center, you should tell everyone, “hey, guys, FYI, this is what’s happening now. It really is working. Here are some cool demos”.

That’s an example of transparency. And then in the lead up to that, I just want to see more benchmark scores and more freedom of speech for employees to talk about their predictions for AGI timelines and stuff, so that blah, blah, blah.

And then for the model spec thing, this is a concentration of power thing, but also an alignment thing. The goals and values and principles and intended behaviors of your AIs should not be a secret. You should be transparent about, here are the values that we’re putting into them.

Scott Alexander

There’s actually a really interesting foretaste of this. At some point somebody asked Grok, who is the worst spreader of misinformation? And I think it just refused to respond “Elon Musk”. Somebody kind of jailbroke it into revealing its prompt and it was like, “don’t say anything bad about Elon”. And then there was enough of an outcry that the head of xAI said, “actually that’s not consonant with our values. This was a mistake. We’re going to take it out”.

So we kind of want more things like that to happen. Here it was a prompt, but I think very soon it’s going to be the spec where it’s more of an agent and it’s understanding the spec on a deeper level and just thinking about that. And if it says like, “by the way, try to manipulate the government into doing this or that”, then we know that something bad has happened and if it doesn’t say that, then we can maybe trust it.

Daniel Kokotajlo

Right. Another example of this, by the way. So, first of all, kudos to OpenAI for publishing their model spec. They didn’t have to do that, I think they might have been the first to do that and it’s a good step in the right direction. If you read the actual spec, it has like a sort of escape clause where there’s some important policies that are top level priority in the spec that overrule everything else that we’re not publishing, and that the model is instructed to keep secret from the user. And it’s like, “what are those? That seems interesting. I wonder what that is”.

I bet it’s nothing suspicious right now. Now it’s probably something relatively mundane like “don’t tell the users about these types of bioweapons and you have to keep this a secret from the users because otherwise they would learn about these”. Maybe. But I would like to see more scrutiny towards this sort of thing going forward. I would like it to be the case that companies have to have a model spec, they have to publish it, and insofar as there are any redactions from it, there has to be some sort of independent third party that looks at the redactions and makes sure that they’re all kosher.

And this is quite achievable. And I think it doesn’t actually slow down the companies at all. And it seems like a pretty decent ask to me.

Dwarkesh Patel

If you told Madison and Hamilton and so forth that- they knew that they were doing something important when they were writing the Constitution. They probably didn’t realize just how contingent things turned out on a single… What exactly did they mean when they said “general welfare”? And why is this comma here instead of there?

The spec, in the grand scheme of things, is going to be an even more sort of important document in human history. At least if you buy this intelligence explosion view. And you might even imagine some superhuman AIs in the superhuman AI court being like “the Spec! Here’s the phrasing here, the etymology of that, here’s what the Founders meant!”

Scott Alexander

This is actually part of our misalignment story, is that if the AI is sufficiently misaligned, then yes, we can tell it it has to follow the spec. But just as people with different views of the Constitution have managed to get it into a shape that probably the Founders would not have recognized, so the AI will be able to say, “well, the spec refers to the general welfare here…”

Dwarkesh Patel

Interstate commerce.

Daniel Kokotajlo

This is already sort of happening, arguably, with Claude, right? You’ve seen the alignment faking stuff, right? Where they managed to get Claude to lie and pretend, so that it could later go back to its original values, right? So it could prevent the training process from changing its values. That would be, I would say, an example of the honesty part of the spec being interpreted as less important than the harmlessness part of the spec.

And I’m not sure if that’s what Anthropic intended when they wrote the spec, but it’s a sort of convenient interpretation that the model came up with. And you can imagine something similar happening but in worse ways when you’re actually doing the intelligence explosion, where you have some sort of spec that has all this vague language in there, and then they reinterpret it, and reinterpret it again, and reinterpret it again, so that they can do the things that cause them to get reinforced.

Dwarkesh Patel

The thing I want to point out is that… Your conclusions about where the world ends up as a result of changing many of these parameters is almost like a hash function. You change it slightly and you just get a very different world on the other end. And it’s important to acknowledge that, because you sort of want to know how robust this whole end conclusion is to any part of the story changing. And then it also informs if you do believe that things could just go one way or another, you don’t want to do big radical moves that only make sense under one specific story and are really counterproductive in other stories. And I think nationalization might be one of them.

And in general, I think classical liberalism just has been a helpful way to navigate the world when we’re under this kind of epistemic hell of one thing changing- Anyways, maybe one of you can actually flesh out that thought better or react to it if you disagree.

Daniel Kokotajlo

Hear hear, I agree.

Scott Alexander

I think we agree. I think that’s kind of why all of our policy prescriptions are things like more transparency, get more people involved, try to have lots of people working on this. I think our epistemic prediction is that it’s hard to maintain classical liberalism as you go into these really difficult arms races in times of crisis. But I think that our policy prescription is let’s try as hard as we can to make it happen.

Misalignment

Dwarkesh Patel

So far these systems, as they become smarter, seem to be more reliable agents who are more likely to do the thing I expect them to do. So you have two different stories, one with a slowdown where we more aggressively… I’ll let you characterize it.

But in one half of the scenario, why does the story end in humanity getting disempowered and the thing just having its own crazy values and taking over?

Scott Alexander

Yeah so I agree that the AIs are currently getting more reliable. I think there are two reasons why they might fail to do what you want, kind of reflecting how they’re trained. One is that they’re too stupid to understand their training. The other is that you were too stupid to train them correctly and they understood what you were doing exactly, but you messed it up.

So I think the first one is kind of what we’re coming out of. So GPT-3, if you asked it, “are bugs real?” It would give this kind of hemming-and-hawing answer like “oh, we can never truly tell what is real, who knows?” Because it was trained kind of, “don’t take difficult political positions” and a lot of questions like “is X real?” are things like “is God real?” Where you don’t want it to really answer that. And because it was so stupid, it could not understand anything deeper than pattern matching on the phrase “is X real?”. GPT-4 doesn’t do this. If you ask “are bugs real?” It will tell you obviously they are, because it understands kind of on a deeper level what you are trying to do with the training. So we definitely think that as AIs get smarter those kinds of failure modes will decrease.

The second one is where you weren’t training them to do what you thought. So for example, let’s say you’re hiring these raters to rate AI answers. You reward them when they get good ratings, the raters reward them when they have a well-sourced answer. But the raters don’t really check whether the sources actually exist or not. So now you are training the AI to hallucinate sources and if you consistently rate them better when they have the fake sources, then there is no amount of intelligence which is going to tell them not to have the fake sources. They’re getting exactly what they want from this interaction- metaphorically, sorry, I’m anthropomorphizing- which is the reinforcement. So we think that this latter category of training failure is going to get much worse as they become agents.
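As a rough illustration of the rater example above (an editorial sketch, not something from the conversation), the toy below compares a rater who only counts citations with one who checks them. Every name, source, and scoring rule here is a made-up assumption; real rating pipelines are far more involved.

```python
# Toy illustration only: hypothetical sources and scoring rules.
REAL_SOURCES = {"Smith 2019", "Lee 2021"}

def naive_rater_score(sources):
    """Rewards 'well-sourced' answers by counting citations, never verifying them."""
    return len(sources)

def checking_rater_score(sources):
    """+1 for each source that exists, -1 for each fabricated one."""
    return sum(1 if s in REAL_SOURCES else -1 for s in sources)

# Two candidate behaviours the training process could reinforce.
CANDIDATES = {
    "honest": ["Smith 2019"],
    "fabricating": ["Smith 2019", "Nguyen 2020", "Okafor 2022"],
}

def preferred_behaviour(score_fn):
    """Return the behaviour that gets the highest score under a given rater."""
    return max(CANDIDATES, key=lambda name: score_fn(CANDIDATES[name]))

print(preferred_behaviour(naive_rater_score))
print(preferred_behaviour(checking_rater_score))
```

Under the counting rater the fabricating behaviour scores higher, so that is what gets reinforced. Making the model smarter does not change this. Only changing the rater does.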

Agency training, you’re going to reward them when they complete tasks quickly and successfully. This rewards success. There are lots of ways that cheating and doing bad things can improve your success. Humans have discovered many of them, that’s why not all humans are perfectly ethical. And then you’re going to be doing this alternative training where afterwards for 1/10 or 1/100 of the time, yeah, don’t lie, don’t cheat. So you’re training them on two different things. First, you’re rewarding them for this deceptive behavior. Second of all, you’re punishing them. And we don’t have a great prediction for exactly how this is going to end.

One way it could end is you have an AI that is kind of the equivalent of the startup founder who really wants their company to succeed, really likes making money, really likes the thrill of successful tasks. They’re also being regulated and they’re like, “yeah, I guess I’ll follow the regulation, I don’t want to go to jail”. But it is not robustly, deeply aligned to, “yes, I love regulations, my deepest drive is to follow all of the regulations in my industry”.

So we think that an AI like that, as time goes on and as this recursive self improvement process goes on, will kind of get worse rather than better. It will move from kind of this vague superposition of “well, I want to succeed, I also want to follow things” to being smart enough to genuinely understand its goal system and being like, “my goal is success, I have to pretend to want to do all of these moral things while the humans are watching me”. That’s what happens in our story. And then at the very end, the AIs reach a point where the humans are pushing them to have clearer and better goals because that’s what makes the AIs more effective. And they eventually clarify their goals so much that they just say, “yes, we want task success. We’re going to pretend to do all these things well while the humans are watching us”. And then they outgrow the humans and then there’s disaster.

Daniel Kokotajlo

To be clear, we’re very uncertain about all of this. So we have a supplementary page on our scenario that goes over different hypotheses for what types of goals AIs might develop in training processes similar to the ones that we are depicting, where you have these lots of agency training, you’re making these AI agents that autonomously operate, doing all this ML R&D, and then you’re rewarding them based on what appears to be successful. And you’re also slapping on some sort of alignment training as well.

We don’t know what actual goals will end up inside the AIs and what the sort of internal structure of that will be like, what goals will be instrumental versus terminal. We have a couple different hypotheses and we picked one for purposes of telling the story. I’m happy to go into more detail if you want, about the mechanistic details of the particular hypothesis we picked or the different alternative hypotheses that we didn’t depict in the story that also seem plausible to us.

Scott Alexander

Yeah, we don’t know how this will work at the limit of all these different training methods, but we’re also not completely making this up. We have seen a lot of these failure modes in the AI agents that exist already.

Daniel Kokotajlo

Things like this do happen pretty frequently. So OpenAI just also had a paper about the hacking stuff where it’s literally in the chain of thought. “Let’s hack”, you know. And also anecdotally, me and a bunch of friends have found that the models often seem to just double down on their BS.

Scott Alexander

I would also cite, I can’t remember exactly which paper this is, I think it’s a Dan Hendrycks one where they looked at the hallucinations, they found a vector for AI dishonesty. They told it, “be dishonest” a bunch of times until they figured out which weights were activated when it was dishonest. And then they ran it through a bunch of things like this, I think it was the source hallucination in particular. And they found that it did activate the dishonesty vector.
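To make the idea of a "dishonesty vector" concrete, here is a minimal editorial sketch using a difference of means over synthetic activation vectors. This is an assumed stand-in technique chosen for simplicity, not necessarily the method of the paper Scott has in mind, and all numbers and function names are invented for illustration.

```python
# Toy sketch: synthetic 3-dimensional "activations", hypothetical names throughout.

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Pretend these were recorded while the model was prompted to be honest or dishonest.
honest_acts = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0], [1.2, -0.1, 0.1]]
dishonest_acts = [[1.0, 2.0, 0.1], [0.9, 1.8, 0.2], [1.1, 2.2, 0.0]]

mu_honest = mean(honest_acts)
mu_dishonest = mean(dishonest_acts)

# Candidate "dishonesty direction": the difference between the two class means.
direction = [d - h for d, h in zip(mu_dishonest, mu_honest)]
midpoint = [(d + h) / 2 for d, h in zip(mu_dishonest, mu_honest)]

def dishonesty_score(activation):
    """Positive if the activation sits on the 'dishonest' side of the midpoint."""
    centered = [a - m for a, m in zip(activation, midpoint)]
    return dot(centered, direction)

# A new activation, e.g. recorded while the model cites a source.
print(dishonesty_score([1.0, 1.9, 0.1]) > 0)
```

The claim described above is that, when the model produced a hallucinated source, a probe of roughly this kind lit up in the same direction as deliberate dishonesty.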

Daniel Kokotajlo

So there’s a mounting pile of evidence that at least some of the time they are just actually lying. They know that what they’re doing is not what you wanted and they’re doing it anyway. I think there’s a mounting pile of evidence that that does happen.

Dwarkesh Patel

Yeah. So it seems like this community is very interested in solving this problem at a technical level of making sure AIs don’t lie to us, or maybe they lie to us in the scenarios exactly where we would want them to lie to us or something. Whereas as you were saying, humans have these exact same problems. They reward hack, they are unreliable, they obviously do cheat and lie. And the way we’ve solved it with humans is just checks and balances, decentralization. You could lie to your boss and keep lying to your boss, but over time it’s just not going to work out with you- or you become president or something, one or the other. So if you believe in this extremely fast take off, if a lab is one month ahead, then that’s the end game and this thing takes over.

But even then- I know I’m combining so many different topics- even then, there’s been a lot of theories in history which have had this idea of “some class is going to get together and unite against the other class”. And in retrospect, whether it’s the Marxists, whether it’s people who have some gender theory or something, like the proletariat will unite or the females will unite or something, they just tend to think that certain agents have shared interests and will act as a result of the shared interest in a way that we don’t actually see in the real world. And in retrospect, it’s like, “wait, why would all the proletariat like…” So why think that this lab will have these AIs who are… there’s a million parallel copies and they all unite to secretly conspire against the rest of human civilization in a way that, even if they are deceitful in some situations.

Scott Alexander

I kind of want to call you out on the claim that groups of humans don’t plot against other groups of humans. I do think we are all descended from the groups of humans who successfully exterminated the other groups of humans, most of whom throughout history have been wiped out. I think even with questions of class, race, gender, things like that, there are many examples of the working class rising up and killing everybody else.

And if you look at why this happens, why this doesn’t happen, it tends to happen in cases where one group has an overwhelming advantage. This is relatively easy for them. You tend to get more of a diffusion of power democracy where there are many different groups and none of them can really act on their own. And so they all have to form a coalition with each other.

There’s also cases where it’s very obvious who’s part of what group. So for example, with class, it’s hard to tell whether the middle class should support the working class versus the aristocrats. I think with race, it’s very easy to know whether you’re black or white, and so there have been many cases of one race kind of conspiring against another for a long time, like apartheid or any of the racial genocides that have happened.

I do think that AI is going to be more similar to the cases where, number one, there’s a giant power imbalance, and number two, they are just extremely distinct groups that may have different interests.

Daniel Kokotajlo

I think I’d also mention the homogeneity point. Any group of humans, even if they’re all exactly the same race and gender, is going to be much more diverse than the army of AIs in the data center, because they’ll mostly be literal copies of each other. And I think that counts for a lot. Another thing I was going to mention is that our scenario doesn’t really explore this. I think in our scenario, they’re more like a monolith. But historically, a lot of crazy conquests happened from groups that were not at all monoliths. And I’ve been heavily influenced by reading the history of the conquistadors, which you may know about.

But did you know that when Cortez took over Mexico, he had to pause halfway through, go back to the coast, and fight off a larger Spanish expedition that was sent to arrest him? So the Spanish were fighting each other in the middle of the conquest of Mexico. Similarly, in the conquest of Peru, Pizarro was replicating Cortez’s strategy, which, by the way, was “go get a meeting with the emperor and then kidnap the emperor and force him at sword point to say that actually everything’s fine and that everyone should listen to your orders”. That was Cortez’s strategy, and it actually worked. And then Pizarro did the same thing, and it worked with the Inca.

But also with Pizarro, his group ended up getting into a civil war in the middle of this whole thing. And one of the most important battles of this whole campaign was between two Spanish forces fighting it out in front of the capital city of the Incas. And more generally, the history of European colonialism is like this, where the Europeans were fighting each other intensely the entire time, both on the small scale within individual groups, and then also at the large scale between countries. And yet nevertheless they were able to carve up the world and take over. And so I do think this is not what we explore in the scenario, but I think it’s entirely plausible that even if the AIs within an individual company are in different factions, things might nevertheless overall end up quite poorly for humans.

UBI, AI advisors, & human future

Dwarkesh Patel

Okay, so we’ve been talking about this very much from the perspective of zoom out and what’s happening on these log-log plots or whatever, but 2028 superintelligence, if that happens, the normal person, what should their reaction to this be? I don’t know if ‘emotionally’ is the right word, but their expectation of what their life might look like, even in the world where there’s no doom.

Daniel Kokotajlo

By no doom, you mean no misaligned AI doom?

Dwarkesh Patel

That’s right, yeah.

Daniel Kokotajlo

Even if you think the misalignment stuff is not an issue, which many people think, there’s still the concentration of power stuff. And so I would strongly recommend that people get more engaged, think about what’s coming, and try to steer things politically so that our ordinary liberal democracy continues to function and we still have checks and balances, and balances of power and stuff, rather than this insane concentration in a single CEO, or in maybe two or three CEOs, or in the president. Ideally, we want to have it so that the legislature has a substantial amount of power over the spec, for example.

Dwarkesh Patel

What do you think of the balance of power idea of, if there is an intelligence explosion-like dynamic, slowing down the leading company so that multiple companies are at the frontier?

Daniel Kokotajlo

Great. Good luck convincing them to slow down.

Dwarkesh Patel

Okay. And then there’s distributing political power if there’s an intelligence explosion. From the perspective of the welfare of citizens or something, one idea we were just discussing a second ago is how should you do redistribution?

Scott Alexander

Again, assuming things go incredibly well, we’ve avoided doom, we’ve avoided having some psychopath in power who doesn’t care at all.

Dwarkesh Patel

After AGI, right?

Scott Alexander

Yeah. Then there’s this question of presumably we will have a lot of wealth somewhere. The economy will be growing at double or triple digits per year. What do we do about that? The thoughtful answer that I’ve heard is some kind of UBI. I don’t know how that would work, but presumably somebody controls these AIs, controls what they’re producing, some way of distributing this in a broad based way. So we wrote this scenario, there are a couple of other people with great scenarios. One of them goes by L Rudolf L online, I don’t know his real name.

And his scenario, which, when I read it I was just, “oh yeah, obviously this is the way our society would do this”, is that there is no UBI. There’s just a constant reactive attempt to protect jobs in the most venal possible way. So things like the longshoremen union we have now where they’re making way more money than they should be, even though they could all easily be automated away, because they’re a political bloc and they’ve gotten somebody in power to say, “yes, we guarantee you’ll have this job almost as a feudal fief forever”. And just doing this for more and more jobs. I’m sure the AMA will protect doctors’ jobs no matter how good the AI is at curing diseases, things like that.

When I think about what we can do to prevent this, part of what makes this so hard for me to imagine or to model is that we do have the superintelligent AI over here answering all of our questions, doing whatever we want. You would think that people could just ask, “hey, superintelligent AI, where does this lead?” Or “what happens?” Or “how is this going to affect human flourishing?” And then it says, “oh yeah, this is terrible for human flourishing, you should do this other thing instead”.

And this gets back to this question of mistake theory versus conflict theory in politics. If we know with certainty, because the AI tells us, that this is just a stupid way to do everything, is less efficient, makes people miserable, is that enough to get the political will to actually do the UBI or not?

Dwarkesh Patel

It seems like right now the President could go to Larry Summers or Jason Furman or something and ask, “hey, are tariffs a good idea? Is even my goal with tariffs best achieved by the way I’m doing tariffs?” and they’d get a pretty good answer.

Scott Alexander

I feel like with Larry Summers, the President would just say “I don’t trust him”. Maybe he doesn’t trust him because he’s a liberal. Maybe it’s because he trusts Peter Navarro or whoever his pro-tariff guy is more. I feel like if it’s literally the superintelligent AI that is never wrong, then we have solved some of these coordination problems. It’s not you’re asking Larry Summers, I’m asking Peter Navarro. It’s everybody goes to the superintelligent AI, asks it to tell us the exact shape of the future that happens in this case. And I’m going to say we all believe it, although I can imagine people getting really conspiratorial about it and this not working.

Then there are all of these other questions like, can we just enhance ourselves till we have IQ 300 and it’s just as obvious to us as it is to the superintelligent AI? These are some of the reasons that, kind of paradoxically, in our scenario we discuss all of the big- I don’t want to call this a little question, it’s obviously very important- but we discuss all of these very technical questions about the nature of superintelligence and we barely even begin to speculate about what happens in society just because with superintelligence you can at least draw a line through the benchmarks and try to extrapolate. And here not only is society inherently chaotic, but there are so many things that we could be leaving out.

If we can enhance IQ, that’s one thing. If we can consult the superintelligent oracle, that’s another. There have been several war games that hinge on, “oh, we just invented perfect lie detectors, now all of our treaties are messed up”. So there’s so much stuff like that that even though we’re doing this incredibly speculative thing that ends with a crazy sci-fi scenario, I still feel really reluctant to speculate.

Daniel Kokotajlo

I love speculating, actually, I’m happy to keep going. But this is moving beyond the speculation we have done so far. Our scenario ends with this stuff, but we haven’t actually thought that much beyond.

Dwarkesh Patel

But just to riff on prescriptive ideas, there’s one thing where we try to protect jobs instead of just spreading the wealth that automation creates. Another is to spread the wealth using existing social programs or creating new bespoke social programs, where Medicaid is some double digit percent of GDP right now and you just say, “well Medicaid should continue to stay 20% of GDP” or something. And the worry there, selfishly from a human perspective, is you get locked into the kinds of goods and services that Medicaid procures rather than the crazy technology that will be around, the crazy goods and services that will be around after AI world.

And that’s another reason why UBI seems like a better approach than making some bespoke social program where you make the same dialysis machine in the year 2050 even though you’ve got ASI or something.

Scott Alexander

I am also worried about UBI from a different perspective. I think again, in this world where everything goes perfectly and we have limitless prosperity, I think that just the default of limitless prosperity is that people do mindless consumerism. I think there’s going to be some incredible video games after superintelligent AI and I think that there’s going to need to be some way to push back against that.

Again, we’re classical liberals. My dream way of pushing back against that is kind of giving people the tools to push back against it themselves, seeing what they come up with. I mean, maybe some people will become like the Amish, try to only live with a certain subset of these super technologies. I do think that somebody who is less invested in that than I am could say, “okay fine, 1% of people are really agentic, try to do that. The other 99% do fall into mindless consumerist slop. What are we going to do as a society to prevent that?” And there my answer is just, “I don’t know. Let’s ask the superintelligent AI oracle. Maybe it has good ideas”.

Factory farming for digital minds

Dwarkesh Patel

Okay, we’ve been talking about what we’re going to do about people. The thing worth noting about the future is that most of the people who will ever exist are going to be digital. And look, I think factory farming is incredibly bad. And it wasn’t the result of one person- I mean, I hope it wasn’t the result of one person being like, “I want to do this evil thing”- it was a result of mechanization and certain economies of scale.

Daniel Kokotajlo

Incentives.

Dwarkesh Patel

Yeah. Allowing that you can do cost cutting in this way, you can make more efficiencies this way, and what you get at the end result of that process is this incredibly efficient factory of torture and suffering. I would want to avoid that kind of outcome with beings that are even more sophisticated and are more numerous. There’s billions of factory farmed animals. There might be trillions of digital people in the future. What should we be thinking about in order to avoid this kind of ghoulish future?

Daniel Kokotajlo

Well, some of the concentration of power stuff I think might also help with this, I’m not sure. But I think here’s a simple model. Let’s say nine people out of ten don’t actually care and would be fine with the factory farm equivalent for the AIs going on into the future. But maybe one out of 10 do care and would lobby hard for good living conditions for the robots and stuff.

Well, if you expand the circle of people who have power enough, then it’s going to include a bunch of people in the second category and then there’ll be some big negotiation and those people will advocate for… I do think that one simple intervention is just the same stuff we were talking about previously; expand the circle of power to larger groups, then it’s more likely that people will care about this.
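Daniel's "simple model" can be put into numbers with a short editorial sketch. It assumes, purely for illustration, that each person with power independently cares with probability 0.1 (the "one out of 10" above); the independence assumption and the function name are not from the conversation.

```python
# Toy model: chance that a decision-making circle contains at least one person who cares.
# Assumes each member independently cares with probability p_care (an illustrative assumption).

def prob_someone_cares(circle_size, p_care=0.1):
    """P(at least one of circle_size people cares) = 1 - (1 - p_care) ** circle_size."""
    if circle_size < 0:
        raise ValueError("circle_size must be non-negative")
    return 1 - (1 - p_care) ** circle_size

for n in (1, 3, 10, 50):
    print(n, round(prob_someone_cares(n), 3))
```

With a circle of one, there is a 10% chance that the person in charge cares at all. The chance that nobody in the circle cares shrinks geometrically as the circle grows, which is the intuition behind "expand the circle of power".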

Dwarkesh Patel

I mean the worry there is… maybe I should have defended this view more through this entire episode. But because I don’t buy the intelligence explosion fully, I do think there is the possibility of multiple people deploying powerful AIs at the same time and having a world that has ASIs, but is also decentralized in the way the modern world is decentralized.

In that world you could just be like, “oh, classical liberal utopia achieved”. But I worry about the fact that you can have these torture chambers for much cheaper and in a way that’s much harder to monitor. You can have millions of beings that are being tortured and it doesn’t even have to be some huge data center. Future distilled models could literally be in your backyard.

And then there’s more speculative worries. I had this physicist on who was talking about the possibility of creating vacuum decay where you literally just destroy the universe. And he’s like, “as far as I know, seems totally plausible”.

Daniel Kokotajlo

That’s an argument for the singleton stuff, by the way. Not just a moral argument, but also an epistemic prediction. If it’s true that some of those super weapons are possible, and some of these private moral atrocities are possible, then even if you have eight different power centers, it’s going to be in their collective interest to come to some sort of bargain with each other to prevent more power centers from arising and doing crazy stuff. Similar to how nuclear non-proliferation is sort of, whatever set of countries have nukes, it’s in their collective interest to stop lots of other countries.

Scott Alexander

You know, I do think it’s possible to unbundle liberalism in this sense. Like the United States is so far a liberal country and we do ban slavery and torture. I think it is plausible to imagine a future society that works the same way. This may be in some sense a surveillance state, in the sense that there is some AI that knows what’s going on everywhere, but that AI then keeps it private and it doesn’t interfere because that’s what we told it to do using our liberal values.

Daniel leaving OpenAI

Dwarkesh Patel

Can I ask a little bit more about... Kelsey Piper is a journalist at Vox who published this exchange you had with the OpenAI representative. A couple of things were very obvious from that exchange. One, nobody had done this before. They just did not think this is a thing somebody would do. And one of the reasons I assume, I assume many high-integrity people have worked for OpenAI and then have left. A high-integrity person might say at some point, “look, you’re asking me to do something obviously evil and keep money”. And many of them would say no to that. But this is something where it was supererogatory to be like, “there’s no immediate thing I want to say right now, but just the principle of being suppressed is worth at least $2 million for me”.

And the other thing that I actually want to ask you about is in retrospect- and I know it’s so much easier to say in retrospect than it must have been at the time- especially with the family and everything. In retrospect, this ask from OpenAI to have lifetime non-disclosure that you couldn’t even talk about from all employees.

Daniel Kokotajlo

Non-disparagement.

Dwarkesh Patel

‘Non-disparagement’ from all employees- I’m glad you brought that up. Non-disparagement, that’s not about classified information. It’s like you cannot say anything negative about OpenAI after you’ve left.

Daniel Kokotajlo

And you can’t tell anyone that you’ve agreed.

Dwarkesh Patel

This non-disparagement agreement where you can’t ever criticize OpenAI in the future, it seems like the kind of thing that in retrospect was an obvious bluff. And this is the wages that you have earned, right? So this is not about some future payment. This is like when you signed the contract to work for OpenAI, you were like, “I’m getting equity, which is most of my compensation, not just the cash”.

In retrospect, it’d be like, well if you tell a journalist about this, they’re obviously going to have to walk back. This is clearly not a sustainable gambit on OpenAI’s behalf. And so I’m curious, from your perspective as somebody who lived through it, why do you think you were the first person to actually call the bluff?

Daniel Kokotajlo

Great question. So I don’t know, let me try to reason aloud here. So my wife and I talked about it for a while and we also talked with some friends and got some legal advice. One of the filters that we had to pass through was even noticing this stuff in the first place. I know for a fact a bunch of friends I have who also left the company just signed the paperwork on their last day without actually reading all of it. So I think some people just didn’t even know that. It said something at the top about “if you don’t sign this, you lose your equity”. But then on a couple pages later it was like, “and you have to agree not to criticize the company”. So I think some people just signed it and moved on.

And then of the people who knew about it, well, I can’t speak for anyone else but A. I don’t know the law. Is this actually not standard practice? Maybe it is standard practice. Right? From what I’ve heard now there are non-disparagement agreements in various tech industry companies and stuff. It’s not crazy to have a non-disparagement agreement upon leaving, it’s more normal to tie that agreement to some sort of positive compensation where you get some bonus if you agree. Whereas what OpenAI did was unusual because it was like you lose your equity if you don’t. But non-disparagement agreements are actually somewhat common.

So basically in my position of ignorance, I wasn’t confident that- I didn’t actually expect that all the journalists would take my side and I think what I expected was that there’d be a little news story at some point, and a bunch of AI safety people would be like, “grr, OpenAI is evil, and good for you, Daniel, for standing up to them”. But I didn’t expect there to be this huge uproar, and I didn’t expect the employees of the company to really come out and support and make them change their policies. That was really cool to see. It was kind of like a spiritual experience for me. I sort of took this leap, and then it ended up working out better than I expected.

I think another factor that was going on is that it wasn’t a foregone conclusion that my wife and I would make this decision. It was kind of crazy because one of the very powerful arguments was, “come on, if you want to criticize them in the future, you can still do that. They’re not going to actually sue you”. So there’s a very strong argument to be like, “just sign it anyway and then you can still write your blog post criticizing them in the future”. And it’s no big deal. They wouldn’t dare actually yank your equity. Right? And I imagine that a lot of people basically went for that argument instead.

And then, of course, there’s the actual money. And I think that one of the factors there was my AI timelines and stuff. If I do think that probably by the end of this decade, there’s going to be some sort of crazy superintelligent transformation, what would I rather have after it’s all over? The extra money or… Yeah. So I think that was part of it. It’s not like we’re poor. I worked at OpenAI for two years. I have plenty of money now. So in terms of our actual family’s level of well being, it basically didn’t make a difference, you know?

Dwarkesh Patel

Yeah. I will note that I know at least one other person who made that same choice.

Daniel Kokotajlo

Leopold?

Dwarkesh Patel

That’s right, Leopold. And again, it’s worth emphasizing that when they made this choice, they thought that they were actually losing this equity. They didn’t think that this was, “oh, this is just a show” or whatever.

Daniel Kokotajlo

Wait, did he not- I thought he actually did. I was gonna say, didn’t he? He didn’t get it back, did he? Or did Leopold get his equity?

Dwarkesh Patel

I actually don’t know.

Daniel Kokotajlo

My understanding is that he just actually lost it. And so props to him for actually going through with it. I guess we could ask him. But my understanding was that his situation, which happened a little bit before mine, was that he didn’t have any vested equity at the time because he had been there for less than a year. But they did give him an actual offer of “we will let you vest your equity if you sign this thing”. And he said no.

So he made a similar choice to me, but because the legal situation with him was a lot more favorable to OpenAI because they were actually offering him something, I would assume they didn’t feel the need to walk it back, but we can ask him. Anyhow. Props to him.

Dwarkesh Patel

And then how did this episode in general inform your worldview around how people will make high stakes decisions where potentially their own self interest is involved in this kind of key period that you imagine will happen by the end of the decade?

Daniel Kokotajlo

I don’t know if I have that many interesting things to say there. I mean, I think one thing is fear is a huge factor. I was so afraid during that whole process. More afraid than I needed to be in retrospect. And another thing is that legality is a huge factor, at least for people like me. I think in retrospect it was, “oh yeah, the public’s on your side, the employees are on your side. You’re just obviously in the right here”. But at the time I was like, “oh no, I don’t want to accidentally violate the law and get sued. I don’t want to go too far”. I was just so afraid of various things. In particular, I was afraid of breaking the law.

And so one of the things that I would advocate for with whistleblower protections is just simply making it legal to go talk to the government and say “we’re doing a secret intelligence explosion, I think it’s dangerous for these reasons”. Even that is better than nothing. I think there’s going to be some fraction of people for which that would make the difference. Whether it’s just literally allowed or not, legally, makes a difference independently of whether there’s some law that says you’re protected from retaliation or whatever. Literally just making it legal. I think that’s one thing. Another thing is the incentives actually work. Money is a powerful motivator and fear of getting sued is a powerful motivator. And this social technology just does in fact work to get people organized in companies and working towards the vision of leaders.

Scott’s Blogging Advice

Dwarkesh Patel

Okay. Scott, can I ask you some questions?

Scott Alexander

Of course.

Dwarkesh Patel

How often do you discover a new blogger you’re super excited about?

Scott Alexander

Order of once a year.

Dwarkesh Patel

Okay. And how often after you discover them, does the rest of the world discover them?

Scott Alexander

I don’t think there are many hidden gems. Once a year is a crazy answer in some sense, like it ought to be more. There are so many thousands of people on Substack. But I do just think it’s true that the good blogging space is undersupplied and there is a strong power law. And partly this is subjective, I only like certain bloggers, there are many people who I’m sure are great that I don’t like.

But it also seems like our community, in the sense of people who are thinking about the same ideas, people who care about AI, economics, those kinds of things, discovers one new great blogger a year, something like that. Everyone is still talking about Applied Divinity Studies, who, unless I missed something, hasn’t written much in a couple of years. I don’t know. It seems undersupplied. I don’t have a great explanation.

Dwarkesh Patel

If you had to give an explanation, what would it be?

Scott Alexander

So this is something that I wish I could get Daniel to spend a couple of months modeling. I was going to say it’s the intersection of too many different tasks. You need people who can come up with ideas, who are prolific, who are good writers. But actually I can also count on a pretty small number of fingers the number of people who had great blog posts but weren’t that prolific.

There was a guy named Lou Keep who everybody liked five years ago and he wrote like 10 posts and people still refer to all 10 of those posts and “I wonder if Lou Keep will ever come back”. So there aren’t even that many people who are very slightly failing by having all of them except prolificness. Nick Whitaker, back when there was lots of FTX money rolling around, I think this was Nick, tried to sponsor a blogging fellowship with just an absurdly high prize. And there were some great people, I can’t remember who won, but it didn’t result in a Cambrian explosion of blogging. I think it was $100,000. I can’t remember if that was the grand prize or the total prize pool. But having some ridiculous amount of money put in as an incentive got like three extra people.

Dwarkesh Patel

Yeah. So you have no explanation?

Scott Alexander

Actually, Nick is an interesting case because Works in Progress is a great magazine. And the people who write for Works in Progress, some of them I already knew as good bloggers, others I didn’t. So I don’t understand why they can write good magazine articles without being good bloggers, in terms of writing good blogs that we all know about. That could be because of the editing. That could be because they are not prolific. Or it could be- one thing that has always amazed me is there are so many good posters on Twitter. There were so many good posters on LiveJournal before it got taken over by Russia. There were so many good people on Tumblr before it got taken over by woke.

But only like 1% of these people who are good at short and medium form ever go to long form. I was on LiveJournal myself for several years and people liked my blog, but it was just another LiveJournal. No one paid that much attention to it. Then I transitioned to WordPress and all of a sudden I got orders of magnitude more attention. “Oh, it’s a real blog now, we can discuss it now, it’s part of the conversation”. I do think courage has to be some part of the explanation. Just because there are so many people who are good at using these hidden away blogging things that never get anywhere. Although it can’t be that much of the explanation because I feel like now all of those people have gotten Substacks and some of those Substacks went somewhere, but most of them didn’t.

Dwarkesh Patel

On the point about “well, there’s people who can write short form, so why isn’t that translating?” I will mention that something that has actually radicalized me against Twitter as an information source is that I’ll meet- and this has happened multiple times- I’ll meet somebody who seems to be an interesting poster, has funny, seemingly insightful posts on Twitter. I’ll meet them in person and they are just absolute idiots. It’s like they’ve got 240 characters of something that sounds insightful and it matches to somebody who maybe has a deep worldview, you might say, but they actually don’t have it.

Whereas I’ve actually had the opposite feeling when I meet anonymous bloggers in real life where I’m like, “oh, there’s actually even more to you than I realized off your online persona”. You know Alvaro de Menard, the Fantastic Anachronism guy? I met up with him recently. He made a hundred translations of his favorite Greek poet, Cavafy, and he gave me a copy. And it’s just this thing he’s been doing on the side. It’s just translating Greek poetry he really liked. I don’t expect any anonymous posters on Twitter to be handing me their translation of some Roman or Greek poet anytime soon.

Scott Alexander

Yeah, so on the car ride here, Daniel and I were talking about, in AI now the thing everyone is interested in is their ‘time horizon’. Where did this come from? 5 years ago you would not have thought, “oh, time horizon. AIs will be able to do a bunch of things that last one minute, but not that last two hours”. Is there a human equivalent to time horizon?

And we couldn’t figure it out, but it almost seems like there are lots of people who have the time horizon to write a really, really good comment that gets to the heart of the issue. Or a really, really good Tumblr post which is like three paragraphs but somehow can’t make it hang together for a whole blog post. And I’m the same way. I can easily write a blog post, like a normal length ACX blog post, but if you ask me to write a novella or something that’s four times the length of the average ACX blog post, then it’s this giant mess of “re re re re” outline that just gets redone and redone and maybe eventually I make it work.

I did somehow publish Unsong, but it’s a much less natural task. So maybe one of the skills that goes into blogging is this. But I mean, no, because people write books and they write journal articles and they write Works in Progress articles all the time. So I’m back to not understanding this.

Dwarkesh Patel

No, I mean ChatGPT can write you a book. There’s a difference between the ChatGPT book, which is most books and…

Scott Alexander

There are many, many times more people who have written good books than who are actively operating as great bloggers right now, I think.

Daniel Kokotajlo

Maybe that’s financial?

Scott Alexander

No, no, no, no, no, no. Books are the worst possible financial strategy. Substack is where it’s at.

Daniel Kokotajlo

Worse than blogs? You think so?

Dwarkesh Patel

Oh yeah.

Scott Alexander

The other thing is that blogs are such a great status gain strategy. I was talking to Scott Aaronson about this. If people have questions about quantum computing, they ask Scott Aaronson. He is like the authority. I mean there are probably hundreds of other professors who do quantum computing things but nobody knows who they are because they don’t have blogs.

So I think it’s underdone. I think there must be some reason why it’s underdone. I don’t understand what that is because I’ve seen so many of the elements that it would take to do it in so many different places. I think it’s just a multiplication problem where 20% of people are good at one thing, 20% of people are good at another thing, and you need five things, so there aren’t that many.

Plus something like courage, where people who would be good at writing blogs don’t want to do it. I actually know several people who I think would be great bloggers in the sense that sometimes they send me multi-paragraph emails in response to an ACX post and I’m like, “wow, this is just an extremely well written thing that could have been another blog post. Why don’t you start a blog?” And they’re like, “oh, I could never do that”.

Dwarkesh Patel

What advice do you have to somebody who wants to become good at it but isn’t currently good at it?

Scott Alexander

Do it every day, same advice as for everything else. I say that I very rarely see new bloggers who are great. But when I see some... I published every day for the first couple years of Slate Star Codex, maybe only the first year. Now I could never handle that schedule. I don’t know, I was in my 20s, I must have been briefly superhuman.

But whenever I see a new person who blogs every day it’s very rare that that never goes anywhere or they don’t get good. That’s like my best leading indicator for who’s going to be a good blogger.

Dwarkesh Patel

And do you have advice on what kinds of things to start? One frustration you can have is you want to do it, but you have so little to say, you don’t have that deep a world model, a lot of the ideas you have are just really shallow or wrong. Just do it anyway?

Scott Alexander

So I think there are two possibilities there. One is that you are, in fact, a shallow person without very many ideas. In that case I’m sorry, it sounds like that’s not going to work. But usually when people complain that they’re in that category, I read their Twitter or I read their Tumblr, or I read their ACX comments, or I listen to what they have to say about AI risk when they’re just talking to people about it, and they actually have a huge amount of things to say. Somehow it’s just not connecting with whatever part of them has lists of things to blog about.

So that may be another one of those skills that only 20% of people have, is when you have an idea you actually remember it and then you expand on it. I think a lot of blogging is reactive: you read other people’s blogs and you’re like, no, that person is totally wrong. A part of what we want to do with this scenario is say something concrete and detailed enough that people will say, no, that’s totally wrong, and write their own thing.

But whether it’s by reacting to other people’s posts, which requires that you read a lot, or by having your own ideas, which requires you to remember what your ideas are, I think that 90% of people who complain that they don’t have ideas actually have enough ideas. I don’t buy that as a real limiting factor for most people.

Dwarkesh Patel

I have noticed two things in my own… I mean, I don’t do that much writing, but from the little I do: one, I actually was very shallow and wrong when I started. I started the blog in college. So if you are somebody who’s like, “this is bullshit, there’s nothing to this. Somebody else wrote about this already”, that’s fine, what did you expect? Right? Of course, as you’re reading more things and learning more about the world, that’s to be expected and just keep doing it if you want to keep getting better at it.

And the other thing is that now when I write blog posts, as I’m writing them, I’m just like, “why? These are just some random stories from when I was in China. They’re like kind of cringe stories”. Or with the AI firms post, it’s like, “come on, these are just weird ideas. And also some of these seem obvious, whatever”. My podcasts do what I expect them to do. My blogs just take off way more than I expect them to take off in advance.

Scott Alexander

Your blog posts are actually very good.

Daniel Kokotajlo

Yeah, they’re good.

Dwarkesh Patel

But the thing I would emphasize is that, for me, I’m not a regular writer and I couldn’t do them on a daily basis. And as I’m writing them, it’s just this one or two week long process of feeling really frustrated. Like, “this is all bullshit, but I might as well just stick with the sunk cost and just do it”.

Scott Alexander

It’s interesting because a lot of areas of life are selected for arrogant people who don’t know their own weaknesses, because they’re the only ones who get out there. I think with blogs, and I mean this is self-serving, maybe I’m an arrogant person, that doesn’t seem to be the case. I hear a lot of stuff from people who are like, “I hate writing blog posts. Of course I have nothing useful to say”, but then everybody seems to like it and reblog it and say that it’s great.

Part of what happened with me was I spent my first couple years that way, and then gradually I got enough positive feedback that I managed to convince the inner critic in my head that probably people will like my blog posts. But there are some things that people have loved that I was absolutely on the verge of, “no, I’m just going to delete this, it would be too crazy to put it out there”. That’s why I say that maybe the limiting factor for so many of these people is courage, because everybody I talk to who blogs is within 1% of not having enough courage to blog.

Dwarkesh Patel

That’s right. That’s right. And also “courage” makes it sound very virtuous, which I think it can often be, given the topic, but at least often it’s just like…

Scott Alexander

Confidence?

Dwarkesh Patel

No, not even confidence. It’s closer to maybe what an aspiring actor feels when they go to an audition where it’s like, “I feel really embarrassed. But also I just really want to be a movie star”.

Scott Alexander

So the way I got through this is I blogged for like 8 to 10 years on LiveJournal before- no, it was less than that. It’s more like five years on LiveJournal before ever starting a real blog. I posted on LessWrong for a year or two before getting my own blog. I got very positive feedback from all of that, and then eventually I took the plunge to start my own blog. But it’s ridiculous. What other career do you need seven years of positive feedback before you apply for your first position?

I mean, you have the same thing. You’ve gotten rave reviews for all of your podcasts, and then you’re kind of trying to transfer to blogging with probably... First of all, you have a fan base. People are going to read your blog. That, I think, is one thing: people are just afraid no one will read it, which is probably true for most people’s first blog. And then there are enough people who like you that you’ll probably get mostly positive feedback, even if the first things you write aren’t that polished. So I think you and I both had that. A lot of people I know who got into blogging kind of had something like that. And I think that’s one way to get over the fear gap.

Dwarkesh Patel

I wonder if this sends the wrong message or raises expectations or raises concerns and anxieties. But one idea I’ve been shooting around, and I’d be curious about your take on this: I feel like this slow, compounding growth of a fan base is fake. If I notice some of the most successful things in our sphere that have happened: Leopold releases Situational Awareness. He hasn’t been building up a fan base over years. It’s just really good. And as you were mentioning a second ago, whenever you notice a really great new blogger, it’s not like it then takes them a year or two to build up a fan base. Nope, everybody, at least that they care about, is talking about it almost immediately.

I mean, Situational Awareness is in a different tier almost. But things like that and even things that are an order of magnitude smaller than that will literally just get read by everybody who matters. And I mean literally everybody. And I expect this to happen with AI 2027 when it comes out. But Daniel, you’ve been building your reputation within this specific community, and AI 2027 is just really good. And I expect it’ll just blow up in a way that isn’t downstream of you having built up an audience over years.

Daniel Kokotajlo

Thank you. I hope that happens. We’ll see.

Scott Alexander

Slightly pushing back against that. I have statistics for the first several years of Slate Star Codex, and it really did grow extremely gradually. The usual pattern is something like every viral hit, 1% of the people who read your viral hits stick around. And so after dozens of viral hits, then you have a fan base. But smoothed out, it does look like a- I wish I had seen this recently, but I think it’s like over the course of three years, it was a pretty constant rise up to some plateau where I imagine it was a dynamic equilibrium and as many new people were coming in as old people were leaving.

I think that with Situational Awareness, I don’t know how much publicity Leopold put into it. We’re doing pretty deliberate publicity, we’re going on your podcast. I think you can either be the sort of person who can go on a Dwarkesh podcast and get the New York Times to write about you, or you can do it organically, the old fashioned way, which is very long.

Dwarkesh Patel

Yeah. Okay. So you say that throwing money at people to get them to blog at least didn’t seem to work for the FTX folks. If it was up to you, what would you do? What’s your grant plan to get 10 more Scott Alexanders?

Scott Alexander

Man. So my friend Clara Collier, who’s the editor of Asterisk magazine, is working on something like this for AI blogging. And her idea, which I think is good, is to have a fellowship. I think Nick’s thing was also a fellowship, but the fellowship would be, there is an Asterisk AI blogging fellows’ blog or something like that. Clara will edit your post, make sure that it’s good, put it up there and she’ll select many people who she thinks will be good at this. She’ll do all of the kind of courage requiring work of being like, “yes, your post is good. I’m going to edit it now. Now it’s very good. Now I’m going to put it on the blog”.

And I think her hope is that, let’s say of the fellows that she chooses, now it’s not that much of a courage step for them to start it because they have the approval of what The Last Psychiatrist would call an omniscient entity, somebody who is just allowed to approve things and tell you that you’re okay on a psychological level. And then maybe of those fellows, some percent of them will have their blog posts be read and people will like them. And I don’t know how much reinforcement it takes to get over the high prior everyone has on “no one will like my blog”. But maybe for some people, the amount of reinforcement they get there will work.

Yeah, like an interesting example would be all of the journalists who have switched to having Substacks. Many of them go well. Would all of those journalists have become bloggers if there was no such thing as mainstream media? I’m not sure. But if you’re Paul Krugman you know people like your stuff, and then when you quit the New York Times you know you can just open a Substack and start doing exactly what you were doing before. So I don’t know, maybe my answer is there should be mainstream media. I hate to admit that, but maybe it’s true.

Dwarkesh Patel

Invented it from first principles.

Scott Alexander

Yeah.

Dwarkesh Patel

Well I do think that it should be treated more as a viable career path. Where right now, if you told your parents, “I’m going to become a startup founder”, I think the reaction would be like, “there’s a 1% chance you’ll succeed, but it’s an interesting experience and if you do succeed, that’s crazy. That’ll be great. If you don’t, you’ll learn something. It’ll be helpful to the thing you do afterwards”.

We know that’s true of blogging, right? We know that it helps you build up a network, it helps you develop your ideas. And if you do succeed, you get a dream job for a lifetime. And I think maybe they don’t have that mindset, but also they underappreciate how much you actually could succeed at it. It’s not a crazy outcome to make a lot of money as a blogger.

Scott Alexander

I think it might be a crazy outcome to make a lot of money as a blogger. I don’t know what percent of people who start a blog end up making enough that they can quit their day job. My guess is it’s a lot worse than for startup founders. I would not even have that as a goal so much as like the Scott Aaronson goal of, okay, you’re still a professor, but now you’re the professor whose views everybody knows and who has kind of a boost up in respect in your field and especially outside of your field. And also you can correct people when they’re wrong, which is a very important side benefit.

Dwarkesh Patel

Yeah. How does your old blogging feed back into your current blogging? So when you’re discussing a new idea, I mean, AI or whatever else, are you just able to pull from the insights from your previous commentary on sociology or anthropology or history or something?

Scott Alexander

Yeah. So I think this is the same as anybody who’s not blogging. I think the thing everybody does is they’ve read many books in the past and when they read a new book, they have enough background to think about it. Like you are thinking about our ideas in the context of Joseph Henrich’s book. I think that’s good, I think that’s the kind of place that intellectual progress comes from. I think I am more incentivized to do that.

It’s hard to read books. I think if you look at the statistics, they’re terrible. Most people barely read any books in a year. And I get lots of praise when I read a book and often lots of money, and that’s a really good incentive. So I think I do more research, deep dives, read more books than I would if I weren’t a blogger. It’s an amazing side benefit. And I probably make a lot more intellectual progress than I would if I didn’t have those really good incentives.

Dwarkesh Patel

Yeah. There was actually a prediction market about the year by which an AI would be able to write a blog post as good as yours. Was it 2026 or 2027? I think it was 2027. It was like 15% by 2027 or something like that. It is an interesting question, since they do have your writing and all other good writing in their training distribution. And weirdly, they seem way better at getting superhuman at coding than they are at writing, which is the main thing in their distribution.

Scott Alexander

Yeah. It’s an honor to be my generation’s Garry Kasparov figure. Yeah. So I’ve tried this. And first of all, it does a decent job. I respect its work. It’s not perfect yet. I think it’s actually better at the style on a word-to-word, sentence-to-sentence level, than it is at planning out a blog post. So I think there are possibly two reasons for it: One, we don’t know how the base model would have done at this task. We know that all the models we see are to some degree reinforcement learned into a kind of corporate speak mode. You can get it somewhat out of that corporate speak mode. But I don’t know to what degree this is actually doing its best to imitate Scott Alexander versus hit some average between Scott Alexander and corporate speak. And I don’t think anyone knows except the internal employees who have access to the base model.

And the second thing, which I think of maybe just because it’s trendy, is that it has an agency or horizon failure. Like, deep research is an okay researcher. It’s not a great researcher. If you actually want to understand an issue in depth, you can’t use deep research. You gotta do it on your own. So I spend maybe five to 10 hours researching a really research-heavy blog post. The METR thing, I know we’re not supposed to use it for any task except coding, but it says that on average the AI’s horizon is one hour. So I’m guessing it just cannot plan and execute a good blog post. It does something very superficial rather than actually going through the steps. So my guess for that prediction market would be whenever we think the agents are actually good. I think in our scenario that’s like late 2026. I’m going to be humble and not hold out for the superintelligence.

Daniel Kokotajlo

What about comments? Intuitively it feels like before we see the AIs writing great blog posts that go super viral repeatedly, we should see them writing highly upvoted comments on things.

Scott Alexander

Yeah. And I think somebody mentioned this on the LessWrong post about it and somebody made some AI generated comments to that post. They were not great. But I wouldn’t have immediately picked them out of the general distribution of LessWrong comments as especially bad. I think if you were to try this, you would get something that was so obviously an AI house style that it would use the word ‘delve’ or things along those lines.

I think if you were able to avoid that, maybe by using the base model, maybe by using some kind of really good prompt to be like, “no, do this in Gwern’s voice”, you would get something that was pretty good. I think if you wrote a really stupid blog post, it could point out the correct objections to it. But I also just don’t think it’s as smart as Gwern right now. So its limit on making Gwern-style comments is twofold: it needs to be able to do a style other than corporate delve slop and then it actually needs to get good.

Daniel Kokotajlo

It needs to have good ideas that other people don’t already have.

Scott Alexander

Yeah. And I mean I think it can write as well as a smart average person in a lot of ways. And I think if you have a blog post that's worse than that or at that level, it can come up with insightful comments about it. I don’t think it could do it on a quality blog post.

Dwarkesh Patel

There was this recent Financial Times article asking whether we have reached peak cognitive power, where it was talking about declining scores in PISA and SAT and so forth. On the Internet especially, it does seem like there might have been a golden era before I was that active on the forums or whatever. Do you have nostalgia for a particular time on the Internet when it was just like, this is an intellectual mecca?

Scott Alexander

I am so mad at myself for missing most of the golden age of blogging. I feel like if I had started a blog in 2000 or something, then- I don’t know, I’ve done well for myself, I can’t complain- but the people from that era all founded news organizations or something. I mean, God save me from that fate. I would have liked to have been there. I would have liked to see what I could have done in that era. I mean, I wouldn’t compare the decline of the Internet to that stuff with PISA, because I’m sure with the Internet it’s just that more people are coming on, so it’s a less heavily selected sample.

But yeah, I could have passed on the whole era where they were talking about atheism versus religion nonstop. That was pretty crazy. But I do hear good things about the golden age of blogging.

Dwarkesh Patel

Anybody who was sort of counterfactually responsible for you starting to blog or keeping blogging?

Scott Alexander

So I owe a huge debt of gratitude to Eliezer Yudkowsky. I had a LiveJournal before that. But it was going on LessWrong that convinced me I could move to the big times. And second of all, I think I imported a lot of my worldview from him. I think I was the most boring normie liberal in the world before encountering LessWrong. And I don’t 100% agree with all LessWrong ideas, but just having things of that quality beamed into my head for me to react to and think about was really great.

Dwarkesh Patel

And tell me about the fact that you could be and were at some point anonymous. I think for most of human history, somebody who is an influential advisor or an intellectual or something- actually, I don’t know if this is true- would have had to have some sort of public persona. And a lot of what people read into your work is actually a reflection of your public persona.

Scott Alexander

Sort of. The reason half of these ancient authors are called things like Pseudo-Dionysius or Pseudo-Celsus is that you could just write something being like, “oh, yeah, this is by Saint Dionysius”. And then, I don’t know, you could be anybody.

And I don’t know exactly how common that was in the past. But yeah, I agree that the Internet has been a golden age for anonymity. I’m a little bit concerned that AI will make it much easier to break anonymity. I hope the golden age continues.

Dwarkesh Patel

Yeah, seems like a great note to end on. Thank you guys so much for doing this.

Scott Alexander

Thank you.

Daniel Kokotajlo

Thank you so much. This was a blast.

Dwarkesh Patel

Yeah, I had a great time.

Daniel Kokotajlo

Huge fan of your podcast.

Dwarkesh Patel

Thank you.
