10 Comments
Rahim Nathwani

I'm not sure this part is true: "But datacenter compute itself doesn’t seem that differentiated (so much so that the hyperscalers seem to be able to easily contract it out to third parties like CoreWeave)".

Aren't Google's TPUs and Groq's LPUs a source of differentiation?

Rohit Krishnan

Excellent list of questions. One underrated fact is that we know how to deal with the mistakes humans make; our entire society is built around that. But we don't know how to deal with the mistakes LLMs make, and we will need to build structures around them before LLMs can "take over".

To me that's an incredibly important part of the conversation, and a lot of the unknown unknowns that you ask about lie on the other side of it.

Victor Lira

> It's interesting to me that some of the best and most widely used applications of foundation models have come from the labs themselves (Deep Research, Claude Code, Notebook LM), even though it's not clear that you needed access to the weights in order to build them. Why is this? Maybe you do need access to the weights of frontier models, and the fine tuning APIs or open source models aren’t enough? Or maybe you gotta ‘feel the AGI’ as strongly as those inside the labs do?

As someone who got bitten by a mix of DSPy and GPT-4 before prices came down, I'd wager that unlimited credits and no rate limiting for trying things out are significant advantages when building new products.

Also, distribution is hard, and the labs are already at the forefront of early adopters' minds.

near

Re: the 'idiot savants' question, I would say everyone is using and post-training the models wrong. It's obvious that a 'chat' interface (one instance of one static LLM, roles alternating between human and assistant, a model trained just to predict the next token, with basically no context provided to it whatsoever) is not the best way to elicit new knowledge from these models, given how transformers work. Where do you expect the new knowledge and thinking to come from if it's a static system with so little entropy being injected into it (and much of it already removed by RL, even)?

I'm unsure why there has been so little creativity in this area. It may be that we are just moving too fast and no one has time to deep-dive into other, more exciting ideas when the current thing is 'working' so well (where 'working' means producing a lot of revenue, which is the gradient most companies follow at the end of the day).

Another way I'd rephrase this is: how hard are you *actually trying* to elicit new knowledge from the LLM? Simply asking for it is not trying hard, and humans do not give you new knowledge if you simply ask them for it and take the first thing that pops into their mind.

Kevin Kelly

Fantastic notes. One possibility to resolve the Dwarkesh Dilemma of why vast knowledge does not equal vast reasoning: perhaps it is a trade-off. Perhaps advanced reasoning requires a certain kind of ignorance, so we forget what we learn in order to have novel ideas. Maybe our brains don't store all the knowledge we take in precisely so that they can perform novel thoughts.

Arun S

🙏🙏 I came only looking for questions. Delighted to find answers as well. Some of these answers are going to evolve over time, so this could be a website or periodic update as well.

Tim Duffy

Love the idea of posting a list of questions, and the questions themselves are excellent; asking great questions is a key part of your podcasting success, so that shouldn't be a surprise. Here are some of my disorganized thoughts on these:

Agency

- A related question I have: What the heck is agency anyway? It seems to me that it might really be a few different things in a trenchcoat, but I'm not sure what the important components really are. Some that may be important are creating plans and keeping track of their status, maintaining focus on important features of a problem, and understanding when and how to take an alternate approach.

- If the Moravec rebuttal is correct, then I'd expect Let's Plays to be a really strong resource for teaching AIs to play video games; that will be something to look out for as the computational requirements for AI video input drop.

RL

> The AI needs to do two hours' worth of agentic computer use tasks before we can even see if it did it right. And if this is correct, will the pace of AI progress slow down?

On this question, my understanding is that GRPO allows you to have many prompts per step (the batch size) as well as many outputs per prompt (the group size). Since these rollouts can run in parallel, things aren't quite as bad as they might seem. But it still may produce a barrier: R1 used ~8,000 RL steps, and if each one took 2 hours it would take almost 2 years to run! I don't know enough about RL to know if you could get similar results with a larger batch size but fewer steps.
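
For concreteness, here's a toy sketch of the group-relative scoring idea and the back-of-envelope wall-clock math (my own simplification, not DeepSeek's actual implementation; the function name is invented for illustration):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """Toy sketch of GRPO's group-relative advantage.

    Assumes rewards has shape (batch_size, group_size), where batch_size is
    the number of prompts per RL step and group_size is the number of
    completions sampled per prompt. All rollouts within a step can run in
    parallel.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8
    # Each completion is scored only against the other completions for the
    # same prompt, so no separate value/critic model is needed.
    return (rewards - mean) / std

# Parallelism helps within a step, but steps themselves are sequential, so
# long rollouts still set a wall-clock floor:
steps, hours_per_rollout = 8_000, 2
print(f"~{steps * hours_per_rollout / (24 * 365):.1f} years of sequential steps")
```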

Early deployment

- On call center jobs, my guess is that there are a couple reasons we haven't seen more impact yet:

- Turning LLM abilities into a smooth, usable product takes longer than we might naively expect, perhaps on the order of a couple of years

- Knowledge about available products takes time to diffuse

- Switching over to using AI is expensive and takes time, and is harder to justify when the field is moving rapidly and the available options may be different in a year.

- From an outside-view perspective, it seems common across many fields for powerful new technologies to be adopted slowly.

Model training and value capture

- Your model of value capture seems similar to mine. When I think it through I always end up with the basic hardware producers as the most valuable, but I have very low confidence in this and would love to hear arguments against. I also think that commoditization is more likely the slower progress is.

- Similarly I think that wrapper usefulness is negatively related to model progress. Currently a wrapper/agent using a model from a few months ago is no better than a newly released model with no scaffolding, but if model progress is slow that would no longer be the case.

Andrew Rose

> So it's not that surprising that we got expert-level AI mathematicians before AIs that can zero-shot video games made for 10-year-olds

This question seems closely related to creating Reliable Agents, imo, and the rebuttal offered seems like a promising direction to explore, too:

> the capabilities AIs are getting first have nothing to do with their recency in the evolutionary record and everything to do with how much relevant training data exists.

It seems likely that 10-year-olds are able to "zero-shot" video games because the games were designed and play-tested hundreds to thousands of times to be zero-shottable by 10-year-olds, who possess certain types of general agency and not others.

It's often said that Game Design is an art form where "player agency is the medium". The artist is literally crafting an agency-landscape designed SPECIFICALLY for humans, for our types of agency, for our capabilities. A good game designer interested in challenges will craft that agency-landscape into a well-formed "difficulty curve"; the game will curve in and out of the very edge of player comfort and ability, always moving between:

a) difficult, motivating challenges

b) easeful, rewarding payoff for hard work

A promising research direction might be to experiment with an "AI Game Design Lab", where the goal is to design "new user interfaces" for AI to play and beat popular games.

Maybe Claude can't play Pokemon by taking screenshots of screens designed for human eyes, and sending commands to a control system designed for human thumbs...

But maybe Claude COULD play Pokemon if we designed an isomorphic interface for the game. What if it was simply... given descriptions of all the available actions it could take in a battle? What if it was given a description of every meaningful object on the screen, and the exact coordinates of that object (just like our eyes would give us!), so it can do its own pathfinding to get there? It will have no trouble creating a pathfinding subroutine if it has the actual tile-graph in memory; I'm sure it can write A*.
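
To make the idea concrete, here's a hypothetical sketch of what such an isomorphic interface could expose (all names here are invented for illustration, not an existing API):

```python
from dataclasses import dataclass

@dataclass
class ScreenObject:
    name: str              # e.g. "Snorlax", "door", "sign"
    kind: str              # "npc", "item", "obstacle", ...
    tile: tuple[int, int]  # coordinates on the walkable tile grid

@dataclass
class GameState:
    objects: list[ScreenObject]
    walkable: set[tuple[int, int]]  # the tile graph the agent can path over
    legal_actions: list[str]        # e.g. ["move_to(x, y)", "use_item(name)", "talk_to(name)"]

def describe(state: GameState) -> str:
    """Render the structured state as text the model can reason over,
    in place of a screenshot."""
    lines = [f"{o.kind} '{o.name}' at {o.tile}" for o in state.objects]
    lines.append("Legal actions: " + ", ".join(state.legal_actions))
    return "\n".join(lines)
```

A pathfinding routine (A* or plain BFS over `walkable`) would then back the `move_to` action, so the model plans at the level of goals rather than individual button presses.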

I think this would still present interesting challenges to a Claude agent. Claude might forget to take note of meaningful past events, or fail to make connections between items which could help it solve puzzles. Will Claude read all of its item descriptions when it is stuck? Will Claude remember to use the PokeFlute on Snorlax? (Probably, because guides are in Claude's training data, but what if we somehow removed those? Or modified the game to use an isomorphic naming structure, but with all different names? Or designed a new Pokemon game for Claude, so it has no training data?)

Anyway, the point is: AI Game Design seems like an interesting research direction for generally intelligent agents, and an interesting first project could be building an "AI-native interface for playing 2D RPGs, starting with Pokemon".

In an ideal world, agents are able to use interfaces designed for humans. But the ability to use those interfaces is not a test of their _agency_. A test of their agency would be whether they could use interfaces designed for agents to accomplish difficult tasks that require general intelligence.

Andrew Rose

> Why aren't all the call center workers getting laid off yet? It’s the first thing that should go. Should we take it as some signal that human jobs are just way harder to automate than you might naively think?

A few proposals here:

Samo Burja said in an interview:

"""So once these jobs are automated, any job with political protection, with a structural guild-like lock on credentials, those jobs will actually not be automated by AI. Let me explain what I mean.

The substantive work that they do will be fully automated, but you can't automate fake jobs. So since you can't automate fake jobs, instead of it being a 20% self-serving job with 80% drudgery, it'll become 100% self-serving. If you can spend 90% of the time or 100% of your time lobbying for the existence of your job in a big bureaucracy, that's pretty powerful.

And in a society, it's pretty powerful. Busy bureaucrats are, at the end of the day, actually politically not that powerful. It's lazy, well-rested bureaucrats that are powerful. So on the other side of this, any job that does not have such protection, that is open to market forces, well, it'll be partially obsoleted. It will increase economic productivity.

So in my opinion, the real race in our society is: will generative AI empower new productive jobs by automating old productive jobs faster than it will empower through giving them more time to basically pursue rent-seeking...

And never underestimate the ability of an extractive class to really lock down and crash economic growth.

"""

https://www.theojaffee.com/p/19-samo-burja

Call this the Burja Principle -- "automation increases bureaucracy by freeing up the time and labor of bureaucrats to do more lobbying and politics."

If this were true, we would expect to see low-bureaucracy, high-tech companies cut the most jobs, and cut them fastest. This might be confirmation bias on my end, but that feels like what is happening. Fast-paced tech companies like Shopify (and literally every small startup) are either cutting or limiting headcount and rapidly pushing all employees to use AI (see the Shopify memo: https://x.com/tobi/status/1909251946235437514).

Also, I don't have a citation for this, but firing people is seen as morally wrong, so firms look for "morally acceptable" opportunities to fire (often when other major companies do a firing wave). This is why we get stories of many companies firing all at the same time. Firing is also just horrible for company morale, and the macro-economic excuses help.

I suspect there will be at least one, but likely many "preference-cascade" events where firms suddenly jump on a bandwagon to fire certain types of roles. Could be that we just haven't hit these events yet.

Also, it just takes forever for market innovations to get adopted everywhere, and we've only had good-enough AI for most knowledge work since roughly Claude 3.5 Sonnet. There are still plenty of firms using pencil and paper for work that could be automated by a spreadsheet, and still a lot of money to be made by SaaS companies who figure out how to distribute to those firms.

Andrew Rose

For what it's worth, this task might require more than just re-designing the interface. It might require changing core elements of the game's design, while keeping the "integrity" of the game intact.

If the game is using a visual metaphor to help us understand a mechanic, we will need to somehow translate that visual metaphor into a symbolic one.
