> Cool, that’s really the only point I’m making.
To be clear, I'm saying that I don't know if they are, not that we know that it's not the same.
It's not at all clear that humans do much more than "that basic token sequence prediction" for our reasoning itself. There are glaringly obvious auxiliary differences, such as memory, but we just don't know how human reasoning works, so writing off a predictive mechanism like this is just as unjustified as assuming it's the same. It's highly likely there are differences, but whether they are significant remains to be seen.
> Not necessarily scaling limitations fundamental to the architecture as such, but limitations in our ability to develop sufficiently well developed training texts and strategies across so many problem domains.
I think there are several big issues with that thinking. One is that this constraint is an issue now in large part because GPT doesn't have "memory" or an ability to continue learning. Those two need to be overcome to let it truly scale, but once they are, the game fundamentally changes.
The second is that we're already at a stage where using LLMs to generate and validate training data works well for a whole lot of domains, and that will accelerate, especially when coupled with "plugins" and the ability to capture interactions with real-life users [1]
E.g. a large part of human ability to do maths with any kind of efficiency comes down to rote repetition and generating large sets of simple quizzes for such areas is near trivial if you combine an LLM at tools for it to validate its answers. And unlike with humans where we have to do this effort for billions of humans, once you have an ability to let these models continue learning you make this investment in training once (or once per major LLM effort).
A third is that GPT hasn't even scratched the surface in what is available in digital collections alone. E.g. GPT3 was trained on "only" about 200 million Norwegian words (I don't have data for GPT4). Norwegian is a tiny language - this was 0.1% of GPT3's total corpus. But the Norwegian National Library has 8.5m items, which includes something like 10-20 billion words in books alone, and many tens of billions more in newspapers, magazines and other data. That's one tiny language. We're many generations of LLM's away from even approaching exhausting the already available digital collections alone, and that's before we look at having the models trained on that data generate and judge training data.
[1] https://sharegpt.com/