
To elaborate a little more, I specifically think this framing (courtesy @impershblknight) is very silly:

> By default we'll get a mesaoptimizer inner homunculus that converges on a utility function of maximizing ⾲ⵄ∓⼙⃒⭗ⱐ✖∵⨼␞☎☲℆↋♋⡴⏐⮁⭋⣿⧫❉⺼⁶↦┵␍⸣ⵔ⽒⓹⬍⺅▲⟮⸀Ⰹⓟ┱⾫⼵⺶⊇❋∀⡚ⷽ∺⤙⻬⓰ⓛⳄ⭪⢛⹚⡌⥙⮝➱┟⬣⧫⧗⛼❡⼆₈ⱫⅫⷜ⏸⪱⯝⎳⫷⺶♈∄⊡⯵❾⭫⽍➵⋇⬅ℇ‹⳺⫷⾬≴ⴋ⢗␚┨, and it will devour the cosmos in pursuit of this randomly-rolled goal.

Even if you think humans are mesaoptimizers wrt the outer goal of inclusive genetic fitness, our values are not *random* with respect to that goal: they are fairly good correlates of it in the ancestral environment, correlates that held for most of history until coordination problems and increasingly advanced adversarial superstimuli caused them to (possibly temporarily) stop working.

So if you say something like "I do not believe it learns to predict the next token, I think it learns some set of correlated mesagoals like 'predict the most interesting thing'" I would basically agree with that? The alternative is for the model to actually learn to predict the next token in full generality, which is basically impossible, so it has to learn *some* proxy for that instead. The specific thing that makes counting arguments silly is the idea that you get a *random* goal rather than highly correlated proxy goals, goals you could probably infer a priori just by thinking about the objective, the inductive biases, and the training data for a bit.
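
To make the "correlated proxy, not random goal" point concrete, here is a purely illustrative numpy toy (my own sketch, not anything from the thread; the latent "objective", the proxy features, and the distribution shift are all assumptions made for the sake of the analogy). A regressor that cannot observe its training objective directly ends up with weights determined by whatever correlates with that objective in the training distribution, and those learned "values" keep tracking the objective until the correlation breaks:

```python
# Toy sketch (not from the original thread): a learner that can't compute its
# training objective exactly picks up *correlated proxies*, not a random goal.
# The latent "objective", the proxy features, and the shifted regime below are
# all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, proxy_corr):
    """Latent objective z plus two observable proxies correlated with it."""
    z = rng.normal(size=n)                      # the "outer objective"
    noise = np.sqrt(1 - proxy_corr**2)
    p1 = proxy_corr * z + noise * rng.normal(size=n)
    p2 = proxy_corr * z + noise * rng.normal(size=n)
    return np.column_stack([p1, p2]), z

# "Training environment": the proxies are good correlates of the objective.
X_train, z_train = sample(10_000, proxy_corr=0.95)
w, *_ = np.linalg.lstsq(X_train, z_train, rcond=None)   # learned "values"

# The learned weights are far from random: they load on whatever happened to
# correlate with the objective in-distribution.
print("learned weights:", w)

# In-distribution, the proxy policy tracks the objective well...
pred_train = X_train @ w
print("train corr(pred, objective):", np.corrcoef(pred_train, z_train)[0, 1])

# ..."superstimulus" regime: the proxies decouple from the objective, and the
# same learned weights now chase the proxies instead.
X_shift, z_shift = sample(10_000, proxy_corr=0.2)
pred_shift = X_shift @ w
print("shifted corr(pred, objective):", np.corrcoef(pred_shift, z_shift)[0, 1])
```

In this toy the learned weights are a predictable function of the training correlations, which is the sense in which you could roughly infer the proxy goals a priori instead of treating them as a uniform draw over goal-space; the off-distribution regime is only meant as a loose analogue of superstimuli breaking previously reliable correlates.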