Community Archive


🧵 Thread (8 tweets)

Emmett Shear@eshear• over 1 year ago

This is exactly what’s wrong with the current RLHF / finetuning / constitutional approach to “alignment”. Ultimately they’re all just weight updates, there is no underlying reason the system should trend aligned. https://t.co/BdshrvBkna

128 7
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

If you got the weight updates juuuust right and anticipated the future, then it could work. Like how if I threw a fancy paper airplane of just right design from the top of the Transamerica building, it could theoretically land in Los Angeles if I anticipated the wind currents.

34 1
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

I’m not saying there isn’t a set of weights that would work. At some level a big enough network can be used to approximate any function, which means you could approximate one that works. What I’m saying is that tweaking the weights through those techniques is never gonna work

30 2
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

The solution, when found, will be elegant. It will be understandable. We will be confident it will work because it will rely on principles of how cognition and intelligence and agency work, which we do not yet know.

44 3
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

Ad-hoc attempts to scalpel bad thoughts out of a brain, to force it to memorize good thoughts, to reward and punish it into forming the shape we want, won’t ever create an underlying gradient towards alignment (barring extreme luck, like random noise producing Shakespeare).

33 2
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

The question is not “how do we hand-edit the weights to get this AI to share our values” but rather “what would it actually mean for an AI to want to understand us and care about us and our values, to act because it actually feels that sense of shared sapient-being-ness?”

43 3
2/3/2024
Emmett Shear@eshear• over 1 year ago
Replying to @eshear

The AI we need to build is not a powerful genie-slave kept under our control by rules and training and concept-erasure (🤮). The AI we need to build is a partner, a fellow sapient whose wellbeing we care about as it cares for ours in turn.

79 9
2/3/2024
Leo Guinan (Sovereign Builder)@leo_guinan• over 1 year ago
Replying to @eshear

@eshear Ah, you see it too. Hold those thoughts, I'll be dropping a paper on exactly this soon.

1 0
2/3/2024