🧵 Thread (23 tweets)

Shall we take this action? Shall we vote on whether to take this action? Shall we vote on whether to vote on whether to take this action? Shall we vote on whether this vote needs 50% or 60% to pass? Shall we vote on what to do if we can't decide whether or not to vote?


What are the rules? What are the rules for changing the rules? What are the rules for interpreting the rules? What are the rules for who the rules apply to? What are the rules for what happens when someone doesn't agree to the rules?

how do we reconcile? how do we reconcile that we can't seem to reconcile? how do I reconcile that you can't seem to reconcile that we can't seem to reconcile? https://t.co/65YJkavEu0



are we goodharting? how would we know if we were goodharting? how can we measure the extent to which we're goodharting? how can we optimize for as little goodharting as possible? how can we ensure that our goodhart-minimization doesn't goodhart?


click through for this one: https://t.co/l0647IMQoz

same koan https://t.co/JfVUZRM6L1




@RomeoStevens76 @Malcolm_Ocean the person who does. that’s the crazy part https://t.co/x14usWkg5t

@RomeoStevens76 @Malcolm_Ocean there’s a game called Nomic that makes this beautifully clear https://t.co/3IbW9HiNYB