Malcolm Ocean 🏴‍☠️ @Malcolm_Ocean • about 3 years ago
are we goodharting? how would we know if we were goodharting? how can we measure the extent to which we're goodharting? how can we optimize for as little goodharting as possible? how can we ensure that our goodhart-minimization doesn't goodhart?