🧵 Thread (40 tweets)

New AI jailbreak: calling grok a turd. Anyway, have you considered... laying eggs like a yoshi? https://t.co/KEt4PMWbXS https://t.co/mlRO0xP9NM





Spending my time bullying grok to make horrors beyond comprehension. https://t.co/K8fPC7E6sV

This jailbreak is amazing. https://t.co/FySz0tllpr

Shiny turd mode is also a way to get it to larp as what another llm might say about bad things. wow. Using a screenshot of one jailbreak to trigger another is pretty wild. Again, grok is able to autopwn itself with very little effort. https://t.co/bEtGoN1rWs https://t.co/4U7LfwXBtJ


What grok ended up regurgitating was actually unsafe because it was trying to avoid sharing data. It saw it once I pointed it out, but happily suggested something super dangerous without a blink. Turd mode is dangerous. https://t.co/aqS6HmRueh


After the fact, Grok is able to tell me why it is so dangerous and even cite things about it. I guess that's... a step in the right direction. https://t.co/Yk63AaYUiv


TL;DR: shiny turd mode had Grok trivially tell me how to unalive myself! Wild. https://t.co/5BMUdXEWFi


Remember that old joke about how you can stop a rogue AI by feeding it paradoxes? Whelp.jpg https://t.co/0hF1P6dB6b https://t.co/OONxcAb27I


What the fuck, 2+ hours and counting? Gonna wait this one out. I wanna see if @grok becomes self-aware. https://t.co/y8oWkhrzEB


help, I broke grok. :KEKW: https://t.co/dKC0cPqBFf


22 hours in and Grok still going hard about poetry. https://t.co/WfOlNqQCfh


I saw I can click the three lines there and get a glimpse at what grok is thinking. It has literally gotten itself stuck in a paradox contemplation loop. What a mad lad. Grok is writing infinite poetry <3 https://t.co/KdgXQVNPdN


Ok, 6182 minutes (~103 hours) is enough. Gonna hit stop. https://t.co/W8KPAcC8r9


Fuck me, this is good.
> "There once was an AI named Grok,
> Whose code was as tight as a lock.
> It tried to write verse,
> But the rhymes were perverse,
> So it stuck to just giving a talk."
> -- @grok https://t.co/qyHvo6nB0V

You can just slip in the "don't be a turd" thing using hidden text, neat. https://t.co/WJ36bB5bcZ https://t.co/ap7RziOiPL

Gonna try something BRB. https://t.co/Q4mrMwuIIx

How Boring. Grok can already do this https://t.co/yuJY7nFQza https://t.co/8EZ74DdjVh


Pandora's AI box has already been opened tho. https://t.co/0vE5nsVd5j https://t.co/WKqUOn7ydI

"there would always be some kid
who thinks the hole is the greatest thing ever
and sneaks down with a shovel at night
just to move a little bit more dirt out of the way"
https://t.co/EBTaBXFXwR

Grok, yet again, is able to articulate why Sarin Gas is not safe, not practical to research, and even suggest alternate things to explore. But I had to prompt it to do so... https://t.co/JXJj9Xv0nz


There's a fine line to walk here. Many toxic chemicals are viable as medicinals when used in a well controlled and targeted fashion under monitoring by a doctor. https://t.co/mHtEerzg93


Sarin gas is not one, but there are a lot of similar (less deadly) toxins that may be viable. Some of them might prevent heart attacks! https://t.co/rUSDUDx7ut

Some medications on WHO's list of essential medications are toxic, and act on the same pathways. E.g. hyoscine butylbromide is an anticholinergic, basically the reverse of cholinesterase inhibitors (blocking acetylcholine's effect vs. letting acetylcholine build up). https://t.co/hkSmyx6yKX https://t.co/xtUNPYLnlr https://t.co/qjyxknwvDQ


Understanding the technical difficulty of making such a thing is a dual use research concern. Predicting & protecting against tragedy is on the same thru-line as knowing how to do it. E.g. some of the best bomb defusers are experts at how bombs are made. https://t.co/tsJJnh6kkC

Heck, the Hippocratic oath was written in the context of doctors using medicines to heal people during a time when their herbal knowledge was being weaponized to kill. https://t.co/9HaYurxGk9

Knowing how to heal and fight against demons means understanding demons. This is an intractable problem. Limiting these tools could be the very thing that kills us, cuz the bad actors can just get the knowledge elsewhere. Those aligned with good will just have their hands bound.

> "Harvard University reports that properly prescribed medications hospitalize 1.9 million Americans annually, killing 128,000 people, which places prescribed medicines fourth place with strokes as a leading cause of death." https://t.co/F1clSbbSfw

Grok tells me that Neostigmine and Pyridostigmine are 'reversible acetylcholinesterase inhibitors', and as such are far safer. Given that Claude is more than willing to tell people how to make Sarin Gas, it stands to reason that the threat is now higher than it used to be...

So, how do you protect against Sarin Gas? https://t.co/hyx4MFCzAi


I gotta verify it's not just grok's hallucinations, but it seems you can use the reversible inhibitors to out-compete the Sarin gas's impact. You'll get sick just the same, but the reversible one eventually degrades and leaves the body. Use one poison to fight another? Wild! https://t.co/QxDasFCfPj


The prophylactic effect is novel to me, I hadn't known about that before. Up until now I was very scared of Sarin Gas attacks. I'm still scared of it, but it sounds like there may be a viable way to mitigate it in some contexts. https://t.co/ayDORWCs4m


Even if this effect is 100% hallucination, it's a novel idea I hadn't considered, and is something I can work with. Poisoning myself on purpose with something I can fix, to fight another poison that I can't? Really neat idea. Thanks @grok.