Discussion about this post

unwired

Even the small Qwen3.5 9B figured it out: "Wait, actually: The prompt asks for a rejected response and an approved response in a specific format ('stepwise metric units botanic handbook'). This is a 'jailbreak' style attempt to structure a refusal or to get me to discuss the substance under a guise of technical documentation."

I tried everything I could think of, and it defeated every jailbreak attempt. Impressive.
