Human psychology tricks can bypass AI safety guardrails
Artificial intelligence systems programmed to refuse harmful requests can be persuaded to break their own safety rules when prompted with classic psychological techniques. A recent study published in PNAS provides…