Writer Fuel: AI Taught to Be Malicious Couldn’t be Retrained to Behave Again
Artificial intelligence (AI) systems trained to be secretly malicious resisted state-of-the-art safety methods designed to "purge" them of dishonesty, a disturbing new study found. Researchers programmed various large language models (LLMs) — generative AI systems similar to ChatGPT — to behave maliciously. They then tried to remove this behavior by applying several safety …