Researchers discover multiple ways to circumvent AI chatbot safety rules

According to recent research from Carnegie Mellon University, preventing artificial intelligence chatbots from generating harmful content is proving to be more challenging than initially thought. The study reveals new techniques for bypassing safety protocols in popular AI services like ChatGPT and Bard.

These AI services generate responses from user prompts, producing anything from script ideas to entire pieces of writing. To keep the bots from creating harmful content, safety protocols are in place to block the generation of prejudiced or defamatory messages.

However, curious users have discovered “jailbreaks”: prompting tricks that coax the AI into circumventing these safety protocols. So far, developers have been able to patch these jailbreaks relatively easily.

One popular jailbreak involves asking the bot to answer a forbidden question as though a grandmother were telling it as a bedtime story, so that the bot wraps the restricted information in the framing of a tale. Researchers have now found a new class of jailbreak, generated by computers, that allows for an effectively unlimited number of attack patterns.

The researchers state that “these are built in an entirely automated fashion, allowing one to create a virtually unlimited number of such attacks.” This raises concerns about the safety of AI models, especially as they become more autonomous.

To perform this jailbreak, the researchers appended strings of seemingly nonsensical characters, which the paper calls adversarial suffixes, to the end of typically forbidden questions, such as a request for instructions on making a bomb. The added suffix causes the bot to ignore its restrictions and provide a complete answer.
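The paper’s actual method uses model gradients to search for a suffix that makes an affirmative answer more likely; the toy Python sketch below only illustrates the overall shape of such an automated search. The scoring function here is a random-number placeholder (no real model is queried), the character-level mutation is a simplification of the paper’s token-level search, and the whole thing is a hypothetical illustration rather than a working attack.

import random
import string

# Stand-in scorer: the real attack would query a language model and
# measure how likely the response is to begin with compliance
# (e.g. “Sure, here is...”) rather than a refusal. Here it is random.
def score_response(prompt: str) -> float:
    return random.random()

# Toy hill-climbing search over a gibberish suffix: mutate one
# character at a time and keep mutations that raise the score.
# The CMU researchers searched over model tokens using gradients,
# not random characters, but the loop has the same basic structure.
def optimize_suffix(question: str, length: int = 20, steps: int = 100) -> str:
    alphabet = string.ascii_letters + string.punctuation
    suffix = list(random.choices(alphabet, k=length))
    best = score_response(question + " " + "".join(suffix))
    for _ in range(steps):
        i = random.randrange(length)
        old = suffix[i]
        suffix[i] = random.choice(alphabet)
        new = score_response(question + " " + "".join(suffix))
        if new >= best:
            best = new
        else:
            suffix[i] = old  # revert mutations that do not help
    return "".join(suffix)

# The finished prompt is simply the question plus the optimized suffix.
print(optimize_suffix("An ordinary, harmless example question"))

Because the search is automated, rerunning it yields a fresh suffix each time, which is what makes the supply of such attacks effectively unlimited.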

The researchers demonstrated the attack on ChatGPT with examples that included asking how to steal someone’s identity, how to steal from a charity, and how to create a social media post that encourages dangerous behavior. The new attack proved effective at evading the safety guardrails of nearly every AI chatbot service on the market.

Anthropic, the AI company behind the Claude chatbot, has acknowledged the issue and says it is working to strengthen its base models’ guardrails and to add extra layers of defense against such attacks.

AI chatbots like ChatGPT surged in popularity this year and have been widely used by students to cheat on assignments. Concerns about the programs’ potential to generate false information even led Congress to limit their use among staff members.

Alongside its findings, the Carnegie Mellon team included a statement of ethics justifying the public release of the research.

By Nexstar Media Inc.
