Haize Labs is using algorithms to jailbreak leading AI models
What does Haize Labs' "haizing suite" consist of?
Haize Labs' "haizing suite" is a collection of search and optimization algorithms designed to probe large language models (LLMs) for weaknesses. It helps identify security vulnerabilities and alignment flaws in AI systems by crawling the space of inputs to LLMs with the objective of producing harmful model outputs4. The suite includes various algorithms such as evolutionary programming, reinforcement learning, multi-turn simulations, gradient-based methods, and more.
Who are the founders of Haize Labs?
The founders of Haize Labs are Leonard Tang, Richard Liu, and Steve Li. The three were classmates at Harvard University.
Which AI models has Haize Labs found easiest to jailbreak?
Haize Labs has found that models like Vicuna and Mistral, which have not been explicitly safety-finetuned, are the easiest to jailbreak. On the other hand, Anthropic's Claude has proven to be the most difficult to jailbreak.