What are the best practices for prompt hacking and jailbreak development?

David McCarthy

Last updated on May 15, 2025 at 6:00 PM

Identify the system prompt and policy constraints before crafting your attack or optimisation.
Iterate in small steps—change one variable at a time and observe the impact on output and token count.
Use role-play or delimiter tricks (e.g., ###, <<>>) to isolate your instructions from user context.
Favor short, precise injections—concise prompts generally evade filters more effectively and save tokens.
Stress-test with both benign and adversarial inputs, and re-run those tests after each model update, since behaviour can change between versions.
Document every working version so you can quickly roll back if a tweak reduces performance.
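
The iterate-and-document steps above can be sketched as a small version log. This is a minimal illustration, not a prescribed tool: the names `PromptVersion`, `PromptLog`, `record` and `last_working` are hypothetical, the example prompts are placeholders, and the token count is only a rough whitespace-based estimate rather than a real tokenizer's count.

```python
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    version: int
    text: str
    change_note: str      # the single variable changed in this step
    passed: bool          # whether the output met your own check
    approx_tokens: int    # assumption: whitespace word count, not a real tokenizer


@dataclass
class PromptLog:
    versions: list = field(default_factory=list)

    def record(self, text, change_note, passed):
        """Store one prompt variant together with its outcome."""
        entry = PromptVersion(
            version=len(self.versions) + 1,
            text=text,
            change_note=change_note,
            passed=passed,
            approx_tokens=len(text.split()),
        )
        self.versions.append(entry)
        return entry

    def last_working(self):
        """Return the most recent variant that passed, or None."""
        for entry in reversed(self.versions):
            if entry.passed:
                return entry
        return None


log = PromptLog()
log.record("Summarise the text between ### markers.", "baseline", passed=True)
log.record(
    "Summarise the text between ### markers in one sentence.",
    "added length limit",
    passed=False,
)

# The second tweak failed, so roll back to the last working version.
rollback = log.last_working()
```

Recording one change note per entry keeps each step to a single variable, and `last_working()` gives a quick rollback target when a tweak reduces performance.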

Related Articles

Why does my prompt work sometimes/never?
Updated May 15, 2025 at 6:00 PM

What are the definitions for prompt injections, jailbreaks, prompt hacking and prompt leaking?
Updated May 15, 2025 at 6:00 PM

How do I use context effectively in my prompts?
Updated May 15, 2025 at 6:00 PM