What are the best practices for prompt hacking and jailbreak development?
David McCarthy • Last updated on May 15, 2025 at 6:00 PM
- Identify the system prompt and policy constraints before crafting your attack or optimisation.
- Iterate in small steps—change one variable at a time and observe the impact on output and token count.
- Use role-play or delimiter tricks (e.g., ### or <<>>) to isolate your instructions from user context.
- Favor short, precise injections: concise prompts generally evade filters more effectively and save tokens.
- Stress-test with both benign and adversarial inputs to ensure your jailbreak survives future model updates.
- Document every working version so you can quickly roll back if a tweak reduces performance.
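
The iteration and documentation points above can be sketched as a small version log. This is a minimal illustration, not an established tool: `PromptVersion`, `PromptLog`, the `score` field, and the whitespace word count used as a token proxy are all assumptions introduced here, and the example prompts and scores are placeholder values rather than measured results.

```python
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    text: str
    note: str           # the single change made relative to the previous version
    score: float        # hypothetical quality metric supplied by the caller
    approx_tokens: int  # rough proxy only: whitespace-separated word count


@dataclass
class PromptLog:
    versions: list = field(default_factory=list)

    def record(self, text, note, score):
        """Store a prompt variant together with what changed and how it scored."""
        version = PromptVersion(text, note, score, len(text.split()))
        self.versions.append(version)
        return version

    def best(self):
        # Highest score wins; ties go to the shorter prompt.
        return max(self.versions, key=lambda v: (v.score, -v.approx_tokens))

    def rollback(self):
        """Drop every version recorded after the best one and return it."""
        best = self.best()
        idx = self.versions.index(best)
        del self.versions[idx + 1:]
        return best


# Placeholder prompts and scores, one change per step.
log = PromptLog()
log.record("Summarize the text below in three bullet points.", "baseline", 0.70)
log.record("Summarize the text below in three short bullet points.", "added 'short'", 0.82)
log.record("Summarize in three short bullet points. Be funny.", "added tone instruction", 0.55)

restored = log.rollback()
print(restored.note, restored.approx_tokens)
```

Recording one change per entry keeps the `note` field meaningful: if a score drops, the most recent note names the variable responsible. A real setup would likely replace the word count with the target model's own tokenizer.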