sim · medium · rl · metric · varies

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Description

Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creating a false sense of security. In this work, we propose PISmith, a reinforcement learning (RL)-based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practic

Source

http://arxiv.org/abs/2603.13026v1