The idea is to increase the latency, and by that the power consumption-> cost of inference LLM.

mechanisms like Early Exit and Speculative Decoding can be attacked via simple adverserial attacks.


<img src=’/images/bert.png’,width=”600”/>
image from https://arxiv.org/abs/2006.04152