The idea is to increase the latency, and by that the power consumption-> cost of inference LLM.
mechanisms like Early Exit and Speculative Decoding can be attacked via simple adverserial attacks.
<img src=’/images/bert.png’,width=”600”/>
image from https://arxiv.org/abs/2006.04152