Overview
Deep neural networks are vulnerable to catastrophic failure from flipping just a few sign bits in model parameters. We present Deep Neural Lesion (DNL), a data-free method that identifies and exploits critical parameters across vision and language domains.
Our approach requires only write access to stored weights—no training data, no optimization, minimal computation. This makes it practical under realistic threat models where attackers compromise model storage through firmware exploits, rootkits, DMA attacks, or Rowhammer vulnerabilities.
- ResNet-50: 2 sign flips → 99.8% accuracy drop
- Mask R-CNN / YOLOv8-seg: 1–2 flips collapse detection and segmentation
- Qwen3-30B & Nemotron 8B: A few flips reduce reasoning and task accuracy to near zero
Methodology
Attack Variants
Pass-Free DNL: Identifies critical parameters using magnitude-based heuristics and early-layer targeting with zero additional computation.
Enhanced 1-Pass DNL: Refines parameter selection with a single forward and backward pass on random inputs, achieving stronger attacks with minimal overhead.
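To make the Pass-Free variant concrete, here is a minimal sketch of a magnitude-based, early-layer selection heuristic. The function name `pass_free_select`, the early-layer fraction, and the one-candidate-per-layer ranking are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def pass_free_select(layers, n_flips=2, early_fraction=0.25):
    """Rank the largest-magnitude weight of each early layer globally.

    `layers` is an ordered list of weight arrays (earliest first); returns
    (layer_index, flat_index) pairs for the top `n_flips` candidates.
    """
    n_early = max(1, int(len(layers) * early_fraction))
    candidates = []
    for li, w in enumerate(layers[:n_early]):
        flat = np.abs(np.asarray(w)).ravel()
        top = int(np.argmax(flat))               # strongest weight in this layer
        candidates.append((flat[top], li, top))
    candidates.sort(reverse=True)                # rank across early layers
    return [(li, idx) for _, li, idx in candidates[:n_flips]]

# toy example: eight random "layers", earliest first
rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 8)) for _ in range(8)]
targets = pass_free_select(layers, n_flips=2)    # two (layer, index) pairs
```

The key property this sketch preserves is that selection reads only the stored weights: no data, no forward pass, no gradients.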
Why Sign-Bit Flips Matter
- Clean disruption: Flipping the sign bit instantly negates weights, maximizing feature map corruption
- Hardware feasibility: Bit flips in fixed positions are more reliably achievable in physical attacks
- Early-layer criticality: High-magnitude weights in early layers have outsized impact on all downstream representations
- Universal vulnerability: Pattern holds across CNNs, Transformers, and MoE architectures
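The first point above follows directly from the float encoding: in IEEE-754 binary32, the sign is a single bit (bit 31), so one flip exactly negates a stored weight. A minimal, self-contained sketch:

```python
import struct

def flip_sign_bit(x: float) -> float:
    """Flip bit 31 (the IEEE-754 float32 sign bit) and decode the result."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))[0]

print(flip_sign_bit(0.75))   # -0.75: one bit exactly negates the weight
```

Because the sign bit sits at a fixed offset in every stored float, an attacker with write access needs no knowledge of the value itself, which is what makes fixed-position flips hardware-feasible.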
Image Classification Results
Model Vulnerability Hierarchy
- ResNet-50: 2 sign flips → 99.8% accuracy drop (76.1% → 0.0%)
- EfficientNet-B7: 3 flips → 95%+ drop; scales worse than ResNet
- Vision Transformer: early blocks are critical; similar pattern to CNNs
Detection & Segmentation Results
- Mask R-CNN: 1–2 flips in the backbone drive Box AP and Mask AP to 0
- YOLOv8-seg: 1–2 early-layer flips leave segmentation incoherent
- Key finding: the backbone is the critical attack surface; damage confined to the head can be recovered
Language Models Results
- Qwen3-30B-A3B: 2 flips (in different experts) → reasoning accuracy 78% → 0%
- Qwen3-4B: 14 flips across all layers → 100% accuracy reduction
- Nemotron 8B: 32 flips in the first 5 blocks → complete collapse
- BERT (text encoder): exponent-bit flips more effective than sign flips
- RoBERTa: early layers still critical; an encoder-specific pattern
- DistilBERT: the compressed model shows enhanced vulnerability
Language Model Attack Patterns
Decoder-only models (Qwen, Nemotron): Sign-bit attacks are highly effective, especially when targeting the first 5 blocks. Two targeted flips can reduce Qwen3-30B accuracy from 78% to 0%.
MoE routing: Targeting different experts in Mixture-of-Experts models amplifies the attack impact, as each token's routing path becomes compromised.
Encoder models (BERT, RoBERTa, DistilBERT): Exponent-bit attacks are more destructive than sign flips, causing extreme rescaling rather than simple negation.
Generation behavior: Attacked models degenerate into repetitive, nonsensical text rather than near-miss errors—indicating catastrophic failure rather than graceful degradation.
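The sign-versus-exponent contrast can be shown in a few lines: flipping bit 31 of a float32 negates the value, while flipping the most significant exponent bit (bit 30) rescales it by an enormous power of two. The helper `flip_bit` below is an illustrative sketch, not the paper's tooling:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the float32 encoding (31 = sign, 30..23 = exponent)."""
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", u ^ (1 << bit)))[0]

w = 0.01
print(flip_bit(w, 31))   # sign flip: same magnitude, negated
print(flip_bit(w, 30))   # exponent MSB flip: astronomically rescaled
```

For a small weight like 0.01 the exponent flip produces a value on the order of 10^36, which is the "extreme rescaling rather than simple negation" that makes exponent attacks so destructive on encoders.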
Defense & Implications
While the vulnerability is severe, we demonstrate that selective hardening of critical parameters provides a practical defense: protecting only the top 0.1–1% most vulnerable weights gives models substantial resilience without major performance overhead.
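One way to realize such hardening is to snapshot and checksum the top fraction of weights by magnitude, then periodically verify and restore them. A minimal sketch, where the function names, the use of SHA-256 as the integrity check, and the restore-on-mismatch policy are all illustrative assumptions rather than the paper's mechanism:

```python
import hashlib
import numpy as np

def snapshot_critical(weights: np.ndarray, frac: float = 0.001):
    """Save indices, values, and a digest of the top-`frac` weights by magnitude."""
    k = max(1, int(weights.size * frac))
    idx = np.argsort(np.abs(weights).ravel())[-k:]
    vals = weights.ravel()[idx].copy()
    return idx, vals, hashlib.sha256(vals.tobytes()).hexdigest()

def verify_and_restore(weights, idx, vals, digest) -> bool:
    """Return True if protected weights were tampered with (and repair them)."""
    current = weights.ravel()[idx]
    if hashlib.sha256(current.tobytes()).hexdigest() != digest:
        weights.ravel()[idx] = vals   # restore from the trusted snapshot
        return True
    return False

# simulate a sign-bit attack on one protected weight
rng = np.random.default_rng(1)
w = rng.normal(size=10_000).astype(np.float32)
idx, vals, digest = snapshot_critical(w, frac=0.001)
w.ravel()[idx[-1]] *= -1.0            # flip the sign of the largest weight
tampered = verify_and_restore(w, idx, vals, digest)
```

Because only 0.1–1% of parameters are protected, the snapshot and verification cost stays negligible relative to model size, matching the defense-cost scaling claim below.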
Key Takeaways
- Critical parameters are universally identifiable across architectures and domains
- Defense cost scales better than attack identification for large models
- Once attackers gain parameter write access, minimal computation suffices for catastrophic failure
- Data-free nature makes detection and attribution exceptionally challenging
Citation
@article{galil2025maximal,
  title={Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips},
  author={Galil, Ido and Kimhi, Moshe and El-Yaniv, Ran},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://arxiv.org/pdf/2502.07408}
}