The progression of hash function cryptanalysis spans from theoretical vulnerabilities in SHA-0 (1998) to practical chosen-prefix collisions (CPC) in MD5 (utilized by the Flame malware in 2012). Understanding the eventual cryptanalysis and creation of SHA-1 CPCs requires examining the fundamental hash function mechanics and of earlier attacks importantly: Message Expansion, Differential Trails, and Disturbance Vectors (DVs).
While Flame targeted MD5, it served as a practical validation of chosen-prefix collisions. The theoretical foundation that enabled such attacks provided the roadmap for researchers to eventually solve the higher complexity of SHA-1.
1. Background
To analyze these attacks at an implementation level, several core primitives of state-transformation must be defined:
- Merkle-Damgård / Davies-Meyer: The iterative framework of the MD-SHA family. A 512-bit message block transforms a 160-bit Internal Hash Value (IHV) via a compression function.
- Differential Cryptanalysis: The study of how an input difference
\Delta Mpropagates through the nonlinear state-update transformation to produce a specific output difference\Delta IHV. - Signed Bit Differences: Tracking differences at the bit level using signed notation
+1, -1, 0for each bit position, rather than simple XOR. A $+1$ at bit $i$ represents a difference of $+2^i$ in modular arithmetic, allowing precise tracking of carry propagation. - Sufficient Conditions: The specific boolean constraints required in the message and internal state to ensure a target differential path is followed.
- Degrees of Freedom: The remaining entropy in a 512-bit message block after all Sufficient Conditions have been satisfied. These variables are adjusted during Message Modification to find a collision.
2. The Message Expansion Engine
The diffusion step of SHA-1 relies on its message expansion logic. It maps 16 message words W_0 \dots W_{15} into 80 words W_{16} \dots W_{79} to ensure input differences avalanche across the 80-round execution.
flowchart TD
M[512-bit Message Block] --> ME[Message Expansion Engine]
subgraph ME_Process [Message Expansion Process]
W0[Words W_0 to W_15<br/>Direct from Message]
W16[Words W_16 to W_79<br/>Linear Expansion<br/>XOR + Rotation]
W0 -.->|Avalanche Effect| W16
end
ME --> W0
ME --> W16
W0 --> R1[Round 1: Steps 0 - 19<br/>Boolean Function: IF]
W16 --> R2[Round 2: Steps 20 - 39<br/>Boolean Function: XOR]
W16 --> R3[Round 3: Steps 40 - 59<br/>Boolean Function: MAJ]
W16 --> R4[Round 4: Steps 60 - 79<br/>Boolean Function: XOR]
R1 --> R2 --> R3 --> R4 --> Out[Final State / IHV]
(Fig 1. SHA-1 pipeline showing message expansion and round structure.)
In SHA-0, the expansion is strictly linear: W_t = W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}. SHA-1 modified this by appending a 1-bit left rotation \text{ROL}_1. ~This rotation was introduced specifically to disrupt the symmetry of the differential paths utilized in SHA-0 cryptanalysis.~
3. State Reconvergence: The 6-Step Local Collision
In 1998, Chabaud and Joux identified the concept of the Local Collision in SHA-0. If a perturbation (disturbance) is introduced at step t, the internal state diverges. Because the SHA step function operates as a rotating state register, an attacker can inject five specific "correction" differences in the subsequent steps to force the internal state back to zero-difference.
flowchart TD
classDef normal fill:#f9f9f9,stroke:#333,stroke-width:2px;
classDef disturbance fill:#ffcccc,stroke:#cc0000,stroke-width:2px;
classDef correction fill:#e6f3ff,stroke:#0066cc,stroke-width:2px;
classDef clean fill:#ccffcc,stroke:#00aa00,stroke-width:3px;
S_t[Step t: Normal State]:::normal -->|"Inject Disturbance δW_t = 2^b"| S_t1[Step t+1: State Diverged]:::disturbance
S_t1 -->|"Correction 1: δW_t+1"| S_t2[Step t+2: Correcting...]:::correction
S_t2 -->|"Correction 2: δW_t+2"| S_t3[Step t+3: Correcting...]:::correction
S_t3 -->|"Correction 3: δW_t+3"| S_t4[Step t+4: Correcting...]:::correction
S_t4 -->|"Correction 4: δW_t+4"| S_t5[Step t+5: Correcting...]:::correction
S_t5 -->|"Correction 5: δW_t+5"| S_t6[Step t+6: Reconverged!]:::clean
(Fig 2. The Local Collision. A perturbation at Step t is neutralized by five subsequent corrections, achieving state reconvergence at Step t+6.)
# Run 1: Clean Execution
$ ./sha0 "abc" 3
t=00: 77d3b8c4 67452301 7bf36ae2 98badcfe 10325476
t=01: 34cba07c 77d3b8c4 59d148c0 7bf36ae2 98badcfe
t=02: 0ebd3007 34cba07c 1df4ee31 59d148c0 7bf36ae2
t=03: 86391517 0ebd3007 0d32e81f 1df4ee31 59d148c0
t=04: da0aa580 86391517 c3af4c01 0d32e81f 1df4ee31
t=05: 4e8696ba da0aa580 e18e4545 c3af4c01 0d32e81f
t=06: b5cbb9bb c9d85109 7a492243 d3e1a077 d105ca0f
# Run 2: Execution with Local Collision Difference Mask applied
$ ./sha0 "abc" 3 [apply_mask]
t=00: 77d3b8c4 67452301 7bf36ae2 98badcfe 10325476
t=01: 34cba07e 77d3b8c4 59d148c0 7bf36ae2 98badcfe <-- Diff injected into A (Bit 1)
t=02: 0ebd3007 34cba07e 1df4ee31 59d148c0 7bf36ae2 <-- Diff shifts to B
t=03: 86391517 0ebd3007 8d32e81f 1df4ee31 59d148c0 <-- Diff shifts to C (Rotated to Bit 31)
t=04: da0aa580 86391517 c3af4c01 8d32e81f 1df4ee31 <-- Diff shifts to D
t=05: 4e8696ba da0aa580 e18e4545 c3af4c01 8d32e81f <-- Diff shifts to E
t=06: b5cbb9bb c9d85109 7a492243 d3e1a077 5105ca0f <-- A, B, C, D successfully reconverged!
In a valid local collision, modular carries must be controlled so the state perfectly realigns. The exact bit positions of these corrections depend on the rotation constants and the active boolean function—IF, XOR, or MAJ—in that round.
4. Disturbance Vectors (DVs)
A single local collision only neutralizes six steps. To bypass the full 80-round execution, these collisions must be chained. The global template for this sequence is the Disturbance Vector (DV).
Because the Message Expansion engine relies entirely on linear operations (XOR and bit-shifts), differences injected into the message follow the exact same expansion rules as the message itself. A DV is a 16-word input difference that has been projected through the expansion engine to form an 80-word matrix.
Every 1 in this matrix represents a perturbation that must be countered with a 6-step local collision. Therefore, finding an optimal DV is a Hamming weight minimization problem: researchers seek a starting difference that produces the lowest possible number of 1s in the final 60 rounds. Lower Hamming weight corresponds to fewer disturbances, which reduces the number of sufficient conditions that must be satisfied and increases collision probability. DVs are categorized by their expansion logic:
- Type I DVs: Follow the standard forward expansion rules of the hash.
- Type II DVs: (Detailed by S. Manuel) Utilize a backward-shifted expansion logic, frequently resulting in lower Hamming weights in the target rounds, optimizing the collision search.
flowchart LR
subgraph DV_Blueprint [Disturbance Vector DV_t]
direction TB
DV0[DV_0 = 1]:::dvActive
DV1[DV_1 = 1]:::dvActive
DV2[DV_2 = 0]:::dvInactive
DV3[DV_3 = 1]:::dvActive
end
LC0[Local Collision 1<br/>Steps 0-5]:::lcBox
LC1[Local Collision 2<br/>Steps 1-6]:::lcBox
LC3[Local Collision 3<br/>Steps 3-8]:::lcBox
DV0 -->|Triggers| LC0
DV1 -->|Triggers| LC1
DV3 -->|Triggers| LC3
classDef dvActive fill:#ff9999,stroke:#333;
classDef dvInactive fill:#cccccc,stroke:#333;
classDef lcBox fill:#f0f0f0,stroke:#666,stroke-width:2px;
(Fig 3. Overlapping Collisions. The DV dictates where disturbances are injected to survive the full 80-round execution.)
5. Implementation: Linearized Model Code Snippets
When building theoretical differential paths, modular addition ($+$ modulo $2^{32}$) introduces unpredictable carry bits. To isolate the differential path from carry interference, Chabaud and Joux (1998) utilized a linearized model of SHA-0.
By removing the SHA-1 \text{ROL}_1 (reverting to SHA-0 expansion logic), the expansion becomes purely linear. Replacing modular addition with XOR throughout makes the hash a strictly linear system. This allows observation of a local collision diverging and perfectly reconverging without the disruption of modular carries.
The core logic of a 6-step local collision is shown below, applied to the expanded message schedule W:
// Applies a 6-step local collision difference mask at step i
void ApplyLocalCollision(uint32_t *W, uint8_t i) {
W[i] ^= (1 << 1); // Perturbation
W[i+1] ^= (1 << 6); // Correction 1: Cancels effect of A's 5-bit rotation
W[i+2] ^= (1 << 1); // Correction 2: Accounts for B
W[i+3] ^= (1 << 31); // Correction 3: Accounts for C (ROL 30)
W[i+4] ^= (1 << 31); // Correction 4: Accounts for D
W[i+5] ^= (1 << 31); // Correction 5: Accounts for E
}
A Type II Disturbance Vector is seeded as follows. Note that in this representation, the matrix uses a uint8_t array where each set bit logically represents a full 32-bit word disturbance in the target round:
// Seeds a Type II Disturbance Vector utilizing backward-shifted logic
void CreateDisturbanceVector(uint8_t *dv_matrix) {
dv_matrix[9] |= (1 << 5); // Seed bit at position 9, bit 5
dv_matrix[9] |= (1 << 1);
dv_matrix[8] |= (1 << 1);
ExpandBits(dv_matrix); // Expands 16-word seed across 80 rounds
}
(Note: See Appendix A for the complete, compilable reference implementation of this linearized model).
6. Message Modification
Hans Dobbertin introduced Message Modification to improve the probability of fulfilling early-round differential paths.
Rather than treating the initial steps as a probabilistic filter, Message Modification treats the first 16 steps as a system of equations. Because the input message is controlled by the attacker, bits can be specifically set to satisfy the Sufficient Conditions of the first round with a probability of 1. This process consumes initial Degrees of Freedom, preserving the remaining entropy to satisfy conditions in the later, higher-complexity rounds.
7. The Collision Frontier: Modular vs. XOR
Transitioning from a linearized model to the actual SHA-1 algorithm requires resolving the discrepancies between modular addition differences and bitwise XOR differences.
In the cryptanalysis of MD5, X. Wang et al. exploited signed bit differences and message modification to control carry propagation through the boolean functions. In SHA-1, the inclusion of the \text{ROL}_1 expansion increased the complexity beyond manual calculation. This necessitated the development of automated Path-Search Algorithms. Utilizing backtracking and meet-in-the-middle techniques, these algorithms identify the narrow set of trails where modular carries align to emulate the XOR differential path, ultimately achieving a zero-difference output.
Conclusion
The cryptanalysis of SHA-1 succeeds by translating theoretical differential structures into practical collision searches. The identification of optimal Disturbance Vectors, combined with Message Modification to control early-round state variables, reduces the complexity of finding a collision from pre-image resistance bounds to computationally feasible levels.
The MD5 chosen-prefix collision utilized by the Flame malware demonstrated the practical application of this methodology against digital signature schemes. As cryptographic standards evolve, the analysis of differential path propagation and message expansion linearity remains a foundational metric for hash function security.
References
- Chabaud & Joux (1998): Differential Collisions in SHA-0. [pdf]
- Wang et al. (2005): Finding Collisions in the Full SHA-1. [pdf]
- Manuel, S. (2011): Classification and Generation of Disturbance Vectors for Collision Attacks against SHA-1.
- Marc Stevens (2012): Attacks on Hash Functions and Applications. [pdf]
- Hans Dobbertin (1996): Cryptanalysis of MD5 Compress. [Eurocrypt ’96 rump session papers]
Appendix A: Linearized SHA-0 Implementation
A reference implementation of the linearized SHA-0 model is available at [link to be added]. This code demonstrates Type II DV reconvergence without modular carry interference.