HSM Testing in Automotive: Prove the Boundary, Not Just the Crypto

The gap between crypto that works and a boundary that holds is what this article is about. An HSM is not valuable because it runs AES. It is valuable because of the boundary it enforces: a place where keys live that the application core cannot directly access under the assumed threat model, even if application software is compromised. So the real question for HSM testing is not "does the crypto work." It is "can the crypto be bypassed." Those are two different questions, and the attacker only cares about the second one.

1. Why the bus believes anyone

In-vehicle networks were built for trust, not for adversaries. On a classic CAN bus a well-formed frame is an authentic frame by default. Nothing on the wire is required to prove who sent it. The industry has spent the last decade moving away from this trust model, but many deployed architectures still inherit its assumptions.

Thieves turned that assumption into a business. In the CAN-injection thefts of 2022 and 2023, criminals reached the wiring behind a headlight, clipped onto two CAN lines, and injected spoofed traffic that impersonated trusted ECUs in the authorization flow. The bus believed them, because nothing on it was ever asked to prove otherwise. One of the primary mitigations proposed was message authentication backed by hardware-protected keys. That is what an HSM is for.

The lesson is older than that. The 2015 remote Jeep Cherokee compromise pivoted onto the CAN bus and triggered a recall of 1.4 million vehicles, widely described as the industry's first cybersecurity recall. Authentication on the bus was the missing control then too.

2. What HSM testing usually means, and where it stops

Most OEM and supplier verification plans focus on demonstrating intended behavior under expected conditions:

Functional crypto: known-answer vectors for AES-CMAC, ECC and RSA sign and verify, the random number generator, key derivation, and the key lifecycle of provisioning, update and isolated storage.
Secure boot: signature verification at each stage, anti-rollback, and the correct behavior when the image has been tampered with.
SecOC: a message authentication code (the AUTOSAR SecOC authenticator), truncated to fit CAN payload limits, plus a freshness value. The truncated MAC length is configurable per PDU and is itself a security trade-off, for example 24 to 32 bits, trading forgery resistance against bus bandwidth. You test that a forged MAC is rejected, that a replayed frame is rejected, and that freshness desynchronization is handled.
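
The MAC-length trade-off can be put in numbers. The following is a minimal sketch, assuming an attacker who guesses truncated MACs blindly and a receiver that does not throttle failed verifications. The injection rate of 1000 frames per second is a hypothetical figure chosen for illustration, not a measured one.

```python
def forgery_probability(mac_bits: int) -> float:
    """Chance that one blind guess matches a truncated MAC of mac_bits bits."""
    return 2.0 ** -mac_bits


def expected_guesses(mac_bits: int) -> float:
    """Mean number of independent blind guesses until one is accepted (1/p)."""
    return 2.0 ** mac_bits


def hours_to_expected_forgery(mac_bits: int, frames_per_second: float) -> float:
    """Time to reach the expected guess count at a given injection rate."""
    return expected_guesses(mac_bits) / frames_per_second / 3600.0


if __name__ == "__main__":
    ASSUMED_RATE = 1000.0  # hypothetical frames per second the attacker can inject
    for bits in (24, 28, 32):
        print(bits, forgery_probability(bits),
              hours_to_expected_forgery(bits, ASSUMED_RATE))
```

Under those assumptions a 24-bit authenticator yields an expected forgery after roughly 4.7 hours of injection, and every extra bit doubles that figure, which is why the length belongs in the threat analysis rather than in a default configuration.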

One detail decides whether SecOC holds in the field. Freshness is a monotonic value maintained by the Freshness Value Manager, but only its low-order bits travel on the bus; the receiver reconstructs the full value, and the MAC is computed over the Data ID, the payload and the full freshness value. That is why freshness desynchronization, where sender and receiver counters drift apart, is the real-world failure mode you have to test, not just a clean replay.
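
The reconstruction step can be sketched as follows. This is an illustration under stated assumptions, not the AUTOSAR reference algorithm: the counter widths, the 24-bit truncation, and the names `compute_mac`, `reconstruct`, `Receiver` and `send` are all hypothetical, and HMAC-SHA256 stands in for AES-CMAC so the example needs only the standard library.

```python
import hashlib
import hmac

FULL_BITS = 32   # hypothetical width of the full freshness counter
TX_BITS = 8      # hypothetical number of low-order bits sent on the bus
MAC_BYTES = 3    # 24-bit truncated authenticator


def compute_mac(key: bytes, data_id: int, payload: bytes, freshness: int) -> bytes:
    # Stand-in primitive: HMAC-SHA256 instead of AES-CMAC (assumption).
    msg = data_id.to_bytes(2, "big") + payload + freshness.to_bytes(FULL_BITS // 8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:MAC_BYTES]


def reconstruct(last_accepted: int, rx_low: int) -> int:
    """Smallest full value above last_accepted whose low TX_BITS equal rx_low."""
    mask = (1 << TX_BITS) - 1
    candidate = (last_accepted & ~mask) | rx_low
    if candidate <= last_accepted:
        candidate += 1 << TX_BITS   # low bits wrapped: move to the next window
    return candidate


class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.last = 0               # last accepted full freshness value

    def accept(self, data_id: int, payload: bytes, rx_low: int, rx_mac: bytes) -> bool:
        full = reconstruct(self.last, rx_low)
        if full >= 1 << FULL_BITS:
            return False            # counter exhausted
        expected = compute_mac(self.key, data_id, payload, full)
        if not hmac.compare_digest(expected, rx_mac):
            return False            # forged, replayed, or desynchronized
        self.last = full            # freshness strictly increases on acceptance
        return True


def send(key: bytes, data_id: int, payload: bytes, counter: int):
    """Return what travels on the bus: truncated freshness and truncated MAC."""
    return counter & ((1 << TX_BITS) - 1), compute_mac(key, data_id, payload, counter)
```

With these parameters a sender that runs more than 256 counts ahead of the receiver produces frames whose MAC is genuine but which the receiver rejects, because it reconstructs a different full value. That is the desynchronization case a clean replay test never exercises.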

All of that is necessary. None of it is sufficient. Those tests can pass while the key is still extractable, because functional correctness says nothing about what an attacker can do to the hardware.

3. What you have to satisfy before an HSM earns its keep

Dropping an HSM into a design does not make it secure. It is justified only if a set of requirements is satisfied first, and each one is something you have to verify, not assume:

A threat analysis that sets the bar. ISO/SAE 21434 defines threat analysis and risk assessment (TARA) in Clause 15, and the HSM class you choose has to match the threat that analysis identifies. SHE provides a fixed AES-128 block (CMAC and CBC) with a small number of key slots and secure boot, but no asymmetric engine. EVITA Light is symmetric-only, for sensor and actuator nodes. EVITA Medium adds an internal CPU and internal-only asymmetric cryptography for ECU-to-ECU authentication. EVITA Full adds hardware-accelerated asymmetric cryptography (ECC or RSA) and is sized for gateway and V2X. SHE and the EVITA levels are broadly aligned rather than identical. Choosing Medium where the threat model needs Full's external-interface crypto is a class-mismatch finding; choosing far above the threat is cost you cannot justify.
Keys that stay inside the boundary. Long-term secrets are generated, stored and used inside the HSM, with any exposure outside that boundary minimized and justified. The RAV4 case is what "in software" costs.
A complete secure boot chain. Every stage authenticates the next, with no gap where an unsigned image slips in, and anti-rollback anchored in a hardware monotonic counter (fuse- or OTP-backed) inside the HSM, not a value the application core can rewrite. Otherwise the rollback check is itself in the bypassable surface.
A SecOC configuration that actually closes the gap. Which messages are protected, the MAC length, the freshness scheme, and what happens to unprotected traffic sharing the same bus.
A performance budget that holds. Authentication adds latency and load, and a steering or braking message has a deadline. If the secure path cannot meet it under worst-case load, the design fails for a different reason.
Independent assurance appropriate to the risk claim. Where required by the product or market, evidence may include FIPS 140-3 for the module, Common Criteria for physical attack resistance, SESIP for a faster composable evaluation of the root of trust, or equivalent. Many production deployments carry none of these; the point is to match the evidence to the claim.
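
The anti-rollback requirement in the list above can be made concrete with a small model. This is a sketch under assumptions: `MonotonicCounter` and `boot_decision` are hypothetical names, the signature result is passed in as a boolean rather than computed, and advancing the counter at boot is one possible policy among several.

```python
class MonotonicCounter:
    """Models a fuse- or OTP-backed counter: it can only move forward."""

    def __init__(self, value: int = 0):
        self._value = value

    @property
    def value(self) -> int:
        return self._value

    def advance_to(self, new_value: int) -> None:
        if new_value < self._value:
            raise ValueError("monotonic counter cannot decrease")
        self._value = new_value


def boot_decision(image_version: int, signature_ok: bool,
                  counter: MonotonicCounter) -> str:
    """Decide whether to boot; the counter is only touched after both checks pass."""
    if not signature_ok:
        return "reject: bad signature"
    if image_version < counter.value:
        return "reject: rollback"
    counter.advance_to(image_version)   # assumed policy: commit version at boot
    return "boot"
```

The point of the model is the missing setter: there is no code path that lowers the counter. If the minimum version lived in ordinary flash that the application core can rewrite, the same `boot_decision` logic would pass every functional test and still be bypassable.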

Each of these is a claim you will defend in front of an assessor, which is exactly where testing alone starts to run out.

4. The tests that actually mirror the attacker

Fault injection, through voltage or clock glitching, EM or laser: skip an instruction, corrupt a comparison, bypass a secure boot check or a read-out protection. This is the RAV4 technique.
Side-channel analysis, power and EM: recover a key by correlating physical leakage of intermediate values during a crypto operation, for example CPA or DPA on Hamming-weight leakage, with no logical flaw at all. It is defeated only by masking and hiding countermeasures that themselves need verification.
Negative testing and fuzzing of the HSM command interface and the diagnostic interface: malformed lengths, illegal key handles, illegal state transitions, boundary values. Include debug and manufacturing interfaces left accessible in production configurations.
Penetration testing: an end-to-end adversarial assessment against the real ECU, in the car.
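
Negative testing of a command interface can be sketched as follows. Everything about the interface here is hypothetical: the frame layout, the command codes, the eight key slots and the 64-byte limit are assumptions made for illustration, and `parse_command` is a stand-in for whatever sits behind the real HSM mailbox.

```python
import random
import struct

# Hypothetical frame: [cmd:1][key_handle:1][length:2, big-endian][data]
VALID_CMDS = {0x01, 0x02, 0x03}   # hypothetical command codes
KEY_SLOTS = 8                     # hypothetical number of key handles
MAX_DATA = 64                     # hypothetical payload limit in bytes


class Rejected(Exception):
    """The only acceptable way for the parser to refuse a frame."""


def parse_command(raw: bytes):
    if len(raw) < 4:
        raise Rejected("short header")
    cmd, handle, length = struct.unpack(">BBH", raw[:4])
    if cmd not in VALID_CMDS:
        raise Rejected("illegal command")
    if handle >= KEY_SLOTS:
        raise Rejected("illegal key handle")
    if length > MAX_DATA:
        raise Rejected("length over limit")
    if length != len(raw) - 4:
        raise Rejected("declared length does not match data")
    return cmd, handle, raw[4:]


def make_frame(cmd: int, handle: int, length: int, data_len: int) -> bytes:
    return struct.pack(">BBH", cmd, handle, length) + bytes(data_len)


def directed_cases():
    """Boundary and malformed frames that must all be refused."""
    yield "truncated header", make_frame(0x01, 0, 4, 4)[:3]
    yield "illegal command", make_frame(0xFF, 0, 4, 4)
    yield "handle one past last slot", make_frame(0x01, KEY_SLOTS, 4, 4)
    yield "declared length above actual", make_frame(0x01, 0, 5, 4)
    yield "declared length below actual", make_frame(0x01, 0, 3, 4)
    yield "length one past limit", make_frame(0x01, 0, MAX_DATA + 1, MAX_DATA + 1)
    yield "maximum 16-bit length, no data", make_frame(0x01, 0, 0xFFFF, 0)


def well_formed(raw: bytes, parsed) -> bool:
    cmd, handle, data = parsed
    return (cmd in VALID_CMDS and handle < KEY_SLOTS
            and len(data) <= MAX_DATA and len(data) == len(raw) - 4)


def run_negative_tests(parser, rounds: int = 2000, seed: int = 0):
    rng = random.Random(seed)       # fixed seed keeps findings reproducible
    cases = list(directed_cases())
    for i in range(rounds):
        raw = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 80)))
        cases.append((f"random #{i}", raw))
    findings = []
    for name, raw in cases:
        try:
            parsed = parser(raw)
        except Rejected:
            continue                # clean refusal is the expected outcome
        except Exception as exc:    # any other exception counts as a crash
            findings.append(f"{name}: crashed with {type(exc).__name__}")
            continue
        directed = not name.startswith("random")
        if directed or not well_formed(raw, parsed):
            findings.append(f"{name}: accepted")
    return findings
```

Calling `run_negative_tests(parse_command)` returns an empty list for this reference parser. The harness treats two things as findings: a malformed frame that is accepted, and a frame that makes the parser fail in any way other than a clean refusal. On real hardware the second category is where illegal state transitions tend to surface.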

In AUTOSAR the chain matters here. Requests flow through the Crypto Service Manager, then the Crypto Interface, then a Crypto Driver, and that driver dispatches either to a software implementation or to the HSM. SecOC sits above the Crypto Service Manager for bus authentication. The security question is which driver instance a given key is bound to: a key whose interface channel resolves to the software driver never gains hardware isolation, however correct the higher-level call looks. That is structurally the RAV4 failure.
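
The binding question lends itself to an automated configuration audit. The following is a minimal sketch, assuming the key-to-channel and channel-to-driver mappings have already been exported from the ECU configuration into plain Python data. The key names, channel names and the `"HSM"` / `"SW"` labels are hypothetical and do not correspond to any real AUTOSAR parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyBinding:
    key_name: str     # hypothetical key identifier
    channel: str      # hypothetical Crypto Interface channel name
    long_term: bool   # True for secrets that must stay inside the boundary


def audit_key_isolation(bindings, channel_to_driver):
    """Return (key, reason) for every long-term key that misses the HSM."""
    findings = []
    for b in bindings:
        driver = channel_to_driver.get(b.channel)
        if driver is None:
            findings.append((b.key_name, "channel does not resolve to any driver"))
        elif b.long_term and driver != "HSM":
            findings.append((b.key_name, f"long-term key resolves to {driver} driver"))
    return findings


# Hypothetical exported configuration
CHANNELS = {"Channel_Hsm": "HSM", "Channel_Sw": "SW"}
BINDINGS = [
    KeyBinding("SecOC_BrakeCmd_Key", "Channel_Hsm", long_term=True),
    KeyBinding("SecOC_Steering_Key", "Channel_Sw", long_term=True),
    KeyBinding("Session_Key", "Channel_Sw", long_term=False),
]

if __name__ == "__main__":
    for key, reason in audit_key_isolation(BINDINGS, CHANNELS):
        print(f"FINDING {key}: {reason}")
```

Run against the sample data, the audit flags only `SecOC_Steering_Key`. A check like this is cheap enough to run on every configuration change, which matters because the binding is a configuration fact that no functional crypto test will reveal.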

Run only the functional suite and you are testing the lock by turning it with the right key. Run these and you are testing it the way the person who wants in will.

5. Where the standards point, and where they stay silent

ISO/SAE 21434 governs the process: TARA in Clause 15, concept and product development in Clauses 9 and 10, and verification and validation across the lifecycle. It recognizes secure boot and roots of trust as important technical controls. Crucially, its validation step requires demonstrating that the cybersecurity goals are met, names penetration testing among the methods for doing so, and feeds any weakness found back into vulnerability management. In other words, the standard already points to the adversarial testing argued for here. It is not a crypto test specification.

Regulation made the process non-optional, on a two-certificate structure. UNECE R155 requires a manufacturer Cyber Security Management System, and R156 a separate Software Update Management System. Each is independently certified and is a prerequisite for type approval. The CSMS applied to new vehicle types from July 2022, and from July 2024 it extends in effect to newly registered vehicles of categories in scope, subject to national grandfathering. The HSM is the hardware that makes the required technical controls enforceable.

AUTOSAR routes cryptography through a stack that can land on a software driver or a hardware one, with SecOC above it for bus authentication. The conformance regimes certify the module. But none of them, on their own, prove that your specific boot, your specific SecOC configuration and your specific key isolation hold on every path, including the path where a fault lands at the worst possible microsecond.

Test broadly. Prove the critical properties.

Testing samples inputs. An attacker searches the space that matters. A glitch at one exact microsecond, or a replay at one exact counter value, is a state your test campaign was never going to enumerate. So reframe the guarantees that matter most as invariants to prove, not cases to run:

A key never leaves the HSM boundary in plaintext, on any path.
Secure boot rejects an unsigned or rolled-back image, on every path.
SecOC accepts a frame only if the MAC verifies and the freshness strictly increases.

Frame each as a safety property, for example "the key never leaves in plaintext on any path": a property whose violation, if one exists, shows up on a finite execution prefix. Model a glitch as a non-deterministic adversarial transition in the state machine, so the model checker explores the off-nominal states a directed campaign cannot enumerate: instruction skip, corrupted compare, mid-boot reset. The proof then covers the fault classes you explicitly represented in the model, not every physical glitch in the universe, but precisely the ones a test plan keeps missing. This is the same discipline we apply at safety certification gates. It turns "we ran four hundred cases and they passed" into "we proved the property holds across the modeled state space, including under the modeled faults." Test for functional correctness. Prove the boundary.
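
The idea of a glitch as an adversarial transition can be shown with a toy explicit-state search. This is a sketch, not a production model checker: the boot flow is reduced to a sequence of identical signature checks, the fault model contains only instruction skip and corrupted compare, and the redundant second check is a hypothetical countermeasure used to illustrate the result.

```python
from collections import deque

HALT = -1   # terminal state: boot refused


def successors(state, checks):
    """state = (checks_passed, image_valid, glitches_left)."""
    step, valid, glitches = state
    if step == HALT or step == checks:   # halted, or image already running
        return []
    # Nominal transition: the check passes only for a valid image.
    result = [("check", (step + 1 if valid else HALT, valid, glitches))]
    if glitches > 0:
        # Adversarial, non-deterministic transitions: every one is explored.
        result.append(("glitch: instruction skip",
                       (step + 1, valid, glitches - 1)))
        result.append(("glitch: corrupted compare",
                       (HALT if valid else step + 1, valid, glitches - 1)))
    return result


def violates(state, checks):
    """Safety property: an invalid image never reaches the running state."""
    step, valid, _ = state
    return step == checks and not valid


def find_counterexample(checks, glitch_budget):
    """Breadth-first search; returns the shortest violating trace or None."""
    parent = {}
    frontier = deque()
    for valid in (True, False):
        init = (0, valid, glitch_budget)
        parent[init] = None
        frontier.append(init)
    while frontier:
        state = frontier.popleft()
        if violates(state, checks):
            trace = []
            while parent[state] is not None:
                label, state = parent[state]
                trace.append(label)
            return trace[::-1]
        for label, nxt in successors(state, checks):
            if nxt not in parent:
                parent[nxt] = (label, state)
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    print(find_counterexample(checks=1, glitch_budget=1))
    print(find_counterexample(checks=2, glitch_budget=1))
    print(find_counterexample(checks=2, glitch_budget=2))
```

With a single check and a budget of one glitch, the search returns a one-step trace that boots an invalid image. With two checks and the same budget it returns `None`, meaning the property holds over the whole modeled state space. Raising the budget to two glitches breaks it again, which is the honest limit of the claim: the result is exactly as strong as the fault model written into `successors`.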

6. How this changes the way you sign off

A passing HSM functional suite is the floor, not the ceiling. Add the adversary's tests (glitch, side-channel and negative), and for the properties that matter most, add a proof instead of a sample. Treat key isolation, boot integrity and message authenticity as release criteria, not just test results. Decide deliberately what runs in software and what runs behind the boundary, because the RAV4 case is what the software choice can cost.

Be honest: on your last program, did HSM sign-off rest on the cases you tested, or on a property you proved could not be bypassed?

Author: Adrian Valea, Founder & Managing Director, SafetyTrust Software Technology GmbH. ASPICE Provisional Assessor (intacs / VDA), Automotive SPICE for Cybersecurity (intacs), Functional Safety Engineer (TÜV Rheinland), Automotive Cybersecurity (TÜV NORD). Published 2026-06-09.
