ECU Penetration Testing: A Threat-Led Methodology for Automotive Control Units

1. A penetration test is not a vulnerability scan

The first thing we say to a new program is also the thing nobody wants to hear: the vulnerability scan you already paid for did not test the attacks that matter. A scanner is tool-led. It walks a catalogue (open services, known CVEs, default credentials, weak TLS ciphers) and reports deviations from it. That is genuinely useful for the IT-style attack surface of a connectivity gateway or an infotainment Linux build, and we run those scans too. But an automotive control unit is not a web server, and the attacks that compromise it almost never appear in a CVE feed.

Consider a seed-key scheme in UDS SecurityAccess where the “random” seed is derived from a free-running timer that resets on every wake-up. No scanner flags it: the service responds correctly, the algorithm is “proprietary”, and the firmware has no public CVE. Yet an attacker who records a handful of seed/key pairs and notices that the seed never strays far from zero after a power cycle can reach the programming session in minutes. That is the interesting attack, and it is invisible to a tool that does not reason about the system as an adversary would.
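To make the timer-derived seed concrete, here is a minimal sketch of the check that exposes it. It assumes a hypothetical set of 32-bit seeds, each recorded from the first requestSeed response after a power cycle. The helper names, the sample values and the 8-bit margin are illustrative assumptions, not part of any standard or tool.

```python
def seed_spread_bits(seeds):
    """Number of bits needed to cover the observed range of seed values."""
    span = max(seeds) - min(seeds)
    return span.bit_length()


def looks_timer_derived(seeds, seed_width_bits=32, margin_bits=8):
    """Flag seeds whose observed range is far narrower than the nominal width.

    A narrow range after wake-up is what a free-running timer that restarts
    at zero would produce. The margin is an arbitrary illustrative threshold.
    """
    return seed_spread_bits(seeds) <= seed_width_bits - margin_bits


# Hypothetical captures: first seed requested after each of six power cycles.
recorded = [0x00001A2F, 0x000019F0, 0x00001B04, 0x00001A77, 0x000018E9, 0x00001AC1]

if __name__ == "__main__":
    print("observed spread (bits):", seed_spread_bits(recorded))
    print("suspicious:", looks_timer_derived(recorded))
```

A check like this can only flag the obvious failure. A handful of samples that pass it says nothing about whether the seed source is cryptographically strong.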

Penetration testing is threat-led. We start from what an attacker wants (start the engine without a key, flash unsigned firmware, forge a torque request on the bus, exfiltrate a long-term key) and we work backwards to the concrete steps that would achieve it on this ECU, with its real connectors, its real diagnostic stack, its real boot chain. In our experience roughly half of the highest-severity findings we report would never surface in an automated scan, because they live in design assumptions, not in software versions.

2. Scope comes from the TARA (ISO/SAE 21434 Clause 15)

Threat-led only means something if the threats are written down, and in the automotive world they are: ISO/SAE 21434 Clause 15 defines the Threat Analysis and Risk Assessment. A good TARA is the single best test plan a pentester can be handed, and a weak one is the first finding we report.

The Clause 15 chain gives us exactly what we need to scope effort. Asset identification tells us what has cybersecurity properties worth protecting: a calibration table’s integrity, an immobiliser secret’s confidentiality, a safety actuator command’s authenticity. Threat scenarios and damage scenarios connect each asset to what goes wrong if its security property is violated, and how badly. Attack path analysis enumerates the routes an attacker could take. Attack feasibility rating then scores how realistic each path is, whether you use the attack-potential approach (elapsed time, specialist expertise, knowledge of the item, window of opportunity, equipment) or a CVSS-based or attack-vector-based method.
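The attack-potential approach is easier to follow as arithmetic: each of the five factors contributes a number, and the sum is mapped to a feasibility level. The sketch below shows that mechanism. The weights and thresholds are illustrative assumptions modelled on the informative attack-potential table, so confirm every number against your own copy of ISO/SAE 21434 before using it in a TARA.

```python
# Illustrative weights only; verify against the standard before relying on them.
ELAPSED_TIME = {"<=1 day": 0, "<=1 week": 1, "<=1 month": 4, "<=6 months": 17, ">6 months": 19}
EXPERTISE = {"layman": 0, "proficient": 3, "expert": 6, "multiple experts": 8}
KNOWLEDGE = {"public": 0, "restricted": 3, "confidential": 7, "strictly confidential": 11}
WINDOW = {"unlimited": 0, "easy": 1, "moderate": 4, "difficult": 10}
EQUIPMENT = {"standard": 0, "specialized": 4, "bespoke": 7, "multiple bespoke": 9}


def attack_potential(elapsed, expertise, knowledge, window, equipment):
    """Sum the five factor scores; a higher total means a harder attack."""
    return (ELAPSED_TIME[elapsed] + EXPERTISE[expertise] + KNOWLEDGE[knowledge]
            + WINDOW[window] + EQUIPMENT[equipment])


def feasibility(potential):
    """Map the summed attack potential to a feasibility level."""
    if potential <= 13:
        return "high"
    if potential <= 19:
        return "medium"
    if potential <= 24:
        return "low"
    return "very low"


if __name__ == "__main__":
    # Hypothetical path: attacker at the diagnostic connector with standard tools.
    score = attack_potential("<=1 week", "proficient", "restricted", "easy", "standard")
    print(score, feasibility(score))
```

Because the total is a plain sum, one optimistic factor (for example, assuming "confidential" knowledge that has in fact leaked) can move a path across a threshold. Those are the assumptions a bench test can falsify.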

That feasibility rating is where the money goes. A path rated high feasibility against a high-impact asset gets the deepest, most adversarial testing we offer; we will sit on it for days. A path rated low feasibility (remote, exotic equipment, narrow window) still gets a sanity check, because feasibility ratings are estimates and part of our job is to falsify them. When our bench result contradicts the TARA (a path the analysis rated “low” that we walk in an afternoon), that is one of the most valuable things a pentest produces, because it feeds straight back into the risk treatment decision.

3. The real attack surface of a modern ECU

Before we touch a tool we map the physical and logical surface, because an ECU presents far more of it than the datasheet implies. We look at:

Debug and trace interfaces: JTAG, SWD, and vendor debug ports. We check whether they are truly fused or locked in production silicon or merely “disabled” in software, whether test points survive on the PCB, and whether a debug-authentication scheme can be downgraded or bypassed. An open debug port is game over for confidentiality.
UDS diagnostics over CAN, CAN-FD and Automotive Ethernet (DoIP, ISO 13400). The diagnostic stack is the richest legitimate interface into an ECU and therefore the richest attack surface: sessions, security access, routines, memory read/write, and data identifiers all live here.
The bootloader and secure flashing path: how the ECU receives, verifies and activates new software, including the primary boot loader, any secondary or programming boot loader, and the signature and version checks around them.
In-vehicle networks: the buses the ECU shares with others. We assess what a compromised or spoofed neighbour can inject, whether message authentication (SecOC) is present and enforced, and what happens at gateway boundaries.
Wireless and companion interfaces: where present, BLE, Wi-Fi, UWB, NFC and any companion-app or backend channel, plus the provisioning and pairing flows behind them.

The point of the map is not completeness for its own sake; it is to overlay the TARA attack paths onto real, physical entry points so that test effort lands where both the threat model and the hardware say it should.

4. Methodology: NIST SP 800-115, adapted to the bench

We run on the four phases of NIST SP 800-115 (Planning, Discovery, Attack and Reporting) because the structure is sound and auditors recognise it. What changes is the embedded automotive context that fills each phase.

In Planning we agree rules of engagement, the test object (sample ECUs, bench harness, vehicle, or HIL rig), and the credentials and documentation we receive (black-, grey- or white-box). Crucially, we also translate TARA threat scenarios into concrete abuse cases: “an attacker on the diagnostic connector reaches the programming session and flashes modified calibration” is testable in a way that “integrity threat to calibration data” is not.

In Discovery we enumerate the surface for real: which UDS services and sessions respond, which DIDs and routines exist, what the boot and memory map look like, what the debug ports admit. Grey- or white-box access (A2L and ODX descriptions, schematics, source) makes this faster and deeper, which is why we recommend it for anything bound for type approval.
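As an illustration of what "which UDS services respond" means on the wire, the sketch below classifies raw UDS responses. It relies on two ISO 14229 conventions: a positive response echoes the service ID plus 0x40, and a negative response is 0x7F, the service ID, then a negative response code. The `send` callable and the `fake_ecu` stub are hypothetical stand-ins for a real transport, and probing with a bare service ID is one simple enumeration approach, not the only one.

```python
NRC_NAMES = {
    0x11: "serviceNotSupported",
    0x12: "subFunctionNotSupported",
    0x13: "incorrectMessageLengthOrInvalidFormat",
    0x22: "conditionsNotCorrect",
    0x31: "requestOutOfRange",
    0x33: "securityAccessDenied",
    0x7E: "subFunctionNotSupportedInActiveSession",
    0x7F: "serviceNotSupportedInActiveSession",
}


def classify_response(request_sid, response):
    """Turn a raw UDS response into a short human-readable verdict."""
    if not response:
        return "no response"
    if response[0] == request_sid + 0x40:
        return "positive"
    if len(response) >= 3 and response[0] == 0x7F and response[1] == request_sid:
        nrc = response[2]
        return NRC_NAMES.get(nrc, f"NRC 0x{nrc:02X}")
    return "unexpected"


def enumerate_services(send, sids):
    """Probe each service ID with a bare request and record the verdict.

    `send` is a hypothetical transport callable: request bytes -> response bytes.
    """
    return {sid: classify_response(sid, send(bytes([sid]))) for sid in sids}


def fake_ecu(request):
    """Stub ECU for demonstration: three services exist, the rest do not."""
    supported = {0x10, 0x27, 0x3E}
    sid = request[0]
    nrc = 0x13 if sid in supported else 0x11
    return bytes([0x7F, sid, nrc])


if __name__ == "__main__":
    for sid, verdict in enumerate_services(fake_ecu, [0x10, 0x23, 0x27, 0x31]).items():
        print(f"0x{sid:02X}: {verdict}")
```

In this stub a length error means the service exists but the request was malformed, while serviceNotSupported means it is absent. On a real ECU the same distinction has to be repeated per session, because the set of available services changes with session and security level.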

In Attack we execute the abuse cases, chaining primitives the way a real adversary would: a downgrade here, a replayed message there, a brute-forced seed to unlock a routine that then exposes memory. In Reporting every finding ships with reproducible evidence and a remediation path. The phases are iterative, not linear: an attack-phase discovery routinely sends us back to enumerate more.

5. Diagnostic security testing: UDS

The UDS stack is where we spend a large share of bench time, because it is the attacker’s most convenient front door. Our work centres on a handful of services.

SecurityAccess (0x27) is the headline. We assess the full seed-key lifecycle. Entropy: is the seed genuinely unpredictable, or derived from a timer, counter or constant that an attacker can model? Brute-force resistance: is the key space large enough, and does the ECU enforce a delay and a hard attempt limit so that an exhaustive search is infeasible? Replay: does a key captured in one session unlock a later one, i.e. is the seed actually fresh per request? We have seen all three fail: 16-bit effective key spaces, no attempt counter so a script grinds through unhindered, and seeds that repeat after a power cycle. Good seed-key design uses a cryptographically strong seed, a key derived with a real algorithm and a per-ECU secret (not a fleet-wide constant), a strictly enforced delay-and-lockout on failed attempts that survives reset, and protection levels that map sensible privileges to sensible sessions.
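The brute-force question comes down to simple arithmetic: key space, time per attempt, and how often a lockout delay is incurred. The sketch below computes a worst-case exhaustive search time. The 10 ms per attempt, the three-attempt limit and the 10 s delay are hypothetical figures chosen for illustration, not measurements from any ECU.

```python
def exhaustive_search_seconds(key_bits, seconds_per_attempt,
                              attempts_before_lockout=None, lockout_seconds=0.0):
    """Worst-case time to try every key (the average case is about half)."""
    attempts = 2 ** key_bits
    total = attempts * seconds_per_attempt
    if attempts_before_lockout:
        # One delay is paid each time the attempt limit is reached.
        lockouts = attempts // attempts_before_lockout
        total += lockouts * lockout_seconds
    return total


if __name__ == "__main__":
    no_lockout = exhaustive_search_seconds(16, 0.01)
    with_lockout = exhaustive_search_seconds(16, 0.01, attempts_before_lockout=3,
                                             lockout_seconds=10.0)
    wide_key = exhaustive_search_seconds(32, 0.01, attempts_before_lockout=3,
                                         lockout_seconds=10.0)
    print(f"16-bit, no lockout:   {no_lockout / 60:.1f} minutes")
    print(f"16-bit, with lockout: {with_lockout / 86400:.1f} days")
    print(f"32-bit, with lockout: {wide_key / (365.25 * 86400):.0f} years")
```

Under these assumed figures a 16-bit key with no attempt counter falls in about eleven minutes. The same key with an enforced delay takes about two and a half days, which is still within reach of a patient attacker with bench access. Only a wider key space combined with a lockout that survives reset pushes the search out of practical range.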

Authentication (0x29) is the certificate and challenge-response service introduced for stronger diagnostic authentication. We test for correct certificate-chain validation, challenge freshness, and whether it can be skipped in favour of a weaker legacy 0x27 path.

DiagnosticSessionControl (0x10) is the gatekeeper to extended and programming sessions. We check whether those privileged sessions can be entered without the security access they should require, whether session state is enforced consistently, and whether timeouts and S3 keep-alive handling can be abused to hold a privileged session open.

RoutineControl (0x31) exposes the ECU’s internal actions: erase, self-test, calibration writes, key operations. We hunt for routines that perform sensitive operations behind insufficient session or security gating, accept unchecked parameters, or can be triggered in an unsafe vehicle state.

6. Secure boot & SecOC

If diagnostics are the front door, the boot chain is the foundation, and SecOC is the integrity of everything spoken on the bus. Both deserve adversarial attention.

For secure boot we verify the chain of trust end to end: does the immutable root actually measure and authenticate the next stage, does each stage authenticate the next, and is the signature check real rather than a return code that can be coerced? The most common substantive gaps we find are around versioning. Downgrade and anti-rollback weaknesses let an attacker install an older, validly signed image whose vulnerabilities have since been patched: the signature checks out, but version monotonicity is not enforced, so the fix is undone. We confirm that rollback protection exists, is enforced before activation, and cannot be reset.
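The rollback gap is easiest to see as two separate checks that must both pass before activation. The sketch below models that logic under simplifying assumptions. The `Image` type is hypothetical, the signature check is assumed to have been performed elsewhere by real cryptography, and `stored_min_version` stands for a monotonic counter held in protected storage.

```python
from dataclasses import dataclass


@dataclass
class Image:
    version: int
    signature_valid: bool  # outcome of the real signature check, done elsewhere


def may_activate(image, stored_min_version):
    """Return (allowed, reason). Authenticity and freshness are separate gates."""
    if not image.signature_valid:
        return False, "signature invalid"
    if image.version < stored_min_version:
        # Validly signed but older than the rollback counter: reject.
        return False, "rollback rejected"
    return True, "ok"


def next_min_version(image, stored_min_version):
    """The rollback counter only ever moves forward."""
    return max(stored_min_version, image.version)


if __name__ == "__main__":
    counter = 5
    for candidate in (Image(3, True), Image(5, True), Image(7, False), Image(7, True)):
        allowed, reason = may_activate(candidate, counter)
        print(candidate, allowed, reason)
        if allowed:
            counter = next_min_version(candidate, counter)
```

A bootloader that implements only the first gate accepts `Image(3, True)` and reintroduces whatever was fixed between versions 3 and 5. On the bench we also ask where the counter is stored and whether any reset, erase routine or reprogramming step can move it backwards.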

For SecOC (Secure Onboard Communication, the AUTOSAR mechanism) we look at freshness and key handling. Replay resistance depends on a freshness value that both sides agree on; we test what happens when freshness is desynchronised, when a message is replayed inside the verification window, and whether truncated MACs leave enough strength. We also examine how SecOC keys are stored and distributed, because a perfect protocol with a recoverable key protects nothing.
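The interplay of freshness window and MAC truncation can be sketched in a few lines. This is a simplified model and not the AUTOSAR SecOC algorithm. HMAC-SHA-256 from the Python standard library stands in for the CMAC a real stack would typically use, the full freshness value is passed explicitly instead of being reconstructed from truncated bits, and the class name, window size and 24-bit MAC length are assumptions for illustration.

```python
import hashlib
import hmac


def truncated_mac(key, data_id, payload, freshness, mac_bits=24):
    """MAC over data ID, payload and freshness, truncated to mac_bits.

    HMAC-SHA-256 is a stand-in here; it is not what SecOC specifies.
    """
    msg = data_id.to_bytes(2, "big") + payload + freshness.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[: mac_bits // 8]


class Receiver:
    """Accepts a message only if it is newer, inside the window, and authentic."""

    def __init__(self, key, last_freshness=0, window=16, mac_bits=24):
        self.key = key
        self.last_freshness = last_freshness
        self.window = window
        self.mac_bits = mac_bits

    def accept(self, data_id, payload, freshness, mac):
        in_window = self.last_freshness < freshness <= self.last_freshness + self.window
        if not in_window:
            return False  # replayed, stale, or too far ahead (desynchronised)
        expected = truncated_mac(self.key, data_id, payload, freshness, self.mac_bits)
        if not hmac.compare_digest(expected, mac):
            return False
        self.last_freshness = freshness
        return True


def blind_forgery_probability(mac_bits):
    """Chance that one random guess matches a truncated MAC."""
    return 2.0 ** -mac_bits


if __name__ == "__main__":
    key = bytes(16)  # all-zero demo key, for illustration only
    rx = Receiver(key)
    tag = truncated_mac(key, 0x123, b"\x01\x02", 1)
    print(rx.accept(0x123, b"\x01\x02", 1, tag))  # first delivery
    print(rx.accept(0x123, b"\x01\x02", 1, tag))  # same frame replayed
    print(blind_forgery_probability(24))
```

The model makes the trade-off visible. A wider acceptance window tolerates lost frames but widens the range in which a desynchronised sender stays accepted, and every bit removed from the MAC doubles the chance that a guessed tag passes, which matters on a bus where an attacker can send many frames per second.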

One class of attack belongs here at a conceptual level: fault injection and glitching, that is, perturbing voltage, clock or the electromagnetic environment to make a processor skip an instruction such as a signature comparison. We assess whether a target is exposed to this class and whether countermeasures (redundant checks, randomised timing, sensors) are present. We do not publish glitch parameters or recipes; the deliverable is whether the protection holds, not a how-to.

7. Key management & HSM/SHE

Every protection above (secure boot, SecOC, diagnostic authentication, immobilisation) ultimately rests on keys, so we follow the keys. The questions are deliberately about design and exposure, not about extracting secrets for their own sake.

Where do the keys live? A hardware security module (HSM) or SHE (Secure Hardware Extension) exists precisely so that key material never appears in addressable application memory. We check that this is actually the case: that signing and verification happen inside the secure peripheral, that keys are not staged in RAM or logged in trace, and that the boundary between application core and secure subsystem is respected.

How are keys provisioned? Provisioning is a recurring weak point: fleet-wide shared keys that turn one recovered device into a master key for all of them; debug or test keys that survive into production; predictable key derivation from public identifiers. We examine the provisioning and personalisation flow for these patterns.
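The difference between a diversified per-ECU key and a key derived from a public identifier can be shown side by side. The sketch below is illustrative only. HMAC-SHA-256 is used as a generic stand-in for whatever KDF a project actually specifies, the function names and the label are hypothetical, and in a real flow the master secret would stay inside an HSM and never reach a Python process.

```python
import hashlib
import hmac


def derive_ecu_key(master_secret, ecu_serial, label=b"diag-seedkey"):
    """Diversified key: depends on a secret plus the ECU's own identifier.

    Recovering one ECU's key does not reveal the key of any other ECU,
    as long as the master secret itself stays protected.
    """
    return hmac.new(master_secret, label + b"\x00" + ecu_serial, hashlib.sha256).digest()[:16]


def weak_key_from_public_id(vin):
    """Anti-pattern: no secret input, so anyone who knows the scheme and
    can read the VIN can recompute the key."""
    return hashlib.sha256(vin).digest()[:16]


if __name__ == "__main__":
    master = bytes(range(32))  # demo value only
    key_a = derive_ecu_key(master, b"ECU0001")
    key_b = derive_ecu_key(master, b"ECU0002")
    print("per-ECU keys differ:", key_a != key_b)
    print("public-ID key is recomputable:",
          weak_key_from_public_id(b"DEMO-VIN") == weak_key_from_public_id(b"DEMO-VIN"))
```

In an assessment the question is which of these two shapes the provisioning flow actually has, and whether the "secret" input is truly secret or a constant that ships in every firmware image.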

What does the key hierarchy assume? We map which key protects which asset, whether compromise of one key is contained or cascades, and whether key update and revocation are even possible in the field. Throughout, our framing is verification: we demonstrate that a protection is sound or that an assumption is false, without weaponising recovered material.

8. From finding to fix

A finding nobody can act on is wasted bench time, so every issue we raise is engineered to be fixed. Three things make that possible.

Reproducible evidence. Each finding ships with the exact steps, the bus traces or captures, the tooling state and the conditions needed to reproduce it. A development team should be able to re-run our result without us in the room, and so should the retest.

Severity that means something in two languages. We rate each finding with CVSS for a familiar cross-industry number, and alongside it with ISO/SAE 21434 attack feasibility so the result speaks the language of the cybersecurity case. A finding can be CVSS-high but feasibility-low, or the reverse; reporting both stops a single number from mis-ranking the work.
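A small sketch shows why two ratings are kept side by side. The CVSS bands follow the published v3.x qualitative scale. The rule for when the two ratings "diverge" is our own illustrative assumption, not something either standard defines.

```python
def cvss_band(score):
    """Qualitative severity band for a CVSS v3.x base score."""
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def ratings_diverge(cvss_score, feasibility):
    """Flag findings where one number alone would mis-rank the work.

    `feasibility` is one of "very low", "low", "medium", "high".
    The divergence rule is an illustrative assumption.
    """
    cvss_high = cvss_score >= 7.0
    cvss_low = cvss_score < 4.0
    feas_low = feasibility in ("very low", "low")
    feas_high = feasibility == "high"
    return (cvss_high and feas_low) or (cvss_low and feas_high)


if __name__ == "__main__":
    # Hypothetical findings: (title, CVSS base score, attack feasibility).
    findings = [
        ("Debug port reachable only after decapsulation", 8.1, "very low"),
        ("Seed repeats after power cycle", 7.5, "high"),
        ("Verbose DID readable in default session", 3.1, "high"),
    ]
    for title, score, feas in findings:
        print(f"{title}: CVSS {cvss_band(score)}, feasibility {feas}, "
              f"diverges={ratings_diverge(score, feas)}")
```

Findings flagged as diverging are the ones that need a sentence of explanation in the report, because a reader who sees only one of the two numbers will draw the wrong priority.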

Root cause, not just symptom. Because we also build these stacks (bootloaders, diagnostic and security software, AUTOSAR integration), our findings tend to point at the underlying cause rather than the surface. “0x27 is brute-forceable” is a symptom; “the delay timer is reset by the same wake-up that reseeds, so lockout never accrues” is a root cause a developer can fix once. We then prioritise remediation by risk, and we retest: a fix that is not retested is a hope, not a result. If you want the engineering side of that loop, see our software engineering and functional safety work, since security and safety failures increasingly share root causes.

9. How pentest evidence feeds UNECE R155

None of this is testing for its own sake. Under UNECE R155, type approval requires a Cybersecurity Management System and, per vehicle type, evidence that the identified risks have been managed and that the implemented controls have been verified, not merely designed. Penetration testing is one of the strongest forms of that verification, because it demonstrates rather than asserts.

Concretely, the pentest closes the loop opened by the TARA. The TARA claimed certain attack paths were treated to an acceptable feasibility; the pentest provides independent evidence that the controls actually hold, or names exactly where they do not, so the gap can be closed before approval rather than discovered after a recall. The findings, severities, remediation and retest results become traceable artefacts in the cybersecurity case and the type-approval evidence file, mapped back to the very damage and threat scenarios that scoped them. That traceability (threat scenario to attack path to test to evidence to fix) is what turns a pile of test reports into an approval-grade argument.
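The traceability chain can be represented as a record whose links are either present or missing. The sketch below is one possible shape for such a record. The field names, identifiers and the `traceability_gaps` helper are hypothetical and not taken from R155 or ISO/SAE 21434.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Finding:
    finding_id: str
    threat_scenario: str = ""   # TARA threat scenario identifier
    attack_path: str = ""       # TARA attack path identifier
    test_case: str = ""         # pentest abuse case that exercised the path
    evidence: List[str] = field(default_factory=list)  # traces, captures, logs
    fix_reference: str = ""     # change request or commit that remediates it
    retest_passed: Optional[bool] = None  # None means not yet retested


def traceability_gaps(finding):
    """List the links in the chain that are still missing for one finding."""
    gaps = []
    if not finding.threat_scenario:
        gaps.append("threat scenario")
    if not finding.attack_path:
        gaps.append("attack path")
    if not finding.test_case:
        gaps.append("test")
    if not finding.evidence:
        gaps.append("evidence")
    if not finding.fix_reference:
        gaps.append("fix")
    if finding.retest_passed is not True:
        gaps.append("retest")
    return gaps


if __name__ == "__main__":
    f = Finding("F-001", threat_scenario="TS-12", attack_path="AP-12.3",
                test_case="AC-07", evidence=["trace_0427.log"])
    print(f.finding_id, "open links:", traceability_gaps(f))
```

A finding with an empty gap list is one an assessor can follow end to end. Anything else is a visible to-do before the evidence file is submitted.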

This is also why we run the pentest threat-led from Clause 15 rather than as a detached exercise: evidence that maps to the risk assessment is evidence an assessor can follow, and evidence an assessor can follow is the difference between a clean type-approval submission and a round of findings. For how this fits the wider regulatory picture, see our cybersecurity services and the ECU penetration testing offering behind this method, alongside our companion piece on embedded cybersecurity under UNECE R155.

Want a 30-min walkthrough on your project?

No NDA needed. Tell us the standard, the item or asset, the assessor, and your deadline. Within 48 hours you’ll get a one-page diagnostic mapped to the points above. It is yours to keep, whether or not you hire us.


Author: Adrian Valea, Founder & Managing Director, SafetyTrust Software Technology GmbH. ASPICE Provisional Assessor (intacs / VDA), Automotive SPICE for Cybersecurity (intacs), Functional Safety Engineer (TÜV Rheinland), Automotive Cybersecurity (TÜV NORD). Published 2026-06-04.
