From TARA to Attack: Verifying Your Cybersecurity Mitigations Actually Hold

1. A TARA is a hypothesis, not evidence

Every threat analysis and risk assessment we read ends the same way: a table of risk treatment decisions. Reduce, share, retain, avoid. Against each “reduce” row sits a mitigation (secure boot, message authentication, access control, a hardened diagnostic gate) and an implicit promise that, once that mitigation exists, the residual risk is acceptable. That promise is the part nobody has tested yet.

A TARA is a structured argument built on assumptions. It assumes the attacker needs specialised equipment. It assumes a key is unrecoverable. It assumes secure boot rejects an unsigned image. ISO/SAE 21434 keeps these claims and their proof in separate places by design: the TARA methods in Clause 15 produce the threat scenarios, impact ratings and attack-feasibility ratings, and feed the cybersecurity goals defined in the concept phase (Clause 9); those goals become cybersecurity requirements in product development (Clause 10); and it is also within product development that you are obliged to verify the implementation against those requirements, before vehicle-level cybersecurity validation in Clause 11. The TARA is the hypothesis. The test campaign is the experiment.

In our experience the gap between the two is wider than teams expect. We routinely see a TARA that rates an attack as “Very Low” feasibility because it “requires chip-level access”, only for the same effect to be reachable over the diagnostic bus with a laptop because a debug interface was never fused off, or because a SecurityAccess seed turned out to be a 16-bit counter. The risk rating was honest; the assumption underneath it was wrong. Verification is the only mechanism that finds that out before the field does. On our benches roughly half of first-time submissions carry at least one finding that contradicts an assumption baked into the TARA.

So the question we ask of every TARA we are handed is blunt: where is the evidence? Not the design intent, not the requirement text, but the verdict from a test that tried to do the thing the mitigation is supposed to prevent. If that verdict does not exist, the risk treatment is an opinion.

2. Turning threat scenarios into test cases

The bridge from analysis to evidence is mechanical, and that is the point. Each threat scenario in the TARA, expressed as a damage scenario realised through one or more attack paths, becomes one or more abuse cases: a test whose objective is to achieve the threat, not to confirm the feature works.

A functional test asks “does authenticated diagnostics succeed with the right credentials?” An abuse case asks “can I reach the protected service without them?” Same interface, inverted intent. We derive the abuse case directly from the attack path the TARA already documented, so the trace is unbroken: threat scenario → attack path → abuse case → verdict.

Each abuse case needs explicit, falsifiable pass/fail criteria written before testing begins; otherwise “we tried for two days and didn’t get in” masquerades as assurance. We define them as concretely as the mitigation allows. For example:

Spoofing of a safety-relevant frame → criterion: a frame injected with a manipulated SecOC payload and/or stale freshness value is rejected by the receiver and raises the expected error handling; PASS = rejected on every one of N attempts.
Bypass of authenticated diagnostics → criterion: no sequence of requests transitions the ECU into the unlocked state without a valid seed-key exchange; PASS = locked state holds under all replay, fuzz and timing variants tried.
Execution of unauthorised firmware → criterion: an image with a broken or downgraded signature does not execute; PASS = boot halts or rolls back to a known-good image.

Writing the criterion forces a useful argument up front: what observable would actually prove the mitigation held? If the team cannot state it, the requirement behind it is probably untestable, which is itself a finding worth raising back into the cybersecurity requirements in Clause 10.
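One way to keep the criterion falsifiable is to record it as data before any test runs. The sketch below is a minimal illustration, not a prescribed format: the `AbuseCase` class, its field names and the identifiers `TS-012` and `AP-012-03` are hypothetical, and the attack itself is stubbed out as a callable that reports whether the target rejected each attempt.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AbuseCase:
    """One abuse case, traced back to the TARA entry it tries to realise."""
    threat_scenario: str   # hypothetical TARA identifier
    attack_path: str       # hypothetical attack-path identifier
    criterion: str         # pass/fail criterion, written before testing
    attempts: int          # N, fixed up front
    results: list[bool] = field(default_factory=list)  # True = attack rejected

    def __post_init__(self) -> None:
        if self.attempts < 1:
            raise ValueError("attempts (N) must be fixed at 1 or more")

    def run(self, attempt: Callable[[int], bool]) -> None:
        """Run all N attempts; attempt(i) returns True if the target rejected it."""
        self.results = [attempt(i) for i in range(self.attempts)]

    def verdict(self) -> str:
        if len(self.results) < self.attempts:
            return "INCONCLUSIVE"  # an unfinished campaign is not a verdict
        return "PASS" if all(self.results) else "FAIL"


case = AbuseCase(
    threat_scenario="TS-012",
    attack_path="AP-012-03",
    criterion="forged SecOC frame is rejected on every one of N attempts",
    attempts=100,
)
print(case.verdict())            # INCONCLUSIVE
case.run(lambda i: i != 57)      # stub: attempt 57 got through
print(case.verdict())            # FAIL
```

The point of the `INCONCLUSIVE` state is that a campaign which stopped early cannot be reported as a pass.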

3. Attack-feasibility as a prioritisation tool

You cannot test everything to the same depth, and ISO/SAE 21434 already hands you the prioritisation rationale: the attack-feasibility rating. The standard offers several approaches; the one we use most often is the attack-potential-based approach (aligned with the ISO/IEC 18045 vulnerability-analysis tradition), which scores each attack path across five factors:

Elapsed time: how long the attack takes to identify and execute, from a day or less to many months.
Specialist expertise: layman, proficient, expert, or multiple experts.
Knowledge of the item or component: public information versus restricted, confidential or strictly confidential internal detail.
Window of opportunity: unlimited, easy, moderate or difficult access to the target, including how long that access must persist and whether it is remote or physical.
Equipment: standard, specialised, bespoke, or multiple bespoke tools.

The factors combine into an attack potential, which maps to a feasibility rating from High down to Very Low. Here is how we actually use it in a verification plan, and it is not the obvious way. The high-feasibility paths get the most depth because they are the cheap attacks an opportunistic adversary will find first, the laptop-on-the-diagnostic-bus class. But we also deliberately probe the assumptions that lowered a rating. If a path was rated Low only because it “requires specialised equipment”, the test that matters is whether commodity tooling has quietly made that equipment standard since the TARA was written. Feasibility ratings decay. Glitching rigs, logic analysers and software-defined radios that were exotic five years ago are now hobbyist-grade.

So feasibility steers effort two ways: depth where exploitation is easy, and scrutiny where the rating depends on an assumption that may no longer be true. A rating is a prediction; the pentest is the calibration.
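To make the decay argument concrete, the sketch below scores one attack path twice: once as the TARA rated it, and once after a test showed that standard equipment suffices. The numeric weights and thresholds are illustrative assumptions modelled on the standard's informative attack-potential example; check them against your own copy of ISO/SAE 21434 and your organisation's rating scheme before relying on them. The level names used as dictionary keys are likewise our shorthand.

```python
# ASSUMPTION: illustrative weights and thresholds, not normative values.
WEIGHTS = {
    "elapsed_time": {"<=1 day": 0, "<=1 week": 1, "<=1 month": 4,
                     "<=6 months": 17, ">6 months": 19},
    "expertise": {"layman": 0, "proficient": 3, "expert": 6,
                  "multiple experts": 8},
    "knowledge": {"public": 0, "restricted": 3, "confidential": 7,
                  "strictly confidential": 11},
    "window": {"unlimited": 0, "easy": 1, "moderate": 4, "difficult": 10},
    "equipment": {"standard": 0, "specialised": 4, "bespoke": 7,
                  "multiple bespoke": 9},
}
# Upper bound of attack potential for each rating; anything above is Very Low.
THRESHOLDS = [(13, "High"), (19, "Medium"), (24, "Low")]


def attack_potential(path: dict[str, str]) -> int:
    """Sum the weight of the chosen level for each of the five factors."""
    return sum(WEIGHTS[factor][path[factor]] for factor in WEIGHTS)


def feasibility(path: dict[str, str]) -> str:
    """Map an attack potential to a feasibility rating."""
    total = attack_potential(path)
    for upper, rating in THRESHOLDS:
        if total <= upper:
            return rating
    return "Very Low"


as_rated = {"elapsed_time": "<=1 month", "expertise": "expert",
            "knowledge": "restricted", "window": "moderate",
            "equipment": "bespoke"}
# Same path after a test showed commodity tooling is enough.
as_tested = {**as_rated, "equipment": "standard"}

print(attack_potential(as_rated), feasibility(as_rated))    # 24 Low
print(attack_potential(as_tested), feasibility(as_tested))  # 17 Medium
```

A single factor changing is enough to move the rating, which is why the assumptions behind a low rating deserve their own abuse cases.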

4. Verifying each mitigation class with concrete criteria

Generic test plans produce generic findings. Each mitigation class fails in characteristic ways, and the abuse cases have to target those specifics. This is the heart of an ECU penetration test, and it is the level our penetration-testing methodology works at.

Secure boot: can the chain of trust be bypassed or downgraded?

The claim is that only authentic, current firmware executes. We attack the claim, not the cryptography. Can an image with an invalid or truncated signature still boot? Is the root of trust immutable, or does it live somewhere reflashable? Is there a downgrade path: will the device accept an older, validly signed image that carries a known vulnerability, because anti-rollback counters are absent or resettable? Does a fault during verification fail open or fail closed? Is the debug/JTAG interface actually fused off in production silicon, or merely “disabled in software”? A chain of trust is only as strong as its weakest link, and in our experience the weak link is almost never the signature check itself; it is rollback protection or an unlocked debug port.
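The expected behaviour behind those questions can be written down as a small reference model that abuse-case results are compared against. The sketch below is only that: a model of the decision a bootloader is supposed to make, not bootloader code. `boot_decision`, `min_version` (standing in for an anti-rollback counter) and the injected `verify_signature` callable are hypothetical names.

```python
from typing import Callable


def boot_decision(
    image: bytes,
    image_version: int,
    min_version: int,
    verify_signature: Callable[[bytes], bool],
) -> str:
    """Return "BOOT" only for an authentic, non-downgraded image."""
    try:
        if not verify_signature(image):
            return "HALT"  # invalid or truncated signature
    except Exception:
        return "HALT"      # a fault during verification must fail closed
    if image_version < min_version:
        return "HALT"      # validly signed but older than the rollback floor
    return "BOOT"


def faulting_verifier(image: bytes) -> bool:
    raise RuntimeError("simulated fault during verification")


# Abuse-case expectations: every manipulated input must end in HALT.
print(boot_decision(b"img", 7, 7, lambda img: True))    # BOOT
print(boot_decision(b"img", 7, 7, lambda img: False))   # HALT
print(boot_decision(b"img", 6, 7, lambda img: True))    # HALT
print(boot_decision(b"img", 7, 7, faulting_verifier))   # HALT
```

Each row where the real target boots and the model says `HALT` is a finding.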

SecOC: freshness, replay and forgery

Secure Onboard Communication protects the authenticity and freshness of in-vehicle messages. We test three behaviours. Forgery: a frame with a forged or absent MAC must be rejected. Replay: a previously valid, captured frame replayed later must be rejected because its freshness value is stale; this is where implementations leak, through truncated freshness, predictable counters, or receivers that accept too wide a freshness window. Verification-failure handling: a sustained stream of invalid MACs must not degrade into a fail-open state or a denial of service on the bus. We pay particular attention to freshness management: a perfect MAC over a guessable counter is not replay protection.
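The forgery and replay checks can be illustrated with a toy receiver. This is a simplified model under stated assumptions, not an AUTOSAR SecOC implementation: it uses a truncated HMAC-SHA256 from the Python standard library as a stand-in for the MAC algorithm a real stack would use, it transmits the full freshness value, and the `backward_tolerance` parameter is a hypothetical knob that models a receiver accepting stale freshness values.

```python
import hashlib
import hmac

MAC_LEN = 8  # truncated MAC length in bytes (assumption)


def compute_mac(key: bytes, data_id: int, payload: bytes, freshness: int) -> bytes:
    """Truncated MAC over data ID, payload and freshness (HMAC as a stand-in)."""
    message = data_id.to_bytes(2, "big") + payload + freshness.to_bytes(8, "big")
    return hmac.new(key, message, hashlib.sha256).digest()[:MAC_LEN]


class Receiver:
    def __init__(self, key: bytes, backward_tolerance: int = 0) -> None:
        self.key = key
        self.backward_tolerance = backward_tolerance  # 0 = strictly newer only
        self.last_freshness = 0

    def accept(self, data_id: int, payload: bytes, freshness: int, mac: bytes) -> bool:
        if freshness < 0:
            return False
        # Stale freshness: reject unless it falls inside the tolerated window.
        if freshness <= self.last_freshness - self.backward_tolerance:
            return False
        expected = compute_mac(self.key, data_id, payload, freshness)
        if not hmac.compare_digest(expected, mac):
            return False  # forged or absent MAC
        self.last_freshness = max(self.last_freshness, freshness)
        return True


key = bytes(range(16))
frame = (0x123, b"\x01\x02", 1, compute_mac(key, 0x123, b"\x01\x02", 1))

strict = Receiver(key)
print(strict.accept(*frame))                              # True, first delivery
print(strict.accept(*frame))                              # False, replay rejected
print(strict.accept(0x123, b"\x01\x02", 2, bytes(MAC_LEN)))  # False, forged MAC

lenient = Receiver(key, backward_tolerance=5)
print(lenient.accept(*frame))                             # True
print(lenient.accept(*frame))                             # True, replay accepted
```

The lenient receiver verifies every MAC correctly and still accepts the replay, which is the failure mode the freshness tests are designed to expose.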

UDS SecurityAccess (0x27): entropy, brute-force and replay

This is the most common soft spot we find. The seed-key mechanism is only as strong as its seed entropy and its lockout policy. We assess seed entropy and predictability (is the “random” seed actually a counter, a timestamp, or a fixed value after reset?), we attempt seed-key replay (does a previously observed valid key still unlock the same seed?), we evaluate brute-force resistance against the key space and the delay and attempt-limiting behaviour, and we check that the unlocked session cannot be reached by skipping the exchange entirely through state-machine abuse. A 16-bit seed with no exponential back-off is not access control; it is a speed bump.
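Two of these checks reduce to simple arithmetic once seeds have been captured. The sketch below is a rough first-pass screen under stated assumptions: `classify_seeds` only detects the trivial patterns named above and says nothing about real randomness, and `exhaustive_guess_seconds` is a crude upper bound that assumes a fixed delay per attempt and a fixed lockout after a set number of failures. Both function names and all timing figures are hypothetical.

```python
def classify_seeds(seeds: list[int], bits: int = 16) -> str:
    """Screen captured seeds for trivially predictable patterns."""
    if len(seeds) < 3:
        raise ValueError("need at least three captured seeds")
    if len(set(seeds)) == 1:
        return "constant"
    modulus = 1 << bits
    deltas = {(b - a) % modulus for a, b in zip(seeds, seeds[1:])}
    if len(deltas) == 1:
        return "fixed-step counter"
    if len(set(seeds)) < len(seeds):
        return "repeats observed"
    return "no trivial pattern found"  # not evidence of good entropy


def exhaustive_guess_seconds(
    key_bits: int,
    delay_s: float,
    attempts_before_lockout: int = 0,
    lockout_s: float = 0.0,
) -> float:
    """Upper bound on the time to try every key, including lockout penalties."""
    attempts = 1 << key_bits
    total = attempts * delay_s
    if attempts_before_lockout > 0:
        total += (attempts // attempts_before_lockout) * lockout_s
    return total


print(classify_seeds([0x1000, 0x1001, 0x1002, 0x1003]))  # fixed-step counter
print(classify_seeds([0xFFFF, 0x0000, 0x0001]))          # fixed-step counter (wraps)

# 16-bit key space, 10 ms per attempt, no lockout: about 11 minutes.
print(exhaustive_guess_seconds(16, 0.01))
# Same key space with a 10 s lockout after every 3 failures: about 2.5 days.
print(exhaustive_guess_seconds(16, 0.01, attempts_before_lockout=3, lockout_s=10.0))
```

Even the locked-out figure is within reach of an unattended bench run, which is why a small key space cannot be rescued by a fixed delay alone.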

Key management: extraction and provisioning

Every mitigation above rests on keys staying secret. We assess whether key material can be extracted, from external flash, from logs or diagnostics, from debug interfaces, or because keys never moved into the HSM’s protected boundary in the first place. We look hard at provisioning: are keys unique per ECU or shared across a fleet (so one extraction compromises all of them)? Is the same key reused across development and production? Are test keys still trusted in the field? A flawless HSM is irrelevant if the key was logged in plaintext during end-of-line programming.
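Once key material has been recovered from more than one unit, the provisioning questions become a comparison exercise. The sketch below is a minimal illustration with hypothetical names and made-up key bytes: `shared_key_groups` groups ECUs by key fingerprint so that reuse shows up without raw keys being passed around, and `key_appears_in_text` checks a captured log or diagnostic dump for a key printed as hex.

```python
import hashlib
from collections import defaultdict


def shared_key_groups(keys_by_ecu: dict[str, bytes]) -> list[list[str]]:
    """Return groups of ECUs whose extracted keys are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ecu, key in keys_by_ecu.items():
        fingerprint = hashlib.sha256(key).hexdigest()  # compare fingerprints only
        groups[fingerprint].append(ecu)
    return [sorted(ecus) for ecus in groups.values() if len(ecus) > 1]


def key_appears_in_text(key: bytes, text: str) -> bool:
    """True if the key shows up hex-encoded in a log or dump (any letter case)."""
    return key.hex() in text.lower().replace(" ", "")


extracted = {
    "ecu-a": bytes([0x11] * 16),
    "ecu-b": bytes([0x11] * 16),  # same key as ecu-a: fleet-shared
    "ecu-c": bytes([0x22] * 16),
}
print(shared_key_groups(extracted))  # [['ecu-a', 'ecu-b']]

log_line = "EOL step 14: wrote key 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11"
print(key_appears_in_text(extracted["ecu-a"], log_line))  # True
```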

5. Negative and abuse testing

Conventional verification is dominated by the happy path: prove the system does what the requirements say it should. Security verification is the inverse discipline. We are testing what the system must never do, and you cannot demonstrate “never” by confirming the intended behaviour works.

Negative testing means deliberately feeding the conditions a positive requirement implicitly forbids: malformed frames, out-of-sequence diagnostic requests, signatures off by one byte, freshness values from the past, sessions entered through unexpected state transitions. Abuse testing goes further: it pursues an attacker’s goal rather than a single malformed input, chaining steps the way a real adversary would: probe the diagnostic surface, find an unprotected service, use it to read a memory region, recover a key, replay it elsewhere.

This is also where requirements that read perfectly well on paper reveal their gaps. “The ECU shall authenticate diagnostic requests” is a fine functional requirement and a poor security one, because it says nothing about what happens to the thousand request shapes that are not the authenticated path. We routinely surface, during abuse testing, that a service was protected in one session type but exposed in another, or that an error-response timing difference leaks whether a guess was close. Fuzzing the diagnostic and communication interfaces is part of this, not to prove robustness in the abstract, but because a crash under malformed input is frequently the first link in a chain that ends at code execution. None of these are visible from the happy path, by construction.
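A small deterministic generator is often enough to start covering the request shapes that are not the authenticated path. The sketch below is a deliberately simple illustration, not a substitute for a protocol-aware fuzzer: `negative_variants` is a hypothetical helper that derives empty, truncated, over-length and single-bit-flipped variants from one valid request. Sending them to a target and judging the responses is left out.

```python
def negative_variants(request: bytes) -> list[bytes]:
    """Derive malformed variants of a valid request, without duplicates."""
    candidates = [b""]                                            # empty message
    candidates += [request[:n] for n in range(1, len(request))]   # truncations
    candidates.append(request + b"\x00")                          # over-length
    for index in range(len(request)):                             # single-bit flips
        for bit in range(8):
            mutated = bytearray(request)
            mutated[index] ^= 1 << bit
            candidates.append(bytes(mutated))
    # Keep first-seen order, drop duplicates and the valid request itself.
    return [v for v in dict.fromkeys(candidates) if v != request]


# A two-byte SecurityAccess seed request (service 0x27, sub-function 0x01).
variants = negative_variants(bytes([0x27, 0x01]))
print(len(variants))                 # 19
print(variants[0], variants[1])      # b'' b"'"
```

Because the output is deterministic, a variant that provokes an unexpected response can be replayed exactly when the finding is reported and again when the fix is re-verified.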

6. When the pentest invalidates the TARA

The uncomfortable, and most valuable, outcome is when a test succeeds where the TARA said it shouldn’t. An attack rated Very Low feasibility works in an afternoon. This is not a failure of the TARA; it is the TARA doing its job, by being falsifiable. What matters is that the loop closes.

When a finding contradicts the analysis, we drive a defined sequence rather than logging a defect and moving on:

Re-rate the attack feasibility with the evidence in hand. The window of opportunity, equipment or expertise that justified the original rating was wrong; correct it. The path that “needed an expert and bespoke equipment” needed a laptop and a forum post.
Re-assess the risk. A higher feasibility against the same impact pushes the risk value up, which can change the risk treatment decision entirely: what was acceptable to retain may now demand reduction.
Re-treat: derive or strengthen requirements. The new or amended cybersecurity requirements flow back into product development (Clause 10), then into the implementation. Anti-rollback counters get added; the seed generator gets real entropy; the debug port gets fused.
Re-verify. The fix gets its own abuse case and its own verdict. A mitigation that has not been re-tested after change is, again, a hypothesis.
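The re-rate and re-assess steps can be sketched as a lookup. The matrix below is an illustrative assumption in the style of the standard's informative risk-matrix example, and the retain threshold is a hypothetical organisational rule; neither should be read as normative. `reassess` is a hypothetical helper name.

```python
# ASSUMPTION: illustrative risk matrix (impact x feasibility -> risk value 1..5).
RISK_MATRIX = {
    "Severe":     {"Very Low": 2, "Low": 3, "Medium": 4, "High": 5},
    "Major":      {"Very Low": 1, "Low": 2, "Medium": 3, "High": 4},
    "Moderate":   {"Very Low": 1, "Low": 2, "Medium": 2, "High": 3},
    "Negligible": {"Very Low": 1, "Low": 1, "Medium": 1, "High": 1},
}
RETAIN_UP_TO = 2  # hypothetical rule: risk values above this cannot be retained


def reassess(impact: str, rated: str, demonstrated: str) -> dict[str, object]:
    """Compare the risk as rated in the TARA with the risk after the finding."""
    before = RISK_MATRIX[impact][rated]
    after = RISK_MATRIX[impact][demonstrated]
    return {
        "risk_before": before,
        "risk_after": after,
        "treatment_must_change": before <= RETAIN_UP_TO < after,
    }


# A Major-impact path rated Very Low that a test showed to be High feasibility.
print(reassess("Major", "Very Low", "High"))
# {'risk_before': 1, 'risk_after': 4, 'treatment_must_change': True}
```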

This is what ISO/SAE 21434 means when it treats the TARA as a living work product rather than a milestone deliverable. The same loop runs post-SOP through continuous cybersecurity monitoring: a new vulnerability disclosed in a shared component is, in effect, a feasibility re-rating event arriving from outside, and it re-enters at the first step, the re-rating. A TARA finalised and filed is a TARA that has stopped telling the truth.

7. Evidence and traceability

Everything above is only worth as much as the chain that ties it together. The deliverable that an assessor (and, behind the assessor, a UNECE R155 type-approval authority) actually wants is a single, navigable thread from asset to verdict:

asset → threat scenario → cybersecurity goal → cybersecurity requirement → mitigation → test case → verdict

Every link must be bidirectional. Forward, so that no cybersecurity goal lacks a requirement, no requirement lacks a verifying test, and no test lacks a recorded verdict. Backward, so that any given test result can be explained: why does this abuse case exist, which threat does it retire, which asset does it protect? An orphaned test proves nothing; an untested requirement protects nothing. The traceability is what converts a pile of activity into an argument.
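The forward and backward checks lend themselves to automation once the links are exported from whatever tool holds them. The sketch below is a minimal illustration covering only the goal, requirement, test and verdict links; the three dictionaries, the identifiers in them and the `trace_gaps` function are hypothetical, and a real export would also carry the asset, threat-scenario and mitigation links.

```python
def trace_gaps(
    goal_to_reqs: dict[str, list[str]],
    req_to_tests: dict[str, list[str]],
    verdicts: dict[str, str],
) -> list[str]:
    """List every broken link in the goal -> requirement -> test -> verdict chain."""
    gaps = []
    # Forward: goals without requirements, requirements without tests.
    for goal, reqs in goal_to_reqs.items():
        if not reqs:
            gaps.append(f"goal {goal} has no requirement")
    known_reqs = {req for reqs in goal_to_reqs.values() for req in reqs}
    for req in sorted(known_reqs):
        if not req_to_tests.get(req):
            gaps.append(f"requirement {req} has no verifying test")
    # Backward: requirements that trace to no goal.
    for req in sorted(set(req_to_tests) - known_reqs):
        gaps.append(f"requirement {req} traces to no goal")
    # Forward: tests without a verdict. Backward: verdicts from orphaned tests.
    traced_tests = {test for tests in req_to_tests.values() for test in tests}
    for test in sorted(traced_tests):
        if test not in verdicts:
            gaps.append(f"test {test} has no recorded verdict")
    for test in sorted(set(verdicts) - traced_tests):
        gaps.append(f"test {test} is orphaned")
    return gaps


goals = {"CG-1": ["CR-1", "CR-2"], "CG-2": []}
requirements = {"CR-1": ["AC-1"], "CR-3": ["AC-2"]}
results = {"AC-1": "PASS", "AC-9": "FAIL"}

for gap in trace_gaps(goals, requirements, results):
    print(gap)
```

Run against this example, the check reports five gaps: one goal without a requirement, one untested requirement, one requirement that traces to no goal, one test without a verdict, and one orphaned test.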

This chain is also the natural home for residual-risk justification. Where a test fails and the residual risk is accepted rather than mitigated further, the verdict, the re-rated feasibility and the acceptance rationale all live on the same thread, so the decision is auditable, attributable and revisitable when feasibility shifts later.

For type approval under R155, this is not paperwork for its own sake. R155 obliges the manufacturer to demonstrate, through the Cyber Security Management System, that risks are identified and managed, and that the mitigations selected to address them have been tested for effectiveness. “Managed” and “tested” are exactly the two claims that an asset-to-verdict trace substantiates: the verification evidence is what turns a TARA from an internal analysis into a defensible part of the type-approval dossier. We treat the trace as the primary product of the whole exercise, with the test reports as its leaves; the relationship between R155, the CSMS and this evidence chain is something we unpack further in our note on embedded cybersecurity under UNECE R155. A risk assessment is where you decide what could go wrong. The traceable verification chain is where you prove you were right, or where you find out, on your own bench, that you were not.


Author: Adrian Valea, Founder & Managing Director, SafetyTrust Software Technology GmbH. ASPICE Provisional Assessor (intacs / VDA), Automotive SPICE for Cybersecurity (intacs), Functional Safety Engineer (TÜV Rheinland), Automotive Cybersecurity (TÜV NORD). Published 2026-06-02.
