Aikido

Using Reasoning Models in AutoTriage

Berg Severens

TL;DR

Reasoning models aren't needed for most SAST rules, but for edge cases like path traversal in JavaScript, they catch twice as many false positives.

Are Reasoning Models All Hype?

“Reasoning models” are having a moment. The big AI labs are locked in a battle for leadership, pushing the limits of model size and performance through scaling laws, smarter pretraining, and fine-tuning with RLHF (Reinforcement Learning from Human Feedback). They’re also layering on chain-of-thought prompting to make models “think out loud” during inference. This lets them outperform non-AI systems on logic tasks - and top the leaderboards while they’re at it.

That’s impressive. But are they really useful in practice? It depends.

In the case of AutoTriage*, much depends on the complexity of the SAST** rule.

*Quick recap - AutoTriage is a feature that Aikido uses to filter out false positive SAST findings.
**A SAST finding is a potential vulnerability discovered in source code, as reported by a hardcoded pattern detector.

Overly Expensive for Most SAST Rules

AutoTriage works in two steps. First, we attempt to rule out the possibility of exploitability; if we can, the finding is filtered out as a false positive. This requires some black-and-white thinking about whether user-controlled variables can reach the vulnerable code. The model basically checks whether the variable is actually user-controlled and whether there is sanitization, validation, or casting in place. If there is some form of mitigation, it works out whether that mitigation is actually effective.

The second step only happens when we can’t rule out exploitability in the first step. We then focus on prioritization. The priority is determined by the likelihood that something could go wrong and the severity if it did. This second step is less black-and-white, because it also depends on subjective estimates - for example, how likely it is that a variable is user-controlled when we lack complete context.
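
The two steps can be outlined roughly as follows. This is only an illustrative sketch, not Aikido's implementation: the function name triage, the field names on the finding object, and the choice to multiply likelihood by severity are all assumptions made for the example.

```javascript
// Hypothetical sketch of the two-step flow. All names and the scoring
// formula are assumptions for illustration only.
function triage(finding) {
  // Step 1: try to rule out exploitability with black-and-white checks.
  const ruledOut = !finding.userControlled || finding.mitigationEffective;
  if (ruledOut) {
    return { falsePositive: true, priority: null };
  }

  // Step 2: exploitability cannot be ruled out, so prioritize instead.
  // Assumed formula: likelihood (0 to 1) multiplied by severity (0 to 10).
  return {
    falsePositive: false,
    priority: finding.likelihood * finding.severity,
  };
}
```

In practice the inputs to both steps come from a model's judgment rather than from ready-made boolean fields; the sketch only shows the order in which the decisions are made.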

For most rules, we can summarize how to do this in a reasonable number of ‘rules of thumb’, making reasoning models overkill: they tend to have a similar accuracy, but they come with a significantly higher price tag.

Why Small Reasoning Models Work

Some rules are surprisingly complex, and non-reasoning models have difficulty grasping them. Imagine using the exact same mental space for each word you say. First someone asks you, “What is 1+1?” Then that same person asks you, “What is 26248 + 346237?” While normal models struggle with varying levels of complexity, reasoning models handle them by simply using more words for complex inputs and breaking down larger problems into smaller, more manageable subproblems.

Unfortunately, because they consume more tokens, they are generally also more expensive. However, reasoning models suffer less from downsizing than non-reasoning models. There are two reasons to choose a bigger model: (1) it has more capacity for storing knowledge, although a lot of knowledge is not really necessary for triaging vulnerabilities; and (2) it tends to be a bit more accurate per word. Reasoning models, however, can recover from mistakes thanks to their reasoning structure. So despite the higher token usage, it is feasible in practice to work with smaller models whose lower per-token cost compensates for it.

Path Traversal in JavaScript

Path traversal is a rule where reasoning models can really shine, because its findings are surprisingly complex to triage. Path traversal is a vulnerability where end users can read or write files outside of an intended directory. For example, imagine that Google Drive had a dedicated folder for each user on a file system, like this:

Google Drive/userId1/
Google Drive/userId2/…

The next time you want to download one of your files, you send a GET request from your browser to Google Drive, e.g. with the filename myDogEatingShoes.jpg. If that file exists, your download promptly starts. But what if you tried the following filename: ../userId2/mypasswords.txt? If Google Drive had not protected its back-end against path traversal, you might be able to download a ‘mypasswords.txt’ file from another user, if that file exists.
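
To see why that filename is dangerous, here is a minimal sketch of how Node.js would combine it with a per-user folder. The base directory /drive and the helper userFilePath are made up for the example, and path.posix is used so the result is the same on every operating system.

```javascript
const path = require('path');

// Hypothetical layout: every user gets a folder under /drive.
function userFilePath(userId, filename) {
  // No validation at all: this is the vulnerable version.
  return path.posix.join('/drive', userId, filename);
}

console.log(userFilePath('userId1', 'myDogEatingShoes.jpg'));
// /drive/userId1/myDogEatingShoes.jpg
console.log(userFilePath('userId1', '../userId2/mypasswords.txt'));
// /drive/userId2/mypasswords.txt
```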

Different Path Traversal Attacks

To triage path traversal SAST findings, we need to understand the different cases in which code is or is not vulnerable. Let’s start with the straightforward cases and gradually build up the complexity.

Pattern 1: ‘../’

The elephant in the room here is the ‘../’ pattern. If you read from or write to a filepath with ‘../’ in it, it could escape the intended directory and read or write somewhere you did not intend. So if there is no check for ‘../’ in the filepath and the file is specified client-side, you have a real vulnerability. In the really bad cases, hackers could read files containing credentials on your system.

Pattern 2: ‘..\\’

Imagine you checked for ‘../’, but the code is running on a Windows system. You would have a problem again, since path traversal is still possible with ‘..\\’ patterns (the JavaScript string literal for the characters ..\). So far so good: two rules of thumb are still manageable, right?
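
A small sketch of that bypass, under made-up names: the helper passesForwardSlashCheck and the directory C:\app\uploads are assumptions for illustration. path.win32 applies Windows path rules regardless of the operating system the code runs on.

```javascript
const path = require('path');

// Naive check that only looks for the forward-slash pattern.
function passesForwardSlashCheck(filename) {
  return !filename.includes('../');
}

// In a JavaScript string literal, '..\\' is the three characters ..\
const attack = '..\\secrets.txt';

console.log(passesForwardSlashCheck(attack)); // true: the check is bypassed
console.log(path.win32.join('C:\\app\\uploads', attack));
// C:\app\secrets.txt
```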

Pattern 3: ‘..’

In order to get nice and clean paths without missing slashes, a lot of people use functions like path.resolve() or path.join(). Here is where the fun starts. Imagine something like this:

// Reject the slash-based traversal patterns in both user-controlled values.
if (userControlledSubPath.includes('../') || userControlledSubPath.includes('..\\') ||
    filename.includes('../') || filename.includes('..\\')) {
  throw new Error('Path traversal attempt detected');
}

// Join the hardcoded base path with the user-controlled parts.
const filepath = path.join(HARDCODED_BASE_PATH, userControlledSubPath, filename);

return fs.readFileSync(filepath);

It turns out that this is still vulnerable: if an attacker uses userControlledSubPath === '..', the path.join will still interpret it as going one directory up.
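
This can be checked with a few lines. The base path /srv/files and the filename are hypothetical values, and the helper hasSlashTraversal simply repeats the check from the snippet above; path.posix keeps the output identical across operating systems.

```javascript
const path = require('path');

const HARDCODED_BASE_PATH = '/srv/files'; // hypothetical value

// Same check as in the snippet above, applied to one value.
function hasSlashTraversal(value) {
  return value.includes('../') || value.includes('..\\');
}

const userControlledSubPath = '..';
const filename = 'secret.txt';

console.log(hasSlashTraversal(userControlledSubPath)); // false: nothing detected
console.log(path.posix.join(HARDCODED_BASE_PATH, userControlledSubPath, filename));
// /srv/secret.txt
```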

However, the last argument of path.join() is immune to that attack in this example. If an attacker specifies ‘..’ as the last argument, path.join() returns a directory instead of a file path, so the read or write operation fails.
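
A quick check of that behavior, using a made-up base path and subfolder:

```javascript
const path = require('path');

// '..' as the last argument: the result names a directory, not a file.
const result = path.posix.join('/srv/files', 'docs', '..');
console.log(result); // /srv/files
```

Passing that result to fs.readFileSync() would raise an error rather than return file contents, because the path names a directory.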

Pattern 4: ‘/*’

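In a new example, we again have a check like this:

// Reject any value that contains '..'.
if (filename.includes('..')) {
  throw new Error('Path traversal attempt detected');
}

// Resolve the user-supplied filename against the hardcoded base path.
const filepath = path.resolve(HARDCODED_BASE_PATH, filename);

return fs.readFileSync(filepath);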

This looks safe, right? The check covers the ‘..’, the ‘../’ and the ‘..\\’ case - it’s elegant! Now comes the surprising way in which this is still vulnerable. Drum roll… trrrrrrrrr… When an argument in path.resolve() starts with a slash, it ignores all previous arguments. So when an attacker does something like filename = /etc/passwd, path.resolve() will ignore the hardcoded base path and resolve to /etc/passwd. Scary, right? We should have checked for that leading slash as well. Note that using path.join() would have made it safe.
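
The difference between the two functions is easy to reproduce. The base path below is a hypothetical value, and path.posix is used so the output does not depend on the host operating system.

```javascript
const path = require('path');

const HARDCODED_BASE_PATH = '/srv/files'; // hypothetical value
const filename = '/etc/passwd'; // contains no '..', so the check passes

console.log(filename.includes('..')); // false
console.log(path.posix.resolve(HARDCODED_BASE_PATH, filename));
// /etc/passwd
console.log(path.posix.join(HARDCODED_BASE_PATH, filename));
// /srv/files/etc/passwd
```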

Appreciating the Complexity

Charlie Chaplin once said ‘Simplicity is not a simple thing’. This applies here too: simple, effective remediation exists, but you first need to understand the range of possible attack vectors. The simplest, most robust remediation against path traversal is to first resolve the path and check if it still starts with the intended base path. There is no way of escaping that check.
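
A minimal sketch of that resolve-then-check approach could look like this. The helper name resolveInsideBase is hypothetical. The sketch assumes the base directory is not the filesystem root, and it appends a path separator before comparing so that a sibling directory such as /srv/files-old does not pass as a match for /srv/files. Symbolic links are outside the scope of this sketch.

```javascript
const path = require('path');

// Hypothetical helper: resolves the path, then verifies it stays in baseDir.
function resolveInsideBase(baseDir, ...userSegments) {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, ...userSegments);

  // Compare against the base plus a separator, so that only paths
  // strictly inside the base directory are accepted.
  if (!resolved.startsWith(base + path.sep)) {
    throw new Error('Path traversal attempt detected');
  }
  return resolved;
}
```

Because the check runs on the final resolved path, it does not need a separate rule for each of the four patterns above.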

However, the AutoTriage team does not have the luxury of being able to choose the remediation. We need to be able to mark alternative solutions as safe so we don’t unnecessarily overwhelm customers with false positives. We have now seen four different attack vectors of path traversal, and they all come with specific checks. For each of these attack vectors, the LLM needs to check whether all the requirements for a successful attack are met, or whether any possibility of an attack can be ruled out.

Although reasoning models are not the default for most rules, they safely filter out twice as many false positives as non-reasoning models for path traversal in JavaScript. That is a game changer for noise reduction.
