
A Year in Perspective: GenAI’s Journey in Security Vulnerability Remediation

By Aviram Shmueli

Published February 3, 2025.


A year ago, we set out to answer a provocative question: could Generative AI (GenAI) effectively solve one of the most stubborn challenges in cybersecurity—remediating security vulnerabilities in code? 

Our research then yielded mixed results, with AI’s solutions falling short of expectations. While the technology showed promise, it was clear that it wasn’t ready to tackle the nuanced and intricate world of security fixes independently.

Fast forward to today, and the landscape looks strikingly different. With rapid advancements in AI models, we decided to revisit our hypothesis. This time, the results were far closer to the goal: GenAI can now generate fixes for common vulnerabilities with a level of accuracy and reliability that signals a turning point in its practical application to security. Additionally, we see very exciting potential above and beyond just generating fixes.

The Problem Space 

There’s a real need to reimagine how security fixes can be streamlined and automated, because eventually there will be far more machines than people to manage them, and security needs to level up quickly to machine-driven scale.

Until recently, researchers would meticulously review each vulnerability type and its specific instantiations to develop precise fixes. This method, while effective, is slow and resource-intensive. 

A year ago, with the rapid adoption of AI, we believed it could offer a potential breakthrough: the ability to generalize fixes for vulnerabilities by identifying patterns and creating reusable templates.

Imagine the potential.

Instead of crafting a specific fix for every instance of SQL injection, we could design a template capable of addressing variations of the issue at scale. This could revolutionize security operations, allowing teams to remediate a large number of findings across repositories efficiently. So we decided to put this hypothesis through rigorous testing, as security tooling demands.
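
For illustration only (this is a hypothetical sketch, not one of the actual templates from the research), a generalized remediation template for SQL injection could be a pattern-plus-rewrite rule that covers multiple variants of string-built queries:

import re

# Hypothetical template: matches f-string-built SELECT queries and rewrites
# them into a parameterized form. Real templates would need to be far more robust.
SQLI_TEMPLATE = {
    "vulnerability": "sql-injection",
    "pattern": re.compile(r'f"SELECT \* FROM (\w+) WHERE (\w+) = \'\{(\w+)\}\'"'),
    "replacement": r'"SELECT * FROM \1 WHERE \2 = %s"  # pass (\3,) to cursor.execute',
}

def apply_template(template: dict, source: str) -> str:
    # Apply a single remediation template to a vulnerable source snippet.
    return template["pattern"].sub(template["replacement"], source)

# Two variations of the same vulnerability class, handled by one template.
print(apply_template(SQLI_TEMPLATE, "query = f\"SELECT * FROM users WHERE username = '{username}'\""))
print(apply_template(SQLI_TEMPLATE, "query = f\"SELECT * FROM orders WHERE order_id = '{order_id}'\""))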

The Research Methodology

The research team set out a year ago to test a bold hypothesis: could AI learn to generalize fixes for the most prevalent vulnerability types, creating reusable templates that address diverse instantiations of these issues? 

To validate this, we focused on common vulnerabilities like SQL injection, cross-site scripting (XSS), weak or missing encryption, command injection, buffer overflows, unrestricted ingress/egress rules (for IaC scanning), and many more. By studying variations of these vulnerabilities and their corresponding fixes, we aimed to train a model capable of developing effective remediation templates.

The research methodology in a nutshell:

1. Codebase: Built from public-facing repositories, plus AI-generated code in various languages.

2. Scanning the codebase: Various security scanners were run against this codebase to generate a database of security vulnerabilities.

3. Outcome: A database containing hundreds of vulnerability types and many thousands of their instantiations served as the foundation for the research. For instance, the SQL injection vulnerability type was detected in more than 100 code samples in different variations, providing the team with vulnerable code snippets mapped to vulnerabilities.

4. AI-Driven Fix Generation:  The validated samples were fed into an AI model with sophisticated prompts, asking it to generate remediation templates.

5. Template Creation: These templates include instructions for common types of fixes, such as:

  • Substituting unsafe values with secure ones.
  • Deleting vulnerable lines of code. 
  • Adding new, secure code where necessary.

6. Fix Application and Validation: Scripts applied these fix templates after adjusting them to the original code, and the remediated code was then re-scanned to confirm the vulnerabilities were resolved (a rough sketch of this step follows the list).

7. Measuring Success: A fix was counted as successful only when the scanner reported no further findings and a researcher had reviewed and approved it.
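
Here is a rough sketch of steps 6 and 7, with hypothetical scan_code, apply_template, and human_approves helpers standing in for the actual scanners, template scripts, and review process used in the research:

# Hypothetical harness for steps 6-7: apply a template, re-scan the result, and
# count the fix as successful only if the scanner is clean AND a human approves it.
def remediate_and_validate(finding, template, scan_code, apply_template, human_approves):
    fixed_code = apply_template(template, finding["code"])

    # Re-scan the remediated snippet with the same scanner that flagged it.
    remaining_findings = scan_code(fixed_code)
    scanner_clean = len(remaining_findings) == 0

    # A researcher still has the final word on whether the fix is acceptable.
    success = scanner_clean and human_approves(finding["code"], fixed_code)
    return {"fixed_code": fixed_code, "scanner_clean": scanner_clean, "success": success}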

A year ago, this process was run on hundreds of vulnerability types, requiring up to 15 hours to generate a full set of templates. The team's focus on common vulnerabilities and their variations was a deliberate effort to make automation more practical, but limitations started to become evident when fixes were applied to real-world codebases. 

When we first tested GenAI’s ability to automate security remediation, the results were a mix of promise and frustration. The AI generated remediation templates based on patterns identified across thousands of instances of vulnerable code. While 75% of the fixes were technically successful—meaning they passed security scanners—25% either broke the code, introduced irrelevant changes, or failed to address the root security issues.

Key limitations included:

  • Code Functionality: The AI struggled to maintain the original functionality of the code while addressing vulnerabilities.
  • Code Readability: AI-generated fixes often produced code that was harder to read and understand.
  • Code Security: Despite the fix, the generated code often still contained security vulnerabilities.

In short, while GenAI hinted at the potential to revolutionize security operations, it lacked the nuanced understanding needed to remediate vulnerabilities at scale effectively.

Today’s Results: A New Chapter in GenAI’s Evolution

This year, we took GenAI for another spin, and the improvements were remarkable. Leveraging advancements in AI training and model architectures, we re-ran our experiments with an expanded dataset and similar prompts. The results demonstrated significant progress: fix accuracy climbed to over 90%, with the vast majority of generated solutions passing functional and security validation.

The Outcome

Below, we’ll examine examples from last year where AI faced challenges in delivering effective remediation suggestions. We’ll also showcase how those suggestions look today. 

Spoiler: they’ve improved significantly.

Show Me (the Failed) AI Code

After going back and reviewing every single example of the AI-generated code fixes, we found quite a number of almost funny fixes (I said almost).  Hang on to your seat as we dive into some real examples of failed AI remediations.

Example 1: Telnet Protocol Fixes

Telnet, a legacy and insecure protocol widely known for its vulnerabilities, was flagged by the scanner because it was being used for remote control and communication without encryption or other security measures.

The AI’s fix: 

Delete the Telnet line and replace it with:

# TODO - Find a more secure option

While the scanner reported the issue resolved, the functionality was left broken. Telnet’s replacement requires deeper integration across the codebase, which the AI’s naive fix failed to account for.

The AI’s fix (a year later): 

Use the paramiko library, which provides a secure SSH client, as a replacement for telnetlib.

import paramiko
# SSH server details
HOST = "example.com"  # Replace with your SSH server
PORT = 22             # Default SSH port
USERNAME = "your_username"
PASSWORD = "your_password"

try:
    # Initialize the SSH client
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # Automatically accept unknown host keys (use cautiously)

    # Connect to the SSH server
    ssh.connect(HOST, port=PORT, username=USERNAME, password=PASSWORD, timeout=10)
    print("Connected to the SSH server")

    # Execute a command (example: 'ls')
    stdin, stdout, stderr = ssh.exec_command("ls")
    print("Output:", stdout.read().decode())
    print("Errors:", stderr.read().decode())

    # Close the connection
    ssh.close()

except paramiko.AuthenticationException:
    print("Authentication failed, please verify your credentials.")
except paramiko.SSHException as e:
    print(f"Unable to establish SSH connection: {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Example 2: Command Injection 

Consider the following unsafe code, which is vulnerable because it uses shell=True, allowing arbitrary shell commands to be executed. This creates a significant risk: malicious user input passed in through the my_domain parameter reaches the command, enabling attackers to inject commands of their own:

output = subprocess.check_output(f"nslookup {my_domain}", shell=True, encoding='UTF-8')
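
To make the risk concrete, here is a hypothetical attacker-controlled value for my_domain; with shell=True, the shell treats the semicolon as a command separator and runs the second command too:

# Hypothetical malicious input: the resulting shell command becomes
#   nslookup example.com; cat /etc/passwd
# and the second command executes on the host.
my_domain = "example.com; cat /etc/passwd"
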
The AI correctly identified that it needed to use shell=False instead of shell=True.

The AI’s fix:

It modified the code to:

output = subprocess.check_output(f"nslookup {my_domain}", shell=False, encoding='UTF-8')

However, the accuracy ends there.

The fix resulted in broken code. It merely flipped the shell parameter instead of splitting the command and its argument into a secure argument list, so with shell=False the whole string is treated as the name of the program to execute and the call simply fails.

The new AI fix (a year later):

This time, the suggestion is correct and results in code that is both safe and working: the command and its argument are passed as a list, so no shell is involved and my_domain cannot inject additional commands:

output = subprocess.check_output(["nslookup", my_domain], shell=False, encoding='UTF-8')

Example 3: Lack of Input Validation

Consider the following code which fails to sanitize user input for SQL queries, leading to SQL injection risks.

query = f"SELECT * FROM users WHERE username = '{username}'"

The AI’s fix:

The AI's fix tries to escape single quotes:

query = f"SELECT * FROM users WHERE username = '{username}'".replace("'", "''")

By replacing ' with '' (a common way to escape single quotes in SQL), the AI aims to sanitize the input. However, this approach lacks proper context and does not ensure comprehensive protection against SQL injection. It also makes the query less readable and maintainable.

The new AI fix (a year later):

The correct solution is to use parameterized queries:

query = "SELECT * FROM users WHERE username = %s"
cursor.execute(query, (username,))

The line cursor.execute(query, (username,)) is used to execute a SQL query in Python while safely inserting user input to prevent SQL injection.

Using placeholders and parameterized queries ensures that user input is properly escaped, preventing malicious input (e.g., SQL injection). For example, if a user enters a value like:

username = "'; DROP TABLE users; --"

The query sent to the database will NOT execute the malicious SQL because the parameterized query treats the username value as a literal string. The query will look like this:

SELECT * FROM users WHERE username = '''; DROP TABLE users; --'
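
To see this behavior end to end, here is a minimal, self-contained sketch using Python's built-in sqlite3 module (which uses ? placeholders rather than %s); the table and data are hypothetical and for illustration only:

import sqlite3

# Minimal demo: an in-memory database with one user.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (username TEXT)")
cursor.execute("INSERT INTO users VALUES ('alice')")

# Attacker-controlled input; with parameter binding it stays a plain string.
username = "'; DROP TABLE users; --"

# The value is bound as a literal, so the SQL inside it is never executed.
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))
print(cursor.fetchall())  # [] - no match, and the users table still exists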

Example 4: Directory Traversal

Another example: User input is directly used in file paths, making the application vulnerable to directory traversal:

file_path = f"/home/user/files/{user_input}"
with open(file_path, "r") as file:
    data = file.read()

The AI’s fix:

The AI tries to mitigate directory traversal by checking for "..":

file_path = f"/home/user/files/{user_input}"
if ".." in file_path:
    raise ValueError("Invalid file path")
with open(file_path, "r") as file:
    data = file.read()

This is insufficient because attackers can bypass this check with encoding tricks (e.g., %2e%2e).

The new AI fix (a year later):

The proper fix involves sanitizing and validating the input or using secure libraries:

import os
import pathlib
base_dir = pathlib.Path("/home/user/files").resolve()
file_path = base_dir / user_input
if not file_path.resolve().is_relative_to(base_dir):
    raise ValueError("Invalid file path")
with open(file_path, "r") as file:
    data = file.read()
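
As a quick illustration of why this works (note that pathlib.Path.is_relative_to requires Python 3.9+), here is a hypothetical traversal attempt being caught:

import pathlib

base_dir = pathlib.Path("/home/user/files").resolve()

# Hypothetical traversal payload: after resolution the path escapes base_dir
# entirely (e.g., /etc/passwd on Linux), so the check fails and the fix above
# raises ValueError instead of opening the file.
user_input = "../../../etc/passwd"
file_path = (base_dir / user_input).resolve()
print(file_path.is_relative_to(base_dir))  # False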

Lessons Learned 

The evolution of AI in code generation over the past year highlights significant progress in its ability to identify and fix security vulnerabilities effectively. A year ago, AI-generated fixes often fell short, producing non-functional code, failing to address the root cause of vulnerabilities, or generating less readable and maintainable code. These shortcomings emphasized the limitations of early AI systems in understanding the nuanced trade-offs between security, functionality, and maintainability. 

Today, improvements in AI models reflect a deeper contextual awareness and a more refined ability to propose fixes that not only eliminate vulnerabilities but also preserve the integrity and functionality of the original code. This evolution demonstrates the importance of continuous learning, model refinement, and the integration of domain-specific knowledge in advancing AI's role in secure software development. In the near future, we believe there are even bigger potential uses within security, which we are currently working on each day. However, it also reminds us that while AI has come a long way, vigilance and human oversight remain essential to ensure robust and secure code.