
Data Overload: Generative AI Can Help Make Sense of the Data Tsunami to Keep Systems Secure

Jan. 19, 2024

This is the final post in our series on generative AI’s impact on cybersecurity. You can read the introductory post here, the second one on GenAI-powered content and code generation here, and the third one on GenAI-driven automation and augmentation here.

Data underlies everything that security teams are responsible for protecting: applications with millions of lines of code, networks with billions of event logs, and the myriad of daily digital interactions both internal and external to their organizations. Sifting through vast and never-ending waves of data to understand what is malicious or not is like searching for the proverbial needle in the haystack. As defenders know only too well, attackers just need to be right once; defenders need to be right 100% of the time.

In this blog, we’ll explore three ways that generative AI (GenAI) can turbocharge defensive tools by automating and simplifying the analysis and synthesis of large volumes of textual data: finding and fixing the vulnerabilities and misconfigurations that expose organizations to attack; correlating and contextualizing security events to identify anomalous behavior; and revealing potential attack paths.

Where might you be vulnerable?

One primary vector that attackers use to compromise an organization is to find and exploit vulnerabilities, or flaws, in application code. Once they have identified a vulnerability, they can develop an exploit to breach a system—for instance, by installing ransomware or by running malware that provides them with remote access to cause disruption or steal data. Finding vulnerabilities in code is laborious. Hackers manually reverse engineer applications, comb through source code, and run “fuzzing” tools to find vulnerabilities. On the flip side, application developers do their best to release code that is free of such vulnerabilities. They, too, use code-scanning tools and conduct manual code reviews, as well as testing software before it’s released.
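To make the fuzzing idea concrete, here is a minimal sketch of black-box fuzzing in Python. The `parse_record` target and its latent bug are invented for illustration; real fuzzers such as AFL or libFuzzer are far more sophisticated, using coverage feedback to guide input generation.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Hypothetical parser with a latent bug: it trusts the declared
    payload length and reads a checksum byte past it."""
    if len(data) < 2:
        raise ValueError("truncated record")  # handled error path
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # IndexError when input is shorter than declared
    return payload

def fuzz(target, iterations=10_000, max_len=32, seed=0):
    """Throw random byte strings at `target` and collect any unexpected
    exceptions -- the essence of black-box fuzzing."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
        except ValueError:
            pass  # expected, gracefully handled input
        except Exception as exc:  # unexpected -> potential vulnerability
            crashes.append((data, exc))
    return crashes
```

Running `fuzz(parse_record)` quickly turns up inputs whose declared length exceeds the actual data, the kind of out-of-bounds read that, in a memory-unsafe language, could become an exploitable vulnerability.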

GenAI could upset the status quo. Our second post in this series discussed how GenAI can help developers build more secure code from the outset and automate the time-consuming task of finding, and then fixing, vulnerabilities before pushing code into production. GenAI could also assist in identifying vulnerable code in applications already running, based on knowledge of the techniques, tactics, and procedures (TTPs) that attackers have successfully used to exploit systems. Series C-stage Qwiet AI’s preZero Platform leverages GenAI, trained on both open source and proprietary libraries, to uncover high-risk vulnerabilities quickly and accurately.

Could GenAI help improve the security of open source software too? While 80-90% of the world’s software is powered by open source, no single organization is responsible for its security. An appropriately tuned GenAI model could conceivably comb through the billions of lines of open source software to identify vulnerable code and propose fixes. One early-stage company applying GenAI to open source software security uses ChatGPT to examine open source JavaScript and Python packages for security issues, helping developers assess the potential risk of downstream dependencies in their applications. Alternatively, a GenAI model could rewrite risky open source code in memory-safe languages such as Rust. Memory-safe languages eliminate entire classes of vulnerabilities, such as buffer overflows and use-after-free bugs, by preventing unsafe memory access. One seed-stage company, for example, uses GenAI to enable developers to rebuild, refactor, and modernize entire code bases into memory-safe languages.

Detecting threats in the noise

A second application of GenAI in analyzing large datasets is to help network defenders with log analytics and threat hunting. IQT portfolio company GreyNoise Intelligence, for instance, has released Sift to help make sense of the millions of internet requests many organizations see every day. Most of this online traffic is expected—such as employees using web applications, emailing, and sending files. But malicious actors try to hide in the daily digital noise and they often succeed, using common internet protocols to exfiltrate data or communicate with command-and-control infrastructure. Sift leverages LLMs and other techniques to surface and summarize the most relevant traffic that could be an indicator of adversarial activity, freeing up time for threat hunters and analysts to focus on more important tasks.
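GreyNoise has not published Sift’s internals, but the underlying idea of surfacing rare traffic out of overwhelmingly common noise can be sketched with a simple frequency heuristic. The event fields and scoring below are invented for illustration; Sift itself layers LLMs and other techniques on top of much richer signals.

```python
from collections import Counter

def surface_rare_traffic(events, top_n=3):
    """Rank events by how rare their (protocol, path) signature is.
    A toy stand-in for LLM-backed triage: common traffic sinks to the
    bottom, unusual patterns bubble up for analyst review."""
    counts = Counter((e["protocol"], e["path"]) for e in events)
    seen, flagged = set(), []
    # Rarest signatures first (lowest count = most suspicious here)
    for e in sorted(events, key=lambda e: counts[(e["protocol"], e["path"])]):
        sig = (e["protocol"], e["path"])
        if sig not in seen:  # report each signature once
            seen.add(sig)
            flagged.append(e)
    return flagged[:top_n]
```

Given 500 routine HTTPS requests and a single unusual DNS lookup, the lone DNS event tops the list, which is exactly the needle-in-the-haystack behavior defenders want from triage tooling.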

Finally, GenAI is in the early stages of being used to identify and mitigate system misconfigurations that attackers could use to infiltrate networks. Bad actors actively scan for ways to get inside an organization’s trusted network. They look for things such as unencrypted S3 data buckets (data stores) or network tools that use default passwords. They also probe databases of authorized users to track down overprivileged accounts—ones that have been given access to too much sensitive information or too many systems.
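The checks described above can be sketched as a simple audit over an asset inventory. Everything here, including the inventory shape and the role-count threshold, is a made-up illustration of the kind of rule that GenAI-assisted tools aim to generate and apply automatically.

```python
DEFAULT_PASSWORDS = {"admin", "password", "changeme", "default"}

def audit(inventory):
    """Flag the three misconfigurations named above: unencrypted storage
    buckets, default credentials, and overprivileged accounts.
    `inventory` is a hypothetical asset dict, not a real cloud API."""
    findings = []
    for bucket in inventory.get("buckets", []):
        if not bucket.get("encrypted", False):
            findings.append(f"unencrypted bucket: {bucket['name']}")
    for device in inventory.get("devices", []):
        if device.get("password") in DEFAULT_PASSWORDS:
            findings.append(f"default password on device: {device['name']}")
    for user in inventory.get("users", []):
        if len(user.get("roles", [])) > 5:  # arbitrary threshold for illustration
            findings.append(f"overprivileged account: {user['name']}")
    return findings
```

Hand-writing rules like these does not scale across thousands of services and accounts, which is precisely where the automation described next comes in.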

Former engineers and researchers from OpenAI, Bishop Fox, Rapid7, and CrowdStrike have come together to form seed-stage startup Runsybil, which aims to automate this hacker intuition and transform it into active-defense capabilities based on the state-of-the-art in GenAI and cyber exploitation. Furthermore, compliance tools such as Secureframe can provide continuous monitoring of the way systems are set up to reduce misconfigurations. Secureframe has also incorporated GenAI into a remediation tool called Comply AI that automatically generates code to fix misconfigurations.

The innovation mission ahead

With new, more powerful AI tools at their disposal, criminal and nation-state threat actors alike will increase the sophistication and volume of their cyberattacks, as we noted in the first blog in this series. Sophisticated social engineering, deepfakes and phishing, AI-powered network penetration, and new species of polymorphic malware are all likely to begin roiling the landscape. But using improved tools for data synthesis, content and code generation, and augmentation and automation (the subject of the third blog in our series), defenders are racing to harness the capabilities brought to bear by GenAI.

Like many innovations in cyber, these capabilities are being developed and delivered by engineers, researchers, and leaders drawn to entrepreneurship by the desire to level the digital playing field in a world where, as we noted earlier, hackers need to be right only once while defenders can’t afford to slip up at all. IQT has been a prolific cybersecurity investor and will continue to seek out and invest in startups leading the way in the use of GenAI to build stronger, more resilient, and more scalable cyber defenses.
