The unreasonable effectiveness of LLMs for auditing Rust code
The lead of the Rust Secure Code Working Group used GPT-5.5, accessed through Codex for Open Source, to audit widely used Rust crates, and found and reported numerous issues. Among them was a significant out-of-bounds write in jxl-grid, a crate that is part of jxl-oxide, a JPEG XL decoder written in Rust. Fuzzers had not caught this bug because it occurs only on 32-bit platforms with large image dimensions. GPT-5.5 also identified use-after-free bugs, data races, and panic safety issues, which underscores its potential as a code auditing tool. Miri, a tool that runs Rust code in an interpreter to check that it adheres to the language's rules, was crucial for validating the findings and eliminating false positives.
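To illustrate how a bug can depend on both pointer width and image size, here is a minimal sketch of one common pattern: a buffer length computed as width times height that wraps around when `usize` is 32 bits wide. This is an assumption chosen for illustration, not the actual jxl-grid defect, whose root cause the summary does not describe. The function names are hypothetical, and `u32` stands in for a 32-bit `usize` so the behavior is the same on any host.

```rust
/// Hypothetical helper: element count of a width x height grid computed in
/// 32-bit arithmetic, standing in for `usize` on a 32-bit target.
/// `wrapping_mul` makes the silent wraparound explicit.
fn grid_len_wrapping(width: u32, height: u32) -> u32 {
    width.wrapping_mul(height)
}

/// The same computation with overflow detection: returns `None` instead of
/// a truncated length, so the caller can reject the image.
fn grid_len_checked(width: u32, height: u32) -> Option<u32> {
    width.checked_mul(height)
}

/// The same computation in 64-bit arithmetic, standing in for `usize` on a
/// 64-bit target, where these dimensions do not overflow.
fn grid_len_64(width: u32, height: u32) -> u64 {
    u64::from(width) * u64::from(height)
}
```

With dimensions 65,536 by 65,537, the true element count is 2^32 + 65,536. The 64-bit version returns that value, the wrapping 32-bit version returns only 65,536, and the checked version returns `None`. If a buffer were allocated with the wrapped length while later indexing used the real dimensions, writes would land out of bounds. A fuzzer running on a 64-bit host, or one that never generates such large dimensions, would not reach this path.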
The integration of large language models (LLMs) like GPT-5.5 into code auditing processes reflects a broader trend of leveraging AI to enhance software security and reliability. This development is particularly significant for Rust, a language designed with a strong focus on safety. The ability of GPT-5.5 to reason about Rust-specific concepts, such as panic safety, aliasing, and the Send/Sync traits, demonstrates the potential for LLMs to complement traditional auditing techniques. Moreover, this approach aligns with the goals of initiatives like the Rust Foundation's security initiative and Project Glasswing, which aim to improve the security of Rust code.
The findings from this audit have immediate implications for the maintenance and security of the affected crates. The issues identified, such as the out-of-bounds write in jxl-grid, need to be fixed before they can be exploited. The use of GPT-5.5 and miri in this audit also highlights the importance of tooling in ensuring the security and reliability of software. As LLMs continue to be integrated into development workflows, it will be crucial to monitor their effectiveness and limitations in various contexts. Additionally, coordination between different auditing efforts, such as those using GPT-5.5 and Mythos, will be essential to maximize the efficiency and coverage of code audits.
Key Takeaways
GPT-5.5 successfully identified dozens of issues in widely used Rust crates, including a serious out-of-bounds write in the jxl-grid crate.
The miri tool played a critical role in verifying the issues found by GPT-5.5 and eliminating false positives.
The audit demonstrated the potential of LLMs to reason about Rust-specific concepts and complement traditional auditing techniques.
The findings underscore the importance of tooling in ensuring the security and reliability of software, particularly in languages like Rust that prioritize safety.
About the Source
This analysis is based on reporting by Medium. Here is a short excerpt for context:
As a lead of the Rust Secure Code Working Group, I got access to GPT-5.5 via the Codex for Open Source. Since then I’ve found and reported…