Benchmarking AI-assisted developers (and their tools) for superior AI governance

A quick browse of LinkedIn, DevTok, and X would lead you to believe that almost every developer has jumped aboard the vibe coding hype train with full gusto. And while that’s not far-fetched, with 84% of developers confirming they are currently using (or planning to use) AI coding tools in their daily workflows, a full surrender to autonomous vibe coding agents is still unusual. Stack Overflow’s 2025 AI Survey revealed that most respondents (72%) are not (yet) vibe coding. Still, adoption is trending upwards, and AI is currently generating 41% of all code, for better or worse.

Tools like Cursor and Windsurf represent the latest generation of AI coding assistants, each with a powerful autonomous mode that can make decisions independently based on preset parameters. The speed and productivity gains are undeniable, but a worrying trend is emerging: many of these tools are being deployed in enterprise environments by teams that are not equipped to address the inherent security issues associated with their use. Human governance is paramount, yet too few security leaders are making the effort to modernize their security programs to adequately shield themselves from the risk of AI-generated code.

If the tech stack lacks tools that oversee not only each developer’s security proficiency, but also the trustworthiness of the approved AI coding companions they use, then efforts to uplift the overall security program, and the developers working within it, will likely lack the data insights needed to effect change.

AI and human governance should be a priority

The drawing card of agentic models is their ability to work autonomously and make decisions independently. Embedding them into enterprise environments at scale without appropriate human governance will inevitably introduce security issues that are neither particularly visible nor easy to stop.

Long-standing security problems like sensitive data exposure and insufficient logging and monitoring remain, and emerging threats like memory poisoning and tool poisoning are not issues to take lightly. CISOs must take steps to reduce developer risk and provide continuous learning and skills verification within their security programs in order to safely enlist the help of agentic AI.
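
To make those long-standing risks concrete, here is a minimal, hypothetical sketch of the kind of snippet an unsupervised coding agent might happily generate, alongside a safer variant. The function names and endpoint are invented for illustration, and the example assumes the third-party requests library is available:

```python
import logging
import os

import requests  # assumed available; any HTTP client works

logger = logging.getLogger(__name__)

# Risky pattern an unsupervised agent might produce: a hardcoded
# secret (sensitive data exposure) and a silent failure path
# (insufficient logging and monitoring).
def sync_orders_risky():
    api_key = "sk-live-123456"  # secret committed straight into the repo
    try:
        requests.post("https://api.example.com/orders/sync",
                      headers={"Authorization": f"Bearer {api_key}"},
                      timeout=10)
    except Exception:
        pass  # error is swallowed; nobody is alerted

# Safer variant: the secret comes from the environment and failures
# are logged, so monitoring can pick them up.
def sync_orders_safer():
    api_key = os.environ["ORDERS_API_KEY"]  # injected at deploy time
    try:
        resp = requests.post("https://api.example.com/orders/sync",
                             headers={"Authorization": f"Bearer {api_key}"},
                             timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        logger.exception("Order sync failed")  # visible to monitoring
        raise
```

Both functions compile and run; only a reviewer (human or tooling) who knows what to look for will reliably keep the first one out of production.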

Powerful benchmarking lights your developers’ path

It’s very difficult to make impactful, positive improvements to a security program based solely on anecdotal accounts, limited feedback, and other data points that are more subjective in nature. These types of data, while helpful in correcting more glaring faults (such as a particular tool continuously failing or personnel time being wasted on a low-value and frustrating task), will do little to uplift the program to a new level. Sadly, the “people” part of an enterprise security (or, indeed, Secure by Design) initiative is notoriously tricky to measure, and too often neglected despite being a piece of the puzzle that must be solved as a priority.

This is where governance tools that deliver data points on individual developer security proficiency – categorized by language, framework, and even industry – can make the difference between yet another flat training and observability exercise and proper developer risk management. With the latter, the tooling works to collect the insights needed to plug knowledge gaps, route security-proficient developers to the most sensitive projects, and, importantly, monitor and approve the tools they use each day, such as AI coding companions.
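
As a rough sketch of what such governance data might look like in practice, consider a per-developer proficiency record that also tracks which AI companions that developer is approved to use. Every field name and threshold below is hypothetical, not any particular vendor’s schema:

```python
from dataclasses import dataclass, field

# Hypothetical developer-risk record; fields and thresholds are
# illustrative only.
@dataclass
class DeveloperProfile:
    name: str
    language: str                 # e.g. "java"
    framework: str                # e.g. "spring"
    industry: str                 # e.g. "fintech"
    security_score: float         # 0-100, from benchmarking assessments
    approved_ai_tools: set[str] = field(default_factory=set)

SENSITIVE_PROJECT_THRESHOLD = 80.0  # assumed policy; tune per organization

def may_commit_to_sensitive_repo(dev: DeveloperProfile, ai_tool: str | None) -> bool:
    """Gate sensitive work on proficiency and on tool approval."""
    if dev.security_score < SENSITIVE_PROJECT_THRESHOLD:
        return False
    # An unapproved coding companion fails the check even for a strong dev.
    return ai_tool is None or ai_tool in dev.approved_ai_tools

dev = DeveloperProfile("alex", "java", "spring", "fintech", 91.5,
                       approved_ai_tools={"cursor"})
print(may_commit_to_sensitive_repo(dev, "cursor"))    # True
print(may_commit_to_sensitive_repo(dev, "windsurf"))  # False: not approved
```

The point is less the specific rule than that both halves of the pairing – the human’s measured proficiency and the tool’s approval status – feed the same gate.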

Assessment of agentic AI coding tools and LLMs

Three years on, we can confidently conclude that not all AI coding tools are created equal. More studies are emerging that help differentiate the strengths and weaknesses of each model across a variety of applications. Sonar’s recent study on the coding personalities of each model was quite eye-opening, revealing the different traits of models like Claude Sonnet 4, OpenCoder-8B, Llama 3.2 90B, GPT-4o, and Claude Sonnet 3.7, with insight into how their individual approaches to coding affect code quality and, subsequently, associated security risk. Semgrep’s deep dive into the capabilities of AI coding agents for detecting vulnerabilities also yielded mixed results, with findings generally demonstrating that a security-focused prompt can already identify real vulnerabilities in real applications. However, depending on the vulnerability class, a high volume of false positives created noisy, less valuable results.
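
To see why a high false-positive volume makes results noisy, it helps to look at precision. The figures below are invented for illustration and are not drawn from the Semgrep study:

```python
# Precision = true positives / (true positives + false positives).
# Invented numbers: the tool flags 200 findings for one vulnerability
# class, of which only 40 are real.
def precision(true_positives: int, false_positives: int) -> float:
    return true_positives / (true_positives + false_positives)

print(precision(40, 160))  # 0.2 -> only 1 in 5 findings is worth triaging
```

At 20% precision, four out of five findings waste triage time, which is exactly the “noisy, less valuable” outcome described above.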

Our own unique benchmarking data supports much of Semgrep’s findings. We were able to show that the best LLMs perform comparably with proficient people at a range of limited secure coding tasks. However, there is a significant drop in consistency among LLMs across different stages of tasks, languages, and vulnerability categories. Generally, top developers with security proficiency outperform all LLMs, while average developers do not.

With studies like this in mind, we must not lose sight of what we as an industry are allowing into our codebases: as AI coding agents gain autonomy and general use, they require oversight and must be treated like any other human with their hands on the tools. In effect, that means assessing their security proficiency, access level, commits, and mistakes with the same fervor applied to the humans operating them, with no exceptions. How trustworthy is the output of the tool, and how security-proficient is its operator?

If security leaders cannot answer these questions and plan accordingly, the attack surface will continue to grow by the day. If you don’t know where the code is coming from, make sure it’s not going into any repository, no exceptions.
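
One way to operationalize that rule is a commit-msg hook that rejects any commit lacking a provenance trailer naming an approved, reviewed source. The sketch below uses an invented “Code-Origin:” trailer convention and an invented approved list; it is a starting point under those assumptions, not a standard:

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: reject commits of unknown provenance.

Assumes a team convention of a 'Code-Origin:' trailer in every commit
message; the trailer name and the approved values are invented for
illustration.
"""
import re
import sys

APPROVED_ORIGINS = {"human", "cursor-reviewed", "windsurf-reviewed"}

def main(msg_path: str) -> int:
    with open(msg_path, encoding="utf-8") as f:
        message = f.read()
    match = re.search(r"^Code-Origin:\s*(\S+)\s*$", message, re.MULTILINE)
    if not match or match.group(1) not in APPROVED_ORIGINS:
        sys.stderr.write(
            "Rejected: add a 'Code-Origin:' trailer naming an approved, "
            "reviewed source before this code goes into any repository.\n"
        )
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))  # git passes the commit message file path
```

Saved as .git/hooks/commit-msg and made executable, it blocks the commit locally; the same check can run server-side so the policy truly has no exceptions.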

