Businesses using artificial intelligence to generate code are experiencing downtime and security issues. The team at Sonar, a provider of code quality and security products, has heard first-hand accounts of recurring outages, even at major financial institutions, where the developers responsible for the code blame the AI.
AI tools are far from perfect at generating code. Bilkent University researchers found that the latest versions of ChatGPT, GitHub Copilot, and Amazon CodeWhisperer generated correct code just 65.2%, 46.3%, and 31.1% of the time, respectively.
Part of the problem is that AI is notoriously bad at maths because it struggles to understand logic. Plus, programmers are not known for being great at writing prompts because “AI doesn’t do things consistently or work like code,” according to Wharton AI professor Ethan Mollick.
SEE: OpenAI Unveils ‘Strawberry’ Model, Optimized for Complex Coding and Math
In late 2023, more than half of organisations said they encountered security issues with poor AI-generated code “sometimes” or “frequently,” as per a survey by Snyk. But the issue could worsen, as 90% of enterprise software engineers will use AI code assistants by 2028, according to Gartner.
Tariq Shaukat, CEO of Sonar and a former president at Bumble and Google Cloud, is “hearing more and more about it” already. He told TechRepublic in an interview, “Companies are deploying AI-code generation tools more frequently, and the generated code is being put into production, causing outages and/or security issues.
“In general, this is due to insufficient reviews, either because the company has not implemented robust code quality and code-review practices, or because developers are scrutinising AI-written code less than they would scrutinise their own code.
“When asked about buggy AI, a common refrain is ‘it is not my code,’ meaning they feel less accountable because they didn’t write it.”
SEE: 31% of Organizations Using Generative AI Ask It to Write Code (2023)
He stressed that this stems not from a want of care on the developer’s part, but from a lack of interest in “copy-editing code,” compounded by quality control processes that were unprepared for the speed of AI adoption.
Furthermore, a 2023 study from Stanford University that examined how users interact with AI code assistants found that those who use them “wrote significantly less secure code” but were “more likely to believe they wrote secure code.” This suggests that relying on AI tools can foster a more laissez-faire attitude toward reviewing one’s own work.
It is human nature to be tempted by a shortcut, particularly under pressure from a manager or a launch schedule, but placing full trust in AI can undermine both the quality of code reviews and a developer’s understanding of how the code interacts with the wider application.
The CrowdStrike outage in July highlighted just how widespread disruption can be if a critical system fails. While that incident was not specifically related to AI-generated code, the cause of the outage was a bug in the validation process, which allowed “problematic content data” to be deployed. This demonstrates the importance of a human element when vetting critical content.
Nor are developers unaware of the potential pitfalls of using AI in their work. According to a report by Stack Overflow, only 43% of developers trust the accuracy of AI tools, just one percentage point higher than in 2023. AI’s favorability rating amongst developers also fell from 77% last year to 72% this year.
But, despite the risk, engineering departments have not been deterred from AI coding tools, largely because of the efficiency benefits. A survey from OutSystems found that over 75% of software executives had cut their development time by up to half thanks to AI-driven automation. It’s making developers happier, too, Shaukat told TechRepublic, because they spend less time on routine tasks.
However, the time saved could be offset by the effort needed to fix issues caused by AI-generated code.
Researchers at GitClear inspected 153 million changed lines of code originally written between January 2020 and December 2023, a period in which the use of AI coding assistants skyrocketed. They noted a rise in the amount of code that had to be fixed or reverted less than two weeks after it was authored, so-called “code churn,” which indicates instability.
The researchers project that instances of code churn will double in 2024 over the pre-AI 2021 baseline and that more than 7% of all code changes will be reverted within two weeks.
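For illustration, here is one rough way a team might estimate a churn-like signal from its own git history. This is a simplified proxy under loose assumptions, not GitClear’s methodology: it counts lines deleted from a file within two weeks of that file gaining new lines, regardless of whether the exact same lines are involved.

```python
import subprocess
from datetime import datetime, timedelta

# Crude churn proxy (NOT GitClear's methodology): lines deleted from a
# file within 14 days of that file receiving additions are treated as
# "churned." A high ratio of such deletions to all additions hints that
# fresh code is being rewritten or reverted shortly after it lands.
WINDOW = timedelta(days=14)

# Oldest-first log; each commit prints "--<unix time>" then numstat rows.
log = subprocess.run(
    ["git", "log", "--reverse", "--numstat", "--format=--%at"],
    capture_output=True, text=True, check=True,
).stdout

last_add = {}        # path -> datetime of the file's most recent addition
total_added = 0
churned_deletes = 0
when = None

for line in log.splitlines():
    if line.startswith("--"):
        when = datetime.fromtimestamp(int(line[2:]))
    elif "\t" in line:
        added, deleted, path = line.split("\t", 2)
        if added == "-":     # numstat prints "-" for binary files
            continue
        added, deleted = int(added), int(deleted)
        if deleted and path in last_add and when - last_add[path] <= WINDOW:
            churned_deletes += deleted
        if added:
            total_added += added
            last_add[path] = when

if total_added:
    print(f"churn proxy: {churned_deletes / total_added:.1%} of added "
          f"lines offset by deletions within {WINDOW.days} days")
```

Run from inside any git repository, it prints a single percentage; a rising figure over successive periods would mirror the instability trend the GitClear researchers describe, though a real analysis would track individual lines rather than whole files.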
Furthermore, within the study period, the percentage of copy-pasted code also increased notably. This goes against the popular “DRY,” or “Don’t Repeat Yourself,” mantra amongst programmers, as repeated code can lead to increased maintenance, bugs, and inconsistency across a codebase.
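A minimal, hypothetical sketch of why duplicated code is costly: the same email rule below is pasted into two handlers, so a later fix applied to one copy silently misses the other. The function names and the validation rule are illustrative, not drawn from any study.

```python
# Copy-paste version: the same rule lives in two places.
def create_user(email: str) -> None:
    if "@" not in email or email.startswith("@"):
        raise ValueError("invalid email")
    # ... create the user

def update_user(email: str) -> None:
    if "@" not in email or email.startswith("@"):  # duplicated rule
        raise ValueError("invalid email")
    # ... update the user

# DRY version: one shared rule, one place to fix bugs. A change such as
# trimming whitespace now reaches every caller automatically.
def validate_email(email: str) -> str:
    email = email.strip()
    if "@" not in email or email.startswith("@"):
        raise ValueError("invalid email")
    return email
```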
But, on whether the productivity time savings associated with AI code assistants are being negated by the clean-up operations, Shaukat said it is too early to say.
SEE: Top Security Tools for Developers
“Our experience is that typical developers accept suggestions from code generators about 30% of the time. That is meaningful,” he said. “When the system is designed properly, with the right tooling and processes in place, any clean-up work is manageable.”
However, developers still need to be held accountable for the code they submit, especially when AI tools are used. If they aren’t, that’s when the downtime-causing code will slip through the cracks.
Shaukat told TechRepublic, “CEOs, CIOs, and other corporate leaders need to look at their processes in light of the increased usage of AI in code generation and prioritise taking the assurance steps needed.
“Where they can’t, they will see frequent outages, more bugs, a loss of developer productivity, and increased security risks. AI tools are meant to be both trusted and verified.”