Professionals across industries are exploring generative AI for various tasks — including creating information security training materials — but will it truly be effective?
Brian Callahan, senior lecturer and graduate program director in information technology and web sciences at Rensselaer Polytechnic Institute, and Shoshana Sugerman, an undergraduate student in this same program, presented the results of their experiment on this topic at ISC2 Security Congress in Las Vegas in October.
The main question of the experiment was “How can we train security professionals to administer better prompts for an AI to create realistic security training?” Relatedly, must security professionals also be prompt engineers to design effective training with generative AI?
To address these questions, researchers gave the same assignment to three groups: security experts with ISC2 certifications, self-identified prompt engineering experts, and individuals with both qualifications. Their task was to create cybersecurity awareness training using ChatGPT. Afterward, the training was distributed to the campus community, where users provided feedback on the material’s effectiveness.
The researchers hypothesized that there would be no significant difference in the quality of training. But if a difference emerged, it would reveal which skills were most important. Would prompts created by security experts or prompt engineering professionals prove more effective?
SEE: AI agents may be the next step in increasing the complexity of tasks AI can handle.
The researchers distributed the resulting training materials — which had been edited slightly, but included mostly AI-generated content — to the Rensselaer students, faculty, and staff.
The results indicated that:
Callahan noted that it seemed odd that people trained by security experts felt they had become better at prompt engineering. However, those who created the training didn’t generally rate the AI-written content very highly.
“No one felt like their first pass was good enough to give to people,” Callahan said. “It required further and further revision.”
In one case, ChatGPT produced what looked like a coherent and thorough guide to reporting phishing emails. However, nothing written on the slide was accurate. The AI had invented processes and an IT support email address.
Asking ChatGPT to link to RPI’s security portal radically changed the content and generated accurate instructions. In this case, the researchers issued a correction to learners who had received the inaccurate information in their training materials. None of the training takers identified that the training information was incorrect, Sugerman noted.
“ChatGPT may very well know your policies if you know how to prompt it correctly,” Callahan said. RPI is, he noted, a public university, and all of its policies are publicly available online.
The researchers only revealed the content was AI-generated after the training had been conducted. Reactions were mixed, Callahan and Sugerman said:
Callahan said any IT team using AI to create real training materials, as opposed to running an experiment, should disclose the use of AI in the creation of any content shared with other people.
“I think we have tentative evidence that generative AI can be a worthwhile tool,” Callahan said. “But, like any tool, it does come with risks. Certain parts of our training were just wrong, broad, or generic.”
Callahan pointed out a few limitations of the experiment.
“There is literature out there that ChatGPT and other generative AIs make people feel like they have learned things even though they may not have learned those things,” he explained.
Testing people on actual skills, instead of asking them to report whether they felt they had learned, would have taken more time than had been allotted for the study, Callahan noted.
After the presentation, I asked whether Callahan and Sugerman had considered using a control group of training written entirely by humans. They had, Callahan said. However, dividing training makers into cybersecurity experts and prompt engineers was a key part of the study, and the university community didn’t include enough people who self-identified as prompt engineering experts to split the groups further and still populate a control category.
The panel presentation included data from a small initial group of participants — 15 test takers and three test makers. In a follow-up email, Callahan told TechRepublic that the final version for publication will include additional participants, as the initial experiment was in-progress pilot research.
Disclaimer: ISC2 paid for my airfare, accommodations, and some meals for the ISC2 Security Congress event held Oct. 13–16 in Las Vegas.