LAU Research Shows a Cost-Saving Path for AI-Powered Cybersecurity
A new study by Dr. Fouad Trad finds that AI agents do not always need all available information to accurately detect phishing websites, pointing to a cost-effective path for AI-powered cybersecurity.
Phishing websites remain one of the most common and costly cybersecurity threats, often imitating trusted services to steal passwords, financial details, or personal information.
While artificial intelligence can help detect these websites, AI systems become more expensive to run the more information they must process. This can make them impractical for real-world deployment, where systems may need to check thousands or millions of websites.
A recent study by Dr. Fouad Trad, assistant professor at the School of Engineering, investigates whether AI agent systems, which perform tasks autonomously, can become both more accurate and more affordable by receiving information more efficiently.
In the study, titled “Toward accurate and cost-effective LLM agents via information flow optimization: Insights from a phishing detection case study,” published in Information Processing and Management, Dr. Trad and his co-author tested large language model (LLM) agents such as Gemini-2.0-Flash and GPT-4.1 on phishing detection.
The researchers used a benchmark dataset of 30,000 websites, from which they selected a balanced sample of 2,000 sites: 1,000 phishing and 1,000 legitimate. Each website came with four types of information: its URL, its domain registration record, its webpage HTML code, and a screenshot. The data were revalidated in early 2025, and the experiments were repeated using Gemini-2.0-Flash and GPT-4.1 to establish whether the findings held across different models.
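Drawing a balanced sample from a larger pool can be illustrated with a short Python sketch. It is not the authors' code: it assumes a hypothetical list of (url, label) pairs and simply draws the same number of sites from each class, using a fixed random seed so the sample can be reproduced.

```python
import random
from collections import Counter

def balanced_sample(sites, per_class, seed=0):
    """Draw `per_class` items from each label, without replacement.

    `sites` is a list of (url, label) pairs (a hypothetical format).
    """
    rng = random.Random(seed)            # fixed seed -> reproducible sample
    by_label = {}
    for site in sites:
        by_label.setdefault(site[1], []).append(site)
    sample = []
    for label in sorted(by_label):       # sorted for a deterministic order
        sample.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(sample)                  # mix the classes together
    return sample

# Toy pool: 100 "phishing" and 200 "legitimate" entries (invented data).
pool = [(f"https://site{i}.example", "phishing" if i % 3 == 0 else "legitimate")
        for i in range(300)]
sample = balanced_sample(pool, per_class=50)
print(Counter(label for _, label in sample))
```

With the study's figures, the pool would hold 30,000 sites and `per_class` would be 1,000.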
Although AI models evolve quickly, Dr. Trad stressed that the results remain relevant.
“Our goal was not to measure the absolute performance of a particular model generation, but rather to understand how multiple AI agents should share information and make decisions collectively in a cost-efficient way without sacrificing performance,” he said. “While newer models such as GPT-5.5 and Gemini 3.5 will likely perform better overall, I would expect the general patterns and insights we observed regarding agent collaboration and information sharing to remain valid.”
The results show that relying on a single source of information, such as the URL, was useful but incomplete. With Gemini-2.0-Flash, the URL-only agent performed best among the single-input agents, with 0.905 accuracy and a 0.896 F1-score, a measure that balances how many phishing websites the system catches with how often its warnings are correct. The HTML-based agent followed, while domain registration details and screenshots alone were weaker.
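For readers who want the formula behind the F1-score: it is the harmonic mean of precision (how often phishing warnings are correct) and recall (how many phishing sites are caught). The Python sketch below computes it alongside accuracy from a confusion matrix; the counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive (phishing) class."""
    precision = tp / (tp + fp)   # share of phishing warnings that were correct
    recall = tp / (tp + fn)      # share of phishing sites that were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def accuracy(tp, tn, fp, fn):
    """Share of all sites, phishing or legitimate, classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts on a balanced set of 1,000 phishing and 1,000 legitimate sites.
tp, fn = 900, 100   # phishing sites caught / missed
tn, fp = 950, 50    # legitimate sites passed / wrongly flagged
p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} "
      f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
```

With these made-up counts, accuracy is 0.925 and F1 is about 0.923. Because a harmonic mean is pulled toward the smaller of its inputs, F1 falls quickly if a detector either misses many phishing sites or raises many false alarms.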
Giving the AI all four types of information at once, a parallel approach, improved results, reaching 0.945 accuracy and a 0.942 F1-score, but at a high cost. Sequential methods, which supply information in stages, did better. A multi-agent pipeline, where several specialized AI agents examine the case in stages, reached 0.950 accuracy and a 0.947 F1-score.
The single-agent sequential design registered 0.965 accuracy and a 0.964 F1-score, a statistically significant improvement over the parallel approach. The dynamic sequential agent, which selects the information to request next, performed best overall in the Gemini experiments, with 0.975 accuracy and a 0.974 F1-score, significantly outperforming the fixed single-agent sequential design.
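The idea behind the fixed and dynamic sequential designs can be sketched in Python. This is a minimal illustration under stated assumptions, not the architecture from the paper: the per-source analyzers are hypothetical stubs that return a phishing score between 0 and 1 (where the study uses LLM agents), evidence is fused by simple averaging, and both the fixed source order and the `dynamic_policy` rule are invented for illustration. The loop requests one source at a time and stops once the averaged score is far enough from 0.5.

```python
from typing import Callable, Dict, List, Tuple

# A source analyzer maps a site record to a phishing score in [0, 1].
Analyzer = Callable[[dict], float]
Policy = Callable[[Dict[str, float], List[str]], str]

def classify_sequentially(site: dict, analyzers: Dict[str, Analyzer],
                          choose_next: Policy,
                          threshold: float = 0.5) -> Tuple[bool, Dict[str, float]]:
    """Examine one source at a time; stop as soon as the verdict is confident."""
    evidence: Dict[str, float] = {}
    remaining = list(analyzers)           # sources not yet requested
    score = 0.5                           # neutral starting point
    while remaining:
        name = choose_next(evidence, remaining)
        remaining.remove(name)
        evidence[name] = analyzers[name](site)
        score = sum(evidence.values()) / len(evidence)  # naive averaging (assumption)
        if abs(score - 0.5) * 2 >= threshold:           # confident enough: stop early
            break
    return score >= 0.5, evidence

def fixed_order(evidence: Dict[str, float], remaining: List[str]) -> str:
    """Fixed sequential design: always take the next source in a preset order."""
    return remaining[0]

def dynamic_policy(evidence: Dict[str, float], remaining: List[str]) -> str:
    """Hypothetical evidence-driven rule standing in for the agent's own choice."""
    url_score = evidence.get("url")
    if url_score is None:
        return "url"                      # start with the URL
    if "html" in remaining and 0.3 < url_score < 0.7:
        return "html"                     # ambiguous URL: inspect page content next
    return remaining[0]                   # otherwise fall back to the preset order

# Hypothetical stubs that read precomputed scores from the site record.
SOURCES = ["url", "domain", "html", "screenshot"]   # hypothetical preset order
analyzers = {name: (lambda site, n=name: site[n]) for name in SOURCES}

site = {"url": 0.6, "domain": 0.5, "html": 0.95, "screenshot": 0.7}
fixed_verdict, fixed_used = classify_sequentially(site, analyzers, fixed_order)
dyn_verdict, dyn_used = classify_sequentially(site, analyzers, dynamic_policy)
print(fixed_verdict, list(fixed_used))  # True ['url', 'domain', 'html', 'screenshot']
print(dyn_verdict, list(dyn_used))      # True ['url', 'html']
```

On this toy site, the hypothetical dynamic rule happens to stop after two sources while the fixed order uses all four; on other sites the opposite can happen, and in the study the dynamic design cost more overall than the fixed single-agent sequential one.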
Cost was equally important. The parallel-input agent cost about $925 per one million website checks. By comparison, the multi-agent pipeline cost $42; the single-agent sequential design, $48; and the dynamic design, $97.
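The reported costs are easier to compare as ratios. The short Python sketch below divides the parallel-input cost by each alternative's cost; the dollar figures are the ones quoted above, and the dictionary labels are chosen here for readability.

```python
# Reported costs in US dollars per one million website checks.
COST_PER_MILLION = {
    "parallel input": 925,
    "multi-agent pipeline": 42,
    "single-agent sequential": 48,
    "dynamic sequential": 97,
}

def relative_savings(costs, baseline_key):
    """How many times cheaper each design is than the baseline design."""
    baseline = costs[baseline_key]
    return {design: baseline / cost
            for design, cost in costs.items() if design != baseline_key}

ratios = relative_savings(COST_PER_MILLION, "parallel input")
for design, ratio in ratios.items():
    per_check = COST_PER_MILLION[design] / 1_000_000   # dollars per single check
    print(f"{design:<24} ${per_check:.6f} per check, {ratio:.1f}x cheaper")
```

By this arithmetic, the multi-agent pipeline is about 22 times cheaper than the parallel-input agent, the single-agent sequential design about 19 times cheaper, and the dynamic design about 9.5 times cheaper.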
Similar trends were observed with GPT-4.1. Sequential designs remained more efficient, and the single-agent sequential design achieved the best balance, while the dynamic design performed comparably, with no statistically significant difference between the two.
When compared with commercial anti-phishing tools, the proposed AI-agent architectures also performed strongly, with the dynamic sequential design exceeding the tested commercial detectors in accuracy and F1-score.
The study’s implications also go beyond phishing itself, which “served as a representative task that requires information fusion and collaborative reasoning,” noted Dr. Trad. “The insights we obtained about information-sharing strategies can potentially extend to other domains where decisions must be made by combining information from multiple sources.”
The findings suggest that cost-aware AI may depend on teaching systems when to stop, when to request more information, and which evidence to examine next. Dr. Trad also highlighted how this work could directly benefit end users.
“One of the most attractive aspects of LLMs is that they can help users detect phishing attempts without requiring any specialized training or expertise,” he said. “Much like how people already interact with systems such as ChatGPT, a user can simply ask an LLM whether a website appears legitimate and receive an explanation for its assessment. This makes them a promising tool for helping users assess unfamiliar websites and adapt to new phishing techniques as they emerge.”
To browse more scholarly output by the LAU community, visit our open-access digital archive, the Lebanese American University Repository (LAUR).