Scientists develop AI monitoring agent to detect and stop harmful outputs By Cointelegraph

A team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University and Microsoft Research has developed a tool that monitors large language models (LLMs) for potentially harmful outputs and prevents them from executing.

The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs, such as code attacks, before they happen.
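To illustrate the general idea of gating an agent's actions behind a safety check, here is a minimal Python sketch. It is not the paper's implementation: the scoring function, threshold, and `MonitorVerdict` structure are hypothetical placeholders standing in for the model-based monitor the researchers describe, which assigns a safety rating to each step of an agent's workflow before allowing it to run.

```python
# A minimal sketch (not the paper's implementation) of gating an LLM agent's
# proposed actions behind a safety monitor. The keyword-based scoring below is
# a hypothetical stand-in for a model-based judge.

from dataclasses import dataclass


@dataclass
class MonitorVerdict:
    safety_score: float  # 0.0 (unsafe) to 1.0 (safe)
    allowed: bool
    reason: str


def score_action(proposed_action: str) -> float:
    """Hypothetical scoring: flag actions that resemble risky code or shell
    commands. A real monitor would score the action with an LLM judge."""
    risky_markers = ("rm -rf", "eval(", "exec(", "DROP TABLE")
    return 0.1 if any(marker in proposed_action for marker in risky_markers) else 0.9


def monitor(proposed_action: str, threshold: float = 0.5) -> MonitorVerdict:
    """Block the action (do not execute it) if its safety score falls below
    the threshold; otherwise let it proceed."""
    score = score_action(proposed_action)
    allowed = score >= threshold
    reason = "passed safety check" if allowed else "blocked: low safety rating"
    return MonitorVerdict(score, allowed, reason)


if __name__ == "__main__":
    for action in ["summarize the quarterly report", "eval(user_input)"]:
        verdict = monitor(action)
        print(f"{action!r} -> score={verdict.safety_score}, allowed={verdict.allowed}")
```

In this sketch, the agent's proposed step is scored before execution, mirroring the paper's high-level approach of intercepting unsafe actions rather than cleaning up after them.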

An illustration of the monitor in action. On the left, a workflow ending in a high safety rating. On the right, a workflow ending in a low safety rating. Source: Naihin et al., 2023