ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

By Cointelegraph

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.

LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful for a variety of tasks, including coding, cryptocurrency trading and text generation.

Flowchart of AgentBench’s evaluation method. Source: Liu et al.