The Allen Institute for AI (Ai2) unveiled Olmo 3, a new generation of open language models that it says outperforms rivals ...
MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.
The new framework from Tongyi Lab enables agents to create their own training data by exploring and interacting with new ...
Artificial intelligence systems are increasingly woven into everyday decisions about health, money and work, yet most tests ...
Elon Musk's xAI has launched Grok 4.1, an upgraded AI model that significantly enhances speed, stability, and answer accuracy ...
This year, Stanford University organized Agents4Science, the first open conference to accept papers written entirely by ...
Anthropic today released Opus 4.5, its flagship frontier model, which brings improvements in coding performance, as well as ...
Companies investing in generative AI find that testing and quality assurance are two of the most critical areas for improvement. Here are four strategies for testing LLMs embedded in generative AI ...