LLM Coding Benchmarks

Study finds newer LLMs introduce more severe coding bugs despite higher benchmark scores

A new report today from code quality testing startup SonarSource SA is warning that while the latest large language models may be getting better at passing coding benchmarks, at the same time they are ...

Forbes

How Open Benchmarking Ensures AI Development Is Reliable And Safe

Artificial intelligence (AI) is essential to our daily lives. It influences everything from the way we drive and secure our homes to how we manage our money and receive medical care. However, the rush ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

Morningstar

Diffblue’s Latest Innovations in Unit Test Generation Deliver 20x Productivity Advantage Versus AI Coding Assistants

New benchmark study confirms Diffblue’s advantages over LLM coding assistants realized through its reinforcement learning-powered agentic capabilities Diffblue today announced the release of the next ...

Geeky Gadgets

Qwen 2 impressive LLM and AI coding assistant can help you write the perfect code

Qwen-2 is an advanced open-source large language model and AI coding assistant that has shown significant improvements over its predecessor, Qwen 1.5. It is available in five different sizes and has ...

ExtremeTech

AMD Announces OLMo, Its First Fully Open LLM

AMD planted another flag in the AI frontier by announcing a series of large language models (LLMs) known as AMD OLMo. As with other LLMs like OpenAI’s GPT 4o, AMD’s in-house-trained LLM has reasoning ...