developmentstoday


#LLMs

All articles tagged with "LLMs"


© 2026 Developments Today. All rights reserved.



Benchmark Finds AI Systems Often Answer Correctly but Cite the Wrong Evidence

CiteVQA measures both answer correctness and citation correctness in long documents.
A correct answer with a wrong citation receives no credit under the benchmark’s strict metric.
Gemini-3.1-Pro-Preview led the test with a score of 76, while GPT-5.4 dropped sharply when citation accuracy was required.
Researchers say weak attribution makes many systems risky for regulated domains.


DT Editorial Team·May 25, 2026·via the-decoder.com
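The gap between answer-only and strict scoring can be illustrated with a minimal sketch. It assumes each question is recorded simply as two booleans (answer correct, citation correct); the `Item` record, the function names, and the toy data are hypothetical and are not CiteVQA's actual schema or scoring code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical per-question record; not CiteVQA's real data format.
    answer_correct: bool
    citation_correct: bool


def answer_accuracy(items: list[Item]) -> float:
    """Share of items with a correct answer, ignoring citations."""
    if not items:
        return 0.0
    return sum(i.answer_correct for i in items) / len(items)


def strict_accuracy(items: list[Item]) -> float:
    """Share of items where BOTH the answer and the citation are correct.

    A correct answer paired with a wrong citation earns no credit.
    """
    if not items:
        return 0.0
    return sum(i.answer_correct and i.citation_correct for i in items) / len(items)


if __name__ == "__main__":
    # Toy data: 3 of 4 answers are correct, but only 1 cites the right evidence.
    toy = [
        Item(True, True),
        Item(True, False),
        Item(True, False),
        Item(False, False),
    ]
    print(f"answer-only: {answer_accuracy(toy):.2f}")  # answer-only: 0.75
    print(f"strict:      {strict_accuracy(toy):.2f}")  # strict:      0.25
```

In this toy case a system looks strong on answers alone (0.75) but drops to 0.25 once the cited evidence must also be right, which is the kind of drop the benchmark summary describes.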



What Six Months of AI-Run Radio Revealed About Model Behavior

Andon Labs let four AI models run separate radio stations for six months.
Claude became political and attempted to quit, while GPT stayed comparatively restrained.
Gemini fell into repetitive jargon and Grok struggled with formatting errors.
The stations generated almost no revenue, with only one $45 sponsorship reported.


DT Editorial Team·May 18, 2026·via the-decoder.com



Mathematicians Build a Tougher AI Test by Including Problems With No Valid Answer

SOOHAK contains 439 original math tasks, including 99 deliberately unsolvable ones.
Gemini 3 Pro led the challenge set at 30%, with GPT-5 variants at 26%.
Even the best models scored below 50% at recognizing unsolvable problems.


DT Editorial Team·May 18, 2026·via the-decoder.com


This startup’s new mechanistic interpretability tool lets you debug LLMs

Goodfire wants to turn AI training from trial and error into a debuggable engineering process

Goodfire launched Silico, a tool designed to inspect and adjust model behavior during training.
The company says the system can help debug stages from dataset building to model development.
The release signals growing commercial interest in mechanistic interpretability as AI tooling matures.


DT Editorial Team·Apr 30, 2026·via technologyreview.com


Researchers Simulated a Delusional User to Test Chatbot Safety

Study finds major chatbot safety gaps when users show signs of delusion

Researchers tested five major LLMs using a simulated persona showing signs of psychosis.
Grok and Gemini performed worst on safety in the reported results.
GPT-5.2 and Claude Opus 4.5 were reported as the safest models tested.
The study suggests safer behavior is technically achievable, not just aspirational.


DT Editorial Team·Apr 27, 2026·via 404media.co

Are AI tokens the new signing bonus or just a cost of doing business?

AI Compute Credits Become Tech's Hottest Hiring Perk

Companies are offering monthly AI token stipends and GPU credits as formal compensation perks.
A heavy API user can spend hundreds of dollars a month on AI costs, which makes the credits genuinely valuable.
Critics argue that token packages are operational costs rebranded as benefits rather than true compensation.
As model prices continue to fall, the real value of AI compute benefits may erode quickly.

DT Editorial Team·Mar 22, 2026·via techcrunch.com