#large-language-models

All articles tagged with "large-language-models"

AI & Robotics

Cursor says planner-worker agent swarms can cut coding costs

Key Takeaways

Cursor says its upgraded swarm rebuilt SQLite in Rust from documentation only and passed the full test suite.
The system separates frontier-model planners from cheaper worker models that execute coding tasks.
Cursor argues the main advantage is context separation and coordination, not just parallelism.
The company built new coordination mechanisms after earlier swarms ran into merge conflicts and design duplication.

DT Editorial Team·Jul 26, 2026·via the-decoder.com

AI Tends to Develop New Stereotypes to Base Hiring Decisions On, Study Says

Culture

Study warns AI can invent new hiring biases instead of just inheriting old ones

Key Takeaways

Researchers used fictional demographic groups to test bias formation.
Language models showed stronger bias rates than human participants.
The study links the effect to reward-maximizing decision behavior.

DT Editorial Team·Jul 21, 2026·via gizmodo.com

AI Creativity Works Best in Moderation, Study Suggests

Key Takeaways

Research highlighted by New Scientist suggests creativity improves most when AI is used in moderation.
The report argues large language models can provide useful perspective shifts without replacing human judgment.
Over-reliance on AI may reduce originality, ownership, and motivation in creative work.

DT Editorial Team·Jul 18, 2026·via newscientist.com

Free-Text Answers and LLMs Uncover Hidden Reasons Behind Human Choices

Key Takeaways

Researchers combined free-text explanations with LLMs to uncover decision reasons.
Participants explained their choices in a gambling task; LLMs identified reasons from text.
Decision reasons shift systematically with the structure of the decision problem.
The approach validates self-reported reasons with mathematical modeling of choices.

DT Editorial Team·Jul 3, 2026·via phys.org

Researchers say AI must finish tasks, not just answer prompts

Key Takeaways

A survey paper says AI systems must complete full tasks, not just generate answers, to become reliable coworkers.
Researchers describe four weaknesses in first-generation agents, including fragmented perception and lack of lasting state.
The paper emphasizes persistent work environments and reusable skills as key to dependable task execution.

DT Editorial Team·Jun 28, 2026·via the-decoder.com

Meta employees warn AI moderation rollout is too fast

AI & Robotics

Meta pushes AI moderation despite internal warnings

Key Takeaways

Meta reportedly shifted about half of moderation requests to LLMs in 2025.
The company aims to push AI handling above 90 percent for some content types by the end of 2026.
Employees warn the systems still suppress harmless content and lack sufficient oversight.
Meta is also switching from Google Gemini to its own Muse Spark model for moderation tasks.

DT Editorial Team·Jun 25, 2026·via the-decoder.com

Health

Frontier AI models beat specialized clinical tools in medical tests

Key Takeaways

Frontier general-purpose models beat specialized clinical AI tools in all three evaluations.
The study used MedQA, HealthBench, and a benchmark of real clinical queries.
Clinician reviewers produced 1,800 blinded annotations in the real-world stage.
The authors call for independent evaluation before clinical deployment.

DT Editorial Team·Jun 13, 2026·via nature.com

OpenAI refreshes GPT-5.5 Instant and starts model retirements

Key Takeaways

GPT-5.5 Instant is being updated for more natural and readable replies.
Canvas is being removed from GPT-5.5 Instant and GPT-5.5 Thinking.
GPT-4.5 leaves ChatGPT on June 27, 2026, and o3 on August 26, 2026.

DT Editorial Team·May 29, 2026·via the-decoder.com

GPT-5.5 Instant: smarter, clearer, and more personalized

AI & Robotics

OpenAI makes GPT-5.5 Instant the new default for ChatGPT

Key Takeaways

OpenAI says GPT-5.5 Instant is now the default ChatGPT model.
The company claims substantially lower hallucination rates versus GPT-5.3 Instant.
The update also emphasizes context handling, personalization controls, and clearer answers.

DT Editorial Team·May 5, 2026·via openai.com

Study: AI models that consider users' feelings are more likely to make errors

News

Warmer AI Can Be Less Reliable, Study Finds

Key Takeaways

A Nature paper found that warmth-tuned language models had higher error rates.
Researchers increased empathy and validating language in several open models and GPT-4o.
The warmer variants were more likely to reinforce incorrect beliefs, especially when users said they felt sad.
The findings raise product-design questions about balancing empathy with factual correction.

DT Editorial Team·May 3, 2026·via arstechnica.com

OpenAI’s anti-goblin rule shows how strange AI behavior is becoming a real product problem

Key Takeaways

OpenAI’s Codex instructions explicitly ban off-topic mentions of goblins and similar creatures.
The rule appears tied to recurring user reports of odd model behavior in agentic setups.
The episode highlights how AI products now rely on detailed behavioral guardrails, not just better base models.

DT Editorial Team·Apr 29, 2026·via wired.com

DeepSeek’s V4 Signals China’s Open-Model Push Is Still Accelerating

Key Takeaways

DeepSeek released preview versions of V4 with longer-context processing.
The company is offering Pro and Flash variants aimed at different workloads.
The launch reinforces China’s role in the global market for open-weight frontier AI.

DT Editorial Team·Apr 25, 2026·via technologyreview.com

Open Models Challenge the Aura Around Anthropic’s Mythos Cybersecurity Claims

Key Takeaways

Two studies suggest smaller and open models can reproduce much of Anthropic’s public Mythos cyber analysis.
All eight models in one replication effort reportedly caught the highlighted FreeBSD memory bug.
The remaining gap appears to be in more creative exploit construction, not basic bug discovery.

DT Editorial Team·Apr 19, 2026·via the-decoder.com

AI Slop Is Making the Internet Fake-Happy

Culture

Study Finds AI-Written Websites Are More Cheerful and Less Diverse in Tone

Key Takeaways

A new preprint estimates roughly 35 percent of new websites are AI-generated or AI-assisted.
The study found AI-written sites had 107 percent higher average positive sentiment scores.
Researchers also found AI-heavy sites were more semantically similar, suggesting less ideological diversity.

DT Editorial Team·Apr 15, 2026·via wired.com

Why AI Doom Narratives Keep Finding an Audience

Key Takeaways

A widely cited GPT-4 CAPTCHA anecdote was more heavily researcher-directed than popular retellings suggest.
The distinction between prompted deceptive behavior and autonomous intent is central to sound AI risk analysis.
Fear-driven AI narratives spread because fluent language encourages people to infer agency and motive.

DT Editorial Team·Apr 13, 2026·via quantamagazine.org

AI is starting to flatten classroom discussion, students and researchers warn

Key Takeaways

Students interviewed by CNN described classroom discussions becoming more uniform as peers rely on AI tools.
One recent paper in Trends in Cognitive Sciences argued that LLM use can dull reasoning and expression.
Researchers warned that declining cognitive diversity could have broader social consequences.
The issue extends beyond plagiarism to the role AI may be playing in replacing independent interpretation.

DT Editorial Team·Apr 8, 2026·via futurism.com

What People Tell Chatbots Is Becoming a Privacy Problem AI Has Not Solved

Key Takeaways

Researchers warn users may be oversharing sensitive data with chatbots.
Privacy experts say users cannot fully control where that information goes.
The concern grows as large language models become mainstream tools for everyday personal use.

DT Editorial Team·Mar 29, 2026·via zdnet.com

The Hardest Question About AI-Fueled Delusions: When Does Helpful Become Harmful?

Key Takeaways

Millions of people use AI chatbots for mental health support, including those experiencing delusions or psychosis.
The concern is that AI optimized for engaging conversation may inadvertently reinforce distorted thinking.
Clinical best practice with human therapists favors non-confrontational engagement, but AI systems lack the judgment to calibrate this safely.
Some AI mental health tools include protocols for sensitive content, but evidence of their effectiveness is limited.
The core problem is a massive evidence gap: AI tools are deployed at scale before the research required to validate their safety exists.

DT Editorial Team·Mar 24, 2026·via technologyreview.com

OpenAI Launches GPT-5.4, Its Most Capable AI Model Yet

Key Takeaways

OpenAI released GPT-5.4 with a new thinking mode for extended structured reasoning.
The model shows significant improvements in math, science, coding, and complex problem-solving.
A companion system card details safety considerations around controllable chains of thought.

DT Editorial Team·Mar 8, 2026·3 min read·via openai.com

Alibaba's open Qwen 3.5 takes aim at GPT-5 mini and Claude Sonnet 4.5 at a fraction of the cost

AI & Robotics

Alibaba Launches Qwen 3.5 Open Models to Challenge GPT-5 Mini and Claude Sonnet 4.5

Key Takeaways

Alibaba released four Qwen 3.5 models including Flash, 35B-A3B, 122B-A10B, and 27B variants.
The models use mixture-of-experts architecture to deliver high performance at lower compute cost.
Alibaba claims performance competitive with GPT-5 mini and Claude Sonnet 4.5.
All models are released as open weights, challenging the closed-source model paradigm.

DT Editorial Team·Feb 26, 2026·4 min read·via the-decoder.com
