#multimodal AI

All articles tagged with "multimodal AI"

AI & Robotics

Alibaba pushes agentic AI with Qwen3.7-Plus launch

Key Takeaways

Alibaba says Qwen3.7-Plus combines visual perception, coding, tool use, and GUI control.
A showcase app build reportedly ran for more than 11 hours and generated over 10,000 lines of code.
The model appears strongest on interface-operation benchmarks while lagging on pure logic tests.

DT Editorial Team·Jun 6, 2026·via the-decoder.com

AI & Robotics

Google’s May AI Push Centers on Gemini 3.5 and Omni

Key Takeaways

Google used its June 5 recap to package May as the start of an "agentic Gemini era."
Gemini 3.5 was framed around agents and coding, while Gemini Omni was pitched as a multimodal creation system.
The company also linked AI to shopping, health, hardware, and life-sciences research.

DT Editorial Team·Jun 5, 2026·via blog.google

AI & Robotics

Gemma 4 12B pushes multimodal AI onto 16GB laptops

Key Takeaways

Gemma 4 12B processes text, images, and audio natively.
The model is designed to run locally on laptops with 16 GB of RAM.
Google says it nearly matches the larger 26B model across benchmarks.

DT Editorial Team·Jun 4, 2026·via the-decoder.com

AI & Robotics

ByteDance Study Finds Long-Document AI Learns Better From Questions Than From Transcribing Text

Key Takeaways

Researchers compared OCR-style training with question-answer supervision for long documents.
The study reports that pure text-recognition training worsened performance.
The resulting MMProLong model, built on Qwen2.5-VL, reportedly beat larger competitors on long-document tasks.

DT Editorial Team·May 25, 2026·via the-decoder.com

AI & Robotics

Google pushes AI deeper into Search with new Gemini default, agents, and multimodal input

Key Takeaways

Google says AI Mode has surpassed one billion monthly users.
Gemini 3.5 Flash is becoming the default model in AI Mode globally.
The company is rolling out a redesigned Search box with conversational, multimodal, and agent-style capabilities.

DT Editorial Team·May 22, 2026·via blog.google

AI & Robotics

Google Launches Gemini 3.5 With a Stronger Push Into Agentic AI

DT Editorial Team·May 22, 2026·via blog.google

AI & Robotics

Google Uses I/O 2026 to Push Faster Gemini Models and a New Multimodal Creative System

DT Editorial Team·May 21, 2026·via blog.google

AI & Robotics

Google says AI Mode has topped a billion monthly users and is changing how people search

DT Editorial Team·May 20, 2026·via blog.google

AI & Robotics

New Benchmark Shows Why Better-Looking AI Video Still Fails at Basic World Logic

DT Editorial Team·May 16, 2026·via the-decoder.com

AI & Robotics

Nvidia’s Nemotron 3 Nano Omni shows how open multimodal models are now built

DT Editorial Team·Apr 30, 2026·via the-decoder.com

AI & Robotics

Why Encoders Matter More as AI Becomes Multimodal

DT Editorial Team·Apr 28, 2026·via artificialintelligence-news.com

AI & Robotics

Google Positions Gemini as a Household Organizer in New Push Beyond Search and Chat

DT Editorial Team·Apr 26, 2026·via blog.google

News

Microsoft expands its in-house AI lineup with new transcription, voice and image models

DT Editorial Team·Apr 2, 2026·via techcrunch.com