Nvidia Nemotron 3 Nano Omni reveals a new multimodal AI playbook

An open release with unusually revealing details

Nvidia's new Nemotron 3 Nano Omni is notable not only because it is a multimodal model, but also because the company has disclosed in unusual detail how such a system is built. According to the source report, the model handles text, images, video, and audio, is designed for agentic applications, and is licensed for commercial use. Alongside the model weights, Nvidia is releasing parts of the training data and pipelines.

That combination lifts this launch above an ordinary model release. It exposes the increasingly hybrid and synthetic data flows behind modern multimodal AI systems, where training often depends not on a single clean corpus but on layered outputs from many other models.

What the model is built for

Nemotron 3 Nano Omni is described as a 30-billion-parameter open-source multimodal model that combines a Mamba-Transformer hybrid with mixture-of-experts routing. About three billion parameters are activated per query. The model uses Nvidia's C-RADIOv4-H vision encoder and Parakeet-TDT audio encoder, and its context window extends to 256,000 tokens. English is the only officially supported language.
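To make the "three billion of 30 billion parameters" figure concrete, here is a minimal sketch of top-k mixture-of-experts routing in plain Python. It is a toy illustration of the general technique, not Nvidia's implementation: the eight scalar "experts", the gate scores, and the choice of k=2 are all assumptions made for the example. Only the 3-billion and 30-billion figures come from the report.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    routed = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]

# Hypothetical toy setup: 8 experts, each a simple scalar function.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # assumed router output

output, active = moe_forward(2.0, experts, gate_scores, k=2)

# Share of parameters active per query, using the figures from the report.
active_fraction = 3e9 / 30e9
print(f"Active experts: {active}, active parameter share: {active_fraction:.0%}")
```

The point of the design is that the six unselected experts are never evaluated, which is how a model can carry many parameters while paying the compute cost of only a small fraction per query.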

According to Nvidia, the system is aimed primarily at agentic use cases. The source report lists document processing, computer-use agents, video and audio analysis, and voice interaction as its intended applications. This framing matters because it places the model in a fast-growing category of systems designed not just to answer prompts, but to operate in long-context, action-oriented workflows across interfaces and media types.

On several benchmarks cited in the source report, the model outperforms its predecessor and competes closely with Alibaba's Qwen3-Omni. The most striking figure is on OSWorld, a benchmark for GUI agents, where the report says accuracy rose from 11.1 points for the previous version to 47.4. Nvidia also claims throughput nine times higher than Qwen3-Omni at the same interactivity level.
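As a quick check on the size of the OSWorld jump, the short calculation below works out the absolute and relative gain from the two reported scores. The variable names are illustrative; only the numbers 11.1 and 47.4 come from the report.

```python
previous_score = 11.1  # OSWorld score reported for the previous version
current_score = 47.4   # OSWorld score reported for Nemotron 3 Nano Omni

# Absolute gain in benchmark points and relative improvement factor.
absolute_gain = round(current_score - previous_score, 1)
relative_factor = round(current_score / previous_score, 2)

print(f"Absolute gain: {absolute_gain} points")      # Absolute gain: 36.3 points
print(f"Relative improvement: {relative_factor}x")   # Relative improvement: 4.27x
```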

The bigger story is the training recipe

The most revealing detail in this release may be the training pipeline. According to the source report, Nvidia processed about 717 billion tokens across seven training stages, with the context window expanding at each stage. A significant share of the synthetic data came from other major models.

The article says image captions, question-answer pairs, and reasoning traces were generated using models such as Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen2.5-VL-72B-Instruct, OpenAI's gpt-oss-120b, Kimi-K2.5, GLM-4.1V-9B-Thinking, and DeepSeek-OCR. GPT-4o and Gemini 3 Flash Preview were used for filtering.
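The generate-then-filter pattern described above can be sketched in a few lines of Python. This is a hedged illustration of the general idea only: the report does not describe Nvidia's actual code, so the `Sample` class, `build_synthetic_set`, the stub generators, the stub judge, and the 0.8 threshold are all hypothetical stand-ins. In a real pipeline the stubs would be calls to generator and judge models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass
class Sample:
    source_id: str   # which input (e.g. an image) the text describes
    generator: str   # which generator produced the text
    text: str
    score: float = 0.0

def build_synthetic_set(
    source_ids: Iterable[str],
    generators: Dict[str, Callable[[str], str]],
    judge: Callable[[str], float],
    threshold: float,
) -> List[Sample]:
    """Generate candidates with every generator, keep those the judge accepts."""
    kept = []
    for source_id in source_ids:
        for name, generate in generators.items():
            sample = Sample(source_id, name, generate(source_id))
            sample.score = judge(sample.text)
            if sample.score >= threshold:
                kept.append(sample)
    return kept

# Hypothetical stubs standing in for model calls.
def stub_detailed_captioner(source_id: str) -> str:
    return f"A detailed caption describing {source_id}."

def stub_terse_captioner(source_id: str) -> str:
    return "Image."

def stub_judge(text: str) -> float:
    # Toy quality score: longer captions score higher, capped at 1.0.
    return min(len(text.split()) / 5.0, 1.0)

generators = {
    "captioner-a": stub_detailed_captioner,
    "captioner-b": stub_terse_captioner,
}
dataset = build_synthetic_set(
    ["img-001", "img-002"], generators, stub_judge, threshold=0.8
)
for sample in dataset:
    print(sample.source_id, sample.generator, sample.score)
```

Keeping the generator name on every sample is a deliberate choice in this sketch: it preserves provenance, so a dataset built from many models can later be audited or rebalanced by source.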

This matters because it makes explicit a reality that is often discussed but only partly documented: frontier-capable multimodal systems are increasingly trained on the outputs of other leading models. Synthetic data is no longer a minor supplement. It has become a central ingredient of competitive model development.

Why this matters for the AI field

The implications extend beyond Nvidia. If frontier-capable multimodal systems are trained through layered interactions with other frontier models, AI progress becomes more recursive. Companies are not only designing original architectures; they are also curating, filtering, and distilling capabilities from an ecosystem of existing systems.

This changes the competitive landscape in several ways:

Open releases that disclose data and pipeline decisions along with weights become more valuable
Model development increasingly depends on access to other powerful systems for synthesis and filtering
Performance gains may depend on data orchestration as well as on architecture changes
Commercially usable open models can accelerate downstream product development in agents and multimodal tooling

In that sense, Nemotron 3 Nano Omni is not only a product but also a disclosure event. It shows how the field actually operates when a company is willing to publish more than benchmark charts.

Agentic AI is driving design choices

The model's architecture and benchmark emphasis reflect the market's current priority on agents. The long context window, multimodal inputs, and strong OSWorld gains point toward a system that understands interfaces, documents, and media within a continuous workflow.

This matters because agentic AI has different requirements from a chat-only model. It needs good grounding across visual and textual information, greater robustness over long tasks, and higher efficiency at interactive speeds. That is why Nvidia's claim of improved throughput at comparable interactivity levels is not just a lab metric; it speaks directly to a deployment constraint.

The release also shows that open models are no longer confined to narrow or lightweight multimodal roles. A commercially usable system with weights, partial training data, and pipeline visibility is an important building block for companies that want to build multimodal agents without relying solely on closed APIs.

A clear view of the next phase of model building

Nemotron 3 Nano Omni matters because it brings several shifts in the field together in a single release: open multimodality, agent-focused design, heavy synthetic data usage, and greater transparency about the training stack. The benchmark results will draw attention, but the deeper significance lies in the acknowledgment that leading AI systems are now built through extensive interaction with other leading systems.

This does not diminish Nvidia's work. Rather, it redefines where the hard problems now lie. Building a capable multimodal model now requires architecture, compute, evaluation, filtering, and synthetic data strategy to work together. A model is the product of an ecosystem, not merely of a single training run.

For developers and researchers, the release offers both a useful tool and a more honest picture of industry practice. For the broader AI field, it reinforces a simple point: the future of open multimodal AI will be shaped as much by pipeline design and data provenance as by parameter counts.

This article is based on reporting by The Decoder.

Originally published on the-decoder.com