
ByteDance Study Finds Long-Document AI Learns Better From Questions Than From Transcribing Text
Researchers from ByteDance Seed and HKUST report that question-answer training improved long-document performance in multimodal models, while pure text-recognition training made results worse.
- Researchers compared OCR-style training with question-answer supervision for long documents.
- The study reports that pure text-recognition training worsened performance.







