From One Mysterious Molecule to 260,000

In 2018, scientists studying breast cancer stumbled upon something they could not explain. A small RNA molecule they designated T3p was present in tumor tissue but completely absent from healthy cells. It did not match any known gene. It did not correspond to any recognized class of non-coding RNA. It was, in the language of molecular biology, an orphan: a molecule without a home in the existing taxonomy of the human genome. That single puzzling discovery launched a six-year investigation that has now uncovered approximately 260,000 previously unknown cancer-specific small RNAs across 32 types of human cancer.

The research, conducted by Jeffrey Wang, Hani Goodarzi, and their colleagues at the Arc Institute, represents one of the most comprehensive surveys of cancer-specific non-coding RNA ever undertaken. By mining data from The Cancer Genome Atlas — a landmark database containing genomic information from thousands of tumors — the team identified a vast and previously invisible landscape of small RNA molecules that appear exclusively in cancer cells.

Digital Molecular Barcodes

What makes these orphan non-coding RNAs, or oncRNAs, particularly striking is their specificity. Each of the 32 cancer types examined displayed its own distinct pattern of oncRNA expression, creating what the researchers describe as digital molecular barcodes. These barcodes capture cancer identity at multiple levels — distinguishing not only between different tumor types such as breast versus lung cancer, but also between subtypes within a single cancer and even between different cellular states within a single tumor.

To test whether these molecular signatures could support practical diagnosis, the team trained machine learning classifiers on oncRNA expression patterns. The models classified cancer type from tumor tissue samples with 90.9 percent accuracy. When validated against a separate group of 938 tumors that the models had never seen, accuracy held at 82.1 percent, a level of performance that suggests real clinical potential.
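The train-then-validate idea can be illustrated with a toy sketch. It is not the team's actual model, which the article does not describe. It assumes each sample is reduced to a binary barcode (1 if a given oncRNA is detected, 0 if not) and swaps in a simple nearest-centroid classifier. The data are synthetic, and every name here (simulate, fit_centroids, the three cancer types, the 15 percent noise rate) is hypothetical.

```python
import random
from collections import defaultdict

def simulate(rng, signatures, n_per_type, flip_prob):
    """Make (barcode, label) pairs: each barcode is a type's signature with random bit flips."""
    samples = []
    for label, sig in signatures.items():
        for _ in range(n_per_type):
            barcode = [bit ^ (rng.random() < flip_prob) for bit in sig]
            samples.append((barcode, label))
    return samples

def fit_centroids(train):
    """For each cancer type, compute how often each oncRNA is detected in training samples."""
    sums = {}
    counts = defaultdict(int)
    for barcode, label in train:
        if label not in sums:
            sums[label] = [0] * len(barcode)
        for i, bit in enumerate(barcode):
            sums[label][i] += bit
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def predict(centroids, barcode):
    """Assign the cancer type whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((b - c) ** 2 for b, c in zip(barcode, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, samples):
    """Fraction of samples whose predicted type matches the true label."""
    hits = sum(predict(centroids, barcode) == label for barcode, label in samples)
    return hits / len(samples)

# Hypothetical setup: 3 cancer types, 30 oncRNAs, each type marked by its own block of 10.
cancer_types = ["breast", "lung", "colon"]
block = 10
signatures = {
    t: [1 if k * block <= i < (k + 1) * block else 0 for i in range(block * len(cancer_types))]
    for k, t in enumerate(cancer_types)
}

rng = random.Random(0)
train = simulate(rng, signatures, n_per_type=30, flip_prob=0.15)     # used to fit the model
held_out = simulate(rng, signatures, n_per_type=20, flip_prob=0.15)  # never seen during fitting

centroids = fit_centroids(train)
train_acc = accuracy(centroids, train)
held_out_acc = accuracy(centroids, held_out)
print(f"training accuracy: {train_acc:.3f}")
print(f"held-out accuracy: {held_out_acc:.3f}")
```

The point of scoring the held-out set separately is the one the article makes: accuracy on samples the model has never seen is the more honest estimate of how it would behave on new patients.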

The ability to classify cancer type from RNA signatures alone could have profound implications for patients with cancers of unknown primary origin. This clinical scenario affects roughly three to five percent of all cancer patients and carries a particularly poor prognosis, because treatment decisions depend heavily on knowing where the cancer originated.