BGI-Research and Zhejiang Lab have released Genos-Mutation and Genos-Reg, two new models that predict how DNA mutations and cellular states alter gene expression. Both build on Genos, a genomic foundation model released in 2025 and published in GigaScience. Genos was trained on 636 diverse human genomes spanning global populations and processes DNA sequences of up to one million base pairs at single-nucleotide resolution.
Over two decades of sequencing advances have brought scientists close to reading the full three billion letters of the human genome. Yet reading the sequence is only half the challenge. When a single DNA letter changes, does gene activity shift? Why might a variant outside a protein-coding gene still contribute to disease? Why does the same stretch of DNA drive different programs in immune cells versus neurons? Answering these questions in the laboratory is costly and time-consuming, often requiring weeks of sample preparation, sequencing, and analysis for each candidate variant, and scaling poorly when hundreds of variants or multiple cell types must be tested. Genos-Mutation and Genos-Reg approach these questions from two complementary directions: mutation effects and epigenetic regulation.
Genos-Mutation compares predicted RNA expression tracks between reference and variant sequences, highlighting potential expression changes and alternative splicing signals introduced by a specific mutation.
Genos-Mutation predicts how a specific DNA change may alter gene activity in a cell. The model takes a sequence window around a variant of interest and generates predicted RNA expression tracks for both the original and mutated versions, then compares the two to show where and how strongly expression may shift, including potential effects on splicing. It operates in two modes. A single-cell-type mode provides high-resolution analysis within one cellular background, drawing on hundreds of personal genomes with matched expression data to capture how real human genetic diversity shapes transcriptional output. A multi-cell-type mode predicts the same variant's effects simultaneously across multiple tissues and cell types, revealing cases where a mutation alters expression in immune cells but remains silent in neurons, or takes effect only under specific regulatory conditions. Traditional variant interpretation asks whether a mutation is likely harmful; Genos-Mutation goes further, predicting how it may change expression, in which genomic region, and in which cell types. This makes it particularly valuable for noncoding variants, where conventional criteria offer limited guidance and experimental RNA data may be unavailable.
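The reference-versus-variant comparison described above can be illustrated with a minimal Python sketch. It is not the Genos-Mutation implementation or its API: `toy_predict_track` is a hypothetical stand-in that scores local GC content, and every function name here is an assumption made for illustration. Only the workflow mirrors the text: build the mutated sequence, predict a track for each version, and compare the two, optionally across several cell types.

```python
from typing import Callable, Dict, List

Track = List[float]
Predictor = Callable[[str], Track]


def apply_snv(ref_seq: str, pos: int, alt_base: str) -> str:
    """Return a copy of ref_seq with the base at 0-based index pos replaced."""
    if not 0 <= pos < len(ref_seq):
        raise ValueError("variant position lies outside the sequence window")
    if len(alt_base) != 1:
        raise ValueError("this sketch handles single-base substitutions only")
    return ref_seq[:pos] + alt_base + ref_seq[pos + 1:]


def toy_predict_track(seq: str) -> Track:
    """Hypothetical stand-in for a model call (NOT the real model).

    Returns one value per base: the GC fraction in a window of up to
    five bases centred on that position.
    """
    half = 2
    track = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        track.append(sum(base in "GC" for base in window) / len(window))
    return track


def compare_tracks(ref_track: Track, alt_track: Track) -> Dict[str, object]:
    """Summarise where and how strongly the variant track departs from the reference."""
    if len(ref_track) != len(alt_track):
        raise ValueError("tracks must cover the same window")
    delta = [alt - ref for ref, alt in zip(ref_track, alt_track)]
    peak = max(range(len(delta)), key=lambda i: abs(delta[i]))
    return {
        "delta": delta,                                  # per-base shift
        "peak_index": peak,                              # position of largest shift
        "peak_shift": delta[peak],                       # signed size of that shift
        "total_abs_shift": sum(abs(d) for d in delta),   # overall magnitude
    }


def score_variant(ref_seq: str, pos: int, alt_base: str,
                  predict: Predictor = toy_predict_track) -> Dict[str, object]:
    """Predict tracks for the reference and mutated windows, then compare them."""
    alt_seq = apply_snv(ref_seq, pos, alt_base)
    return compare_tracks(predict(ref_seq), predict(alt_seq))


def score_across_cell_types(ref_seq: str, pos: int, alt_base: str,
                            predictors: Dict[str, Predictor]) -> Dict[str, float]:
    """Multi-cell-type analogue: one overall shift per (hypothetical) cell-type predictor."""
    return {
        name: score_variant(ref_seq, pos, alt_base, predict)["total_abs_shift"]
        for name, predict in predictors.items()
    }
```

Calling `score_variant("ATATATATAT", 5, "G")` returns a nonzero shift only at positions whose window overlaps the substituted base. Passing a dictionary of per-cell-type predictors to `score_across_cell_types` gives one score per cell type, which is the shape of output the multi-cell-type mode is described as producing.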
Genos-Reg uses DNA sequence and ATAC-seq chromatin accessibility signals as dual inputs to predict how different cellular states produce distinct gene expression patterns from the same underlying genetic code.
Genos-Reg addresses a different question: how the same DNA produces different outcomes in different cells. Nearly all human cells carry an identical genome, yet a neuron and a liver cell perform entirely different functions. The difference lies in which regions of DNA are physically accessible at any given moment, a property known as chromatin accessibility. Regions that are "open" can be read by the cell's machinery; regions that are "closed" remain silent. Genos-Reg takes both DNA sequence and a chromatin accessibility profile (ATAC-seq data) as input and predicts the gene expression pattern for a given cellular state, rather than inferring it from sequence alone. In a demonstration using natural killer (NK) cells, the model predicted distinct expression patterns across different NK cell states based solely on differences in their chromatin profiles, and these predictions aligned with independent experimental measurements. This ability to simulate how changes in chromatin openness reshape gene activity represents an early step toward virtual cell modeling. Researchers can alter the chromatin input and observe how predicted expression shifts, running virtual perturbation experiments that would be costly and slow to perform in the laboratory.
Predicted RNA expression tracks for different natural killer cell states, generated from the same DNA sequence but distinct chromatin accessibility profiles, show agreement with independent single-cell RNA sequencing measurements.
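The virtual perturbation idea, in which the DNA sequence is held fixed while the chromatin input is altered, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Genos-Reg interface: `toy_predict_expression` is a hypothetical dual-input stand-in that simply gates a per-base sequence score by an accessibility value between 0 (closed) and 1 (open), and all names are invented for the example.

```python
from typing import Callable, Dict, List

Track = List[float]
DualPredictor = Callable[[str, Track], Track]


def toy_predict_expression(seq: str, accessibility: Track) -> Track:
    """Hypothetical dual-input stand-in (NOT the real model).

    Each base gets a sequence score (1.0 for G/C, 0.5 otherwise) that is
    multiplied by the accessibility at that position.
    """
    if len(seq) != len(accessibility):
        raise ValueError("sequence and accessibility must have equal length")
    return [
        (1.0 if base in "GC" else 0.5) * acc
        for base, acc in zip(seq, accessibility)
    ]


def perturb_accessibility(accessibility: Track, start: int, end: int,
                          new_value: float) -> Track:
    """Return a copy with positions [start, end) set to new_value."""
    if not 0 <= start <= end <= len(accessibility):
        raise ValueError("perturbed region lies outside the profile")
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("accessibility is expressed here on a 0-to-1 scale")
    return accessibility[:start] + [new_value] * (end - start) + accessibility[end:]


def virtual_perturbation(seq: str, accessibility: Track, start: int, end: int,
                         new_value: float,
                         predict: DualPredictor = toy_predict_expression
                         ) -> Dict[str, Track]:
    """Predict expression before and after editing the chromatin input only."""
    baseline = predict(seq, accessibility)
    perturbed = predict(seq, perturb_accessibility(accessibility, start, end, new_value))
    delta = [after - before for before, after in zip(baseline, perturbed)]
    return {"baseline": baseline, "perturbed": perturbed, "delta": delta}
```

For instance, `virtual_perturbation("ACGTACGT", [1.0] * 8, 2, 4, 0.0)` closes two positions and leaves the sequence untouched, so any change in the predicted track is attributable to the chromatin input alone. Comparing two cell states works the same way: run the predictor once per accessibility profile and compare the resulting tracks.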
Rather than replacing laboratory experiments, these models enable large-scale computational screening that narrows the field for targeted validation. In genetic disease research, Genos-Mutation can help interpret variants that resist classification by conventional criteria, turning static DNA changes into interpretable expression predictions. In immunology and cell biology, Genos-Reg can simulate expression differences across cellular states, bridging sequence data and functional readouts. Both models are available through the BGI DCS Cloud platform. Extending this framework to additional tissue types, developmental stages, and data modalities, including proteomic, metabolomic, and spatial transcriptomic measurements, could further illuminate how genetic variation and cellular context jointly shape biological function, advancing genomic AI from reading the genome's letters toward interpreting their meaning.