On October 23, 2025, at the 20th Annual Meeting of the International Conference on Genomics (ICG-20) in Hangzhou, BGI-Research and Zhejiang Lab announced the release of Genos, an open-source, human-centric genomic foundation model. Genos learns directly from high-quality human genomes representing global ancestries and models sequence at single-nucleotide resolution over ultra-long contexts. Technical details of the model have been published in GigaScience.
October 23, 2025: Genos was unveiled at the 20th Annual Meeting of the International Conference on Genomics (ICG-20) in Hangzhou.
The study “Genos: A Human-Centric Genomic Foundation Model” was published in GigaScience.
The model comes in two sizes, with 1.2B and 10B parameters, to balance accuracy, latency, and cost in research and clinical prototyping. It is trained on 636 high-quality, telomere-to-telomere (T2T)-level de novo and reference assemblies drawn from anonymized international public resources such as the Human Pangenome Reference Consortium (HPRC), the Human Genome Structural Variation Consortium (HGSVC) and the Centre d'Étude du Polymorphisme Humain (CEPH). Training follows a staged curriculum that extends inputs up to ~1 Mb; cumulative training tokens are ~1.6T for the 1.2B model and ~2.2T for the 10B model.
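To make single-nucleotide tokenization and a staged context curriculum concrete, here is a minimal pure-Python sketch. The five-symbol vocabulary and the stage lengths (8K, 32K, 128K and 1M bases) are hypothetical: the text states only that inputs grow in stages up to ~1 Mb. A real pipeline would also shuffle, batch and sample across all 636 assemblies.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical single-base vocabulary: one token per nucleotide, N for anything else.
VOCAB: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

# Hypothetical stage lengths in bases; the text only says the curriculum reaches ~1 Mb.
CURRICULUM_STAGES: List[int] = [8_192, 32_768, 131_072, 1_048_576]


def tokenize(seq: str) -> List[int]:
    """One token per base, so a 1 Mb window becomes ~1M tokens."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def windows(tokens: List[int], length: int) -> Iterator[List[int]]:
    """Non-overlapping windows of `length` tokens; a short tail is dropped."""
    for start in range(0, len(tokens) - length + 1, length):
        yield tokens[start:start + length]


def curriculum(seq: str, stages: List[int] = CURRICULUM_STAGES) -> Iterator[Tuple[int, List[int]]]:
    """Yield (stage index, window) pairs, shortest contexts first."""
    tokens = tokenize(seq)
    for stage, length in enumerate(stages):
        for window in windows(tokens, length):
            yield stage, window


if __name__ == "__main__":
    toy_assembly = "ACGT" * 2_048  # 8,192 bases of toy sequence
    counts: Dict[int, int] = {}
    for stage, _ in curriculum(toy_assembly, stages=[1_024, 4_096, 16_384]):
        counts[stage] = counts.get(stage, 0) + 1
    print(counts)  # {0: 8, 1: 2}; the toy sequence is too short for the 16,384-base stage
```

Keeping one token per base is what lets the model report effects at single-nucleotide resolution, at the cost of very long token sequences for megabase inputs.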
Built on an evolved Transformer architecture with a Mixture-of-Experts (MoE) design, Genos routes each token to two of eight experts per layer and combines Grouped-Query Attention, FlashAttention and rotary position embeddings (RoPE) with a large base frequency to sustain signal over long ranges while keeping inference practical on local devices. Supported usage modes include embedding extraction, sequence generation and on-demand fine-tuning via adapters or continued training. Together, these choices enable single-base-resolution modeling of inputs up to one million bases, so the model can capture distal regulatory context as well as local motifs.
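The paragraph above credits a large RoPE base frequency with sustaining signal over long ranges. A minimal NumPy sketch of standard RoPE makes the intuition concrete: each pair of dimensions rotates at frequency base^(-2i/d), so raising the base stretches the slowest rotation over more positions before it wraps around. The head dimension of 128 and the base of 1e7 are hypothetical values for illustration; Genos's actual settings are not stated here, and this is not its implementation.

```python
import numpy as np


def rope_inv_freq(head_dim: int, base: float) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base ** (-2i / head_dim)."""
    return base ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)


def apply_rope(x: np.ndarray, pos: int, base: float) -> np.ndarray:
    """Rotate each pair (x[2i], x[2i+1]) of a 1-D vector by the angle pos * theta_i."""
    theta = pos * rope_inv_freq(x.shape[-1], base)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=np.float64)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def longest_wavelength(head_dim: int, base: float) -> float:
    """Positions the slowest-rotating pair needs to complete one full turn."""
    return float(2 * np.pi / rope_inv_freq(head_dim, base).min())


if __name__ == "__main__":
    HEAD_DIM = 128  # hypothetical head dimension
    for base in (10_000.0, 1e7):  # original RoPE default vs. a hypothetical large base
        wl = longest_wavelength(HEAD_DIM, base)
        print(f"base={base:.0e}: longest wavelength ~ {wl:,.0f} positions")
```

With a head dimension of 128, the default base of 10,000 gives a longest wavelength of roughly 54,000 positions, far short of a ~1 Mb window, while a base of 1e7 stretches it to roughly 49 million. Because the query-key score depends only on the offset between rotated vectors, RoPE encodes relative position, and enlarging the base is a widely used way to extend it to longer contexts.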
From data to deployment. The pipeline runs from single-base tokenization of >600 high-quality human genomes, through a MoE Transformer optimized for contexts up to ~1 Mb, to downstream use via embeddings, sequence generation and fine-tuning APIs. The figure shows each token being routed to two experts and the practical paths to inference and adaptation.
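Routing each token to two of eight experts is the standard top-k Mixture-of-Experts pattern. The NumPy sketch below shows one common way to implement it: a softmax over only the two selected router logits, as in Mixtral-style routers. Genos's actual router, load-balancing loss and expert shapes are not described here, so the single-matrix "experts" and all function names are hypothetical.

```python
import numpy as np

NUM_EXPERTS = 8  # from the text: eight experts per layer
TOP_K = 2        # from the text: each token is routed to two experts


def top2_moe(x, router_w, expert_ws):
    """x: (tokens, d); router_w: (d, NUM_EXPERTS); expert_ws: (NUM_EXPERTS, d, d)."""
    logits = x @ router_w                              # (tokens, experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of the 2 largest logits
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over only the selected experts' logits, so the two gates sum to 1.
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for k in range(TOP_K):
            e = top_idx[t, k]
            # Only 2 of the 8 experts do any work for this token.
            out[t] += gates[t, k] * (x[t] @ expert_ws[e])
    return out, top_idx, gates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    x = rng.standard_normal((4, d))
    router_w = rng.standard_normal((d, NUM_EXPERTS))
    expert_ws = rng.standard_normal((NUM_EXPERTS, d, d)) / np.sqrt(d)
    y, chosen, gates = top2_moe(x, router_w, expert_ws)
    print(chosen)              # two expert ids per token
    print(gates.sum(axis=-1))  # each row sums to 1
```

The per-token loop is for readability; production MoE layers group tokens by expert so each expert runs one batched matrix multiply, and they usually add an auxiliary loss that keeps expert usage balanced. The payoff of the design is that parameter count grows with the number of experts while per-token compute grows only with TOP_K.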
Across benchmarks, Genos shows strong and consistent performance from short to ultra-long inputs. In tasks with 8K-length inputs, the 10B model reaches ~0.93 AUC on ClinVar pathogenicity and ~0.75 on enhancer detection; on an ultra-long mutation-hotspot task using Chinese Pangenome data, it attains ~0.99 AUC at 128K. These results suggest the model can help prioritize non-coding variants and read population-level signals directly from sequence.
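AUC, the metric behind the ~0.93 and ~0.99 figures, has a simple probabilistic reading: it is the chance that a randomly chosen positive example (for instance, a pathogenic variant) receives a higher score than a randomly chosen negative one, with ties counted as half. This minimal pure-Python sketch computes it directly from that definition on toy labels and scores; it is not the evaluation code used for Genos.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = pathogenic, 0 = benign; scores are made-up model outputs.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

The pairwise loop is O(P×N), which is fine for illustration; for large benchmarks a rank-based formula or a library routine such as scikit-learn's roc_auc_score is the usual choice.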
Genos also powers application prototypes that link sequence to function and to language. After fine-tuning, the model generates single-base-resolution RNA-seq profiles from DNA windows that agree closely with experimental data across cell types. In a complementary prototype, a 4B-parameter Qwen language model paired with Genos supports omics-aware reasoning on pathway-grounded disease questions.
Text-genome reasoning prototype. A 4B Qwen model paired with Genos reads raw DNA and pathway context to explain variant effects in plain language and predict disease class, illustrating how sequence and knowledge can be combined for interpretable diagnostics.
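The text reports high agreement between generated and experimental RNA-seq profiles but does not name the metric. Pearson correlation on log-transformed coverage is one common way to compare such tracks, so the sketch below uses it as an assumption; the helper name and the log1p transform are illustrative, not Genos's documented evaluation.

```python
import numpy as np


def track_agreement(predicted, observed) -> float:
    """Pearson r between predicted and observed per-base coverage, after log1p.

    Hypothetical helper: log1p compresses the large dynamic range of RNA-seq counts.
    """
    p = np.log1p(np.asarray(predicted, dtype=float))
    o = np.log1p(np.asarray(observed, dtype=float))
    if p.shape != o.shape:
        raise ValueError("tracks must cover the same bases")
    p = p - p.mean()
    o = o - o.mean()
    denom = np.sqrt((p ** 2).sum() * (o ** 2).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for a constant track")
    return float((p * o).sum() / denom)


if __name__ == "__main__":
    predicted = [0, 2, 5, 40, 38, 3, 0]  # toy per-base coverage from a model
    observed = [1, 3, 4, 35, 41, 2, 0]   # toy measured coverage
    print(round(track_agreement(predicted, observed), 3))
```

In practice such a score would be computed per cell type and per genomic region, then summarized across them.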
These technical capabilities translate into practical applications across biomedicine. In precision medicine, Genos can help predict an individual's disease risk more accurately, supporting the development of personalized treatment regimens for diseases such as cancer and neurodegenerative disorders. In clinical applications, Genos achieves 92% accuracy in interpreting pathogenic mutations; combined with the 021 Science Foundation Model, accuracy rises to 98.3%, offering an efficient new tool for clinical diagnosis. Comprehensive evaluations show that Genos outperforms current state-of-the-art (SOTA) models on most core tasks.
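Accuracy gains near the top of the scale are easier to read as error rates: moving from 92% to 98.3% accuracy means the error rate falls from 8% to 1.7%. A quick calculation, assuming both figures were measured on the same variant set (the text does not say), shows this is roughly a 79% relative reduction in errors.

```python
def error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative drop in error rate when accuracy rises from acc_before to acc_after."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    if err_before <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (err_before - err_after) / err_before


if __name__ == "__main__":
    # Accuracy figures reported in the text: Genos alone vs. combined with 021.
    print(f"{error_reduction(0.92, 0.983):.0%}")  # 79%
```

Per 1,000 interpreted variants, that corresponds to about 80 misinterpretations versus about 17.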
In public health, the model can support disease monitoring and provide a scientific basis for targeted preventive measures and healthcare policies. In developmental biology, Genos can help uncover the genetic mechanisms underlying embryonic development, deepening our understanding of how genes regulate the formation of tissues and organs.
In scientific research, Genos works alongside BGI’s DCS Cloud to predict RNA expression profiles from DNA sequences in seconds, speeding up bioinformatics workflows that previously took weeks or months. Its integration into CNGBdb also allows users to forecast cellular expression levels, pinpoint and validate critical candidate genes, and accelerate scientific discovery.
For clinical use, Genos, paired with BGI’s GeneT deep reasoning model, provides multimodal interpretations to aid in diagnosing genetic diseases. In personal health applications, integrating Genos into BGI’s BGE platform enables analysis of personal genomic data, turning complex genetic information into easy-to-understand, personalized health reports.
The core research team behind Genos emerged from the Foundation Model Training Program, a bold cross-disciplinary initiative jointly launched by BGI-Research and Zhejiang Lab. This program brought together bioinformatics experts and computing specialists under a problem-driven, task-oriented training model, fostering the kind of interdisciplinary collaboration that is now propelling life science research from data mining toward emergent intelligence. It is precisely this cross-domain fusion that supplied the essential momentum for Genos’ transformative innovation.
Genos is released for the community to test, critique and extend. The project team expects continued model updates and more plug-and-play tools for functional genomics, rare variant interpretation, and multi-omics integration. Code, weights, and documentation are openly available on GitHub, Hugging Face, ModelScope and zero2x under the MIT license.
Optional hosted inference is also available on BGI’s DCS Cloud, which provides each account with up to 1B tokens free of charge, lowering the barrier to intelligent genomic analysis. Users can access Genos with a single click, with no complex installation or configuration required.
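To put the free allowance in perspective, assume one token per base. That matches Genos's single-base tokenization, but it is an assumption about how DCS Cloud counts usage; the text does not say, nor whether generated outputs are billed. Under that assumption, 1B tokens covers about 953 full inputs of 2^20 bases (~1 Mb), or roughly a third of a ~3.1 Gb haploid human genome.

```python
FREE_TOKENS = 1_000_000_000           # per-account allowance stated in the text
WINDOW_BASES = 2 ** 20                # hypothetical ~1 Mb input (1,048,576 bases)
HAPLOID_GENOME_BASES = 3_100_000_000  # approximate haploid human genome size


def windows_affordable(budget_tokens: int, window_bases: int, tokens_per_base: int = 1) -> int:
    """Full windows a budget covers; tokens_per_base=1 is an assumption about billing."""
    tokens_per_window = window_bases * tokens_per_base
    if tokens_per_window <= 0:
        raise ValueError("a window must cost at least one token")
    return budget_tokens // tokens_per_window


if __name__ == "__main__":
    print(windows_affordable(FREE_TOKENS, WINDOW_BASES))  # 953
    print(round(FREE_TOKENS / HAPLOID_GENOME_BASES, 2))   # 0.32
```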
All training and evaluation used de-identified, publicly available human genomic resources curated by international consortia, with ethical approvals managed by those programs; no new human sequencing was performed for this release.
Explore the project at:
Github: https://github.com/BGI-HangzhouAI/Genos
Huggingface: https://huggingface.co/BGI-HangzhouAI
ModelScope: https://www.modelscope.cn/organization/BGI-HangzhouAI
Zero2x: https://www.zero2x.org.cn/genos
The research paper can be accessed here: https://doi.org/10.1093/gigascience/giaf132