- The Gene Expression Omnibus (GEO) is a public functional genomics data repository maintained by the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine.
- It serves as a central resource for storing, sharing, and accessing high-throughput experimental data used to investigate the molecular mechanisms of biological systems.
- GEO primarily houses data from gene expression profiling and related functional genomic assays, including microarrays and next-generation sequencing (for example RNA-seq and ChIP-seq). By providing a platform for standardized data submission and retrieval, GEO promotes transparency, reproducibility, and the reuse of experimental data in biomedical research.
- Researchers submit their experimental datasets to GEO along with detailed metadata, including study design, sample characteristics, and data processing methods. Submitted records fall into three types, each with its own accession prefix: Platforms (GPL; the array or sequencing technology used), Samples (GSM; the individual biological materials analyzed and the measurements derived from them), and Series (GSE; the overall experiment that groups related Samples). In addition, GEO staff assemble selected Series into curated DataSets (GDS) intended for downstream analysis; these are not submitted by researchers. This structure lets users move between raw data, processed data, and contextual information, which facilitates comparative studies and meta-analyses across diverse biological systems and conditions.
- GEO provides a web-based interface for browsing, searching, and visualizing data. It includes tools such as GEO2R, which lets users compare groups of Samples within a Series to identify differentially expressed genes directly in the browser, along with visualization features such as heatmaps, box plots, and volcano plots. GEO records are also cross-linked with other NCBI resources such as PubMed, GenBank, and RefSeq, connecting experimental data to the associated publications and genomic annotations.
- Since its establishment in 2000, GEO has become one of the most widely used repositories for functional genomics data, hosting millions of individual Samples from well over one hundred thousand studies across many organisms. It has played an important role in advancing systems biology, biomarker discovery, disease research, and drug development by enabling researchers to validate findings, conduct secondary analyses, and explore gene regulation at a systems level. By providing open access to both submitter-supplied and curated data, GEO remains a key resource for the global scientific community.
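
The record types described above can be told apart by their accession prefix, and GEO records can be searched programmatically through NCBI's E-utilities. The following is a minimal sketch, not an official client: the helper names `classify_accession` and `build_geo_search_url` are hypothetical and introduced only for illustration. It assumes the standard prefixes (GSE, GSM, GPL, GDS) and the public `esearch.fcgi` endpoint queried with `db=gds`, the Entrez database that indexes GEO records. The code only builds the search URL and makes no network request.

```python
import re
from urllib.parse import urlencode

# GEO accession prefixes and the record type each one denotes.
RECORD_TYPES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "DataSet",
}

# A GEO accession is one of the prefixes above followed by digits.
_ACCESSION = re.compile(r"^(GSE|GSM|GPL|GDS)(\d+)$")

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def classify_accession(accession: str) -> str:
    """Return the GEO record type for an accession such as 'GSE12345'.

    Hypothetical helper: raises ValueError for anything that does not
    look like a GEO accession.
    """
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL for a query against GEO.

    Hypothetical helper: only constructs the URL. The caller decides
    how to fetch it and how to respect NCBI's usage guidelines.
    """
    params = {
        "db": "gds",        # Entrez database that indexes GEO records
        "term": term,       # Entrez query string
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for acc in ("GSE12345", "gsm67890", "GPL570", "GDS507"):
        print(acc, "->", classify_accession(acc))
    print(build_geo_search_url("breast cancer[Title]", retmax=5))
```

Keeping URL construction separate from fetching makes the sketch easy to check offline. In practice the resulting URL could be retrieved with any HTTP client, and the returned IDs passed to other E-utilities calls such as `esummary`.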