- Sequencing refers to the experimental process of determining the precise order of the building blocks that make up a biological molecule, most commonly nucleic acids (DNA or RNA) or proteins.
- In the case of DNA sequencing, it involves identifying the linear arrangement of nucleotide bases—adenine (A), thymine (T), cytosine (C), and guanine (G). Similarly, RNA sequencing reveals the order of ribonucleotides, while protein sequencing uncovers the sequence of amino acids. This fundamental information is crucial for understanding the genetic blueprint of organisms, how genes are expressed, and how proteins function in health and disease.
- The history of sequencing began with the development of protein sequencing methods by Frederick Sanger in the 1950s, followed by DNA sequencing breakthroughs in the 1970s. Sanger’s chain-termination method became the gold standard for DNA sequencing for decades, powering monumental achievements such as the Human Genome Project, which was completed in 2003. Although highly accurate, Sanger sequencing was relatively slow and expensive, limited to analyzing relatively short stretches of DNA at a time.
- The advent of next-generation sequencing (NGS) revolutionized the field by allowing millions of DNA fragments to be sequenced in parallel, dramatically reducing cost and time while generating massive amounts of data. NGS technologies, such as Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, and 454 pyrosequencing, enabled whole genomes, transcriptomes, and epigenomes to be analyzed at unprecedented scale. More recently, third-generation sequencing methods, including Pacific Biosciences’ single-molecule real-time (SMRT) sequencing and Oxford Nanopore sequencing, allow for long-read sequencing that captures entire genes or chromosomes in one read, providing better resolution of repetitive regions and structural variations.
- Beyond DNA, RNA sequencing (RNA-Seq) has become a central tool in transcriptomics. It not only identifies which genes are expressed but also quantifies expression levels, detects alternative splicing events, and uncovers novel transcripts. Protein sequencing, though less high-throughput than nucleic acid sequencing, remains essential for characterizing protein structures, modifications, and functions, often using mass spectrometry-based approaches. Together, these methods provide a holistic view of biological information flow from DNA to RNA to protein.
- The applications of sequencing are vast and transformative. In medicine, sequencing has paved the way for precision medicine by identifying genetic mutations linked to cancer, inherited disorders, and infectious diseases. Pathogen sequencing has become critical in epidemiology and public health—for example, tracking the evolution of viruses such as SARS-CoV-2 during the COVID-19 pandemic. In agriculture, sequencing is used to improve crop breeding, enhance livestock genetics, and monitor plant and animal pathogens. In ecology and microbiology, metagenomic sequencing enables the characterization of microbial communities in diverse environments, from soil and oceans to the human gut microbiome.