Since the premiere of the wildly popular 1993 dinosaur cloning movie Jurassic Park, the sciences featured in the film, genetic engineering and genomics, have advanced at a breathtaking rate. When the film was released, the Human Genome Project was already working on sequencing the entire human genome for the first time. They completed the project in 2003 after 13 years and at a cost of $1 billion. Today, the human genome can be sequenced in less than a day and at a cost of less than $1,000.
One of the leading genomics research organizations, the Wellcome Sanger Institute in England, has a mission to improve the health of all humans by developing a comprehensive understanding of the 23 chromosomes in the human body. They rely on cutting-edge technology to operate at incredible speed and scale, including reading and analyzing an average of 40 trillion base pairs of DNA per day.
Along with advances in DNA sequencing techniques and computational biology, high performance computing (HPC) is at the heart of advances in genomics research. Powerful HPC helps researchers process large-scale sequencing data to solve complex computational problems and perform computationally intensive operations on massive resources.
Large-scale genomics
Genomics is the study of an organism’s genes or genome. From curing cancer and fighting COVID-19 to better understanding the evolution and cell growth of humans, parasites and microbes, the science of genomics is booming. The global genomics market is expected to reach $94.65 billion by 2028, from $27.81 billion in 2021, according to Fortune Business Insights. Enabling this growth is an HPC environment that daily contributes to a better understanding of our biology, helping to accelerate the production of vaccines and other health approaches around the world.
Using HPC resources and mathematical techniques known as bioinformatics, genomics researchers analyze massive amounts of DNA sequence data to find variations and mutations that affect health, disease, and response to medication. The ability to search among the approximately 3 billion units of DNA across 23,000 genes in a human genome, for example, requires massive amounts of computing, storage and networking resources.
After sequencing, billions of data points must be analyzed to look for things like mutations and variations in viruses. Computational biologists use pattern-matching algorithms, mathematical models, image processing, and other techniques to derive meaning from this genomic data.
A genomic powerhouse
At the Sanger Institute, scientific research takes place at the intersection of genomics and HPC computing. Institute scientists tackle some of the toughest challenges in genomics research to fuel scientific discovery and push the boundaries of our understanding of human biology and pathogens. Among many other projects, the Institute’s Tree of Life program explores the diversity of complex organisms found in the UK through sequencing and cell technologies. Scientists are also creating a reference map of different types of human cells.
Science on the scale of that conducted at the Sanger Institute requires access to massive amounts of data processing power. The Institute’s Informatics Support Group (ISG) helps meet this need by providing high-performance computing environments to Sanger’s scientific research teams. The ISG team provides support, architecture design, and development services for the traditional Sanger Institute HPC environment and extensive OpenStack private cloud computing infrastructure, among other HPC resources.
Responding to a global health crisis
During the COVID-19 pandemic, the Institute began working closely with public health agencies in the UK and academic partners to sequence and analyze the SARS-COV-2 virus as it emerges. evolution and its spread. The work has been used to inform public health measures and to help save lives.
As of September 2022, over 2.2 million coronavirus genomes have been sequenced at Wellcome Sanger. They are immediately made available to researchers around the world for analysis. Mutations that affect the virus’ spike protein, which it uses to bind to and enter human cells, are of particular interest and are the target of current vaccines. Genomic data is used by scientists along with other information to determine which mutations may affect the virus’ ability to transmit itself, cause disease or evade the immune response.
Society’s better understanding of genomics and the informatics that comes with it has accelerated the development of vaccines and our ability to respond to diseases in ways that have never been possible before. Along the way, the world is witnessing firsthand the incredible power of genomic science.
Learn more about genomics, informatics, and HPC in this Wellcome Sanger Institute white paper and case study.
***
IntelĀ® Technologies Advance Analytics
Data analytics is the key to unlocking the maximum value you can extract from your organization’s data. To create a productive, cost-effective scanning strategy that gets results, you need high-performance hardware that’s optimized to work with the software you’re using.
Modern data analytics covers a range of technologies, from dedicated analytics platforms and databases to deep learning and artificial intelligence (AI). New to analytics? Ready to evolve your analytics strategy or improve your data quality? There’s always room to grow, and Intel is ready to help. With a broad ecosystem of analytics technologies and partners, Intel accelerates the efforts of data scientists, analysts, and developers across industries. Learn more about Intel Advanced Analytics.
#Breakthroughs #genomic #science #happening #faster #HPC