101 Genomes’ Genomic Cloud has been operational since June 2022. This bioinformatics biobank (Bio-Biobank) is probably already one of the largest whole-genome sequencing (WGS) databases of Marfan syndrome patients in the world, if not the largest. And everything is in place for it to keep growing! This revolutionary resource will enable bioinformatics researchers to better understand Marfan syndrome, and could serve as a pilot for research into other rare diseases. Colby T. Ford is the architect of our Genomic Cloud. He looks back on his work with us and answers Ludivine’s questions about the recent release of his book “Genomics in the Azure Cloud”, which was inspired by this experience.
Ludivine: Hello Colby, can you introduce yourself?
Colby: My name is Colby T. Ford. I hold a PhD in science and mathematics, and I’m a genomics scientist and cloud architect. I’m the owner of Tuple, a Microsoft and Databricks partner consultancy specializing in creating cloud-based genomic solutions for life sciences organizations. Apart from Tuple, I’m a passionate researcher in human genomics and infectious diseases. I have contributed to research in areas such as oncology, immunology, malaria, and SARS-CoV-2. I’m a Microsoft Certified Trainer and the author of Genomics in the Azure Cloud, published by O’Reilly Media in 2022.
L.: Can you briefly explain your role as cloud architect for F101G?
C.: As a consultant, my role was to work with the founders of F101G to understand the objectives of their cloud genomics platform, with Marfan syndrome as the starting point. We began by creating a genomic data lake to house all the genomic and phenotypic data from the study participants. I then collaborated with other team members to set up data pipelines that collect genomes from sequencing providers. We also set up IT services to analyze and visualize data from the data lake. These included bioinformatics pipelines and logic for scalable querying of variant data, as well as a DICOM visualization application for viewing patient imaging data (X-rays and MRIs). Finally, we worked closely with a security consultant and achieved ISO 27001:2013 compliance for the entire cloud architecture.
L.: What do you think of your collaboration with F101G and of the F101G project?
C.: The F101G project as a whole was an interesting challenge with a very important research objective. Coming from the US, I was unfamiliar with European regulations and rules on patient data, so I was delighted to learn more.
I’d worked on other rare diseases in the past, but Marfan syndrome wasn’t one of them. I’m always keen to work on a new biological use case, a new disease, a new drug target, and so on, as part of the various assignments I take on for my customers.
What’s more, the collaboration with F101G was quite unique in that we were able to collaborate both scientifically on the study of the disease and technically on the design of the cloud architecture. I love the F101G team’s drive to transform research into Marfan syndrome and other rare diseases in general, through an innovative, cloud-oriented approach.
L.: You recently published a book entitled “Genomics in the Azure Cloud”. Can you tell us more about it?
C.: This book provides a foundation of essential considerations for building a cloud architecture in the field of genomics. I wrote it because I noticed that there weren’t many resources or examples for enterprise-scale genomics, although there are plenty for finance, retail and other sectors. In the book, I detail the issues surrounding data platform services such as data lakes and data warehouses, and then look at IT services that can help automate and scale bioinformatics data processing. The book is aimed at scientists who want to learn how to work better in Azure, as well as cloud architects who want to learn more about solutions for managing genomics workloads.
L.: Is there anything else you’d like to add?
C.: I sincerely believe that the work we’ve done with F101G will be revolutionary for Marfan syndrome research. What’s more, the architecture and cloud computing resources we’ve put in place can easily be extended to other rare diseases in the future. It will be amazing to see how the Azure cloud contributes to delivering new insights in disease research over time!
