Géraldine Van der Auwera, 101 Genomes consultant and author of “Genomics in the Cloud”.
Géraldine Van der Auwera is a talented researcher from Brussels. She lives in the USA, where she worked for many years at the Broad Institute (co-founded by MIT and Harvard University). She is the author of one of the very first books devoted to genomic analysis in the cloud, and she helped set up the “Terra” platform. She has been lending her expertise to the Global Alliance for Genomics and Health (GA4GH) and the 101 Genomes Foundation since 2020. Ludivine interviewed her during one of her rare visits to Brussels to shed some light on Géraldine’s unique expertise and what she brings to 101 Genomes.
Ludivine: Hello Géraldine, would you like to introduce yourself?
Géraldine: I trained as a microbiologist and switched to human genomics after a postdoctoral fellowship in microbial genetics. I worked for ten years at the Broad Institute of MIT and Harvard, a genomics research institute in the Boston area, USA. There, I was mainly responsible for providing technical and scientific support to researchers using bioinformatics tools that the Broad Institute made available to the scientific community. I’m currently a freelance consultant in bioinformatics and scientific communication. Alongside this, I co-direct the Large-Scale Genomics Workstream of the Global Alliance for Genomics and Health (GA4GH), an international organization that develops technical standards and regulatory frameworks to promote the responsible sharing of genomic data.
L.: Can you explain your role in the 101 Genomes Foundation and why you decided to get involved in 2020?

G.: You and Romain contacted me with questions about the Broad Institute’s large-scale genomic studies and about moving the genomic data analysis and sharing systems that make such studies possible to the cloud. I was both touched by your family history and impressed by your approach to creating the Foundation, which made me want to lend you a hand. I play an advisory role, mainly in developing the cloud infrastructure that supports the project’s scientific aims.
L.: You wrote “Genomics in the Cloud”, one of the first books devoted to using the cloud to store and study genomic data. Can you tell us a little more about it?
G.: “Genomics in the Cloud”, published by O’Reilly Media in 2020, offers both a theoretical and a practical introduction to the analysis methods used in human genomics. It focuses primarily on data processing and on identifying variants from sequencing data using cloud infrastructure.
My co-author, Brian O’Connor, and I designed this book based on our shared experience at the intersection of genomics and computing. It’s a highly interdisciplinary field that brings together specialists in biology and medicine, who generally have very little training in computer science, and technologists, programmers and other IT infrastructure professionals, who find themselves confronted with particularly complex scientific concepts and vocabulary.
Our book gives readers the chance to build up their technical and scientific knowledge through hands-on exercises that require very few prerequisites, with the aim of making human genomics more accessible.
L.: You put a lot of work into developing “Terra”. Why do you think 101 Genomes is a good candidate to join “Terra on Azure”?
G.: One of the major difficulties faced by rare disease research associations is the fragmentation of genomic data sources. Many studies include relatively few patients, too few to apply the large-scale analysis techniques needed to examine complex genetic mechanisms in a statistically robust way.
The solution to this problem is to federate data from multiple studies. The Terra platform was designed to enable this kind of data federation within an open-source ecosystem that promotes scientific collaboration while protecting data security and ownership.

101 Genomes is an excellent example of a project that can benefit from such a platform to achieve its scientific goals without having to develop and operate a complete infrastructure of its own. Having already set up a data lake on Microsoft’s Azure cloud, the 101 Genomes Foundation (F101G) will soon be able to connect it to Terra on Azure, enabling research teams to analyze this data collaboratively through Terra. As other groups migrate their data into this federated ecosystem, these analyses will gain statistical power and advance our understanding of the biological mechanisms involved.
L.: Is there anything else you’d like to add?
G.: I think it’s important to remember that human genomics ultimately concerns humanity’s common biological heritage. This has ethical implications, but also very practical ones: we can only understand the human genome well enough if populations from around the world are adequately represented. That’s why it’s essential to work towards in-depth international collaboration, as the Global Alliance for Genomics and Health does. It’s an effort that requires the participation of stakeholders from all walks of life: researchers, doctors, technologists, forensic scientists and politicians, as well as patient associations and even the general public.
