The dawn of cloud native bioinformatics

Every year, the University of Queensland organises the Mathematical and Computational Biology Winter School, which is designed to give advanced undergraduate and postgraduate students, postdoctoral researchers and others working in the life sciences an overview of the discipline.

With big-name international presenters, the school regularly attracts 280+ attendees, and this year it was at its absolute capacity. This attests to the increasing popularity of bioinformatics and related skills in modern life science analysis.

Denis was invited to deliver the "Future of Bioinformatics" talk, where an "eminent" person in the field gives their view on the direction the field is taking. It will come as no surprise to anyone that Denis' topic was "The dawn of cloud native bioinformatics". Denis' presentation can be found here.

Denis Bauer @allPowerde, CSIRO’s transformational bioinformatics leader, speaking at @UQwinterSchool on the future of bioinformatics and the dawn of cloud-native bioinformatics. UQwinterSchool pic.twitter.com/vgLBFCcE1n
— Nick Hamilton (@DoktrNick) July 3, 2019

Denis' main message was that bioinformatics has become increasingly collaborative: demands have increased so dramatically that no single group can excel in all domains. Specifically, workflows need to fulfil reproducibility and compliance standards, data sets keep growing, and algorithms are becoming more complex and interconnected.

Working together to meet these demands will increasingly be sustainable only in the cloud, which in turn will create more opportunities for people to contribute and help build something that is larger than the sum of its parts.

In fact, the cloud has already been demonstrated to create new jobs in industry: "48% of businesses using cloud services reported an increase in IT staff and 41% reported a rise in non-IT staff since using cloud services" [ComputerWeekly]. This trend is likely especially positive for the research space.

What a #genomical talk presented by @allPowerde @UQwinterSchool. From the storage needed for #genomics to the #genetic #Hipster Index pic.twitter.com/M62fsCnIEV
— Tamblyn Thomason (@ThomasonTamblyn) July 3, 2019

Denis also showcased how serverless technology can bring communities together online by computing shared phenotypes cost-effectively. The fun demo-application for this, of course, is CSIRO's Hitchhiker's Thumb app.

"Let's build a healthier world together" @allPowerde from @CSIROnews talked about the collaborative future of bioinformatics #FutureOfBioinf and going #serverless a very insightful talk #thumbsup @UQwinterSchool #WomeninSTEM #UQwinterschool #hitchhikersthumb #csiro pic.twitter.com/Xt2Kop5Dv3
— Maria Rondon (@MariaRondonG) July 3, 2019

Machine learning on high-dimensional data

One of the main themes of this year's school was machine learning, and Arash presented on Random Forests and their application to genome-wide association studies (presentation). The presentation focused on the core operation of the Random Forest algorithm to describe how interactions between genetic markers are taken into account. Arash discussed some of the weaknesses of Random Forests and pointed out possible solutions. The slides include implementation details of VariantSpark, where proper partitioning and advanced parallelisation allow processing of large-scale genomic data.
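For readers who have not seen the slides, the core operation of a decision tree inside a Random Forest can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VariantSpark's implementation: the marker names, the toy genotypes and the helper functions `gini`, `impurity_decrease` and `best_split` are all made up for this example, and genotypes are assumed to be coded as 0/1/2 alternate-allele counts.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(genotypes, labels, threshold):
    """Gini decrease from splitting samples on genotype <= threshold."""
    left = [y for g, y in zip(genotypes, labels) if g <= threshold]
    right = [y for g, y in zip(genotypes, labels) if g > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_split(markers, labels):
    """Return (marker name, threshold, decrease) with the largest Gini decrease."""
    best = (None, None, 0.0)
    for name, genotypes in markers.items():
        for threshold in (0, 1):  # assumed 0/1/2 genotype coding
            gain = impurity_decrease(genotypes, labels, threshold)
            if gain > best[2]:
                best = (name, threshold, gain)
    return best


# Toy data (hypothetical): 8 samples, 2 markers, case (1) / control (0) labels.
markers = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 1],
    "snp_b": [0, 1, 0, 1, 0, 1, 2, 2],
}
labels = [0, 0, 1, 1, 1, 1, 0, 1]

name, threshold, gain = best_split(markers, labels)
print(name, threshold, gain)
```

A full tree repeats `best_split` inside each resulting child node, so a marker chosen deeper in the tree is evaluated only on samples that share a genotype at the markers above it. That conditioning is one way a tree can pick up interactions between markers; a forest then averages many such trees built on resampled data and random marker subsets.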

Next up at @UQwinterSchool, CSIRO's Arash Bayat on the powerfiul VariantSpark software for cloud-based machine learning for big genomic data. #UQwinterSchool pic.twitter.com/9UU2DEec4t
— Nick Hamilton (@DoktrNick) July 5, 2019

Arash also gave a live demo, "VariantSpark: A cloud-based machine learning approach for big genomic data", which we will make available shortly. In the meantime, the technical paper for VariantSpark is available on bioRxiv. In the demonstration, Arash illustrated the process of deploying VariantSpark on AWS and Databricks. On AWS, a CloudFormation template facilitates the configuration of an EMR compute cluster with VariantSpark and Hail installed. Arash also introduced ViGWAS, an analysis pipeline for quality control of genomic data.

Cloud-native bioinformatics: @allPowerde on the advantages and uses of cloud computing; such as #VariantSpark - machine learning tool scanning the whole genome to find disease-associated genetic variants such as in ALS or #Hipsters #UQwinterschool @UQwinterSchool pic.twitter.com/5AJzW8eXma
— Dilys Lam (@dilys_lam) July 3, 2019

Image credit: interestedbystandr
