Denis talked to AWS's Simon Elisha (Head of Technology and Transformation) on the AWS Podcast about her group's most recent achievements in the cloud.
Like the "This is my Architecture" video the two recorded in 2017, it was a high-energy and fun-filled show that covered a wide range of topics from genome research to AWS Marketplace deployments. Listen to it here or jump to the three take-aways below.
Why we do genomic research in the cloud
Our research aims to understand how information encoded in our genome can lead to disease. In fact, a single "mis-spelling" in the 3-billion-letter genome can have debilitating consequences. To gain actionable insights from 3 billion letters, our team develops specialised machine learning tools capable of handling ultra-high dimensional data (e.g. VariantSpark).
Developing such sophisticated technologies requires lots of experimentation on relatively small synthetic data. We need to be able to do hyper-parameter tuning efficiently, and the cloud allows us to spin up multiple appropriately sized clusters economically. With the insights from this exploration, we can then launch the full-scale analysis within the same framework. There we frequently scale up to Spark clusters with 7.5 TB of RAM, for example when analysing 100M variants from 100K individuals, which results in a matrix with 10 trillion entries.
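As a rough check on the scale quoted above, the matrix size can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `matrix_size_tb` and the storage widths (8 bits and 2 bits per entry) are illustrative assumptions, not a description of how VariantSpark actually stores its data.

```python
# Back-of-the-envelope size of the variant-by-individual matrix.
# The counts come from the text; the bits-per-entry values are assumptions.

def matrix_size_tb(variants: int, individuals: int, bits_per_entry: int) -> float:
    """Return the uncompressed matrix size in terabytes (1 TB = 1e12 bytes)."""
    entries = variants * individuals
    return entries * bits_per_entry / 8 / 1e12

variants = 100_000_000   # 100M variants
individuals = 100_000    # 100K individuals
entries = variants * individuals

print(f"entries: {entries:,}")
# Assumption: one byte per entry (naive encoding)
print(f"1 byte per entry: {matrix_size_tb(variants, individuals, 8):.1f} TB")
# Assumption: 2 bits per entry (enough to encode the values 0, 1, 2)
print(f"2 bits per entry: {matrix_size_tb(variants, individuals, 2):.1f} TB")
```

Under these assumptions, the raw matrix alone is on the order of terabytes, which is why clusters with terabytes of RAM come into play at full scale.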
The flexibility of scaling up effortlessly combined with having industry-standard security at our fingertips is what makes our genomics research world-class.
Ultra-high dimensional data is coming - let's get ready
While genomics is thought to be leading the space in terms of producing truly big data (more than Twitter, YouTube and Astronomy combined), the datafication driven by automatic (sensor) data collection means other disciplines are quickly catching up.
Interestingly, genomic research is not too dissimilar from wanting to understand supply chains or ensure the smooth operation of a production plant, when you think of the genetic code as the "sensors" that influence "operational" outcomes. We believe that VariantSpark in particular is directly applicable to other domains for gaining insights from millions of features across tens of thousands of samples.
VariantSpark is available on the AWS Marketplace, which enables you to spin up this sophisticated machine learning solution in your own account next to your data, so you can gain game-changing insights today.
"Sciencing" industry at the speed of cloud
It is estimated that 83% of enterprise workloads will be in the cloud by the end of 2020, and digital marketplaces enable customers to consume digital products with “procurement at the speed of cloud”. This distribution channel is projected to become a US$5.8 trillion market by 2022.
It also represents a unique opportunity for academia to offer digital products while circumventing the bottlenecks of licence negotiations and distribution.
VariantSpark is the world's first digital health product from a public sector organisation on the AWS Marketplace.
Digital marketplaces herald a new era in which science can accelerate industry solutions at the speed of cloud.
Resources
- AWS Podcast episode: https://aws.amazon.com/podcasts/367-genomic-analysis-in-the-aws-marketplace-variantspark-from-csiro/
- VariantSpark on the AWS Marketplace: https://aws.amazon.com/marketplace/pp/B07YVND4TD