We strive to make our scientific software solutions platform-independent, and to increase their accessibility and ease of use. We believe that software should be flexible enough to reach its end users on the platform of their choice, not the other way around.
Another step towards this goal of platform independence is making VariantSpark available on Google Cloud Platform via Terra notebooks. This extends its existing availability through Databricks (on AWS and Azure) and its Scala, Python, and Hail APIs.
Currently, VariantSpark is most widely used on AWS EMR or EC2 instances with either the Python or the Hail API, because the underlying AWS infrastructure, combined with a Spark cluster, gives VariantSpark the computing power and speed to perform the necessary computations in parallel. With Terra, users can now harness comparable computing power on Google Cloud Platform and perform fast, sensitive machine learning computations on genomic data.
Starter script for VariantSpark
To make the notebook easily reproducible, we created a starter script that installs and builds VariantSpark in the notebook environment, so that users can simply import VariantSpark into their notebook.
With this starter script, users can call the Python API directly to perform computations. Terra and VariantSpark work together through Spark: Terra lets users work with Spark clusters from their notebook, and VariantSpark uses the Spark cluster to process huge datasets in parallel.
A winning concept
Using this notebook, we participated in the Terra Open Science Competition, which was run as part of this year's ISMB/ECCB conference in Basel. The evaluation criteria were "coolness of science", "reproducibility", and "ease of use". Our notebook took third place, and Natalie is presenting the work at the Terra workshop.
Researchers who prefer Google Cloud Platform for their cloud computing needs can now use the Terra interface to run VariantSpark. The starter script we provide makes the notebook easy to reproduce, and end users can extend it to analyse their own datasets.
Let us know if you've used Terra before and where you think this platform is going...