Case study: Automating a neonatal genomic screening pipeline
The Client
Genepath is an Australian-owned and operated company that uses next-generation genetic testing for neonatal screening. Genepath developed NextGen, the world’s first accredited, peer-reviewed neonatal screening test using Next-Generation DNA sequencing technology [1]. Genepath is accredited to International Standard ISO 15189 for medical pathology testing by the Royal College of Pathologists of Australasia and the National Association of Testing Authorities, Australia.
The Challenge
Approximately 8% of the Australian population live with one of about 10,000 known rare genetic diseases.
The current heel-prick newborn screening test detects about 25 treatable genetic conditions and identifies approximately 300 affected babies in Australia every year. In comparison, NextGen can detect over 200 treatable genetic conditions and is estimated to identify more than 1,278 children with a treatable genetic condition every year.
However, identifying these disease-causing variants among millions of variants can be a very time-consuming and labour-intensive process.
To speed up genetic screening and streamline manual curation, Genepath approached CSIRO to automate its existing pipeline and to suggest future downstream machine learning improvements.
I'm very pleased with the outcome of the project and am excited to see it used in production! I particularly like CSIRO's thoughts on Transfer Learning as a future direction for improvement.
Dr. Bennett Shum, Chief Scientific Officer
The Solution
CSIRO built automation around Genepath's variant calling pipeline, allowing Genepath researchers to review variants they have not seen before. The solution was designed to be flexible, so that it can be easily incorporated into Genepath's existing pipeline, and extensible, so that CSIRO can build a cloud-based solution in the future to further speed up variant calling.
The Genome Insights team also recognised the need to validate and expand certain components of ANNOVAR's annotation pipeline, specifically for intronic positions, and used the VariantValidator API to fetch this information. The API calls can be adapted to fetch other relevant information. To further improve the automation, CSIRO used the ClinVar results from the ANNOVAR annotation to prioritize pathogenic variants, flagging them as "Prioritize for curation".
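As a rough illustration of the VariantValidator lookup described above, the sketch below builds a query URL for an intronic HGVS description and fetches the JSON response using only the Python standard library. The base URL and path layout (`/VariantValidator/variantvalidator/{genome_build}/{variant}/{transcripts}`) reflect our understanding of the public REST service and should be checked against its current documentation; `build_vv_url` and `fetch_vv` are hypothetical helpers, not part of the pipeline CSIRO delivered.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the public VariantValidator REST service.
# Verify against the service's current API documentation before use.
BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def build_vv_url(genome_build: str, variant: str, transcripts: str = "all") -> str:
    """Build a VariantValidator query URL (hypothetical helper).

    The variant description is percent-encoded with no safe characters, so
    ':' and '>' in HGVS strings stay inside a single path segment.
    """
    encoded = urllib.parse.quote(variant, safe="")
    return f"{BASE_URL}/{genome_build}/{encoded}/{transcripts}"


def fetch_vv(genome_build: str, variant: str, transcripts: str = "all",
             timeout: float = 30.0) -> dict:
    """Fetch and decode the JSON response for one variant (requires network access)."""
    url = build_vv_url(genome_build, variant, transcripts)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative intronic HGVS description; prints the URL instead of calling the service.
    print(build_vv_url("GRCh38", "NM_000088.3:c.589-1G>T"))
```

The example variant is illustrative only. Keeping URL construction separate from the network call makes the encoding easy to check offline, and the returned JSON can then be inspected for whichever fields the curation step needs.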
The Outcomes
CSIRO enhanced Genepath's variant calling pipeline by:
- Automating variant prioritization - using scripts to comb through millions of variants and prioritize pathogenic ones.
- Making the prioritization process future-ready - building the solution so that it can easily be ported to the cloud.
- Making the system more interoperable - validating ANNOVAR's results with the VariantValidator API.
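The ClinVar-based prioritization step can be sketched as a streaming filter over annotated output. This sketch assumes a tab-delimited ANNOVAR `table_annovar` output with a ClinVar significance column named `CLNSIG`, and treats `Pathogenic`, `Likely_pathogenic` and `Pathogenic/Likely_pathogenic` as triggering the flag; the column name, the term list and the added `Curation_flag` column are assumptions for illustration, not a description of CSIRO's scripts.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Assumption: ClinVar significance terms (lower-cased) that trigger the flag.
PATHOGENIC_TERMS = {"pathogenic", "likely_pathogenic", "pathogenic/likely_pathogenic"}
FLAG = "Prioritize for curation"


def is_pathogenic(clnsig: str) -> bool:
    """Return True if any significance term in a CLNSIG value is (likely) pathogenic.

    Assumes multiple terms may be separated by ',' or '|'; ANNOVAR writes '.'
    for missing values, which never matches.
    """
    terms = clnsig.replace("|", ",").split(",")
    return any(term.strip().lower() in PATHOGENIC_TERMS for term in terms)


def flag_variants(rows: Iterable[dict], sig_column: str = "CLNSIG") -> Iterator[dict]:
    """Yield each variant row with a 'Curation_flag' field added, one row at a time."""
    for row in rows:
        sig = row.get(sig_column) or ""
        row["Curation_flag"] = FLAG if is_pathogenic(sig) else ""
        yield row


def flag_tsv(src: TextIO, dst: TextIO, sig_column: str = "CLNSIG") -> int:
    """Copy a tab-delimited annotation table from src to dst, adding the flag column.

    Returns the number of variants flagged for curation.
    """
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + ["Curation_flag"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    flagged = 0
    for row in flag_variants(reader, sig_column):
        writer.writerow(row)
        if row["Curation_flag"]:
            flagged += 1
    return flagged


if __name__ == "__main__":
    # Tiny in-memory example standing in for a real annotated file.
    sample = (
        "Chr\tStart\tRef\tAlt\tCLNSIG\n"
        "1\t100\tA\tG\tPathogenic\n"
        "1\t200\tC\tT\tBenign\n"
        "2\t300\tG\tA\t.\n"
    )
    out = io.StringIO()
    count = flag_tsv(io.StringIO(sample), out)
    print(count)  # 1
    print(out.getvalue())
```

Because rows are read, flagged and written one at a time, memory use stays roughly constant however many variants the input holds, which matters when combing through millions of them.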
CSIRO also drew on its machine learning expertise to guide future enhancements of Genepath's variant calling pipeline. Specifically, CSIRO proposed using ML-based tools such as PolyPhen, CADD and DANN, which combine multiple features, including conservation metrics and regulatory information, to predict protein disruption more accurately. CSIRO also proposed using transfer learning to proactively reduce the ethnic bias inherent in the dataset.