CSIRO and ANU researchers have written a review article that could function as a Babel fish, translating between two distinct scientific domains (machine learning (ML) and genome engineering) and fostering impactful cross-disciplinary research.

Machine learning and artificial intelligence (AI) have transformed many disciplines. Andrew Ng (CEO, Landing AI and deeplearning.ai) famously summarised the impact of AI as follows: "AI is the new electricity. I can hardly imagine an industry which is not going to be transformed by AI."

Genome engineering, a more recently developed technology, has similar transformative potential: it allows the DNA of living cells to be modified to alter their capabilities and traits.

It is therefore a momentous development to see these two juggernaut disciplines join forces to realise the potential of genome editing in manufacturing, agriculture and medicine more systematically.

However, the language barrier created by the two disciplines' distinct jargons has stifled progress. For example, "effectiveness" in the genome-editing world refers to the rate at which changes are introduced into the DNA, which depends on molecular conditions. In ML, by contrast, it refers to how well a model generalises from its training data, which is measured by its accuracy on separate test data.
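To make the ML sense of "effectiveness" concrete, here is a minimal Python sketch. The data are entirely hypothetical (made-up guide scores and edit outcomes, not taken from the paper), and the "model" is a simple one-feature threshold chosen only for illustration. The threshold is fitted on training data and then scored on separate test data. The gap between the two accuracies is what the ML community means by generalisation.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def fit_threshold(scores, labels):
    """Pick the cut-off on a single feature that maximises training accuracy.

    Hypothetical toy model: predict 'edit succeeded' when score >= threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = accuracy([s >= t for s in scores], labels)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t


# Hypothetical guide scores (0-1) and whether the edit succeeded.
train_x = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [False, False, False, True, True, True, True, True]
test_x = [0.15, 0.3, 0.45, 0.55, 0.85]
test_y = [False, True, False, True, True]

t = fit_threshold(train_x, train_y)
train_acc = accuracy([s >= t for s in train_x], train_y)
test_acc = accuracy([s >= t for s in test_x], test_y)
print(f"threshold={t}, train accuracy={train_acc}, test accuracy={test_acc}")
```

In this toy case the threshold of 0.4 scores 100% on the data it was fitted to but only 60% on the unseen test data. This is why ML practitioners report test-set accuracy rather than training accuracy.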

Aidan O'Brien has therefore written a "Rosetta Stone" aimed at bridging the gap between ML and genome editing. According to Mr O'Brien, the paper, published in Briefings in Bioinformatics, one of the most prestigious bioinformatics journals, discusses ML approaches and pitfalls in the context of CRISPR gene-editing applications.

"Specifically, we address common considerations, such as algorithm choice, as well as problems, such as overestimating accuracy and data interoperability, by providing tangible examples from the genome engineering domain."

Mr O'Brien, a joint PhD candidate with CSIRO and ANU, has a background in machine learning. He recently developed a software platform that makes genome-editing-based approaches for detecting disease genes more effective.

"Equipping researchers with the knowledge to effectively use ML to better design gene editing experiments and predict experimental outcomes will help advance the field more rapidly," says Dr Gaetan Burgio, the ANU supervisor and co-author of the paper.

The paper gives three recommendations for how data scientists and experimental researchers could improve the field:

1) Making datasets publicly accessible so that ML methods can be better trained.

2) Publishing both positive and negative experimental outcomes for balanced training sets.

3) Using ML predictions as a hypothesis-generation platform for experimental validation.
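Recommendation 2 ties in with the "overestimating accuracy" pitfall mentioned earlier. The Python sketch below uses a hypothetical dataset, not figures from the paper. It shows how a training set dominated by reported successes can make a useless model look accurate. The model always predicts success, and it scores 95% plain accuracy. Balanced accuracy is the mean of the per-class recall. It is one common remedy, used here as an illustrative choice rather than a method prescribed by the paper, and it exposes the same model at 50%.

```python
def balanced_accuracy(predictions, labels):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        hits = sum(predictions[i] == c for i in idx)  # correct predictions for class c
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced dataset: 95 reported successes, only 5 reported failures.
labels = [True] * 95 + [False] * 5
always_success = [True] * len(labels)  # trivial model that ignores its input

plain = sum(p == y for p, y in zip(always_success, labels)) / len(labels)
balanced = balanced_accuracy(always_success, labels)
print(f"plain accuracy={plain}, balanced accuracy={balanced}")
```

The trivial model never identifies a failed edit. Plain accuracy hides this, and balanced accuracy reveals it. Publishing negative outcomes helps for the same reason, because it gives models enough failures to learn from.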

"ML approaches have already improved the ability to apply genome engineering more precisely, though there is still plenty of room for improvement before genome engineering can be applied in very sensitive domains such as medicine," concludes A/Prof Denis Bauer, CSIRO, who is the senior author on the paper.

Image credit: https://www.maxpixel.net/

O'Brien et al. Domain-specific introduction to machine learning terminology, pitfalls and opportunities in CRISPR-based gene editing. Briefings in Bioinformatics, 2020.