Powering tomorrow's research by connecting today's data.

CSIRO developed Serverless Beacon (sBeacon), a platform for sharing and querying genomic and medical data, published in Nature Biotechnology. sBeacon uses a new cloud-architecture paradigm (serverless, or Function-as-a-Service) that allows compute resources to scale seamlessly on public cloud providers.

The GA4GH Beacon protocol is the most widely used exchange protocol for genomic data between international cohorts and clinical services. With the release of its version 2, the protocol has evolved to accommodate both genotypic and phenotypic data. However, current implementations struggle to economically scale to the large cohort sizes anticipated in the future. Additionally, these existing implementations lack support for complex queries that encompass both metadata and variant constraints.

These limitations stem from two critical design choices of the existing reference implementations. First, these implementations depend on data being stored in central databases from which they are served. This approach results in accumulating running costs that surge dramatically with both the number of variants and the size of the cohort. Second, the metadata is stored in a set of MongoDB collections. This setup restricts the ability to perform intricate text queries across different collections in the free-to-use versions of the database.

sBeacon provides a more scalable and economical solution than these traditional implementations, catering to the cohort sizes of the future.

Value Proposition

Easy integration

  • Zero data transformations: sBeacon can directly consume variant information stored in standard VCF files, eliminating the need for database ingestion.
  • Simple infrastructure deployment: The sBeacon architecture is defined entirely in Terraform templates, making it straightforward to deploy and update, and cost-effective to maintain.

Feature complete

  • Ontoserver integration: sBeacon seamlessly integrates with Ontoserver, a vital feature for users working with standardised terminologies and ontologies.
  • Complex metadata queries: sBeacon is designed to handle queries involving multiple entities and multiple ontology filters.
  • Combined metadata and variant queries: sBeacon is capable of querying metadata and variant data simultaneously.
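As a rough illustration of a combined metadata and variant query, the sketch below assembles a Beacon v2-style request body in Python. It is a minimal sketch, not sBeacon's own client code: the helper name build_beacon_query is hypothetical, the ontology term IDs and genomic coordinates are placeholder values, and the field names follow the general shape of the GA4GH Beacon v2 request model and should be checked against the specification and the deployed instance before use.

```python
import json


def build_beacon_query(filters, variant, granularity="boolean", limit=10):
    """Assemble a Beacon v2-style POST body (hypothetical helper).

    filters: list of (ontology term ID, scope) pairs, e.g. ("NCIT:C16576", "individuals")
    variant: dict of variant request parameters (assembly, chromosome, range, ...)
    """
    return {
        "meta": {"apiVersion": "v2.0"},
        "query": {
            # Metadata constraints: each ontology filter is scoped to an entity type.
            "filters": [{"id": term, "scope": scope} for term, scope in filters],
            # Variant constraints, evaluated together with the filters above.
            "requestParameters": variant,
            "requestedGranularity": granularity,
            "pagination": {"skip": 0, "limit": limit},
        },
    }


# Placeholder values for illustration only; substitute terms and
# coordinates that exist in your own dataset.
body = build_beacon_query(
    filters=[("NCIT:C16576", "individuals"), ("HP:0000822", "individuals")],
    variant={
        "assemblyId": "GRCh38",
        "referenceName": "1",
        "start": [10000],
        "end": [20000],
    },
)

print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a POST request to a Beacon endpoint; the exact endpoint path and any authentication headers depend on the deployment and are not shown here.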

Fast and cost-effective solution

sBeacon ships as Infrastructure as Code (IaC) on the cloud, facilitating effortless installation. Its cost-effectiveness is enhanced by employing serverless lambdas, which have no idle compute costs and ensure the seamless scaling of resources. Moreover, sBeacon leverages other serverless solutions, including DynamoDB, Athena, and AWS Glue, to manage and execute queries on metadata.

For real-world cohorts, sBeacon keeps query times to mere seconds. It is also notably more cost-effective than traditional implementations: while the latter can cost US$100-500/month, sBeacon costs only around US$16/month, even with an average of 72,000 queries per month.
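To put these figures in perspective, the short calculation below derives the implied cost per query, the query rate, and the saving relative to traditional implementations. It uses only the numbers quoted above; the 30-day month and the assumption that all prices are in the same currency (US dollars) are ours.

```python
# Figures quoted in the text above.
SBEACON_MONTHLY_USD = 16.0
TRADITIONAL_MONTHLY_USD = (100.0, 500.0)  # low and high end of the quoted range
QUERIES_PER_MONTH = 72_000

# Assumption: a 30-day month, used only to express the load per hour.
HOURS_PER_MONTH = 30 * 24

cost_per_query = SBEACON_MONTHLY_USD / QUERIES_PER_MONTH
queries_per_hour = QUERIES_PER_MONTH / HOURS_PER_MONTH
savings_factor = tuple(t / SBEACON_MONTHLY_USD for t in TRADITIONAL_MONTHLY_USD)

print(f"Cost per query: ${cost_per_query:.5f}")          # Cost per query: $0.00022
print(f"Average load: {queries_per_hour:.0f} queries/hour")  # Average load: 100 queries/hour
print(f"Cheaper by a factor of {savings_factor[0]} to {savings_factor[1]}")
# Cheaper by a factor of 6.25 to 31.25
```

In other words, the quoted workload corresponds to about 100 queries per hour around the clock, at a small fraction of a cent per query.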

Figure: Cost comparison (samples from 1000 Genomes)

Additionally, sBeacon eliminates the need to transform VCF files, saving both time and computational resources. This distinctive blend of features positions sBeacon as a highly powerful, efficient, and cost-effective tool for managing genomic data.

Figure: Ingestion time comparison (samples from 1000 Genomes)

Secure and private

sBeacon champions data decentralisation, ensuring heightened control over the privacy and ownership of individual genomic files, such as VCF and gVCF. This is essential for protecting sensitive data. By implementing robust authentication processes and offering granular access control, it ensures that only authorised users gain access to the application.

Moreover, sBeacon's unparalleled efficiency in data ingestion facilitates continuous data intake and removal. This dynamic capability guarantees rigorous governance of data access in line with the wishes of data owners.

Additionally, sBeacon includes a streamlined authentication mechanism that deployers can activate through the Terraform configuration, making it straightforward to secure production systems.

Extensible

sBeacon is engineered for flexibility and adaptability, allowing for the seamless integration of new functionalities and enhancements. Such adaptability ensures that sBeacon evolves in tandem with the changing needs of its users.

Furthermore, sBeacon lowers the entry barrier for data sharing by offering an affordable solution suitable for enterprises and institutions of all sizes, without compromising on performance.

References

A. Wickramarachchi, B. Hosking, Y. Jain et al. Scalable genomic data exchange and analytics with sBeacon. Nature Biotechnology 2023. DOI: 10.1038/s41587-023-01972-9.

Pricing

DIY: Free (GitHub)
  • Open Source access
  • Full functionality
  • Documentation access

SaaS: Coming Soon (enquire)
  • MarketPlace service
  • Full functionality
  • Managed security
  • Managed updates

R&D Support: On request (enquire)
  • Custom implementation
  • Bespoke solutions
  • Product workshops
  • Managed updates
  • Full support
