Big Data Analytics for Genomic Medicine

The successes of targeted drugs with companion predictive biomarkers and the technological advances in gene sequencing have generated … To meet the challenge of timely interpretation of structure, function and … (Doig, Kenneth D.; Ellul, Jason; Fellowes, Andrew; Thompson, Ella R.; Ryland, … Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.). In this paper, we review the challenges of manipulating large-scale … established infrastructures to meet their bioinformatics requirements (Wong S.Q., Fellowes A., Doig K., Ellul J., Bosma T.J., Irwin D., Vedururu R., Tan A.Y., Weiss J., Chan K.S., et al.).

In this review, we describe how one type of Big Data, genomic data, is applied to improve clinical research and healthcare. We give an overview of the challenges in processing genomic data and electronic health records (EHRs), provide possible solutions to overcome these challenges using approaches that ensure the safety of genomic data, and present a Big Data solution for identifying clinically actionable variants in sequence data.

We also discuss the requirement for the efficient integration of genomic information into EHRs.

Challenges of Handling Genomic and Clinical Data

Challenges in Manipulating Genomic Data

Although more than … Mendelian disorders have been studied at the genetic level so far, we still lack a clear understanding of most of their roles in health and disease [ 25 ].

While the development of NGS technologies has made it increasingly easy to sequence a whole genome or exome, there continue to be considerable challenges in handling, analyzing, and interpreting the genomic information generated by NGS.

The actual size of a BAM file is determined by the coverage (the average number of times each base is read, also called read depth) and by the read length of a sequencing experiment. The approximate file sizes of the different NGS data formats, and the running times needed to generate them, are shown in Figure 2.
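The dependence of file size on coverage and read length can be made concrete with the standard coverage relation C = N × L / G (mean coverage C, number of reads N, read length L, genome size G). The Python sketch below estimates N for a run and scales it by a bytes-per-read constant. That constant, the function names, and the 3 Gb genome size are illustrative assumptions rather than values taken from Figure 2; real BAM sizes also depend on compression, base-quality encoding, and auxiliary tags.

```python
def estimate_read_count(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Reads N needed for mean coverage C: C = N * L / G, so N = C * G / L."""
    return round(coverage * genome_size_bp / read_length_bp)


def estimate_bam_size_gb(n_reads: int, bytes_per_read: float) -> float:
    """Rough BAM size in GB; bytes_per_read is an ASSUMED, empirically calibrated constant."""
    return n_reads * bytes_per_read / 1e9


if __name__ == "__main__":
    # Hypothetical 30x run with 150 bp reads over a ~3 Gb genome.
    n = estimate_read_count(3_000_000_000, 30, 150)
    print(n)  # 600000000
    # 100 bytes/read is a placeholder assumption, not a measured value.
    print(estimate_bam_size_gb(n, bytes_per_read=100))  # 60.0
```

Doubling the coverage doubles N and therefore the estimated size, while doubling the read length at fixed coverage halves N; the bytes-per-read constant itself grows with read length, so the second effect is only partial in practice.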

Big Data infrastructures can greatly facilitate the analysis of these data. Lelieveld et al. have reviewed a number of toolsets for data compression, cloud computing, variant prioritization, copy number variation (CNV) detection, data sharing, and phenotype analysis of exome sequencing data. However, researchers currently face substantial challenges in storing, managing, manipulating, analyzing, and interpreting whole-genome sequencing (WGS) data for even moderate numbers of individuals if they need to take into account the data quality information stored in BAM files.

For example, the standard panel for cystic fibrosis screening recommended by the American College of Medical Genetics comprises only 23 mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene [ 3 ].

Even after accounting for all the mutations reported for the disease up to …, the number of mutations is still under 2,… [ 4 ].

The stark contrast between the mutations present in a genome and the mutations that physicians can act on motivates restructuring the bioinformatics workflow to concentrate on variants that lead to known clinical consequences. The current paradigm for clinical variant characterization based on next-generation sequencing was designed for discovering new variants unknown to the scientific community [ 6 ].

It involves aligning every read to the human reference assembly, discovering mutations at every position in the reference, and providing functional annotations through existing algorithms [ 7 ].
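The "discover mutations at every position" step of this paradigm can be illustrated with a toy Python sketch. It assumes reads have already been aligned without gaps and are given as hypothetical (start, sequence) pairs; alignment, base qualities, and functional annotation are omitted, and the thresholds `min_depth` and `min_alt_fraction` are illustrative, not taken from any particular caller. The point is that the pileup loop visits every covered reference position, whether or not it is clinically relevant.

```python
from collections import Counter
from typing import List, Tuple

Read = Tuple[int, str]                 # (0-based alignment start, ungapped read sequence)
Call = Tuple[int, str, str, int, int]  # (position, ref, alt, alt_count, depth)


def call_variants_genome_wide(reference: str, reads: List[Read],
                              min_depth: int = 2,
                              min_alt_fraction: float = 0.2) -> List[Call]:
    # Build a pileup: one allele counter for EVERY reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    # Scan every position and report non-reference alleles above threshold.
    calls: List[Call] = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        for base, n in sorted(counts.items()):
            if base != ref_base and n / depth >= min_alt_fraction:
                calls.append((pos, ref_base, base, n, depth))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGT"), (4, "ACGTAC")]
    print(call_variants_genome_wide(reference, reads))
    # [(4, 'A', 'T', 2, 4)]
```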

In addition, tools built on this paradigm may produce suboptimal results at sites that harbor actionable mutations, partly because of the criteria they implement to control global false positives. The increasing use of next-generation sequencing for genomic testing [ 9 ] warrants the development of a new set of tools that operate under a paradigm emphasizing the characterization of important clinical targets.

ClinSeK: a targeted variant characterization framework for clinical sequencing

To answer this demand, we have designed and implemented ClinSeK, a bioinformatics tool that focuses computational power on clinically relevant sites while avoiding the investigation of non-actionable mutations, thereby mitigating the big-data challenge. The tool adapts the full arsenal of variant characterization techniques used across a variety of applications to the targeted paradigm. Compared with existing tools designed for each separate application, ClinSeK achieves a substantial reduction in computational cost, with higher sensitivity and comparable accuracy in the target zone.
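The targeted idea can be illustrated with a minimal Python sketch that tallies alleles only at a predefined set of target sites. This is not ClinSeK's algorithm, whose internals are not described here; `genotype_targets`, the (start, sequence) read representation, the target dictionary, and the thresholds are all hypothetical. Using `bisect` on the sorted target positions, each read costs O(log T + k) work for T targets and the k targets it overlaps, so non-target positions are never tallied at all.

```python
import bisect
from collections import Counter
from typing import Dict, List, Tuple

Read = Tuple[int, str]                # (0-based alignment start, ungapped sequence)
Targets = Dict[int, Tuple[str, str]]  # HYPOTHETICAL: position -> (ref allele, alt allele)


def genotype_targets(reads: List[Read], targets: Targets,
                     min_depth: int = 2,
                     min_alt_fraction: float = 0.2) -> Dict[int, str]:
    positions = sorted(targets)
    counts = {pos: Counter() for pos in positions}

    for start, seq in reads:
        # Locate only the target positions inside [start, start + len(seq)).
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, start + len(seq))
        for pos in positions[lo:hi]:
            counts[pos][seq[pos - start]] += 1

    # Classify each target; non-target positions are never examined.
    results: Dict[int, str] = {}
    for pos in positions:
        alt = targets[pos][1]
        depth = sum(counts[pos].values())
        if depth < min_depth:
            results[pos] = "no-call"
        elif counts[pos][alt] / depth >= min_alt_fraction:
            results[pos] = "alt"
        else:
            results[pos] = "ref"
    return results


if __name__ == "__main__":
    reads = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGT"), (4, "ACGTAC")]
    targets = {4: ("A", "T"), 6: ("G", "A"), 9: ("C", "T")}
    print(genotype_targets(reads, targets))
    # {4: 'alt', 6: 'ref', 9: 'no-call'}
```

The design choice here is to index the targets rather than the genome: the amount of allele bookkeeping scales with the number of clinical targets, which is the property the text attributes to the targeted paradigm.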

ClinSeK provides software-level target capture to supplement existing sequencing-level techniques [ 10 ]. The computational cost of ClinSeK depends on the number of potential clinical targets to be assessed. The total number of mutations likely to be associated with all known clinical phenotypes in ClinVar [ 14 ] is on the order of 79,… (as accessed on 30 April …). Categorized by pathological condition, many rare yet well-characterized genetic disorders are associated with only a handful of mutations [ 35 ].