
invar / Methods


Background

Circulating tumour DNA (ctDNA) is increasingly used to monitor tumour responses [1,2]. In patients with low disease burden, ctDNA detection rates are low due to the presence of few or no copies of any individual mutation in each sample [3,4]. Sensitivity may be increased by collecting larger plasma volumes, but this is not feasible in practice.

Here we demonstrate that sensitivity can be greatly enhanced by analysing a large number of mutations via sequencing. Although cancers have thousands of mutations in their genome [5], previous analyses measured only individual mutations, or at most 32 tumour-specific mutations, in plasma [1,4,6,7].
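The sampling argument above can be made concrete with a small calculation. The sketch below is illustrative only and is not part of INVAR: it assumes that every sequenced molecule at every targeted locus is mutant independently with probability equal to the ctDNA allele fraction, and the function name and example values (allele fraction of 1e-5, 1,000 molecules per locus) are hypothetical.

```python
def prob_at_least_one_mutant(allele_fraction: float,
                             molecules_per_locus: int,
                             n_loci: int) -> float:
    """Probability of sampling at least one mutant molecule across all loci.

    Simplifying assumption: each molecule at each locus is mutant
    independently with probability `allele_fraction`.
    """
    total_molecules = molecules_per_locus * n_loci
    return 1.0 - (1.0 - allele_fraction) ** total_molecules


# Hypothetical example: the same sample analysed with panels of different size.
for n_loci in (1, 32, 1000, 10000):
    p = prob_at_least_one_mutant(1e-5, 1000, n_loci)
    print(f"{n_loci:>6} loci: P(>=1 mutant molecule) = {p:.4f}")
```

Under these assumptions, the chance of observing any mutant molecule grows with the number of loci interrogated, which is the motivation for aggregating signal over long mutation lists.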

We present an algorithm designed to handle mutation lists of 100 to 10,000 mutated loci per patient, using custom capture panels, whole exome sequencing (WES) or shallow (low-coverage) whole genome sequencing (WGS) of plasma samples. The analytical method, INtegration of VAriant Reads (INVAR), aggregates reads carrying tumour mutations and uses a statistical model to assign confidence to error-suppressed reads based on mutation context, fragment length and tumour representation.

Workflow

INVAR requires error-suppressed BAM files, sample information, and patient-specific mutation lists as input. Pileups are performed at patient-specific loci, which are then annotated with trinucleotide context-specific error rates and with additional features such as tumour allele fraction and the fragment lengths of mutant and wild-type reads. Based on these features, the INVAR algorithm generates a score for each sample. Control samples are used to determine a threshold for the INVAR score.
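As an illustration of what a trinucleotide context-specific error rate is, the sketch below pools background reads by reference trinucleotide and alternate base. It is not the INVAR implementation: the row layout (`context`, `alt`, `alt_reads`, `depth`) and the function name are assumptions made for this example.

```python
from collections import defaultdict


def context_error_rates(pileup_rows):
    """Return {(trinucleotide, alt_base): error_rate} from background rows.

    Each row is a hypothetical dict with keys:
      'context'   reference trinucleotide centred on the locus, e.g. 'ACG'
      'alt'       alternate base observed
      'alt_reads' number of reads carrying the alternate base
      'depth'     total number of reads at the locus
    """
    alt_total = defaultdict(int)
    depth_total = defaultdict(int)
    for row in pileup_rows:
        key = (row["context"], row["alt"])
        alt_total[key] += row["alt_reads"]
        depth_total[key] += row["depth"]
    # Pooled rate per context: total alternate reads / total depth.
    return {key: alt_total[key] / depth
            for key, depth in depth_total.items() if depth > 0}


# Hypothetical background data from two loci sharing one context.
rows = [
    {"context": "ACG", "alt": "T", "alt_reads": 1, "depth": 1000},
    {"context": "ACG", "alt": "T", "alt_reads": 2, "depth": 2000},
    {"context": "TCA", "alt": "G", "alt_reads": 0, "depth": 500},
]
rates = context_error_rates(rows)
print(rates)
```

Pooling counts across loci that share a context gives a more stable estimate than a per-locus rate when individual loci have few background reads.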

Overview of methods

An overview of the pipeline is shown in Figure 1.

  • Run mpileup on the error-suppressed BAM files and combine the individual patient files into one cohort file
  • Split the data frame into on-target and off-target data using the provided mutation list, and calculate trinucleotide context-specific error rates
  • Annotate the data with additional information on the mutations, split the on-target data into patient-specific and non-patient-specific data based on the initial mutation list, and annotate both with the previously determined trinucleotide context-specific error rates
  • Generate size information for each of the fragments used in the analysis; this is later used to weight mutations
  • Apply patient-specific outlier suppression, which identifies noise based on knowledge of a large number of patient-specific loci
  • Generate an INVAR score for each sample using the features highlighted in Figure 2: trinucleotide error rate, fragment length, patient-specific outlier suppression, and tumour allelic fraction
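The splitting steps in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's own code: the observation fields (`patient`, `chrom`, `pos`, `ref`, `alt`), the layout of `mutation_lists`, and the category labels are all hypothetical.

```python
def classify_observation(obs, mutation_lists):
    """Label one pileup observation relative to the cohort's mutation lists.

    obs: hypothetical dict with 'patient', 'chrom', 'pos', 'ref', 'alt'.
    mutation_lists: hypothetical dict mapping patient ID to a set of
        (chrom, pos, ref, alt) tuples from that patient's mutation list.
    """
    locus = (obs["chrom"], obs["pos"], obs["ref"], obs["alt"])
    # Locus is in the mutation list of the patient the sample came from.
    if locus in mutation_lists.get(obs["patient"], set()):
        return "patient_specific"
    # Locus is on target, but only in another patient's mutation list.
    if any(locus in loci for loci in mutation_lists.values()):
        return "non_patient_specific"
    # Locus is in nobody's mutation list.
    return "off_target"


# Hypothetical cohort with two patients and one mutation each.
mutation_lists = {
    "P1": {("chr1", 100, "C", "T")},
    "P2": {("chr2", 200, "G", "A")},
}
obs = {"patient": "P1", "chrom": "chr2", "pos": 200, "ref": "G", "alt": "A"}
print(classify_observation(obs, mutation_lists))
```

Checking the sample's own patient first means a locus shared between two patients is still treated as patient-specific for each of them.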

Figures

Figure 1: Pipeline overview

3350331287-Flowchart.png

Integration of variant reads workflow. INVAR utilises plasma sequencing data and requires a list of patient-specific mutations, which may be derived from tumour or plasma sequencing. Filters are applied to the sequencing data, which is then split into three groups: patient-specific (locus belonging to that patient), non-patient-specific (locus not belonging to that patient), and near-target (bases within 10 bp of all patient-specific loci). Patient-specific and non-patient-specific data are then annotated with features that influence the probability of observing a real mutation. Outlier suppression is applied to identify mutant signal inconsistent with the overall level of patient-specific signal. Next, signal is aggregated across all loci, taking into account the annotated features, to generate an INVAR score per sample. Using non-patient-specific samples, an INVAR score threshold is determined by ROC analysis for each cohort.
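To illustrate the final thresholding step, the sketch below picks a score threshold from a ROC-style sweep over patient-specific and non-patient-specific scores. The selection criterion used here (maximising Youden's J, sensitivity + specificity - 1) is an assumption for the example; the criterion INVAR applies is not stated on this page, and the function name and scores are hypothetical.

```python
def roc_threshold(case_scores, control_scores):
    """Return (threshold, J) maximising Youden's J over observed scores.

    case_scores: scores from patient-specific data (expected positives).
    control_scores: scores from non-patient-specific data (expected negatives).
    A sample is called positive when its score is >= threshold.
    """
    candidates = sorted(set(case_scores) | set(control_scores))
    best_threshold, best_j = None, -1.0
    for t in candidates:
        sensitivity = sum(s >= t for s in case_scores) / len(case_scores)
        specificity = sum(s < t for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        # Strict '>' keeps the lowest threshold among ties.
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical scores for one cohort.
threshold, j = roc_threshold([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
print(threshold, j)
```

In practice a cohort may prefer a fixed, high specificity over a balanced criterion such as Youden's J; the sweep over candidate thresholds stays the same and only the selection rule changes.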

Figure 2: Schematic of INVAR algorithm

INVAR_schematic_simple.png

Integration of variant reads. To overcome sampling error, signal was aggregated across hundreds to thousands of mutations. Here we classify samples (rather than mutations) as being significantly positive for ctDNA, or non-detected. Reads from a patient’s sample that overlap loci in the patient-specific mutation list are indicated as ‘patient-specific’, whereas reads overlapping the same loci in other patients are indicated as ‘non-patient-specific’. INVAR also incorporates additional sequencing information on fragment length and tumour allelic fraction, to enhance detection.
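The idea of aggregating signal across loci can be sketched with a simple weighted average. This is a deliberately simplified stand-in, not INVAR's statistical model: it computes a background-subtracted mutant allele fraction in which each locus is weighted by depth and tumour allele fraction, and it leaves out fragment length and outlier suppression. All field names are hypothetical.

```python
def aggregate_signal(loci):
    """Background-subtracted, weighted mutant allele fraction across loci.

    loci: iterable of hypothetical dicts with keys
      'mutant_reads', 'depth', 'error_rate' (context-specific background)
      and 'tumour_af' (allele fraction of the mutation in the tumour).
    """
    weighted_excess = 0.0
    total_weight = 0.0
    for locus in loci:
        if locus["depth"] == 0:
            continue  # no reads, so the locus carries no information
        observed_af = locus["mutant_reads"] / locus["depth"]
        # Loci with more reads and higher tumour representation count more.
        weight = locus["depth"] * locus["tumour_af"]
        weighted_excess += weight * (observed_af - locus["error_rate"])
        total_weight += weight
    if total_weight == 0:
        return 0.0
    # Signal below background is reported as zero.
    return max(weighted_excess / total_weight, 0.0)


# Hypothetical sample with one mutant read spread over three loci.
loci = [
    {"mutant_reads": 1, "depth": 1000, "error_rate": 1e-4, "tumour_af": 0.4},
    {"mutant_reads": 0, "depth": 1200, "error_rate": 1e-4, "tumour_af": 0.2},
    {"mutant_reads": 0, "depth": 800, "error_rate": 5e-5, "tumour_af": 0.1},
]
print(aggregate_signal(loci))
```

The point of the sketch is that a sample-level value can be informative even when most individual loci show no mutant reads.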

References

1. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).

2. Siravegna, G., Marsoni, S., Siena, S. & Bardelli, A. Integrating liquid biopsies into the management of cancer. Nat. Rev. Clin. Oncol. (2017). doi:10.1038/nrclinonc.2017.14

3. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).

4. Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).

5. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

6. Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).

7. Forshew, T. et al. Noninvasive Identification and Monitoring of Cancer Mutations by Targeted Deep Sequencing of Plasma DNA. Sci. Transl. Med. 4, 136ra68 (2012).
