Input

The PCGR workflow accepts two types of input files:

  • An unannotated, single-sample VCF file (>= v4.2) with called somatic variants (SNVs/InDels)
  • A copy number segment file

PCGR can be run with either of the two input files alone, or with both.

VCF

  • We strongly recommend that the input VCF is compressed with bgzip and indexed with tabix
  • Multi-allelic sites in the input VCF will be decomposed into separate records
  • Variants intended for reporting should be marked as ‘PASS’ in the VCF FILTER column
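
The three recommendations above can be checked with a small script before a run. The sketch below uses only the Python standard library, and is_bgzf, has_tabix_index and summarize_vcf are hypothetical helpers, not part of PCGR. The BGZF test assumes the block header layout from the SAM/BAM format specification, where bgzip writes a 'BC' extra subfield, and the index check assumes tabix's default <file>.tbi naming. The multi-allelic count only reports sites that PCGR would decompose; the sketch does not decompose them itself.

```python
import gzip
from dataclasses import dataclass
from pathlib import Path

# gzip ID1, ID2, CM=8 (deflate), FLG=FEXTRA: the fixed start of a BGZF block
BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header (as written by bgzip)."""
    with open(path, "rb") as fh:
        header = fh.read(18)
    # bgzip writes the 'BC' subfield identifier at byte offsets 12-13 of the block header
    return len(header) == 18 and header[:4] == BGZF_MAGIC and header[12:14] == b"BC"


def has_tabix_index(path):
    """Assumes the default tabix index name <file>.tbi (CSI indices use .csi instead)."""
    return Path(str(path) + ".tbi").exists()


@dataclass
class VcfSummary:
    records: int = 0
    passed: int = 0
    multiallelic: int = 0


def summarize_vcf(path):
    """Count data records, FILTER=PASS records and multi-allelic records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    summary = VcfSummary()
    with opener(path, "rt") as fh:  # gzip.open also reads BGZF (multi-member gzip)
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            fields = line.rstrip("\n").split("\t")
            summary.records += 1
            # ALT is column 5 (index 4); FILTER is column 7 (index 6)
            if fields[6] == "PASS":
                summary.passed += 1
            if "," in fields[4]:
                summary.multiallelic += 1
    return summary
```

For example, summarize_vcf("tumor.vcf.gz") (a placeholder file name) returns a VcfSummary whose passed field shows how many variants would be eligible for reporting.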

IMPORTANT NOTE 1: Across the VCF output of the many somatic SNV/InDel callers that have been developed, we have experienced a general lack of uniformity and robustness in how somatic variant genotype data (e.g. tumor/normal variant allelic depths, genotype quality) are represented. Genotype data stored as INFO tags in the input VCF can therefore be specified through optional arguments to the PCGR workflow, and can then be used for interactive exploration in the tumor report.
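
Before passing such tags to PCGR, it can help to confirm that they are actually populated in the caller's output. The following is a minimal Python sketch, not part of PCGR: parse_info and genotype_values are hypothetical helpers, and the tag names TDP, TAF and NDP used below are placeholders for whatever tags your caller emits. The corresponding PCGR argument names are not shown here; consult the PCGR command-line help for those.

```python
def parse_info(info_field):
    """Parse a VCF INFO column ('KEY=VAL;FLAG;...') into a dict; flags map to True."""
    info = {}
    if info_field == ".":
        return info  # '.' means the INFO column is empty
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def genotype_values(vcf_line, tags):
    """Return {tag: raw string value, or None if absent} for one VCF data line."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])  # INFO is column 8
    return {tag: info.get(tag) for tag in tags}


line = "1\t100\t.\tA\tG\t.\tPASS\tTDP=57;TAF=0.21;SOMATIC\n"
print(genotype_values(line, ["TDP", "TAF", "NDP"]))
```

A None value for a tag on many records suggests that the tag is missing or named differently in that caller's output.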

IMPORTANT NOTE 2: PCGR generates a number of VCF INFO annotation tags that are appended to the query VCF. We therefore encourage users to submit query VCF files that have not been annotated by other means, i.e. VCF files that come directly from variant calling. Otherwise, the query VCF is likely to contain INFO tags that coincide with those produced by PCGR.
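
To check for such overlaps in advance, the INFO tags declared in the VCF header can be listed and compared against a set of tag names. In this minimal Python sketch, declared_info_tags and clashing_tags are hypothetical helpers, and the reserved set is a placeholder that the user must fill with the tags PCGR actually writes; the sketch itself does not know PCGR's tag list.

```python
import gzip
import re

# Matches the ID of a header line such as ##INFO=<ID=DP,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_tags(path):
    """Return the set of INFO IDs declared in the VCF meta-information header."""
    opener = gzip.open if str(path).endswith(".gz") else open
    tags = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # meta-information lines end at the #CHROM header line
            match = INFO_ID.match(line)
            if match:
                tags.add(match.group(1))
    return tags


def clashing_tags(path, reserved):
    """Return, sorted, the declared INFO tags that also appear in `reserved`."""
    return sorted(declared_info_tags(path) & set(reserved))
```

For example, clashing_tags("query.vcf.gz", {"CSQ", "SYMBOL"}) uses a placeholder file name and placeholder tag names; a non-empty result indicates that the query VCF has likely been annotated already.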

Copy number segments

The tab-separated values file with copy number aberrations MUST contain the following four columns:

  • Chromosome
  • Start
  • End
  • Segment_Mean

Here, Chromosome, Start, and End denote the chromosomal segment, and Segment_Mean denotes the log2 ratio for that segment, which is a common output of somatic copy number alteration callers. Note that coordinates must be one-based (i.e. the first position of a chromosome is 1, not 0). The first lines of a copy number segment file formatted according to PCGR’s requirements are shown below:

Chromosome	Start	End	Segment_Mean
1	3218329	3550598	0.0024
1	3552451	4593614	0.1995
1	4593663	6433129	-1.0277
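
A segment file can be checked against these requirements before it is passed to PCGR. The following minimal Python sketch (validate_segments and copy_ratio are hypothetical helpers, not part of PCGR) verifies the four required columns and the one-based Start coordinate; the End >= Start check is an extra sanity check assumed here, not a requirement stated above. copy_ratio simply inverts the log2 transform (ratio = 2^Segment_Mean), so the Segment_Mean of -1.0277 in the example corresponds to a copy ratio of roughly 0.49.

```python
import csv
import io

REQUIRED_COLUMNS = ("Chromosome", "Start", "End", "Segment_Mean")


def validate_segments(text):
    """Return a list of problems found in a tab-separated copy number segment file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            start, end = int(row["Start"]), int(row["End"])
            float(row["Segment_Mean"])
        except (TypeError, ValueError):  # TypeError: short row filled with None
            problems.append(f"line {lineno}: non-numeric Start/End/Segment_Mean")
            continue
        if start < 1:
            problems.append(f"line {lineno}: Start must be >= 1 (one-based coordinates)")
        if end < start:
            problems.append(f"line {lineno}: End precedes Start")
    return problems


def copy_ratio(segment_mean):
    """Undo the log2 transform: Segment_Mean = log2(ratio), so ratio = 2 ** Segment_Mean."""
    return 2 ** segment_mean
```

Usage: with open("segments.tsv") as fh: print(validate_segments(fh.read())), where segments.tsv is a placeholder name. A header separated by spaces rather than tabs (as can happen when a file is copied from a web page) is reported as missing all four columns.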

CPSR report

The results of a CPSR analysis (https://github.com/sigven/cpsr) can be fed into PCGR through the --cpsr_report option, which adds a dedicated germline section to the PCGR report. The input here should be the compressed JSON file that is output by CPSR.
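
Because --cpsr_report expects a compressed JSON file, a quick way to confirm that a file is gzip-compressed, valid JSON is simply to load it. The sketch below (load_cpsr_json is a hypothetical helper) makes no assumptions about the internal structure of CPSR's JSON and does not verify that the file actually came from CPSR.

```python
import gzip
import json


def load_cpsr_json(path):
    """Open a gzip-compressed JSON file and return the parsed object.

    Raises gzip.BadGzipFile (a subclass of OSError) if the file is not
    gzip-compressed, or json.JSONDecodeError if the content is not valid JSON.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, load_cpsr_json("sample.cpsr.json.gz") (a placeholder file name) fails immediately on an uncompressed .json file, which would otherwise be a likely mistake when supplying --cpsr_report.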