CHANGELOG

0.9.2 - June 30th 2021

  • Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL, Disease Ontology/EFO, Open Targets Platform, UniProt KB, GENCODE
  • Software upgrades: R v4.1, Bioconductor v3.13, VEP (104) ++

Changed

  • TOML-based configuration for PCGR is abandoned; all options to PCGR are now configured through command-line parameters
    • NOTE: We recommend turning on --show_noncoding and --vcf2maf (previously turned on by default in TOML). For tumor-only runs, we recommend including --exclude_dbsnp_nonsomatic and --exclude_nonexonic
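
As a rough illustration of the recommendation in this note, the Python sketch below collects the suggested optional flags. The helper name recommended_flags is hypothetical, and PCGR's required arguments (input files, sample identifier and so on) are deliberately left out because they are not described here.

```python
from typing import List


def recommended_flags(tumor_only: bool) -> List[str]:
    """Return the optional flags recommended in the 0.9.2 notes.

    Hypothetical helper: it only collects flag names mentioned in this
    changelog and knows nothing about PCGR's required arguments.
    """
    # Options that were switched on by default in the old TOML setup.
    flags = ["--show_noncoding", "--vcf2maf"]
    if tumor_only:
        # Additional filters suggested for tumor-only runs.
        flags += ["--exclude_dbsnp_nonsomatic", "--exclude_nonexonic"]
    return flags
```

The returned list can be appended to whatever argument list a wrapper script already builds for a run.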

Added

  • Command-line options
    • Previously set in TOML file
      • Allelic support
        • --tumor_dp_tag
        • --tumor_af_tag
        • --control_dp_tag
        • --control_af_tag
        • --call_conf_tag
      • Tumor-only options
        • --maf_onekg_eur
        • --maf_onekg_amr
        • --maf_onekg_afr
        • --maf_onekg_eas
        • --maf_onekg_sas
        • --maf_onekg_global
        • --maf_gnomad_nfe
        • --maf_gnomad_asj
        • --maf_gnomad_fin
        • --maf_gnomad_oth
        • --maf_gnomad_amr
        • --maf_gnomad_afr
        • --maf_gnomad_eas
        • --maf_gnomad_sas
        • --maf_gnomad_global
        • --exclude_pon
        • --exclude_likely_het_germline
        • --exclude_likely_hom_germline
        • --exclude_dbsnp_nonsomatic
        • --exclude_nonexonic
      • --report_theme
      • --preserved_info_tags (previously custom_tags (TOML))
      • --show_noncoding (previously list_noncoding (TOML))
      • --vcfanno_n_proc (previously n_vcfanno_proc (TOML))
      • --vep_n_forks (previously n_vep_forks (TOML))
      • --vep_pick_order
      • --vep_no_intergenic (previously vep_skip_intergenic (TOML))
      • --vcf2maf
    • New options
      • --report_nonfloating_toc (NEW) - add the TOC at the top of the HTML report, not floating at the left of the document
      • --cpsr_report (NEW) - add a dedicated section in PCGR with main germline findings from CPSR analysis - (use the gzipped JSON output from CPSR as input)
      • --vep_regulatory (NEW) - append regulatory annotations to variants (TF binding sites etc.)
      • --include_artefact_signatures (NEW) - include sequencing artefacts in the reference collection of mutational signatures (COSMIC v3.2)

Fixed

  • Bug in writing (large) report contents to JSON (issue #118)
  • Bug (typo) in merge of clinical evidence items from different sources (CIVIC + CGI) (issue #126)
  • Bug in value box for number of (high-confidence) kataegis events - rmarkdown (issue #122)
  • Bug in value box for tumor purity/ploidy - rmarkdown (issue #129)

Removed

  • Command-line options
    • --conf - TOML-based configuration file

0.9.1 - November 30th 2020

  • Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL/DGIdb, Disease Ontology, Experimental Factor Ontology

Added

  • Added possibility to configure the algorithm for TMB calculation through the optional argument tmb_algorithm - all coding variants (all_coding) or non-synonymous variants only (nonsyn)
  • R code subject to static analysis with lintr
  • Improved Conda recipe (i.e. meta.yaml) with version pinning of all package dependencies
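
To make the two tmb_algorithm choices concrete, here is a minimal Python sketch that computes TMB as mutations per megabase. The function name, the consequence labels and the membership of each set are assumptions for illustration; this changelog does not state which consequence types PCGR counts for each algorithm.

```python
from typing import Iterable

# Assumed consequence labels, for illustration only. The exact variant
# classes PCGR counts for each algorithm are not given in this changelog.
NONSYN = {"missense_variant", "stop_gained", "frameshift_variant"}
ALL_CODING = NONSYN | {"synonymous_variant"}


def estimate_tmb(consequences: Iterable[str], target_size_mb: float,
                 tmb_algorithm: str) -> float:
    """Return mutations per megabase for the chosen variant class."""
    if tmb_algorithm not in ("all_coding", "nonsyn"):
        raise ValueError("tmb_algorithm must be 'all_coding' or 'nonsyn'")
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    counted = ALL_CODING if tmb_algorithm == "all_coding" else NONSYN
    # Count only variants whose consequence falls in the selected class.
    n_mutations = sum(1 for c in consequences if c in counted)
    return n_mutations / target_size_mb
```

With the same callset, nonsyn can never give a higher estimate than all_coding, since it counts a subset of the same variants over the same target size.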

Changed

  • Removed DisGeNET annotations from output (associations from Open Targets Platform serve same purpose)
  • Version pinning of software dependencies in Dockerfile:
    • All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
    • Other tools/utilities and Python libraries that have been version pinned:
      • bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas

0.9.0rc - September 24th 2020

  • Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform
  • Software updates: VEP 101

Fixed

  • An extra comma was mistakenly present in the template for tier 2 variants, issue #96
  • Missing protein domain annotations for grch38, issue #116

Changed

  • All arguments to pcgr.py are now non-positional
  • Arguments to pcgr.py are divided into two groups: required and optional
  • Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments --tumor_dp_min, --tumor_af_min, --control_dp_min, --control_af_max in cpsr.py
  • Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument --estimate_tmb in pcgr.py
  • Option msi:msi in PCGR configuration file is now optional argument --estimate_msi_status in pcgr.py
  • Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument --estimate_signatures in pcgr.py
  • Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
  • Optional argument --cna_overlap_pct in pcgr.py replaces cna:cna_overlap_pct in PCGR configuration file
  • Optional argument --logr_gain in pcgr.py replaces cna:logr_gain in PCGR configuration file
  • Optional argument --logr_homdel in pcgr.py replaces cna:logr_homdel in PCGR configuration file
  • Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
  • Classifications of genes as tumor suppressors/oncogenes are now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
  • Settings section of report is now divided into three:
    • Metadata - sample and sequencing assay
    • Report configuration
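
The option renames listed under "Changed" for this release can be captured in a small lookup table. The Python sketch below is a hypothetical migration helper, not part of PCGR: it takes old configuration settings as a nested dictionary and returns the matching command-line option names with their old values. How each value should be passed on the command line (switch versus option with value) is not specified here and is left to the caller.

```python
from typing import Any, Dict, List, Tuple

# Old "section:key" names mapped to their replacement options, as listed
# in this release's "Changed" notes.
TOML_TO_CLI = {
    "allelic_support:tumor_dp_min": "--tumor_dp_min",
    "allelic_support:tumor_af_min": "--tumor_af_min",
    "allelic_support:control_dp_min": "--control_dp_min",
    "allelic_support:control_af_max": "--control_af_max",
    "mutational_burden:mutational_burden": "--estimate_tmb",
    "msi:msi": "--estimate_msi_status",
    "mutational_signatures:mutational_signatures": "--estimate_signatures",
    "cna:cna_overlap_pct": "--cna_overlap_pct",
    "cna:logr_gain": "--logr_gain",
    "cna:logr_homdel": "--logr_homdel",
}


def translate(old_config: Dict[str, Dict[str, Any]]
              ) -> Tuple[List[Tuple[str, Any]], List[str]]:
    """Split old settings into (option, value) pairs and unmapped keys."""
    mapped: List[Tuple[str, Any]] = []
    unmapped: List[str] = []
    for section, options in old_config.items():
        for key, value in options.items():
            name = f"{section}:{key}"
            if name in TOML_TO_CLI:
                mapped.append((TOML_TO_CLI[name], value))
            else:
                # Removed or otherwise unlisted settings end up here.
                unmapped.append(name)
    return mapped, unmapped
```

Settings that were removed in this release, such as mutational_burden:tmb_low_limit, show up in the unmapped list so they can be reviewed by hand.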

Added

  • Optional argument --include_trials in pcgr.py - includes a section with annotated clinical trials for the tumor type in question
  • Optional argument --assay in pcgr.py - designates type of sequencing assay
  • Optional argument --cell_line in pcgr.py - designates runs of tumor cell lines (only for display, not used to configure any analysis)
  • Optional argument --min_mutations_signatures in pcgr.py - minimum number of required mutations for mutational signature analysis with MutationalPatterns
  • Optional argument --all_reference_signatures in pcgr.py - considers all reference signatures during fitting of mutational profile to known signatures
  • Optional argument --estimate_signatures now also includes detection of potential kataegis events (WGS/WES assays only), and rainfall plot in the flexdashboard output
  • The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
  • All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
  • For copy number amplifications, other putative drug targets in cancer are listed in a new section
  • Detailed documentation of report contents is added to the Documentation section
  • References are updated and all provided with DOI

0.8.4 - November 18th 2019

  • Data updates: ClinVar, CIViC, CancerMine, UniProt KB
  • Software updates: VEP 98.3

0.8.3 - October 14th 2019

  • Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine
  • Software updates: VEP 98.2, vcf2tsv

Fixed

  • Further improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)

Added

  • Possibility to filter evidence items by RATING in interactive data tables

Changed

  • Option target_size_mb in pcgr.py replaces target_size_mb in configuration file, more convenient in terms of configuring runs
  • Option tumor_type in pcgr.py replaces tumor_type in configuration file

0.8.2 - Sep 29th 2019

  • Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB
  • Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3

Fixed

  • Bug in concatenation of clinical evidence items from different sources (CIVIC + CBMDB) (issues #83,#87)
  • Silent variants that coincide with biomarkers reported at codon level are ignored
  • Distinction between clinical evidence items of different origins (somatic + germline)
  • Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
  • Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets

Added

  • New field ‘mane’ as criterion for pick order in configuration file (VEP)
  • Sample identifier to copy number annotation output (convenient for concatenation of output from multiple samples)
  • Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
  • Option tumor_only in pcgr.py, replaces vcf_tumor_only in configuration file, more convenient in terms of configuration

0.8.1 - May 22nd 2019

Added

  • Cancer_NOS.toml as configuration file for unspecified tumor types

0.8.0 - May 20th 2019

Fixed

  • Bug in value box for Tier 2 variants (new line carriage) Issue #73

Added

  • Upgraded VEP to v96
    • Skipping the --regulatory VEP option to avoid forking issues and to improve speed (See this issue)
    • Added option to configure pick-order for choice of primary transcript in configuration file
  • Pre-made configuration files for each tumor type in conf folder
  • Possibility to append a CNA plot file (.png format) to the section of the report with Somatic CNAs (previous feature request)
  • Added possibility to input estimates of tumor purity and ploidy
    • shown as value boxes in Main results
  • Tumor mutational burden is now compared with the distribution of TMB observed for TCGA’s cohorts (organized by primary site)
    • Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
  • Added flexibility for variant filtering in tumor-only input callsets
    • Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
      • exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%) - very unlikely somatic event
      • exclude_likely_het_germline - removes any variant with
        • an allelic fraction between 0.4 and 0.6, and
        • presence in dbSNP + gnomAD, and
        • no presence as somatic event in COSMIC/TCGA
    • Added possibility to input PANEL-OF-NORMALS VCF - this is to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is utilized in PCGR to improve the variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
      • all variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are annotated with a PANEL_OF_NORMALS flag in the query VCF with tumor variants.
      • If configuration parameter exclude_pon is set to True in tumor_only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
  • For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
  • Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
  • Added annotation of TCGA’s ten oncogenic signaling pathways
  • Added EXONIC_STATUS annotation tag (VCF and TSV)
    • exonic denotes all protein-altering AND canonical splice site altering AND synonymous variants, nonexonic denotes the complement
  • Added CODING_STATUS annotation tag (VCF and TSV)
    • coding denotes all protein-altering AND canonical splice site altering variants, noncoding denotes the complement
  • Added SYMBOL_ENTREZ annotation tag (VCF)
    • Official gene symbol from NCBI Entrez (SYMBOL provided by VEP can sometimes be non-official/alias (e.g. for GENCODE v19/grch37))
  • Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
    • Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
  • Added WINMASKER_HIT annotation tag (VCF and TSV)
    • Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
  • Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
    • Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
  • Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and TSV)
    • Associations between protein targets and disease based on multiple lines of evidence (mutations,affected pathways,GWAS, literature etc). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
  • Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF and TSV)
    • Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
  • Added OPENTARGETS_TRACTABILITY_ANTIBODY annotation tag (VCF and TSV)
    • Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
  • Added CLINVAR_REVIEW_STATUS_STARS annotation tag
    • Rating of the ClinVar variant (0-4 stars) with respect to level of review
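
The tumor-only germline filters added in this release (exclude_likely_hom_germline, exclude_likely_het_germline and exclude_pon) can be sketched in Python as below. The Variant record and its field names are hypothetical. Treating the 0.4 to 0.6 range as inclusive, and reading "dbSNP + gnomAD" as presence in both resources, are assumptions made for the example rather than documented PCGR behavior.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative, not PCGR's own.
    tumor_af: float                  # allelic fraction in the tumor sample
    in_dbsnp: bool
    in_gnomad: bool
    somatic_in_cosmic_or_tcga: bool  # seen as a somatic event in COSMIC/TCGA
    panel_of_normals: bool           # carries the PANEL_OF_NORMALS flag


def is_excluded(v: Variant,
                exclude_likely_hom_germline: bool = False,
                exclude_likely_het_germline: bool = False,
                exclude_pon: bool = False) -> bool:
    """Return True if any enabled tumor-only filter removes the variant."""
    # Allelic fraction of 1 (100%): very unlikely to be a somatic event.
    if exclude_likely_hom_germline and v.tumor_af == 1.0:
        return True
    # Likely heterozygous germline: all three conditions must hold.
    # Inclusive bounds are an assumption of this sketch.
    if (exclude_likely_het_germline
            and 0.4 <= v.tumor_af <= 0.6
            and v.in_dbsnp and v.in_gnomad
            and not v.somatic_in_cosmic_or_tcga):
        return True
    # Variant coincides with an entry in the panel-of-normals VCF.
    if exclude_pon and v.panel_of_normals:
        return True
    return False
```

Each filter is opt-in in this sketch, so a variant is kept unless at least one enabled filter matches it.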

Changed

Removed

  • Original tier model ‘pcgr’

0.7.0 - Nov 27th 2018

Fixed

  • Bug in assignment of variants to tier1/tier2 Issue #61
  • Missing config option for maf_gnomad_asj in TOML file (also setting operator to <=) Issue #60
  • Bug in new CancerMine oncogene/tumor suppressor annotation Issue #53
  • vcfanno fix for empty Description (upgrade to vcfanno v0.3.1 Issue #49)
  • Bug in message showing too few variants for MSI prediction, Issue #55
  • Bug in appending of custom VCF tags
    • Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
  • Bug in SCNA value box display for multiple copy number hits (Issue #47)
  • Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
  • Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
  • Stripped off HTML elements (TCGA_FREQUENCY, DBSNP) in TSV output
  • Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
  • Removed ‘COSM’ prefix in COSMIC mutation links
  • Bug in retrieval of splice site predictions from dbscSNV

Added

  • Possibility to run PCGR in a non-Docker environment (e.g. using the --no-docker option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
    • Added possibility to add docker user-id
  • Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
  • Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
  • Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
  • Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
  • Individual entries/columns for variant effect predictions:
    • Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
    • Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
  • Upgraded samtools to v1.9 (makes vcf2maf work properly)
  • Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
  • Added for future implementation:
    • SeqKat + karyoploteR for exploration of kataegis/hypermutation
    • CELLector - genomics-guided selection of cancer cell lines
  • Upgraded VEP to v94

Changed

  • Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
  • Moved from TSGene 2.0 to CancerMine for annotation of tumor suppressor genes and proto-oncogenes
    • A minimum of n=3 citations were required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine

0.6.2.1 - May 14th 2018

Fixed

  • Bug in copy number annotation (broad/focal)

0.6.2 - May 9th 2018

Fixed

  • Bug in copy number segment display (missing variable initialization, Issue #34)
  • Typo in gnomAD filter statistic (fraction, Issue #31)
  • Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
  • Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
  • Removed ‘Noncoding mutations’ section when no input VCF is present
  • Bug in annotation of copy number event type (focal/broad)
  • Bug in copy number annotation (missing protein-coding transcripts)
  • Updated MSI prediction (variable importance, performance measures)

Added

  • Genome assembly is appended to every output file
  • Issue warning for copy number segment that goes beyond chromosomal lengths of specified assembly (segments will be skipped)
  • Added missing subtypes for ‘Skin_Cancer_NOS’ in the cancer phenotype dataset

0.6.1 - May 2nd 2018

Fixed

  • Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
  • Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
  • Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
  • Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)

0.6.0 - April 25th 2018

Added

  • New argument in pcgr.py
    • assembly (grch37/grch38)
  • New option in pcgr.py
    • --basic - run comprehensive VCF annotation only, skip report generation and additional analyses
  • New sections in HTML report
    • Settings and annotation sources - now also listing key PCGR configuration settings
    • Main findings - Six value boxes indicating the main findings of clinical relevance
  • New configuration options
    • [tier_model](string) - choice between pcgr_acmg and pcgr
    • [mutational_burden] - set TMB tertile limits
      • tmb_low_limit (float)
      • tmb_intermediate_limit (float)
    • [tumor_type] - choose between 34 tumor types/classes:
      • Adrenal_Gland_Cancer_NOS (logical)
      • Ampullary_Carcinoma_NOS (logical)
      • Biliary_Tract_Cancer_NOS (logical)
      • Bladder_Urinary_Tract_Cancer_NOS (logical)
      • Blood_Cancer_NOS (logical)
      • Bone_Cancer_NOS (logical)
      • Breast_Cancer_NOS (logical)
      • CNS_Brain_Cancer_NOS (logical)
      • Colorectal_Cancer_NOS (logical)
      • Cervical_Cancer_NOS (logical)
      • Esophageal_Stomach_Cancer_NOS (logical)
      • Head_And_Neck_Cancer_NOS (logical)
      • Hereditary_Cancer_NOS (logical)
      • Kidney_Cancer_NOS (logical)
      • Leukemia_NOS (logical)
      • Liver_Cancer_NOS (logical)
      • Lung_Cancer_NOS (logical)
      • Lymphoma_Hodgkin_NOS (logical)
      • Lymphoma_Non_Hodgkin_NOS (logical)
      • Ovarian_Fallopian_Tube_Cancer_NOS (logical)
      • Pancreatic_Cancer_NOS (logical)
      • Penile_Cancer_NOS (logical)
      • Peripheral_Nervous_System_Cancer_NOS (logical)
      • Peritoneal_Cancer_NOS (logical)
      • Pleural_Cancer_NOS (logical)
      • Prostate_Cancer_NOS (logical)
      • Skin_Cancer_NOS (logical)
      • Soft_Tissue_Cancer_NOS (logical)
      • Stomach_Cancer_NOS (logical)
      • Testicular_Cancer_NOS (logical)
      • Thymic_Cancer_NOS (logical)
      • Thyroid_Cancer_NOS (logical)
      • Uterine_Cancer_NOS (logical)
      • Vulvar_Vaginal_Cancer_NOS (logical)
    • [mutational_signatures]
      • mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
    • [cna]
      • transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
    • [allelic_support]
      • If input VCF has correctly formatted depth/allelic fraction as INFO tags, users can add thresholds on depth/support that are applied prior to report generation
        • tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
        • tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
        • normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
        • normal_af_max (float) - maximum allelic fraction for variant in normal sample
    • [visual]
      • report_theme (string) - visual theme of report (Bootstrap)
    • [other]
      • vcf_validation (logical) - keep/skip VCF validation by vcf-validator
  • New output file - JSON output of HTML report content
  • New INFO tags of PCGR-annotated VCF
    • CANCER_PREDISPOSITION
    • PFAM_DOMAIN
    • TCGA_FREQUENCY
    • TCGA_PANCANCER_COUNT
    • ICGC_PCAWG_OCCURRENCE
    • ICGC_PCAWG_AFFECTED_DONORS
    • CLINVAR_MEDGEN_CUI
  • New column entries in annotated SNV/InDel TSV file:
    • CANCER_PREDISPOSITION
    • ICGC_PCAWG_OCCURRENCE
    • TCGA_FREQUENCY
  • New column in CNA output
    • TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
    • MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
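
The [allelic_support] thresholds introduced in this release (tumor_dp_min, tumor_af_min, normal_dp_min, normal_af_max) amount to a simple per-variant check. The Python sketch below shows one way to express it. The function name, the permissive default values, and the choice to skip normal-sample checks when no normal values are available are all assumptions, not PCGR's documented behavior.

```python
from typing import Optional


def passes_allelic_support(tumor_dp: int, tumor_af: float,
                           normal_dp: Optional[int] = None,
                           normal_af: Optional[float] = None,
                           tumor_dp_min: int = 0,
                           tumor_af_min: float = 0.0,
                           normal_dp_min: int = 0,
                           normal_af_max: float = 1.0) -> bool:
    """Return True if a variant meets the depth/allelic fraction thresholds.

    Defaults are chosen so that no filtering happens unless a threshold
    is set explicitly (an assumption of this sketch).
    """
    # Tumor sample: require minimum depth and minimum allelic fraction.
    if tumor_dp < tumor_dp_min or tumor_af < tumor_af_min:
        return False
    # Normal sample: require minimum depth, if a depth is available.
    if normal_dp is not None and normal_dp < normal_dp_min:
        return False
    # Normal sample: cap the allelic fraction, if one is available.
    if normal_af is not None and normal_af > normal_af_max:
        return False
    return True
```

Applied to each variant before report generation, a check like this keeps only calls with sufficient tumor support and limited evidence in the normal sample.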

Removed

  • Elements of databundle (now annotated directly through VEP):
    • dbsnp
    • gnomad/exac
    • 1000G project
  • INFO tags of PCGR-annotated VCF
    • DBSNPBUILDID
    • DBSNP_VALIDATION
    • DBSNP_SUBMISSIONS
    • DBSNP_MAPPINGSTATUS
    • GWAS_CATALOG_PMID
    • GWAS_CATALOG_TRAIT_URI
    • DOCM_DISEASE
  • Output files
    • TSV files with mutational signature results and biomarkers (i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and sample_id.pcgr.mutational_signatures.tsv)
      • Data can still be retrieved - now from the JSON dump
    • MAF file
      • The previous MAF output was generated in a custom fashion, a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release

Changed

  • HTML report sections
    • Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
    • Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
    • Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)