Difference between revisions of "PGx in Estonia"
Latest revision as of 08:47, 27 August 2018
The Estonian Genome Centre at the University of Tartu has done considerable work in the study "Translating genotype data of 44,000 biobank participants into clinical pharmacogenetic recommendations".
Bioinformatic pipelines
Technology | Methods | Comments |
---|---|---|
High density microarrays | HumanOmniExpress beadchip (OMNI, 8132 patients) and Global Screening Array (GSA, Illumina, 33157 patients), GenomeStudio (Illumina, genotyping, filtering for GSA), PLINK (filtering for all), zCall (genotyping rare variants for GSA), Eagle2 (phasing), Beagle (imputation, population-specific imputation panel from WGS) | 1308 of these patients were also whole-genome sequenced |
Whole genome sequencing | TruSeq PCR-free prep, Illumina HiSeq X (150bp paired-end, 30x mean coverage), BWA-MEM (GRCh37 reference genome), Picard (mark PCR duplicates), GATK 3.4, bcftools (normalization and decomposition), Genome STRiP (CNV calls for CYP2D6, 2269 patients), Astrolabe (allele matching for CYP2D6, for comparison) | Quality filtering parameters are given in the article. The WGS samples (with some modifications) were also merged into a reference panel used for imputation (total 2279 Estonians and 1856 Finns). Cf. Mitt et al. |
Whole exome sequencing | Agilent SureSelect Human All Exon V5+UTRs target capture kit, HiSeq2500 (67x mean coverage), BWA-MEM (GRCh37 reference genome), Picard (mark PCR duplicates), GATK 3.4, bcftools (normalization and decomposition) |
Challenges and solutions
Challenge | Solution | Comments |
---|---|---|
Allele definition | Pruning of allele definitions: removing variants from allele definitions (i.e. keeping only variants that destroy the protein) and removing alleles with unknown function | Allele pruning also makes it more likely that patients are assigned a normal phenotype (instead of an unknown phenotype), since it removes most sources of alleles with unknown function |
HLA-typing | SNP2HLA tool (WGS only) | A review of HLA-typing methods from Bauer et al. does not mention this tool, but SNP2HLA is provided by the Broad Institute, so it is presumably reliable. |
Multiple allele matches | Made a hierarchy of alleles based on biochemical function (No function > Decreased Function > Other functional statuses) | This can probably be seen as a variant of the best solution to the unknown function problem: look for the most serious consequence and, if no allele with a serious consequence is found, assume Normal function. Where more than one star allele matched a haplotype, they enumerated all possible star allele diplotypes and picked the diplotype with the most serious clinical consequence |
Haplotype calling | Haplotype estimation for WGS was performed, but it is unclear which method was used. The methodology is probably similar to that used in Mitt et al., in which case they used SHAPEIT2. Otherwise they may have used Eagle2 (as for the microarray data), which is 6 times faster. | In general, the difference between haplotyping and PGx allele matching is not clear (it may be right to say that PGx allele matching is a subset of general haplotyping). |
CYP2D6 calling | Combination of Genome STRiP and normal allele matching (compares favorably to Astrolabe, which is used by PharmCAT) | The exact procedure is not clear from the article; the reference by Gaedigk et al. may explain it |
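
The pruning and the allele hierarchy described in the table can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method from the article: the allele table, the variant identifiers (rsA, rsB, rsC) and all function names are hypothetical, and ties between equally serious diplotypes are simply resolved by taking the first candidate.

```python
from itertools import product

# Lower rank = more serious consequence (hierarchy from the table above).
SEVERITY = {"No function": 0, "Decreased function": 1}
OTHER_RANK = 2  # every other functional status

# Hypothetical allele definitions: star allele -> (defining variants, function).
ALLELE_DEFINITIONS = {
    "*2": ({"rsA"}, "No function"),
    "*3": ({"rsB"}, "Decreased function"),
    "*4": ({"rsC"}, "Unknown function"),
    "*5": ({"rsA", "rsB"}, "No function"),
}


def prune(definitions):
    """Drop alleles with unknown function (one of the pruning steps)."""
    return {name: d for name, d in definitions.items()
            if d[1] != "Unknown function"}


def match_alleles(haplotype_variants, definitions):
    """Return every star allele whose defining variants are all present.

    If nothing matches, fall back to the reference allele with normal function.
    """
    matches = [(name, func) for name, (variants, func) in definitions.items()
               if variants <= haplotype_variants]
    return matches or [("*1", "Normal function")]


def rank(function):
    return SEVERITY.get(function, OTHER_RANK)


def pick_diplotype(hap1, hap2, definitions):
    """Enumerate all candidate diplotypes and keep the most serious one."""
    candidates = product(match_alleles(hap1, definitions),
                         match_alleles(hap2, definitions))
    # Compare diplotypes by the sorted severity ranks of their two alleles.
    return min(candidates, key=lambda pair: sorted(rank(f) for _, f in pair))


pruned = prune(ALLELE_DEFINITIONS)
diplotype = pick_diplotype({"rsA", "rsB"}, {"rsC"}, pruned)
print(diplotype)
```

In this toy example the first haplotype matches three star alleles, and the second haplotype carries only a variant whose allele was pruned away, so it falls back to normal function instead of unknown function.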
Take home messages
- Haplotype calling is essential
- Prefiltering (pruning) of the allele definition tables provided by PharmGKB
- Rare variants (< 1% minor allele frequency) account for 89% of all distinct deleterious mutations; according to Lauschke et al. they affect 30-40% of patients with non-normal allele function
- Rare variants should only be used for research
- For some genes, multiple star alleles are expected on the same haplotype. Suggestion: look for the functional effect of variants within star alleles instead of looking for star alleles, making decision trees that prioritize variants
- WES is not good enough for PGx unless customized probes are added (which is generally more expensive than a pure microarray approach)
- Microarrays with imputation of unknown variants are a cost-effective approach to PGx
- WGS has similar quality to microarrays. In addition, WGS allows for HLA-calling and finds additional variants that are not yet actionable
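
The suggestion to prioritize variants by functional effect, while keeping rare variants out of clinical calls, could look roughly like the sketch below. It is a hypothetical illustration: the variant annotations, the identifiers and the function name are made up, and only the 1% minor allele frequency threshold comes from the list above.

```python
RARE_MAF_THRESHOLD = 0.01  # rare = minor allele frequency below 1%

# Hypothetical variant annotations: functional effect and minor allele frequency.
VARIANTS = {
    "rsA": {"effect": "No function", "maf": 0.12},
    "rsB": {"effect": "Decreased function", "maf": 0.004},
    "rsC": {"effect": "No function", "maf": 0.0005},
}

# Decision order: the most serious effect is checked first.
PRIORITY = ["No function", "Decreased function"]


def haplotype_function(observed, annotations=VARIANTS, include_rare=False):
    """Assign a function to a haplotype from its variants, not from star alleles.

    Rare variants are ignored unless include_rare is set (research use only).
    """
    usable = [
        annotations[v] for v in observed
        if v in annotations
        and (include_rare or annotations[v]["maf"] >= RARE_MAF_THRESHOLD)
    ]
    for effect in PRIORITY:
        if any(a["effect"] == effect for a in usable):
            return effect
    # No variant with a serious consequence was found.
    return "Normal function"


print(haplotype_function({"rsA", "rsB"}))
print(haplotype_function({"rsB"}))
print(haplotype_function({"rsB"}, include_rare=True))
```

The `include_rare` switch keeps the clinical and research views separate, which mirrors the point that rare variants should only be used for research.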