module: genome_integration.causal_inference
This class is used to do a ‘simple’ inverse variance estimate on independent effects.
-
class genome_integration.causal_inference.mendelian_randomization.MendelianRandomization
This class is the base class of most summary-statistic-based MR analyses.
- estimation_done: bool
- Boolean indicating whether an estimation was done.
- estimation_data: empty list
- Initialized as an empty list; as estimates are added, it will contain tuples of length 2 with a beta and a standard error.
- estimation_snps: empty list
- Initialized as an empty list; filled with SNP information as estimates are added. Can be left empty.
- outcome_tuples: empty list
- Initialized as an empty list; can be filled with the summary statistics of the outcome, in the same format as estimation_data.
- exposure_tuples: empty list
- Initialized as an empty list; can be filled with the summary statistics of the exposure, in the same format as estimation_data.
- q_test_indices_remaining: list of ints
- Initialized as empty; as the Q test removes estimates, it records which estimates are removed.
- _ivw_intermediate_top: list of floats
- Internal variable that should not be used.
- _ivw_intermediate_bottom: list of floats
- Internal variable that should not be used.
- add_estimate(self, beta_se_tuple, variant_name, pos, chr)
- adds a single estimate for MR estimation. beta_se_tuple is a tuple of length 2, containing the beta estimate and se(beta). The variant name, pos and chr are information that denote where the variant comes from.
- do_ivw_estimation(self)
- Does the IVW estimation based on the single estimates in estimation data.
- do_ivw_estimation_on_estimate_vector(self, estimation_vec)
- Does an IVW estimate on an estimate vector, without storing any information in the class. estimation_vec is a list of tuples with beta and se per variant that’s being used.
- get_ivw_estimates(self)
- Deprecated; mirrors the do_ivw_estimation() method.
- do_smr_estimate(self, exposure_tuple, outcome_tuple)
- Does an SMR estimate of causality. SMR uses both the variance of the outcome summary statistic and the variance of the exposure summary statistic to estimate significance. exposure_tuple is a (beta, se) tuple of the exposure summary statistics; outcome_tuple is a (beta, se) tuple of the outcome summary statistics.
- do_and_add_smr_estimation(self, exposure_tuple, outcome_tuple, variant_name, pos, chr)
- Adds an SMR estimate from the exposure and outcome tuples (beta, se).
- do_chochrans_q_meta_analysis(self, p_value_threshold)
- Does a Cochran's Q meta-analysis of the results at a given p value threshold.
- do_egger_regression(self)
- Does the original Egger regression with a single variance term (weighted by the outcome se).
- do_egger_regression_single_variance_term(self)
- Does Egger regression with a single variance term (weighted only by the outcome se).
- do_egger_regression_two_variance_term(self)
- Does Egger regression with two variance terms (weighted by both the outcome and exposure se).
- do_single_term_mr_estimate(self, exposure_tuple, outcome_tuple)
- Single-term MR estimate. In contrast to the SMR estimate, it uses a single variance term.
- do_and_add_single_term_mr_estimation(self, exposure_tuple, outcome_tuple)
- Single-term MR estimate, as above, which is then added to the list of estimates.
- mr_presso(self, n_sims=1000, significance_thresh=0.05)
- Does MR-PRESSO. n_sims is the number of permutations done, and significance_thresh is the threshold at which estimates are rejected.
- do_lda_mr_egger(self, ld_matrix)
- Does LDA-MR-Egger. Requires a Pearson correlation LD matrix whose rows and columns are ordered by the estimates that were made.
- do_lda_mr_egger_on_estimates(self, list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix, write_out=False)
- Does LDA-MR-Egger. Requires the lists of (beta, se) tuples of the outcome and exposure, and the pearson_ld_matrix. write_out is for debugging purposes.
-
add_estimate(beta_se_tuple, variant_name, pos, chr)
Adds an estimate to the class.
Parameters: - beta_se_tuple – estimation data, a (beta, se) tuple.
- variant_name – name of variant for external reference
- pos – position of the variant for external reference
- chr – chromosome of the variant for external reference
Returns:
-
do_and_add_single_term_mr_estimation(exposure_tuple, outcome_tuple)
Does a single-variance-term MR estimate of a variant and adds it to the class for further analysis.
Parameters: - exposure_tuple – beta, se tuple of the exposure
- outcome_tuple – beta, se tuple of the outcome
Returns: None
-
do_and_add_smr_estimation(exposure_tuple, outcome_tuple, variant_name=None, pos=None, chr=None)
Does an SMR estimation (two variance terms included) and adds it to the class.
Parameters: - exposure_tuple – beta, se tuple of the exposure summary statistics
- outcome_tuple – beta, se tuple of the outcome summary statistics
- variant_name – name of the variant for external reference
- pos – position of the variant for external reference
- chr – chromosome of the variant for external reference
Returns:
-
do_chochrans_q_meta_analysis(p_value_threshold)
Does a Cochran's Q meta-analysis on the internally stored estimates, and estimates a combined causal effect.
Parameters: p_value_threshold – p value threshold at which to reject the null hypothesis that an estimate is drawn from the same distribution.
Returns: tuple of floats: beta, se, wald_p_val of the estimate after Cochran's Q meta-analysis.
-
do_egger_regression()
Does Egger regression based on single-variance-term estimates.
Returns: list of length two, each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.
-
do_egger_regression_single_variance_term()
Does Egger regression based on single-variance-term estimates.
Returns: list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.
-
do_egger_regression_two_variance_term()
Does Egger regression based on two-variance-term estimates.
Returns: list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.
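A minimal sketch of what single-variance-term Egger regression computes, assuming the inputs are lists of (beta, se) tuples as elsewhere in this class. `egger_single_variance` is a hypothetical helper, not the library's implementation: it orients each variant so that its exposure effect is positive (the usual MR-Egger convention), weights by 1/se_outcome², and scales the coefficient variances by the residual variance, as R's `lm(weights = ...)` does.

```python
import math


def egger_single_variance(exposure_tuples, outcome_tuples):
    """Hypothetical sketch: MR-Egger with a single variance term.

    Weighted least squares of outcome betas on exposure betas (with an
    intercept), using weights 1 / se_outcome^2.
    Returns [(intercept, se), (slope, se)].
    """
    xs, ys, ws = [], [], []
    for (bx, _se_x), (by, se_y) in zip(exposure_tuples, outcome_tuples):
        # Orient each variant so that its exposure effect is positive.
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se_y ** 2)

    n = len(xs)
    if n < 3:
        raise ValueError("Egger regression needs at least three variants")

    sum_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Residual variance scale estimated from the data, as lm(..., weights = w) does in R.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    sigma_sq = rss / (n - 2)
    se_slope = math.sqrt(sigma_sq / s_xx)
    se_intercept = math.sqrt(sigma_sq * (1.0 / sum_w + x_bar ** 2 / s_xx))
    return [(intercept, se_intercept), (slope, se_slope)]
```

A Wald p value per coefficient could then be taken as `math.erfc(abs(beta / se) / math.sqrt(2))`; some implementations use a t distribution with n - 2 degrees of freedom instead.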
-
do_ivw_estimation()
Does IVW estimation on all the stored estimates.
Returns: tuple of floats: beta, se, wald_p_val of the estimate.
-
do_ivw_estimation_on_estimate_vector(estimation_vec, save_intermediate=False)
Estimates IVW on a specified estimate vector.
Parameters: - estimation_vec – list of (beta,se) tuples
- save_intermediate – save intermediate results to the class.
Returns: tuple of floats: beta, se, wald_p_val of the estimate.
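For reference, a fixed-effect IVW estimate over a list of (beta, se) tuples can be sketched as below. `ivw_estimate` is a hypothetical name, and the normal approximation for wald_p_val is an assumption about how the library computes it.

```python
import math


def ivw_estimate(estimation_vec):
    """Hypothetical sketch: fixed-effect inverse-variance weighted estimate.

    estimation_vec is a list of (beta, se) tuples.
    Returns (beta, se, wald_p_val).
    """
    # Each estimate is weighted by its inverse variance.
    weights = [1.0 / se ** 2 for _beta, se in estimation_vec]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _se) in zip(weights, estimation_vec)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Two-sided normal p value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```

With a single estimate, the IVW result is that estimate itself, which is a convenient sanity check.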
-
do_ivw_heterogeneity_estimation()
Runs a heterogeneity estimation on an already completed IVW estimation.
Returns: a p value of heterogeneity.
-
do_ivw_heterogeneity_estimation_on_estimate_vector(estimation_vector)
Provides a heterogeneity p value of an IVW estimation based on an estimation vector.
Parameters: estimation_vector – estimation vector of MR estimates.
Returns: a p value of heterogeneity.
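A common IVW heterogeneity test is Cochran's Q, Q = Σ w_i (β_i − β_IVW)² with w_i = 1/se_i², compared against a chi-square distribution with k − 1 degrees of freedom. The sketch below assumes that is the test meant here; `chi2_sf` and `ivw_heterogeneity_p` are hypothetical helpers, and the chi-square tail uses closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Survival function of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def ivw_heterogeneity_p(estimation_vector):
    """Hypothetical sketch: Cochran's Q p value for a list of (beta, se) tuples."""
    if len(estimation_vector) < 2:
        raise ValueError("need at least two estimates")
    betas = [b for b, _se in estimation_vector]
    weights = [1.0 / se ** 2 for _b, se in estimation_vector]
    beta_ivw = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_ivw) ** 2 for w, b in zip(weights, betas))
    return chi2_sf(q, len(betas) - 1)
```

A small p value indicates that the per-variant estimates disagree more than their standard errors allow.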
-
do_lda_mr_egger(ld_matrix)
Performs LDA MR-Egger on the internal estimates.
Parameters: ld_matrix – Pearson LD matrix, ordered by the estimates.
Returns: list of length two, each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.
-
do_lda_mr_egger_on_estimates(list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix, write_out=False)
Does LDA MR-Egger regression as described in Barfield et al. 2018, Genetic Epidemiology. Implemented based on their paper and on a reference implementation they provided in personal communication:
<Begin email.>
Hi Adriaan,
Below please find the R function to implement the approach. Please let me know if you have any questions.
-Richard

X is the vector of joint eQTL effects.
Y is the vector of joint GWAS effects.
W is the inverse of the covariance of the joint GWAS effects (i.e. var(Y)).

```r
weight.func2 <- function(X, Y, W) {
  bX <- cbind(1, X)
  bread <- solve(crossprod(bX, W) %*% bX)
  theEsts <- bread %*% crossprod(bX, W %*% Y)
  theresid <- c(Y - theEsts[1] - X * theEsts[2])
  Sig.Est <- c(crossprod(theresid, W %*% theresid)) / (length(X) - 2)
  finresults <- cbind(theEsts, diag(bread) * Sig.Est)
  TestStat <- theEsts / sqrt(finresults[, 2])
  Pvals <- 2 * pt(abs(TestStat), df = nrow(bX) - 2, lower.tail = F)
  return(cbind(finresults, TestStat, Pvals))
}
```
<End email.>
Parameters: - list_of_outcome_tuples – list of beta,se tuples from the outcome
- list_of_exposure_tuples – list of beta,se tuples from the exposure
- pearson_ld_matrix – Pearson LD matrix in the order of the estimates.
- write_out – for debugging purposes.
Returns: list of length two, each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.
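To make the R reference function above easier to follow, here is a pure-Python port of `weight.func2` for a given weight matrix W (a list of lists, assumed symmetric). `lda_mr_egger_core` is a hypothetical name, not the library's function. It returns the estimate, standard error and test statistic for the intercept and slope; the R code's p values come from `pt` with n - 2 degrees of freedom and are omitted here to stay within the standard library. How W is built from pearson_ld_matrix (for example by inverting diag(se) · R · diag(se) of the outcome) is an assumption and is not shown.

```python
import math


def lda_mr_egger_core(x, y, w):
    """Hypothetical port of weight.func2: GLS of y on [1, x] with weight matrix w.

    x: exposure effects, y: outcome effects, w: n x n symmetric weight matrix.
    Returns [(est, se, t_stat)] for the intercept and the slope.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three variants")
    # W %*% X, W %*% Y and W %*% 1 as plain lists.
    w_x = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    w_y = [sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]
    w_1 = [sum(w[i]) for i in range(n)]
    # crossprod(bX, W) %*% bX, a symmetric 2 x 2 matrix.
    a11 = sum(w_1)
    a12 = sum(w_x)
    a22 = sum(xi * wxi for xi, wxi in zip(x, w_x))
    # crossprod(bX, W %*% Y).
    b1 = sum(w_y)
    b2 = sum(xi * wyi for xi, wyi in zip(x, w_y))
    # bread = inverse of the 2 x 2 matrix.
    det = a11 * a22 - a12 * a12
    bread = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    intercept = bread[0][0] * b1 + bread[0][1] * b2
    slope = bread[1][0] * b1 + bread[1][1] * b2
    # Sig.Est: weighted residual variance with n - 2 degrees of freedom.
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    w_r = [sum(w[i][j] * resid[j] for j in range(n)) for i in range(n)]
    sig_est = sum(r * wr for r, wr in zip(resid, w_r)) / (n - 2)
    out = []
    for k, est in enumerate((intercept, slope)):
        se = math.sqrt(bread[k][k] * sig_est)
        out.append((est, se, est / se))
    return out
```

With W equal to the identity matrix this reduces to ordinary least squares, which is an easy way to check the port.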
-
do_single_term_mr_estimate(exposure_tuple, outcome_tuple)
Does a single-variance-term MR estimate on a variant.
Parameters: - exposure_tuple – beta, se tuple of the exposure
- outcome_tuple – beta, se tuple of the outcome
Returns: beta, se and p value of the single estimate.
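A minimal sketch of a single-variance-term estimate, assuming it is the Wald ratio with the first-order standard error se_outcome / |beta_exposure| (that is, ignoring uncertainty in the exposure effect). `single_term_mr_estimate` is hypothetical and may differ from the library's code.

```python
import math


def single_term_mr_estimate(exposure_tuple, outcome_tuple):
    """Hypothetical sketch: Wald ratio with a single (outcome) variance term.

    Returns (beta, se, p).
    """
    beta_x, _se_x = exposure_tuple
    beta_y, se_y = outcome_tuple
    beta = beta_y / beta_x
    # Only the outcome variance enters the standard error.
    se = se_y / abs(beta_x)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```

Under this assumption the z score beta / se equals the outcome's own z score, so the p value matches that of the outcome association.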
-
do_smr_estimate(exposure_tuple, outcome_tuple)
Determines the SMR test effect and standard error. The standard error of the estimate is based on two variance terms.
Parameters: - exposure_tuple – beta, se tuple of the exposure summary statistics
- outcome_tuple – beta, se tuple of the outcome summary statistics
Returns: tuple of smr beta and smr se of the estimate.
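The two-variance-term standard error can be sketched with the SMR test statistic of Zhu et al. (2016), T = z_x² z_y² / (z_x² + z_y²), which is approximately chi-square with 1 degree of freedom under the null. Writing se = |β| / √T gives se² = se_y²/β_x² + β_y² se_x²/β_x⁴, i.e. the single-term variance plus an exposure term. `smr_estimate` is a hypothetical helper, and whether the library computes exactly this is an assumption.

```python
import math


def smr_estimate(exposure_tuple, outcome_tuple):
    """Hypothetical sketch: SMR estimate with two variance terms.

    Returns (beta, se, p).
    """
    beta_x, se_x = exposure_tuple
    beta_y, se_y = outcome_tuple
    beta = beta_y / beta_x
    # Outcome variance term plus exposure variance term (delta method).
    se = math.sqrt(se_y ** 2 / beta_x ** 2 + beta_y ** 2 * se_x ** 2 / beta_x ** 4)
    # (beta / se)^2 equals T_SMR = z_x^2 z_y^2 / (z_x^2 + z_y^2), a 1-df chi-square statistic.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```

Because of the extra exposure term, this standard error is never smaller than the single-term one, so SMR is the more conservative of the two.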
-
do_weighted_median_meta_analysis_on_estimate_vectors(exposure_associations, outcome_associations)
Performs the weighted median estimation following Web Appendix 2 of: Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016 May;40(4):304–314. Published online 2016 Apr 7. doi: 10.1002/gepi.21965. PMCID: PMC4849733. PMID: 27061298.
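Parameters: - exposure_associations – associations of the instruments with the exposure
- outcome_associations – associations of the instruments with the outcome
Returns: weighted median estimator.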
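The weighted median from that appendix sorts the per-variant ratio estimates, centres the cumulative normalised weights on each estimate, and interpolates linearly where they cross 0.5. The sketch below assumes both arguments are lists of (beta, se) tuples and uses first-order weights β_x²/se_y²; `weighted_median_estimate` is a hypothetical name, and the standard error (bootstrapped in the paper) is not computed.

```python
def weighted_median_estimate(exposure_tuples, outcome_tuples):
    """Hypothetical sketch of the Bowden et al. (2016) weighted median estimator."""
    ratios, weights = [], []
    for (beta_x, _se_x), (beta_y, se_y) in zip(exposure_tuples, outcome_tuples):
        ratios.append(beta_y / beta_x)
        # Inverse of the first-order variance of the ratio, (se_y / beta_x)^2.
        weights.append((beta_x / se_y) ** 2)

    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    total = sum(weights)
    w = [weights[j] / total for j in order]

    # Cumulative normalised weights, centred on each ordered estimate.
    p, running = [], 0.0
    for wj in w:
        running += wj
        p.append(running - wj / 2.0)

    below = max((j for j in range(len(p)) if p[j] < 0.5), default=None)
    if below is None:
        return b[0]
    if below == len(b) - 1:
        return b[-1]
    # Linear interpolation between the two estimates that straddle 0.5.
    return b[below] + (b[below + 1] - b[below]) * (0.5 - p[below]) / (p[below + 1] - p[below])
```

With equal weights this reduces to an ordinary median; a single dominant, precise variant pulls the estimate towards its own ratio.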
-
get_ivw_estimates()
Deprecated; mirrors do_ivw_estimation.
Returns: tuple of floats: beta, se, wald_p_val of the estimate.
-
mr_presso(n_sims=1000, significance_thresh=0.05)
Python reimplementation of MR-PRESSO.
Parameters: - n_sims – number of permutation simulations.
- significance_thresh – significance threshold.
Returns: beta, se and p value of the estimate after the outlier SNPs were removed. If no estimate can be made, returns a tuple of three np.nan values.
-
genome_integration.causal_inference.mr_link.make_mr_link_design_matrix(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95, output_selected_variants=False)
Parameters: - outcome_geno – genotype matrix of the outcome
- r_sq_mat – R^2 matrix of all the genotypes of the outcome
- exposure_betas – beta estimates of the exposure instrumental variables
- causal_exposure_indices – indices of the exposure instrumental variables
- upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned
Returns: a design matrix for use in MR-link.
-
genome_integration.causal_inference.mr_link.mask_instruments_in_ld(r_sq_mat, instruments, upper_r_sq_thresh=0.99, lower_r_sq_thresh=0.1, prune_r_sq_thresh=0.95, shuffle_positions=False)
Masks variants that are in LD within [lower_r_sq_thresh, upper_r_sq_thresh] with the instrumental variables of the exposure, as well as variants in high LD (> prune_r_sq_thresh) with each other.
Parameters: - r_sq_mat – squared pearson correlation matrix of variants, shape (m x m)
- instruments – indices of the instruments (variants) used for the exposure.
- upper_r_sq_thresh – upper r_sq threshold with the instruments, float in [0,1]
- lower_r_sq_thresh – lower r_sq threshold with the instruments, float in [0,1]
- prune_r_sq_thresh – threshold above which highly correlated SNPs are removed, float in [0,1]
Returns: a logical vector of length m representing variants that are in high to low LD with the instrumental variables and in high LD with other variants.
-
genome_integration.causal_inference.mr_link.mr_link_ols(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, outcome_phenotype, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95)
Does MR-link solved by ordinary least squares.
Parameters: - outcome_geno – outcome genotypes
- r_sq_mat – R^2 matrix in order of genotypes of outcome geno
- exposure_betas – beta estimates of the exposure instrumental variables.
- causal_exposure_indices – indices of the exposure instrumental variables.
- outcome_phenotype – outcome phenotype vector
- upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.
Returns: beta, se and p value estimate of the MR-link estimate
-
genome_integration.causal_inference.mr_link.mr_link_ridge(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, outcome_phenotype, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95)
Does MR-link solved by ridge regression. Note that the p value and se are uncorrected, so they are usually _very_ conservative. See the MR-link manuscript for details.
Parameters: - outcome_geno – outcome genotypes
- r_sq_mat – R^2 matrix in order of genotypes of outcome geno
- exposure_betas – beta estimates of the exposure instrumental variables.
- causal_exposure_indices – indices of the exposure instrumental variables.
- outcome_phenotype – outcome phenotype vector
- upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.
Returns: beta, se and p value estimate of the MR-link estimate
This iteratively selects variants that are less correlated with each other than r_sq_threshold, removing SNPs that are more correlated than r_sq_threshold.
The first SNP in r_sq_mat is always retained; later ones are pruned. Of note, in simulations and in real data, when permuting the ordering of r_sq_mat, we found no meaningful differences in MR-link results. So
Parameters: - r_sq_mat – squared pearson correlation matrix of variants, shape (m x m)
- r_sq_threshold – threshold for r_sq_mat, float in [0,1]
Returns: boolean array of shape (m) indicating which SNPs are lower in LD than r_sq_threshold
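The greedy pruning described above can be sketched as follows: walk through the variants in order and keep a variant only if its r_sq with every already-kept variant is below the threshold, so the first SNP is always retained. `prune_by_r_sq` is a hypothetical name; this version takes and returns plain Python lists rather than NumPy arrays.

```python
def prune_by_r_sq(r_sq_mat, r_sq_threshold):
    """Hypothetical sketch of greedy LD pruning.

    r_sq_mat: m x m squared Pearson correlation matrix (list of lists).
    Returns a list of m booleans; True marks SNPs that are kept.
    """
    m = len(r_sq_mat)
    keep = [False] * m
    kept_indices = []
    for i in range(m):
        # Keep SNP i only if it is in lower LD than the threshold with all kept SNPs.
        if all(r_sq_mat[i][k] < r_sq_threshold for k in kept_indices):
            keep[i] = True
            kept_indices.append(i)
    return keep
```

Because the scan is order dependent, permuting r_sq_mat can change which member of a correlated pair survives, which is the ordering effect the text above reports as having no meaningful impact on MR-link results.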