module: genome_integration.causal_inference

This class is used to do a ‘simple’ inverse variance estimate on independent effects.

class genome_integration.causal_inference.mendelian_randomization.MendelianRandomization

This class is a base class of most summary statistic based MR analyses.

estimation_done: bool
boolean indicating if an estimation was done.
estimation_data: empty list
initialized as an empty list, but as estimates are added, will contain tuples of length 2 with a beta and a standard error.
estimation_snps = : empty list
initialized as an empty list, but will be filled with snp information as estimates are added. can be left empty.
outcome_tuples: empty list
initialized as an empty list, can be filled with the summary statistics of the outcome in the same way as estimation data
exposure_tuples: empty list
initialized as an empty list, can be filled with the summary statistics of the exposure in the same way as estimation data
q_test_indices_remaining: list of ints
initialized as empty, but as the q test removes estimates, it shows which estimates are removed.
_ivw_intermediate_top: list of floats
this is an internal variable that should not be used.
_ivw_intermediate_bottom list of floats
this is an internal variable that should not be used.
add_estimate(self, beta_se_tuple, variant_name, pos, chr)
adds a single estimate for MR estimation. beta_se_tuple is a tuple of length 2, containing the beta estimate and se(beta). The variant name, pos and chr are information that denote where the variant comes from.
do_ivw_estimation(self)
Does the IVW estimation based on the single estimates in estimation data.
do_ivw_estimation_on_estimate_vector(self, estimation_vec)
Does an IVW estimate on an estimate vector, without storing any information in the class. estimation_vec is a list of tuples with beta and se per variant that’s being used.
get_ivw_estimates(self)
Deprecated, mirrors the do_ivw_estimation() method.
do_smr_estimate(self, exposure_tuple, outcome_tuple)
Does an SMR estimate of causality. SMR uses the variance of the outcome summary statistic and the variance of the exposure summary statistic to estimate significance. exposure_tuple is a beta se tuple of the exposure summary statistics outcome_tuple is a beta se tuple of the outcome summary statistics
do_smr_estimate(self, exposure_tuple, outcome_tuple)
Does an SMR estimate of causality. SMR uses the variance of the outcome summary statistic and the variance of the exposure summary statistic to estimate significance. exposure_tuple is a beta se tuple of the exposure summary statistics outcome_tuple is a beta se tuple of the outcome summary statistics
do_and_add_smr_estimation(self, exposure_tuple, outcome_tuple, variant_name, pos, chr)
Adds an SMR estimate from the exposure and outcome tuples (beta, se).
do_chochrans_q_meta_analysis(self, p_value_threshold):
Does a chochrans q meta analysis of the results at a certain p value threhold.
do_egger_regression(self):
Does the original Egger regression with a single variance term. (weighted by the outcome se)
do_egger_regression_single_variance_term(self):
Does egger regression with a single variance term. (weighted only by the outcome se)
do_egger_regression_two_variance_term(self)
Does egger regression with a double variance term. (weighted by both the outcome and exposure se)
do_single_term_mr_estimate(self, exposure_tuple, outcome_tuple)
single term MR estimate. In contrast to SMR estimate, uses a single variance term.
do_single_term_mr_estimate(self, exposure_tuple, outcome_tuple)
single term MR estimate. In contrast to SMR estimate, uses a single variance term. adds it to the list of estimates.
mr_presso(self, n_sims=1000, significance_thresh=0.05):
Does MR-PRESSO, number of sims is the number of permutations done, and the significance term is the term at which estimates are rejected.
do_lda_mr_egger(self, ld_matrix)
Does LDA-MR-Egger. requires an LD matrix (peason correlation), which is ordered (rows and columns) by the estimates that were made.
do_lda_mr_egger_on_estimates(self, list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix,
write_out=False):

Does LDA MR Egger Requires the list of (beta, se) tuples from the outcome and exposure and the pearson_ld_matrix. write_out is for debug purposes.

add_estimate(beta_se_tuple, variant_name, pos, chr)

Adds an estimate to the class.

Parameters:
  • beta_se_tuple – Estimation data.
  • variant_name – name of variant for external reference
  • pos – position of the variant for external reference
  • chr – chromosome of the variant for external reference
Returns:

do_and_add_single_term_mr_estimation(exposure_tuple, outcome_tuple)

Does a single term variance estimate of a variant, and adds it to the class for further analysis.

Parameters:
  • exposure_tuple – beta, se tuple of the exposure
  • outcome_tuple – beta, se tuple of the outcome
Returns:

None

do_and_add_smr_estimation(exposure_tuple, outcome_tuple, variant_name=None, pos=None, chr=None)

Does an SMR estimation (two variance terms included) and adds it to the class.

Parameters:
  • exposure_tuple – beta,se tuple of the exposure summary statistics
  • outcome_tuple – beta, se tuple of the outcome summary statistics
  • variant_name – name of the variant for external reference
  • pos – position of the variant for external reference
  • chr – chromosome of the variant for external reference
Returns:

do_chochrans_q_meta_analysis(p_value_threshold)

Does a chochrans Q meta analysis on the interally present estimates, and estimates a combined causal effect.

Parameters:p_value_threshold – p value threshold when to reject the null hypothesis that the estimate is drawn from

the same distribution.

Returns:tuple of floats: beta, se, wald_p_val of the estimate after chochran’s Q meta analysis.
do_egger_regression()

Does egger regression based on single variance term estimates.

Returns:list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope

do_egger_regression_single_variance_term()

Does egger regression based on single variance term estimates.

Returns:list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_egger_regression_two_variance_term()

Does egger regression based on two variance term estimates.

Returns:list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_ivw_estimation()

Does IVW estimation on all the methods

Returns:tuple of floats: beta, se, wald_p_val of the estimate.
do_ivw_estimation_on_estimate_vector(estimation_vec, save_intermediate=False)

Estimates IVW on a specified estimate vector

Parameters:
  • estimation_vec – list of (beta,se) tuples
  • save_intermediate – save intermediate results to the class.
Returns:

tuple of floats: beta, se, wald_p_val of the estimate.

do_ivw_heterogeneity_estimation()

Run a heterogeneity estimation on an already done IVW estimation. :return: a p value of estimation.

do_ivw_heterogeneity_estimation_on_estimate_vector(estimation_vector)

Provides a heterogeneity estimation p value of an ivw estimation based on an estimation vector.

Parameters:estimation_vector – estimation vector of MR estimates.
Returns:a p value of heterogeneity.
do_lda_mr_egger(ld_matrix)

Perform LDA MR egger on internal estimates. :param ld_matrix: pearson LD matrix. ordered by the estimates. :return: list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.

do_lda_mr_egger_on_estimates(list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix, write_out=False)

This will do LDA simulate_mr egger regression as described in Barfield et al. 2018, genetic epidemiology Implemented based on their paper, and a reference implementation they provided in personal communication

<Begin email.> Hi Adriaan, Below please find the R function to implement the approach. Please let me know if you have any questions.

-Richard

X is the vector of joint eQTL effects Y is the vector of joint GWAS effects W is the inverse of the covariance of the joint GWAS effects (i.e. var(Y))

weight.func2<-function(X,Y,W){
bX<-cbind(1,X) bread<-solve(crossprod(bX,W)%*%bX) theEsts<-bread%*%crossprod(bX,W%*%Y) theresid<-c(Y-theEsts[1]-X*theEsts[2]) Sig.Est<-c(crossprod(theresid,W%*%theresid))/(length(X)-2) finresults<- cbind(theEsts,diag(bread)*Sig.Est) TestStat<-theEsts/sqrt(finresults[,2]) Pvals<-2*pt(abs(TestStat),df = nrow(bX)-2,lower.tail = F) return(cbind(finresults,TestStat,Pvals))

} <End email.>

Parameters:
  • list_of_outcome_tuples – list of beta,se tuples from the outcome
  • list_of_exposure_tuples – list of beta,se tuples from the exposure
  • pearson_ld_matrix – peason LD matrix in the order of the estimates.
Returns:

return:list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_single_term_mr_estimate(exposure_tuple, outcome_tuple)

Does a single variance term MR estimate on a variant.

Parameters:
  • exposure_tuple – beta, se tuple of the exposure
  • outcome_tuple – beta, se tuple of the outcome
Returns:

beta, se and p value of the single estimate.

do_smr_estimate(exposure_tuple, outcome_tuple)

Determine SMR test effect and standard error. Identifies standard error of the estimate, based on on two variance terms

Parameters:
  • exposure_data – Exposure estimates which must have the methods get_beta and get_z_score
  • outcome_data – Outcome estimates which must have the methods get_beta and get_z_score
Returns:

tuple of smr beta and smr se of the estimate.

do_weighted_median_meta_analysis_on_estimate_vectors(exposure_associations, outcome_associations)

Performs the weighted median estimation following Web Appendix 2 from Genet Epidemiol. 2016 May; 40(4): 304–314. Published online 2016 Apr 7. doi: 10.1002/gepi.21965 PMCID: PMC4849733 PMID: 27061298 Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator Jack Bowden, 1 George Davey Smith, 1 Philip C. Haycock, 1 and Stephen Burgess

Parameters:estimation_vector
Returns:Weighted median estimator.
get_ivw_estimates()

Mirrors do_ivw_estimation

Returns:tuple of floats: beta, se, wald_p_val of the estimate.
mr_presso(n_sims=1000, significance_thresh=0.05)

Python reimplementation of MR-PRESSO.

Parameters:
  • n_sims – number of permutation simulations.
  • significance_thresh – significance thresshold.
Returns:

beta, se and p value of the estimate after the bad snps were removed. If no estimate can be made,

returns a tuple of 3* (np.nan).

Parameters:
  • outcome_geno – genotype matrix of the outcome
  • r_sq_mat – R^2 matrix of all the genotypes of the outcome
  • exposure_betas – beta estimates of the exposure instrumental variables
  • causal_exposure_indices – indices of the exposure instrumental variables
  • upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned
Returns:

a design matrix for use in MR-link.

genome_integration.causal_inference.mr_link.mask_instruments_in_ld(r_sq_mat, instruments, upper_r_sq_thresh=0.99, lower_r_sq_thresh=0.1, prune_r_sq_thresh=0.95, shuffle_positions=False)

Masks instruments that are in LD [upper_r_sq_threshold, lower_r_sq_threshold] with instrumental variables of the exposure. As well as being in high LD (> prune_r_sq_threshold) with itself.

Parameters:
  • r_sq_mat – squared pearson correlation matrix of variants, shape (m x m)
  • instruments – indices () of the instruments (variants) used for the exposure.
  • upper_r_sq_thresh – maximum correlation from instrument threshold for r_sq_mat, float in [0,1]
  • lower_r_sq_thresh – minimum from instrument threshold for r_sq_mat, float in [0,1]
  • prune_r_sq_thresh – threshold from which to remove highly correlated SNPs. float in [0,1]
Returns:

Returns a logical vector of length m representing variants that are in high to low LD with the

instrumental variables and in high LD with other variants.

Does MR-link solved by ordinary least squares.

Parameters:
  • outcome_geno – outcome genotypes
  • r_sq_mat – R^2 matrix in order of genotypes of outcome geno
  • exposure_betas – beta estimates of the exposure instrumental variables.
  • causal_exposure_indices – indices of the exposure instrumental variables.
  • outcome_phenotype – outcome phenotype vector
  • upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.
Returns:

beta, se and p value estimate of the MR-link estimate

Does MR-link solved by ridge regression. Please note that the p value and se is uncorrected. so these are usually _very_ conservative. See the MR-link manuscript for details.

Parameters:
  • outcome_geno – outcome genotypes
  • r_sq_mat – R^2 matrix in order of genotypes of outcome geno
  • exposure_betas – beta estimates of the exposure instrumental variables.
  • causal_exposure_indices – indices of the exposure instrumental variables.
  • outcome_phenotype – outcome phenotype vector
  • upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.
Returns:

beta, se and p value estimate of the MR-link estimate

genome_integration.causal_inference.mr_link.remove_highly_correlated_snps(r_sq_mat, r_sq_threshold=0.95)

This iteratively selects variants that are less correlated to each other than r_sq_threshold removes SNPs that are more correlated that r_sq_threshold.

First SNP in r_sq_mat is always retained, later ones are pruned. Of note, in simulations and in real data, when permuting the ordering of r_sq_mat, we found no meaningful differences in MR-link results. So

Parameters:
  • r_sq_mat – squared pearson correlation matrix of variants, shape (m x m)
  • r_sq_threshold – threshold for r_sq_mat, float in [0,1]
Returns:

boolean array of shape (m) indicating which SNPs are lower in LD than r_sq_threshold