module: genome_integration.causal_inference¶

This class is used to do a ‘simple’ inverse variance estimate on independent effects.

class genome_integration.causal_inference.mendelian_randomization.MendelianRandomization¶

This class is a base class of most summary statistic based MR analyses.

estimation_done: bool: boolean indicating if an estimation was done.
estimation_data: empty list: initialized as an empty list, but as estimates are added, will contain tuples of length 2 with a beta and a standard error.
estimation_snps = : empty list: initialized as an empty list, but will be filled with snp information as estimates are added. can be left empty.
outcome_tuples: empty list: initialized as an empty list, can be filled with the summary statistics of the outcome in the same way as estimation data
exposure_tuples: empty list: initialized as an empty list, can be filled with the summary statistics of the exposure in the same way as estimation data
q_test_indices_remaining: list of ints: initialized as empty, but as the q test removes estimates, it shows which estimates are removed.
_ivw_intermediate_top: list of floats: this is an internal variable that should not be used.
_ivw_intermediate_bottom list of floats: this is an internal variable that should not be used.

add_estimate(self, beta_se_tuple, variant_name, pos, chr): adds a single estimate for MR estimation. beta_se_tuple is a tuple of length 2, containing the beta estimate and se(beta). The variant name, pos and chr are information that denote where the variant comes from.
do_ivw_estimation(self): Does the IVW estimation based on the single estimates in estimation data.
do_ivw_estimation_on_estimate_vector(self, estimation_vec): Does an IVW estimate on an estimate vector, without storing any information in the class. estimation_vec is a list of tuples with beta and se per variant that’s being used.
get_ivw_estimates(self): Deprecated, mirrors the do_ivw_estimation() method.
do_smr_estimate(self, exposure_tuple, outcome_tuple): Does an SMR estimate of causality. SMR uses the variance of the outcome summary statistic and the variance of the exposure summary statistic to estimate significance. exposure_tuple is a beta se tuple of the exposure summary statistics outcome_tuple is a beta se tuple of the outcome summary statistics
do_smr_estimate(self, exposure_tuple, outcome_tuple): Does an SMR estimate of causality. SMR uses the variance of the outcome summary statistic and the variance of the exposure summary statistic to estimate significance. exposure_tuple is a beta se tuple of the exposure summary statistics outcome_tuple is a beta se tuple of the outcome summary statistics
do_and_add_smr_estimation(self, exposure_tuple, outcome_tuple, variant_name, pos, chr): Adds an SMR estimate from the exposure and outcome tuples (beta, se).
do_chochrans_q_meta_analysis(self, p_value_threshold):: Does a chochrans q meta analysis of the results at a certain p value threhold.
do_egger_regression(self):: Does the original Egger regression with a single variance term. (weighted by the outcome se)
do_egger_regression_single_variance_term(self):: Does egger regression with a single variance term. (weighted only by the outcome se)
do_egger_regression_two_variance_term(self): Does egger regression with a double variance term. (weighted by both the outcome and exposure se)
do_single_term_mr_estimate(self, exposure_tuple, outcome_tuple): single term MR estimate. In contrast to SMR estimate, uses a single variance term.
do_single_term_mr_estimate(self, exposure_tuple, outcome_tuple): single term MR estimate. In contrast to SMR estimate, uses a single variance term. adds it to the list of estimates.
mr_presso(self, n_sims=1000, significance_thresh=0.05):: Does MR-PRESSO, number of sims is the number of permutations done, and the significance term is the term at which estimates are rejected.
do_lda_mr_egger(self, ld_matrix): Does LDA-MR-Egger. requires an LD matrix (peason correlation), which is ordered (rows and columns) by the estimates that were made.
do_lda_mr_egger_on_estimates(self, list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix,: write_out=False):

Does LDA MR Egger Requires the list of (beta, se) tuples from the outcome and exposure and the pearson_ld_matrix. write_out is for debug purposes.

add_estimate(beta_se_tuple, variant_name, pos, chr)¶

Adds an estimate to the class.

Parameters:	beta_se_tuple – Estimation data. variant_name – name of variant for external reference pos – position of the variant for external reference chr – chromosome of the variant for external reference
Returns:

do_and_add_single_term_mr_estimation(exposure_tuple, outcome_tuple)¶

Does a single term variance estimate of a variant, and adds it to the class for further analysis.

Parameters:	exposure_tuple – beta, se tuple of the exposure outcome_tuple – beta, se tuple of the outcome
Returns:	None

do_and_add_smr_estimation(exposure_tuple, outcome_tuple, variant_name=None, pos=None, chr=None)¶

Does an SMR estimation (two variance terms included) and adds it to the class.

Parameters:	exposure_tuple – beta,se tuple of the exposure summary statistics outcome_tuple – beta, se tuple of the outcome summary statistics variant_name – name of the variant for external reference pos – position of the variant for external reference chr – chromosome of the variant for external reference
Returns:

do_chochrans_q_meta_analysis(p_value_threshold)¶

Does a chochrans Q meta analysis on the interally present estimates, and estimates a combined causal effect.

Parameters:	p_value_threshold – p value threshold when to reject the null hypothesis that the estimate is drawn from

the same distribution.

Returns:	tuple of floats: beta, se, wald_p_val of the estimate after chochran’s Q meta analysis.

do_egger_regression()¶

Does egger regression based on single variance term estimates.

Returns:	list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope

do_egger_regression_single_variance_term()¶

Does egger regression based on single variance term estimates.

Returns:	list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_egger_regression_two_variance_term()¶

Does egger regression based on two variance term estimates.

Returns:	list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_ivw_estimation()¶

Does IVW estimation on all the methods

Returns:	tuple of floats: beta, se, wald_p_val of the estimate.

do_ivw_estimation_on_estimate_vector(estimation_vec, save_intermediate=False)¶

Estimates IVW on a specified estimate vector

Parameters:	estimation_vec – list of (beta,se) tuples save_intermediate – save intermediate results to the class.
Returns:	tuple of floats: beta, se, wald_p_val of the estimate.

do_ivw_heterogeneity_estimation()¶: Run a heterogeneity estimation on an already done IVW estimation. :return: a p value of estimation.

do_ivw_heterogeneity_estimation_on_estimate_vector(estimation_vector)¶

Provides a heterogeneity estimation p value of an ivw estimation based on an estimation vector.

Parameters:	estimation_vector – estimation vector of MR estimates.
Returns:	a p value of heterogeneity.

do_lda_mr_egger(ld_matrix)¶: Perform LDA MR egger on internal estimates. :param ld_matrix: pearson LD matrix. ordered by the estimates. :return: list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept and slope respectively.

do_lda_mr_egger_on_estimates(list_of_outcome_tuples, list_of_exposure_tuples, pearson_ld_matrix, write_out=False)¶

This will do LDA simulate_mr egger regression as described in Barfield et al. 2018, genetic epidemiology Implemented based on their paper, and a reference implementation they provided in personal communication

<Begin email.> Hi Adriaan, Below please find the R function to implement the approach. Please let me know if you have any questions.

-Richard

X is the vector of joint eQTL effects Y is the vector of joint GWAS effects W is the inverse of the covariance of the joint GWAS effects (i.e. var(Y))

weight.func2<-function(X,Y,W){: bX<-cbind(1,X) bread<-solve(crossprod(bX,W)%*%bX) theEsts<-bread%*%crossprod(bX,W%*%Y) theresid<-c(Y-theEsts[1]-X*theEsts[2]) Sig.Est<-c(crossprod(theresid,W%*%theresid))/(length(X)-2) finresults<- cbind(theEsts,diag(bread)*Sig.Est) TestStat<-theEsts/sqrt(finresults[,2]) Pvals<-2*pt(abs(TestStat),df = nrow(bX)-2,lower.tail = F) return(cbind(finresults,TestStat,Pvals))

} <End email.>

Parameters:

list_of_outcome_tuples – list of beta,se tuples from the outcome
list_of_exposure_tuples – list of beta,se tuples from the exposure
pearson_ld_matrix – peason LD matrix in the order of the estimates.

Returns:

return:	list of length two each with a tuple of floats: beta, se, wald_p_val of the estimate for intercept

and slope respectively.

do_single_term_mr_estimate(exposure_tuple, outcome_tuple)¶

Does a single variance term MR estimate on a variant.

Parameters:	exposure_tuple – beta, se tuple of the exposure outcome_tuple – beta, se tuple of the outcome
Returns:	beta, se and p value of the single estimate.

do_smr_estimate(exposure_tuple, outcome_tuple)¶

Determine SMR test effect and standard error. Identifies standard error of the estimate, based on on two variance terms

Parameters:	exposure_data – Exposure estimates which must have the methods get_beta and get_z_score outcome_data – Outcome estimates which must have the methods get_beta and get_z_score
Returns:	tuple of smr beta and smr se of the estimate.

do_weighted_median_meta_analysis_on_estimate_vectors(exposure_associations, outcome_associations)¶

Performs the weighted median estimation following Web Appendix 2 from Genet Epidemiol. 2016 May; 40(4): 304–314. Published online 2016 Apr 7. doi: 10.1002/gepi.21965 PMCID: PMC4849733 PMID: 27061298 Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator Jack Bowden, 1 George Davey Smith, 1 Philip C. Haycock, 1 and Stephen Burgess

Parameters:	estimation_vector –
Returns:	Weighted median estimator.

get_ivw_estimates()¶

Mirrors do_ivw_estimation

Returns:	tuple of floats: beta, se, wald_p_val of the estimate.

mr_presso(n_sims=1000, significance_thresh=0.05)¶

Python reimplementation of MR-PRESSO.

Parameters:	n_sims – number of permutation simulations. significance_thresh – significance thresshold.
Returns:	beta, se and p value of the estimate after the bad snps were removed. If no estimate can be made,

returns a tuple of 3* (np.nan).

genome_integration.causal_inference.mr_link.make_mr_link_design_matrix(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95, output_selected_variants=False)¶

Parameters:	outcome_geno – genotype matrix of the outcome r_sq_mat – R^2 matrix of all the genotypes of the outcome exposure_betas – beta estimates of the exposure instrumental variables causal_exposure_indices – indices of the exposure instrumental variables upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned
Returns:	a design matrix for use in MR-link.

genome_integration.causal_inference.mr_link.mask_instruments_in_ld(r_sq_mat, instruments, upper_r_sq_thresh=0.99, lower_r_sq_thresh=0.1, prune_r_sq_thresh=0.95, shuffle_positions=False)¶

Masks instruments that are in LD [upper_r_sq_threshold, lower_r_sq_threshold] with instrumental variables of the exposure. As well as being in high LD (> prune_r_sq_threshold) with itself.

Parameters:

r_sq_mat – squared pearson correlation matrix of variants, shape (m x m)
instruments – indices () of the instruments (variants) used for the exposure.
upper_r_sq_thresh – maximum correlation from instrument threshold for r_sq_mat, float in [0,1]
lower_r_sq_thresh – minimum from instrument threshold for r_sq_mat, float in [0,1]
prune_r_sq_thresh – threshold from which to remove highly correlated SNPs. float in [0,1]

Returns:

Returns a logical vector of length m representing variants that are in high to low LD with the

instrumental variables and in high LD with other variants.

genome_integration.causal_inference.mr_link.mr_link_ols(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, outcome_phenotype, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95)¶

Does MR-link solved by ordinary least squares.

Parameters:

outcome_geno – outcome genotypes
r_sq_mat – R^2 matrix in order of genotypes of outcome geno
exposure_betas – beta estimates of the exposure instrumental variables.
causal_exposure_indices – indices of the exposure instrumental variables.
outcome_phenotype – outcome phenotype vector
upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.

Returns:

beta, se and p value estimate of the MR-link estimate

genome_integration.causal_inference.mr_link.mr_link_ridge(outcome_geno, r_sq_mat, exposure_betas, causal_exposure_indices, outcome_phenotype, upper_r_sq_threshold=0.99, lower_r_sq_threshold=0.1, prune_r_sq_threshold=0.95)¶

Does MR-link solved by ridge regression. Please note that the p value and se is uncorrected. so these are usually _very_ conservative. See the MR-link manuscript for details.

Parameters:

outcome_geno – outcome genotypes
r_sq_mat – R^2 matrix in order of genotypes of outcome geno
exposure_betas – beta estimates of the exposure instrumental variables.
causal_exposure_indices – indices of the exposure instrumental variables.
outcome_phenotype – outcome phenotype vector
upper_r_sq_threshold – the upper r_sq threshold for which the variants around the IVs are pruned.

Returns:

beta, se and p value estimate of the MR-link estimate

genome_integration.causal_inference.mr_link.remove_highly_correlated_snps(r_sq_mat, r_sq_threshold=0.95)¶

This iteratively selects variants that are less correlated to each other than r_sq_threshold removes SNPs that are more correlated that r_sq_threshold.

First SNP in r_sq_mat is always retained, later ones are pruned. Of note, in simulations and in real data, when permuting the ordering of r_sq_mat, we found no meaningful differences in MR-link results. So

Parameters:	r_sq_mat – squared pearson correlation matrix of variants, shape (m x m) r_sq_threshold – threshold for r_sq_mat, float in [0,1]
Returns:	boolean array of shape (m) indicating which SNPs are lower in LD than r_sq_threshold