module: genome_integration.variants

These classes implement some features of variants. They are mostly used for storing variant reference information.

class genome_integration.variants.SNP(snp_name=None, chromosome=None, position=None, major_allele=None, minor_allele=None, minor_allele_frequency=None)

This is the base SNP class.

Only biallelic variants are supported.

Most of the data can be missing, as it often is in the datasets that you are trying to parse.

snp_name: str
Name of the SNP is used for cross comparison if the variant is merged with other snp_info.
chromosome: int, castable to int or str
Name of the chromosome, can be a str or an int.
position: int
Base pair position on the genome. Build b37 is used a lot internally, so if you’re unsure, try to use it.
major_allele: str
The more common of the two alleles of the variant (biallelic SNPs only).
minor_allele: str
The less common of the two alleles of the variant.
minor_allele_frequency: float
Float between 0 and 1, inclusive, giving the minor allele frequency. By definition the minor allele is the less common one, so the value should usually be <= 0.5, but alleles can be flipped, which also flips the frequency.
has_position_data: bool
Whether position data is available.
has_allele_data: bool
Whether all alleles are available.
has_frequency_data: bool
Whether frequency data is available.
add_snp_data(self, snp_data, overwrite=False)
Adds data from another SNP class-like object.
add_frequency_data(self, snp_data, flipped, overwrite)
Adds the frequency data from another SNP object and requires you to say if the alleles are flipped or not.
add_pos_chr(self, pos, chr)
Adds position information.
update_alleles(self, snp_data, overwrite)
Updates alleles, and updates the self.has_allele_data boolean.
add_minor_allele_frequency(self, major, minor, freq)
Adds a minor allele frequency.
_update_alleles(self, snp_data, overwrite)
Updates alleles, but does not update the self.has_allele_data boolean; update_alleles() does both.
_flip_alleles(self)
Flips alleles: major becomes minor, minor becomes major. Does not update the frequency.
set_pos_name(self)
Sets the snp_name attribute to the pos_name, which is {chr}:{position}.
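The behaviour described for _flip_alleles and set_pos_name can be sketched with a small stand-in class. SketchSNP and its method names are hypothetical and exist only for illustration; this is not the genome_integration implementation, only a model of what the descriptions above say.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSNP:
    """Hypothetical stand-in for illustration; not the genome_integration class."""
    snp_name: Optional[str] = None
    chromosome: Optional[str] = None
    position: Optional[int] = None
    major_allele: Optional[str] = None
    minor_allele: Optional[str] = None
    minor_allele_frequency: Optional[float] = None

    def flip_alleles(self) -> None:
        # Swap major and minor. The frequency is deliberately left untouched,
        # mirroring the documented behaviour of _flip_alleles.
        self.major_allele, self.minor_allele = self.minor_allele, self.major_allele

    def set_pos_name(self) -> None:
        # Name the variant after its location, "{chr}:{position}".
        self.snp_name = f"{self.chromosome}:{self.position}"


snp = SketchSNP(
    snp_name="rs123",  # made-up identifier
    chromosome="1",
    position=1000,
    major_allele="A",
    minor_allele="G",
    minor_allele_frequency=0.2,
)
snp.flip_alleles()
snp.set_pos_name()
```

Because flipping does not touch the frequency, a caller who flips alleles is responsible for updating the frequency separately.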
add_frequency_data(snp_data, flipped, overwrite)

Updates the frequency data based on a reference.

Parameters:
  • snp_data
  • flipped
Returns:
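
As a rough illustration of why the flipped argument matters: for a biallelic variant the two allele frequencies sum to 1, so a reference frequency reported for the opposite allele has to be complemented. The helper below is a hypothetical sketch under that assumption, not the library's code.

```python
def oriented_frequency(reference_frequency: float, flipped: bool) -> float:
    """Return the reference frequency expressed for this variant's own minor allele.

    Assumes a biallelic variant, so the two allele frequencies sum to 1.
    """
    if not 0.0 <= reference_frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1, inclusive")
    # If the reference names the alleles the other way round, its frequency
    # belongs to the other allele, so take the complement.
    return 1.0 - reference_frequency if flipped else reference_frequency


same_orientation = oriented_frequency(0.25, flipped=False)
opposite_orientation = oriented_frequency(0.25, flipped=True)
```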

add_minor_allele_frequency(major, minor, freq)

Adds a minor allele frequency.

Parameters:
  • major
  • minor
  • freq
Returns:

add_pos_chr(pos, chr)

Adds position information.

Parameters:
  • pos
  • chr
Returns:

self

add_snp_data(snp_data, overwrite=False)

This method returns self with updated SNP data. Data is only changed if the snp_name is the same, or if the position is the same.

Author comment: This is bloody hard to get right.

Parameters:snp_data – a SNP object or bigger.
Returns:self
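
The matching rule described for add_snp_data (same name, or same position) can be sketched on plain dictionaries. The function names are hypothetical, and the meaning given to overwrite here (fill only missing fields unless overwrite is set) is an assumption made for illustration rather than documented behaviour.

```python
def same_variant(a: dict, b: dict) -> bool:
    """Return True if two variant records share a name or a position."""
    same_name = a.get("snp_name") is not None and a.get("snp_name") == b.get("snp_name")
    has_pos = all(
        record.get(key) is not None
        for record in (a, b)
        for key in ("chromosome", "position")
    )
    # Chromosomes may be given as int or str, so compare them as strings.
    same_pos = (
        has_pos
        and str(a["chromosome"]) == str(b["chromosome"])
        and int(a["position"]) == int(b["position"])
    )
    return same_name or same_pos


def fill_from(target: dict, source: dict, overwrite: bool = False) -> dict:
    """Copy fields from source into target when both describe the same variant.

    Assumption: without overwrite, only fields that are missing (None) in
    target are filled in.
    """
    if not same_variant(target, source):
        return target
    for key, value in source.items():
        if value is None:
            continue
        if overwrite or target.get(key) is None:
            target[key] = value
    return target


own = {"snp_name": "rs1", "chromosome": None, "position": None}
reference = {"snp_name": "rs1", "chromosome": 1, "position": 1000}
fill_from(own, reference)
```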
set_pos_name()

Sets the snp_name attribute to the pos_name, which is {chr}:{position}.

Returns:

update_alleles(snp_data, overwrite)

Updates alleles, and updates the self.has_allele_data boolean.

Parameters:
  • snp_data
  • overwrite
Returns:

class genome_integration.variants.BimFile(file_name)

Reads in a plink formatted bim file that contains many different variants.

bim_results: dict
Values contain SNP information of all variants in the bim file. Keys are the SNP names from the bim file.
bim_results_by_pos: dict
Values contain SNP information of all variants in the bim file. Keys are the position names “<chr>:<pos>” of the variants in the bim file.
snp_names: list
Ordered list of SNP names.
add_frq_information(self, file_name)
Adds frequency information from a plink file (.frq or .frqx).
add_frq_information(file_name)

Adds frequency information from a plink file (.frq or .frqx).

Parameters:file_name – file name of the plink frq or frqx file.
Returns:self, with added frq information
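
To make the three BimFile attributes concrete, here is a minimal standalone sketch that builds the same kind of lookups from bim-formatted text. It assumes the standard six-column plink bim layout (chromosome, variant name, genetic distance, base pair position, allele 1, allele 2) and stores plain dictionaries rather than SNP objects. The function read_bim and the example variants are hypothetical; this is not how BimFile itself is implemented.

```python
import io


def read_bim(handle):
    """Parse bim-formatted lines into name-keyed and position-keyed lookups."""
    by_name, by_pos, names = {}, {}, []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, name, _genetic_distance, pos, allele_1, allele_2 = fields[:6]
        record = {
            "snp_name": name,
            "chromosome": chrom,
            "position": int(pos),
            # In plink bim files allele 1 is usually the minor allele and
            # allele 2 usually the major allele.
            "minor_allele": allele_1,
            "major_allele": allele_2,
        }
        by_name[name] = record
        by_pos[f"{chrom}:{pos}"] = record
        names.append(name)  # preserves file order
    return by_name, by_pos, names


# Two made-up variants, tab-separated as in a real bim file.
example = io.StringIO("1\trs1\t0\t1000\tG\tA\n1\trs2\t0\t2000\tT\tC\n")
by_name, by_pos, names = read_bim(example)
```

Keeping both lookups pointing at the same record means a variant can be found by name or by position without duplicating its data.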