module: genome_integration.variants
The classes in this module implement some features of variants. They are mostly used to store variant reference information.
-
class genome_integration.variants.SNP(snp_name=None, chromosome=None, position=None, major_allele=None, minor_allele=None, minor_allele_frequency=None)
This is the base SNP class. Only biallelic variants are possible.
Most of the data can be missing, as it often is in the datasets that you are trying to parse.
- snp_name: str
- Name of the SNP. Used for cross comparison if the variant is merged with other snp_info.
- chromosome: int, castable to int or str
- Name of the chromosome, can be a str or an int.
- position: int
- Base pair position on the genome. Build b37 is used a lot internally, so if you're unsure, try to use it.
- major_allele: str
- The more common of the two alleles of the variant (biallelic SNPs only).
- minor_allele: str
- The less common of the two alleles of the variant.
- minor_allele_frequency: float
- Float between 0 and 1, inclusive, giving the minor allele frequency. By definition the minor allele is the less common one, so the value should usually be <= 0.5, but alleles can be flipped, which also flips the frequency.
- has_position_data: bool
- True if position data is available.
- has_allele_data: bool
- True if all alleles are available.
- has_frequency_data: bool
- True if frequency data is available.
- add_snp_data(self, snp_data, overwrite=False)
- Adds data from another SNP class-like object.
- add_frequency_data(self, snp_data, flipped, overwrite)
- Adds the frequency data from another SNP object. Requires you to say whether the alleles are flipped.
- add_pos_chr(self, pos, chr)
- Adds position information.
- update_alleles(self, snp_data, overwrite)
- Updates alleles, and updates the self.has_allele_data boolean.
- add_minor_allele_frequency(self, major, minor, freq)
- Adds a minor allele frequency.
- _update_alleles(self, snp_data, overwrite)
- Updates alleles, but does not update the self.has_allele_data boolean. The function update_alleles() does both.
- _flip_alleles(self)
- Flips alleles: major becomes minor and minor becomes major. Does not update the frequency.
- set_pos_name(self)
- Sets the snp_name attribute to the pos_name, which is {chr}:{position}.
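The attribute list above can be pictured with a small stand-in. This is a minimal sketch, not the library's SNP class: `SnpSketch` is a hypothetical name, the three `has_*` flags are modelled as properties (the library documents them as plain attributes), and the rule that position data needs both a chromosome and a position is an assumption. The SNP name `rs123` is made up.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SnpSketch:
    """Hypothetical stand-in for illustration; not genome_integration's SNP."""

    snp_name: Optional[str] = None
    chromosome: Optional[Union[int, str]] = None
    position: Optional[int] = None
    major_allele: Optional[str] = None
    minor_allele: Optional[str] = None
    minor_allele_frequency: Optional[float] = None

    @property
    def has_position_data(self) -> bool:
        # Assumed rule: both chromosome and position must be known.
        return self.chromosome is not None and self.position is not None

    @property
    def has_allele_data(self) -> bool:
        # Both alleles of the biallelic variant must be known.
        return self.major_allele is not None and self.minor_allele is not None

    @property
    def has_frequency_data(self) -> bool:
        return self.minor_allele_frequency is not None

    def set_pos_name(self) -> None:
        # Follows the documented "{chr}:{position}" naming scheme.
        self.snp_name = f"{self.chromosome}:{self.position}"


# A record with only a name and a position: alleles and frequency are missing.
partial = SnpSketch(snp_name="rs123", chromosome=1, position=1000)
partial.set_pos_name()
print(partial.snp_name, partial.has_position_data, partial.has_allele_data)
```

Keeping every field optional mirrors the note that most of the data can be missing in real datasets.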
-
add_frequency_data(snp_data, flipped, overwrite) Updates the frequency data based on a reference.
Parameters: - snp_data –
- flipped –
Returns:
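What "flipped" means for a frequency can be sketched as follows. This is an illustration under assumptions, not the library's implementation: both function names are hypothetical, and the sketch assumes that flipping a biallelic variant turns a frequency f into 1 - f and that alleles are compared as plain strings without strand handling.

```python
def alleles_are_flipped(own_major: str, own_minor: str,
                        ref_major: str, ref_minor: str) -> bool:
    """Return True if the reference lists the same two alleles in swapped order."""
    if (own_major, own_minor) == (ref_major, ref_minor):
        return False
    if (own_major, own_minor) == (ref_minor, ref_major):
        return True
    raise ValueError("allele pairs do not describe the same biallelic variant")


def frequency_after_flip(minor_allele_frequency: float, flipped: bool) -> float:
    """Return the frequency expressed for our own minor allele."""
    if not 0.0 <= minor_allele_frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1, inclusive")
    # With two alleles, the frequency of the other allele is the complement.
    return 1.0 - minor_allele_frequency if flipped else minor_allele_frequency


flipped = alleles_are_flipped("A", "G", "G", "A")
print(flipped, frequency_after_flip(0.25, flipped))
```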
-
add_minor_allele_frequency(major, minor, freq) Adds a minor allele frequency.
Parameters: - major –
- minor –
- freq –
Returns:
-
add_pos_chr(pos, chr) Adds position information.
Parameters: - pos –
- chr –
Returns: self
-
add_snp_data(snp_data, overwrite=False) This method returns the object itself with updated SNP data. It only changes data if the snp_name is the same, or if the position is the same.
Author comment: This is bloody hard to get right.
Parameters: snp_data – a SNP object or bigger.
Returns: self
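The matching rule described for add_snp_data (same name, or same position) could be sketched like this. The `Variant` tuple and `refers_to_same_variant` are hypothetical names, and the sketch assumes that "same position" means both chromosome and base pair position agree, with chromosomes compared as strings because they may be given as int or str. The library's actual checks may differ.

```python
from typing import NamedTuple, Optional, Union


class Variant(NamedTuple):
    """Hypothetical minimal record used only for this sketch."""

    snp_name: Optional[str] = None
    chromosome: Optional[Union[int, str]] = None
    position: Optional[int] = None


def refers_to_same_variant(a: Variant, b: Variant) -> bool:
    """Return True if the two records share a name or a genomic position."""
    same_name = a.snp_name is not None and a.snp_name == b.snp_name
    have_positions = None not in (a.chromosome, a.position,
                                  b.chromosome, b.position)
    same_position = (
        have_positions
        # Chromosomes may be int or str, so compare their string forms.
        and str(a.chromosome) == str(b.chromosome)
        and a.position == b.position
    )
    return same_name or same_position


named = Variant(snp_name="rs123", chromosome=1, position=1000)
positional = Variant(snp_name="1:1000", chromosome="1", position=1000)
print(refers_to_same_variant(named, positional))
```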
-
set_pos_name() Sets the snp_name attribute to the pos_name, which is {chr}:{position}.
-
update_alleles(snp_data, overwrite) Updates alleles, and updates the self.has_allele_data boolean.
Parameters: - snp_data –
- overwrite –
Returns:
-
class genome_integration.variants.BimFile(file_name)
Reads in a plink-formatted bim file that contains many different variants.
- bim_results: dict
- Values contain the SNP information of all variants in the bim file. Keys are the SNP names in the bim file.
- bim_results_by_pos: dict
- Values contain the SNP information of all variants in the bim file. Keys are the pos names "<chr>:<pos>" of the variants in the bim file.
- snp_names: list
- Ordered list of SNP names.
- add_frq_information(self, file_name)
- Adds frequency information from a plink file (.frq or .frqx).
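The three containers listed above can be illustrated with a minimal reader for the plink .bim layout, which has six whitespace-separated columns per line: chromosome, variant identifier, genetic distance, base pair position, allele 1 and allele 2. This is a sketch, not BimFile itself: `read_bim` is a hypothetical helper, records are plain dicts rather than SNP objects, and treating allele 1 as the minor allele is an assumption (plink usually, but not always, writes it that way). The two example variants are made up.

```python
import io


def read_bim(handle):
    """Parse .bim lines into name-keyed and position-keyed dicts plus an ordered name list."""
    by_name = {}
    by_pos = {}
    names = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        chrom, name, _genetic_distance, pos, allele_1, allele_2 = fields
        record = {
            "snp_name": name,
            "chromosome": chrom,
            "position": int(pos),
            "minor_allele": allele_1,  # assumption: allele 1 is the minor allele
            "major_allele": allele_2,
        }
        by_name[name] = record
        by_pos[f"{chrom}:{pos}"] = record
        names.append(name)
    return by_name, by_pos, names


# Made-up example content with two variants.
BIM_TEXT = "1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n"
by_name, by_pos, names = read_bim(io.StringIO(BIM_TEXT))
print(names, by_pos["1:2000"]["snp_name"])
```

Both dicts point at the same record objects, so a lookup by name and a lookup by position return the same variant.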
-
add_frq_information(file_name) Adds frequency information from a plink file (.frq or .frqx).
Parameters: file_name – file name of the plink frq or frqx file.
Returns: self, with added frq information.
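As a rough picture of the input, a plink .frq report is a whitespace-aligned table with the header CHR, SNP, A1, A2, MAF and NCHROBS. The sketch below reads that layout into a dict keyed by SNP name. `read_frq` is a hypothetical helper and not the library's parser; it covers only .frq (not .frqx) and assumes that rows whose MAF is NA should be skipped. The example rows are made up.

```python
import io


def read_frq(handle):
    """Map SNP name to (allele 1, allele 2, MAF) from a plink .frq report."""
    header = handle.readline().split()
    column = {name: index for index, name in enumerate(header)}
    frequencies = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        maf = fields[column["MAF"]]
        if maf == "NA":
            continue  # assumption: variants without a frequency are skipped
        frequencies[fields[column["SNP"]]] = (
            fields[column["A1"]],
            fields[column["A2"]],
            float(maf),
        )
    return frequencies


# Made-up example content.
FRQ_TEXT = (
    " CHR  SNP   A1   A2   MAF  NCHROBS\n"
    "   1  rs1    A    G  0.25      200\n"
    "   1  rs2    C    T    NA        0\n"
)
frequencies = read_frq(io.StringIO(FRQ_TEXT))
print(frequencies)
```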