Public Interface¶
- Utils
- Geno Queue
- Preprocess
  - preprocess.standardize()
  - preprocess.impute()
  - preprocess.compose()
Utils¶
-
geno_sugar.utils.is_in(bim, geno_range)¶
Parameters:
- bim (pandas.DataFrame) – Variant annotation
- geno_range (tuple) – (chrom, pos_start, pos_end)
Returns: Isnp – Variant filter
Return type: bool array
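As an illustration of the kind of filter described above, the following is a minimal sketch, not the library's implementation. The helper `is_in_sketch`, the column names `chrom` and `pos` (borrowed from the PLINK .bim convention), the string-valued chromosome labels, and the inclusive end position are all assumptions.

```python
import pandas as pd

# Toy variant annotation. The "chrom" and "pos" column names are assumed,
# following the PLINK .bim convention; the real table may differ.
bim = pd.DataFrame({
    "chrom": ["1", "1", "1", "2"],
    "snp": ["rs1", "rs2", "rs3", "rs4"],
    "pos": [100, 250, 900, 250],
})


def is_in_sketch(bim, geno_range):
    """Hypothetical stand-in: flag variants on `chrom` inside the window."""
    chrom, pos_start, pos_end = geno_range
    on_chrom = bim["chrom"] == chrom
    # Both window ends are treated as inclusive here (an assumption).
    in_window = (bim["pos"] >= pos_start) & (bim["pos"] <= pos_end)
    return (on_chrom & in_window).to_numpy()


# One boolean per variant: True where the variant lies in chr1:200-1000.
Isnp = is_in_sketch(bim, ("1", 200, 1000))
```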
-
geno_sugar.utils.snp_query(G, bim, Isnp)¶
Parameters:
- G ((n_snps, n_inds) array) – Genetic data
- bim (pandas.DataFrame) – Variant annotation
- Isnp (bool array) – Variant filter
Returns:
- G_out ((n_snps, n_inds) array) – filtered genetic data
- bim_out (pandas.DataFrame) – filtered variant annotation
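The query can be pictured as applying one boolean mask to both the genotype rows and the annotation rows. The sketch below is a hypothetical re-implementation (`snp_query_sketch` is not a library function) and assumes that row k of `G` corresponds to row k of `bim`.

```python
import numpy as np
import pandas as pd

# Toy data: 4 variants (rows) by 3 individuals (columns).
G = np.array([[0, 1, 2],
              [2, 2, 0],
              [1, 0, 1],
              [0, 0, 2]])
bim = pd.DataFrame({"chrom": ["1", "1", "1", "2"],
                    "pos": [100, 250, 900, 250]})
Isnp = np.array([False, True, True, False])


def snp_query_sketch(G, bim, Isnp):
    """Hypothetical stand-in: keep only the variants flagged in Isnp."""
    G_out = G[Isnp]                             # select genotype rows
    bim_out = bim[Isnp].reset_index(drop=True)  # select matching annotation rows
    return G_out, bim_out


G_out, bim_out = snp_query_sketch(G, bim, Isnp)
```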
-
geno_sugar.utils.standardize_snps(G)¶ Standardize variants.
Parameters: G ((n_inds, n_snps) array) – Genetic data
Returns: G_out
Return type: standardized array
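The reference does not say what "standardize" means exactly. A common reading is to scale each variant (column) to zero mean and unit variance, and the sketch below assumes that reading. `standardize_snps_sketch` is a hypothetical helper, and its handling of constant columns is also an assumption.

```python
import numpy as np


def standardize_snps_sketch(G):
    """Hypothetical stand-in: zero mean and unit variance per variant."""
    G = np.asarray(G, dtype=float)
    mean = G.mean(axis=0)
    std = G.std(axis=0)
    # Assumption: constant (monomorphic) variants are left at zero
    # instead of triggering a division by zero.
    std[std == 0] = 1.0
    return (G - mean) / std


# 4 individuals (rows) by 3 variants (columns); the middle variant is constant.
G = np.array([[0, 2, 1],
              [1, 2, 0],
              [2, 2, 1],
              [1, 2, 2]])
G_std = standardize_snps_sketch(G)
```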
-
geno_sugar.utils.unique_variants(G)¶ Filters out variants with the same genetic profile.
Parameters: G ((n_inds, n_snps) array) – Genetic data
Returns:
- G_out ((n_inds, n_unique_snps) array) – filtered genetic data
- idxs (int array) – indexes of the unique variants
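One way to drop duplicated genetic profiles is `numpy.unique` along the variant axis. The sketch below is a hypothetical re-implementation (`unique_variants_sketch` is not a library function). Keeping the first occurrence of each profile, in the original column order, is an assumption.

```python
import numpy as np


def unique_variants_sketch(G):
    """Hypothetical stand-in: keep one column per distinct genetic profile."""
    G = np.asarray(G)
    # return_index gives the first column index of each distinct profile.
    _, idxs = np.unique(G, axis=1, return_index=True)
    idxs = np.sort(idxs)  # restore the original variant order
    return G[:, idxs], idxs


# 3 individuals (rows) by 3 variants (columns); variants 0 and 1 are identical.
G = np.array([[0, 0, 1],
              [1, 1, 2],
              [2, 2, 0]])
G_out, idxs = unique_variants_sketch(G)
```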
Geno Queue¶
Iterator class facilitating genome-wide analyses by (i) loading the genetic data in batches of snps, and (ii) applying user-specified functions for preprocessing and filtering.
-
class geno_sugar.geno_queue.GenoQueue(G, bim, batch_size=1000, preprocess=None, verbose=True)¶ Utility class for genome-wide analysis.
Parameters:
- G ((snps, inds) array) – Genetic data
- bim (pandas.DataFrame) – Variant annotation
- batch_size (int) – number of snps in the batch
- preprocess (function) – preprocessing function applied to each batch (default None)
- verbose (bool) – verbose flag (default True)
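The batching idea behind the queue can be sketched with a plain generator. This is not the class's implementation or its iteration protocol. `iter_batches_sketch` is a hypothetical helper, and it assumes that the preprocessing function takes and returns a genotype batch without changing the number of variants.

```python
import numpy as np
import pandas as pd


def iter_batches_sketch(G, bim, batch_size=1000, preprocess=None):
    """Hypothetical stand-in: yield (G_batch, bim_batch) over blocks of variants."""
    n_snps = G.shape[0]
    for start in range(0, n_snps, batch_size):
        stop = min(start + batch_size, n_snps)
        G_batch = np.asarray(G[start:stop])  # load one block of variants
        bim_batch = bim.iloc[start:stop].reset_index(drop=True)
        if preprocess is not None:
            # Assumed to keep the variant count, so bim_batch stays aligned.
            G_batch = preprocess(G_batch)
        yield G_batch, bim_batch


# Toy data: 10 variants (rows) by 3 individuals (columns).
G = np.arange(30).reshape(10, 3)
bim = pd.DataFrame({"chrom": ["1"] * 10, "pos": np.arange(10) * 100})

batch_sizes = [G_b.shape[0] for G_b, _ in iter_batches_sketch(G, bim, batch_size=4)]
```

With 10 variants and `batch_size=4`, the last batch is smaller than the others.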
Preprocess¶
Each preprocess function returns a function that takes the array-like genetic matrix as its only argument.
-
geno_sugar.preprocess.compose(func_list)¶ Composition of preprocessing functions.
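Composition of single-argument functions can be sketched with `functools.reduce`. `compose_sketch` is a hypothetical helper, and applying the functions in list order (first element first) is an assumption about the ordering.

```python
from functools import reduce

import numpy as np


def compose_sketch(func_list):
    """Hypothetical stand-in: chain single-argument functions in list order."""
    def composed(G):
        # Feed the output of each function into the next one.
        return reduce(lambda out, func: func(out), func_list, G)
    return composed


# Two toy steps standing in for real preprocessing functions.
add_one = lambda G: G + 1
double = lambda G: G * 2

pipeline = compose_sketch([add_one, double])
result = pipeline(np.array([0, 1, 2]))  # computes (G + 1) * 2
```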
-
geno_sugar.preprocess.filter_by_maf(min_maf=0.01)¶ Returns a function that filters variants by minor allele frequency (MAF); min_maf is the minimum MAF (default 0.01).
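The factory pattern for a MAF filter can be sketched as follows. This is not the library's code: `filter_by_maf_sketch` is hypothetical, and the 0/1/2 allele-count coding, the (n_inds, n_snps) orientation, and the inclusive threshold are all assumptions.

```python
import numpy as np


def filter_by_maf_sketch(min_maf=0.01):
    """Hypothetical stand-in: build a filter that drops low-MAF variants."""
    def _filter(G):
        G = np.asarray(G, dtype=float)
        # Allele frequency under an assumed 0/1/2 allele-count coding.
        freq = np.nanmean(G, axis=0) / 2.0
        maf = np.minimum(freq, 1.0 - freq)
        return G[:, maf >= min_maf]
    return _filter


# 4 individuals (rows) by 3 variants (columns); only the middle one varies.
G = np.array([[0, 0, 2],
              [0, 1, 2],
              [0, 1, 2],
              [0, 2, 2]])
G_kept = filter_by_maf_sketch(min_maf=0.01)(G)
```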
-
geno_sugar.preprocess.filter_by_missing(max_miss=0.01)¶ Returns a function that filters variants by missing values; max_miss is the maximum fraction of missing values (default 0.01).
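A missingness filter follows the same factory pattern. In this sketch, `filter_by_missing_sketch` is hypothetical. Encoding missing genotypes as NaN, the (n_inds, n_snps) orientation, and the inclusive threshold are assumptions.

```python
import numpy as np


def filter_by_missing_sketch(max_miss=0.01):
    """Hypothetical stand-in: build a filter that drops variants with too many NaNs."""
    def _filter(G):
        G = np.asarray(G, dtype=float)
        # Fraction of missing (NaN) genotypes per variant.
        miss = np.isnan(G).mean(axis=0)
        return G[:, miss <= max_miss]
    return _filter


# 4 individuals (rows) by 2 variants (columns); the second has 25% missing.
G = np.array([[0, np.nan],
              [1, 2],
              [2, 0],
              [1, 1]])
G_kept = filter_by_missing_sketch(max_miss=0.01)(G)
```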
-
geno_sugar.preprocess.impute(imputer)¶ Returns an imputation function that uses the given imputer.
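The reference does not specify what interface `imputer` must offer. The sketch below assumes an object with a scikit-learn-style `fit_transform` method. Both `MeanImputerSketch` and `impute_sketch` are hypothetical names introduced for illustration.

```python
import numpy as np


class MeanImputerSketch:
    """Hypothetical imputer: replace NaNs with the per-variant mean."""

    def fit_transform(self, G):
        G = np.asarray(G, dtype=float).copy()
        col_mean = np.nanmean(G, axis=0)
        rows, cols = np.where(np.isnan(G))
        G[rows, cols] = col_mean[cols]
        return G


def impute_sketch(imputer):
    """Hypothetical stand-in: wrap an imputer into a single-argument function."""
    def _impute(G):
        # Assumes the imputer exposes fit_transform (scikit-learn style).
        return imputer.fit_transform(G)
    return _impute


# 3 individuals (rows) by 2 variants (columns), one missing value in each.
G = np.array([[0, np.nan],
              [2, 1],
              [np.nan, 3]])
G_imputed = impute_sketch(MeanImputerSketch())(G)
```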
-
geno_sugar.preprocess.standardize()¶ Returns a function that standardizes variants.