27/11/2022: This page describes the GLIMPSE1 method. GLIMPSE1 is recommended for users who want to use the joint model, for imputation of >1x data with moderately sized reference panels. For large reference panels and lower coverages, we recommend the GLIMPSE2 method.
GLIMPSE command options
Tool name:
GLIMPSE_chunk defines the chunks in which imputation is run
GLIMPSE_phase main GLIMPSE algorithm, performs phasing and imputation by refining genotype likelihoods
GLIMPSE_ligate concatenates imputation chunks into a single VCF/BCF file, ligating phased information
GLIMPSE_sample generates haplotype calls by sampling haplotype estimates
Detailed description
GLIMPSE chunk
Usage
GLIMPSE_chunk --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --window-size 1000000 --window-count 1000 --buffer-size 250000 --buffer-count 250 --output chunks.txt --log chunks.log
Long form | Short | Argument | Description |
---|---|---|---|
--help | NA | NA | Produces a help message with a short description of the command options. |
--seed | NA | INT | Seed for random number generator. |
--thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading files using the htslib library. |
--input | -I | STRING | Target dataset in VCF/BCF format defined at all variable positions. The file does not need to contain a GT field (for efficiency, a file containing only the positions is recommended). |
--region | NA | STRING | Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). For chrX, please treat PAR and non-PAR regions as different chromosomes to avoid mixing ploidy. |
--window-size | NA | INT | Minimal size of the imputation region in base pairs. |
--window-count | NA | INT | Minimal number of variants present in the imputation region. |
--buffer-size | NA | INT | Minimal size of the buffer region in base pairs. |
--buffer-count | NA | INT | Minimal number of variants present in the buffer region. |
--output | -O | STRING | Output file containing buffer and imputation regions. |
--log | NA | STRING | Log file. |
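One way to use the chunk file is to turn each of its lines into a GLIMPSE_phase call. The sketch below only prints the commands (a dry run). It assumes a layout that this page does not specify: whitespace-separated columns with an integer chunk ID in column 1, the region including buffers in column 3 and the region excluding buffers in column 4. Please check your own chunks.txt before relying on it. The function name `print_phase_commands` and the output file names are illustrative.

```shell
#!/bin/sh
# Dry run: print one GLIMPSE_phase command per line of a chunk file.
# ASSUMPTION (not stated on this page): columns are whitespace-separated,
# column 1 = integer chunk ID, column 3 = region with buffers,
# column 4 = region without buffers.
print_phase_commands() {
    chunks_file=$1
    # "|| [ -n "$id" ]" keeps a final line that lacks a trailing newline.
    while read -r id _chr input_region output_region _rest || [ -n "$id" ]; do
        # Skip empty lines.
        [ -z "$id" ] && continue
        # Zero-pad the chunk ID so that file names sort in chunk order.
        printf 'GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --map chr20.b38.gmap.gz --input-region %s --output-region %s --output imputed.chunk%02d.bcf --log imputed.chunk%02d.log\n' \
            "$input_region" "$output_region" "$id" "$id"
    done < "$chunks_file"
}
```

Calling `print_phase_commands chunks.txt` prints one command per chunk. Once the printed commands look right, they can be piped to `sh` or submitted as separate jobs.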
GLIMPSE phase
Usage
GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --thread 8 --input-region chr20:1500000-4500000 --map chr20.b38.gmap.gz --output-region chr20:2000000-4000000 --output imputed.chunk1.bcf --log imputed.chunk1.log
Long form | Short | Argument | Description |
---|---|---|---|
--help | NA | NA | Produces a help message with a short description of the command options. |
--seed | NA | INT | Seed for random number generator. |
--thread | NA | INT | Number of threads to use (default 1). Multi-threading is performed at the sample level, so this parameter brings no benefit if only one sample is imputed. |
--input | -I | STRING | Input VCF/BCF file containing genotype likelihoods. |
--reference | -R | STRING | Reference panel of haplotypes in VCF/BCF format. |
--input-region | NA | STRING | Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). |
--map | -M | STRING | File containing the genetic map. |
--samples-file | NA | STRING | File with sample names and ploidy information. One sample per line with a mandatory second column indicating ploidy (1 or 2). Samples whose names are not listed are assumed to have ploidy 2 (diploid). GLIMPSE does NOT handle the use of sex (M/F) instead of ploidy. |
--impute-reference-only-variants | NA | STRING | Allows imputation at variants only present in the reference panel. This option is intended only to allow imputation at sporadic missing variants. If the missing variants are not sporadic, please re-run the genotype likelihood computation at all reference variants and avoid using this option, since data from the reads should be used. A warning is thrown if reference-only variants are found. |
--input-GL | NA | STRING | Uses the FORMAT/GL field instead of FORMAT/PL as input for genotype likelihoods. |
--ban-repeated-sample-names | NA | STRING | Excludes reference samples whose names match the target samples. To be used only when the target and the reference panel share (even partially) the same set of individuals. |
--burnin | NA | INT | Number of burn-in iterations of the Gibbs sampler. |
--main | NA | INT | Number of main iterations of the Gibbs sampler. Each main iteration contributes to the output genotypes. Haplotypes sampled for the last (max 15) iterations are stored in the HS field. |
--pbwt-depth | NA | INT | Depth of PBWT indexes to condition on. |
--pbwt-modulo | NA | INT | Frequency of PBWT selection. |
--init-states | NA | INT | Number of states used for initialization and maximal number of states in the subsequent iterations. |
--init-pool | NA | INT | List of samples (sample IDs) from which haplotypes are initialised. |
--ne | NA | FLOAT | Effective diploid population size. |
--output | -O | STRING | Output VCF/BCF file containing genotype probabilities (GP field), imputed dosages (DS field), best guess genotypes (GT field), sampled haplotypes in the last (max 16) main iterations (HS field) and info-score. |
--output-region | NA | STRING | Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000). |
--log | NA | STRING | Log file. |
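Because --samples-file expects a ploidy value and not a sex code, a conversion step may be needed when sample metadata records sex. The sketch below is one possible way to do it. It assumes a hypothetical two-column input (sample name, then M or F), assumes the usual mapping of males to ploidy 1 and females to ploidy 2 on the non-PAR part of chrX, and writes space-separated output; the separator GLIMPSE expects is not stated on this page, so treat it as an assumption. The function name `sex_to_ploidy` is illustrative.

```shell
#!/bin/sh
# Convert "sample sex" lines (sex = M or F) into "sample ploidy" lines.
# ASSUMPTIONS: the input layout and the space separator are illustrative;
# the mapping M -> 1, F -> 2 applies to non-PAR chrX only.
sex_to_ploidy() {
    awk '
        NF == 0   { next }                 # ignore empty lines
        $2 == "M" { print $1, 1; next }    # haploid
        $2 == "F" { print $1, 2; next }    # diploid
        # Anything else is reported on stderr and left out of the output.
        { printf "skipping line %d: unknown sex code \"%s\"\n", NR, $2 > "/dev/stderr" }
    ' "$1"
}
```

For example, `sex_to_ploidy samples.sex.txt > samples.ploidy.txt` writes a file that can then be passed to --samples-file.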
GLIMPSE ligate
Usage
GLIMPSE_ligate --input list.imputed.txt --output imputed.chr20.bcf --log imputed.chr20.log
Long form | Short | Argument | Description |
---|---|---|---|
--help | NA | NA | Produces a help message with a short description of the command options. |
--seed | NA | INT | Seed for random number generator. |
--thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library. |
--input | -I | STRING | Text file containing the full list of files to ligate (one file per line). |
--output | -O | STRING | Output VCF/BCF file for the merged regions. Phased information (HS field) is updated accordingly for the full region. |
--log | NA | STRING | Log file. |
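The list passed to --input can be generated from the per-chunk output files. The sketch below assumes a hypothetical naming pattern, imputed.chunkNN.bcf, with zero-padded chunk numbers so that lexical order matches chunk order. Listing the chunks in that order is a choice made here, not a requirement stated on this page. The function name `make_ligate_list` is illustrative.

```shell
#!/bin/sh
# Write the chunk files found in a directory to a list file, one per line.
# ASSUMPTION: files are named imputed.chunkNN.bcf with zero-padded numbers,
# so the sorted glob expansion follows chunk order.
make_ligate_list() {
    dir=$1
    list_file=$2
    # Replace the function's positional parameters with the matching files.
    set -- "$dir"/imputed.chunk*.bcf
    # If nothing matched, the pattern is left unexpanded and does not exist.
    [ -e "$1" ] || { echo "no chunk files found in $dir" >&2; return 1; }
    printf '%s\n' "$@" > "$list_file"
}
```

After `make_ligate_list . list.imputed.txt`, the resulting file can be used as in the usage line above: `GLIMPSE_ligate --input list.imputed.txt --output imputed.chr20.bcf --log imputed.chr20.log`.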
GLIMPSE sample
Usage
GLIMPSE_sample --input imputed.chr20.bcf --solve --output phased.chr20.bcf --log phased.chr20.log
Long form | Short | Argument | Description |
---|---|---|---|
--help | NA | NA | Produces a help message with a short description of the command options. |
--seed | NA | INT | Seed for random number generator. |
--thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library. |
--input | -I | STRING | VCF/BCF file generated using GLIMPSE ligate. |
--sample | NA | STRING | Samples a likely haplotype pair for each sample; use it in combination with --seed. Not recommended for general usage; use --solve instead. |
--solve | NA | STRING | Gets the most likely haplotype pair for each sample (the random number generator is not used). |
--output | -O | STRING | Output VCF/BCF file containing phased genotypes. |
--log | NA | STRING | Log file. |