GLIMPSE

27/11/2022: This page describes the GLIMPSE1 method. GLIMPSE1 is recommended for users who want to use the joint model, for imputation of data above 1x coverage, and for moderately sized reference panels. For large reference panels and lower coverages, we recommend the GLIMPSE2 method.

Tool name	Description
GLIMPSE_chunk	Defines chunks where to run imputation
GLIMPSE_phase	Main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods
GLIMPSE_ligate	Concatenates imputation chunks in a single VCF/BCF file ligating phased information
GLIMPSE_sample	Generates haplotype calls by sampling haplotype estimates

Detailed description

GLIMPSE chunk

Usage

GLIMPSE_chunk --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --window-size 1000000 --window-count 1000 --buffer-size 250000 --buffer-count 250 --output chunks.txt --log chunks.log

Command options

Long form	Short	Argument	Description
`--help`	NA	NA	Produces a help message with a short description of the command options.
`--seed`	NA	INT	Seed for random number generator.
`--thread`	NA	INT	Number of threads to use (default 1). Multi-threading is only used for reading files using the htslib library
`--input`	`-I`	STRING	Target dataset in VCF/BCF format defined at all variable positions. The file does not need to contain a GT field (for efficiency, a file containing only the positions is recommended).
`--region`	NA	STRING	Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). For chrX, please treat PAR and non-PAR regions as different chromosomes in order to avoid mixing ploidy.
`--window-size`	NA	INT	Minimal size of the imputation region in basepairs.
`--window-count`	NA	INT	Minimal number of variants present in the imputation region.
`--buffer-size`	NA	INT	Minimal size of the buffer region in basepairs.
`--buffer-count`	NA	INT	Minimal number of variants present in the buffer region.
`--output`	`-O`	STRING	Output file containing buffer and imputation regions.
`--log`	NA	STRING	Log file.
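
The chunk file is typically read line by line to launch one GLIMPSE_phase job per chunk. The following is a minimal sketch, not part of the GLIMPSE distribution. It assumes that each line of chunks.txt holds whitespace-separated fields with the chunk ID in column 1, the chromosome in column 2, the buffered region in column 3 and the imputation region in column 4; check this layout against your own file before relying on it. The function name print_phase_commands and the demo file contents are hypothetical, and the commands are only printed, not executed.

```shell
# Print one GLIMPSE_phase command per chunk (dry run; nothing is executed).
# ASSUMPTION: columns are ID, chromosome, buffered region, imputation region.
print_phase_commands() {
  local chunks_file="$1"
  local id chr input_region output_region rest
  while read -r id chr input_region output_region rest; do
    # Skip empty lines.
    [ -z "$id" ] && continue
    printf 'GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --map chr20.b38.gmap.gz --input-region %s --output-region %s --output imputed.chunk%s.bcf --log imputed.chunk%s.log\n' \
      "$input_region" "$output_region" "$id" "$id"
  done < "$chunks_file"
}

# Hypothetical two-chunk file, for illustration only.
demo_chunks="$(mktemp)"
printf '0\tchr20\tchr20:1-2250000\tchr20:1-2000000\n' > "$demo_chunks"
printf '1\tchr20\tchr20:1750000-4250000\tchr20:2000001-4000000\n' >> "$demo_chunks"
print_phase_commands "$demo_chunks"
rm -f "$demo_chunks"
```

Printing the commands first makes it easy to inspect the regions, or to hand the lines to a job scheduler, before any imputation is started.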

GLIMPSE phase

Usage

GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --thread 8 --input-region chr20:1500000-4500000 --map chr20.b38.gmap.gz --output-region chr20:2000000-4000000 --output imputed.chunk1.bcf --log imputed.chunk1.log

Command options

Long form	Short	Argument	Description
`--help`	NA	NA	Produces a help message with a short description of the command options.
`--seed`	NA	INT	Seed for random number generator.
`--thread`	NA	INT	Number of threads to use (default 1). Multi-threading is performed at the sample level, so this parameter brings no benefit if only one sample is imputed.
`--input`	`-I`	STRING	Input VCF/BCF file containing genotype likelihoods.
`--reference`	`-R`	STRING	Reference panel of haplotypes in VCF/BCF format.
`--input-region`	NA	STRING	Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000).
`--map`	`-M`	STRING	File containing the genetic map.
`--samples-file`	NA	STRING	File with sample names and ploidy information. One sample per line with a mandatory second column indicating ploidy (1 or 2). Sample names that are not present are assumed to have ploidy 2 (diploids). GLIMPSE does NOT handle the use of sex (M/F) instead of ploidy.
`--impute-reference-only-variants`	NA	STRING	Allows imputation at variants only present in the reference panel. The use of this option is intended only to allow imputation at sporadic missing variants. If the number of missing variants is non-sporadic, please re-run the genotype likelihood computation at all reference variants and avoid using this option, since data from the reads should be used. A warning is thrown if reference-only variants are found.
`--input-GL`	NA	STRING	Uses FORMAT/GL field instead of FORMAT/PL as input for genotype likelihoods.
`--ban-repeated-sample-names`	NA	STRING	Excludes reference samples having names matching the target samples. To be used only when the target and the reference panel share (even partially) the same set of individuals.
`--burnin`	NA	INT	Number of burn-in iterations of the Gibbs sampler.
`--main`	NA	INT	Number of main iterations of the Gibbs sampler. Each main iteration contributes to output genotypes. Haplotypes sampled for the last (max 15) iterations are stored in the HS field.
`--pbwt-depth`	NA	INT	Depth of PBWT indexes to condition on.
`--pbwt-modulo`	NA	INT	Frequency of PBWT selection.
`--init-states`	NA	INT	Number of states used for initialization and maximal number of states in the subsequent iterations.
`--init-pool`	NA	INT	List of samples (sample IDs) from which haplotypes are initialised.
`--ne`	NA	FLOAT	Effective diploid population size.
`--output`	`-O`	STRING	Output VCF/BCF file containing genotype probabilities (GP field), imputed dosages (DS field), best guess genotypes (GT field), sampled haplotypes in the last (max 16) main iterations (HS field) and info-score.
`--output-region`	NA	STRING	Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000).
`--log`	NA	STRING	Log file.
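
Because `--samples-file` expects ploidy rather than sex, a sex annotation has to be converted before imputing a region where some samples are haploid. The following is a minimal sketch, assuming a hypothetical two-column input with the sample name followed by sex coded as M or F, and assuming that M maps to ploidy 1 and F to ploidy 2 (as for a non-PAR chrX region). The function name sex_to_ploidy, the sample names and the single-space output separator are illustrative assumptions, not part of GLIMPSE.

```shell
# Convert "sample sex" lines into "sample ploidy" lines.
# ASSUMPTION: sex is coded as M or F; M -> ploidy 1, F -> ploidy 2.
sex_to_ploidy() {
  awk '
    NF == 0   { next }                      # ignore empty lines
    $2 == "M" { print $1, 1; next }
    $2 == "F" { print $1, 2; next }
    {
      # Anything else is reported and makes the function fail.
      printf "unrecognised sex for %s: %s\n", $1, $2 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Hypothetical input, for illustration only.
demo_sex="$(mktemp)"
printf 'sampleA F\nsampleB M\n' > "$demo_sex"
sex_to_ploidy "$demo_sex"
rm -f "$demo_sex"
```

The output can be redirected to a file and passed to GLIMPSE_phase with `--samples-file`. Failing on unrecognised codes is a deliberate choice here, since samples missing from the file are silently treated as diploid.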

GLIMPSE ligate

Usage

GLIMPSE_ligate --input list.imputed.txt --output imputed.chr20.bcf --log imputed.chr20.log

Command options

Long form	Short	Argument	Description
`--help`	NA	NA	Produces a help message with a short description of the command options.
`--seed`	NA	INT	Seed for random number generator.
`--thread`	NA	INT	Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library
`--input`	`-I`	STRING	Text file containing the full list of files to ligate (one file per line).
`--output`	`-O`	STRING	Output VCF/BCF file for the merged regions. Phased information (HS field) is updated accordingly for the full region.
`--log`	NA	STRING	Log file.
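
The list passed to `--input` can be generated from the per-chunk outputs. The following is a minimal sketch, assuming the chunks were written as imputed.chunk<N>.bcf in a single directory (a hypothetical naming scheme modelled on the GLIMPSE_phase usage example) and that the files should be listed by increasing chunk number. Whether GLIMPSE_ligate requires a particular order is not stated on this page, so genomic order is used as the conservative choice. The function name write_ligate_list is illustrative.

```shell
# Write the chunk files found in a directory to a list, ordered by chunk number.
# ASSUMPTION: files are named imputed.chunk<N>.bcf with a numeric <N>.
write_ligate_list() {
  local dir="$1" list="$2" f n tab
  tab="$(printf '\t')"
  for f in "$dir"/imputed.chunk*.bcf; do
    [ -e "$f" ] || continue        # no match: the glob stays literal
    n="${f##*/imputed.chunk}"      # strip the path and the file name prefix
    n="${n%.bcf}"                  # strip the extension, leaving the number
    printf '%s\t%s\n' "$n" "$f"
  done | sort -t "$tab" -k1,1n | cut -f2- > "$list"
}

# Hypothetical chunk files, for illustration only.
demo_dir="$(mktemp -d)"
touch "$demo_dir/imputed.chunk10.bcf" "$demo_dir/imputed.chunk2.bcf" "$demo_dir/imputed.chunk1.bcf"
write_ligate_list "$demo_dir" "$demo_dir/list.imputed.txt"
cat "$demo_dir/list.imputed.txt"
rm -rf "$demo_dir"
```

Sorting on the extracted number avoids the lexical ordering of a plain glob, which would place chunk 10 before chunk 2.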

GLIMPSE sample

Usage

GLIMPSE_sample --input imputed.chr20.bcf --solve --output phased.chr20.bcf --log phased.chr20.log

Command options

Long form	Short	Argument	Description
`--help`	NA	NA	Produces a help message with a short description of the command options.
`--seed`	NA	INT	Seed for random number generator.
`--thread`	NA	INT	Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library
`--input`	`-I`	STRING	VCF/BCF file generated using GLIMPSE ligate.
`--sample`	NA	STRING	Samples a likely haplotype pair for each sample; use it in combination with `--seed`. Not recommended for general usage; use `--solve` instead.
`--solve`	NA	STRING	Gets the most likely haplotype pair for each sample (the random number generator is not used).
`--output`	`-O`	STRING	Output VCF/BCF file containing phased genotypes.
`--log`	NA	STRING	Log file.
