Tool name:

GLIMPSE_chunk: defines the chunks in which imputation is run

GLIMPSE_phase: main GLIMPSE algorithm; performs phasing and imputation, refining genotype likelihoods

GLIMPSE_ligate: concatenates imputation chunks into a single VCF/BCF file, ligating phased information

GLIMPSE_sample: generates haplotype calls by sampling haplotype estimates


Detailed description

GLIMPSE chunk

Usage

GLIMPSE_chunk --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --window-size 1000000 --window-count 1000 --buffer-size 250000 --buffer-count 250 --output chunks.txt --log chunks.log

Command options

| Long form | Short | Argument | Description |
| --- | --- | --- | --- |
| --help | NA | NA | Produces a help message with a short description of the command options. |
| --seed | NA | INT | Seed for the random number generator. |
| --thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading files using the htslib library. |
| --input | -I | STRING | Target dataset in VCF/BCF format, defined at all variable positions. The file may omit the GT field; for efficiency, a file containing only the positions is recommended. |
| --region | NA | STRING | Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). For chrX, treat PAR and non-PAR regions as different chromosomes to avoid mixing ploidy. |
| --window-size | NA | INT | Minimal size of the imputation region in base pairs. |
| --window-count | NA | INT | Minimal number of variants present in the imputation region. |
| --buffer-size | NA | INT | Minimal size of the buffer region in base pairs. |
| --buffer-count | NA | INT | Minimal number of variants present in the buffer region. |
| --output | -O | STRING | Output file containing buffer and imputation regions. |
| --log | NA | STRING | Log file. |
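
The advice to treat PAR and non-PAR regions of chrX as separate chromosomes can be sketched as one GLIMPSE_chunk run per region. The function below is a dry run that only prints commands. The function name, the output file naming and the window and buffer values are illustrative, and the coordinates in the final call are assumed GRCh38 boundaries that should be checked against the build in use.

```shell
# Dry run: print one GLIMPSE_chunk command per region so that regions with
# different ploidy are chunked separately. Nothing is executed.
print_chunk_commands() {
    for region in "$@"; do
        # Turn "chrX:10001-2781479" into "chrX_10001_2781479" for file names.
        tag=$(printf '%s' "$region" | sed 's/[:-]/_/g')
        printf 'GLIMPSE_chunk --input input.GLs.vcf.gz --region %s --window-size 1000000 --buffer-size 250000 --output chunks.%s.txt --log chunks.%s.log\n' "$region" "$tag" "$tag"
    done
}

# ASSUMPTION: GRCh38 coordinates for PAR1, non-PAR and PAR2; verify for your build.
print_chunk_commands chrX:10001-2781479 chrX:2781480-155701382 chrX:155701383-156030895
```

Each printed command writes its own chunk file, so the three regions can later be imputed with the ploidy appropriate to each.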

GLIMPSE phase

Usage

GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --region chr20 --thread 8 --input-region chr20:1500000-4500000 --map chr20.b38.gmap.gz --output-region chr20:2000000-4000000 --output imputed.chunk1.bcf --log imputed.chunk1.log
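
In practice GLIMPSE_phase is run once per chunk, with --input-region set to the buffered region and --output-region set to the region without buffers. The following dry-run sketch prints one command per line of a chunk file. It assumes whitespace-separated columns with the chunk ID in column 1, the buffered region in column 3 and the unbuffered region in column 4; that layout is an assumption and should be checked against the file GLIMPSE_chunk actually produced. The function name and output file names are illustrative.

```shell
# Dry run: print one GLIMPSE_phase command per line of a chunk file.
# ASSUMPTION: column 1 = chunk ID, column 3 = region including buffers,
# column 4 = region excluding buffers. Nothing is executed.
print_phase_commands() {
    chunks=$1
    while read -r id chrom buffered core rest || [ -n "$id" ]; do
        # Skip blank lines.
        [ -z "$id" ] && continue
        printf 'GLIMPSE_phase --input input.GLs.vcf.gz --reference reference.bcf --map chr20.b38.gmap.gz --input-region %s --output-region %s --output imputed.chunk%s.bcf --log imputed.chunk%s.log\n' "$buffered" "$core" "$id" "$id"
    done < "$chunks"
}
```

Calling `print_phase_commands chunks.txt` lists the commands for review before they are run.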

Command options

| Long form | Short | Argument | Description |
| --- | --- | --- | --- |
| --help | NA | NA | Produces a help message with a short description of the command options. |
| --seed | NA | INT | Seed for the random number generator. |
| --thread | NA | INT | Number of threads to use (default 1). Multi-threading is performed at the sample level, so this parameter brings no benefit if only one sample is imputed. |
| --input | -I | STRING | Input VCF/BCF file containing genotype likelihoods. |
| --reference | -R | STRING | Reference panel of haplotypes in VCF/BCF format. |
| --input-region | NA | STRING | Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). |
| --map | -M | STRING | File containing the genetic map. |
| --samples-file | NA | STRING | File with sample names and ploidy information. One sample per line with a mandatory second column indicating ploidy (1 or 2). Sample names that are not present are assumed to have ploidy 2 (diploids). GLIMPSE does NOT handle the use of sex (M/F) instead of ploidy. |
| --impute-reference-only-variants | NA | STRING | Allows imputation at variants only present in the reference panel. This option is intended only for imputation at sporadic missing variants. If the missing variants are not sporadic, re-run the genotype likelihood computation at all reference variants and avoid this option, since data from the reads should be used. A warning is thrown if reference-only variants are found. |
| --input-GL | NA | STRING | Uses the FORMAT/GL field instead of FORMAT/PL as input for genotype likelihoods. |
| --ban-repeated-sample-names | NA | STRING | Excludes reference samples whose names match the target samples. To be used only when the target and the reference panel share (even partially) the same set of individuals. |
| --burnin | NA | INT | Number of burn-in iterations of the Gibbs sampler. |
| --main | NA | INT | Number of main iterations of the Gibbs sampler. Each main iteration contributes to the output genotypes. Haplotypes sampled for the last (max 15) iterations are stored in the HS field. |
| --pbwt-depth | NA | INT | Depth of PBWT indexes to condition on. |
| --pbwt-modulo | NA | INT | Frequency of PBWT selection. |
| --init-states | NA | INT | Number of states used for initialization and maximal number of states in the subsequent iterations. |
| --init-pool | NA | STRING | List of samples (sample IDs) from which haplotypes are initialised. |
| --ne | NA | FLOAT | Effective diploid population size. |
| --output | -O | STRING | Output VCF/BCF file containing genotype probabilities (GP field), imputed dosages (DS field), best-guess genotypes (GT field), sampled haplotypes in the last (max 16) main iterations (HS field) and the info score. |
| --output-region | NA | STRING | Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000). |
| --log | NA | STRING | Log file. |
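
Because --samples-file expects ploidy rather than sex, a sex table has to be converted before use. The sketch below assumes a hypothetical two-column input (sample name, then M or F) and maps M to ploidy 1 and F to ploidy 2, which is only appropriate for haploid-in-males regions such as chrX non-PAR. The function name and the space-separated output are assumptions, not part of GLIMPSE.

```shell
# Convert "sample sex" lines into "sample ploidy" lines.
# ASSUMPTION: sex is coded M or F; M -> 1 and F -> 2 (e.g. chrX non-PAR).
sex_to_ploidy() {
    awk '
        NF == 0    { next }                      # ignore blank lines
        $2 == "M"  { print $1, 1; next }
        $2 == "F"  { print $1, 2; next }
        { printf "unknown sex for %s\n", $1 > "/dev/stderr"; bad = 1 }
        END { exit bad }                         # non-zero if any line was rejected
    ' "$1"
}
```

For example, `sex_to_ploidy sex.txt > samples.ploidy.txt` writes a file that could then be passed to --samples-file.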

GLIMPSE ligate

Usage

GLIMPSE_ligate --input list.imputed.txt --output imputed.chr20.bcf --log imputed.chr20.log

Command options

| Long form | Short | Argument | Description |
| --- | --- | --- | --- |
| --help | NA | NA | Produces a help message with a short description of the command options. |
| --seed | NA | INT | Seed for the random number generator. |
| --thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library. |
| --input | -I | STRING | Text file containing the full list of files to ligate (one file per line). |
| --output | -O | STRING | Output VCF/BCF file for the merged regions. Phased information (HS field) is updated accordingly for the full region. |
| --log | NA | STRING | Log file. |
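
The list passed to --input is a plain text file with one chunk file per line. A minimal sketch for building it is shown below, assuming hypothetical file names of the form imputed.chunkNN.bcf with zero-padded chunk numbers, so that the shell's lexical glob order matches chunk order. With unpadded names, chunk10 would sort before chunk2. That the chunks should be listed in genomic order is itself an assumption here.

```shell
# Write the paths of per-chunk BCF files, one per line, to a list file.
# ASSUMPTION: files are named imputed.chunkNN.bcf with zero-padded NN,
# so lexical glob order equals chunk order.
make_ligate_list() {
    dir=$1
    list=$2
    : > "$list"                      # create or truncate the list file
    for f in "$dir"/imputed.chunk*.bcf; do
        [ -e "$f" ] || continue      # glob matched nothing
        printf '%s\n' "$f" >> "$list"
    done
}
```

For example, `make_ligate_list . list.imputed.txt` produces the file used in the usage line above.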

GLIMPSE sample

Usage

GLIMPSE_sample --input imputed.chr20.bcf --solve --output phased.chr20.bcf --log phased.chr20.log

Command options

| Long form | Short | Argument | Description |
| --- | --- | --- | --- |
| --help | NA | NA | Produces a help message with a short description of the command options. |
| --seed | NA | INT | Seed for the random number generator. |
| --thread | NA | INT | Number of threads to use (default 1). Multi-threading is only used for reading/writing files using the htslib library. |
| --input | -I | STRING | VCF/BCF file generated using GLIMPSE ligate. |
| --sample | NA | NA | Samples a likely haplotype pair for each sample; use it in combination with --seed. Not recommended for general usage; use --solve instead. |
| --solve | NA | NA | Gets the most likely haplotype pair for each sample (the random number generator is not used). |
| --output | -O | STRING | Output VCF/BCF file containing phased genotypes. |
| --log | NA | STRING | Log file. |
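
When --sample is used instead of --solve, the result depends on the random seed, so drawing several haplotype samples means one run per seed. The dry-run sketch below prints such commands. The function name and the output file names are hypothetical, and it assumes --sample takes no argument, as --solve does in the usage line.

```shell
# Dry run: print one GLIMPSE_sample command per seed. Nothing is executed.
# ASSUMPTION: output files are named after the seed for illustration only.
print_sample_commands() {
    input=$1
    shift
    for seed in "$@"; do
        printf 'GLIMPSE_sample --input %s --sample --seed %s --output sampled.seed%s.bcf --log sampled.seed%s.log\n' "$input" "$seed" "$seed" "$seed"
    done
}
```

For example, `print_sample_commands imputed.chr20.bcf 1 2 3` prints three commands that differ only in the seed and the output names.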