File with four columns listing, in order: regions, frequencies, validation and imputed datasets. For genome-wide concordance, add one line per chromosome.
--samples
NA
NA
List of samples to process, one sample ID per line.
--gt-val
NA
NA
Uses hard-called genotypes rather than phred-scaled likelihoods for the validation dataset, reading them from the FORMAT/GT field.
--gt-tar
NA
NA
Uses the FORMAT/GT field to determine the best-guess genotype rather than the FORMAT/GP field (default). The FORMAT/DS and FORMAT/GP fields are still required for calibration and r-squared calculations.
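The Python snippet below is a minimal sketch of two ideas from the table above.

- **Input list.** It writes the four-column input file described at the top, one line per chromosome, with the columns in the documented order: region, frequencies, validation, imputed.
- **Best-guess genotype.** It derives a best-guess genotype from FORMAT/GP, which is what --gt-tar replaces with the hard call in FORMAT/GT. This assumes "best guess" means the genotype with the highest posterior probability.

The following are assumptions for illustration, not requirements stated in this documentation:

- the helper names `write_concordance_list` and `best_guess_genotype`
- the file names and chromosome list
- the single-space column separator

```python
from pathlib import Path


def write_concordance_list(path, chromosomes, pattern):
    """Write the four-column input list, one line per chromosome.

    `pattern` is a hypothetical dict mapping each column name to a format
    string containing a {chrom} placeholder. Columns are space-separated
    (an assumption made for this sketch).
    """
    columns = ("region", "frequencies", "validation", "imputed")
    lines = []
    for chrom in chromosomes:
        fields = [pattern[col].format(chrom=chrom) for col in columns]
        lines.append(" ".join(fields))
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def best_guess_genotype(gp):
    """Return 0 (hom-ref), 1 (het) or 2 (hom-alt) for the largest GP value.

    Assumes a diploid, biallelic site with GP ordered as 0/0, 0/1, 1/1
    (the VCF genotype ordering). Ties resolve to the lower index.
    """
    if len(gp) != 3:
        raise ValueError("expected three genotype probabilities")
    return max(range(3), key=lambda i: gp[i])


if __name__ == "__main__":
    # Hypothetical file names; adapt them to your own data layout.
    pattern = {
        "region": "chr{chrom}",
        "frequencies": "gnomad.chr{chrom}.sites.bcf",
        "validation": "truth.chr{chrom}.bcf",
        "imputed": "imputed.chr{chrom}.bcf",
    }
    for line in write_concordance_list("concordance.lst", [20, 21, 22], pattern):
        print(line)
    print(best_guess_genotype([0.05, 0.80, 0.15]))  # prints 1
```

Adding chromosomes only requires extending the list passed to `write_concordance_list`, matching the "one line per chromosome" convention for genome-wide runs.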
Other parameters
Option name
Argument
Default
Description
--af-tag
STRING
AF
Allele frequency INFO tag to use for binning. By default, allele frequencies are read from the INFO/AF tag.
--use-alt-af
NA
NA
If specified, metrics are computed on the ALT allele frequency (range [0,1]) rather than the minor allele frequency (range [0,0.5]).
--bins
VECTOR
NA
Allele frequency bins used for r-squared computations. By default the bins should span the MAF range [0,0.5]; if --use-alt-af is specified, they should span the full range [0,1].
--ac-bins
VECTOR
NA
User-defined allele count bins used for r-squared computations.
--allele-counts
VECTOR
NA
Default allele count bins used for r-squared computations. The AN field must be defined in the frequency file.
--min-val-gl
FLOAT
NA
Minimum genotype likelihood probability P(G|R) in the validation data [set to zero for no filter, or if using --gt-val]
--min-val-dp
INT
NA
Minimum coverage in the validation data. If FORMAT/DP is missing and --min-val-dp > 0, the program exits with an error. [set to zero for no filter, or if using --gt-val]
--min-tar-gp
VECTOR
NA
Minimum GP probabilities used as filters. By default the filter is applied to the FORMAT/GP field, but the program tries to use FORMAT/PL instead if --gt-tar is specified. Leave empty to apply no filter.
--out-r2-per-site
NA
NA
Output r2 at each site.
--out-rej-sites
NA
NA
Output sites that cannot be used for the concordance.
--out-conc-sites
NA
NA
Output sites where all target genotypes are concordant with the truth.
--out-disc-sites
NA
NA
Output sites where at least one target genotype is discordant with the truth.
--groups
FILE
NA
Alternative to frequency bins: user-defined groups, provided in a file.
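The frequency binning and r-squared options above can be illustrated with a small Python sketch. It rests on several assumptions, so the tool's actual estimator, bin-edge handling and site filtering may differ:

- **Estimator.** As is common for imputation accuracy, r-squared is taken to be the squared Pearson correlation between imputed dosages (FORMAT/DS) and validation genotypes, pooled over all sites in a bin.
- **Bin edges.** Each bin is half-open, (lo, hi].
- **Frequency folding.** The `use_alt_af` argument mirrors --use-alt-af. Without it, the ALT frequency is folded to the minor allele frequency before binning, which is why the bins then need to lie in [0,0.5].
- **Names.** The helper names `assign_bin`, `pearson_r2` and `binned_r2` are hypothetical.

```python
def assign_bin(freq, edges):
    """Return the index i such that edges[i] < freq <= edges[i + 1], else None."""
    for i in range(len(edges) - 1):
        if edges[i] < freq <= edges[i + 1]:
            return i
    return None


def pearson_r2(x, y):
    """Squared Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def binned_r2(sites, edges, use_alt_af=False):
    """Aggregate r2 per frequency bin.

    `sites` is an iterable of (alt_af, dosages, truth_genotypes) tuples,
    where dosages and truth genotypes are per-sample values in [0, 2].
    """
    pooled = {i: ([], []) for i in range(len(edges) - 1)}
    for alt_af, dosages, truth in sites:
        # Fold to minor allele frequency unless ALT frequency is requested.
        freq = alt_af if use_alt_af else min(alt_af, 1.0 - alt_af)
        b = assign_bin(freq, edges)
        if b is None:
            continue  # frequency falls outside every bin
        pooled[b][0].extend(dosages)
        pooled[b][1].extend(truth)
    return {
        b: pearson_r2(d, t) if len(d) > 1 else float("nan")
        for b, (d, t) in pooled.items()
    }


if __name__ == "__main__":
    edges = [0.0, 0.05, 0.5]  # hypothetical MAF bins
    sites = [
        (0.02, [0.1, 0.9, 1.8], [0, 1, 2]),
        (0.97, [0.0, 1.1, 0.2], [0, 1, 0]),  # MAF 0.03 after folding
        (0.30, [1.0, 0.0, 2.0], [1, 0, 2]),
    ]
    print(binned_r2(sites, edges))
    print(binned_r2(sites, edges, use_alt_af=True))
```

In the second call the site with ALT frequency 0.97 falls outside the MAF-range bins and is skipped. This is why the bins must cover [0,1] when --use-alt-af is used.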
Output files
Option name
Argument
Default
Description
-O [--output]
STRING
NA
Prefix of the output files (extensions are added automatically).