Skip to main content Link Search Menu Expand Document (external link)

ligate

Table of contents

  1. Description
  2. Usage: Ligate two chunks of phased data
  3. Note: Ligate chunks with family information
  4. Command line options
    1. Basic options
    2. Input files
    3. Output files

Description

Ligate multiple phased BCF/VCF files into a single whole chromosome file. Typically used to ligate multiple chunks of phased common variants.

Usage: Ligate two chunks of phased data

Go in the test folder and run:

First, let’s phase two overlapping chunks of data:

phase_common --input array/target.unrelated.bcf --region 1:1-5500000 --map info/chr1.gmap.gz --output target.chunk1.bcf --thread 8
phase_common --input array/target.unrelated.bcf --region 1:4500000-10000000 --map info/chr1.gmap.gz --output target.chunk2.bcf --thread 8

Then, we make a text file listing the chunks of data to ligate. This file need to be sorted!

echo target.chunk1.bcf > chunks.txt
echo target.chunk2.bcf >> chunks.txt

Finally, we can proceed with the ligation:

ligate --input chunks.txt --output target.phased.bcf --thread 2

The ligate program uses two threads (--thread 2) and saves the ligated haplotypes in the output file (--output target.phased.bcf). Two statistics are reported while during ligation:

  • Switch rate: the percentage of samples for which haplotypes have been switched,
  • Avg phaseQ: the agreement of the phasing in the overlapping region between two successive chunks of data. We usually expect values above 80 or 90.

Note: Ligate chunks with family information

If some samples have been phased using pedigree information (trios/duos, option --pedigree in phase_common), you must use --pedigree here too, with the same ped file, to make sure that these haplotypes are not switched when ligated. If this is not done, haplotypes may be switched for offsprings which would make the paternal/maternal labels incorrects.


Command line options

Basic options

Option name Argument Default Description
--help NA NA Produces help message
--seed INT 15052011 Seed of the random number generator
-T [ --thread ] INT 1 Number of thread used

Input files

Option name Argument Default Description
-I [--input ] STRING NA Text file containing all VCF/BCF to ligate, one file per line
--pedigree STRING NA Pedigree information (offspring father mother triplets). Really important to make sure that scaffolded samples are not switched!

Output files

Option name Argument Default Description
-O [--output ] STRING NA Phased haplotypes in VCF/BCF format
--no-index STRING NA If specified, the output the the ligated VCF/BCF is not indexed for random access to genomic regions
--log STRING NA Log file

Back to top

Copyright © 2022-2023 Olivier Delaneau | All Rights Reserved | SHAPEIT5 executables and source code are distributed under the MIT license.