GeSICA: Genome Segmentation from Intra-Chromosomal Associations
contact: gesica [dot] tongji [at] gmail [dot] com
Human Genome Segmentation with GeSICA
Welcome to the website of GeSICA, a genome segmentation tool.
Installation
GeSICA uses Python's distutils for source installation. To install a source distribution, unpack the tarball, open a command terminal, change to the directory where you unpacked GeSICA, and run the install script:
$ python setup.py install
Usage
Usage: gesica <-f hicfile> [-n name] [-g genome] [-r resolution] [-d distal] [options]
Example: gesica -f ~/GeSICA-alpha/sample/GM_19.txt -n GM06990 -c 20000 -g hg18 -r 100000 -d 400000 -i 3.0
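For scripted runs, the command line can be assembled programmatically. The sketch below is a hypothetical helper, `build_gesica_command`, that is not part of GeSICA. It covers only the -f, -n, -g, -c, -r, -d, and -i options documented under Parameters and uses their listed defaults.

```python
import shlex


def build_gesica_command(hic_file, name="NA", genome="hg18", close_filter=20000,
                         resolution=100000, distal=400000, inflation=3.0):
    """Return a gesica argument list built from the documented options.

    Hypothetical helper: defaults mirror the DEFAULT values under Parameters.
    """
    if genome not in ("hg19", "hg18", "mm9"):
        raise ValueError("genome must be one of hg19, hg18, mm9")
    return [
        "gesica",
        "-f", hic_file,
        "-n", name,
        "-c", str(close_filter),
        "-g", genome,
        "-r", str(resolution),
        "-d", str(distal),
        "-i", str(inflation),
    ]


# Mirrors the example invocation, using a relative sample path.
cmd = build_gesica_command("sample/GM_19.txt", name="GM06990")
print(shlex.join(cmd))
# gesica -f sample/GM_19.txt -n GM06990 -c 20000 -g hg18 -r 100000 -d 400000 -i 3.0
```

Once the `gesica` script is installed and on PATH, the returned list can be passed to `subprocess.run(cmd, check=True)`.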
Parameters:
--version: show program's version number and exit.
-h, --help: show this help message and exit.
-f, --filename: the path (directory and file name) of the Hi-C input file. REQUIRED.
-n, --name: experiment name, used to generate the output file names. DEFAULT: NA.
-g, --genome: genome version: hg19, hg18, or mm9. DEFAULT: hg18.
-c: Hi-C data filter that removes close-range random interactions. DEFAULT: 20000.
-r: the resolution (bin size, in bp) of the segmentation. DEFAULT: 100000.
-d: the distance cutoff that defines distal interactions. DEFAULT: 400000.
-i: the inflation parameter in Markov Clustering. DEFAULT: 3.0.
-s: the scheme parameter in Markov Clustering. DEFAULT: 6.0.
--wig: whether or not the output files should include the Interaction-Ratio wiggle file. DEFAULT: 1. RANGE: 0/1.
--bed: whether or not the output files should include the segmentation bed file. DEFAULT: 1. RANGE: 0/1.
--cluster: whether or not the output files should include the MCL genomic clusters file. DEFAULT: 1. RANGE: 0/1.
--verbose: set the verbosity level. 0: show only critical messages; 1: also show warning messages; 2: also show process information; 3: also show debug messages. DEFAULT: 2.
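The parameter list does not spell out exactly how -c, -d, and -r are applied. Under one plausible reading, assumed here and not confirmed by the source, -c drops intra-chromosomal read pairs whose loci are closer than c bp, -d labels pairs at least d bp apart as distal, and -r assigns loci to fixed-width bins of r bp. The hypothetical helpers `classify_pair` and `bin_start` sketch that reading; the "proximal" label for the remaining pairs is likewise invented for illustration.

```python
def classify_pair(locus1, locus2, close_filter=20000, distal=400000):
    """Hypothetical reading of -c and -d for one intra-chromosomal read pair.

    Assumption: pairs closer than close_filter bp are dropped, and pairs at
    least `distal` bp apart are distal. Neither rule is confirmed by GeSICA.
    """
    distance = abs(locus2 - locus1)
    if distance < close_filter:
        return "filtered"
    if distance >= distal:
        return "distal"
    return "proximal"


def bin_start(locus, resolution=100000):
    """Start coordinate of the resolution-sized bin holding a locus (assumed -r semantics)."""
    return (locus // resolution) * resolution


# Loci taken from the two sample input lines in "Input and Output format".
print(classify_pair(68122, 68157))      # filtered (35 bp apart)
print(classify_pair(93555424, 76068))   # distal
print(bin_start(93555424))              # 93500000
```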
Input and Output format
INPUT format:
tag_key chromosome1 locus1 strand1 cutting_site1 chromosome2 locus2 strand2 cutting_site2
e.g.
1:45:219:557 11 68122 0 5 11 68157 1 5
1:62:241:1178 11 93555424 1 26946 11 76068 1 9
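A minimal reader for this input format, assuming one read pair per line with the nine whitespace-separated columns named above and integer-valued locus, strand, and cutting-site fields. The names `ReadPair`, `parse_pair`, and `read_pairs` are hypothetical and not part of GeSICA.

```python
from typing import Iterator, NamedTuple


class ReadPair(NamedTuple):
    """One input line; field names follow the documented column header."""
    tag_key: str
    chromosome1: str
    locus1: int
    strand1: int
    cutting_site1: int
    chromosome2: str
    locus2: int
    strand2: int
    cutting_site2: int

    @property
    def is_intra_chromosomal(self) -> bool:
        return self.chromosome1 == self.chromosome2


def parse_pair(line: str) -> ReadPair:
    """Split one whitespace-separated input line into a ReadPair."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    tag, c1, l1, s1, cs1, c2, l2, s2, cs2 = fields
    return ReadPair(tag, c1, int(l1), int(s1), int(cs1),
                    c2, int(l2), int(s2), int(cs2))


def read_pairs(path: str) -> Iterator[ReadPair]:
    """Yield a ReadPair for every non-blank line of a Hi-C input file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_pair(line)


pair = parse_pair("1:62:241:1178 11 93555424 1 26946 11 76068 1 9")
print(pair.locus1, pair.locus2, pair.is_intra_chromosomal)  # 93555424 76068 True
```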
OUTPUT format:
Interaction-Ratio wiggle file
position Interaction-Ratio
e.g.
550000 -1.52059717215
650000 -2.26471439047
750000 -0.471311582731
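The Interaction-Ratio file can be loaded with a few lines of Python. This sketch assumes two whitespace-separated columns per data line and, as a precaution, skips any line whose first field is not an integer position (for example, a header line, if one is present). The function name `read_interaction_ratio` is hypothetical.

```python
def read_interaction_ratio(lines):
    """Parse 'position Interaction-Ratio' lines into (position, ratio) tuples.

    Lines whose first field is not an integer position are skipped;
    this tolerance is an assumption, not documented GeSICA behavior.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        records.append((int(fields[0]), float(fields[1])))
    return records


sample = ["550000 -1.52059717215", "650000 -2.26471439047", "750000 -0.471311582731"]
records = read_interaction_ratio(sample)
lowest = min(records, key=lambda rec: rec[1])
print(lowest)  # (650000, -2.26471439047)
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines.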
MCL genomic clusters file
e.g.
chr1
157000000 159000000 162000000 168000000 ...
In the clustering result for each chromosome, each line lists the loci that belong to one cluster.
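Following that description, the clusters file can be read into a per-chromosome list of clusters. The sketch assumes that a chromosome header is a line holding a single field beginning with "chr" and that every other non-blank line lists the integer loci of one cluster; `read_mcl_clusters` is a hypothetical name.

```python
def read_mcl_clusters(lines):
    """Group cluster lines under the most recent chromosome header line.

    Assumes a header is a single field starting with "chr" and that every
    other non-blank line lists the integer loci of one cluster.
    Returns {chromosome: [[locus, ...], ...]} with one inner list per cluster.
    """
    clusters = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1 and fields[0].startswith("chr"):
            current = fields[0]
            clusters.setdefault(current, [])
        elif current is None:
            raise ValueError("cluster line found before any chromosome header")
        else:
            clusters[current].append([int(x) for x in fields])
    return clusters


sample = ["chr1", "157000000 159000000 162000000 168000000"]
print(read_mcl_clusters(sample))
# {'chr1': [[157000000, 159000000, 162000000, 168000000]]}
```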
Segmentation bed file
chromosome start_point end_point chromosome_state (the state is a cluster number, -1 for the minus state, or 0)
e.g.
chr1 0 100000 1_2
chr1 100000 200000 1_2
chr1 200000 300000 1_-1
...
chr19 323000000 324000000 19_0
1_2: the bin is assigned to cluster 2 on chromosome 1
1_-1: the bin is in the minus state on chromosome 1
19_0: the bin on chromosome 19 is in neither the plus nor the minus state
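To summarize a segmentation, the fourth bed column can be decoded into its chromosome and state. This sketch assumes the label has the form `<chromosome>_<state>`, where a positive state is a cluster number, -1 is the minus state, and 0 is neither plus nor minus state, matching the examples above. The function `parse_bed_line` and the "cluster"/"minus"/"neither" labels are hypothetical.

```python
from collections import Counter


def parse_bed_line(line):
    """Split one segmentation bed line and decode its chromosome_state column.

    Assumes the state is a positive cluster number, -1 (minus state),
    or 0 (neither plus nor minus state), as in the documented examples.
    """
    chrom, start, end, label = line.split()[:4]
    label_chrom, state_text = label.split("_", 1)
    state = int(state_text)
    if state > 0:
        kind = "cluster"
    elif state == -1:
        kind = "minus"
    elif state == 0:
        kind = "neither"
    else:
        raise ValueError(f"unexpected state {state} in {label!r}")
    return chrom, int(start), int(end), label_chrom, kind, state


bed_lines = ["chr1 0 100000 1_2", "chr1 100000 200000 1_2", "chr1 200000 300000 1_-1"]
rows = [parse_bed_line(text) for text in bed_lines]
print(Counter(row[4] for row in rows))  # Counter({'cluster': 2, 'minus': 1})
```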