GeSICA: Genome Segmentation from Intra-Chromosomal Associations

contact: gesica [dot] tongji [at] gmail [dot] com

Human Genome Segmentation with GeSICA

Welcome to the website of GeSICA, a genome segmentation tool.

Installation

GeSICA uses Python's distutils tools for source installations. To install a source distribution of GeSICA, unpack the distribution tarball and open a command terminal. Go to the directory where you unpacked GeSICA and run the install script:

$ python setup.py install

Usage

Usage: gesica <-f hicfile> [-n name] [-g genome] [-r resolution] [-d distal] [options]

Example: gesica -f ~/GeSICA-alpha/sample/GM_19.txt -n GM06990 -c 20000 -g hg18 -r 100000 -d 400000 -i 3.0

Parameters:

--version: show program's version number and exit.

-h, --help: show this help message and exit.

-f, --filename: the path and name of a Hi-C file. REQUIRED.

-n, --name: experiment name, which will be used to generate output file names. DEFAULT: NA.

-g, --genome: genome version; one of hg19, hg18, or mm9. DEFAULT: hg18.

-c: Hi-C data filter used to remove close-range random interactions. DEFAULT: 20000.

-r: the resolution of this method. DEFAULT: 100000.

-d: the cutoff defining the distal interactions. DEFAULT: 400000.

-i: the inflation parameter in Markov Clustering. DEFAULT: 3.0.

-s: the scheme parameter in Markov Clustering. DEFAULT: 6.0.

--wig: whether or not the output files should include the Interaction-Ratio wiggle file. DEFAULT: 1. RANGE: 0/1.

--bed: whether or not the output files should include the segmentation bed file. DEFAULT: 1. RANGE: 0/1.

--cluster: whether or not the output files should include the MCL genomic clusters file. DEFAULT: 1. RANGE: 0/1.

--verbose: set verbose level. 0: only show critical messages; 1: show additional warning messages; 2: show process information; 3: show debug messages. DEFAULT: 2.
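The options above can be assembled into a command line programmatically. The following is a minimal sketch, not part of GeSICA: the helper name build_gesica_command and the file path are hypothetical, and only flags listed above are used, with their documented defaults.

```python
import shlex


def build_gesica_command(hic_file, name="NA", genome="hg18", close_filter=20000,
                         resolution=100000, distal=400000, inflation=3.0):
    """Return a gesica argument list built from the documented options.

    Hypothetical helper: the defaults mirror the DEFAULT values listed above.
    """
    return [
        "gesica",
        "-f", hic_file,           # Hi-C input file (required)
        "-n", name,               # experiment name used for output file names
        "-g", genome,             # hg19, hg18 or mm9
        "-c", str(close_filter),  # filter for close random interactions
        "-r", str(resolution),    # resolution of the method
        "-d", str(distal),        # cutoff defining distal interactions
        "-i", str(inflation),     # inflation parameter in Markov Clustering
    ]


# The path below is a placeholder, not a file shipped under this exact name.
cmd = build_gesica_command("sample/GM_19.txt", name="GM06990")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once gesica is installed and on the PATH.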

Input and Output format

INPUT format:

tag_key chromosome1 locus1 strand1 cutting_site1 chromosome2 locus2 strand2 cutting_site2

e.g.

1:45:219:557 11 68122 0 5 11 68157 1 5

1:62:241:1178 11 93555424 1 26946 11 76068 1 9
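To check an input file before running GeSICA, the nine-column layout above can be parsed as follows. This is a minimal sketch that assumes whitespace-separated fields and integer loci, strands, and cutting sites, as in the two example lines; HiCRead and parse_hic_line are hypothetical names, not part of GeSICA.

```python
from typing import NamedTuple


class HiCRead(NamedTuple):
    """One read pair in the input layout shown above (hypothetical type)."""
    tag_key: str
    chromosome1: str
    locus1: int
    strand1: int
    cutting_site1: int
    chromosome2: str
    locus2: int
    strand2: int
    cutting_site2: int


def parse_hic_line(line):
    """Parse one whitespace-separated input line into a HiCRead."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    tag, c1, l1, s1, cs1, c2, l2, s2, cs2 = fields
    return HiCRead(tag, c1, int(l1), int(s1), int(cs1),
                   c2, int(l2), int(s2), int(cs2))


read = parse_hic_line("1:45:219:557 11 68122 0 5 11 68157 1 5")
print(read.chromosome1, read.locus1, read.locus2)
```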

OUTPUT format:

Interaction-Ratio wiggle file

position Interaction-Ratio

e.g.

550000 -1.52059717215

650000 -2.26471439047

750000 -0.471311582731
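The position/ratio pairs shown above can be read back for downstream analysis. The sketch below assumes two whitespace-separated columns per data line and skips any line that does not fit that shape (for example a header); parse_ratio_lines is a hypothetical helper name.

```python
def parse_ratio_lines(lines):
    """Return (position, ratio) tuples from two-column lines.

    Lines that are blank or whose columns are not numeric are skipped.
    """
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            records.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue  # non-numeric line, e.g. a column header
    return records


sample = [
    "position Interaction-Ratio",
    "550000 -1.52059717215",
    "650000 -2.26471439047",
    "750000 -0.471311582731",
]
records = parse_ratio_lines(sample)
print(len(records), records[0])
```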

MCL genomic clusters file

e.g.

chr1

157000000 159000000 162000000 168000000 ...

For each chromosome, each line of the clustering result lists the loci that belong to one cluster.
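A clusters file in this layout can be loaded into a dictionary keyed by chromosome. The sketch below rests on assumptions drawn only from the example: a line beginning with "chr" starts a new chromosome, and every following line holds the whitespace-separated loci of one cluster. parse_clusters is a hypothetical helper name.

```python
def parse_clusters(lines):
    """Map each chromosome name to a list of clusters (lists of loci)."""
    clusters = {}
    current = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line
        if tokens[0].startswith("chr"):
            current = tokens[0]
            clusters[current] = []
        elif current is not None:
            # Keep numeric tokens only, so stray non-numeric text is ignored.
            loci = [int(token) for token in tokens if token.isdigit()]
            if loci:
                clusters[current].append(loci)
    return clusters


sample = ["chr1", "157000000 159000000 162000000 168000000"]
print(parse_clusters(sample))
```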

Segmentation bed file

chromosome start_point end_point chromosome_<minus state or cluster number>

e.g.

chr1 0 100000 1_2

chr1 100000 200000 1_2

chr1 200000 300000 1_-1

...

chr19 323000000 324000000 19_0

1_2: cluster number 2 on chromosome 1

1_-1: minus state on chromosome 1

19_0: neither plus nor minus state on chromosome 19
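The fourth column of the bed file can be split into its chromosome and state parts. The sketch below is an illustration under assumptions: the value after the last underscore is an integer, -1 marks the minus state, 0 marks neither state, and any other value is treated as a cluster number. That last rule is inferred from the three examples only. parse_segment_label is a hypothetical helper name.

```python
def parse_segment_label(label):
    """Split a label such as '1_2' into (chromosome, value, kind)."""
    # Split on the last underscore so the sign of '-1' stays with the value.
    chromosome, _, state = label.rpartition("_")
    value = int(state)
    if value == -1:
        kind = "minus"
    elif value == 0:
        kind = "neither"
    else:
        kind = "cluster"  # assumption: all other values are cluster numbers
    return chromosome, value, kind


for label in ("1_2", "1_-1", "19_0"):
    print(label, parse_segment_label(label))
```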