Our laboratory focuses on decoding chromatin regulatory grammar, i.e., how chromatin regulatory information is established, evolves, and functions during cell-fate determination. We aim to move from data-driven analysis to AI-based design, combining bioinformatics algorithms with AI models to reveal dynamic molecular regulation across multiple scales. Our work has progressed through four stages:
Stage 1. With the advent of high-throughput sequencing, we developed a suite of computational algorithms for analyzing large-scale biological data. These include MACS, a widely used ChIP-seq peak caller (Genome Biol 2008; Nat Protoc 2012; cited >20,000 times), GFOLD for ranking differentially expressed genes from RNA-seq (Bioinformatics 2012a), NPS and DiNuP for nucleosome positioning (BMC Genomics 2008; Bioinformatics 2012b), GeSICA for genome segmentation from Hi-C data (BMC Genomics 2012), and MethylPurify for tumor-purity deconvolution from DNA methylomes (Genome Biol 2014).
Stage 2. As high-throughput technologies became widely adopted, we investigated the dynamic regulation of epigenetic information during early embryogenesis, where epigenetic states change dramatically and often asymmetrically between cells. In zebrafish, we revealed how open chromatin regions and histone modifications are established during genome activation (Nature 2010; Genome Res 2014, 2018, 2022). In mouse, we demonstrated the dynamic reprogramming of H3K4me3, H3K27me3, and H3K9me3 modifications in pre-implantation embryos (Nature 2016; Nat Cell Biol 2018) and uncovered epigenetic barriers in nuclear-transfer embryos (Cell Discov 2016; Cell Stem Cell 2018; Stem Cell Rep 2023).

Stage 3. With the accumulation of large public datasets, we explored the cooperative interactions among transcriptional regulators and uncovered the mechanisms underlying epigenetic heterogeneity. We discovered the cooperative roles of H3K9me3 and DNA methylation in establishing imprinting control regions and a novel class of similar control regions (Nat Cell Biol 2022); revealed programmed epigenetic heterogeneity in pre-implantation embryos (Genome Biol 2020a); clarified the non-canonical functional mechanisms of chromatin regulators (Genome Biol 2020b); and inferred genome-wide maps of chromatin-associated condensates (Nat Commun 2024).
Stage 4. With the rapid development of new-generation AI technologies, we have begun applying state-of-the-art AI approaches to decode chromatin regulatory grammar. Recently, we developed ChromBERT, a pre-trained model that efficiently captures context-dependent regulatory grammar (bioRxiv 2025). We are currently constructing a multi-scale virtual cell model that integrates genome, transcriptome, transcriptional-regulation, epigenome, and spatial-omics data, with the goal of unified modeling from molecular regulation to cellular phenotypes and organoid functions.