Accucopy (Fan et al. 2020)

A computational method that infers total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from paired tumor-normal whole-genome sequencing data. It can work on samples with sequencing coverage as low as 1X and tumor purity as low as 0.1 (only 10% of the cells are tumor cells; the rest are normal cells).
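
To give a feel for why low-purity samples are hard, the sketch below computes the expected tumor/normal read-depth ratio of one genomic segment under a simple two-population mixture. This is a generic textbook mixture formula used for illustration only; it is not taken from the Accucopy paper. The function name `expected_depth_ratio` is hypothetical, normal cells are assumed diploid, and ploidy normalization is ignored.

```python
def expected_depth_ratio(tumor_copy_number, purity, normal_copy_number=2):
    """Expected tumor/normal read-depth ratio for one segment.

    Illustrative assumption: the tumor sample is a mixture of tumor cells
    carrying `tumor_copy_number` copies and normal cells carrying
    `normal_copy_number` copies, and read depth is proportional to the
    average copy number across all cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mixed / normal_copy_number


if __name__ == "__main__":
    # A single-copy gain (3 copies) observed at decreasing tumor purity.
    for purity in (1.0, 0.5, 0.1):
        ratio = expected_depth_ratio(tumor_copy_number=3, purity=purity)
        print(f"purity={purity:.1f}  ratio={ratio:.3f}")
```

Under these assumptions, a single-copy gain at purity 0.1 shifts the depth ratio by only about 5%, which illustrates how faint the copy-number signal becomes in low-purity, low-coverage samples.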

Click this for more details.

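Accucopy extends and improves the Accurity model: it estimates not only the purity of a tumor sample but also the allele-specific copy number alterations in the tumor. Since their release, Accucopy and Accurity have been downloaded more than 1000 times by users at research institutions, including Harvard Medical School, MIT, the U.S. National Cancer Institute (NCI), and the Chinese National Human Genome Center at Shanghai (国家南方基因组中心), as well as by many companies. The Docker Hub image has nearly 1000 downloads.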

Accurity (Luo et al. 2018)

A computational method that infers tumor purity and ploidy from paired tumor-normal whole-genome sequencing (WGS) data; whole-exome sequencing (WES) data may work too. Its strength is in low-purity and low-coverage samples.


This method is now superseded by Accucopy.

Click this for more details.

Personalized Medicine Bioinformatics Portal

To access the bioinformatics data generated in the Personalized Medicine Pilot Program (个性化药物先导专项), please log in. The portal contains more than 100 TB of NGS genomics, transcriptomics, histo-imaging, and proteomics data generated during the drug development pipeline. Log in using the same credentials as for the Chemical Library (化学库).

It is powered by Bootstrap, Vue.js, and PostgreSQL.


PatientStratifier is a software package that stratifies patients based on patient biomarker data and drug response data. Its core is a machine learning module that learns from existing patient biomarker and drug response data. It also has a component called PatientRecommender that recommends whether a patient should be given a drug, based on the patient's biomarker data.

Contact us for more details.

Parallel workflow to analyze the NGS data

A workflow that analyzes ~900 genomes (cumulative coverage ~4000X). Starting from billions of 100 bp paired-end reads from the Illumina Genome Analyzer II, the workflow comprises several sub-workflows: the read-filtering sub-workflow (whose main program is a custom Java program based on GATK libraries); the read-alignment sub-workflow (main program bwa [Li et al. 2009]; stampy [Lunter et al. 2011] was used in testing); the base-quality-score-recalibration sub-workflow, using GATK [DePristo et al. 2011]; the genotype-calling sub-workflow, using SAMtools [Li et al. 2009] and GATK [DePristo et al. 2011]; the pedigree-calling sub-workflow (main program TrioCaller [Chen et al. 2012]); and other sub-workflows that carry out variant filtering and statistics calculation (transition/transversion, allele frequency, Mendelian inconsistency, and population genetic measures such as nucleotide diversity, Hardy-Weinberg equilibrium p-value, and linkage disequilibrium). All workflows interact with the vervet PostgreSQL database through SQLAlchemy. The workflows were constructed in a MapReduce manner using APIs from the Pegasus workflow management system to take full advantage of the parallel computing power of a cluster. The end result is a powerful and flexible system.
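
As an illustration of the MapReduce pattern applied to one of the statistics named above, the sketch below counts transitions and transversions per chunk of SNPs (the map step) and then sums the per-chunk counts into a genome-wide ratio (the reduce step). It is a minimal in-process sketch, not the workflow's actual code: the function names and the toy input are hypothetical, and in the real workflow each map step would be a separate cluster job scheduled by Pegasus.

```python
from functools import reduce

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def count_ts_tv(snps):
    """Map step: count (transitions, transversions) in one chunk of SNPs.

    `snps` is an iterable of (ref, alt) allele pairs.
    """
    ts = tv = 0
    for ref, alt in snps:
        pair = {ref.upper(), alt.upper()}
        if len(pair) != 2 or not pair <= BASES:
            continue  # skip identical alleles, indels, and ambiguous bases
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


def merge_counts(left, right):
    """Reduce step: add two (transitions, transversions) tuples."""
    return left[0] + right[0], left[1] + right[1]


def ts_tv_ratio(chunks):
    """Combine per-chunk counts into one transition/transversion ratio."""
    ts, tv = reduce(merge_counts, map(count_ts_tv, chunks), (0, 0))
    return ts / tv if tv else float("nan")


if __name__ == "__main__":
    # Toy input: one chunk per chromosome (hypothetical data).
    chunks = [
        [("A", "G"), ("C", "T"), ("A", "C")],
        [("G", "A"), ("G", "T")],
    ]
    print(ts_tv_ratio(chunks))
```

Because the reduce step only adds small tuples, the map steps are independent of one another, which is the property that lets such a statistic be split across many parallel jobs.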

The main code can be found at Substantial Java and C++ code is in private git repositories (contact us if interested).

This is a job-dependency DAG (directed acyclic graph).
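
A job-dependency DAG determines which jobs may run at the same time: a job becomes ready once all of its parent jobs have finished. The sketch below shows the idea with Python's standard `graphlib` module (Python 3.9+). The job names and the dependency layout are hypothetical, loosely modeled on the sub-workflows described above; this is not how Pegasus is invoked.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group jobs into batches that could run in parallel.

    `dependencies` maps each job to the set of jobs it depends on.
    Every job in a batch has all of its parents in earlier batches.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


if __name__ == "__main__":
    # Hypothetical two-sample workflow: per-sample jobs fan out,
    # then joint genotype calling fans them back in.
    dependencies = {
        "align_sample1": {"filter_sample1"},
        "align_sample2": {"filter_sample2"},
        "call_genotypes": {"align_sample1", "align_sample2"},
        "filter_variants": {"call_genotypes"},
    }
    for number, batch in enumerate(execution_batches(dependencies), start=1):
        print(number, batch)
```

In this toy graph the two filtering jobs form the first batch and the two alignment jobs the second, while genotype calling has to wait for both alignments.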

This is the job-duration vs. time diagram from a toy example, illustrating which jobs take the most time. The real-data workflows involve 100 times as many jobs or more.

Arabidopsis GWAS web app and database (Seren et al. 2012; Huang et al. 2011; Atwell et al. 2010)

The most up-to-date URL is at (old links from will be redirected to this GMI site). The MySQL database dump can be downloaded from All the code for the version demonstrated in Huang et al. 2011 can be found in this tarball (Pylons web server, web client using Google Web Toolkit, etc.).

A second-generation version is at (Seren et al. 2012, source code link). The Arabidopsis polymorphism effort is at

GitHub homepage:
