Accucopy (Fan et al. 2020)
A computational method that infers total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from paired tumor-normal whole-genome sequencing data. It can work on samples with sequencing coverage as low as 1X and tumor purity as low as 0.1 (only 10% of the cells are tumor cells; the rest are normal cells).
Click this for more details.
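Accucopy extends and improves the Accurity model: it estimates not only the purity of a tumor sample but also the allele-specific copy number alterations in the tumor. Since their release, Accucopy and Accurity have been downloaded more than 1,000 times by users at research institutions, including Harvard Medical School, MIT, the US National Cancer Institute (NCI), and the Chinese National Human Genome Center at Shanghai (国家南方基因组中心), and at many companies. The Docker Hub image has close to 1,000 downloads.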
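To give a feel for why a purity of 0.1 is a hard case, the sketch below computes the expected tumor/normal read-depth ratio of a genomic segment under a simple textbook mixture model. This is an illustration only, not the Accucopy or Accurity model itself; the function name, its parameters, and the assumption that normal cells are diploid are all introduced here for the example.

```python
def expected_depth_ratio(purity: float,
                         tumor_copy_number: float,
                         tumor_ploidy: float = 2.0) -> float:
    """Expected tumor/normal read-depth ratio for one segment.

    Assumptions (illustrative, not taken from the Accucopy paper):
    - normal cells carry 2 copies everywhere (autosomes);
    - both samples are normalised by their genome-wide average depth.
    """
    # Average copies of this segment per cell in the mixed tumor sample.
    segment_copies = purity * tumor_copy_number + (1.0 - purity) * 2.0
    # Average copies per cell across the whole genome in the same sample.
    genome_copies = purity * tumor_ploidy + (1.0 - purity) * 2.0
    return segment_copies / genome_copies


if __name__ == "__main__":
    # A single-copy gain (3 copies) at two different purities.
    for purity in (0.8, 0.1):
        ratio = expected_depth_ratio(purity, tumor_copy_number=3)
        print(f"purity={purity:.1f}  expected depth ratio={ratio:.3f}")
```

Under these assumptions, a one-copy gain shifts the depth ratio by about 40% at purity 0.8 but only about 5% at purity 0.1, which is the kind of weak signal a low-purity, low-coverage method has to detect.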
Accurity (Luo et al. 2018)
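A computational method that infers tumor purity and ploidy from paired tumor-normal WGS (whole-genome sequencing) data (WES may work too). Its strength is in low-purity and low-coverage samples.
Personalized cancer therapy requires mutation analysis of tumor tissue samples obtained during surgery in order to decide the next course of treatment (targeted drugs, immunotherapy, etc.). Tumor samples usually contain non-tumor cells (such as infiltrating immune cells); the fraction of tumor cells in a sample is its tumor purity. Low purity increases the uncertainty of downstream analyses and lowers the chance that the subsequent treatment succeeds, so accurately estimating purity from the tumor sample is a key step in personalized cancer therapy. Compared with traditional imaging-based methods, ultra-low-depth (~0.5X) sequencing offers a fast, inexpensive, and automated route to purity estimation, but existing algorithms are not very accurate on ultra-low-depth data. Our software, Accurity, is built on a refined statistical model and performs well on ultra-low-depth data.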
This method is now superseded by Accucopy.
Click this for more details.
Personalized Medicine Bioinformatics Portal (个性化药物生物信息平台)
To access the bioinformatics data generated in the Personalized Medicine pilot program (个性化药物先导专项), please log in at https://bioinfo.simm.ac.cn/. It contains more than 100 TB of NGS genomics, transcriptomics, histo-imaging, and proteomics data generated during the drug development pipeline. Log in with the same credentials as for https://sims.simm.ac.cn/ (the chemical library).
It is built with Bootstrap, Vue.js, and PostgreSQL.
PatientStratifier
PatientStratifier is a software package that stratifies patients based on their biomarker data and drug response data. Its core is a machine learning module that learns from existing patient biomarker and drug response data. It also has a component called PatientRecommender, which recommends whether a patient should be given a drug based on that patient's biomarker data.
Contact us for more details.
Parallel workflow to analyze the NGS data
A workflow that analyzes ~900 genomes (cumulative coverage ~4000X). Starting from billions of 100bp paired-end reads from the Illumina Genome Analyzer II, the workflow comprises several sub-workflows: the read filtering sub-workflow (whose main program is a custom-written Java program based on GATK libraries), the read alignment sub-workflow (main program is bwa [Li et al. 2009]; stampy [Lunter et al. 2011] was used in testing), the base-quality-score-recalibration sub-workflow by GATK [DePristo et al. 2011], the genotype-calling sub-workflow by SAMtools [Li et al. 2009] and GATK [DePristo et al. 2011], the pedigree calling sub-workflow (main program is TrioCaller [Chen et al. 2012]), and other sub-workflows that carry out variant filtering and statistics calculation (transition/transversion ratio, allele frequency, Mendelian inconsistency, and population genetic measures such as nucleotide diversity, Hardy-Weinberg equilibrium p-value, and linkage disequilibrium). All workflows interact with the vervet PostgreSQL database through SQLAlchemy. The workflows were constructed in a MapReduce manner using APIs from the Pegasus workflow management system, so that independent jobs run in parallel. The end result is a flexible system that can use the full capacity of a computing cluster.
The main code can be found at https://github.com/polyactis/pegaflow and https://github.com/polyactis/pymodule. A substantial amount of Java and C++ code is in private git repositories (contact polyactis@gmail.com if interested).
This is a job-dependency DAG (directed acyclic graph).
This is the job-duration vs. time diagram from a toy example, illustrating which jobs take the most time. The real-data workflows involve at least 100 times as many jobs.
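The MapReduce-style structure of such a job-dependency DAG can be sketched with Python's standard library alone. This is an illustration, not the pegaflow or Pegasus API: the job names, the per-sample fan-out, and the linear ordering of the later sub-workflows are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_job_dag(samples):
    """Map each job name to the set of jobs it depends on (hypothetical names)."""
    dag = {}
    for s in samples:
        # "Map" phase: each sample is processed independently.
        dag[f"filter:{s}"] = set()
        dag[f"align:{s}"] = {f"filter:{s}"}
        dag[f"recalibrate:{s}"] = {f"align:{s}"}
    # "Reduce" phase: joint steps wait for every per-sample job to finish.
    dag["genotype_calling"] = {f"recalibrate:{s}" for s in samples}
    dag["pedigree_calling"] = {"genotype_calling"}
    dag["variant_filtering"] = {"pedigree_calling"}
    dag["statistics"] = {"variant_filtering"}
    return dag


def parallel_batches(dag):
    """Group jobs into batches; jobs within one batch can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(build_job_dag(["s1", "s2"])), 1):
        print(i, batch)
```

With more samples the early batches grow wider while the joint steps stay single jobs, which is the shape that lets a cluster scheduler run the per-sample work in parallel.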
Arabidopsis GWAS web app and database (Seren et al. 2012, Huang et al. 2011, Atwell et al. 2010)
The most up-to-date URL is http://arabidopsis.gmi.oeaw.ac.at:5000/ (old links to http://arabidopsis.usc.edu/ are redirected to this GMI site). The MySQL database dump can be downloaded from http://arabidopsis.gmi.oeaw.ac.at/public_db_dump.tar.gz. All the code for the version demonstrated in Huang et al. 2011 can be found in this tarball (Pylons web server, web client built with Google Web Toolkit, etc.).
A second-generation version is at http://gwas.gmi.oeaw.ac.at/index.html (Seren et al. 2012, source code link). The Arabidopsis polymorphism effort is at https://cynin.gmi.oeaw.ac.at/home/resources/atpolydb.
GitHub homepage: https://github.com/polyactis