Accurity (Luo et al. 2018)
A software tool that infers tumor purity and ploidy from paired tumor-normal whole-genome sequencing data. It distinguishes itself from other tools by performing well on low-purity and low-coverage samples.
Currently developing algorithms to compute cancer clonal evolution.
Check the Accurity page for more details.
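A minimal sketch of the kind of model such tools fit, assuming the textbook tumor-normal mixture model rather than Accurity's actual algorithm (which this page does not describe): with purity p, average tumor ploidy t and diploid normal cells, a segment with tumor copy number c has an expected tumor/normal depth ratio of (p*c + 2(1-p)) / (p*t + 2(1-p)). The functions `expected_ratio` and `fit_purity_ploidy`, the grid bounds and `max_copy` are hypothetical illustrations.

```python
from typing import List, Optional, Sequence, Tuple


def expected_ratio(copy_number: int, purity: float, ploidy: float) -> float:
    """Expected tumor/normal read-depth ratio of one segment.

    Assumes normal cells are diploid and that ratios are normalized so that a
    segment at the tumor's average ploidy has ratio 1.0.
    """
    normal_part = 2.0 * (1.0 - purity)  # DNA copies contributed by normal cells in the tumor sample
    return (purity * copy_number + normal_part) / (purity * ploidy + normal_part)


def fit_purity_ploidy(
    ratios: Sequence[float], weights: Sequence[float], max_copy: int = 8
) -> Tuple[float, float, List[int], float]:
    """Grid-search (purity, ploidy); return the pair, per-segment copy numbers and residual."""
    best: Optional[Tuple[float, float, List[int], float]] = None
    for i in range(10, 101):  # purity 0.10 .. 1.00 in steps of 0.01
        purity = i / 100
        for j in range(30, 101):  # ploidy 1.50 .. 5.00 in steps of 0.05
            ploidy = j / 20
            normal_part = 2.0 * (1.0 - purity)
            denom = purity * ploidy + normal_part
            copies: List[int] = []
            residual = 0.0
            for r, w in zip(ratios, weights):
                # The model is affine in c, so invert it and snap to the nearest allowed integer.
                raw = (r * denom - normal_part) / purity
                c = min(max(round(raw), 0), max_copy)
                copies.append(c)
                residual += w * (r - expected_ratio(c, purity, ploidy)) ** 2
            if best is None or residual < best[3]:
                best = (purity, ploidy, copies, residual)
    assert best is not None
    return best


if __name__ == "__main__":
    # Toy input: four equal-length segments simulated at purity 0.6, ploidy 3.0.
    simulated = [expected_ratio(c, 0.6, 3.0) for c in (2, 3, 3, 4)]
    print(fit_purity_ploidy(simulated, [1.0] * 4))
```

Depth ratios alone are ambiguous: several (purity, ploidy) pairs fit this toy input equally well, so the returned pair need not be the one used to simulate it. Allele frequencies at heterozygous SNPs are the usual extra signal for breaking such ties; this sketch does not use them.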
To access the bioinformatics data generated in the Personalized Drugs Strategic Priority Research Program, please log in at https://bioinfo.simm.ac.cn/. It holds more than 100 TB of NGS genomics, transcriptomics, histo-imaging, and proteomics data generated during the drug development pipeline.
PatientStratifier is a software package that stratifies patients based on their biomarker data and drug response data. Its core is a machine learning module that learns from existing patient biomarker and drug response data. It also includes a component, PatientRecommender, that recommends whether a patient should be given a drug based on that patient's biomarker data.
Contact us for more details.
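The page does not say which learning method PatientStratifier uses. As a hedged illustration only, the sketch below swaps in plain logistic regression trained by gradient descent: biomarker values are the features, observed drug response (1 = responded) is the label, and a PatientRecommender-style decision compares the predicted response probability with a threshold. The names `train_logistic` and `recommend`, the learning rate, the epoch count and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by full-batch gradient descent.

    X: one row of biomarker values per patient; y: 1 = responded to the drug, 0 = did not.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (predicted probability - label) * features.
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            for k in range(d):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recommend(
    w: Sequence[float], b: float, biomarkers: Sequence[float], threshold: float = 0.5
) -> Tuple[bool, float]:
    """Return (give_drug, predicted response probability) for one patient."""
    p = sigmoid(sum(wk * xk for wk, xk in zip(w, biomarkers)) + b)
    return p >= threshold, p


if __name__ == "__main__":
    # Hypothetical toy cohort: two biomarkers per patient.
    X = [[2.0, 0.1], [1.5, 0.3], [1.8, 0.2], [0.2, 0.9], [0.1, 1.2], [0.4, 1.0]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    print(recommend(w, b, [1.9, 0.15]))
```

In practice the threshold trades off giving the drug to likely non-responders against withholding it from likely responders; 0.5 is only a placeholder.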
Parallel workflow to analyze NGS data (Huang et al. 2015)
A workflow that analyzes ~900 genomes (cumulative coverage ~4,000X). Starting from billions of 100 bp paired-end reads sequenced on the Illumina Genome Analyzer II, the workflow comprises several sub-workflows: read filtering (main program: a custom Java program built on GATK libraries); read alignment (main program: bwa [Li et al. 2009]; Stampy [Lunter et al. 2011] was used in testing); base-quality-score recalibration by GATK [DePristo et al. 2011]; genotype calling by SAMtools [Li et al. 2009] and GATK [DePristo et al. 2011]; pedigree-based calling (main program: TrioCaller [Chen et al. 2012]); and other sub-workflows for variant filtering and statistics calculation (transition/transversion ratio, allele frequency, Mendelian inconsistency, and population-genetic measures such as nucleotide diversity, Hardy-Weinberg equilibrium p-values, and linkage disequilibrium). All workflows interact with the vervet PostgreSQL database through SQLAlchemy/Elixir. The workflows were constructed in a MapReduce manner using APIs from the Pegasus workflow management system, so that independent jobs run in parallel and the system can use the full computing power of a cluster. The main code can be found at http://code.google.com/p/vervet-web/. Substantial Java and C++ code is in private git repositories (contact firstname.lastname@example.org if interested).
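Several of the statistics listed above have compact textbook definitions. The sketch below computes three of them: the transition/transversion (Ts/Tv) ratio over SNVs, the Mendelian-inconsistency rate over parent-offspring trios, and a Hardy-Weinberg equilibrium p-value. It is an illustration under stated assumptions, not code from the workflow: genotypes are coded as alt-allele counts (0, 1, 2) at biallelic sites, all function names are hypothetical, and the HWE test uses the 1-degree-of-freedom chi-square approximation, which may differ from the test the workflow actually used.

```python
import math
from typing import Iterable, Sequence, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ts/Tv over (ref, alt) pairs; indels, MNPs and non-ACGT bases are skipped."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue
        if frozenset((ref, alt)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


# Alt-allele counts (0 or 1) that a parent with a given genotype can transmit.
_TRANSMITTED = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child for a in _TRANSMITTED[father] for b in _TRANSMITTED[mother])


def mendelian_error_rate(trios: Sequence[Tuple[int, int, int]]) -> float:
    """Fraction of (father, mother, child) genotype trios that are inconsistent."""
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Hardy-Weinberg p-value from the 1-df chi-square approximation."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```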
This is a job-dependency DAG (directed acyclic graph).
This is the job-duration vs. time diagram from a toy example, illustrating which jobs take the most time. Real-data workflows involve 100 times as many jobs or more.
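The dependency structure behind the DAG and the job-duration diagram can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is not the Pegasus API; Pegasus plans and runs the DAG itself. The sketch groups jobs into waves that can run concurrently (assuming unlimited cluster slots) and finds the critical path, the chain of dependent jobs that bounds total wall-clock time. The job names and durations in the toy per-chromosome map/merge DAG are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# job name -> (duration, names of the jobs it depends on)
Jobs = Dict[str, Tuple[float, List[str]]]


def parallel_waves(jobs: Jobs) -> List[List[str]]:
    """Group jobs into waves; every job in a wave can run concurrently."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in jobs.items()})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def critical_path(jobs: Jobs) -> Tuple[float, List[str]]:
    """Longest duration-weighted path through the DAG and its total duration."""
    finish: Dict[str, float] = {}
    prev: Dict[str, str] = {}
    order = TopologicalSorter({name: deps for name, (_, deps) in jobs.items()}).static_order()
    for name in order:
        duration, deps = jobs[name]
        start = 0.0
        for dep in deps:
            # A job can start only after its slowest dependency finishes.
            if finish[dep] > start:
                start, prev[name] = finish[dep], dep
        finish[name] = start + duration
    last = max(finish, key=lambda k: finish[k])
    path = [last]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return finish[last], path[::-1]


if __name__ == "__main__":
    toy_jobs: Jobs = {
        "filter_chr1": (10, []),
        "filter_chr2": (8, []),
        "align_chr1": (30, ["filter_chr1"]),
        "align_chr2": (25, ["filter_chr2"]),
        "call_chr1": (20, ["align_chr1"]),
        "call_chr2": (15, ["align_chr2"]),
        "merge_vcf": (5, ["call_chr1", "call_chr2"]),
    }
    print(parallel_waves(toy_jobs))
    print(critical_path(toy_jobs))
```

The per-chromosome jobs play the "map" role and `merge_vcf` the "reduce" role; speeding up a job that is not on the critical path does not shorten the overall run.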
Arabidopsis GWAS web app and database (Seren et al. 2012, Huang et al. 2011, Atwell et al. 2010)
The most up-to-date URL is http://arabidopsis.gmi.oeaw.ac.at:5000/ (old links under http://arabidopsis.usc.edu/ will be redirected to this GMI site). The MySQL database dump can be downloaded from http://arabidopsis.gmi.oeaw.ac.at/public_db_dump.tar.gz. All the code for the version demonstrated in Huang et al. 2011 (Pylons web server, web client built with Google Web Toolkit, etc.) can be found in this tarball.
A second-generation version is at http://gwas.gmi.oeaw.ac.at/index.html (Seren et al. 2012, source code link). The Arabidopsis polymorphism effort is at https://cynin.gmi.oeaw.ac.at/home/resources/atpolydb.