
Yu Huang

黄宇

    I have been a Senior Director of Bioinformatics at Genecast Biotechnology Corp. 臻和生物科技 since early 2022.

  • Models and algorithms in MRD (Molecular Residual Disease) and MCED (Multi-Cancer Early Detection);
  • Bioinformatics Optimization;

I was a Principal Investigator at SIMM (Shanghai Institute of Materia Medica, Chinese Academy of Sciences; 中国科学院上海药物研究所) from 2015 to 2021. My team focused on developing AI models, algorithms, distributed computing platforms, and databases in bioinformatics for personalized medicine (target discovery, validation, biomarkers), computer-aided drug design, and virtual screening.

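  • A drug generation model based on the junction-tree variational autoencoder (JT-VAE), and Deffini, a CNN-based small-molecule screening model;
  • A model that predicts drug response in short stature from multi-omics data;
  • Accurity & Accucopy: statistical models, in C++ & Rust, that estimate tumor purity and copy number alterations from tumor tissue sequencing data;
  • MethylSeq: Illumina's DNA methylation cloud computing software;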
  • Pegaflow: a distributed workflow system that harnesses the entire computing cluster to analyze thousands of NGS samples;
  • eGADA: a fast genomic segmentation algorithm utilizing Sparse Bayesian Learning and Red-Black Tree;
  • The first non-human GWAS (Nature 2010): base-calling, linear mixed model, Google-Web-Toolkit, PostgresDB, parallel computing.

     Previously at Illumina Inc., I was the main developer of MethylSeq, the BaseSpace App that serves as bioinformatics cloud software for detecting methylated cytosines in DNA from bisulfite-treated next-generation sequencing data. To make better use of the computing resources of a multi-core system, I wrote a new Directed-Acyclic-Graph-based workflow system in C# and sped up MethylSeq by >5X. I also wrote some bioinformatics software in Go, which sped up some analyses considerably (>50X vs. Python). I was the lead bioinformatician in interdisciplinary projects involving cancer, forensics, exome, and whole-genome sequencing. Before Illumina, I did a three-and-a-half-year postdoc with Prof. Nelson Freimer at Human Genetics, UCLA (Nov 2010-Mar 2014), working on trait mapping and population genetics projects in vervet monkeys, analyzing the whole-genome DNA sequences of >700 monkeys from a vervet pedigree and >100 monkeys from wild populations. In Oct 2010, I completed my PhD in Computational Biology and Bioinformatics at USC under the supervision of Magnus Nordborg, working primarily on association mapping and population genetics of Arabidopsis thaliana. I also worked on gene function/network inference from gene expression data using graph theory.

Fascinated with computers ever since I tested my first BASIC program on an Intel 8088 PC in 8th grade, I learnt C/C++, PostgreSQL, Java, Python, and everything about Linux as an undergraduate. Being in a PhD program founded by a mathematician (M.S. Waterman), I learnt all I could about statistics and probability.

GitHub: https://github.com/polyactis

ORCID: https://orcid.org/0000-0001-5967-4948


Blog Posts

Email

polyactis at gmail.com

Education

2003.08 - 2010.10 University of Southern California, Los Angeles, Ph.D. in Bioinformatics

1999.09 - 2003.07 复旦大学 Fudan University, Shanghai, B.S. in Biological Sciences

1996.09 - 1999.07 川沙中学 Shanghai Chuansha Senior High School

Employment


2022 - Senior Director of Bioinformatics, 臻和生物科技 Genecast Biotechnology Corp. Ltd. China

2015 - 2021 Director of Bioinformatics, Professor, Shanghai Institute of Materia Medica

2014 - 2015 Bioinformatics Scientist, Illumina Inc. San Diego, USA

2010 - 2014 PostDoc, University of California Los Angeles, USA

Research Directions

  • Statistical models and algorithms in MRD and MCED (Multi-Cancer Early Detection).
  • Big-data analytical platforms.
  • Bioinformatics models and algorithms in personalized medicine.
  • AI models in drug design and target discovery.
  • Statistical models in drug-biomarker discovery.

Expertise

Bioinformatics, Machine/Deep/Statistical Learning, AI Models & Algorithms, Optimization, Distributed Computing, Population Genetics

Programming (https://github.com/polyactis)

Daily: Python, C/C++, Rust, SQL, shell, R, awk

Occasional: Go, C#, Vue.js, Java, Julia, PHP, Perl, FORTRAN, Pascal, MATLAB

Library: Parallel computing (Open MPI, MPICH), Boost C++ Library, Pegasus workflow system

SysAdmin: PostgreSQL, Lustre FS, ZFS, LDAP, Kubernetes (K8s), Kubeflow, iptables, NGINX, NFS, Ceph, MySQL


Awards & Honours

  • 2016 National High-Level Talent Program, Category B (国家高层次人才计划B)
  • 2015 National High-Level Talent Program, Category A (国家高层次人才计划A)
  • 2003-2008 USC Graduate School Merit Award
  • 1999-2003 People’s Scholarship, Fudan University
  • 2000 Computer Programming Contest, Fudan University, 3rd Award
  • 1999 National High-School Mathematics Contest, 3rd Award

Notable works

  • The first non-human primate (vervet monkeys) population genomic resource and trait mapping in complex pedigrees (>700 members).
  • DNA-methylation BaseSpace App at Illumina Inc.
  • The first non-human genome-wide association study (GWAS), Nature, 2010
  • Selected Publications

Hobbies


