CV

General Information

Full Name Yu S. Huang/黄宇
Languages Chinese, English

Experience

  • 2022 -
    Senior Director of Bioinformatics
    臻和 Genecast Biotechnology Corp Ltd, Shanghai/Beijing/Wuhan, China
    • AI model for MCED (Multi-Cancer Early Detection) https://www.genecast.com.cn/solutions/detail?id=29 (Bie et al., Nature Communications).
    • Optimize core bioinformatics algorithms using deep learning and machine learning techniques.
    • Statistical models for MRD (Minimal Residual Disease) fixed-panel and WES custom-panel products.
    • Teach Bayesian Statistics, Machine/Deep Learning, Julia/Rust Programming.
    • AI computing platform.
  • 2015 - 2021
    Professor/Principal Investigator, Director of Bioinformatics
    Shanghai Institute of Materia Medica, Chinese Academy of Sciences (中科院)
    • Develop AI models and algorithms in personalized medicine (target discovery, validation, biomarker) and AI models in drug design and virtual screening.
    • Establish the AI computing platform.
    • Accucopy: a computational method that infers allele-specific copy number alterations from low-coverage, low-purity tumor sequencing data.
    • Teach Chris Bishop's 2006 book "Pattern Recognition and Machine Learning".
    • Teach Julia programming language, Matrix Computing, Optimization.
  • 2014 - 2015
    Bioinformatics Scientist
    Illumina Inc., San Diego, California, USA
    • MethylSeq BaseSpace app (in C#).
    • UFlow, a directed-acyclic-graph workflow system in C# that speeds up Illumina bioinformatics workflows by >50X.
    • Bioinformatics libraries in Go that sped up the analysis by >100X.
    • Forensics, cancer, whole-genome, and exome competitive analyses.

Education

  • 2010
    PhD in Computational Biology and Bioinformatics
    University of Southern California, Los Angeles, USA
  • 2003
    B.S. in Biology
    Fudan University, Shanghai, China

Open Source Projects

  • 2017-now
    Accucopy
    • A computational method that infers allele-specific copy number alterations from low-coverage, low-purity tumor sequencing data.
  • 2021-now
    eGADA
    • Enhanced GADA: a fast segmentation algorithm based on Sparse Bayesian Learning (the Relevance Vector Machine). It can be applied to array intensity data, NGS sequencing data, or any sequential data that behaves like a stepwise function. Enhancements include: 1) a customized red-black tree that significantly speeds up the final backward elimination step; 2) an implementation in C++, which is better structured than the C original; 3) an exported eGADA.so that serves as a Python API.

Honors and Awards

  • 2023
    • Jiangsu Province Innovation and Entrepreneurship Talent Program (江苏省双创人才)
  • 2016
    • China Thousand-Talent Program
  • 2015
    • Hundred-Talent Program of Chinese Academy of Sciences
  • 2003-2008
    • Merit Award Fellowship, University of Southern California.
  • 2022
    • Third Award, Computer Programming Contest, Fudan University.
  • 1999-2003
    • People's Scholarship, Fudan University.
  • 1998
    • Third Award, National High School Mathematics Competition of China.
  • 1996
    • Third Award, Junior High School Physics Competition, Shanghai.

Expertise & Skills

  • Modelling & Algorithm
    • Statistical Learning, Machine/Deep Learning, Optimization
  • Programming
    • Rust, Python, C++, Julia, Go, R, Java, SQL, shell, awk
  • Library
    • Parallel computing (Open MPI, MPICH), Boost C++ Library, Pegasus workflow system
  • SysAdmin
    • PostgreSQL, MySQL, Lustre FS, ZFS, NFS, Ceph, LDAP, Kubernetes (K8s), Kubeflow, iptables, NGINX

Hobbies

  • Surfing, snowboarding, swimming, reading